sebastian-raschka

My Workflow for Understanding LLM Architectures

May 3, 2026 · Original in English

Summary

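The author documents his workflow for sketching LLM architectures. He starts with the official technical reports and, because papers often lack detail, then inspects the config file on the Hugging Face Model Hub and the reference implementation in the Python transformers library. The workflow targets open-weight models, does not apply to proprietary models such as ChatGPT, Claude, or Gemini, and relies mostly on manual analysis.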

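Over the past few months, many people have asked me whether I could share my workflow for coming up with the LLM architecture sketches and drawings in my articles, talks, and the LLM-Gallery. So I thought it would be useful to document the process I usually follow. The short version is that I usually start with the official technical reports, but papers these days are often less detailed than they used to be, especially for most open-weight models from industry labs. The good part is that if the weights are published on the Hugging Face Model Hub and the model is supported in the Python transformers library, we can usually inspect the config file and the reference implementation directly to get more architecture details. And "working" code doesn't lie.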

Figure 1: The basic motivation for this workflow is that papers are often not detailed enough these days, but a working reference implementation gives us something concrete to inspect.
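To make the config-inspection step more concrete, here is a minimal sketch of reading architecture details out of a config file. The JSON values below are made up for illustration and do not describe any specific model, and `summarize_architecture` is a hypothetical helper. The key names (`hidden_size`, `num_attention_heads`, `num_key_value_heads`, and so on) follow a convention used by many Llama-style configs, but they are an assumption here, since other model families may name or structure these fields differently.

```python
import json

# Made-up config values for illustration only (not a real model's config.json).
EXAMPLE_CONFIG_JSON = """
{
  "hidden_size": 1024,
  "num_hidden_layers": 12,
  "num_attention_heads": 16,
  "num_key_value_heads": 4,
  "intermediate_size": 4096,
  "vocab_size": 32000
}
"""


def summarize_architecture(config: dict) -> dict:
    """Derive a few architecture facts from a Llama-style config dict.

    Hypothetical helper: the key names are assumptions and vary across models.
    """
    n_heads = config["num_attention_heads"]
    # If the key is absent, assume every query head has its own key/value head.
    n_kv_heads = config.get("num_key_value_heads", n_heads)
    # Some configs set head_dim explicitly; otherwise derive it.
    head_dim = config.get("head_dim", config["hidden_size"] // n_heads)

    if n_kv_heads == n_heads:
        attention_type = "MHA"  # multi-head attention
    elif n_kv_heads == 1:
        attention_type = "MQA"  # multi-query attention
    else:
        attention_type = "GQA"  # grouped-query attention

    return {
        "layers": config["num_hidden_layers"],
        "embedding_dim": config["hidden_size"],
        "head_dim": head_dim,
        "attention_type": attention_type,
        "queries_per_kv_head": n_heads // n_kv_heads,
        "ffn_expansion": config["intermediate_size"] / config["hidden_size"],
    }


summary = summarize_architecture(json.loads(EXAMPLE_CONFIG_JSON))
for name, value in summary.items():
    print(f"{name}: {value}")
```

In practice, the JSON would come from the `config.json` file in a model's repository on the Hub rather than from an inline string. A summary like this only hints at the architecture; details such as normalization placement or layer ordering still have to be read from the reference implementation.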

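I should also note that this is mainly a workflow for open-weight models. It does not really apply to models like ChatGPT, Claude, or Gemini, whose weights and details are proprietary. Also, this is intentionally a fairly manual process. You could automate parts of it. But if the goal is to learn how these architectures work, then doing a few of these analyses by hand is, in my opinion, still one of the best exercises.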

Figure 2: At a high level, the workflow goes from config files and code to architecture insights.

Translated from sebastian-raschka · Recorded May 3, 2026