simon-willison

LLM 0.32a0 is a major backwards-compatible refactor

May 3, 2026

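Summary

The LLM Python library and CLI tool have a new 0.32a0 alpha release. It adds a messages abstraction for input and streaming output of typed message parts, supporting mixed content such as reasoning and tool calls. It also adds a response.to_dict()/from_dict() serialization mechanism, with plans to upgrade plugins and redesign SQLite logging.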

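I just released LLM 0.32a0, an alpha release of my LLM Python library and CLI tool for accessing LLMs, with some significant changes that I have been working towards for quite a while.

Previous versions of LLM modeled the world in terms of prompts and responses. Send the model a text prompt, get back a text response.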

    import llm

    model = llm.get_model("gpt-5.5")
    response = model.prompt("Capital of France?")
    print(response.text())

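This made sense when I started working on the library back in April 2023. A lot has changed since then.

LLM provides an abstraction over thousands of different models via its plugin system. The original abstraction, text input that returns text output, can no longer represent everything I need it to.

Over time LLM itself has grown attachments to handle image, audio, and video input, then schemas for outputting structured JSON, then tools for executing tool calls.

Meanwhile, LLMs have kept evolving, adding reasoning support, the ability to return images, and all kinds of other interesting capabilities.

LLM needs to evolve to better handle the diversity of input and output types that today's frontier models can process.

The 0.32a0 alpha has two key changes: model inputs can be represented as a sequence of messages, and model responses can be composed of a stream of differently typed parts.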

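Prompts as a sequence of messages

LLMs accept text as input, but ever since ChatGPT demonstrated the value of a two-way conversational interface, the most common way to prompt them has been to treat the input as a sequence of conversational turns.

The first turn might look like this: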

user: Capital of France?

assistant:

(The model then fills in the assistant's reply.)

But each subsequent turn needs to replay the entire conversation up to that point, like a screenplay:

user: Capital of France?

assistant: Paris

user: Germany?

assistant:

Most of the JSON APIs from the major vendors follow this pattern. Here is what the above looks like using the OpenAI chat completions API, which has been widely imitated by other providers:

    curl https://api.openai.com/v1/chat/completions \
      -H "Authorization: Bearer $OPENAI_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "gpt-5.5",
        "messages": [
          {"role": "user", "content": "Capital of France?"},
          {"role": "assistant", "content": "Paris"},
          {"role": "user", "content": "Germany?"}
        ]
      }'
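The replay pattern can be sketched in plain Python, independent of any client library. This is a minimal illustration: the helper name `build_payload` is hypothetical, and only the `model`, `messages`, `role`, and `content` fields are taken from the request above.

```python
import json


def build_payload(model, history, new_user_text):
    """Return a chat-completions-style request body.

    `history` is a list of (role, content) pairs for the turns so far.
    The new user turn is appended after them, so the full conversation
    is re-sent on every request.
    """
    messages = [{"role": role, "content": content} for role, content in history]
    messages.append({"role": "user", "content": new_user_text})
    return {"model": model, "messages": messages}


history = [("user", "Capital of France?"), ("assistant", "Paris")]
payload = build_payload("gpt-5.5", history, "Germany?")
print(json.dumps(payload, indent=2))
```

Because the payload grows by two messages per round trip, the caller is the one responsible for keeping the history around between requests.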

Prior to 0.32, LLM modeled these as conversations:

    model = llm.get_model("gpt-5.5")
    conversation = model.conversation()
    r1 = conversation.prompt("Capital of France?")
    print(r1.text())  # Outputs "Paris"
    r2 = conversation.prompt("Germany?")
    print(r2.text())  # Outputs "Berlin"

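This works if you are building a conversation with the model from scratch, but it does not provide a way to feed in an existing conversation from the start. That makes tasks like building an emulation of the OpenAI chat completions API much more complicated than they should be.

The llm CLI tool worked around this with a custom mechanism for persisting and restoring conversations using SQLite, but this never became a stable part of the LLM API, and there are many situations where you might want to use the Python library without committing to SQLite as the storage layer.

The new alpha now supports this: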

    import llm
    from llm import user, assistant

    model = llm.get_model("gpt-5.5")
    response = model.prompt(messages=[
        user("Capital of France?"),
        assistant("Paris"),
        user("Germany?"),
    ])
    print(response.text())

The llm.user() and llm.assistant() functions are new builder functions designed for use in the messages=[] array.

The previous prompt= option still works, but LLM upgrades it behind the scenes to a messages array with a single item.
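To make the "upgrade" idea concrete, here is a rough sketch of how a single prompt string could be normalized into a one-item message list. This is not LLM's internal code: the `Message` class and `normalize_input` function are hypothetical names, and rejecting both arguments at once is a simplification of this sketch, not a statement about how LLM behaves.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    # Hypothetical stand-in for whatever llm.user() / llm.assistant() build.
    role: str
    content: str


def normalize_input(prompt=None, messages=None):
    """Return a list of Message objects from either style of input."""
    if prompt is not None and messages is not None:
        # Simplification for this sketch only.
        raise ValueError("pass either prompt or messages, not both")
    if messages is not None:
        return list(messages)
    if prompt is None:
        raise ValueError("a prompt or a list of messages is required")
    # A bare prompt becomes a single user message.
    return [Message(role="user", content=prompt)]


upgraded = normalize_input(prompt="Capital of France?")
print(upgraded)
```

The appeal of this shape is that everything downstream only ever has to handle one representation.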

You can now also reply to a response, as an alternative to building a conversation:

    response2 = response.reply("How about Hungary?")
    print(response2)  # Default str() calls .text()

Streaming parts

The other major new interface in the alpha concerns streaming results back from a prompt.

Previously, LLM supported streaming like this:

    response = model.prompt("Generate an SVG of a pelican riding a bicycle")
    for chunk in response:
        print(chunk, end="")

Or this async variant:

    import asyncio
    import llm

    model = llm.get_async_model("gpt-5.5")
    response = model.prompt("Generate an SVG of a pelican riding a bicycle")

    async def run():
        async for chunk in response:
            print(chunk, end="", flush=True)

    asyncio.run(run())

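Many of today's models return mixed types of content. A prompt run against Claude might return reasoning output, then text, then a JSON request for a tool call, then more text content.

Some models can even execute tools on the server side, for example OpenAI's code interpreter tool or Anthropic's web search. This means the results from a model can combine text, tool calls, tool outputs, and other formats.

Multi-modal output models are starting to appear too. They can return images, or even audio snippets, mixed into a streaming response.

The new LLM alpha models these as a stream of typed message parts. Here is what that looks like as a consumer of the Python API: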

    import asyncio
    import llm

    model = llm.get_model("gpt-5.5")
    prompt = "invent 3 cool dogs, first talk about your motivations"

    def describe_dog(name: str, bio: str) -> str:
        """Record the name and biography of a hypothetical dog."""
        return f"{name}: {bio}"

    def sync_example():
        response = model.prompt(
            prompt,
            tools=[describe_dog],
        )
        for event in response.stream_events():
            if event.type == "text":
                print(event.chunk, end="", flush=True)
            elif event.type == "tool_call_name":
                print(f"\nTool call: {event.chunk}(", end="", flush=True)
            elif event.type == "tool_call_args":
                print(event.chunk, end="", flush=True)

    async def async_example():
        model = llm.get_async_model("gpt-5.5")
        response = model.prompt(
            prompt,
            tools=[describe_dog],
        )
        async for event in response.astream_events():
            if event.type == "text":
                print(event.chunk, end="", flush=True)
            elif event.type == "tool_call_name":
                print(f"\nTool call: {event.chunk}(", end="", flush=True)
            elif event.type == "tool_call_args":
                print(event.chunk, end="", flush=True)

    sync_example()
    asyncio.run(async_example())

Example output (from just the first sync example):

My motivation: create three memorable dogs with distinct “cool” styles—one cinematic, one adventurous, and one charmingly chaotic—so each feels like they could star in their own story.

Tool call: describe_dog({"name": "Nova Jetpaw", "bio": "A sleek silver-gray whippet who wears tiny aviator goggles and loves sprinting along moonlit beaches. Nova is fearless, elegant, and rumored to outrun drones just for fun."}

Tool call: describe_dog({"name": "Mochi Thunderbark", "bio": "A fluffy corgi with a dramatic black-and-gold bandana and the confidence of a rock star. Mochi is short, loud, loyal, and leads a neighborhood 'security patrol' made entirely of squirrels."}

Tool call: describe_dog({"name": "Atlas Snowfang", "bio": "A massive white husky with ice-blue eyes and a backpack full of trail snacks. Atlas is calm, heroic, and always knows the way home—even during blizzards, fog, or confusing camping trips."}
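Output like the above suggests one way a consumer might fold the event stream back into structured parts. The sketch below uses a stand-in `Event` class rather than LLM's real event objects, relying only on the `.type` and `.chunk` attributes and the three type names shown in the example. It assumes each tool call's name arrives before its argument chunks and that those chunks concatenate into valid JSON.

```python
import json
from dataclasses import dataclass


@dataclass
class Event:
    """Stand-in for a streamed event; only .type and .chunk are used."""
    type: str
    chunk: str


def collect(events):
    """Fold a stream of events into (text, tool_calls)."""
    text_parts = []
    tool_calls = []
    for event in events:
        if event.type == "text":
            text_parts.append(event.chunk)
        elif event.type == "tool_call_name":
            # Assumption: a name event starts a new tool call.
            tool_calls.append({"name": event.chunk, "args": ""})
        elif event.type == "tool_call_args" and tool_calls:
            # Argument JSON arrives in fragments; accumulate them.
            tool_calls[-1]["args"] += event.chunk
        # Any other event types are ignored in this sketch.
    for call in tool_calls:
        call["args"] = json.loads(call["args"]) if call["args"] else {}
    return "".join(text_parts), tool_calls


fake_stream = [
    Event("text", "My motivation: "),
    Event("text", "three memorable dogs."),
    Event("tool_call_name", "describe_dog"),
    Event("tool_call_args", '{"name": "Nova Jetpaw", '),
    Event("tool_call_args", '"bio": "A sleek whippet."}'),
]
text, calls = collect(fake_stream)
print(text)
print(calls)
```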

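At the end of the response you can call response.execute_tool_calls() to actually run the functions that were requested, or send a response.reply() to have those tools called and their return values sent back to the model: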

    print(response.reply("Tell me about the dogs"))

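This new mechanism for streaming different token types means the CLI tool can now display "thinking" text in a different color from the final response text. The thinking text goes to stderr, so it does not affect results that are piped into other tools.

This example uses Claude Sonnet 4.6 (with a version of the llm-anthropic plugin updated for streaming events), since Anthropic's models return reasoning text as part of the response: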

    llm -m claude-sonnet-4.6 'Think about 3 cool dogs then describe them' \
      -o thinking_display 1

You can suppress the output of reasoning tokens using the new -R/--no-reasoning flag. Surprisingly, that ended up being the only CLI-facing change in this release.
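The stdout/stderr split described in this section can be approximated in a few lines. This is only a sketch of the idea, not the CLI's implementation: the `Event` class is a stand-in, and the `"reasoning"` type name is an assumption, since the post does not name the event type used for thinking text.

```python
import io
import sys
from dataclasses import dataclass


@dataclass
class Event:
    """Stand-in for a streamed event with a type and a text chunk."""
    type: str
    chunk: str


def route(events, out=None, err=None):
    """Write answer text to `out` and reasoning text to `err`."""
    # Resolve defaults at call time so redirected streams are respected.
    out = sys.stdout if out is None else out
    err = sys.stderr if err is None else err
    for event in events:
        if event.type == "text":
            out.write(event.chunk)
        elif event.type == "reasoning":  # assumed type name
            err.write(event.chunk)


out, err = io.StringIO(), io.StringIO()
route(
    [Event("reasoning", "Considering dogs... "), Event("text", "Here are 3 dogs.")],
    out,
    err,
)
print(out.getvalue())
```

Keeping the two streams separate is what lets a shell pipeline consume only the final answer while a person at the terminal still sees the reasoning.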

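A mechanism for serializing and deserializing responses

As mentioned earlier, the code LLM currently uses to persist conversations to SQLite is quite inflexible. I have added a new mechanism in 0.32a0 that should allow Python API users to implement alternatives in their own way: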

    serializable = response.to_dict()
    # serializable is a JSON-style dictionary
    # store it anywhere you like, then inflate it:
    response = Response.from_dict(serializable)

The dictionary it returns is actually a TypedDict, defined in the new llm/serialization.py module.
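Since the result is a JSON-style dictionary, storing it outside SQLite can be as simple as writing a file. The sketch below shows that round trip with a made-up `StoredResponse` TypedDict; its two fields are placeholders for illustration, and the real field names are whatever llm/serialization.py defines.

```python
import json
import tempfile
from pathlib import Path
from typing import TypedDict


class StoredResponse(TypedDict):
    """Hypothetical shape; the real fields live in llm/serialization.py."""
    model: str
    text: str


def save(data: StoredResponse, path: Path) -> None:
    """Write the dictionary to disk as JSON."""
    path.write_text(json.dumps(data), encoding="utf-8")


def load(path: Path) -> StoredResponse:
    """Read the dictionary back; a TypedDict is a plain dict at runtime."""
    return json.loads(path.read_text(encoding="utf-8"))


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "response.json"
    original: StoredResponse = {"model": "gpt-5.5", "text": "Paris"}
    save(original, path)
    restored = load(path)

print(restored == original)
```

In real use, the dictionary passed to `save` would come from response.to_dict(), and the one returned by `load` would go to Response.from_dict().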

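What's next?

I am releasing this as an alpha so that I can upgrade various plugins and run it in the real world for a few days to test the new design.

I expect the stable 0.32 release to be very similar to this alpha, unless alpha testing reveals flaws in the design.

One larger task remains: I want to redesign the SQLite logging system to better capture the finer-grained details returned by this new abstraction.

Ideally I would like to model this as a graph, to best support cases like the OpenAI-style chat completions API, where the same conversation is continually extended and then re-sent with each prompt. I want to be able to store these without duplicating them in the database.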
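One possible reading of the graph idea is a tree of messages in which each node points at its parent, so conversations that share a prefix share storage. The sketch below is purely illustrative and is not the author's design: the `MessageGraph` class and its content-hash node IDs are assumptions made for this example, and it stores nodes in memory rather than in SQLite.

```python
import hashlib
import json


class MessageGraph:
    """Store conversations as chains of nodes that share common prefixes."""

    def __init__(self):
        self.nodes = {}  # node_id -> (parent_id, role, content)

    def add_conversation(self, messages):
        """Store a list of (role, content) pairs; return the last node's id."""
        parent_id = None
        for role, content in messages:
            # The id depends on the parent, so identical prefixes collide
            # on purpose and are stored only once.
            key = json.dumps([parent_id, role, content])
            node_id = hashlib.sha256(key.encode("utf-8")).hexdigest()
            self.nodes.setdefault(node_id, (parent_id, role, content))
            parent_id = node_id
        return parent_id

    def conversation(self, node_id):
        """Rebuild the full message list by walking parent links."""
        messages = []
        while node_id is not None:
            parent_id, role, content = self.nodes[node_id]
            messages.append((role, content))
            node_id = parent_id
        return list(reversed(messages))


graph = MessageGraph()
first = graph.add_conversation([("user", "Capital of France?")])
second = graph.add_conversation([
    ("user", "Capital of France?"),
    ("assistant", "Paris"),
    ("user", "Germany?"),
])
print(len(graph.nodes))
```

Here the second request re-sends the first message, but only two new nodes are stored for it.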

I have not yet decided whether this should be a feature of 0.32 or held back until 0.33.

Tags: projects, python, ai, annotated-release-notes, generative-ai, llms, llm
