philschmid

How to use Deep Research with the Gemini API

May 9, 2026 · English original

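Summary

The Gemini Deep Research Agent runs long-horizon research asynchronously through the Interactions API. It is available as deep-research-preview-04-2026 and a max version, and supports collaborative planning, visualization, remote MCP servers, Google Search, URL Context, Code Execution, File Search, multimodal grounding, streaming output, and thought summaries.

The Gemini Deep Research Agent autonomously plans, searches, and synthesizes long-horizon research tasks into reports with detailed citations.

Deep Research handles long-running tasks by executing them in the background. It is available only through the Interactions API (generate_content is not supported).

Two new versions are currently available:

deep-research-preview-04-2026: designed for speed and efficiency; well suited to streaming results back to a client UI
deep-research-max-preview-04-2026: designed for automated context gathering and synthesis; offers the highest comprehensiveness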

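What's new

Collaborative planning: review and refine the research plan before execution
Native charts and infographics: charts, graphs, and infographics generated by the agent
Remote MCP servers: connect to external tools via the Model Context Protocol
Extended tools: Google Search, URL Context, Code Execution, MCP, and File Search
Multimodal research grounding: pass images, PDFs, and audio as research context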

Setup

Install the Python SDK:

pip install google-genai

Set your API key as an environment variable. You can create one at aistudio.google.com/apikey.

export GEMINI_API_KEY="your-api-key"

Run your first Deep Research task

Start the research task with background=True and poll for the result. Deep Research is asynchronous because a task can take several minutes to complete.

import time
from google import genai
 
client = genai.Client()
 
interaction = client.interactions.create(
    input="Research the history of Google TPUs.",
    agent="deep-research-preview-04-2026",
    background=True,
)
 
while True:
    interaction = client.interactions.get(interaction.id)
    if interaction.status == "completed":
        print(interaction.outputs[-1].text)
        break
    elif interaction.status == "failed":
        print(f"Research failed: {interaction.error}")
        break
    time.sleep(10)
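The loop above, like the shorter loops in later sections, keeps polling with no upper bound on wall-clock time. Below is a minimal sketch of a reusable polling helper with a timeout. It assumes only what the example shows: a getter that returns an object with `status` and `error` fields, and the `"completed"` and `"failed"` states. The helper name `wait_for_interaction`, its `timeout` parameter, and the 30-minute default are hypothetical, not part of the SDK; the demo uses a fake getter so it runs offline.

```python
import time
from types import SimpleNamespace
from typing import Any, Callable


def wait_for_interaction(
    get: Callable[[str], Any],
    interaction_id: str,
    poll_interval: float = 10.0,
    timeout: float = 1800.0,  # arbitrary 30-minute cap (assumption)
) -> Any:
    """Poll until the interaction completes; raise if it fails or times out."""
    deadline = time.monotonic() + timeout
    while True:
        interaction = get(interaction_id)
        if interaction.status == "completed":
            return interaction
        if interaction.status == "failed":
            raise RuntimeError(f"Research failed: {interaction.error}")
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"Interaction {interaction_id} still {interaction.status!r} after {timeout}s"
            )
        time.sleep(poll_interval)


# Offline demo: a fake getter that reports "in_progress" twice, then "completed".
fake_states = iter(["in_progress", "in_progress", "completed"])


def fake_get(interaction_id: str) -> SimpleNamespace:
    return SimpleNamespace(id=interaction_id, status=next(fake_states), error=None)


result = wait_for_interaction(fake_get, "demo-id", poll_interval=0.0)
print(result.status)  # completed
```

With a real client you would pass the SDK method as the getter, e.g. `wait_for_interaction(client.interactions.get, interaction.id)`, and then read `.outputs[-1].text` from the returned interaction.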

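Collaborative planning

Set collaborative_planning=True to get back a research plan instead of starting the research immediately. Iterate on the plan with previous_interaction_id, then set collaborative_planning=False to execute it.

Step 1: Request a plan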

import time
from google import genai
 
client = genai.Client()
 
plan = client.interactions.create(
    agent="deep-research-preview-04-2026",
    input="Research Google TPUs vs competitor hardware.",
    agent_config={"type": "deep-research", "collaborative_planning": True},
    background=True,
)
 
while (result := client.interactions.get(id=plan.id)).status != "completed":
    time.sleep(5)
 
print(result.outputs[-1].text)

Refine the plan: Continue the conversation with previous_interaction_id. Keep collaborative_planning=True to stay in planning mode. Repeat as often as needed.

refined = client.interactions.create(
    agent="deep-research-preview-04-2026",
    input="Add a section comparing power efficiency.",
    agent_config={"type": "deep-research", "collaborative_planning": True},
    previous_interaction_id=plan.id,
    background=True,
)
 
while (result := client.interactions.get(id=refined.id)).status != "completed":
    time.sleep(5)
 
print(result.outputs[-1].text)

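Approve and execute: Set collaborative_planning=False to approve the plan and start the research.

Important: You must explicitly set collaborative_planning=False on the final turn. Replying "go ahead" without switching the flag will not trigger report generation.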

report = client.interactions.create(
    agent="deep-research-preview-04-2026",
    input="Plan looks good!",
    agent_config={"type": "deep-research", "collaborative_planning": False},
    previous_interaction_id=refined.id,
    background=True,
)
 
while (result := client.interactions.get(id=report.id)).status != "completed":
    time.sleep(5)
 
print(result.outputs[-1].text)
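Because report generation only starts when the flag flips, it can help to build every turn's arguments in one place. A minimal sketch, assuming the argument names used in the three snippets above; `planning_turn` and its `approve` parameter are hypothetical helpers, not SDK API. Each returned dict is meant to be unpacked into `client.interactions.create(**kwargs)`.

```python
from typing import Any, Optional

AGENT = "deep-research-preview-04-2026"


def planning_turn(
    text: str,
    previous_interaction_id: Optional[str] = None,
    approve: bool = False,
) -> dict[str, Any]:
    """Build keyword arguments for one collaborative-planning turn.

    approve=False keeps the agent in planning mode; approve=True sets
    collaborative_planning to False, which is what triggers the report.
    """
    kwargs: dict[str, Any] = {
        "agent": AGENT,
        "input": text,
        "agent_config": {"type": "deep-research", "collaborative_planning": not approve},
        "background": True,
    }
    if previous_interaction_id is not None:
        # Continue the same planning conversation.
        kwargs["previous_interaction_id"] = previous_interaction_id
    return kwargs


first = planning_turn("Research Google TPUs vs competitor hardware.")
final = planning_turn("Plan looks good!", previous_interaction_id="refined-id", approve=True)
print(first["agent_config"])  # {'type': 'deep-research', 'collaborative_planning': True}
print(final["agent_config"]["collaborative_planning"])  # False
```

Tying approval to a single boolean makes it harder to send "Plan looks good!" while accidentally leaving collaborative_planning=True.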

Native charts and infographics

Set visualization="auto" and ask for visualizations in your prompt. The agent generates charts and infographics and returns them as base64-encoded images.

import base64
import time
from google import genai
 
client = genai.Client()
 
interaction = client.interactions.create(
    agent="deep-research-preview-04-2026",
    input="Analyze global semiconductor market trends. Include charts showing market share changes.",
    agent_config={"type": "deep-research", "visualization": "auto"},
    background=True,
)
 
while (result := client.interactions.get(id=interaction.id)).status != "completed":
    time.sleep(5)
 
for output in result.outputs:
    if output.type == "text":
        print(output.text)
    elif output.type == "image" and output.data:
        image_bytes = base64.b64decode(output.data)
        # display(Image(data=image_bytes))  # Jupyter

Tip: Setting visualization="auto" enables the capability, but for the best results, state explicitly which visualizations you want.
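A minimal sketch for writing the image outputs to disk instead of displaying them. It assumes, as in the loop above, that each output has a `type` field and base64 `data`. The post does not say which image format is returned, so the file extension is guessed from the decoded bytes (PNG and JPEG signatures only, otherwise `.bin`); `save_images` and the `chart_N` file names are hypothetical. The demo uses fake outputs so it runs offline.

```python
import base64
import tempfile
from pathlib import Path
from types import SimpleNamespace
from typing import Any, Iterable


def save_images(outputs: Iterable[Any], out_dir: str) -> list[Path]:
    """Decode base64 image outputs and write each one to out_dir."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    paths: list[Path] = []
    for output in outputs:
        # Skip text and anything without image data, as the loop above does.
        if output.type != "image" or not output.data:
            continue
        image_bytes = base64.b64decode(output.data)
        # Assumed formats: sniff PNG/JPEG magic bytes, otherwise keep raw bytes.
        if image_bytes.startswith(b"\x89PNG"):
            ext = ".png"
        elif image_bytes.startswith(b"\xff\xd8"):
            ext = ".jpg"
        else:
            ext = ".bin"
        path = target / f"chart_{len(paths)}{ext}"
        path.write_bytes(image_bytes)
        paths.append(path)
    return paths


# Offline demo with fake outputs shaped like result.outputs above.
fake_outputs = [
    SimpleNamespace(type="text", text="Market overview...", data=None),
    SimpleNamespace(type="image", data=base64.b64encode(b"\x89PNG\r\n\x1a\n...").decode()),
]
with tempfile.TemporaryDirectory() as tmp:
    saved = save_images(fake_outputs, tmp)
    print([p.name for p in saved])  # ['chart_0.png']
```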

Remote MCP servers

Connect remote MCP servers to give the agent access to external tools. Pass the server's name, url, and optional auth headers.

interaction = client.interactions.create(
    agent="deep-research-preview-04-2026",
    input="Research how recent geopolitical events influenced USD interest rates",
    tools=[
        {
            "type": "mcp_server",
            "name": "Finance Data Provider",
            "url": "https://finance.example.com/mcp",
            "headers": {"Authorization": "Bearer my-token"},
        }
    ],
    background=True,
)

MCP servers support no auth, bearer tokens, and OAuth. For OAuth, obtain a token with a library such as google-auth and pass it in headers. Use allowed_tools to restrict which of the server's tools the agent may call.
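A minimal sketch of building the `mcp_server` entry from the previous example while keeping the token out of source code. The `type`, `name`, `url`, and `headers` keys come from that example. The shape of `allowed_tools` is an assumption (a plain list of tool names on the same dict); check the Interactions API reference before relying on it. The helper `mcp_server_tool`, the `FINANCE_MCP_TOKEN` environment variable, and the `get_interest_rates` tool name are hypothetical.

```python
import os
from typing import Any, Optional, Sequence


def mcp_server_tool(
    name: str,
    url: str,
    token: Optional[str] = None,
    allowed_tools: Optional[Sequence[str]] = None,
) -> dict[str, Any]:
    """Build an mcp_server entry for the `tools` list.

    No token means a no-auth server; a token (e.g. an OAuth access token
    obtained elsewhere) is sent as a Bearer Authorization header.
    """
    tool: dict[str, Any] = {"type": "mcp_server", "name": name, "url": url}
    if token:
        tool["headers"] = {"Authorization": f"Bearer {token}"}
    if allowed_tools:
        # Assumed shape: a list of tool names the agent may call.
        tool["allowed_tools"] = list(allowed_tools)
    return tool


finance = mcp_server_tool(
    "Finance Data Provider",
    "https://finance.example.com/mcp",
    token=os.environ.get("FINANCE_MCP_TOKEN"),  # hypothetical variable name
    allowed_tools=["get_interest_rates"],  # hypothetical tool name
)
print(finance["allowed_tools"])  # ['get_interest_rates']
```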

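Tool configuration

By default, the agent uses Google Search, URL Context, and Code Execution. As with models, you can customize which tools the agent may use by providing a tools list. This lets you, for example, search only the web (with google_search and url_context), only private sources (with file_search and custom MCP servers), or both.

Tool	Type	Default	Description
Google Search	`google_search`	Yes	Searches the public web
URL Context	`url_context`	Yes	Reads and summarizes web pages
Code Execution	`code_execution`	Yes	Runs code for calculations and data analysis
MCP Server	`mcp_server`	No	Connects to remote MCP servers
File Search	`file_search`	No	Searches an uploaded document corpus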

# Only web search allowed
interaction = client.interactions.create(
    agent="deep-research-preview-04-2026",
    input="Latest developments in quantum computing.",
    tools=[{"type": "google_search"}],
    background=True,
)

Note: If you omit tools entirely, Google Search, URL Context, and Code Execution are enabled by default.
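The three scopes described above (web only, private only, or both) can be sketched as a small builder. The `google_search`, `url_context`, and `code_execution` types come from the table. Private tools are passed in as ready-made dicts, because this post does not show the configuration fields of a `file_search` entry; the `research_tools` helper itself is hypothetical.

```python
from typing import Any, Sequence


def research_tools(
    web: bool = True,
    private_tools: Sequence[dict[str, Any]] = (),
    code_execution: bool = False,
) -> list[dict[str, Any]]:
    """Build a `tools` list for web-only, private-only, or combined research."""
    tools: list[dict[str, Any]] = []
    if web:
        # Public web: search plus reading individual pages.
        tools += [{"type": "google_search"}, {"type": "url_context"}]
    if code_execution:
        tools.append({"type": "code_execution"})
    # Private sources, e.g. file_search or mcp_server entries, supplied by the caller.
    tools.extend(private_tools)
    if not tools:
        raise ValueError("Select at least one tool, or omit `tools` to use the defaults.")
    return tools


finance_mcp = {"type": "mcp_server", "name": "Finance Data Provider",
               "url": "https://finance.example.com/mcp"}

web_only = research_tools()
private_only = research_tools(web=False, private_tools=[finance_mcp])
both = research_tools(private_tools=[finance_mcp])
print([t["type"] for t in both])  # ['google_search', 'url_context', 'mcp_server']
```

Raising on an empty selection avoids sending tools=[], whose behavior this post does not describe; omit tools entirely to get the defaults.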

Multimodal research grounding

Pass images, PDFs, and documents alongside the text prompt to ground the research.

interaction = client.interactions.create(
    agent="deep-research-preview-04-2026",
    input=[
        {"type": "text", "text": "What has been the impact of this research paper?"},
        {"type": "document", "uri": "https://arxiv.org/pdf/1706.03762", "mime_type": "application/pdf"},
    ],
    background=True,
)
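To mix several attachments, here is a minimal sketch that builds the `input` list from `(uri, mime_type)` pairs. PDFs and other documents use the `"document"` part shape from the example above. The `"image"` and `"audio"` part types are assumptions inferred from the list of supported modalities, not confirmed in this post; `grounded_input` and the example image URL are hypothetical.

```python
from typing import Any, Sequence


def grounded_input(text: str, attachments: Sequence[tuple[str, str]]) -> list[dict[str, Any]]:
    """Build a multimodal `input` list: the prompt first, then one part per attachment."""
    parts: list[dict[str, Any]] = [{"type": "text", "text": text}]
    for uri, mime_type in attachments:
        if mime_type.startswith("image/"):
            part_type = "image"  # assumed part type
        elif mime_type.startswith("audio/"):
            part_type = "audio"  # assumed part type
        else:
            part_type = "document"  # matches the PDF example above
        parts.append({"type": part_type, "uri": uri, "mime_type": mime_type})
    return parts


parts = grounded_input(
    "What has been the impact of this research paper?",
    [
        ("https://arxiv.org/pdf/1706.03762", "application/pdf"),
        ("https://example.com/figure.png", "image/png"),  # hypothetical URL
    ],
)
for part in parts:
    print(part["type"])  # prints text, document, image (one per line)
```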

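Real-time streaming with visualizations and thought summaries

Stream research progress as it happens. Enable thinking_summaries="auto" to receive the agent's intermediate reasoning alongside the text and generated images.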

import base64
from google import genai
from IPython.display import Image, display
 
client = genai.Client()
 
interaction_id = None
last_event_id = None
is_complete = False
 
def process_stream(stream):
    global interaction_id, last_event_id, is_complete
    for chunk in stream:
        if chunk.event_type == "interaction.start":
            interaction_id = chunk.interaction.id
        if chunk.event_id:
            last_event_id = chunk.event_id
        if chunk.event_type == "content.delta":
            if chunk.delta.type == "text":
                print(chunk.delta.text, end="", flush=True)
            elif chunk.delta.type == "thought_summary":
                print(f"\n💭 {chunk.delta.content.text}", flush=True)
            elif chunk.delta.type == "image" and chunk.delta.data:
                image_bytes = base64.b64decode(chunk.delta.data)
                display(Image(data=image_bytes))
        elif chunk.event_type in ("interaction.complete", "error"):
            is_complete = True
            if chunk.event_type == "interaction.complete":
                print("\n✅ Research Complete")
 
stream = client.interactions.create(
    input="Research AI chip market trends. Include charts comparing vendors.",
    agent="deep-research-preview-04-2026",
    background=True,
    stream=True,
    agent_config={
        "type": "deep-research",
        "thinking_summaries": "auto",
        "visualization": "auto",
    },
)
process_stream(stream)
 
# Reconnect if the connection drops
while not is_complete and interaction_id:
    status = client.interactions.get(interaction_id)
    if status.status != "in_progress":
        break
    stream = client.interactions.get(
        id=interaction_id, stream=True, last_event_id=last_event_id,
    )
    process_stream(stream)
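The example keeps reconnect state in module-level globals. Below is a minimal sketch of the same bookkeeping in a small class, assuming the event and delta fields used above (`event_type`, `event_id`, `interaction.id`, `delta.type`, `delta.text`, `delta.content.text`, `delta.data`). It collects output instead of printing it. `StreamTracker` is hypothetical, and the demo feeds fake events that simulate a dropped connection followed by a resumed stream.

```python
import base64
from types import SimpleNamespace
from typing import Any, Iterable, Optional


class StreamTracker:
    """Hypothetical helper: tracks reconnect state without globals."""

    def __init__(self) -> None:
        self.interaction_id: Optional[str] = None
        self.last_event_id: Optional[str] = None
        self.is_complete = False
        self.text_parts: list[str] = []
        self.thoughts: list[str] = []
        self.images: list[bytes] = []

    def consume(self, stream: Iterable[Any]) -> None:
        """Process events from one connection; safe to call again after a drop."""
        for chunk in stream:
            if chunk.event_type == "interaction.start":
                self.interaction_id = chunk.interaction.id
            if chunk.event_id:
                # Resume point for get(..., stream=True, last_event_id=...).
                self.last_event_id = chunk.event_id
            if chunk.event_type == "content.delta":
                delta = chunk.delta
                if delta.type == "text":
                    self.text_parts.append(delta.text)
                elif delta.type == "thought_summary":
                    self.thoughts.append(delta.content.text)
                elif delta.type == "image" and delta.data:
                    self.images.append(base64.b64decode(delta.data))
            elif chunk.event_type in ("interaction.complete", "error"):
                self.is_complete = True

    @property
    def text(self) -> str:
        return "".join(self.text_parts)


def fake_event(event_type: str, event_id: str, **fields: Any) -> SimpleNamespace:
    return SimpleNamespace(event_type=event_type, event_id=event_id, **fields)


tracker = StreamTracker()
# First connection: drops after two events.
tracker.consume([
    fake_event("interaction.start", "e1", interaction=SimpleNamespace(id="int-123")),
    fake_event("content.delta", "e2", delta=SimpleNamespace(type="text", text="Partial ")),
])
print(tracker.is_complete, tracker.last_event_id)  # False e2
# Resumed connection: continues after the last seen event id.
tracker.consume([
    fake_event("content.delta", "e3", delta=SimpleNamespace(
        type="thought_summary", content=SimpleNamespace(text="Comparing vendors"))),
    fake_event("content.delta", "e4", delta=SimpleNamespace(type="text", text="report.")),
    fake_event("interaction.complete", "e5"),
])
print(tracker.text)  # Partial report.
```

With a real client, call tracker.consume(stream) on the stream returned by client.interactions.create(...), then keep calling tracker.consume(client.interactions.get(id=tracker.interaction_id, stream=True, last_event_id=tracker.last_event_id)) while tracker.is_complete is false and the interaction is still "in_progress".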

What's next

Translated from philschmid · Archived May 9, 2026