deepseek-ai/DeepSeek-V4-Pro

June 6, 2026 · English original

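Abstract

DeepSeek-AI releases a preview of DeepSeek-V4, comprising V4-Pro (1.6T parameters, 49B activated) and V4-Flash (284B parameters, 13B activated), both supporting a 1M-token context. The models use MoE, Hybrid Attention, mHC, and the Muon optimizer, are pre-trained on more than 32T tokens, and are post-trained with SFT, GRPO-based RL, and on-policy distillation. The weights are released under the MIT License.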

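DeepSeek-V4: Towards Highly Efficient Million-Token Context Intelligence

Introduction

We present a preview version of the DeepSeek-V4 series, which includes two strong Mixture-of-Experts (MoE) language models: DeepSeek-V4-Pro with 1.6T parameters (49B activated) and DeepSeek-V4-Flash with 284B parameters (13B activated). Both support a context length of one million tokens.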

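The DeepSeek-V4 series incorporates several key upgrades in architecture and optimization:

Hybrid Attention Architecture: We design a hybrid attention mechanism that combines Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA) to substantially improve long-context efficiency. In the 1M-token context setting, DeepSeek-V4-Pro requires only 27% of the single-token inference FLOPs and 10% of the KV cache of DeepSeek-V3.2.
Manifold-Constrained Hyper-Connections (mHC): We introduce mHC to strengthen conventional residual connections, improving the stability of signal propagation across layers while preserving model expressivity.
Muon Optimizer: We adopt the Muon optimizer for faster convergence and greater training stability.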
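To make the hybrid attention figures concrete, the short sketch below converts the two reported ratios (27% of the FLOPs, 10% of the KV cache) into "N times less" factors. Only the two ratios come from the text; the helper names and the 100 GB baseline are hypothetical and purely illustrative.

```python
# Ratios reported in the text for DeepSeek-V4-Pro relative to DeepSeek-V3.2
# at a 1M-token context.
FLOPS_RATIO = 0.27     # single-token inference FLOPs
KV_CACHE_RATIO = 0.10  # KV cache size


def reduction_factor(ratio: float) -> float:
    """Convert a 'needs only X of the baseline' ratio into an 'N times less' factor."""
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must be in (0, 1]")
    return 1.0 / ratio


def scaled_cost(baseline: float, ratio: float) -> float:
    """Scale a baseline cost (in any unit) by the reported ratio."""
    return baseline * ratio


if __name__ == "__main__":
    print(f"FLOPs reduction:    {reduction_factor(FLOPS_RATIO):.1f}x")
    print(f"KV cache reduction: {reduction_factor(KV_CACHE_RATIO):.1f}x")
    # Hypothetical baseline of 100 GB of KV cache (illustrative number only).
    print(f"KV cache at 10%:    {scaled_cost(100.0, KV_CACHE_RATIO):.1f} GB")
```

In other words, the reported ratios correspond to roughly 3.7 times fewer FLOPs per token and 10 times less KV cache than the baseline.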

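We pre-train both models on more than 32T diverse, high-quality tokens and then run a full post-training pipeline. Post-training follows a two-stage paradigm: first, domain-specific experts are cultivated independently (through SFT and GRPO-based RL); then the models are consolidated through on-policy distillation, which merges the specialized capabilities of the different domains into a single model.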
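As background for the GRPO-based RL stage, here is a minimal sketch of the group-relative advantage that gives GRPO its name, following the publicly described formulation: several responses are sampled for the same prompt, and each reward is normalized against the group's mean and standard deviation. This text does not specify the exact variant used for DeepSeek-V4, so the function name, the use of the population standard deviation, and the `eps` term are all assumptions.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of responses to the same prompt.

    Assumption: population standard deviation plus a small eps for stability;
    the variant used for DeepSeek-V4 is not described in this document.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Hypothetical rewards for four sampled responses to one prompt.
    rewards = [1.0, 0.0, 0.0, 1.0]
    for reward, adv in zip(rewards, group_relative_advantages(rewards)):
        print(f"reward={reward:.1f} advantage={adv:+.3f}")
```

Responses that score above the group mean get a positive advantage and those below get a negative one, so no separate value model is needed to provide a baseline.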

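DeepSeek-V4-Pro-Max, the maximum reasoning effort mode of DeepSeek-V4-Pro, significantly advances the knowledge capabilities of open-source models and ranks firmly among the best open-source models currently available. It achieves top-tier performance on coding benchmarks and significantly narrows the gap with leading closed-source models on reasoning and agentic tasks. Meanwhile, DeepSeek-V4-Flash-Max reaches reasoning performance close to that of the Pro version when given a larger thinking budget, although its smaller parameter scale naturally leaves it slightly behind on pure knowledge tasks and the most complex agentic workflows.

Model Downloads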

Model	#Total Params	#Activated Params	Context Length	Precision	Download
DeepSeek-V4-Flash-Base	284B	13B	1M	FP8 Mixed	HuggingFace | ModelScope
DeepSeek-V4-Flash	284B	13B	1M	FP4 + FP8 Mixed*	HuggingFace | ModelScope
DeepSeek-V4-Pro-Base	1.6T	49B	1M	FP8 Mixed	HuggingFace | ModelScope
DeepSeek-V4-Pro	1.6T	49B	1M	FP4 + FP8 Mixed*	HuggingFace | ModelScope

*FP4 + FP8 Mixed: MoE expert parameters use FP4 precision; most other parameters use FP8.

Evaluation Results
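As a back-of-envelope illustration of the table, the sketch below computes the fraction of parameters activated per token and a rough range for raw weight storage. The parameter counts come from the table; the range simply assumes every parameter is stored at 0.5 bytes (FP4) or 1 byte (FP8) and ignores scaling factors and any parameters kept at higher precision, so it is not an official size. The function names are hypothetical.

```python
# Parameter counts taken from the table above.
MODELS = {
    "DeepSeek-V4-Flash": {"total": 284e9, "active": 13e9},
    "DeepSeek-V4-Pro": {"total": 1.6e12, "active": 49e9},
}

# Nominal storage per parameter (assumption: no scaling-factor overhead).
BYTES_PER_PARAM = {"FP4": 0.5, "FP8": 1.0}


def active_fraction(total: float, active: float) -> float:
    """Fraction of all parameters that are activated for each token."""
    return active / total


def weight_bytes_range(total: float) -> tuple:
    """Rough (all-FP4, all-FP8) storage range in bytes for the raw weights."""
    return total * BYTES_PER_PARAM["FP4"], total * BYTES_PER_PARAM["FP8"]


if __name__ == "__main__":
    for name, p in MODELS.items():
        low, high = weight_bytes_range(p["total"])
        print(
            f"{name}: {active_fraction(p['total'], p['active']):.1%} active, "
            f"about {low / 1e9:.0f} to {high / 1e9:.0f} GB of weights"
        )
```

Because the MoE expert parameters use FP4 and most of the remaining parameters use FP8, the real checkpoint size should fall somewhere inside this range rather than at either end.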

Base Model

Benchmark (Metric)	# Shots	DeepSeek-V3.2-Base	DeepSeek-V4-Flash-Base	DeepSeek-V4-Pro-Base
Architecture	-	MoE	MoE	MoE
# Activated Params	-	37B	13B	49B
# Total Params	-	671B	284B	1.6T
World Knowledge
AGIEval (EM)	0-shot	80.1	82.6	83.1
MMLU (EM)	5-shot	87.8	88.7	90.1
MMLU-Redux (EM)	5-shot	87.5	89.4	90.8
MMLU-Pro (EM)	5-shot	65.5	68.3	73.5
MMMLU (EM)	5-shot	87.9	88.8	90.3
C-Eval (EM)	5-shot	90.4	92.1	93.1
CMMLU (EM)	5-shot	88.9	90.4	90.8
MultiLoKo (EM)	5-shot	38.7	42.2	51.1
Simple-QA verified (EM)	25-shot	28.3	30.1	55.2
SuperGPQA (EM)	5-shot	45.0	46.5	53.9
FACTS Parametric (EM)	25-shot	27.1	33.9	62.6
TriviaQA (EM)	5-shot	83.3	82.8	85.6
Language & Reasoning
BBH (EM)	3-shot	87.6	86.9	87.5
DROP (F1)	1-shot	88.2	88.6	88.7
HellaSwag (EM)	0-shot	86.4	85.7	88.0
WinoGrande (EM)	0-shot	78.9	79.5	81.5
CLUEWSC (EM)	5-shot	83.5	82.2	85.2
Code & Math
BigCodeBench (Pass@1)	3-shot	63.9	56.8	59.2
HumanEval (Pass@1)	0-shot	62.8	69.5	76.8
GSM8K (EM)	8-shot	91.1	90.8	92.6
MATH (EM)	4-shot	60.5	57.4	64.5
MGSM (EM)	8-shot	81.3	85.7	84.4
CMath (EM)	3-shot	92.6	93.6	90.9
Long Context
LongBench-V2 (EM)	1-shot	40.2	44.7	51.5

Instruct Model

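Both DeepSeek-V4-Pro and DeepSeek-V4-Flash support three reasoning effort modes:

Reasoning Mode	Characteristics	Typical Use Cases	Response Format
Non-think	Fast, intuitive responses	Routine daily tasks, low-risk decisions	`</think>` summary
Think High	Deliberate logical analysis; slower but more accurate	Complex problem solving, planning	`<think>` thinking `</think>` summary
Think Max	Pushes reasoning to its fullest extent	Exploring the limits of the model's reasoning capability	Special system prompt + `<think>` thinking `</think>` summary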
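The response formats in the table can be separated with a few lines of string handling. The sketch below is a hypothetical helper, not the official parser (the release ships its own parsing code in the encoding folder). It assumes the completion text contains at most one `</think>` tag and that the opening `<think>` tag may or may not be present in the text.

```python
from typing import NamedTuple, Optional

THINK_OPEN = "<think>"
THINK_CLOSE = "</think>"


class ParsedCompletion(NamedTuple):
    reasoning: Optional[str]  # None when the response carries no thinking text
    summary: str


def split_completion(text: str) -> ParsedCompletion:
    """Split a completion into its thinking part and its summary (illustrative only)."""
    head, sep, tail = text.partition(THINK_CLOSE)
    if not sep:
        # No closing tag found: treat the whole text as the summary.
        return ParsedCompletion(None, text.strip())
    reasoning = head.strip()
    if reasoning.startswith(THINK_OPEN):
        reasoning = reasoning[len(THINK_OPEN):].strip()
    return ParsedCompletion(reasoning or None, tail.strip())


if __name__ == "__main__":
    print(split_completion("</think>2"))                      # Non-think style
    print(split_completion("<think>1 + 1 is 2.</think>2"))    # Think High / Max style
```

A Non-think response yields `reasoning=None`, while the thinking modes yield both fields.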

DeepSeek-V4-Pro-Max vs. Frontier Models

Benchmark (Metric)	Opus-4.6 Max	GPT-5.4 xHigh	Gemini-3.1-Pro High	K2.6 Thinking	GLM-5.1 Thinking	DS-V4-Pro Max
Knowledge & Reasoning
MMLU-Pro (EM)	89.1	87.5	91.0	87.1	86.0	87.5
SimpleQA-Verified (Pass@1)	46.2	45.3	75.6	36.9	38.1	57.9
Chinese-SimpleQA (Pass@1)	76.4	76.8	85.9	75.9	75.0	84.4
GPQA Diamond (Pass@1)	91.3	93.0	94.3	90.5	86.2	90.1
HLE (Pass@1)	40.0	39.8	44.4	36.4	34.7	37.7
LiveCodeBench (Pass@1)	88.8	-	91.7	89.6	-	93.5
Codeforces (Rating)	-	3168	3052	-	-	3206
HMMT 2026 Feb (Pass@1)	96.2	97.7	94.7	92.7	89.4	95.2
IMOAnswerBench (Pass@1)	75.3	91.4	81.0	86.0	83.8	89.8
Apex (Pass@1)	34.5	54.1	60.9	24.0	11.5	38.3
Apex Shortlist (Pass@1)	85.9	78.1	89.1	75.5	72.4	90.2
Long Context
MRCR 1M (MMR)	92.9	-	76.3	-	-	83.5
CorpusQA 1M (ACC)	71.7	-	53.8	-	-	62.0
Agentic
Terminal Bench 2.0 (Acc)	65.4	75.1	68.5	66.7	63.5	67.9
SWE Verified (Resolved)	80.8	-	80.6	80.2	-	80.6
SWE Pro (Resolved)	57.3	57.7	54.2	58.6	58.4	55.4
SWE Multilingual (Resolved)	77.5	-	-	76.7	73.3	76.2
BrowseComp (Pass@1)	83.7	82.7	85.9	83.2	79.3	83.4
HLE w/ tools (Pass@1)	53.1	52.0	51.6	54.0	50.4	48.2
GDPval-AA (Elo)	1619	1674	1314	1482	1535	1554
MCPAtlas Public (Pass@1)	73.8	67.2	69.2	66.6	71.8	73.6
Toolathlon (Pass@1)	47.2	54.6	48.8	50.0	40.7	51.8

Comparison across Modes

Benchmark (Metric)	V4-Flash Non-Think	V4-Flash High	V4-Flash Max	V4-Pro Non-Think	V4-Pro High	V4-Pro Max
Knowledge & Reasoning
MMLU-Pro (EM)	83.0	86.4	86.2	82.9	87.1	87.5
SimpleQA-Verified (Pass@1)	23.1	28.9	34.1	45.0	46.2	57.9
Chinese-SimpleQA (Pass@1)	71.5	73.2	78.9	75.8	77.7	84.4
GPQA Diamond (Pass@1)	71.2	87.4	88.1	72.9	89.1	90.1
HLE (Pass@1)	8.1	29.4	34.8	7.7	34.5	37.7
LiveCodeBench (Pass@1)	55.2	88.4	91.6	56.8	89.8	93.5
Codeforces (Rating)	-	2816	3052	-	2919	3206
HMMT 2026 Feb (Pass@1)	40.8	91.9	94.8	31.7	94.0	95.2
IMOAnswerBench (Pass@1)	41.9	85.1	88.4	35.3	88.0	89.8
Apex (Pass@1)	1.0	19.1	33.0	0.4	27.4	38.3
Apex Shortlist (Pass@1)	9.3	72.1	85.7	9.2	85.5	90.2
Long Context
MRCR 1M (MMR)	37.5	76.9	78.7	44.7	83.3	83.5
CorpusQA 1M (ACC)	15.5	59.3	60.5	35.6	56.5	62.0
Agentic
Terminal Bench 2.0 (Acc)	49.1	56.6	56.9	59.1	63.3	67.9
SWE Verified (Resolved)	73.7	78.6	79.0	73.6	79.4	80.6
SWE Pro (Resolved)	49.1	52.3	52.6	52.1	54.4	55.4
SWE Multilingual (Resolved)	69.7	70.2	73.3	69.8	74.1	76.2
BrowseComp (Pass@1)	-	53.5	73.2	-	80.4	83.4
HLE w/ tools (Pass@1)	-	40.3	45.1	-	44.7	48.2
MCPAtlas (Pass@1)	64.0	67.4	69.0	69.4	74.2	73.6
GDPval-AA (Elo)	-	-	1395	-	-	1554
Toolathlon (Pass@1)	40.7	43.5	47.8	46.3	49.0	51.8

Chat Template

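This release does not include a Jinja-format chat template. Instead, we provide a dedicated encoding folder containing Python scripts and test cases that demonstrate how to encode messages in OpenAI-compatible format into input strings for the model, and how to parse the model's text output. Please refer to the encoding folder for full documentation.

A brief example: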

from encoding_dsv4 import encode_messages, parse_message_from_completion_text

messages = [
    {"role": "user", "content": "hello"},
    {"role": "assistant", "content": "Hello! I am DeepSeek.", "reasoning_content": "thinking..."},
    {"role": "user", "content": "1+1=?"}
]

# messages -> string
prompt = encode_messages(messages, thinking_mode="thinking")

# string -> tokens
import transformers
tokenizer = transformers.AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-V4-Pro")
tokens = tokenizer.encode(prompt)

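How to Run Locally

For detailed instructions on running DeepSeek-V4 locally, including model weight conversion and interactive chat demos, please refer to the inference folder.

For local deployment, we recommend setting the sampling parameters to temperature = 1.0, top_p = 1.0. For the Think Max reasoning mode, we recommend setting the context window to at least 384K tokens.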
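To show what the recommended sampling parameters mean in practice, here is a minimal, dependency-free sketch of temperature scaling followed by top-p (nucleus) filtering. It is a generic illustration of the technique, not the sampler shipped in the inference folder. The function names are hypothetical and the logits are made up.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn logits into probabilities; temperature = 1.0 leaves them unscaled."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def nucleus_indices(probs, top_p=1.0):
    """Return the indices kept by top-p filtering, most probable first."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_p >= 1.0:
        return order  # top_p = 1.0 keeps the whole vocabulary
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    return kept


def sample_token(logits, temperature=1.0, top_p=1.0, rng=None):
    """Sample one token index using the recommended defaults."""
    rng = rng or random.Random()
    probs = softmax(logits, temperature)
    kept = nucleus_indices(probs, top_p)
    weights = [probs[i] for i in kept]  # random.choices accepts unnormalized weights
    return rng.choices(kept, weights=weights, k=1)[0]


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.5, -1.0]  # hypothetical logits for a 4-token vocabulary
    print(sample_token(logits, temperature=1.0, top_p=1.0, rng=random.Random(0)))
```

With temperature = 1.0 and top_p = 1.0, neither step alters the distribution, so tokens are drawn directly from the model's unmodified softmax output.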

License

This repository and the model weights are licensed under the MIT License.

Citation

@misc{deepseekai2026deepseekv4,
      title={DeepSeek-V4: Towards Highly Efficient Million-Token Context Intelligence},
      author={DeepSeek-AI},
      year={2026},
}

Contact

If you have any questions, please raise an issue or contact us at service@deepseek.com.

Translated from DeepSeek · HF · Recorded June 6, 2026