deepseek-ai/DeepSeek-V4-Flash

June 6, 2026 · English original

Summary

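DeepSeek-AI has released a preview of DeepSeek-V4, comprising two MoE models: V4-Pro (1.6T total / 49B activated parameters) and V4-Flash (284B / 13B). Both support a 1M-token context. The models use CSA+HCA hybrid attention, mHC, and the Muon optimizer; they were pre-trained on 32T tokens and post-trained with SFT, GRPO-based RL, and on-policy distillation.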

DeepSeek-V4: Towards Highly Efficient Million-Token Context Intelligence

Introduction

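We present a preview version of the DeepSeek-V4 series, which includes two strong Mixture-of-Experts (MoE) language models: DeepSeek-V4-Pro, with 1.6T parameters (49B activated), and DeepSeek-V4-Flash, with 284B parameters (13B activated). Both support a context length of one million tokens.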
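As a quick sanity check on the sparsity these numbers imply, the sketch below computes the activated-parameter ratio and a rough per-token forward cost. The figure of about 2 FLOPs per activated parameter is a common rule of thumb (one multiply and one add per weight), not a DeepSeek measurement, and it deliberately ignores attention, whose cost grows with context length and is what the hybrid attention design targets.

```python
# Parameter counts as stated above: (total, activated).
MODELS = {
    "DeepSeek-V4-Pro": (1.6e12, 49e9),
    "DeepSeek-V4-Flash": (284e9, 13e9),
}


def activation_ratio(total: float, active: float) -> float:
    """Fraction of all parameters that participate in one token's forward pass."""
    return active / total


def approx_forward_flops_per_token(active: float) -> float:
    # Rule of thumb (assumption): ~2 FLOPs per activated parameter per token,
    # excluding attention, whose cost depends on context length.
    return 2 * active


for name, (total, active) in MODELS.items():
    print(
        f"{name}: {activation_ratio(total, active):.1%} activated, "
        f"~{approx_forward_flops_per_token(active):.1e} FLOPs/token (excl. attention)"
    )
```

This gives roughly 3.1% of parameters activated per token for V4-Pro and 4.6% for V4-Flash.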

The DeepSeek-V4 series introduces several key upgrades in architecture and optimization:

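Hybrid Attention Architecture: We design a hybrid attention mechanism that combines Compressed Sparse Attention (CSA) with Heavily Compressed Attention (HCA) to substantially improve long-context efficiency. In the 1M-token context setting, DeepSeek-V4-Pro requires only 27% of the single-token inference FLOPs and 10% of the KV cache of DeepSeek-V3.2.
Manifold-Constrained Hyper-Connections (mHC): We incorporate mHC to strengthen conventional residual connections, improving the stability of signal propagation across layers while preserving model expressivity.
Muon Optimizer: We adopt the Muon optimizer for faster convergence and greater training stability.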
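The card does not spell out DeepSeek-V4's Muon configuration. For orientation, the sketch below follows the publicly documented Muon recipe: keep a momentum buffer for each 2-D weight matrix and replace the raw update with an approximately orthogonalized version computed by a quintic Newton-Schulz iteration. The coefficients, learning rate, and momentum value are modeled on the public reference implementation and are assumptions here; the sketch also omits that code's shape-dependent scaling and low-precision arithmetic.

```python
import numpy as np

# Quintic Newton-Schulz coefficients from the public Muon reference implementation
# (assumed here; DeepSeek-V4's exact settings are not given in this card).
NS_COEFFS = (3.4445, -4.7750, 2.0315)


def newton_schulz_orthogonalize(g: np.ndarray, steps: int = 5, eps: float = 1e-7) -> np.ndarray:
    """Push every singular value of the 2-D matrix g toward ~1 (roughly U @ V.T of its SVD)."""
    a, b, c = NS_COEFFS
    x = g / (np.linalg.norm(g) + eps)  # Frobenius normalization puts all singular values in [0, 1]
    transposed = x.shape[0] > x.shape[1]
    if transposed:
        x = x.T  # iterate on the wide orientation so the Gram matrix x @ x.T is the smaller one
    for _ in range(steps):
        s = x @ x.T
        # Acts on singular values as the odd polynomial a*sigma + b*sigma^3 + c*sigma^5.
        x = a * x + (b * s + c * s @ s) @ x
    return x.T if transposed else x


def muon_step(weight, grad, momentum_buf, lr=0.02, beta=0.95):
    """One simplified Muon update for a single 2-D weight matrix (Nesterov-style momentum)."""
    momentum_buf = beta * momentum_buf + grad
    update = grad + beta * momentum_buf
    new_weight = weight - lr * newton_schulz_orthogonalize(update)
    return new_weight, momentum_buf


rng = np.random.default_rng(0)
w = rng.standard_normal((8, 4))
g = rng.standard_normal((8, 4))
w, buf = muon_step(w, g, np.zeros_like(w))
print(np.linalg.svd(newton_schulz_orthogonalize(g), compute_uv=False))
```

Orthogonalizing the update evens out its singular values, so no single direction dominates a step; the quintic iteration trades exact orthogonality (singular values end up roughly between 0.7 and 1.2) for a handful of cheap matrix multiplications.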

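We pre-train both models on more than 32T diverse, high-quality tokens, followed by a comprehensive post-training pipeline. Post-training follows a two-stage paradigm: first, domain-specific experts are cultivated independently (through SFT and RL with GRPO); then a unified model is consolidated through on-policy distillation, which merges the capabilities from the different domains into a single model.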
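GRPO (Group Relative Policy Optimization, introduced with DeepSeekMath) replaces a learned critic with a per-prompt baseline: several responses are sampled for the same prompt, and each reward is normalized against its own group. The sketch below shows that normalization and the PPO-style clipped term it feeds into. It omits the KL penalty and per-token averaging, and the clip value and epsilon are illustrative assumptions rather than DeepSeek-V4's settings.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: each response's reward normalized within its own group.

    `rewards` holds scalar rewards for several responses sampled for the same prompt;
    no learned value function (critic) is needed.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use the sample std instead
    return [(r - mu) / (sigma + eps) for r in rewards]


def clipped_surrogate(ratio, advantage, clip_eps=0.2):
    """PPO-style clipped objective term (to be maximized) for one token.

    `ratio` is pi_new(token) / pi_old(token) for a token of a sampled response.
    """
    clipped_ratio = min(max(ratio, 1 - clip_eps), 1 + clip_eps)
    return min(ratio * advantage, clipped_ratio * advantage)


# Four sampled answers to one prompt, rewarded 1 if correct and 0 otherwise.
advs = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print([round(a, 3) for a in advs])  # -> [1.0, -1.0, -1.0, 1.0]
```

Correct answers are pushed up and incorrect ones down relative to their siblings; if every response in a group earns the same reward, all advantages are zero and that group contributes no gradient.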

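DeepSeek-V4-Pro-Max, the maximum reasoning-effort mode of DeepSeek-V4-Pro, substantially advances the knowledge capabilities of open-source models, firmly establishing itself as the strongest open-source model currently available. It achieves top-tier performance on coding benchmarks and significantly narrows the gap with leading closed-source models on reasoning and agentic tasks. Meanwhile, DeepSeek-V4-Flash-Max reaches reasoning performance comparable to the Pro version when given a larger thinking budget, although its smaller parameter scale naturally leaves it slightly behind on pure knowledge tasks and the most complex agentic workflows.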

Model Downloads

Model	#Total Params	#Activated Params	Context Length	Precision	Download
DeepSeek-V4-Flash-Base	284B	13B	1M	FP8 Mixed	HuggingFace | ModelScope
DeepSeek-V4-Flash	284B	13B	1M	FP4 + FP8 Mixed*	HuggingFace | ModelScope
DeepSeek-V4-Pro-Base	1.6T	49B	1M	FP8 Mixed	HuggingFace | ModelScope
DeepSeek-V4-Pro	1.6T	49B	1M	FP4 + FP8 Mixed*	HuggingFace | ModelScope

*FP4 + FP8 Mixed: MoE expert parameters use FP4 precision; most other parameters use FP8.
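To get a feel for what the mixed format means for storage, the sketch below estimates weight size from the parameter count. The share of parameters that live in MoE experts is not given in this card, so `expert_fraction` is a hypothetical input. FP4 is counted as 0.5 bytes and FP8 as 1 byte per parameter; quantization scale factors and any tensors kept in higher precision are ignored, so real checkpoints will be somewhat larger.

```python
def fp4_fp8_weight_gb(total_params: float, expert_fraction: float) -> float:
    """Approximate weight storage in decimal GB for an FP4 (experts) + FP8 (rest) checkpoint.

    `expert_fraction` (share of parameters in MoE experts) is a hypothetical input;
    scale factors and higher-precision tensors are ignored, so this underestimates.
    """
    if not 0.0 <= expert_fraction <= 1.0:
        raise ValueError("expert_fraction must be in [0, 1]")
    bytes_fp4, bytes_fp8 = 0.5, 1.0  # 4 bits and 8 bits per parameter
    total_bytes = total_params * (expert_fraction * bytes_fp4 + (1.0 - expert_fraction) * bytes_fp8)
    return total_bytes / 1e9


FLASH_TOTAL = 284e9  # DeepSeek-V4-Flash total parameters, from the table above
print(fp4_fp8_weight_gb(FLASH_TOTAL, 1.0))  # bound if every parameter were FP4
print(fp4_fp8_weight_gb(FLASH_TOTAL, 0.0))  # bound if every parameter were FP8
```

For V4-Flash the two extremes are 142 GB (everything in FP4) and 284 GB (everything in FP8). In large MoE models the experts typically hold most of the parameters, which pulls the quantized weights toward the lower end of that range.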

Evaluation Results

Base Model

Benchmark (Metric)	# Shots	DeepSeek-V3.2-Base	DeepSeek-V4-Flash-Base	DeepSeek-V4-Pro-Base
Architecture	-	MoE	MoE	MoE
# Activated Params	-	37B	13B	49B
# Total Params	-	671B	284B	1.6T
World Knowledge
AGIEval (EM)	0-shot	80.1	82.6	83.1
MMLU (EM)	5-shot	87.8	88.7	90.1
MMLU-Redux (EM)	5-shot	87.5	89.4	90.8
MMLU-Pro (EM)	5-shot	65.5	68.3	73.5
MMMLU (EM)	5-shot	87.9	88.8	90.3
C-Eval (EM)	5-shot	90.4	92.1	93.1
CMMLU (EM)	5-shot	88.9	90.4	90.8
MultiLoKo (EM)	5-shot	38.7	42.2	51.1
Simple-QA verified (EM)	25-shot	28.3	30.1	55.2
SuperGPQA (EM)	5-shot	45.0	46.5	53.9
FACTS Parametric (EM)	25-shot	27.1	33.9	62.6
TriviaQA (EM)	5-shot	83.3	82.8	85.6
Language & Reasoning
BBH (EM)	3-shot	87.6	86.9	87.5
DROP (F1)	1-shot	88.2	88.6	88.7
HellaSwag (EM)	0-shot	86.4	85.7	88.0
WinoGrande (EM)	0-shot	78.9	79.5	81.5
CLUEWSC (EM)	5-shot	83.5	82.2	85.2
Code & Math
BigCodeBench (Pass@1)	3-shot	63.9	56.8	59.2
HumanEval (Pass@1)	0-shot	62.8	69.5	76.8
GSM8K (EM)	8-shot	91.1	90.8	92.6
MATH (EM)	4-shot	60.5	57.4	64.5
MGSM (EM)	8-shot	81.3	85.7	84.4
CMath (EM)	3-shot	92.6	93.6	90.9
Long Context
LongBench-V2 (EM)	1-shot	40.2	44.7	51.5

Instruct Model

Both DeepSeek-V4-Pro and DeepSeek-V4-Flash support three reasoning-effort modes:

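Reasoning Mode	Characteristics	Typical Use Cases	Response Format
Non-think	Fast, intuitive responses	Routine daily tasks, low-risk decisions	`</think>` summary
Think High	Conscious logical analysis; slower but more accurate	Complex problem solving, planning	`<think>` thinking `</think>` summary
Think Max	Pushes reasoning to its fullest extent	Exploring the limits of model reasoning capability	Special system prompt + `<think>` thinking `</think>` summary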
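The response formats in the table can be split with a few lines of string handling. The helper below is hypothetical and only mirrors the table's layout; for real use, prefer `parse_message_from_completion_text` from the encoding folder described under Chat Template. It also does not cover Think Max's special system prompt, which belongs to the input rather than the output.

```python
from typing import Optional, Tuple

THINK_OPEN, THINK_CLOSE = "<think>", "</think>"


def split_reasoning(completion: str) -> Tuple[Optional[str], str]:
    """Hypothetical helper: split raw text laid out as in the Response Format column.

    Non-think:         "</think>" summary
    Think High / Max:  "<think>" thinking "</think>" summary
    Returns (thinking or None, summary).
    """
    head, sep, tail = completion.partition(THINK_CLOSE)
    if not sep:
        # No closing tag (e.g., generation stopped mid-reasoning): everything is returned
        # as summary here; a production parser might flag this case instead.
        return None, completion.strip()
    head = head.strip()
    if head.startswith(THINK_OPEN):
        head = head[len(THINK_OPEN):].strip()
    return (head or None), tail.strip()


print(split_reasoning("</think>Hello!"))                     # -> (None, 'Hello!')
print(split_reasoning("<think>1 plus 1 is 2.</think>1+1=2"))  # -> ('1 plus 1 is 2.', '1+1=2')
```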

DeepSeek-V4-Pro-Max vs. Frontier Models

Benchmark (Metric)	Opus-4.6 Max	GPT-5.4 xHigh	Gemini-3.1-Pro High	K2.6 Thinking	GLM-5.1 Thinking	DS-V4-Pro Max
Knowledge & Reasoning
MMLU-Pro (EM)	89.1	87.5	91.0	87.1	86.0	87.5
SimpleQA-Verified (Pass@1)	46.2	45.3	75.6	36.9	38.1	57.9
Chinese-SimpleQA (Pass@1)	76.4	76.8	85.9	75.9	75.0	84.4
GPQA Diamond (Pass@1)	91.3	93.0	94.3	90.5	86.2	90.1
HLE (Pass@1)	40.0	39.8	44.4	36.4	34.7	37.7
LiveCodeBench (Pass@1)	88.8	-	91.7	89.6	-	93.5
Codeforces (Rating)	-	3168	3052	-	-	3206
HMMT 2026 Feb (Pass@1)	96.2	97.7	94.7	92.7	89.4	95.2
IMOAnswerBench (Pass@1)	75.3	91.4	81.0	86.0	83.8	89.8
Apex (Pass@1)	34.5	54.1	60.9	24.0	11.5	38.3
Apex Shortlist (Pass@1)	85.9	78.1	89.1	75.5	72.4	90.2
Long Context
MRCR 1M (MMR)	92.9	-	76.3	-	-	83.5
CorpusQA 1M (ACC)	71.7	-	53.8	-	-	62.0
Agentic
Terminal Bench 2.0 (Acc)	65.4	75.1	68.5	66.7	63.5	67.9
SWE Verified (Resolved)	80.8	-	80.6	80.2	-	80.6
SWE Pro (Resolved)	57.3	57.7	54.2	58.6	58.4	55.4
SWE Multilingual (Resolved)	77.5	-	-	76.7	73.3	76.2
BrowseComp (Pass@1)	83.7	82.7	85.9	83.2	79.3	83.4
HLE w/ tools (Pass@1)	53.1	52.0	51.6	54.0	50.4	48.2
GDPval-AA (Elo)	1619	1674	1314	1482	1535	1554
MCPAtlas Public (Pass@1)	73.8	67.2	69.2	66.6	71.8	73.6
Toolathlon (Pass@1)	47.2	54.6	48.8	50.0	40.7	51.8

Comparison Across Modes

Benchmark (Metric)	V4-Flash Non-Think	V4-Flash High	V4-Flash Max	V4-Pro Non-Think	V4-Pro High	V4-Pro Max
Knowledge & Reasoning
MMLU-Pro (EM)	83.0	86.4	86.2	82.9	87.1	87.5
SimpleQA-Verified (Pass@1)	23.1	28.9	34.1	45.0	46.2	57.9
Chinese-SimpleQA (Pass@1)	71.5	73.2	78.9	75.8	77.7	84.4
GPQA Diamond (Pass@1)	71.2	87.4	88.1	72.9	89.1	90.1
HLE (Pass@1)	8.1	29.4	34.8	7.7	34.5	37.7
LiveCodeBench (Pass@1)	55.2	88.4	91.6	56.8	89.8	93.5
Codeforces (Rating)	-	2816	3052	-	2919	3206
HMMT 2026 Feb (Pass@1)	40.8	91.9	94.8	31.7	94.0	95.2
IMOAnswerBench (Pass@1)	41.9	85.1	88.4	35.3	88.0	89.8
Apex (Pass@1)	1.0	19.1	33.0	0.4	27.4	38.3
Apex Shortlist (Pass@1)	9.3	72.1	85.7	9.2	85.5	90.2
Long Context
MRCR 1M (MMR)	37.5	76.9	78.7	44.7	83.3	83.5
CorpusQA 1M (ACC)	15.5	59.3	60.5	35.6	56.5	62.0
Agentic
Terminal Bench 2.0 (Acc)	49.1	56.6	56.9	59.1	63.3	67.9
SWE Verified (Resolved)	73.7	78.6	79.0	73.6	79.4	80.6
SWE Pro (Resolved)	49.1	52.3	52.6	52.1	54.4	55.4
SWE Multilingual (Resolved)	69.7	70.2	73.3	69.8	74.1	76.2
BrowseComp (Pass@1)	-	53.5	73.2	-	80.4	83.4
HLE w/ tools (Pass@1)	-	40.3	45.1	-	44.7	48.2
MCPAtlas (Pass@1)	64.0	67.4	69.0	69.4	74.2	73.6
GDPval-AA (Elo)	-	-	1395	-	-	1554
Toolathlon (Pass@1)	40.7	43.5	47.8	46.3	49.0	51.8

Chat Template

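This release does not include a chat template in Jinja format. Instead, we provide a dedicated encoding folder containing Python scripts and test cases that demonstrate how to encode messages in OpenAI-compatible format into the model's input string, and how to parse the model's text output. See the encoding folder for full documentation.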

A brief example:

import transformers
from encoding_dsv4 import encode_messages, parse_message_from_completion_text  # provided in the encoding folder

messages = [
    {"role": "user", "content": "hello"},
    {"role": "assistant", "content": "Hello! I am DeepSeek.", "reasoning_content": "thinking..."},
    {"role": "user", "content": "1+1=?"}
]

# messages -> string
prompt = encode_messages(messages, thinking_mode="thinking")

# string -> tokens
tokenizer = transformers.AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-V4-Pro")
tokens = tokenizer.encode(prompt)

How to Run Locally

For detailed instructions on running DeepSeek-V4 locally, including model weight conversion and an interactive chat demo, please refer to the inference folder.

For local deployment, we recommend setting the sampling parameters to temperature = 1.0 and top_p = 1.0. For the Think Max reasoning mode, we recommend setting the context window to at least 384K tokens.
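With temperature = 1.0 and top_p = 1.0, both knobs are effectively neutral: the logits are not rescaled and nucleus filtering keeps the whole distribution, so tokens are drawn from the model's own softmax. The sketch below shows standard temperature plus top-p (nucleus) sampling over a plain list of logits; it is a generic illustration, not the sampler used by the inference folder.

```python
import math
import random
from typing import List, Sequence


def softmax_with_temperature(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    if temperature <= 0:
        raise ValueError("temperature must be > 0 (use argmax for greedy decoding)")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs: Sequence[float], top_p: float = 1.0) -> List[int]:
    """Indices of the smallest highest-probability set whose cumulative mass reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    return kept


def sample_token(logits, temperature=1.0, top_p=1.0, rng=random):
    probs = softmax_with_temperature(logits, temperature)
    kept = top_p_filter(probs, top_p)
    # random.choices draws proportionally to the weights, i.e. it renormalizes the kept mass.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]


rng = random.Random(0)
print(sample_token([2.0, 1.0, 0.5, -1.0], temperature=1.0, top_p=1.0, rng=rng))
```

Lower `top_p` values cut off the low-probability tail before sampling, and temperatures below 1.0 sharpen the distribution; the recommended settings do neither.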

License

This repository and the model weights are licensed under the MIT License.

Citation

@misc{deepseekai2026deepseekv4,
      title={DeepSeek-V4: Towards Highly Efficient Million-Token Context Intelligence},
      author={DeepSeek-AI},
      year={2026},
}

Contact

If you have any questions, please open an issue or contact us at service@deepseek.com.

Translated from DeepSeek · HF · Recorded June 6, 2026