deepseek-ai/DeepSeek-V4-Flash
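DeepSeek-AI releases a preview of DeepSeek-V4, comprising the MoE models V4-Pro (1.6T/49B) and V4-Flash (284B/13B) with support for a 1M-token context. The models use CSA+HCA Hybrid Attention, mHC, and the Muon optimizer, are pre-trained on 32T tokens, and are post-trained with SFT, GRPO-based RL, and on-policy distillation.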
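DeepSeek-V4: Towards Highly Efficient Million-Token Context Intelligence
Introduction
We present a preview version of the DeepSeek-V4 series, which includes two strong Mixture-of-Experts (MoE) language models: DeepSeek-V4-Pro (1.6T parameters, 49B activated) and DeepSeek-V4-Flash (284B parameters, 13B activated). Both support a context length of one million tokens.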
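The DeepSeek-V4 series introduces several key upgrades in architecture and optimization:
- Hybrid Attention Architecture: We design a hybrid attention mechanism that combines Compressed Sparse Attention (CSA) with Heavily Compressed Attention (HCA) to substantially improve long-context efficiency. In the 1M-token context setting, DeepSeek-V4-Pro requires only 27% of the single-token inference FLOPs and 10% of the KV cache of DeepSeek-V3.2.
- Manifold-Constrained Hyper-Connections (mHC): We introduce mHC to strengthen conventional residual connections, improving the stability of signal propagation across layers while preserving the model's expressivity.
- Muon Optimizer: We adopt the Muon optimizer for faster convergence and greater training stability.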
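To put the efficiency figures above in perspective, the quoted percentages can be converted into reduction factors. The snippet below is a back-of-the-envelope sketch that uses only the two ratios stated for DeepSeek-V4-Pro relative to DeepSeek-V3.2 at 1M-token context; the variable and function names are illustrative and not part of any DeepSeek tooling.

```python
# Ratios quoted above for DeepSeek-V4-Pro vs. DeepSeek-V3.2 at 1M-token context.
flops_ratio = 0.27     # share of single-token inference FLOPs
kv_cache_ratio = 0.10  # share of KV cache

def reduction_factor(ratio: float) -> float:
    """Convert 'needs X of the baseline' into 'N times less than the baseline'."""
    return 1.0 / ratio

flops_reduction = reduction_factor(flops_ratio)
kv_reduction = reduction_factor(kv_cache_ratio)

print(f"FLOPs per token: about {flops_reduction:.1f}x fewer")
print(f"KV cache: about {kv_reduction:.1f}x smaller")
```

In other words, the stated ratios correspond to roughly a 3.7x reduction in per-token compute and a 10x reduction in KV cache at that context length.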
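We pre-train both models on more than 32T diverse, high-quality tokens and then apply a comprehensive post-training pipeline. Post-training follows a two-stage paradigm: domain-specific experts are first cultivated independently (through SFT and RL with GRPO), and are then consolidated into a unified model via on-policy distillation, which merges the capabilities of the different domains into a single model.
DeepSeek-V4-Pro-Max, the maximum reasoning effort mode of DeepSeek-V4-Pro, significantly advances the knowledge capabilities of open-source models and firmly establishes itself as the strongest open-source model currently available. It achieves top-tier performance on coding benchmarks and significantly narrows the gap with leading closed-source models on reasoning and agentic tasks. Meanwhile, DeepSeek-V4-Flash-Max achieves reasoning performance comparable to the Pro version when given a larger thinking budget, although its smaller parameter scale naturally leaves it slightly behind on pure knowledge tasks and the most complex agentic workflows.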
Model Downloads
| Model | #Total Params | #Activated Params | Context Length | Precision | Download |
|---|---|---|---|---|---|
| DeepSeek-V4-Flash-Base | 284B | 13B | 1M | FP8 Mixed | HuggingFace / ModelScope |
| DeepSeek-V4-Flash | 284B | 13B | 1M | FP4 + FP8 Mixed* | HuggingFace / ModelScope |
| DeepSeek-V4-Pro-Base | 1.6T | 49B | 1M | FP8 Mixed | HuggingFace / ModelScope |
| DeepSeek-V4-Pro | 1.6T | 49B | 1M | FP4 + FP8 Mixed* | HuggingFace / ModelScope |
*FP4 + FP8 Mixed: MoE expert parameters use FP4 precision; most other parameters use FP8.
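The parameter counts in the table can be turned into two quick planning numbers: the share of parameters activated per token, and a very rough bracket for the weight footprint of the FP4 + FP8 Mixed checkpoints. The sketch below assumes 0.5 byte per FP4 parameter and 1 byte per FP8 parameter and ignores quantization scales, any higher-precision tensors, and file overhead. The exact FP4/FP8 split is not stated here, so the result is only a bracket, not an actual checkpoint size. All names are illustrative.

```python
# Parameter counts copied from the table above, in billions: (total, activated).
MODELS = {
    "DeepSeek-V4-Flash": (284, 13),
    "DeepSeek-V4-Pro": (1600, 49),
}

def activated_fraction(total_b: float, activated_b: float) -> float:
    """Share of parameters used for each token."""
    return activated_b / total_b

def weight_size_bracket_gb(total_b: float) -> tuple:
    """Rough (low, high) bracket for weight size in GB, with 1 GB = 1e9 bytes.

    ASSUMPTION: 0.5 byte per FP4 parameter, 1 byte per FP8 parameter.
    The low end treats every parameter as FP4, the high end as FP8.
    """
    all_fp4 = total_b * 0.5
    all_fp8 = total_b * 1.0
    return all_fp4, all_fp8

for name, (total_b, activated_b) in MODELS.items():
    low, high = weight_size_bracket_gb(total_b)
    share = activated_fraction(total_b, activated_b)
    print(f"{name}: {share:.1%} activated, roughly {low:.0f} to {high:.0f} GB of weights")
```

Since the MoE experts are stored in FP4, the real footprint presumably sits between the two ends of the bracket; check the actual file sizes on the download pages before provisioning hardware.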
Evaluation Results
Base Model
| Benchmark (Metric) | # Shots | DeepSeek-V3.2-Base | DeepSeek-V4-Flash-Base | DeepSeek-V4-Pro-Base |
|---|---|---|---|---|
| Architecture | - | MoE | MoE | MoE |
| # Activated Params | - | 37B | 13B | 49B |
| # Total Params | - | 671B | 284B | 1.6T |
| World Knowledge | ||||
| AGIEval (EM) | 0-shot | 80.1 | 82.6 | 83.1 |
| MMLU (EM) | 5-shot | 87.8 | 88.7 | 90.1 |
| MMLU-Redux (EM) | 5-shot | 87.5 | 89.4 | 90.8 |
| MMLU-Pro (EM) | 5-shot | 65.5 | 68.3 | 73.5 |
| MMMLU (EM) | 5-shot | 87.9 | 88.8 | 90.3 |
| C-Eval (EM) | 5-shot | 90.4 | 92.1 | 93.1 |
| CMMLU (EM) | 5-shot | 88.9 | 90.4 | 90.8 |
| MultiLoKo (EM) | 5-shot | 38.7 | 42.2 | 51.1 |
| Simple-QA verified (EM) | 25-shot | 28.3 | 30.1 | 55.2 |
| SuperGPQA (EM) | 5-shot | 45.0 | 46.5 | 53.9 |
| FACTS Parametric (EM) | 25-shot | 27.1 | 33.9 | 62.6 |
| TriviaQA (EM) | 5-shot | 83.3 | 82.8 | 85.6 |
| Language & Reasoning | ||||
| BBH (EM) | 3-shot | 87.6 | 86.9 | 87.5 |
| DROP (F1) | 1-shot | 88.2 | 88.6 | 88.7 |
| HellaSwag (EM) | 0-shot | 86.4 | 85.7 | 88.0 |
| WinoGrande (EM) | 0-shot | 78.9 | 79.5 | 81.5 |
| CLUEWSC (EM) | 5-shot | 83.5 | 82.2 | 85.2 |
| Code & Math | ||||
| BigCodeBench (Pass@1) | 3-shot | 63.9 | 56.8 | 59.2 |
| HumanEval (Pass@1) | 0-shot | 62.8 | 69.5 | 76.8 |
| GSM8K (EM) | 8-shot | 91.1 | 90.8 | 92.6 |
| MATH (EM) | 4-shot | 60.5 | 57.4 | 64.5 |
| MGSM (EM) | 8-shot | 81.3 | 85.7 | 84.4 |
| CMath (EM) | 3-shot | 92.6 | 93.6 | 90.9 |
| Long Context | ||||
| LongBench-V2 (EM) | 1-shot | 40.2 | 44.7 | 51.5 |
Instruct Model
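DeepSeek-V4-Pro and DeepSeek-V4-Flash both support three reasoning effort modes:
| Reasoning Mode | Characteristics | Typical Use Cases | Response Format |
|---|---|---|---|
| Non-think | Fast, intuitive responses | Routine daily tasks, low-risk decisions | `</think> summary` |
| Think High | Deliberate logical analysis; slower but more accurate | Complex problem solving, planning | `<think> thinking </think> summary` |
| Think Max | Pushes reasoning to its fullest extent | Exploring the limits of the model's reasoning capability | Special system prompt + `<think> thinking </think> summary` |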
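To make the response formats above concrete, the following minimal sketch splits a completion string into its thinking part and its summary by looking for the closing `</think>` tag. `split_response` is a hypothetical helper written for illustration only. It is not the parser shipped with the release, and it assumes the completion text literally contains the tags shown in the table.

```python
def split_response(text: str) -> tuple:
    """Hypothetical helper: return (thinking, summary) for one completion string.

    Non-think responses look like '</think> summary', so their thinking part
    is empty. Think High and Think Max responses look like
    '<think> thinking </think> summary'.
    """
    head, sep, tail = text.partition("</think>")
    if not sep:
        # No closing tag found: treat the whole text as the summary.
        return "", text.strip()
    # Drop an opening tag if the completion includes one.
    thinking = head.replace("<think>", "", 1).strip()
    return thinking, tail.strip()

print(split_response("</think> 2"))
print(split_response("<think> 1+1 is 2 </think> 2"))
```

For real deployments, the parsing utilities in the repository's encoding folder are the safer choice, since they handle the full message format rather than this simplified case.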
DeepSeek-V4-Pro-Max vs. Frontier Models
| Benchmark (Metric) | Opus-4.6 Max | GPT-5.4 xHigh | Gemini-3.1-Pro High | K2.6 Thinking | GLM-5.1 Thinking | DS-V4-Pro Max |
|---|---|---|---|---|---|---|
| Knowledge & Reasoning | ||||||
| MMLU-Pro (EM) | 89.1 | 87.5 | 91.0 | 87.1 | 86.0 | 87.5 |
| SimpleQA-Verified (Pass@1) | 46.2 | 45.3 | 75.6 | 36.9 | 38.1 | 57.9 |
| Chinese-SimpleQA (Pass@1) | 76.4 | 76.8 | 85.9 | 75.9 | 75.0 | 84.4 |
| GPQA Diamond (Pass@1) | 91.3 | 93.0 | 94.3 | 90.5 | 86.2 | 90.1 |
| HLE (Pass@1) | 40.0 | 39.8 | 44.4 | 36.4 | 34.7 | 37.7 |
| LiveCodeBench (Pass@1) | 88.8 | - | 91.7 | 89.6 | - | 93.5 |
| Codeforces (Rating) | - | 3168 | 3052 | - | - | 3206 |
| HMMT 2026 Feb (Pass@1) | 96.2 | 97.7 | 94.7 | 92.7 | 89.4 | 95.2 |
| IMOAnswerBench (Pass@1) | 75.3 | 91.4 | 81.0 | 86.0 | 83.8 | 89.8 |
| Apex (Pass@1) | 34.5 | 54.1 | 60.9 | 24.0 | 11.5 | 38.3 |
| Apex Shortlist (Pass@1) | 85.9 | 78.1 | 89.1 | 75.5 | 72.4 | 90.2 |
| Long Context | ||||||
| MRCR 1M (MMR) | 92.9 | - | 76.3 | - | - | 83.5 |
| CorpusQA 1M (ACC) | 71.7 | - | 53.8 | - | - | 62.0 |
| Agentic | ||||||
| Terminal Bench 2.0 (Acc) | 65.4 | 75.1 | 68.5 | 66.7 | 63.5 | 67.9 |
| SWE Verified (Resolved) | 80.8 | - | 80.6 | 80.2 | - | 80.6 |
| SWE Pro (Resolved) | 57.3 | 57.7 | 54.2 | 58.6 | 58.4 | 55.4 |
| SWE Multilingual (Resolved) | 77.5 | - | - | 76.7 | 73.3 | 76.2 |
| BrowseComp (Pass@1) | 83.7 | 82.7 | 85.9 | 83.2 | 79.3 | 83.4 |
| HLE w/ tools (Pass@1) | 53.1 | 52.0 | 51.6 | 54.0 | 50.4 | 48.2 |
| GDPval-AA (Elo) | 1619 | 1674 | 1314 | 1482 | 1535 | 1554 |
| MCPAtlas Public (Pass@1) | 73.8 | 67.2 | 69.2 | 66.6 | 71.8 | 73.6 |
| Toolathlon (Pass@1) | 47.2 | 54.6 | 48.8 | 50.0 | 40.7 | 51.8 |
Comparison across Modes
| Benchmark (Metric) | V4-Flash Non-Think | V4-Flash High | V4-Flash Max | V4-Pro Non-Think | V4-Pro High | V4-Pro Max |
|---|---|---|---|---|---|---|
| Knowledge & Reasoning | ||||||
| MMLU-Pro (EM) | 83.0 | 86.4 | 86.2 | 82.9 | 87.1 | 87.5 |
| SimpleQA-Verified (Pass@1) | 23.1 | 28.9 | 34.1 | 45.0 | 46.2 | 57.9 |
| Chinese-SimpleQA (Pass@1) | 71.5 | 73.2 | 78.9 | 75.8 | 77.7 | 84.4 |
| GPQA Diamond (Pass@1) | 71.2 | 87.4 | 88.1 | 72.9 | 89.1 | 90.1 |
| HLE (Pass@1) | 8.1 | 29.4 | 34.8 | 7.7 | 34.5 | 37.7 |
| LiveCodeBench (Pass@1) | 55.2 | 88.4 | 91.6 | 56.8 | 89.8 | 93.5 |
| Codeforces (Rating) | - | 2816 | 3052 | - | 2919 | 3206 |
| HMMT 2026 Feb (Pass@1) | 40.8 | 91.9 | 94.8 | 31.7 | 94.0 | 95.2 |
| IMOAnswerBench (Pass@1) | 41.9 | 85.1 | 88.4 | 35.3 | 88.0 | 89.8 |
| Apex (Pass@1) | 1.0 | 19.1 | 33.0 | 0.4 | 27.4 | 38.3 |
| Apex Shortlist (Pass@1) | 9.3 | 72.1 | 85.7 | 9.2 | 85.5 | 90.2 |
| Long Context | ||||||
| MRCR 1M (MMR) | 37.5 | 76.9 | 78.7 | 44.7 | 83.3 | 83.5 |
| CorpusQA 1M (ACC) | 15.5 | 59.3 | 60.5 | 35.6 | 56.5 | 62.0 |
| Agentic | ||||||
| Terminal Bench 2.0 (Acc) | 49.1 | 56.6 | 56.9 | 59.1 | 63.3 | 67.9 |
| SWE Verified (Resolved) | 73.7 | 78.6 | 79.0 | 73.6 | 79.4 | 80.6 |
| SWE Pro (Resolved) | 49.1 | 52.3 | 52.6 | 52.1 | 54.4 | 55.4 |
| SWE Multilingual (Resolved) | 69.7 | 70.2 | 73.3 | 69.8 | 74.1 | 76.2 |
| BrowseComp (Pass@1) | - | 53.5 | 73.2 | - | 80.4 | 83.4 |
| HLE w/ tools (Pass@1) | - | 40.3 | 45.1 | - | 44.7 | 48.2 |
| MCPAtlas Public (Pass@1) | 64.0 | 67.4 | 69.0 | 69.4 | 74.2 | 73.6 |
| GDPval-AA (Elo) | - | - | 1395 | - | - | 1554 |
| Toolathlon (Pass@1) | 40.7 | 43.5 | 47.8 | 46.3 | 49.0 | 51.8 |
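One way to read the mode comparison is to compute the gap between V4-Pro Max and V4-Flash Max on a handful of rows. The sketch below copies six score pairs from the table above and sorts them by gap; the selection of rows and the variable names are illustrative choices, not an official analysis.

```python
# (V4-Flash Max, V4-Pro Max) scores copied from the table above.
rows = {
    "SimpleQA-Verified": (34.1, 57.9),
    "GPQA Diamond": (88.1, 90.1),
    "LiveCodeBench": (91.6, 93.5),
    "HMMT 2026 Feb": (94.8, 95.2),
    "IMOAnswerBench": (88.4, 89.8),
    "Terminal Bench 2.0": (56.9, 67.9),
}

# Gap in points, rounded to one decimal to hide floating-point noise.
gaps = {name: round(pro - flash, 1) for name, (flash, pro) in rows.items()}

for name, gap in sorted(gaps.items(), key=lambda item: item[1]):
    print(f"{name:20s} Pro Max leads by {gap:4.1f} points")
```

On these rows the reasoning-heavy benchmarks differ by about two points or less, while the knowledge and agentic rows (SimpleQA-Verified, Terminal Bench 2.0) show double-digit gaps, which matches the description of V4-Flash-Max given in the introduction.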
Chat Template
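This release does not include a Jinja-format chat template. Instead, we provide a dedicated encoding folder containing Python scripts and test cases that demonstrate how to encode messages in OpenAI-compatible format into input strings for the model, and how to parse the model's text output. Please refer to the encoding folder for full documentation.
A brief example: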
```python
import transformers

from encoding_dsv4 import encode_messages, parse_message_from_completion_text

messages = [
    {"role": "user", "content": "hello"},
    {"role": "assistant", "content": "Hello! I am DeepSeek.", "reasoning_content": "thinking..."},
    {"role": "user", "content": "1+1=?"},
]

# messages -> string
prompt = encode_messages(messages, thinking_mode="thinking")

# string -> tokens
tokenizer = transformers.AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-V4-Pro")
tokens = tokenizer.encode(prompt)
```
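How to Run Locally
For detailed instructions on running DeepSeek-V4 locally, including model weight conversion and an interactive chat demo, please refer to the inference folder.
For local deployment, we recommend setting the sampling parameters to temperature = 1.0 and top_p = 1.0. For the Think Max reasoning mode, we recommend setting the context window to at least 384K tokens.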
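The recommendations above can be captured in a small sanity check that runs before a server is launched. This is a sketch only: `check_settings` and the constant names are hypothetical, and reading "384K" as 384 * 1024 tokens is an assumption, since the text does not say whether K means 1000 or 1024.

```python
RECOMMENDED_TEMPERATURE = 1.0
RECOMMENDED_TOP_P = 1.0
# ASSUMPTION: "384K" is interpreted as 384 * 1024 tokens.
THINK_MAX_MIN_CONTEXT = 384 * 1024

def check_settings(temperature: float, top_p: float,
                   context_window: int, think_max: bool = False) -> list:
    """Return a list of warnings for settings that deviate from the recommendations."""
    warnings = []
    if temperature != RECOMMENDED_TEMPERATURE:
        warnings.append(f"temperature is {temperature}, recommended {RECOMMENDED_TEMPERATURE}")
    if top_p != RECOMMENDED_TOP_P:
        warnings.append(f"top_p is {top_p}, recommended {RECOMMENDED_TOP_P}")
    if think_max and context_window < THINK_MAX_MIN_CONTEXT:
        warnings.append(
            f"context window is {context_window} tokens, "
            f"Think Max is recommended with at least {THINK_MAX_MIN_CONTEXT}"
        )
    return warnings

# Example: a 128K window with Think Max enabled triggers one warning.
for message in check_settings(1.0, 1.0, 128 * 1024, think_max=True):
    print("warning:", message)
```

How these values are actually passed to an inference engine depends on the serving stack, so the mapping to concrete flags is left to the instructions in the inference folder.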
License
This repository and the model weights are licensed under the MIT License.
Citation
```bibtex
@misc{deepseekai2026deepseekv4,
  title={DeepSeek-V4: Towards Highly Efficient Million-Token Context Intelligence},
  author={DeepSeek-AI},
  year={2026},
}
```
Contact
If you have any questions, please open an issue or contact us at service@deepseek.com.