together-ai

DeepSeek-V4 Pro now available on Together AI

May 3, 2026 · English original

Summary

DeepSeek V4 Pro is live on Together AI at the endpoint deepseek-ai/DeepSeek-V4-Pro. It is a 1.6T-parameter MoE model with 49B activated parameters, served with a 512K-token context (the model itself supports 1M tokens) and three reasoning modes: Non-Think, Think High, and Think Max. Pricing is $2.10 for input, $0.20 for cached input, and $4.40 for output, each per 1M tokens.

A 1.6T-parameter MoE reasoning model, offered on Together AI with 512K context, controllable reasoning modes, and cached-input pricing for long-context workloads.

Overview

Specification	Value
Model	DeepSeek V4 Pro on Together AI
Endpoint	deepseek-ai/DeepSeek-V4-Pro
Architecture	1.6T-parameter MoE
Activated parameters	49B
Context on Together AI	512K tokens
Model-level context	1M tokens
Reasoning modes	Non-Think, Think High, Think Max
Deployment	Serverless, Monthly Reserved
Input price	$2.10 / 1M tokens
Cached input price	$0.20 / 1M tokens
Output price	$4.40 / 1M tokens
Best-fit workloads	Code agents, document intelligence, long-context agents, research synthesis

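Built for long-context reasoning

DeepSeek V4 Pro targets workloads where the model has to reason over far more than a short prompt: large repositories, long technical documents, dense retrieval bundles, tool-call histories, and research corpora.

DeepSeek V4 Pro supports a million-token context at the model level; on Together AI, it currently ships with a 512K-token context window. The distinction matters because a model's capability and its deployed serving profile are not always the same thing. The context window Together AI launched with is sized for reliable production serving while still leaving teams enough room for serious long-context workloads.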
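Because the deployed window is smaller than the model-level limit, it can help to check a payload against the 512K budget before sending it. The sketch below is only a rough pre-flight check: the 4-characters-per-token ratio is a common heuristic for English text rather than the model's real tokenizer, treating "512K" as 512,000 tokens is a conservative assumption, and the names estimate_tokens and fits_context are hypothetical.

```python
# Assumption: "512K" is treated conservatively as 512,000 tokens.
TOGETHER_CONTEXT_TOKENS = 512_000


def estimate_tokens(text, chars_per_token=4.0):
    """Rough token estimate; a heuristic, not the model's tokenizer."""
    return int(len(text) / chars_per_token) + 1


def fits_context(documents, reserved_output_tokens=8_000,
                 limit=TOGETHER_CONTEXT_TOKENS):
    """Return True if the documents plus an output reserve fit the window."""
    used = sum(estimate_tokens(doc) for doc in documents)
    return used + reserved_output_tokens <= limit


if __name__ == "__main__":
    small_bundle = ["a" * 400_000]      # about 100K estimated tokens
    large_bundle = ["a" * 4_000_000]    # about 1M estimated tokens
    print(fits_context(small_bundle))
    print(fits_context(large_bundle))
```

For billing or hard limits, the token counts reported by the API in its responses are the numbers to trust; an estimate like this is only useful for deciding early whether a bundle needs trimming or splitting.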

Architecture matters too, because long context is more than a product spec. As context grows, serving cost, memory pressure, KV cache usage, latency, and concurrency all become part of system design. DeepSeek V4 Pro uses hybrid attention that combines Compressed Sparse Attention and Heavily Compressed Attention; DeepSeek reports that at million-token context it needs 27% of the single-token inference FLOPs and 10% of the KV cache of DeepSeek V3.2.

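Choose reasoning effort by workload

DeepSeek V4 Pro supports three reasoning modes, so teams can match reasoning depth to task difficulty instead of handling every request the same way.

Mode	Use for	Trade-off
Non-Think	Extraction, classification, simple Q&A, routine responses	Fastest path for lower-complexity tasks
Think High	Code planning, document analysis, multi-step reasoning	Deeper reasoning for complex work
Think Max	Hard debugging, deep research synthesis, agentic decision points	Maximum reasoning effort; expect higher latency and token usage

A document assistant can use Non-Think for simple extraction, Think High for cross-policy conflict analysis, and Think Max only when the model has to reason through a hard decision. A code agent can use Think High to plan a migration and Think Max to debug a subtle cross-service failure.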
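One way to apply the table above is a small routing layer that picks a mode per task before the request is built. The following is a minimal sketch under stated assumptions: the task labels and the choose_mode helper are hypothetical, and because this post does not say which request parameter selects a mode, the sketch stops at choosing the mode name.

```python
from enum import Enum


class ReasoningMode(Enum):
    NON_THINK = "Non-Think"
    THINK_HIGH = "Think High"
    THINK_MAX = "Think Max"


# Hypothetical task labels, grouped the way the mode table groups them.
MODE_BY_TASK = {
    "extraction": ReasoningMode.NON_THINK,
    "classification": ReasoningMode.NON_THINK,
    "simple_qa": ReasoningMode.NON_THINK,
    "code_planning": ReasoningMode.THINK_HIGH,
    "document_analysis": ReasoningMode.THINK_HIGH,
    "multi_step_reasoning": ReasoningMode.THINK_HIGH,
    "hard_debugging": ReasoningMode.THINK_MAX,
    "research_synthesis": ReasoningMode.THINK_MAX,
    "agent_decision_point": ReasoningMode.THINK_MAX,
}


def choose_mode(task_type, default=ReasoningMode.THINK_HIGH):
    """Pick a reasoning mode for a task label, falling back to a default."""
    return MODE_BY_TASK.get(task_type, default)


if __name__ == "__main__":
    for task in ("extraction", "code_planning", "hard_debugging", "unknown"):
        print(task, "->", choose_mode(task).value)
```

Defaulting unknown tasks to Think High is a design choice for the sketch: it avoids paying Think Max latency and token cost by accident while still giving unclassified work more than the fastest path.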

DeepSeek reports benchmark results across coding, reasoning, long-context, and agentic tasks, including 93.5% on LiveCodeBench, 90.1% on GPQA Diamond, 80.6% on SWE-bench Verified, 83.5% on MRCR 1M, and 62.0% on CorpusQA 1M.

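Cut the cost of repeated long-context queries with cached input pricing

Long-context systems often reuse the same large context across many questions: a repository snapshot, a document bundle, a policy archive, a retrieval payload, or a long agent trace. Cached input pricing makes these repeated workloads more practical.

DeepSeek V4 Pro costs $2.10 / 1M input tokens, $0.20 / 1M cached input tokens, and $4.40 / 1M output tokens. For reused context, that is a roughly 90% reduction in input cost ($2.10 down to $0.20), which matters when the expensive part of a request is a stable block of text that later analyses reuse.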

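Example pattern:

Load a large, stable context, such as a 300K-token repo summary, contract set, or policy archive.
Ask multiple follow-up questions over the same context.
Where applicable, use cached input pricing to substantially reduce the cost of repeated analysis.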
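The pattern above can be put into numbers with a short cost estimate. This is a sketch, not a billing calculator: the prices are the ones stated in this post, but the caching behavior is an assumption (the first request pays the full input price for the context and every later request pays the cached price for it), and request_cost and session_cost are hypothetical helper names.

```python
# Prices in USD per 1M tokens, as stated in this post.
INPUT_PRICE = 2.10
CACHED_INPUT_PRICE = 0.20
OUTPUT_PRICE = 4.40


def request_cost(fresh_input_tokens, cached_input_tokens, output_tokens):
    """Estimate the cost in USD of one request from its token counts."""
    return (
        fresh_input_tokens * INPUT_PRICE
        + cached_input_tokens * CACHED_INPUT_PRICE
        + output_tokens * OUTPUT_PRICE
    ) / 1_000_000


def session_cost(context_tokens, question_tokens, answer_tokens,
                 num_questions, use_cache):
    """Estimate the cost of several questions over one shared context.

    Assumption: with caching, only the first request pays the full input
    price for the shared context; later requests pay the cached price.
    """
    total = 0.0
    for i in range(num_questions):
        cached = use_cache and i > 0
        total += request_cost(
            fresh_input_tokens=question_tokens + (0 if cached else context_tokens),
            cached_input_tokens=context_tokens if cached else 0,
            output_tokens=answer_tokens,
        )
    return total


if __name__ == "__main__":
    # 300K-token context, 10 questions of 500 tokens, 1,000-token answers.
    without = session_cost(300_000, 500, 1_000, 10, use_cache=False)
    with_cache = session_cost(300_000, 500, 1_000, 10, use_cache=True)
    print(f"without cache: ${without:.4f}")    # $6.3545
    print(f"with cache:    ${with_cache:.4f}")  # $1.2245
```

Under these assumptions, ten questions over a 300K-token context cost about $6.35 without caching and about $1.22 with it. The whole-session saving is smaller than 90% because the first request and all output tokens are still billed at full price.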

Workload patterns

Code agents

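Use DeepSeek V4 Pro when an agent needs to reason across repository slices, issue traces, internal docs, prior tool calls, and proposed patches. Think High or Think Max is the best fit for planning changes, debugging failures, or resolving cross-file dependencies.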

Document intelligence

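Use long context for contracts, policy sets, technical manuals, or research collections that need to be compared in a single request. Non-Think can handle extraction and simple Q&A; Think High is better suited to conflict analysis, interpretation, and synthesis.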

Long-context agent traces

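Use DeepSeek V4 Pro to inspect long tool-call histories, intermediate results, and execution traces. The higher reasoning modes fit best at decision points, when the agent has to decide whether to continue, call another tool, revise the plan, or stop.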

Research synthesis

Use DeepSeek V4 Pro for workflows that combine papers, notes, benchmark reports, retrieved documents, and prior analyses. Cached input pricing is especially useful when the same evidence set is reused across multiple questions.

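Start with serverless, move to reserved capacity

DeepSeek V4 Pro is available on Together AI Serverless Inference and Monthly Reserved infrastructure. Serverless is the right starting point for evaluation, development, and variable traffic. Monthly Reserved suits steadier production demand, where teams need more predictable capacity and cost control.

For long-context workloads, the deployment path matters. Teams are choosing more than a model; they are also choosing how to manage throughput, concurrency, latency, KV cache pressure, and cost as context sizes grow. Together AI gives teams a path from evaluation to production without building their own serving stack.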

Try it now

DeepSeek-V4 Pro is now available on Together AI Serverless Inference and Dedicated Endpoints.

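from together import Together

# The client reads the API key from the TOGETHER_API_KEY environment variable.
client = Together()

# Request a streamed chat completion from the DeepSeek V4 Pro endpoint.
stream = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V4-Pro",
    messages=[
        {
            "role": "user",
            "content": "Prove that the square root of 2 is irrational.",
        }
    ],
    stream=True,
)

# Print reasoning tokens (when the stream includes them) and answer tokens as they arrive.
for chunk in stream:
    if not chunk.choices:
        continue  # skip chunks that carry no choices
    delta = chunk.choices[0].delta

    if hasattr(delta, "reasoning") and delta.reasoning:
        print(delta.reasoning, end="", flush=True)
    if hasattr(delta, "content") and delta.content:
        print(delta.content, end="", flush=True)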

Start with Serverless Inference for development and evaluation. For production workloads that need the full 1M context, reserved capacity, workload isolation, or more predictable throughput, contact sales to deploy DeepSeek-V4 Pro on Together AI Dedicated Inference.

Translated from together-ai · Recorded May 3, 2026