X · First-hand from researchers

@drjimfan The power of the Claw, in the palm of a robot hand. Agentic robotics …

May 8, 2026 · Original in English

Summary

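NVIDIA, Berkeley, Stanford, and CMU have open-sourced CaP-X (MIT license) for agentic robotics. It integrates perception, control, and visualization APIs and automatically synthesizes skill libraries. The project includes CaP-Gym with 187 manipulation tasks, CaP-Bench (an 8-tier evaluation of 12 LLMs/VLMs), CaP-Agent0, and CaP-RL; a 7B OSS model improves from 20% to 72% success after 50 training iterations.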

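The power of the Claw, in the palm of a robot hand. Agentic robotics is here! Today we are open-sourcing CaP-X: vibe agents that come alive in the physical world. They are embodied as robot arms and humanoids, equipped with rich perception APIs and actuation APIs, and they synthesize skill libraries automatically on the fly. CaP-X is a strict superset of our old stack, because policies such as VLAs are also "just" API calls. It can solve, zero-shot, many tasks that learned policies struggle with.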
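To make the "policies are just API calls" idea concrete, here is a minimal Python sketch. It is not taken from the CaP-X codebase: `SkillLibrary`, `scripted_pick`, `vla_policy`, and the action strings are all hypothetical names invented for illustration. It only shows how a learned policy and a hand-written routine could sit behind the same callable interface, and how a new skill composed at run time could be stored for reuse.

```python
from typing import Callable, Dict, List

# Every name below is hypothetical; none comes from the CaP-X codebase.
Action = str
Skill = Callable[[str], List[Action]]


def scripted_pick(target: str) -> List[Action]:
    """A hand-written skill built from primitive actuation steps."""
    return [f"move_to({target})", "close_gripper()", "lift()"]


def vla_policy(instruction: str) -> List[Action]:
    """Stand-in for a learned policy; a real VLA would run model inference here."""
    return [f"vla_step({instruction})"]


class SkillLibrary:
    """Maps skill names to callables so an agent can look them up and reuse them."""

    def __init__(self) -> None:
        self._skills: Dict[str, Skill] = {}

    def register(self, name: str, skill: Skill) -> None:
        self._skills[name] = skill

    def call(self, name: str, arg: str) -> List[Action]:
        return self._skills[name](arg)

    def names(self) -> List[str]:
        return sorted(self._skills)


library = SkillLibrary()
library.register("pick", scripted_pick)
library.register("vla", vla_policy)  # the learned policy is just another entry


def pick_then_fold(target: str) -> List[Action]:
    """A skill composed at run time from two existing library entries."""
    return library.call("pick", target) + library.call("vla", f"fold the {target}")


library.register("pick_then_fold", pick_then_fold)
plan = library.call("pick_then_fold", "towel")
```

In this toy version the agent never needs to know whether a skill is scripted or learned; both are looked up by name and called the same way.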

And we are doing far more than vibing. CaP-X is our most systematic, scientific study of agentic robotics to date:

We built a comprehensive agentic toolkit: perception (SAM3 segmentation, Molmo pointing, depth, point cloud), control (IK solvers, grasp planner, navigation), and visualization (EEF, mask overlays), which works across different robots.
CaP-Gym: the first Physical Exam for LLMs! 187 manipulation tasks spanning RoboSuite, LIBERO-PRO, and BEHAVIOR. Tabletop, bimanual, and mobile manipulation. Sim and real. Can't wait to see gradients flow from CaP-Gym into the next wave of frontier LLM releases.
CaP-Bench: we benchmarked 12 frontier LLMs/VLMs (Gemini, GPT, Opus, Qwen, DeepSeek, Kimi, and others) across 8 evaluation tiers. We systematically vary the API abstraction level, the agentic harness, and the visual grounding method. The paper has many insights.
CaP-Agent0: a training-free agentic harness that matches or exceeds human expert code on 4 of 7 tasks, with no task-specific tuning.
CaP-RL: if you have a gym, you get RL ;). A 7B OSS model jumps from 20% to 72% success after only 50 training iterations. The synthesized programs transfer to real robots with a small sim-to-real gap.
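As a rough picture of what a program written against such a toolkit might look like, the sketch below chains a perception call into a control command. It is a toy under stated assumptions: `segment`, `grasp_point`, and `pick` are hypothetical helpers working on small Python lists, not the SAM3, Molmo, IK, or grasp-planner interfaces that CaP-X actually exposes.

```python
from typing import List, Tuple

Mask = List[List[bool]]
Depth = List[List[float]]


def segment(labels: List[List[str]], target: str) -> Mask:
    """Toy stand-in for a segmentation API: mark the pixels whose label matches."""
    return [[cell == target for cell in row] for row in labels]


def grasp_point(mask: Mask, depth: Depth) -> Tuple[float, float, float]:
    """Average (column, row, depth) over the masked pixels."""
    pixels = [
        (col, row, depth[row][col])
        for row, cells in enumerate(mask)
        for col, hit in enumerate(cells)
        if hit
    ]
    if not pixels:
        raise ValueError("target not visible")
    n = len(pixels)
    return (
        sum(p[0] for p in pixels) / n,
        sum(p[1] for p in pixels) / n,
        sum(p[2] for p in pixels) / n,
    )


def pick(labels: List[List[str]], depth: Depth, target: str) -> List[str]:
    """Perception feeds control: locate the target, then emit motion commands."""
    x, y, z = grasp_point(segment(labels, target), depth)
    return [f"move_to({x:.1f}, {y:.1f}, {z:.1f})", "close_gripper()"]


# A 2x3 "image" with per-pixel labels and depths (hypothetical data).
scene = [["table", "cup", "cup"],
         ["table", "table", "table"]]
depth = [[1.0, 0.8, 0.6],
         [1.0, 1.0, 1.0]]
commands = pick(scene, depth, "cup")
```

The point of the sketch is the shape of the program, not the numbers: an agent writes short code that calls perception, turns the result into a target, and hands that target to a controller.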

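Three years ago, our team created Voyager, one of the earliest agentic AIs, which could play and learn continuously in Minecraft. Its key ideas (skill libraries, self-reflection loops, and in-context planning) have since influenced many modern agentic designs.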

Today, that agent graduates from Minecraft and gets a real job. It may be April Fools' Day, but this Claw really is getting down to work!

Links in thread:

As always, we open-source everything under the MIT license: https://t.co/uu310bY4bT Code: https://t.co/hzDpW3Gx49 Paper: https://t.co/iChnrXCtHy

CaP-X is brought to you by NVIDIA, Berkeley, Stanford, and CMU. I would like to thank the legendary @Ken_Goldberg, who co-advised this work, and the team that poured themselves into it!

Translated from X · First-hand from researchers · Recorded May 8, 2026