apple-ml-research

From Where Things Are to What They’re For: Benchmarking Spatial–Functional Intelligence for Multimodal LLMs

May 8, 2026 · English original

Abstract

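This work introduces the Spatial-Functional Intelligence Benchmark (SFI-Bench), which evaluates the higher-order spatial intelligence of multimodal agents, from geometric perception to understanding what objects are for. The video benchmark is built on diverse egocentric indoor video scans, contains more than 1,700 questions, and is intended to fill gaps left by existing geometric evaluations such as VSI-Bench.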

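True spatial intelligence in multimodal agents goes beyond low-level geometric perception, progressing from knowing where things are to understanding what they are for. Although existing benchmarks such as VSI-Bench effectively evaluate this foundational geometric stage, they fall short of probing the higher-order cognitive abilities that grounded intelligence requires. To bridge this gap, we introduce the Spatial-Functional Intelligence Benchmark (SFI-Bench), a video-based benchmark with more than 1,700 questions drawn from diverse, egocentric indoor video scans. SFI-Bench is designed to…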

Translated from apple-ml-research · Recorded May 8, 2026