Hugging Face · Daily Papers

SplAttN: Bridging 2D and 3D with Gaussian Soft Splatting and Attention for Point Cloud Completion

Zhaoyang Li, Zhichao You, Tianrui Li

Southwest Jiaotong University

May 6, 2026 · arXiv:2605.01466 · PDF · Code

Abstract

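Although multi-modal learning has advanced point cloud completion, its theoretical mechanisms remain unclear. Recent work attributes its success to the connection between modalities, but we find that standard hard projection severs this connection: projecting a sparse point cloud onto the image plane yields extremely sparse support, which hinders the propagation of visual priors. We term this failure mode Cross-Modal Entropy Collapse.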

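To address this practical limitation, we propose SplAttN, which replaces hard projection with Differentiable Gaussian Splatting to produce dense, continuous image-plane representations. By reformulating projection as continuous density estimation, SplAttN avoids the collapse of sparse support, facilitates gradient flow, and improves the learnability of cross-modal connections. Extensive experiments show that SplAttN achieves state-of-the-art performance on PCN and ShapeNet-55/34.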
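The contrast between hard projection and Gaussian soft splatting can be illustrated with a small NumPy sketch. This is not the SplAttN implementation: the pinhole camera, the image size, the isotropic kernel, and the names `project_to_pixels`, `hard_projection`, `gaussian_soft_splat`, and `support_fraction` are all assumptions chosen for illustration. It only shows that a sparse cloud marks few pixels under hard projection, while summing Gaussian footprints yields a dense density that varies smoothly with the point positions.

```python
import numpy as np


def project_to_pixels(points, focal=64.0, size=64):
    """Pinhole projection of (N, 3) points with z > 0 to continuous pixel coordinates.

    The camera model and its parameters are assumptions for this sketch.
    """
    u = focal * points[:, 0] / points[:, 2] + size / 2.0
    v = focal * points[:, 1] / points[:, 2] + size / 2.0
    return np.stack([u, v], axis=1)


def hard_projection(uv, size=64):
    """Each point marks only the single pixel it rounds to."""
    img = np.zeros((size, size))
    cols = np.clip(np.round(uv[:, 0]).astype(int), 0, size - 1)
    rows = np.clip(np.round(uv[:, 1]).astype(int), 0, size - 1)
    img[rows, cols] = 1.0
    return img


def gaussian_soft_splat(uv, size=64, sigma=2.0):
    """Each point adds an isotropic Gaussian footprint; footprints are summed."""
    ys, xs = np.mgrid[0:size, 0:size]
    dx = xs[None, :, :] - uv[:, 0, None, None]  # shape (N, size, size)
    dy = ys[None, :, :] - uv[:, 1, None, None]
    return np.exp(-(dx ** 2 + dy ** 2) / (2.0 * sigma ** 2)).sum(axis=0)


def support_fraction(img, threshold=1e-3):
    """Fraction of pixels whose value exceeds the threshold."""
    return float((img > threshold).mean())


# Toy sparse cloud placed in front of the camera (synthetic data).
rng = np.random.default_rng(0)
points = rng.uniform(-0.4, 0.4, size=(256, 3))
points[:, 2] += 2.0
uv = project_to_pixels(points)

print("hard support:", support_fraction(hard_projection(uv)))
print("soft support:", support_fraction(gaussian_soft_splat(uv)))
```

Because the rounding in `hard_projection` is piecewise constant, a small shift of a point usually leaves the image unchanged, so no useful gradient reaches the point coordinates. The Gaussian density changes with any shift, which is the property the abstract refers to as facilitating gradient flow.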

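Crucially, we use the real-world KITTI benchmark as a stress test of multi-modal dependency. Counterfactual evaluation shows that baselines degenerate into unimodal template retrievers that are insensitive to the removal of visual input, whereas SplAttN maintains a robust dependence on visual cues, validating that our method establishes an effective cross-modal connection. Code is available at https://github.com/zay002/SplAttN.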
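One way to picture a counterfactual visual-removal check is to compare a model's completion with the real image against its completion with a blank image. The sketch below is an assumption about how such a test could look, not the paper's protocol: the abstract does not state the metric or the ablation used, so the Chamfer distance, the zeroed image, and the two toy models `template_retriever` and `image_conditioned` are hypothetical stand-ins.

```python
import numpy as np


def chamfer_distance(a, b):
    """Symmetric Chamfer distance (mean squared nearest-neighbour distance)."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)  # shape (N, M)
    return float(d2.min(axis=1).mean() + d2.min(axis=0).mean())


def visual_sensitivity(model, partial, image):
    """Distance between completions with the real image and with a blank one.

    A value near zero means the model's output does not depend on the image.
    """
    with_visual = model(partial, image)
    without_visual = model(partial, np.zeros_like(image))
    return chamfer_distance(with_visual, without_visual)


def template_retriever(partial, image):
    """Hypothetical unimodal model: ignores the image entirely."""
    return partial.copy()


def image_conditioned(partial, image):
    """Hypothetical multi-modal model: output shifts with an image statistic."""
    return partial + image.mean()


# Synthetic inputs, used only to exercise the functions.
rng = np.random.default_rng(0)
partial = rng.normal(size=(128, 3))
image = np.full((64, 64), 0.5)

print("template retriever:", visual_sensitivity(template_retriever, partial, image))
print("image conditioned: ", visual_sensitivity(image_conditioned, partial, image))
```

Comparing a model against itself needs no ground-truth shape, which is convenient for real-world scans. A model that behaves like a template retriever scores zero here regardless of how plausible its output looks.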