MolmoPoint: Open-Source Multimodal Models with Precise Visual Grounding

Mar 20, 2026
  • MolmoPoint is a new open-source multimodal model from the Allen Institute for AI (AI2) that adds pointing and clicking capabilities.
  • It utilizes a novel architecture that maps visual coordinates to text tokens, allowing the model to interact with user interfaces and physical environments with high precision.
  • The model outperforms several proprietary counterparts in tasks requiring spatial awareness and fine-grained visual grounding.
  • AI2 has released the model weights, training data, and evaluation benchmarks to promote transparency and open research in the AI community.
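
The idea of mapping visual coordinates to text tokens can be illustrated with a minimal sketch. This is not MolmoPoint's actual encoding, which the summary above does not specify. The `<point x=… y=…>` format, the 1000-bin grid, and the function names `point_to_text` and `text_to_point` are all assumptions made for illustration. The sketch quantizes a pixel coordinate into normalized integer bins, writes it as a text span a language model could emit, and decodes it back to pixels.

```python
import re

# Hypothetical grid resolution; the real model's resolution is not stated in the source.
NUM_BINS = 1000

# Hypothetical text format for a point; chosen only for this illustration.
_POINT_RE = re.compile(r"<point x=(\d+) y=(\d+)>")


def point_to_text(x, y, width, height, bins=NUM_BINS):
    """Encode a pixel coordinate as a text span of normalized integer bins."""
    # Normalize to [0, 1), scale to the grid, and clamp the right/bottom edge.
    bx = min(bins - 1, int(x / width * bins))
    by = min(bins - 1, int(y / height * bins))
    return f"<point x={bx} y={by}>"


def text_to_point(text, width, height, bins=NUM_BINS):
    """Decode the first point span in `text` back to pixel coordinates."""
    match = _POINT_RE.search(text)
    if match is None:
        raise ValueError("no point span found in text")
    bx, by = int(match.group(1)), int(match.group(2))
    # Return the center of the bin, so the error is at most half a bin per axis.
    return ((bx + 0.5) / bins * width, (by + 0.5) / bins * height)


if __name__ == "__main__":
    span = point_to_text(320, 240, width=640, height=480)
    print(span)
    print(text_to_point(f"The button is at {span}.", width=640, height=480))
```

Because the bins are normalized to the image size, the same text span refers to the same relative location at any resolution, which is one reason a scheme like this suits both screenshots and photographs.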

Entities: Allen Institute for AI (AI2), Molmo, MolmoPoint