- MolmoPoint is a new open-source multimodal model from the Allen Institute for AI (AI2) that introduces advanced pointing and clicking capabilities.
- It uses a novel architecture that maps visual coordinates to text tokens, allowing the model to interact with user interfaces and physical environments with high precision.
- The model outperforms several proprietary counterparts in tasks requiring spatial awareness and fine-grained visual grounding.
- AI2 has released the model weights, training data, and evaluation benchmarks to promote transparency and open research in the AI community.
Entities: Allen Institute for AI (AI2), Molmo, MolmoPoint