How NVIDIA Builds Open Data for AI

Mar 17, 2026
  • NVIDIA is open-sourcing massive datasets to accelerate AI development, including 10 trillion language tokens and specialized data for robotics and autonomous vehicles.
  • The “Open Data for AI” initiative provides developers with high-quality, diverse data to train foundation models and physical AI models.
  • Key releases include 1,700+ hours of driving data and 500,000 robotics trajectories to address the “data wall” in physical AI.
  • These resources are hosted on Hugging Face and GitHub, supporting the creation of sovereign AI systems tailored to specific industry needs.

Entities: NVIDIA, Hugging Face