The Macro: AI Labs Are Desperate for Good Video Data
The next frontier of AI is video understanding and generation. Every major research lab is training video models, from robotics foundation models that learn from human demonstrations to video generation systems that compete with Hollywood. But training these models requires enormous amounts of video data, and not just any video: specific, labeled, high-quality video data.
An AI lab training a robotics model needs thousands of hours of videos showing specific human activities: cooking with hand-object interactions, assembly tasks, tool use, navigation. Finding and curating these datasets is a massive operational challenge. Public video platforms have the content, but extracting, cleaning, and labeling specific categories at scale is a full-time job for multiple people.
The data licensing landscape is also complicated. Using publicly available videos for AI training raises legal questions. Private datasets from specific sources require individual licensing agreements. Assembling a large, diverse, legally clean video dataset is one of the biggest bottlenecks in video AI research.
Shofo, backed by Y Combinator, is building the infrastructure to solve this. They combine public videos from the open web with private videos aggregated from thousands of sources into a single, continuously updating index of billions of videos.
The Micro: Billions of Videos, Custom Datasets on Demand
Bryan Hong (CEO), Andre Braga (Head of AI), Braiden Dishman (COO), and Alexzendor Misra (CTO) built Shofo as a data infrastructure company rather than an AI model company. They are not training their own video AI. They are providing the fuel that other companies need to train theirs.
The pipeline works in three steps. Ingest: collecting and aggregating video from public web sources and private partners. Index: building a searchable index with metadata, content descriptions, and categorizations. Deliver: cleaning, segmenting, and labeling specific subsets to match customer requirements.
When an AI lab needs 100,000 hours of cooking videos featuring hand-object interactions, Shofo can query their index, filter to relevant content, and deliver a cleaned, labeled dataset. This eliminates months of data collection work that labs would otherwise do themselves.
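The query-filter-deliver flow above can be sketched in miniature. This is a hypothetical illustration, not Shofo's actual API: the `VideoRecord` type, the category tags, and the greedy selection logic are all assumptions about how a metadata-filtered dataset query might work.

```python
from dataclasses import dataclass

@dataclass
class VideoRecord:
    video_id: str
    duration_hours: float
    categories: set  # e.g. {"cooking", "hand-object-interaction"}

def build_dataset(index, required_categories, target_hours):
    """Greedily select videos whose metadata matches all required
    categories until the target number of hours is reached."""
    selected, total = [], 0.0
    for record in index:
        if required_categories <= record.categories:  # subset test
            selected.append(record)
            total += record.duration_hours
            if total >= target_hours:
                break
    return selected, total

# Toy index standing in for a billions-of-videos index.
index = [
    VideoRecord("a", 1.5, {"cooking", "hand-object-interaction"}),
    VideoRecord("b", 2.0, {"assembly"}),
    VideoRecord("c", 3.0, {"cooking", "hand-object-interaction"}),
]

subset, hours = build_dataset(index, {"cooking", "hand-object-interaction"}, 4.0)
print([r.video_id for r in subset], hours)  # → ['a', 'c'] 4.5
```

At real scale the filter would run against an indexed store rather than a linear scan, but the shape of the operation, matching labeled metadata against a customer's spec until an hours target is met, is the same.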
Shofo also maintains a presence on Hugging Face, which suggests they are engaging with the open-source AI research community, a smart move for credibility and discoverability.
Competitors include Scale AI (general data labeling), Labelbox (data annotation platform), and various web scraping services. But nobody is building a video-specific index at the scale Shofo describes. The video data infrastructure space is underdeveloped compared to text and image data infrastructure.
The Verdict
Shofo is positioned as essential infrastructure for the video AI wave. Every lab training video models needs data, and assembling that data manually is slow and expensive.
At 30 days: how many AI labs are actively purchasing custom datasets, and what are the typical dataset sizes?
At 60 days: what is the data quality feedback from customers? Labeled video data with errors is worse than no data.
At 90 days: is the continuous updating of the index keeping pace with the content needs of active customers?
I think Shofo is building a valuable data moat. The index of billions of videos, once built, creates a compounding advantage that is hard for competitors to replicate. As video AI becomes more important, the companies that control the training data become essential partners for every lab in the space.