The Macro: Post-Training Is Where the Real Value Gets Created
Pre-training gets all the attention. The billion-dollar GPU clusters, the trillion-token datasets, the scaling laws debates. But anyone paying close attention to how frontier models actually get good at specific things knows that post-training is where the magic happens. RLHF. DPO. Constitutional AI. Expert fine-tuning. Whatever technique you use, the core idea is the same: you take a base model that knows a lot about everything and shape it into something that is excellent at what you specifically need.
The problem is that post-training is incredibly labor-intensive, and the quality of the output depends entirely on the quality of the human judgment going in. You need domain experts who can evaluate model outputs and provide the training signals that push the model in the right direction. In healthcare, that means doctors and clinicians. In law, that means attorneys. In finance, that means analysts and compliance officers. These people are expensive, their time is limited, and the infrastructure for capturing their judgment efficiently barely exists.
Most companies doing post-training today are either doing it themselves with internal teams (expensive and slow) or outsourcing it to labeling companies like Scale AI, Surge, or Appen. The labeling companies provide volume but often lack the domain expertise needed for specialized applications. A general annotator can label sentiment. They cannot reliably evaluate whether a model’s medical advice is clinically appropriate.
Rubric AI, backed by Y Combinator (W25), is building the infrastructure that sits between domain experts and frontier models. They are creating the environment and tools that make it possible to capture expert judgment at scale and turn it into training signals.
The Micro: Expert Judgment as Infrastructure
Pragya Saboo founded Rubric with a background that spans product leadership at Oscar Health (through its IPO), computer vision development at Apella, and product work at Asana. The healthcare experience is particularly relevant given that Rubric’s current focus is on regulated industries, specifically healthcare and life sciences.
The product, as described on the site, provides domain-specific data infrastructure for AI post-training. The emphasis on regulated industries tells you a lot about the positioning. Unregulated verticals can get away with “good enough” training data. Healthcare cannot. When a model is being trained to assist with clinical decisions, the training data needs to come from people who understand clinical practice, and it needs to be auditable, consistent, and defensible.
This is a harder problem than it sounds. It is not just about finding doctors to label data. It is about building the workflow that presents the right model outputs for evaluation, captures the expert’s reasoning (not just their binary approval/rejection), and converts that reasoning into a training signal that actually improves the model. The entire pipeline, from task design through expert evaluation to training integration, needs to work smoothly or the quality degrades fast.
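To make the middle of that pipeline concrete, here is one plausible shape for it, purely an illustrative sketch and not Rubric's actual system: a structured record of an expert's judgment (capturing the reasoning, not just the verdict) converted into the prompt/chosen/rejected triple that DPO-style preference optimization consumes. All field names here are my own assumptions.

```python
from dataclasses import dataclass


@dataclass
class ExpertEvaluation:
    """One expert's structured judgment of two model responses.

    Hypothetical record layout for illustration -- not Rubric's schema.
    """
    prompt: str
    response_a: str
    response_b: str
    preferred: str   # "a" or "b"
    rationale: str   # the expert's reasoning, not just the binary verdict


def to_dpo_pair(ev: ExpertEvaluation) -> dict:
    """Convert an evaluation into a (prompt, chosen, rejected) triple
    suitable for DPO-style preference training."""
    chosen, rejected = (
        (ev.response_a, ev.response_b)
        if ev.preferred == "a"
        else (ev.response_b, ev.response_a)
    )
    return {
        "prompt": ev.prompt,
        "chosen": chosen,
        "rejected": rejected,
        # Carrying the rationale forward preserves an audit trail and can
        # later seed critique-style training data.
        "rationale": ev.rationale,
    }


ev = ExpertEvaluation(
    prompt="Patient reports chest pain radiating to the left arm...",
    response_a="This is likely indigestion; rest and try antacids.",
    response_b="These symptoms can indicate cardiac ischemia; seek emergency care.",
    preferred="b",
    rationale="Response A misses a classic red-flag presentation.",
)
pair = to_dpo_pair(ev)
```

The point of the sketch is the rationale field: a binary preference alone throws away most of what the expert knows, while a structured record keeps that reasoning attached to the training example.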
The competitive space includes Scale AI, which is the dominant player in data labeling and has expanded into RLHF data. Invisible Technologies provides expert-in-the-loop services. Labelbox offers labeling infrastructure. But Rubric’s focus on regulated industries and purpose-built training environments differentiates it from the generalist approach these larger players take.
The “purpose-built training environments” language is interesting. This suggests they are not just building labeling tools but creating the entire context in which experts interact with model outputs. In healthcare, that might mean presenting a clinical scenario, showing the model’s response, and giving the expert a structured way to evaluate accuracy, safety, and clinical appropriateness, all within an environment that meets compliance requirements.
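One way to picture what such an environment might emit, again a hypothetical sketch with every field name my own invention, is a scored rubric record that keeps the evaluation dimensions separate and carries the audit metadata a regulated deployment would demand:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ClinicalRubricScore:
    """A structured clinical evaluation of one model response.

    Hypothetical schema for illustration only.
    """
    scenario_id: str
    evaluator_id: str     # credentialed clinician, for auditability
    accuracy: int         # 1-5: is the medical content correct?
    safety: int           # 1-5: could following this response cause harm?
    appropriateness: int  # 1-5: does it fit clinical practice norms?
    notes: str = ""
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def passes(self, floor: int = 4) -> bool:
        """A response only becomes positive training data if every
        dimension clears the floor -- a high safety score cannot
        compensate for low accuracy, or vice versa."""
        return min(self.accuracy, self.safety, self.appropriateness) >= floor


score = ClinicalRubricScore(
    scenario_id="cardio-017",
    evaluator_id="clinician-42",
    accuracy=5,
    safety=5,
    appropriateness=3,
    notes="Correct and safe, but tone is wrong for patient-facing use.",
)
# A single weak dimension disqualifies the example as positive data.
```

The design choice worth noticing is the minimum rather than an average: in a regulated setting, dimensions like safety and accuracy are not fungible, and the evaluator identity plus timestamp are what make the resulting dataset auditable and defensible.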
The Verdict
The post-training market is about to explode. As more companies build specialized AI products for regulated industries, demand for high-quality, expert-driven training data will scale with them. The companies that control the infrastructure for capturing that data will have significant leverage.
At 30 days: how many domain experts are actively providing training data through the platform? The supply side of the marketplace (expert availability) is often harder to build than the demand side.
At 60 days: which frontier model companies are using Rubric’s data? If any of the major labs are customers, that validates the quality of the training signals being produced.
At 90 days: can Rubric demonstrate measurable improvements in model performance on healthcare-specific tasks using their training data versus generic alternatives? Published benchmarks would be the strongest possible proof point.
I think the regulated industry focus is smart. It is harder to serve but also harder to compete with, because the domain expertise requirements create a natural moat. If Rubric can build the standard infrastructure for post-training in healthcare, the expansion to other regulated verticals (law, finance, insurance) follows naturally.