The Macro: Machine Learning Is Still Too Hard for Most Teams
Here is a dirty secret about enterprise machine learning: most companies that say they are “using AI” are calling an API. They are sending text to a language model and getting text back. Actual predictive ML, the kind where you train a model on your own data to forecast churn, detect fraud, predict demand, or classify transactions, remains stubbornly difficult for anyone who does not employ a team of data scientists.
The traditional workflow looks something like this. A business person identifies a prediction they need. They write a requirements document. A data scientist spends two weeks exploring the data. They try four different model architectures. They evaluate performance using metrics the business person does not understand. They build a training pipeline. They deploy to an endpoint. The whole process takes six to twelve weeks, costs tens of thousands of dollars in salary, and produces a model that might need to be rebuilt in six months when the data distribution shifts.
AutoML platforms have been trying to fix this for years. DataRobot raised over $1 billion to automate model selection and training. H2O.ai does similar work in the open-source space. There is also the MLOps layer, where platforms like MLflow, Weights & Biases, and Neptune track experiments and deployments. These tools made the process faster for experienced data scientists. They did not make it accessible to people who are not data scientists.
The LLM era has created a weird bifurcation. On one side, you have incredibly powerful general-purpose language models that anyone can use through a chat interface. On the other side, you have the entire world of predictive ML, which is arguably more commercially valuable for most businesses, still locked behind the same skill barriers it had five years ago. The churn prediction model that could save a SaaS company millions in revenue still requires someone who knows what XGBoost is.
I think the most interesting question in developer tools right now is whether AI agents can close that gap. Not by replacing data scientists, but by automating the parts of their job that are tedious and well-understood enough to be procedural.
The Micro: Describe a Problem, Get a Deployed Model
Plexe is a two-person team out of YC’s Spring 2025 batch, founded by Vaibhav Dubey (CEO) and Marcello De Bernardi (CTO). The product is open-source and the pitch is startlingly direct: describe the prediction you want to make in plain language, connect your data sources, and Plexe’s agents handle the rest. They run experiments, evaluate model performance, and deploy the winning model to an API endpoint.
That “handle the rest” is doing a lot of work, so let me unpack what that actually means. The agent connects to your data, which could be a database, a CSV, or whatever your source looks like. It examines the schema and the distributions. It formulates the ML problem (classification, regression, time series, whatever fits). It selects candidate approaches. It trains and evaluates multiple models. It picks the best one based on standard metrics. And it deploys it to an endpoint you can call from your application.
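The middle steps of that workflow are procedural enough to sketch. Below is a toy version of the "train several candidates, keep the winner" loop, written with scikit-learn on synthetic data. The candidate list and the metric are my illustrative choices, not Plexe's actual internals.

```python
# A toy version of an agent's model-selection loop: train several
# candidate models, score each on a held-out split, keep the winner.
# Illustrative only -- NOT Plexe's actual implementation.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.metrics import roc_auc_score

# Stand-in for "connect to your data": a synthetic binary-classification set.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Candidate approaches an agent might try for a tabular classification task.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

# Train and evaluate each candidate; rank by a standard metric (here, AUC).
scores = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    proba = model.predict_proba(X_test)[:, 1]
    scores[name] = roc_auc_score(y_test, proba)

best = max(scores, key=scores.get)
print(f"winner: {best} (AUC={scores[best]:.3f})")
```

Everything a platform like Plexe automates (problem framing, feature handling, hyperparameter search, deployment) sits on top of a loop shaped roughly like this one; the hard part is doing each step well on arbitrary data.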
If that sounds like what a junior data scientist does during their first three months on the job, that is exactly the point. The bet is that a huge percentage of corporate ML use cases are not novel research problems. They are well-understood prediction tasks applied to company-specific data. You do not need a PhD to build a churn model. You need someone (or something) that knows the playbook and can execute it systematically.
The open-source angle matters here for the same reason it always matters in infrastructure: trust. If Plexe is going to connect to your production database and train models on your proprietary data, you need to be able to inspect what it is doing. Published source code makes the agent auditable in a way that a closed-source AutoML platform is not.
What I cannot evaluate from the website alone is the quality of the models it produces. AutoML has historically been good enough for proof-of-concept work but not always competitive with hand-tuned models for production use. If Plexe’s agents are producing models that are 85% as good as what a senior data scientist would build, that is transformative for teams without ML expertise. If they are producing models that are 60% as good, it is a demo.
The deployment-to-API-endpoint piece is the right product decision. The value of a model that lives in a Jupyter notebook is roughly zero. A model behind an endpoint that your application can call is an actual product. Collapsing the distance between “I have a trained model” and “my app uses this model” removes one of the biggest friction points in the ML workflow.
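To make "model behind an endpoint" concrete, here is a standard-library-only sketch: a fixed-weight logistic scorer (a stand-in for whatever trained artifact a real platform would deploy) wrapped in an HTTP endpoint that an application can POST features to. The weights and route are invented for illustration.

```python
# Minimal sketch of "a model behind an endpoint", standard library only.
# The "model" is a fixed-weight logistic scorer standing in for a real
# trained artifact; weights are illustrative, not learned.
import json
import math
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

WEIGHTS = [0.8, -0.5, 0.3]  # illustrative coefficients
BIAS = -0.1

def predict(features):
    """Score a feature vector with the fixed logistic model."""
    z = BIAS + sum(w * x for w, x in zip(WEIGHTS, features))
    return 1.0 / (1.0 + math.exp(-z))

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Expects a JSON body like {"features": [1.0, 0.2, 3.5]}.
        length = int(self.headers["Content-Length"])
        payload = json.loads(self.rfile.read(length))
        body = json.dumps({"score": predict(payload["features"])}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo quiet
        pass

# Serve on an OS-assigned port in a background thread, then call it once.
server = HTTPServer(("127.0.0.1", 0), PredictHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

req = urllib.request.Request(
    f"http://127.0.0.1:{server.server_port}/predict",
    data=json.dumps({"features": [1.0, 0.2, 3.5]}).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    result = json.loads(resp.read())
server.shutdown()
print(result)
```

The point of the sketch is the shape, not the code: once the winning model sits behind a URL, the consuming application needs to know nothing about how it was trained.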
The Verdict
I think Plexe is making a smart bet on a real gap. The space between “I can call GPT” and “I have a custom predictive model trained on my data” is enormous, and the tools to cross it have not gotten meaningfully easier for non-specialists in years.
What I would want to know at 30 days: model quality. Pick five standard Kaggle competitions, run Plexe against them, and compare the results to what an experienced data scientist would produce. That tells you whether this is a shortcut or a compromise.
At 60 days: what happens when the data is messy? Clean data with clear labels is the easy case. Real corporate data has missing values, inconsistent formatting, label noise, and feature leakage. How gracefully the agent handles data quality issues will determine whether this works outside of demos.
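The issues listed above are easy to enumerate even when they are hard to fix automatically. A crude profiler over rows-as-dicts shows the kind of checks involved: missing-value rates, columns with mixed types, and a naive leakage heuristic. This is my illustration of the problem class, not Plexe's actual checks.

```python
# A crude data-quality profile: missing-value rates, mixed types per
# column, and a naive leakage heuristic (a non-label column identical
# to the label). Illustrative only, not Plexe's actual checks.
from collections import defaultdict

def profile(rows, label_key):
    missing = defaultdict(int)
    types = defaultdict(set)
    for row in rows:
        for key, value in row.items():
            if value is None or value == "":
                missing[key] += 1
            else:
                types[key].add(type(value).__name__)
    n = len(rows)
    return {
        "missing_rate": {k: missing[k] / n for k in missing},
        # Columns whose non-missing values mix types ("41" vs 41).
        "mixed_types": [k for k, t in types.items() if len(t) > 1],
        # Naive leakage check: a feature that perfectly matches the label.
        "possible_leakage": [
            k for k in rows[0]
            if k != label_key and all(r[k] == r[label_key] for r in rows)
        ],
    }

# Hypothetical churn rows exhibiting all three problems at once.
rows = [
    {"age": 34, "plan": "pro", "churned_flag": 1, "churned": 1},
    {"age": None, "plan": "free", "churned_flag": 0, "churned": 0},
    {"age": "41", "plan": "pro", "churned_flag": 1, "churned": 1},
]
print(profile(rows, label_key="churned"))
```

Detection like this is the easy half; the judgment calls (impute or drop, coerce or flag, exclude the leaky column or ask the user) are where an agent either earns trust or quietly produces a model that falls apart in production.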
At 90 days: retention. Do teams use Plexe once for a proof of concept and then hire a data scientist to do it properly? Or do they keep shipping models through the platform? The answer to that question determines whether this is a product or a prototype accelerator.
The two-person team is a risk and an opportunity. They can iterate fast, but the surface area of “all of ML” is vast. Focusing on the most common prediction patterns and doing those exceptionally well would be smarter than trying to handle every edge case.
I am cautiously optimistic. The agent-driven approach to ML has been tried before, but the underlying agent capabilities are genuinely better now than they were two years ago. This might be the moment it actually works.