July 9, 2026 edition


Idle GPU monetization for Kubernetes

Lilac Turns Your Idle GPUs Into a Unified Compute Fabric

The Macro: The GPU Waste Problem Nobody Wants to Admit

I keep hearing the same story from ML teams at mid-size companies. They fought for months to get GPU budget approved. They spun up clusters on AWS, maybe some on GCP, maybe a few on-premise machines with NVIDIA cards. And now, on any given Tuesday, half those GPUs are sitting idle while a different team is stuck in a queue waiting for compute.

This is not a theoretical problem. Cirrascale published data showing enterprise GPU utilization rates hover around 30 to 50 percent. That means for every dollar a company spends on GPU compute, fifty to seventy cents is wasted on machines doing nothing. At current GPU prices, which remain absurd despite everything, that waste adds up fast. A single H100 instance on AWS costs roughly $30 per hour. A cluster of eight sitting idle over a 48-hour weekend burns $11,520 with nothing to show for it.
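The back-of-the-envelope math is worth spelling out. The rates below are the illustrative figures from this section, not quotes from any cloud price list:

```python
# Idle-GPU waste math, using the assumed figures above.
HOURLY_RATE = 30.0   # assumed cost per H100 instance, USD/hour
CLUSTER_SIZE = 8     # instances in the cluster
WEEKEND_HOURS = 48   # Saturday + Sunday

idle_weekend_cost = HOURLY_RATE * CLUSTER_SIZE * WEEKEND_HOURS
print(f"Idle weekend cost: ${idle_weekend_cost:,.0f}")  # → $11,520

# Waste per dollar spent at the reported 30-50% utilization range
for utilization in (0.30, 0.50):
    wasted = 1 - utilization
    print(f"At {utilization:.0%} utilization, ${wasted:.2f} of every "
          "compute dollar buys nothing")
```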

The reason utilization is so low is fragmentation. Different teams provision their own resources. Different projects run on different clouds. Nobody has a unified view of what is available, what is being used, and what could be shared. The infrastructure team sees the bill. The data scientists see the queue. Nobody sees both at the same time.

Run.ai, which NVIDIA acquired, tried to solve this with GPU orchestration for Kubernetes. CoreWeave built an entire cloud around GPU-native infrastructure. Lambda Labs sells GPU clusters. Anyscale (the Ray company) offers distributed compute management. But most of these solutions either require you to move everything to their platform or are designed for organizations that are already operating at massive scale. The mid-market, companies running 10 to 100 GPUs across a messy mix of clouds and on-premise hardware, is underserved.

The Micro: Brothers Building a GPU Fabric

Lilac is an open-source platform that takes scattered GPU resources across multiple clouds, on-premise machines, and edge devices, and unifies them into a single compute fabric. Data scientists submit training or inference jobs to one scheduler, and Lilac figures out where to run them based on what is available, what the job needs, and what it costs.

Ryan Ewing and Lucas Ewing are the founders. They are brothers, based in San Francisco, part of Y Combinator’s Summer 2025 batch with a four-person team. The brother dynamic in startup founding teams tends to reduce communication overhead and trust issues, which matters a lot when you are building infrastructure that touches multiple clouds and needs to be reliable.

The core architecture sits on top of Kubernetes. This is the right call. Most companies running GPU workloads at any scale are already on Kubernetes, and building a new orchestration layer that ignores Kubernetes would be asking customers to rip and replace their existing infrastructure. Lilac plugs into what is already there and adds a scheduling layer on top.

The open-source angle is important for this category. Infrastructure buyers, especially platform engineering teams, do not want to be locked into a proprietary orchestration layer for their GPU fleet. If Lilac goes down or pivots or raises prices, they need to be able to fork the code and keep running. Open source earns the trust that closed-source infrastructure tools have to buy with years of enterprise sales.

What I find interesting about Lilac is that the value proposition is not “buy more GPUs.” It is “use the GPUs you already have.” That is a much easier conversation with a CFO. You are not asking for new budget. You are asking to stop wasting the budget that was already approved. In the current environment where every company is scrutinizing AI spend, that pitch lands differently than “give us more money for compute.”

The inference monetization angle is the other side of the coin. If a company has idle GPUs, Lilac can route external inference traffic to those machines, generating revenue from hardware that would otherwise be depreciating in a rack doing nothing. Cheaper inference for the buyers, found money for the sellers. The unit economics work if the orchestration layer is reliable enough.
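A rough sketch of why those unit economics can work, with every number an illustrative assumption rather than anything Lilac has published:

```python
# Toy unit economics for reselling idle GPU hours as inference capacity.
owner_cost_per_gpu_hr = 2.50   # assumed amortized hardware + power
market_rate_per_gpu_hr = 4.00  # assumed going rate buyers pay
platform_take = 0.20           # assumed orchestration-layer fee

idle_hours_per_month = 12 * 30  # 12 idle hours/day, assumed

gross = market_rate_per_gpu_hr * idle_hours_per_month
owner_net = gross * (1 - platform_take)
carry_cost = owner_cost_per_gpu_hr * idle_hours_per_month

print(f"Owner nets ${owner_net:,.0f}/GPU/month")      # → $1,152
print(f"vs. ${carry_cost:,.0f} to keep it powered")   # → $900
```

Under these assumptions an idle GPU flips from a $900 monthly carrying cost to roughly $250 of net margin, which is the "found money" framing; the whole model collapses if routing reliability scares off the inference buyers.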

The Verdict

Lilac is solving a problem I have seen firsthand at multiple companies. GPU waste is real, it is expensive, and most organizations deal with it by either ignoring it or throwing more procurement at the problem. A unified scheduler that works across clouds and on-premise is the right architectural approach.

The risk is reliability. GPU workloads are not web requests. A failed training job that was 90 percent complete costs real time and money. If Lilac’s scheduler makes a bad decision, routes a job to a machine that goes offline, or introduces latency in a way that breaks a training run, the team will rip it out and go back to manually managing their cluster. Infrastructure tools get one chance to earn trust.

In thirty days, I want to know how many organizations are running Lilac in production, not just testing it. Sixty days, the question is whether the open-source community is growing or stagnant. GitHub stars and Discord activity are early signals. Ninety days, I want to see real utilization data. If companies running Lilac can demonstrate a measurable increase in GPU utilization, say from 35 percent to 70 percent, the product sells itself to every infrastructure team in the market. The team is small but the architecture is sound. GPU waste is a problem that gets more expensive every quarter, and Lilac is positioned to be the answer.