May 4, 2027 edition

runanywhere

The default way of running on-device AI at scale

RunAnywhere Deploys AI Models on Every Device With a Few Lines of Code

The Macro: The AI Industry Is Moving Back to the Device

Cloud AI is great until it is not. Latency kills real-time applications. Privacy requirements prevent data from leaving the device. Network dependency means your AI breaks when the connection drops. Cost accumulates with every API call. For a growing number of use cases, running AI models on the device itself is not just preferable. It is necessary.

The problem is that on-device AI deployment is painful. Every device class behaves differently. iPhones use Metal. Android phones use various GPU architectures. Edge devices have their own constraints. Runtimes vary. Models need to be quantized and optimized for each target. Performance collapses under memory and power constraints. Managing model updates across thousands of devices requires its own infrastructure.

The existing options are fragmented. Core ML handles Apple devices. TensorFlow Lite handles Android. ONNX Runtime covers some edge cases. But each requires device-specific optimization, and none provide a unified SDK that works across platforms. The developer who wants to deploy the same model on an iPhone, an Android phone, and an edge device has to maintain three separate codepaths.

The Micro: A Custom Inference Engine That Beats llama.cpp

Sanchit Monga and Shubham Malhotra founded RunAnywhere, a two-person team from YC Winter 2026 with Diana Hu. Sanchit was an Intuit engineer who built mobile SDKs and developer products used by 50M+ active users. Shubham built MetalRT, a multi-modal inference engine for Apple Silicon, and previously worked on EC2 Spot at Amazon and at Azure.

The core product is an SDK for deploying AI models on iOS, Android, web, and edge devices. The MetalRT inference engine runs 1.67x faster than llama.cpp on Apple Silicon, achieving 658 tokens per second for Qwen3-0.6B at 4-bit quantization on an M4 Max. Those are real performance numbers that developers can verify.

The platform handles LLM, speech-to-text, text-to-speech, and vision capabilities. A fleet management dashboard provides real-time analytics and OTA model updates. Policy-based routing allows hybrid deployment between on-device and cloud inference, automatically choosing the best execution path based on device capability and network conditions.
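The memo does not describe RunAnywhere's actual policy engine, but the routing decision it alludes to is easy to make concrete. The sketch below is a minimal, hypothetical illustration (every name in it is invented for this example, not part of the RunAnywhere SDK) of how a policy could weigh privacy requirements, model fit, accelerator availability, and network conditions to pick an execution path per request.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class DeviceState:
    free_memory_mb: int                 # memory available for weights + KV cache
    has_accelerator: bool               # GPU/NPU available for inference
    network_latency_ms: Optional[float] # None means the device is offline

def route(model_size_mb: int, privacy_required: bool, device: DeviceState) -> str:
    """Return 'device' or 'cloud' for a single inference request."""
    # Privacy-sensitive inputs must never leave the device.
    if privacy_required:
        return "device"
    # No network: on-device is the only option.
    if device.network_latency_ms is None:
        return "device"
    # Model does not fit in available memory: fall back to the cloud.
    if model_size_mb > device.free_memory_mb:
        return "cloud"
    # Model fits and hardware acceleration is available: stay local.
    if device.has_accelerator:
        return "device"
    # CPU-only device with a fast link: a cloud round trip likely wins.
    return "cloud" if device.network_latency_ms < 50 else "device"
```

A real policy engine would add dimensions such as battery level, thermal state, and per-model latency profiles, but the core shape — a prioritized list of hard constraints followed by a cost comparison — is the same.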

The open-source SDK has 10,300+ GitHub stars, which indicates strong developer interest. Multi-SDK support covers Swift, Kotlin Multiplatform, React Native, and Flutter. The claimed integration time is three minutes.

The Verdict

RunAnywhere is building the infrastructure layer for the on-device AI wave. The performance numbers are strong, the open-source traction is real, and the developer experience sounds genuinely simple. If on-device AI becomes as common as I expect, RunAnywhere could become essential infrastructure.

The risk is competition from the platform owners. If Apple and Google build better inference runtimes natively, the value of a third-party SDK diminishes. But platform-native solutions tend to be platform-specific, and RunAnywhere's cross-platform advantage is hard to replicate.

In 30 days, I want to see the number of production apps using the SDK. In 60 days, the question is whether the fleet management dashboard is driving enterprise adoption. In 90 days, I want to know about the hybrid routing performance. If RunAnywhere can intelligently split inference between device and cloud based on real-time conditions, that is a powerful capability that pure on-device or pure cloud solutions cannot match.