The Macro: Everyone’s Building the AI That Watches
The interesting race in AI productivity right now is not about chatbots. It’s about the gap between seeing something and knowing what to do about it. Chatbots still require you to describe your problem in words. Vision models narrowed that gap by letting you show instead of tell. Ambient agents are trying to close it entirely.
The raw market numbers are big and frankly not that useful on their own. The AI productivity tools market sat at roughly $8.8 billion in 2024 and is projected to reach $36 billion by 2033, according to Grand View Research, at a compound annual growth rate of about 16%. Multiple sources across the productivity software space are projecting similar trajectories. The headline number is not the story. The story is where the growth is going.
It is going toward ambient and contextual. The premise is simple: the most powerful assistant is one you never have to context-switch to use. You look at something confusing, the agent explains it. You look at a form, it fills it. You look at a schematic, it walks you through it. This is the space everyone is quietly building toward.
Some are building it at the app layer, like CoChat, which is going after team-level AI agent stacks, or Viktor, which is trying to be a persistent AI employee rather than a one-off tool. Others are building it at the infrastructure layer. SuperPowers AI is going somewhere more specific: the glass and the camera. Phones and wearables. The vision input you carry around all day.
Google has Lens. Apple has Visual Intelligence baked into iPhone 16. Meta has Ray-Bans with Meta AI. The incumbent pressure here is real. The question for any startup in this space is not whether the idea is good. The idea is obviously good. The question is what they can do that the platforms cannot, or will not, do fast enough.
The Micro: Agents That Generate Their Own UI at Runtime
Here is what makes SuperPowers AI actually interesting, at least on paper. It is not just vision plus a language model. According to the product description and coverage from scouts.yutori.com, the agents generate custom UIs at runtime. That is the unusual detail. The agent does not just return text. It builds the interface around the answer.
The pitch is Claude-grade reasoning running on phones or AR glasses, ambient and always available, no coding required. You look at something, the agent sees it through your camera, reasons about it, and responds, with a dynamically generated interface fitted to that specific task.
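To make the runtime-UI claim concrete, here is a minimal sketch of the pattern as I understand it: the model returns a structured spec instead of prose, and a generic client renders whatever comes back. Every type, function, and field name below is invented for illustration; nothing here is SuperPowers AI’s actual API.

```typescript
// Hypothetical sketch of a runtime-generated UI loop. The model returns
// a structured UI spec rather than free-form text, and a generic client
// renders whatever spec comes back. All names are invented.

type UIElement =
  | { kind: "text"; content: string }
  | { kind: "button"; label: string; action: string }
  | { kind: "field"; label: string; value: string; editable: boolean };

interface UISpec {
  title: string;
  elements: UIElement[];
}

// Stand-in for the vision + reasoning call. In a real system this would
// send the camera frame to a multimodal model and constrain the output
// to the UISpec shape (e.g. via JSON-schema-constrained decoding).
async function agentRespond(frame: Uint8Array, task: string): Promise<UISpec> {
  // Model call elided; the return shape is the point.
  return {
    title: "Lease clause 4.2",
    elements: [
      { kind: "text", content: "This clause caps rent increases at 5% per year." },
      { kind: "field", label: "Cap", value: "5%", editable: false },
      { kind: "button", label: "Explain in plain English", action: "explain" },
    ],
  };
}

// A trivial renderer: the client has no task-specific screens baked in.
// It just interprets whatever spec the agent produced at runtime.
function render(spec: UISpec): void {
  console.log(`== ${spec.title} ==`);
  for (const el of spec.elements) {
    if (el.kind === "text") console.log(el.content);
    else if (el.kind === "field") console.log(`${el.label}: ${el.value}`);
    else console.log(`[ ${el.label} ]`);
  }
}

agentRespond(new Uint8Array(), "explain what I'm looking at").then(render);
```

The hard part, and the reason execution matters so much here, is the renderer: a malformed or hallucinated spec has to degrade gracefully to plain text rather than a broken screen.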
The product appears to have a “Powers” system, a marketplace of pre-built agent configurations you can deploy, plus a Power Studio where you can create your own. Based on the scraped product interface, there are scheduled Powers and a browser automation component, though the web client has automation disabled. The Android stream was listed as offline in the scrape, which is worth noting given that the mobile layer is meant to be the primary delivery surface.
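For a sense of what a Power might amount to under the hood, here is a speculative sketch of a declarative Power definition covering the two trigger types the scrape mentions, camera and schedule. The field names and structure are entirely my invention; the actual Powers format is not public.

```typescript
// Hypothetical shape of a "Power" as a declarative config.
// All field names are invented for illustration.

interface PowerDefinition {
  name: string;
  description: string;
  trigger: { type: "camera" | "schedule" | "browser" };
  schedule?: string;         // cron-style, only for scheduled Powers
  prompt: string;            // reasoning instructions for the agent
  capabilities: string[];    // e.g. vision, browser automation
}

// A scheduled Power, per the scraped interface's mention of scheduling.
const morningBriefing: PowerDefinition = {
  name: "morning-briefing",
  description: "Summarize my calendar and unread mail each morning",
  trigger: { type: "schedule" },
  schedule: "0 7 * * *",
  prompt: "Summarize today's calendar and flag anything urgent.",
  capabilities: ["browser"], // disabled on the web client, per the scrape
};

// A camera-triggered Power, the core point-and-ask use case.
const leaseExplainer: PowerDefinition = {
  name: "lease-explainer",
  description: "Explain lease clauses I point my camera at",
  trigger: { type: "camera" },
  prompt: "Read the visible clause and explain it in plain language.",
  capabilities: ["vision"],
};

console.log([morningBriefing, leaseExplainer].map((p) => p.name));
```

If the real format is anything like this, the no-code pitch amounts to filling in that config through Power Studio rather than writing it by hand.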
The no-coding framing matters here. It is positioning this as something closer to a consumer product than a developer tool. You browse a marketplace, pick a Power, and point your camera. That is the intended UX. Whether it actually works that cleanly is a real open question. Superset is trying to orchestrate AI coding agents with a similar build-it-without-configuring-everything philosophy, and even that has rough edges.
SuperPowers AI did well on launch day, which suggests the concept resonates with the people who pay attention to early AI releases. The wearables angle helps differentiate it visually, even if the phone use case is the one people will actually try first.
The GitHub tag on the product listing also hints at a developer-adjacent audience, even with the no-code framing. That is an interesting tension worth watching.
The Verdict
I am genuinely interested in this product and also genuinely unsure it is ready for normal people.
The runtime-generated UI idea is the most technically ambitious claim here, and it is either the thing that makes this product feel like magic or the thing that makes it feel unstable and weird, depending on execution. I would want to know how reliable the visual understanding is across messy real-world inputs. Not a clean screenshot. A dark photo of a confusing lease agreement, or a blurry whiteboard in a conference room.
The wearables angle is also real but early. Most people do not own the hardware this is most useful on. The phone use case has to carry the product until that changes.
At 30 days I want to see retention data, because ambient tools live or die on whether you remember they exist. At 60 days I want to know if the marketplace has third-party Powers or if the team is still populating it themselves. At 90 days I want to know whether any wearable hardware partner is in the picture.
The concept is not overhyped. Seeing what you see and acting on it is a real and unsolved problem at the consumer level. The execution is what I cannot evaluate from the outside yet. Worth keeping a close eye on.