The Macro: The Language Learning Industry Has a Speaking Problem
I have a confession. I have been using Duolingo on and off for three years and I still cannot hold a conversation in Spanish. I can conjugate verbs. I can translate sentences about owls and apples. But the moment I am standing in front of a real person who speaks Spanish, my brain locks up and I mumble something about the library being closed on Sundays.
This is not a personal failing. This is a product design problem.
The language learning market is worth over $60 billion and growing. Duolingo has 100 million monthly active users. Babbel has millions of paying subscribers. Rosetta Stone is still somehow alive. And yet the dirty secret of the entire industry is that almost none of these products are optimized for the thing that actually matters: speaking.
The research on this is clear and has been for decades. Conversational practice is the single most effective way to achieve fluency, because comprehensible input, output practice, and real-time error correction all happen in conversation. Everything else, the flashcards, the matching games, the fill-in-the-blank exercises, is supplementary at best and a procrastination mechanism at worst.
The reason nobody built a conversation-first language learning product until recently is that the technology did not exist. You needed a real human tutor for real conversation. Companies like italki and Preply built marketplaces for human tutors, and they work, but they are expensive and require scheduling. Three one-hour sessions a week at thirty dollars an hour comes to nearly $400 a month. Most learners cannot sustain that.
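The tutoring arithmetic is easy to check. Here is a minimal sketch, assuming one-hour sessions and an average month of 52/12 weeks; the function name and its parameters are mine, not anything a tutoring marketplace publishes.

```python
# Average number of weeks in a month (52 weeks spread over 12 months).
WEEKS_PER_MONTH = 52 / 12


def monthly_tutoring_cost(hourly_rate, sessions_per_week, hours_per_session=1.0):
    """Return the average monthly spend on human tutoring, in dollars.

    hours_per_session defaults to 1.0, which is an assumption: the
    article quotes an hourly rate and a weekly frequency only.
    """
    weekly_cost = hourly_rate * hours_per_session * sessions_per_week
    return weekly_cost * WEEKS_PER_MONTH


cost = monthly_tutoring_cost(hourly_rate=30, sessions_per_week=3)
print(f"${cost:.0f} per month")  # prints $390 per month
```

Ninety dollars a week works out to about $390 in an average month, which is where "nearly $400" comes from.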
What changed is obvious. Large language models can now hold fluent, contextual conversations in dozens of languages with latency low enough to feel like live conversation. The cost per conversation is pennies. The tutor is available at 3 AM. It never gets frustrated when you conjugate the same verb wrong for the fifteenth time.
Pingo AI is not the only company that noticed this. SpeakPal, Langotalk, and Talkpal are all circling the same insight. But the execution details matter enormously in language learning because the difference between a product that feels like practice and a product that feels like talking to a chatbot is the difference between retention and churn.
The Micro: Two Founders Who Want You to Stop Reading and Start Talking
Pingo AI is a conversation-first language learning app built around an AI speaking tutor. You open the app, pick a language, and start talking. The AI responds naturally, corrects your mistakes in context, and adapts to your level. No flashcards. No streaks. No cartoon owl guilt-tripping you for missing a day.
Michael Xing is CEO and Morrie Schonfeld is COO. They are part of Y Combinator’s Summer 2025 batch, and the founding story is straightforward. They wanted to learn languages, found existing tools inadequate for building actual speaking ability, and decided to build the product they wished existed. That is a well-worn founder origin story, but it works here because the gap they identified is real and well-documented.
The core product decision I find interesting is the commitment to conversation as the primary interface. Most AI language tools bolt a chatbot onto an existing curriculum. You still do the lessons, and the chatbot is a bonus feature. Pingo inverts that. The conversation is the lesson. Everything else is secondary.
This is a meaningful design choice because it changes what you are optimizing for. A curriculum-first product optimizes for lesson completion. A conversation-first product optimizes for time spent speaking. Those two metrics pull the product in very different directions. Curriculum products tend to be addictive but ineffective. Conversation products tend to be effective but harder to retain users on because conversation is harder than matching pictures to words.
The voice component matters. Text-based language learning chatbots already exist and they miss the point. The whole reason speaking is hard is that it demands real-time listening, pronunciation, and intonation, all under the panic of not having time to look something up. A text chat lets you cheat. A voice conversation does not.
I could not find detailed pricing information on the site, which is built on Framer and looks clean. The product appears to be in the early stages of consumer adoption, which makes sense for a Summer 2025 YC company. The question is not whether the product concept works. The question is whether they can build enough depth into the AI tutor to handle the range of proficiency levels, from absolute beginner to advanced conversationalist, without the experience feeling generic.
The Verdict
I think Pingo AI is pointed at the right problem and building in the right direction. The language learning market is massive and underserved on the dimension that matters most. Speaking practice has always been the bottleneck, and AI removes the two biggest barriers: cost and availability.
The competitive landscape is the thing to watch. Duolingo is not stupid. They are already building AI conversation features. When the incumbent with 100 million users starts moving toward your core value proposition, you need to be significantly better at the thing that matters, not just marginally better. The advantage Pingo has is focus. Duolingo has to protect its gamification empire while bolting on conversation. Pingo can build everything around conversation from day one.
In thirty days, I want to know how long average sessions are. Conversation apps live and die on session duration. If users are talking for fifteen minutes per session, the product is working. If sessions are three minutes long, the AI tutor is not engaging enough to sustain real practice.
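For a sense of how that session-length check might be run, here is a rough sketch. The session data is invented purely for illustration, and the function name is hypothetical; only the fifteen-minute bar comes from the paragraph above. I would look at the median alongside the mean, since a handful of very short sessions can drag the average around.

```python
from statistics import mean, median

# Hypothetical session lengths in minutes, invented for illustration only.
# These are NOT Pingo AI's numbers.
session_minutes = [2.5, 14.0, 16.5, 3.0, 21.0, 12.0, 18.5]


def summarize_sessions(minutes, healthy_threshold=15.0):
    """Summarize session lengths against a 'healthy' duration threshold.

    healthy_threshold defaults to the fifteen-minute bar used in the text.
    """
    at_or_above = sum(1 for m in minutes if m >= healthy_threshold)
    return {
        "mean": mean(minutes),
        "median": median(minutes),
        "share_at_or_above": at_or_above / len(minutes),
    }


summary = summarize_sessions(session_minutes)
print(f"mean {summary['mean']:.1f} min, median {summary['median']:.1f} min")
print(f"{summary['share_at_or_above']:.0%} of sessions reach 15 minutes")
```

With this made-up sample the mean is 12.5 minutes and the median is 14, so by the fifteen-minute standard the product would be close but not there yet.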
In sixty days, the question is retention. Language learning has brutal churn numbers across the industry. Duolingo reportedly retains about 8% of users after a year. If Pingo's early retention is on track to beat that, even modestly, the conversation-first approach is validated.
In ninety days, I want to see the product handle advanced learners. Building an AI tutor that helps beginners is relatively straightforward. Building one that challenges and corrects intermediate and advanced speakers is a much harder problem, and that is where the long-term value lives. The beginners who stay become advanced learners, and if the product cannot grow with them, they leave.