Personalized AI Language Education — with Andrew Hsu, Speak

Latent Space Podcast
Science & Technology | 4 min read | 65 min video
Jul 11, 2025 | 1,953 views
TL;DR

Speak is revolutionizing language learning with AI tutors, focusing on fluency and conversational skills.

Key Insights

1. Speak aims to create AI language tutors that are more effective than human teachers by leveraging advancements in speech and language models.

2. The company prioritizes functional fluency and spontaneous speaking over traditional vocabulary and grammar drills.

3. Speak's 'magic onboarding' uses AI to create a personalized conversational experience for new users.

4. The platform has evolved from a supplemental practice tool to a full-featured AI tutoring experience, significantly enhanced by LLMs and models like Whisper.

5. Speak is expanding beyond English and South Korea to offer a wider range of languages and enter new markets, including B2B offerings.

6. The long-term vision extends beyond language learning to reinventing how people learn anything using AI.

FOUNDING VISION AND EARLY STRUGGLES

Speak was founded in 2016 on a bold bet: that speech and language models would eventually reach superhuman capability and could power AI language tutors. The co-founders, Andrew Hsu and his partner, spent a year researching AI in depth and came away convinced that software alone could revolutionize language acquisition. Despite that conviction, the early years were difficult, marked by a lack of product-market fit and a longer-than-expected development cycle. The belief in AI's potential in education kept the team motivated through those phases.

EVOLUTION OF THE PRODUCT AND TECHNOLOGY

Before models like Whisper and LLMs became widely available, Speak focused on building custom speech recognition models. The platform collected large amounts of speech data from non-native English speakers to fine-tune those models, prioritizing the fast, low-latency ASR loop that speaking practice depends on. Integrating LLMs then significantly enhanced the platform, enabling more open-ended tutoring, semantic feedback, and a more conversational onboarding experience, dubbed 'magic onboarding'.

FOCUS ON FUNCTIONAL FLUENCY AND USER EXPERIENCE

Speak distinguishes itself by prioritizing functional fluency and spontaneous speaking over rote memorization of vocabulary and grammar. The app emphasizes role-playing scenarios that mimic real-world interactions, such as talking with an Uber driver. The approach is designed to make speaking automatic and natural through repetition, much like training in a gym. The company also critically re-evaluated its product strategy, moving from a free, unfocused experience to a premium model with a more guided, 'on-rails' learning path that reduces user decision fatigue.

MARKET TRAJECTORY AND GLOBAL EXPANSION

Speak achieved significant success in South Korea, becoming the largest English learning app there. This initial focus on a specific market allowed the company to refine its product and gather crucial data before expanding. Beyond South Korea, Speak is actively growing in other Asian markets like Japan and Taiwan, and is launching in the US and other countries with multiple languages. The company has also seen considerable success with its B2B offerings, which started as an experiment but are now a significant growth area.

THE ROLE OF ADVANCED AI AND FUTURE AMBITIONS

The advent of models like Whisper and advancements in LLMs have been transformative for Speak, enabling a shift from simple listen-and-repeat exercises to sophisticated AI tutoring with direct feedback and explanations. The company is actively exploring further AI integration, including AI-driven content generation for scaling lesson creation across more languages and leveraging knowledge graphs to quantify fluency. The long-term vision is to apply these AI learning principles beyond languages to virtually any subject, fundamentally changing how people learn.

NAVIGATING TECHNICAL CHALLENGES AND MARKET OPPORTUNITIES

Speak faces ongoing challenges, such as controlling inference costs for real-time features and keeping latency low, especially in multilingual interactions that require code-switching. To address these, the company is building custom infrastructure for real-time speech features. The team also notes that although real-time translation is a related field, the desire to learn a language is driven by a deeper need for connection and self-improvement rather than by translation alone. The B2B side extends this into enterprise learning for communication and management skills.

BUILDING TEAMS AND CULTURAL ADAPTATION

Speak's early team-building was unconventional. Initially, the company hired engineers remotely from locations like Slovenia, reflecting a global approach to talent acquisition. While this built a strong engineering core, Andrew Hsu acknowledges the difficulty of managing a distributed team across time zones. The team also placed significant emphasis on localization so that the app felt authentic to its users. The attention to detail was such that some users mistook Speak for a local company.

THE FUTURE OF LEARNING AND AI'S ROLE

The company believes AI will fundamentally reinvent learning across all domains, not just languages. It sees AI as a powerful tool for both personal curiosity and structured learning. AI agents, curriculum-writing tools, and sophisticated knowledge graphs are key to scaling its offerings. Speak's ambition is to create a 'speak score' that holistically measures fluency, helping users achieve real-world proficiency and meet their learning goals, and eventually to extend the same approach from languages to academic subjects and professional skills.

Common Questions

Speak's vision is to create an AI language tutor that helps users become fluent faster than humanly possible, leveraging advanced speech and language models to provide a personalized and effective learning experience.
