Key Moments
Dan Kokotov: Speech Recognition with AI and Humans | Lex Fridman Podcast #151
Rev.ai's Dan Kokotov discusses speech recognition, human transcription, and the future of AI in enhancing communication.
Key Insights
Rev.com offers human and AI-powered transcription and captioning services, simplifying content accessibility.
Automatic speech recognition (ASR) is improving rapidly but still trails human accuracy.
Rev is a two-sided marketplace balancing customer needs with freelance worker opportunities and flexibility.
The company's focus on data quality and continuous learning from edits drives advancements in their ASR technology.
Automating processes and creating frictionless user experiences are key to Rev's product philosophy.
The discussion touches on the broader implications of AI, the future of work, content discovery, and the ethics of moderation on online platforms.
THE FOUNDATION OF REV: SOLVING A PAIN POINT
Dan Kokotov introduces Rev.com, a service born from a desire to simplify the often-painful process of obtaining transcriptions and captions. Having personally experienced the frustrations of using platforms like Upwork for these tasks, the founder envisioned a more streamlined, user-friendly experience. Rev aimed to remove the complexity of hiring freelancers by standardizing services like translation and audio transcription, allowing users to submit files and receive high-quality results with minimal hassle, thus creating a service that 'just worked'.
THE DUAL NATURE OF REV: HUMAN AND AI POWER
Rev operates on a two-sided marketplace model, connecting customers with a pool of freelance 'Revers.' The service initially relied solely on human transcribers, but has since integrated Automatic Speech Recognition (ASR). Rev.ai is the brand for these AI-driven solutions. Together the two offer a tiered approach, from fully automated captions to AI-assisted human transcription, catering to different needs and budgets. This hybrid model pairs the efficiency of AI with the accuracy and nuance that human transcribers provide.
THE SCIENCE OF AUTOMATIC SPEECH RECOGNITION (ASR)
Kokotov delves into the technical aspects of ASR, explaining it as the process of converting spoken language into text. He highlights Rev.ai's position as a leader in this field, particularly for unstructured speech found in podcasts and interviews, differentiating it from voice assistants like Siri. Rev.ai reports a word error rate of about 14% on its test suites, while acknowledging that a significant gap remains to human performance, which is estimated at an error rate of around 2-3%. The effort to close that gap is fueled by vast amounts of data.
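Word error rate, the metric quoted above, is conventionally computed as the word-level edit distance between a reference transcript and the ASR output, divided by the number of reference words. The following is a minimal sketch of that standard definition, not Rev.ai's evaluation code; the function name `word_error_rate` and the simple lowercase-and-split normalization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference length.

    Normalization here (lowercasing, whitespace split) is an assumption;
    real evaluations usually also normalize punctuation and numerals.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    reference = "the quick brown fox jumps over the lazy dog"
    hypothesis = "the quick brown fox jumped over a lazy dog"
    print(f"WER: {word_error_rate(reference, hypothesis):.3f}")
```

Under this definition, a 14% word error rate means roughly 14 word-level substitutions, deletions, or insertions per 100 reference words.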
DATA AS THE CORNERSTONE OF AI ADVANCEMENT
The key to improving ASR accuracy, Kokotov emphasizes, lies in the quality and quantity of data used for training. Rev.ai benefits immensely from its business model: by providing transcription services, they generate a continuous influx of high-quality, labeled data, including the audio, the final human-corrected transcripts, and detailed edit logs. This 'flywheel' effect allows the AI to learn from every correction and refinement made by human transcribers, driving rapid and effective improvements in the ASR engine over time.
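One way to picture the kind of signal an edit log carries is to diff an ASR draft against the human-corrected transcript. The sketch below uses Python's standard `difflib` module for that purpose; it is an illustration only, and the function name `extract_corrections` and the sample strings are hypothetical, not a description of Rev's actual pipeline or data format.

```python
import difflib


def extract_corrections(asr_draft: str, human_final: str) -> list[tuple[str, str, str]]:
    """Return (operation, draft_words, corrected_words) for each changed span.

    Operations are difflib's opcode tags: 'replace', 'delete', or 'insert'.
    """
    draft = asr_draft.split()
    final = human_final.split()
    matcher = difflib.SequenceMatcher(a=draft, b=final, autojunk=False)
    corrections = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # keep only the spans the human changed
            corrections.append((tag, " ".join(draft[i1:i2]), " ".join(final[j1:j2])))
    return corrections


if __name__ == "__main__":
    draft = "rev dot com offers transcription"
    final = "Rev.com offers transcription"
    for operation, before, after in extract_corrections(draft, final):
        print(operation, repr(before), "->", repr(after))
```

Each extracted pair marks a spot where the recognizer's output and the final transcript disagree, which is the type of labeled example the 'flywheel' description refers to.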
BROADER IMPLICATIONS AND THE FUTURE OF COMMUNICATION
Beyond transcription, the conversation explores the potential of accurate ASR to revolutionize how we interact with information. The ability to search, index, and access spoken content as easily as written notes could transform meetings, lectures, and media consumption. Kokotov envisions a future where all audio and video content is as accessible as text, impacting everything from personal knowledge management to how podcasts are discovered and utilized, moving beyond simple keyword searches to truly understanding and navigating spoken content.
ETHICAL CONSIDERATIONS AND BALANCING ACTS
The discussion pivots to the complex issues surrounding content moderation, free speech, and the role online platforms play in shaping discourse. Kokotov expresses concerns about over-censorship while also acknowledging the responsibility platforms have to foster healthier conversations. He contrasts the engagement-driven metrics of many platforms with a potential future where user well-being and long-term happiness are prioritized, suggesting that a focus on 'niceness' through positive reinforcement might be more effective than punitive censorship. This leads to reflections on leadership exemplified by figures like Steve Jobs and Elon Musk, who prioritize vision and detail.
THE HUMAN ELEMENT: CREATION AND CONNECTION
Kokotov reflects on his transition from a programmer to a manager, highlighting the unique satisfaction derived from creating something from nothing. He discusses the challenges of 'programming humans' and adapting management styles to individual motivations. The conversation concludes with a reflection on the meaning of life, finding it in contribution to humanity, creation, and the intrinsic magic of bringing ideas into existence. The power of audio in fostering one-way connections, akin to a modern-day tribe, is celebrated as a unique aspect of podcasting.
Common Questions
Rev.ai is a brand of Rev.com that focuses on its Automatic Speech Recognition (ASR) services. It uses advanced machine learning models to convert speech to text, aiming for high accuracy.
Mentioned in this video
An app that summarizes books, mentioned as a sponsor.
A company proposing an extension to RSS format to standardize podcast transcripts.
Mentioned in relation to Mechanical Turk's interface and the AWS platform, with criticism for the former's user experience.
A company that provides speech-to-text AI services, offering transcription and captioning.
A freelancing platform that Rev was founded to improve upon. Initially focused on programmers.
A YouTube channel whose creator the host has spoken with, implying they also experience limited platform support.
The main brand for Rev, focusing on human and AI-powered transcription and captioning services.
An all-in-one nutrition drink mentioned as a sponsor.
A dystopian film praised for its portrayal of authoritarian incompetence and bureaucratic dysfunction.
Discussed in the context of ASR technology, its YouTube captions, and API documentation quality.
A music streaming service that secured an exclusive deal with Joe Rogan, a move the host discusses with mixed feelings about its impact on podcasting freedom.
A cloud storage service that Rev can integrate with for automated file handling.
A video-sharing platform whose handling of creator feedback and algorithms for content moderation is discussed critically.
Mentioned as a competitor in the ASR space.
The current president of Russia, whom the host expresses a desire to interview.
A character in the TV show Hannibal known for extreme empathy, used as an analogy for understanding motivations.
Author of 'Brave New World', praised for its prescience regarding genetic sorting and societal stratification.
A podcast host whose conversation with Jack Dorsey about banning Donald Trump from Twitter is referenced.
Co-founder of Apple, known for his attention to detail and leadership style, which influenced product development and team management.
Co-founder of Twitter, whose views on content moderation and platform responsibility were discussed.
CEO of Tesla, cited as an example of a leader obsessed with details and inspiring a grand vision.
The author of the Dune series.
Former US President, whose presence on Twitter was a point of discussion regarding platform moderation policies.
Philosopher quoted at the end of the podcast: 'The limits of my language mean the limits of my world.'
Mentioned as potentially playing Churchill in a film, and nominated for an Oscar.
Host of a popular podcast, whom the speaker has long admired and on whose show he has appeared as a guest.
A historian of Stalin with whom the host has spoken, known for deep knowledge of the subject.
A highly regarded engineer who previously advised the host on widening the podcast's interview range.
An American poet who was awarded the Nobel Prize in Literature and whose work was translated into Russian by the guest's father.
A podcast mentioned as a sponsor.
A movie that captures the absurdity of the situation surrounding Stalin's death, which the guest found to be too close to reality.
A film about Hitler's last days, recommended by the guest as a good portrayal of historical figures.
The greatest sci-fi novel of all time according to Dan Kokotov, who is a huge fan of the series.
A movie cited as an example of how films often depict highly competent government, contrasting with real-world bureaucratic incompetence.
A TV show referencing a character with extreme empathy, used as an analogy for understanding motivations.
Software for audio editing and cleanup, praised for its ease of use and effectiveness.
A voice assistant whose ASR technology is contrasted with Rev's capabilities for unstructured speech.
Software for video editing, cited as an example of a product that makes life easier.
A financial services app mentioned as a sponsor.
A programming language mentioned by the host in the context of automating workflows.
Amazon Web Services, used as an analogy for Rev's platform approach in enabling others to build applications.