FFmpeg: The Incredible Technology Behind Video on the Internet | Lex Fridman Podcast #496
Key Moments
Virtual human models are now so realistic they're almost indistinguishable from real people, but the cutting-edge capture process still costs $1M per virtual human, creating a tension between realism and prohibitive expense.
Key Insights
FFmpeg, an open-source software system, is the invisible backbone behind over 90% of online and offline video processing workflows, including platforms like YouTube, Netflix, and Chrome.
VLC, a legendary open-source media player, has been downloaded over 6 billion times and is known for its ability to play virtually any media format, even broken or obscure ones, a capability rooted in its early history of handling unreliable UDP network streams.
Video compression, as performed by codecs such as H.264 and AV1, aims for 100x to 1000x data reduction by exploiting spatial and temporal redundancy, leveraging mathematical tricks (e.g., converting to the YUV color space and storing color at reduced resolution, which halves the raw size without visible degradation) and models of human perception.
The x264 encoder, a VideoLAN project for the H.264 standard, became dominant partly due to its focus on 'psycho-visual rate distortion,' which optimizes for human visual perception on consumer screens rather than purely mathematical metrics like PSNR, even if it meant sacrificing some traditional benchmarks.
Projects like FFmpeg and Dav1d (an AV1 decoder) demonstrate that hand-written assembly code can achieve 10x to 50x (even 62x in some functions) speed improvements over optimized C code, proving critical for real-time video processing on billions of devices where every CPU cycle matters.
Security vulnerabilities in open-source projects, sometimes found by AI, highlight a disproportionate burden on volunteer maintainers, who often lack funding and are pressured by large corporations (like Microsoft Teams offering a 'few thousand' for critical support) to urgently fix issues in widely-used software.
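The PSNR metric named in the x264 insight above is a purely mathematical score derived from mean squared error. The sketch below is a minimal pure-Python version for 8-bit samples, included only to show why it says nothing about what a viewer actually notices. The function names and the flat-list sample layout are assumptions made for illustration and are not taken from x264 or FFmpeg.

```python
import math


def mean_squared_error(reference, distorted):
    """Average squared difference between two equal-length sample lists."""
    if len(reference) != len(distorted) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    return sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)


def psnr_db(reference, distorted, max_value=255):
    """Peak signal-to-noise ratio in decibels; infinite for identical inputs."""
    error = mean_squared_error(reference, distorted)
    if error == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / error)


if __name__ == "__main__":
    original = [100, 100, 100, 100]
    noisy = [101, 99, 101, 99]  # every sample is off by one
    print(round(psnr_db(original, noisy), 2))
```

Because every sample error is weighted equally, two encodes with the same PSNR can look very different to a person, which is the gap psycho-visual tuning tries to close.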
Hand-written assembly code outperforms even optimized C by orders of magnitude in critical video processing
In the world of high-performance video codecs, the intuition that modern compilers can extract optimal performance from C or C++ code is consistently challenged. Engineers in projects like FFmpeg and Dav1d (an AV1 decoder) demonstrate that hand-written assembly can outperform highly optimized C code by 10x to 50x, with some functions showing improvements as high as 62x. This level of optimization is crucial because FFmpeg is estimated to be one of the world's biggest CPU users, running on billions of devices, where every instruction and CPU cycle affects performance and efficiency. For instance, the Dav1d AV1 decoder project comprises 30,000 lines of C but 240,000 lines of hand-written assembly, specifically designed to achieve real-time 720p decoding on just one or two CPU cores, a feat considered impossible by many when AV1 was first launched. This intense focus on low-level optimization leverages SIMD (Single Instruction, Multiple Data) instructions and often 'abuses' CPU architecture in ways its creators never intended, sometimes even using cryptography instructions for video processing. Such efforts are becoming increasingly vital as hardware speed improvements slow down (e.g., the end of Moore's Law), pushing developers to extract maximum performance from existing machines, particularly for latency-sensitive applications like video streaming and AI inference.
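The data-parallel idea behind SIMD can be illustrated without assembly. The sketch below packs eight 8-bit pixels into one 64-bit integer and averages two such words with a handful of bitwise operations, instead of looping over the pixels one at a time. It is a pure-Python illustration of the concept only: it is not code from FFmpeg or Dav1d, real SIMD uses dedicated CPU registers and instructions, and all names here are invented for the example.

```python
LANES = 8                              # eight 8-bit pixels per 64-bit word
LOW_BIT_MASK = 0xFEFEFEFEFEFEFEFE      # clears the lowest bit of every lane


def pack(pixels):
    """Pack eight 0..255 values into one integer, lane 0 in the low byte."""
    if len(pixels) != LANES:
        raise ValueError("expected exactly eight pixels")
    word = 0
    for lane, value in enumerate(pixels):
        word |= (value & 0xFF) << (8 * lane)
    return word


def unpack(word):
    """Split a packed word back into its eight 8-bit lanes."""
    return [(word >> (8 * lane)) & 0xFF for lane in range(LANES)]


def average_scalar(a, b):
    """Reference version: one floor-average per loop iteration."""
    return [(x + y) // 2 for x, y in zip(a, b)]


def average_packed(a_word, b_word):
    """Floor-average all eight lanes at once.

    Uses x + y == 2 * (x & y) + (x ^ y); masking the low bit of each lane
    before the shift stops bits from leaking into the neighbouring lane.
    """
    return (a_word & b_word) + (((a_word ^ b_word) & LOW_BIT_MASK) >> 1)


if __name__ == "__main__":
    a = [0, 255, 10, 11, 200, 100, 1, 254]
    b = [255, 255, 20, 12, 100, 101, 0, 255]
    print(unpack(average_packed(pack(a), pack(b))))
    print(average_scalar(a, b))
```

Pixel averaging of this kind shows up in operations such as blending two predictions, which is one reason a single wide operation that replaces eight narrow ones matters in a decoder's inner loops.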
VLC's robust resilience to broken files emerged from its early streaming heritage
VLC media player's legendary ability to open and play almost any file, even those that are corrupt, partially downloaded, or use obscure formats, stems from its origins as part of the VideoLAN project. This project, which began in the late 1990s, involved streaming video over unreliable UDP (User Datagram Protocol) networks. In such environments, data loss was common, forcing developers to engineer VLC to be inherently distrustful of its input and capable of gracefully handling broken or incomplete streams. This philosophical approach, ingrained from the project's inception, became a core cultural value. For example, when early users were pirating content, downloaders often lacked the metadata at the end of AVI files required for playback; VLC's ability to interpret and play these incomplete files made it uniquely popular. This design principle ensures that VLC rarely 'breaks' or stops when encountering unexpected file structures or missing information, a stark contrast to many other media players.
Modern codecs achieve extreme compression by mimicking human perception
Video compression codecs like H.264 and AV1 achieve extraordinary compression ratios, often reducing raw video data by 100x to 1000x. This is accomplished by identifying and removing redundant information, both spatially (within a single frame) and temporally (across multiple frames). A crucial insight behind these codecs is that they are designed to be 'viewed by humans,' meaning their compression algorithms degrade the signal in ways that are least perceptible to the human ear and eye. For example, video is often processed in the YUV color space, separating luminance (brightness) from chrominance (color), and then downscaling the color resolution, as the human eye is less sensitive to color detail than brightness. This single trick can halve the data size without noticeable quality loss. Codecs utilize advanced mathematical techniques, such as frequency domain transforms and block-based processing, to discard imperceptible details. Each new generation of codecs offers 25-50% better compression for the same perceived quality, though this often requires exponentially more computational power for encoding, balancing the one-time cost of compression against the benefits of reduced bandwidth for billions of views.
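The "halve the data" claim can be checked with simple arithmetic. The sketch below assumes 8-bit samples, even frame dimensions, and the common 4:2:0 layout, in which each color plane is stored at half width and half height. The function names and layout labels are illustrative and not part of any codec API.

```python
def raw_frame_bytes(width, height, layout):
    """Bytes needed for one uncompressed 8-bit frame in the given layout."""
    luma_samples = width * height
    if layout in ("rgb24", "yuv444"):
        # Three full-resolution planes (R, G, B or Y, U, V).
        return luma_samples * 3
    if layout == "yuv420":
        # Full-resolution brightness plus two quarter-size color planes.
        chroma_samples = (width // 2) * (height // 2)
        return luma_samples + 2 * chroma_samples
    raise ValueError(f"unknown layout: {layout}")


def raw_bitrate_mbps(width, height, fps, layout):
    """Uncompressed bitrate in megabits per second."""
    return raw_frame_bytes(width, height, layout) * 8 * fps / 1_000_000


if __name__ == "__main__":
    rgb = raw_frame_bytes(1920, 1080, "rgb24")
    yuv = raw_frame_bytes(1920, 1080, "yuv420")
    print(rgb, yuv, rgb / yuv)

    raw = raw_bitrate_mbps(1920, 1080, 30, "yuv420")
    for ratio in (100, 1000):
        print(f"{ratio}x compression: {raw / ratio:.2f} Mbit/s")
```

Under these assumptions a 1080p frame drops from 6,220,800 bytes to 3,110,400 bytes before any actual compression happens, and a 100x to 1000x reduction on top of that brings raw 1080p30 video from roughly 746 Mbit/s down to single-digit megabits per second or less.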
The importance of low-level programming as a 'best school ever'
The open-source multimedia community, particularly projects like FFmpeg and VLC, serves as an unparalleled 'best school ever' for programmers. The extreme performance requirements (such as decoding a video frame within 16 milliseconds without glitches) demand an exceptional understanding of computer architecture, including CPU pipelining, SIMD, ALU operations, and memory hierarchies (registers, L1/L2/L3 caches, RAM, SSDs). Developers often engage in detailed debates over instruction cycles and their impact on different CPU generations. This low-level focus cultivates a deep programming culture, making contributors highly skilled, even in higher-level languages. Many renowned engineers and even teenagers have contributed significant assembly code, learning through a meritocratic process where only 'excellent code' is accepted, regardless of background or company affiliation. This rigorous environment, where code is scrutinized by seasoned programmers, shapes individuals like Andrew Kelley, who went on to create the Zig programming language after his tenure at FFmpeg. This intense learning environment, combined with the pride of contributing to software used by billions, forms a powerful motivational engine for contributors.
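The 16-millisecond figure follows directly from a 60 frames-per-second display. The sketch below turns a frame rate into a time budget and then into a rough number of CPU cycles available per pixel. The 3 GHz single-core clock and the 1080p resolution are assumptions chosen for illustration, not figures from the episode.

```python
def frame_budget_ms(fps):
    """Time available to produce one frame, in milliseconds."""
    return 1000.0 / fps


def cycles_per_pixel(width, height, fps, clock_hz):
    """Rough CPU cycles available per pixel on a single core."""
    cycles_per_frame = clock_hz / fps
    return cycles_per_frame / (width * height)


if __name__ == "__main__":
    print(f"60 fps budget: {frame_budget_ms(60):.2f} ms")
    budget = cycles_per_pixel(1920, 1080, 60, 3_000_000_000)
    print(f"about {budget:.1f} cycles per pixel at 3 GHz")
```

With only a couple of dozen cycles per pixel to cover parsing, prediction, transforms, and filtering, arguments over a single instruction's latency stop looking pedantic.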
FFmpeg and VLC form a 'binary star system' powering internet video
FFmpeg and VLC are not competitors but rather a deeply intertwined 'binary star system' that coexists and thrives because of each other's success. This relationship is analogous to Android benefiting from Linux. FFmpeg provides the essential low-level libraries for codec compression, decompression, muxing, demuxing, and filtering—it's the 'core engine' for video processing. VLC, as a VideoLAN project, acts as a client that leverages FFmpeg, giving it broad exposure to a myriad of file types and formats. Conversely, VLC's vast user base and ability to play virtually anything has driven the need for FFmpeg to support an ever-expanding array of codecs, some of which were funded by VLC's early donations. Furthermore, many critical components, like the x264 encoder (the open-source implementation of H.264), are also VideoLAN projects that integrate with FFmpeg. This symbiotic relationship, with shared developers and a virtuous cycle of mutual dependence, ensures that both projects continuously evolve and remain at the forefront of multimedia technology.
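As a concrete picture of the demux, decode, encode, and mux chain described above, the sketch below builds a command line for the `ffmpeg` tool that re-encodes a file to H.264 through x264. The file names are hypothetical, the quality settings are arbitrary defaults, and the `libx264` encoder is only present in FFmpeg builds that were compiled with it, so treat this as an illustration rather than a recommended invocation.

```python
import shutil
import subprocess


def build_transcode_command(source, destination, crf=23, preset="medium"):
    """Return an ffmpeg argument list: H.264 video via x264, AAC audio."""
    return [
        "ffmpeg",
        "-y",                 # overwrite the output file if it exists
        "-i", source,         # input: the container is detected and demuxed
        "-c:v", "libx264",    # video encoder
        "-crf", str(crf),     # constant-quality target (lower is better)
        "-preset", preset,    # encoder speed versus efficiency trade-off
        "-c:a", "aac",        # audio encoder
        destination,          # output: container chosen from the extension
    ]


def transcode(source, destination):
    """Run the command, failing clearly when ffmpeg is not installed."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg executable not found on PATH")
    subprocess.run(build_transcode_command(source, destination), check=True)


if __name__ == "__main__":
    # "input.mkv" and "output.mp4" are placeholder names.
    print(" ".join(build_transcode_command("input.mkv", "output.mp4")))
```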
The emotional toll and financial neglect of open-source maintainers
The life of an open-source project maintainer, especially for critical infrastructure like FFmpeg, is often marked by immense psychological and emotional strain, leading to burnout. While these projects are vital to global digital systems (often represented by the 'tiny box supporting the internet' meme), they rely on a small number of volunteers. Large corporations frequently treat these projects as traditional vendors, demanding urgent support for 'high-priority' issues in their commercial products (e.g., Microsoft Teams requesting support for a 'few thousand' dollars) without understanding the volunteer-driven nature. This neglect is exacerbated by 'AI slop'—AI-generated bug reports and patches that add significant burden on maintainers to sift through and fix. The XZ fiasco, where a single maintainer was manipulated into adding a backdoor, highlighted the extreme vulnerability of critical projects reliant on one or two individuals. Furthermore, maintainers sometimes face aggressive tones from security researchers, personal attacks, and even death threats (as experienced by VLC's lead developer for discontinuing support for old hardware). Despite these challenges, the open-source community emphasizes the importance of celebrating human endeavor in building 'damn good' and challenging software, driven by passion and a desire for widespread positive impact, even if financial reward is not the primary motivator.
Reverse engineering obscure codecs: An archaeological endeavor
The development of FFmpeg and VLC has involved extensive and highly skilled reverse engineering of proprietary and obscure codecs. This process, likened to archaeology, requires immense intuition and painstaking effort to reconstruct functionality from minimal 'signal', often just binary blobs (compiled machine code) and limited video samples. For example, a 1 MB binary blob can take a month to reverse engineer, and some engineers, like Kostya Shishkov, tackle 20-30 MB blobs for complex codecs such as those used by GoToMeeting. This involves using disassemblers to guess 'hooks' in proprietary modules, dumping raw YUV data, and meticulously debugging memory to understand how compression, prediction, and transforms occur. The goal is to achieve 'bit exactness': ensuring that the open-source decoder produces precisely the same output bits as the original proprietary one for any given sample. This challenging work is not just an intellectual exercise; decades later, these reverse-engineered codecs enable playback of old, otherwise inaccessible content on modern, diverse hardware platforms (e.g., playing a 1990s Windows application's video on an iPad or ARM-based system), preserving digital history for humanity. The ability to reconstruct complex functional logic from minimal binary information is considered a 'wizard-level' skill.
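A bit-exactness check can be sketched as a frame-by-frame hash comparison between the output of the original decoder and the output of the reimplementation. The version below assumes each decoded frame is already available as raw bytes (for example, dumped YUV planes). The function names are hypothetical and the approach is a generic illustration, not a description of the tooling the FFmpeg developers actually use.

```python
import hashlib


def frame_digest(frame_bytes):
    """Hash one decoded frame so frames can be compared cheaply."""
    return hashlib.sha256(frame_bytes).hexdigest()


def first_mismatch(reference_frames, candidate_frames):
    """Return the index of the first differing frame, or None if bit-exact."""
    reference = list(reference_frames)
    candidate = list(candidate_frames)
    for index, (ref, cand) in enumerate(zip(reference, candidate)):
        if frame_digest(ref) != frame_digest(cand):
            return index
    if len(reference) != len(candidate):
        # One decoder produced extra frames; report where they diverge.
        return min(len(reference), len(candidate))
    return None


if __name__ == "__main__":
    original = [b"\x10\x20\x30", b"\x11\x21\x31", b"\x12\x22\x32"]
    rewritten = [b"\x10\x20\x30", b"\x11\x21\x30", b"\x12\x22\x32"]
    print(first_mismatch(original, original))
    print(first_mismatch(original, rewritten))
```

Knowing which frame diverges first narrows the search considerably, since an error in a predicted frame tends to propagate into every frame that references it.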
The future of ultra-low latency: Every millisecond counts for real-time control
The principles of extreme optimization and low-level engineering pioneered by FFmpeg and VLC are now being applied to new frontiers, such as ultra-low latency applications. Jean-Baptiste Kempf's new company, Kyber, focuses on achieving 'glass-to-glass' latencies as low as 4 milliseconds (equivalent to one frame of 240 Hz video) for real-time control of machines like robots, drones, and remote vehicles. This contrasts with traditional streaming, which prioritizes quality over latency, or even typical gaming, which allows for some slowdown. Kyber's SDK simultaneously addresses video, audio, and control inputs (mouse, keyboard, gamepad) over a single UDP-based QUIC connection, ensuring perfect coherence and synchronization across multiple data streams and sensors (e.g., multiple cameras, GPS). This is critical for applications like robotic tele-operation, remote surgery, or cloud gaming, where immediate feedback is essential and even a few milliseconds of delay can be catastrophic. The technology also incorporates advanced techniques like forward error correction to maintain reliability over unpredictable internet networks without incurring retransmission latency, pushing the boundaries of what's possible in real-time interaction with remote systems.
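Forward error correction trades a little extra bandwidth for the ability to repair losses without waiting for a retransmission. The simplest form is a single XOR parity packet per group, sketched below: if any one packet in the group is lost, the receiver rebuilds it from the others. This is a textbook scheme shown for illustration. The episode summary does not say which correction code Kyber uses, and the function names and equal-length packet assumption are invented for the example.

```python
def xor_parity(packets):
    """XOR a group of equal-length packets into one parity packet."""
    if not packets:
        raise ValueError("need at least one packet")
    size = len(packets[0])
    parity = bytearray(size)
    for packet in packets:
        if len(packet) != size:
            raise ValueError("all packets must have the same length")
        for position, byte in enumerate(packet):
            parity[position] ^= byte
    return bytes(parity)


def recover_missing(received, parity):
    """Rebuild a group in which exactly one packet (marked None) was lost."""
    missing = [index for index, packet in enumerate(received) if packet is None]
    if len(missing) != 1:
        raise ValueError("single parity can repair exactly one lost packet")
    survivors = [packet for packet in received if packet is not None]
    restored = list(received)
    # XOR of the survivors and the parity cancels everything but the lost one.
    restored[missing[0]] = xor_parity(survivors + [parity])
    return restored


if __name__ == "__main__":
    group = [b"abcd", b"efgh", b"ijkl"]
    parity = xor_parity(group)
    damaged = [group[0], None, group[2]]   # the middle packet never arrived
    print(recover_missing(damaged, parity))
```

A retransmission would cost at least one extra network round trip, which by itself can exceed a 4-millisecond budget, so repairing the loss locally is the only option that fits.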
Common Questions
FFmpeg is an open-source software system that serves as the invisible backbone for most video and audio content on the internet, including platforms like YouTube, Netflix, and Chrome. It can decode, encode, transcode, stream, and play almost any video or audio format ever created, often through libraries like libavcodec, libavformat, and libavfilter.
Topics
Mentioned in this video
An open-source software system that is the backbone behind most video and audio on the internet, used for decoding, encoding, transcoding, streaming, and playing various formats. It is known for its low-level optimizations and extensive use of assembly.
An open-source media player known for playing almost any video or audio format across platforms, with no ads or tracking, and an iconic traffic cone logo. It has been downloaded over 6 billion times.
A web browser that integrates FFmpeg for multimedia playback capabilities.
A web browser that integrates FFmpeg to handle video playback.
A communication platform that uses FFmpeg as its invisible backbone for video and audio functionalities.
An open-source container format, considered more complex and future-proof, developed by Steve Lhomme of the VideoLAN community.
A professional video visual effects software that FFmpeg's command-line capabilities are compared to, highlighting FFmpeg's breadth of filters.
Apple's mobile operating system, which VLC supports across many versions, including older ones like iOS 9, despite complex build processes with Xcode and SDKs.
An older web browser that, in the era of Google Video pre-YouTube, could run VLC as a plugin via ActiveX to play videos directly within the browser.
The planned next-generation software decoder for the AV2 video codec, following in the footsteps of the original dav1d AV1 decoder.
A popular open-source software for video recording and live streaming that uses FFmpeg.
An open-source H.264 video encoder, a VideoLAN project, critical for its psycho-visual optimizations and its dominance in internet video and Blu-ray, often integrated into FFmpeg.
Microsoft's media player, which used proprietary formats that early FFmpeg and VLC provided native decoders for, eliminating the need for separate players.
An early programming environment for Pascal, used by JB for his first serious programming when he was young.
An image manipulation tool that is cited as a conceptual parallel to FFmpeg's power in video, but lacking the same breadth of complex filters for images.
A widely used open-source operating system kernel, cited as an example of a massive open-source project with thousands of contributors, similar in spirit to FFmpeg.
An older Microsoft operating system that VLC continues to support, demonstrating a commitment to broad compatibility and not forcing hardware upgrades.
An operating system that uses the Apache license and Linux kernel. VLC faced challenges updating on the Android Play Store due to a bug.
The digital distribution platform for iOS apps, whose terms and conditions make it complex to host GPL applications, leading VLC to adopt LGPL and MPL licenses for its iOS and Apple TV versions.
A distributed version control system created by Linus Torvalds in two weeks, which profoundly changed the software development world.
A communication platform that posted a high-priority bug report on FFmpeg's tracker without offering adequate financial support, leading to public criticism about corporate reliance on unpaid open-source volunteers.
A programming language used by JB in middle school, where he learned to program a 'turtle' to design things, solidifying his interest in computers.
A programming language started by former FFmpeg developer Andrew Kelley, mentioned as an example of innovation stemming from the FFmpeg community's rigorous environment.
A media player from the 2000s that used proprietary Real Media formats, which FFmpeg and VLC developed native decoders for, offering a free alternative to its ad-laden experience.
A video conferencing software that historically used its own proprietary codecs, which Kostya Shishkov reverse engineered to allow playback in VLC and FFmpeg.
Kieran's company, which builds equipment for broadcasting sports matches and applies low-level optimization ethos to commercial applications, including handwritten assembly for 10-bit video.
An early programming language (Microsoft QuickBasic) used by Kieran Kunhya to first learn programming in Windows 3.1 and 95.
A technology used to embed VLC as a plugin within Internet Explorer during the Google Video era, allowing in-browser playback of multimedia.
A low-level bytecode format for web browsers, used to compile VLC and FFmpeg to run within a JavaScript virtual machine, enabling media decoding directly in web browsers.
A major video platform that relies on FFmpeg for video processing and streaming, encoding popular videos in AV1 for efficiency.
A streaming service that utilizes FFmpeg for video processing, with a significant portion of its video content using AV1.
A technology company whose Windows Media format and Windows Media Player were dominant in the 2000s, but whose Teams product engaged in a PR misstep by demanding free, urgent support from open-source volunteers.
An organization known for publishing classified information, which released CIA Vault 7 documents that revealed the CIA used a modified version of VLC with a malicious plugin.
A company that once innovated significantly in sound technology but is now criticized for becoming primarily focused on lawyers and licensing, rather than continued innovation.
A company that changed its open-source license model to prevent cloud providers from commercially exploiting their open-source tools without contribution.
A major technology company that is a big supporter of open source but whose AI-driven security reports and limited funding caused tension with the FFmpeg community. They later made changes to contribute patches and offer rewards for fixes.
A technology company that reportedly decided to remove H.265 (HEVC) support from its Windows laptops due to increasing patent costs, illustrating the impact of multimedia patent issues.
A technology company that sponsored the initial network deployment at École Centrale Paris in the 1980s, which used a token ring network architecture.
A networking company that sponsored the initial network deployment at École Centrale Paris in the 1980s, which used a token ring network architecture.
JB's new open-source startup focused on ultra-low latency streaming for remote control of machines like robots and drones, aiming for 4 milliseconds glass-to-glass latency, using a dual commercial/AGPL license.
An aerospace manufacturer and space transport services company, mentioned as using VLC to monitor launches and live feeds.
A common multimedia container format, often confused with codecs. It can contain various audio and video codecs, but is broadly understood to typically house H.264 video and AAC audio.
A video compression standard that succeeded H.264, offering 30% more compression for the same quality, but facing significant patent licensing costs.
A royalty-free video coding format developed by Google, offering quality compression similar to H.265 (HEVC) but without the patent licensing costs, a precursor to AV1.
The next generation video codec from the Alliance for Open Media (AOMedia), aiming for a 30% bandwidth reduction compared to AV1, with a planned software decoder called David 2.
A video compression standard, also known as MPEG-4 Part 10 or AVC. It's a widely used codec that gained maturity with the advent of HD video and is a reference for new encoders.
A video compression standard that predates H.264 (MPEG-4 Part 10), with many variants that FFmpeg developers like Michael Niedermayer worked to support.
Apple's codec designed for nonlinear video editing, prioritizing fast decoding and seeking over temporal compression, making it suitable for professional video workflows.
An older video and audio coding standard, primarily used for satellite broadcasting and DVDs, that the early VideoLAN project had to work with.
A video codec that was popular in the 2000s, receiving exhaustive support from FFmpeg during Michael Niedermayer's era.
An open, royalty-free video coding format developed by the Alliance for Open Media (AOMedia), offering significant compression benefits over H.264, now widely deployed on platforms like YouTube and Netflix.
The next-generation video compression standard after H.265, also known as Versatile Video Coding, aiming for further 30% bandwidth reduction for the same quality, but inheriting a complex patent landscape.
A video codec that was popular in the 2000s, receiving exhaustive support from FFmpeg during Michael Niedermayer's era.
Languages for which subtitles are often created by 'fan subbers' in the anime community, requiring specialized tools like Aegisub.
The creator of the Zig programming language, who was previously an FFmpeg developer, highlighting FFmpeg as a learning ground.
An active member of the VideoLAN community who started the Matroska format.
A Google Summer of Code and Google Code-in student who contributed significantly to VideoLAN projects, including writing assembly for x264, VLC, and FFmpeg, starting at age 14.
CEO of Epic Games, who has praised JB for his dedication to VLC and open source, calling VLC a 'passion project'.
A Ukrainian genius who was instrumental in reverse engineering extremely complex and obscure codecs, including the GoToMeeting codecs, contributing significantly to FFmpeg for free.
An engineer at Warner Brothers who championed the use of x264 for Blu-ray releases, including a French box set, demonstrating corporate adoption of open-source solutions for quality.
A good friend of JB and Kieran, and one of the two students who started the VideoLAN project after the Network 2000 project at École Centrale Paris.
An FFmpeg developer who worked on rewriting ffmpeg.c with threading, contributing to the project's maintenance and improvement.
Co-founder of Stripe, quoted for his remark: 'the world is a museum of passion projects,' reflecting the ethos of open-source development.
Legendary programmer and engineer, mentioned as a high-level supporter who has raised awareness for FFmpeg and VideoLAN on X.
The creator of the Linux kernel and Git. Known for his blunt criticism but also for maintaining high quality standards in open-source development, his work powers many servers and devices.
A former FFmpeg developer who publicly criticized the security community's self-promotion and misaligned incentives in bug reporting.
An unpaid volunteer and FFmpeg developer responsible for massive refactorings and rewriting parts of ffmpeg.c with threading.
A high schooler and early contributor to FFmpeg who wrote thousands of lines of assembly code and fixed security issues without public drama.
One of the oldest contributors to VLC, who started working on the project at 16 and handles everything related to Mac and iOS.
A key figure in FFmpeg's 2000s era, known for exhaustive support of various MPEG-4 Part 2 variants (like DivX and Xvid) and native decoders for proprietary formats like Windows Media and Real Player.
One of the key reverse engineers in FFmpeg, who worked on proprietary codecs like Windows Media and Real Media in the 2000s.
A key contributor to x264 who made it 'amazing and fast' through extensive assembly language optimization, and was also involved in early VideoLAN projects.
A multimedia and signal processing researcher and contributor to H.264, MPEG-4 AVC, H.265, and other standards, noted for acknowledging past mistakes in video industry standards.
A 16-year-old contributor to FFmpeg who found and fixed issues, demonstrating that age is not a barrier to significant contributions.
The creator of FFmpeg, known for conceptualizing this powerful multimedia framework.
Famously known as 'DVD Jon', who broke the CSS encryption on DVDs, inspiring JB during his early days.
A very permissive open-source license that allows users to do almost anything with the code, popular for JavaScript frameworks and BSD operating systems.
A file-based copyleft license, used for VLC's iOS and Apple TV versions to navigate Apple App Store terms.
A copyleft open-source license that requires modifications to the software to be shared back with the community. VLC and FFmpeg primarily use GPL or LGPL.
A 'weak copyleft' open-source license, more permissive than GPL, allowing integration into proprietary applications as long as changes to the LGPL-licensed library itself are contributed back. Crucial for VLC's mobile adoption and commercial use.
A permissive open-source license used by projects like Android.
One of the regulatory bodies involved in the development of multimedia standards (like MPEG), which works in conjunction with the ITU but operates as a private entity.
A consortium of technology companies (including Google, Netflix, Amazon, Apple, Mozilla, VideoLAN) that developed the royalty-free AV1 and AV2 video codecs to avoid patent licensing costs associated with MPEG standards.
A United Nations agency involved in the development of multimedia standards (like H.264), working in conjunction with ISO.
An intergovernmental organization involved in space exploration, mentioned as using VLC for monitoring launches and live feeds.
A European research organization that operates the Large Hadron Collider, using VLC to monitor sensors on its 27 km ring via analog cameras streamed over a multicast network.
A prestigious French engineering school (now CentraleSupélec) where the VideoLAN project, and subsequently VLC, originated due to student initiatives for network streaming.
A NASA rover operating on Mars that uses FFmpeg to compress pictures, showcasing FFmpeg's reach as a 'multi-planetary open-source library'.
A high-definition optical disc format, where x264 found significant adoption due to its ability to deliver high-quality video, often surpassing streaming services in visual fidelity.
An extremely optimized software decoder for the AV1 video format, developed by VideoLAN, consisting of 30,000 lines of C and 240,000 lines of handwritten assembly, allowing AV1 decoding on billions of devices without dedicated hardware.