Key Moments

Yann LeCun: Dark Matter of Intelligence and Self-Supervised Learning | Lex Fridman Podcast #258

Lex Fridman | Science & Technology | Jan 22, 2022 | 166 min video | 8 min read
TL;DR

Yann LeCun discusses self-supervised learning, the dark matter of intelligence, and its potential to unlock human-level AI.

Key Insights

1. Self-supervised learning (SSL) is crucial for acquiring common sense and world models, which current AI paradigms (supervised and reinforcement learning) acquire only inefficiently, if at all.

2. SSL's core idea is for AI to fill in missing information or predict the future, harnessing abundant raw data from observation rather than scarce human labels or scalar rewards.

3. The main challenge for SSL in vision and video is representing uncertainty and multiple plausible continuous outcomes, unlike the discrete nature of language predictions.

4. Intelligence, at its root, might be advanced statistics capable of learning causal, mechanistic models from data without human-designed knowledge bases.

5. Human intelligence, including high-level reasoning and planning, is built on learned world models akin to those that animals like cats possess, challenging the idea of purely hardwired cognition.

6. Emotions are an integral part of autonomous intelligence, emerging from intrinsic motivations and a 'critic' system that predicts good or bad outcomes, rather than being an add-on.

7. Consciousness may be a mechanism for configuring our single, adaptable world model to focus on one task, suggesting it is a limitation rather than just a power of the brain.

8. AI's future impact extends to scientific discovery and solving global challenges like climate change and new-material design by converting complex problems into learnable ones.

THE DARK MATTER OF INTELLIGENCE: SELF-SUPERVISED LEARNING

Yann LeCun introduces self-supervised learning as the 'dark matter of intelligence,' a fundamental type of learning crucial for humans and animals that AI currently struggles to replicate. Unlike supervised learning, which demands extensive human annotation, or reinforcement learning, which requires millions of trials, SSL aims to learn about the world through mere observation. This method is vital for acquiring background knowledge and common sense, enabling efficient learning of tasks like driving a car, which humans master in hours but self-driving cars still find profoundly challenging, even with vast simulated experience. The core missing piece in AI is the ability to build predictive world models by simply observing how the world works.

THE CAKE ANALOGY AND THE SIGNAL OF TRUTH

LeCun uses a 'cake analogy' to illustrate the information density in different learning paradigms. Reinforcement learning provides a sparse, single scalar reward (good/bad) only occasionally. Supervised learning offers a few bits of information per sample (e.g., classifying an image into one of 1,000 categories). In contrast, self-supervised learning potentially offers an immense amount of signal. By asking a machine to predict the next few frames of a video or fill in missing words in a text, and then showing it what actually happened, the system receives continuous, high-dimensional feedback, allowing it to learn more complex representations and world dynamics.
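The information gap the cake analogy describes can be made concrete with a rough back-of-the-envelope calculation. The frame size below is an assumed example for illustration, not a figure from the conversation:

```python
import math

# Rough feedback size per training sample under each paradigm,
# using illustrative (assumed) numbers from the cake analogy.

# Reinforcement learning: one sparse scalar reward per episode.
rl_bits = 1  # effectively a single good/bad signal

# Supervised learning: one label out of 1,000 classes.
supervised_bits = math.log2(1000)  # ~10 bits per sample

# Self-supervised learning: predict a whole video frame, e.g. a
# 64x64 grayscale image at 8 bits per pixel (an upper bound).
ssl_bits = 64 * 64 * 8  # 32,768 bits per predicted frame

print(f"RL:         ~{rl_bits} bit")
print(f"Supervised: ~{supervised_bits:.1f} bits")
print(f"SSL:        ~{ssl_bits} bits")
```

Even with generous assumptions for the other paradigms, the prediction target in SSL carries orders of magnitude more signal per sample.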

FILLING THE GAPS: THE BEST SHOT FOR INTELLIGENCE

The seemingly simple task of 'filling in the blanks' (predicting future video frames, missing words in text, unseen parts of a scene) is, according to LeCun, AI's best current shot at achieving human-level intelligence. This principle allows a system to build a model of what is possible and impossible in the world, constantly surprising itself and refining its internal model. While highly successful in natural language processing (e.g., Transformers pre-trained to predict masked words), it remains a significant challenge for vision and video, particularly in handling the continuous and uncertain nature of visual predictions.
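The fill-in-the-blanks objective can be sketched in a toy form: hide a token and ask a predictor to recover it. The "predictor" below is just corpus word frequency — a deliberately crude stand-in for a trained model:

```python
from collections import Counter

# Toy illustration of masked-token pre-training: hide a word,
# ask the model to recover it, compare against the true word.
corpus = "the cat sat on the mat the cat ate the fish".split()
freq = Counter(corpus)

def mask_and_predict(tokens, idx):
    target = tokens[idx]
    masked = tokens[:idx] + ["[MASK]"] + tokens[idx + 1:]
    # Crude baseline: always predict the most frequent corpus word.
    prediction = freq.most_common(1)[0][0]
    return masked, prediction, target

masked, pred, target = mask_and_predict(corpus, 4)  # masks the 2nd "the"
print(masked)
print("prediction:", pred, "| target:", target)
```

A real system replaces the frequency lookup with a network conditioned on the surrounding context, but the supervisory signal — the held-out token itself — comes for free from the raw data.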

THE CHALLENGE OF UNCERTAINTY IN VISION VS. LANGUAGE

The difficulty in applying self-supervised learning to vision, compared to language, stems from the nature of prediction. In language, given a partial sentence, the missing words can be represented as a probability distribution over a discrete set of known words. However, predicting future video frames or filling in missing visual information requires representing a vast, continuous, and potentially infinite number of plausible outcomes in a high-dimensional space. Current methods struggle with this, as they cannot simply list all possibilities. This challenge highlights the need for new ways to represent uncertainty and multiple outcomes in continuous domains.
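The contrast can be shown directly: over a finite vocabulary, uncertainty is an exact categorical distribution, while a naive continuous predictor trained to minimize average error blurs distinct plausible futures together. The vocabulary, scores, and frames below are assumed toy values:

```python
import numpy as np

# Language: uncertainty over a missing word is a categorical
# distribution over a finite vocabulary -- easy to represent exactly.
vocab = ["cat", "dog", "lion", "table"]
logits = np.array([2.0, 1.5, 0.5, -2.0])        # assumed model scores
probs = np.exp(logits) / np.exp(logits).sum()    # softmax
print(dict(zip(vocab, probs.round(3))))

# Vision: plausible next frames live in a huge continuous space.
# Averaging two distinct plausible outcomes (what a plain L2-trained
# predictor does) yields a blurry frame matching neither of them.
frame_a = np.zeros((4, 4)); frame_a[:, 0] = 1.0   # object moved left
frame_b = np.zeros((4, 4)); frame_b[:, -1] = 1.0  # object moved right
mean_prediction = (frame_a + frame_b) / 2          # half-left, half-right blur
print(mean_prediction)
```

The blurred average is exactly the failure mode LeCun describes: the space of outcomes cannot be enumerated, so new machinery is needed to represent multi-modal uncertainty.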

INTELLIGENCE AS ADVANCED STATISTICS AND CAUSALITY

Addressing the criticism that 'filling in the blanks' is merely statistics and not true intelligence, LeCun posits that intelligence fundamentally is statistics—albeit a very particular kind. He argues that a truly intelligent system's world model must incorporate causality. By allowing the system's actions to be inputs to its world model, or by observing other agents' actions and their effects, machines can learn causal relationships. This learning of mechanistic models, whether through individual experience or evolution, is the key to understanding 'what causes what,' moving beyond mere correlation to a deeper understanding of reality.

BEYOND HIGH-LEVEL COGNITION: THE CAT BRAIN CHALLENGE

LeCun emphasizes the importance of first replicating basic animal intelligence before tackling complex human cognition. He points out that cats, with their 800 million neurons, possess fantastic models of intuitive physics, causal understanding, and body dynamics, yet we are far from reproducing this level of common sense. He suggests focusing on this 'cat level' intelligence, as the ability to learn world models is foundational to more sophisticated reasoning and planning. This approach suggests that a significant portion of what we consider intelligence is learned through observation and interaction, rather than being hardwired.

THREE PILLARS OF MACHINE LEARNING'S FUTURE

LeCun outlines three main challenges for machine learning: first, getting machines to learn effective world representations (addressed by self-supervised learning); second, enabling machines to reason in a gradient-compatible manner; and third, developing methods for machines to spontaneously learn hierarchical representations of action plans. The latter two build upon effectively learned world models, akin to how model predictive control uses a learned system model to plan optimal actions. This framework suggests that a differentiable, gradient-based approach to planning and reasoning, which allows for mental simulation of outcomes, is crucial for future AI.
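The model-predictive-control idea can be sketched minimally: roll a learned world model forward ("mental simulation"), score the outcome, and improve the action sequence by gradient descent. The linear dynamics and cost below are assumed toy choices with analytic gradients:

```python
import numpy as np

def world_model(state, action):
    """Assumed toy dynamics: the action directly shifts the state."""
    return state + action

def plan(state, goal, horizon=5, lr=0.5, steps=100):
    actions = np.zeros(horizon)  # initial guess: do nothing
    for _ in range(steps):
        # Mentally simulate the action sequence through the world model.
        s = state
        for a in actions:
            s = world_model(s, a)
        # Cost: squared distance of the final state to the goal.
        # Since each action contributes linearly to the final state,
        # the gradient w.r.t. every action is 2 * (s - goal).
        grad = 2 * (s - goal)
        actions -= lr * grad / len(actions)
    return actions

actions = plan(state=0.0, goal=10.0)
print("planned actions:", actions.round(2), "-> final state:", actions.sum())
```

With a nonlinear learned model the analytic gradient is replaced by backpropagation through the rollout, but the structure — differentiable simulation, then optimization over actions — is the same.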

THE POWER OF LEARNING VS. HARDWIRING

LeCun strongly believes that a vast amount of what humans and animals know is learned, not hardwired. He argues that many seemingly basic facts about the world, such as gravity or object permanence, are simple enough to be learned rapidly through experience. He supports this with examples like the rapid learning of edge detectors in the visual cortex. While intrinsic drives (like hunger or the desire to walk) are likely hardwired, the specific 'how-to' knowledge for fulfilling those drives is acquired through learning, emphasizing the profound plasticity and learning capabilities of biological brains.

NON-CONTRASTIVE JOINT EMBEDDING METHODS: A BREAKTHROUGH

LeCun expresses immense excitement for non-contrastive joint embedding methods like Barlow Twins and VICReg, which he considers the most significant advancement in machine learning in 15 years. These self-supervised techniques train two identical neural networks with shared weights, fed with distorted views of the same input. Unlike contrastive methods (which require negative samples to push apart dissimilar representations), non-contrastive methods avoid 'representational collapse' through regularizers that keep the embedding dimensions informative and non-redundant, effectively learning representations that are invariant to irrelevant distortions (e.g., shifts, rotations, color changes) while preserving essential information. This approach is a promising path for building robust predictive world models.
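A simplified sketch of a Barlow Twins-style objective illustrates the idea: embed two distorted views of the same batch and push the cross-correlation matrix of the normalized embeddings toward the identity. The batch sizes, noise levels, and weighting constant below are assumed for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def barlow_twins_loss(z1, z2, lam=0.005):
    # Standardize each embedding dimension across the batch.
    z1 = (z1 - z1.mean(0)) / z1.std(0)
    z2 = (z2 - z2.mean(0)) / z2.std(0)
    n, d = z1.shape
    c = z1.T @ z2 / n  # d x d cross-correlation matrix
    # Diagonal near 1: the two views agree on each dimension.
    on_diag = ((np.diag(c) - 1) ** 2).sum()
    # Off-diagonal near 0: dimensions are non-redundant (no collapse).
    off_diag = (c ** 2).sum() - (np.diag(c) ** 2).sum()
    return on_diag + lam * off_diag

batch = rng.normal(size=(128, 8))
view1 = batch + 0.1 * rng.normal(size=batch.shape)  # two "distortions"
view2 = batch + 0.1 * rng.normal(size=batch.shape)
print("loss (same content):", barlow_twins_loss(view1, view2))
print("loss (unrelated):   ", barlow_twins_loss(view1, rng.normal(size=batch.shape)))
```

Note there are no negative pairs anywhere in the loss: the off-diagonal penalty alone prevents the trivial solution where every input maps to the same embedding.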

GROUNDED INTELLIGENCE: THE LIMITATIONS OF TEXT-ONLY LEARNING

LeCun advocates for 'grounded intelligence,' asserting that machines cannot achieve true intelligence purely from text. He argues that the amount of information about how the physical world works, including intuitive physics, is vastly underrepresented in textual data. Training a machine solely on text, even with advanced models like GPT-5000, would not impart common sense knowledge, such as the fact that an object resting on a table moves when the table is pushed. He believes direct interaction with and observation of the physical world is indispensable for building comprehensive world models and acquiring foundational common sense.

CONSCIOUSNESS AS A LIMITATION, NOT JUST A POWER

LeCun offers a speculative hypothesis on consciousness: it might be an executive module that configures our single world-model engine in the prefrontal cortex to suit the task at hand. This suggests that consciousness arises not just from the power of our minds, but also from a fundamental limitation: our brains can only fully attend to and process one complex task or situation at a time using this configurable world model. If we had multiple, independent world models, we could multitask consciously, potentially eliminating the need for such an executive 'conscious' controller. Routine, automated tasks, like a grandmaster's chess moves, become subconscious, freeing the conscious model for novel challenges.

EMOTIONS AS INTEGRAL TO AUTONOMOUS AI

Contrary to the sci-fi trope of emotion chips, LeCun believes that emotions are an integral and necessary part of autonomous intelligence. If an AI system has intrinsic motivations (like built-in 'drives' in biology) and a 'critic' module that predicts future outcomes (good or bad) based on its actions, it will inevitably develop emotions. Fear would arise from predicting bad outcomes, elation from good ones, and social emotions from drives to relate with humans. He argues that emotions are not optional add-ons but rather natural emergent properties of an intelligent, goal-driven learning system. This has profound implications for how we might eventually interact with and grant rights to advanced AI.
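The architecture LeCun describes — intrinsic drives plus a critic that scores predicted outcomes — can be caricatured in a few lines. All values, outcome names, and emotion thresholds here are invented for illustration:

```python
# Toy sketch: if a critic assigns values to predicted outcomes,
# emotion-like labels fall out of those values rather than being
# bolted on as a separate "emotion chip".

def critic(predicted_outcome):
    """Assumed critic: maps a predicted outcome to a value estimate."""
    values = {"fall off cliff": -10.0, "find food": 8.0, "wait": 0.0}
    return values[predicted_outcome]

def emotion(value):
    if value <= -5.0:
        return "fear"      # anticipated bad outcome
    if value >= 5.0:
        return "elation"   # anticipated good outcome
    return "neutral"

for outcome in ["fall off cliff", "find food", "wait"]:
    v = critic(outcome)
    print(f"{outcome!r}: value={v:+.1f} -> {emotion(v)}")
```

The point of the sketch is structural: the "fear" label is just a name for a negative value prediction made before acting, which is why LeCun treats emotions as emergent rather than optional.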

THE METAVERSE AND THE EVOLUTION OF META AI

LeCun discusses the Metaverse as the next evolution of the internet, aiming to create more compelling, immersive experiences by leveraging 3D environments that better align with human perception and social conventions. He highlights the ongoing success of Facebook AI Research (FAIR), now part of Meta AI, in both fundamental research (producing open-source tools like PyTorch) and direct impact on the company's products. He notes his shift from managing FAIR to a Chief AI Scientist role, focusing on long-term strategy and his own research, particularly in self-supervised learning. FAIR continues as a key component of Meta AI, with specialized labs for fundamental (FAIR Labs) and applied (FAIR Accel) research.

SCIENCE ACCELERATED: AI FOR GRAND CHALLENGES

LeCun is optimistic about AI's potential to accelerate scientific discovery and solve humanity's grand challenges. He envisions deep learning applications in designing new materials (e.g., for efficient hydrogen production, solving climate change), optimizing fusion reactor stability, and pharmaceutical drug discovery (e.g., protein folding). He cites examples like using convolutional neural networks to predict aerodynamic properties, enabling the optimization of wing shapes. By converting complex scientific problems into learnable ones, AI can uncover phenomena not easily understood from first principles, pushing the boundaries of human knowledge and technological advancement.

Common Questions

What is self-supervised learning, and why is it called the 'dark matter of intelligence'?

Self-supervised learning is an AI paradigm where a system learns by observing the world and filling in missing information (like predicting the future or past frames of a video). It's called 'dark matter' because it represents a vast, unexplored component of intelligence that humans and animals use naturally, but machines currently struggle to replicate efficiently, unlike supervised or reinforcement learning.

Mentioned in this video

People
Yann LeCun

Chief AI Scientist at Meta, formerly Facebook, professor at NYU, and a Turing Award winner. A seminal figure in machine learning and AI.

Sheldon Solomon

A proponent of Terror Management Theory, whose work aligns with Ernest Becker's ideas regarding the human fear of death.

Mark Zuckerberg

CEO of Meta, who was heavily focused on AI during FAIR's creation and is described as having a deep interest in science and technology.

Jane Bromley

A colleague of Yann LeCun at Bell Labs with whom he originally proposed the idea of contrastive learning.

Charles Darwin

Mentioned as an analogy for how the Tesla Autopilot team is systematically studying the problem of driving.

David Chalmers

A philosopher whose work on consciousness is respected by Yann LeCun. A colleague at NYU.

Ernest Becker

A philosopher who wrote 'The Denial of Death', and whose ideas about the human fear of death being a core motivation are discussed.

Mike Schroepfer

Former CTO of Facebook (now Meta), mentioned as being deeply interested in AI and having a sense of wonder about science and technology.

Giorgio Parisi

Nobel Prize winner for the replica method, demonstrating the relevance of statistical physics to machine learning.

Isaac Asimov

A science fiction writer, quoted at the end of the podcast with words about assumptions and open-mindedness.

Andrej Karpathy

Gave a talk at MIT discussing car doors and the shortcomings of ImageNet as a single benchmark.

Heinz von Foerster

An Austrian-born physicist who immigrated to the U.S. and worked on self-organizing systems in the 1950s and '60s, founding the Biological Computer Laboratory.

Andrew McCallum

Started OpenReview, a platform that aligns with LeCun's vision for a more open and diverse peer review system.

John Platt

Leading a research group at Google working on using deep learning to control plasma for practical fusion reactors.

Pascal Fua

A professor at EPFL who started a company training convolutional nets to predict aerodynamic properties of solids.

Ishan Misra

Co-author with Yann LeCun of the article 'Self-Supervised Learning: The Dark Matter of Intelligence'.

Sue Becker

A student of Geoff Hinton's in the early 1990s; together they proposed maximizing mutual information between system outputs, an early idea behind non-contrastive learning.

Elon Musk

Mentioned in the context of multiplanetary colonization and his claims about AI timelines, which Yann LeCun believes are too optimistic.
