MIT AGI: Building machines that see, learn, and think like people (Josh Tenenbaum)
Key Moments
AI needs to move beyond pattern recognition to model the world like humans, requiring a science-engineering approach.
Key Insights
Current AI excels at pattern recognition but lacks common sense and flexible, general-purpose intelligence.
Human intelligence involves modeling the world, enabling prediction, imagination, planning, and learning beyond pattern recognition.
The Center for Brains, Minds, and Machines (CBMM) aims to bridge cognitive science and AI engineering by reverse-engineering human intelligence.
Deep learning successes are rooted in decades of research in cognitive science and psychology, highlighting the value of a science-driven approach.
Visual intelligence requires more than just image recognition; it involves understanding space, objects, physics, and other minds.
Probabilistic programs and game engines offer promising computational frameworks for building AI that models the world and performs common-sense reasoning.
THE LIMITATIONS OF CURRENT AI TECHNOLOGIES
While AI has made significant strides, particularly in pattern recognition through deep learning, current systems lack true general intelligence. These systems, often excelling at single tasks like playing Go, fail to possess common sense or the flexible, adaptable intelligence that humans use for everyday learning and problem-solving. They require massive datasets and extensive engineering, highlighting a fundamental gap between specialized AI and human-level cognition.
HUMAN INTELLIGENCE: MODELING THE WORLD
Human intelligence is characterized by the ability to model the world, not just recognize patterns. This includes understanding cause and effect, imagining future scenarios, planning actions, and continuously learning by building and refining mental models. These modeling capabilities go beyond perception, encompassing aspects like consciousness, meaning, and the ability to acquire knowledge accumulated over generations, forming the core of human cognitive prowess.
THE CBMM VISION: SCIENCE AND ENGINEERING
The Center for Brains, Minds, and Machines (CBMM) advocates for a joint pursuit of understanding human intelligence through scientific research and engineering advanced AI systems. By reverse-engineering how intelligence arises in the human mind and brain, the center aims to develop computational theories and engineering applications. This approach treats cognitive science and neuroscience from an engineering perspective, producing insights directly translatable into building intelligent machines.
HISTORICAL ROOTS OF AI PROGRESS
The foundations of current AI techniques, like deep learning and reinforcement learning, have deep historical roots in psychology and cognitive science. Foundational papers on algorithms such as backpropagation and temporal difference learning were often published in psychology journals, demonstrating a long-standing tradition of scientists thinking like engineers. This history underscores the critical role of basic scientific research in driving AI innovation.
VISUAL INTELLIGENCE AND THE BRAIN'S 'OS'
Visual intelligence, a key area of focus, extends beyond mere image recognition. Our perception of the world is a rich representation constructed from foveated glimpses, requiring the brain to integrate information, understand space, objects, physics, and other agents' minds. This process is conceptualized as a 'brain OS' that stitches together bottom-up sensory input with prior knowledge, forming core cognitive representations necessary for higher-level cognition.
LIMITATIONS OF DATASET-DRIVEN APPROACHES
Current AI successes, like image captioning, often suffer from dataset overfitting, performing well on specific training data but failing on novel, real-world scenarios. This is exemplified by captioning bots making errors or providing only superficial descriptions. Simply scaling up pattern recognition on larger datasets is therefore insufficient for achieving true understanding or general intelligence, a point made forcefully in Andrej Karpathy's critique of the state of computer vision.
ROBOTICS AND THE CHILDHOOD OF INTELLIGENCE
The challenges in robotics, particularly in natural object manipulation and control, highlight the gap between current AI and human capabilities. While robots can be precisely controlled at a low level, they lack the intuitive understanding of physics and planning that even young children possess. Studies of infants stacking cups or interacting with objects reveal sophisticated early forms of symbolic cognition and goal-directed planning that are difficult to replicate computationally.
CREATIVITY, OBJECT PERMANENCE, AND SYMBOLIC REASONING
Human intelligence involves creativity, such as a baby stacking cups on a cat, and robust object permanence, enabling a child to represent unseen objects. These abilities, along with the capacity for symbolic reasoning, are crucial for planning and interaction. Understanding these phenomena, like a child's ability to infer the location of an object behind them without looking, requires going beyond simple perception to model the mind and world.
THE ROLE OF PROBABILISTIC PROGRAMS AND GAME ENGINES
Probabilistic programs and game-like physics engines are proposed as key technologies for building AI that can model the world. These frameworks combine symbolic representation, probabilistic inference, and learning, allowing for common-sense reasoning about physical interactions and agent intentions. By simulating plausible physical scenarios and inferring goals, these tools offer a path towards creating systems that understand and interact with the world more like humans.
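The idea of a probabilistic program can be made concrete with a small sketch. The scenario and probabilities below are illustrative assumptions, not from the lecture: a generative model is written as ordinary code, and inference runs the program forward many times, keeping only the runs consistent with the evidence (rejection sampling).

```python
import random

# Toy probabilistic program: the classic rain/sprinkler story.
# The model is a forward-running program; inference inverts it
# by conditioning on what was observed.

def model(rng):
    rain = rng.random() < 0.2                       # prior: rain is uncommon
    sprinkler = rng.random() < (0.1 if rain else 0.4)  # sprinkler used less when raining
    wet = rain or sprinkler                         # either cause wets the grass
    return rain, wet

def p_rain_given_wet(n_samples=50000, seed=0):
    """Rejection sampling: run the model, keep runs where the grass is wet,
    and report how often rain was the cause in those runs."""
    rng = random.Random(seed)
    kept = [rain for rain, wet in (model(rng) for _ in range(n_samples)) if wet]
    return sum(kept) / len(kept)
```

Because a sprinkler can also explain wet grass, the posterior probability of rain ends up well below 1 but clearly above the 0.2 prior, which is the kind of common-sense causal reasoning these frameworks aim to capture.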
INTUITIVE PHYSICS AND PSYCHOLOGY ENGINES
Research is developing 'intuitive physics engines' that simulate physical scenes and predict outcomes, showing correlations with human judgments. Similarly, 'intuitive psychology engines' model agents' goals and intentions. These models, tested with infants, suggest that a foundational understanding of physics and social interaction is present very early in development, providing insights for building more sophisticated AI.
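A drastically simplified Monte Carlo version of an intuitive physics engine can convey the core idea. The geometry, noise level, and stability rule below are assumptions for illustration, not the actual CBMM implementation: the scene is perceived with uncertainty, so the model jitters the observed block positions, runs a stability check on each sample, and reports the fraction of simulations in which the tower falls.

```python
import random

def is_stable(centers, width=1.0):
    """Static stability for a tower of unit-width blocks (bottom to top):
    at each interface, the center of mass of the blocks above must lie
    within the supporting block's footprint."""
    for i in range(len(centers) - 1):
        above = centers[i + 1:]
        com = sum(above) / len(above)
        if abs(com - centers[i]) > width / 2:
            return False
    return True

def p_fall(centers, noise=0.2, n_samples=5000, seed=0):
    """Jitter each block's perceived position and count how often the
    sampled configuration is unstable."""
    rng = random.Random(seed)
    falls = 0
    for _ in range(n_samples):
        jittered = [c + rng.gauss(0, noise) for c in centers]
        if not is_stable(jittered):
            falls += 1
    return falls / n_samples
```

A neatly aligned tower such as `[0.0, 0.0, 0.0]` yields a low fall probability, while a badly offset one such as `[0.0, 0.4, 0.8]` looks likely to topple, mirroring the graded judgments that the real models correlate with human ratings.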
INVERSE PLANNING AND UNDERSTANDING INTENTIONS
Understanding human actions, such as inferring the object someone is reaching for, relies on inverse planning – working backward from observed actions to probable goals. This requires models that integrate physics engines with utility calculations to predict intentions. Such systems can analyze complex social interactions, like helping or hindering, by recursively modeling agents' expectations about each other's utilities.
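A minimal Bayesian inverse-planning sketch makes this concrete. The goal names, positions, and softmax temperature below are illustrative assumptions: the agent is modeled as rational but noisy, choosing each step with probability proportional to the exponentiated utility of that step; inverting this model over observed steps yields a posterior over goals.

```python
import math

GOALS = {"cup": -2.0, "phone": +3.0}  # hypothetical goal positions relative to the agent

def step_likelihood(step, goal_pos, pos, beta=2.0):
    """Softmax over moving left (-1) vs right (+1): steps that reduce
    distance to the goal have higher utility and thus higher probability."""
    utils = {d: -abs(goal_pos - (pos + d)) for d in (-1, +1)}
    z = sum(math.exp(beta * u) for u in utils.values())
    return math.exp(beta * utils[step]) / z

def posterior_over_goals(steps, beta=2.0):
    """P(goal | steps) ∝ P(steps | goal) * uniform prior over goals."""
    scores = {}
    for goal, gpos in GOALS.items():
        pos, lik = 0, 1.0
        for s in steps:
            lik *= step_likelihood(s, gpos, pos, beta)
            pos += s
        scores[goal] = lik
    total = sum(scores.values())
    return {g: v / total for g, v in scores.items()}
```

Observing two rightward steps, `posterior_over_goals([+1, +1])` puts nearly all the probability on the goal to the right, which is the "working backward from actions to goals" inference the section describes; the richer models in the talk replace this one-dimensional world with a physics engine and recursive reasoning about other agents.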
LEARNING AS PROGRAMMING THE MIND'S ENGINE
True learning, especially for achieving Artificial General Intelligence (AGI), involves more than optimizing differentiable functions; it's about program learning – creating and modifying programs that represent our understanding of the world. This 'hacking' process, analogous to a child making their internal code more awesome, includes writing new code, refactoring, and transferring knowledge, which is fundamentally different from current gradient-based optimization.
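The contrast with gradient-based optimization can be illustrated by a toy program-learning loop. The primitive set and search strategy below are assumptions for illustration: learning is framed as a search over compositions of symbolic primitives that explain a handful of input-output examples, rather than as tuning continuous weights.

```python
from itertools import product

# Hypothetical primitive "instructions" the learner can compose.
PRIMITIVES = {
    "inc": lambda x: x + 1,
    "double": lambda x: x * 2,
    "square": lambda x: x * x,
}

def synthesize(examples, max_depth=3):
    """Enumerate pipelines of primitives, shortest first, and return the
    first one consistent with every (input, output) example."""
    for depth in range(1, max_depth + 1):
        for names in product(PRIMITIVES, repeat=depth):
            def run(x, names=names):
                for n in names:
                    x = PRIMITIVES[n](x)
                return x
            if all(run(i) == o for i, o in examples):
                return list(names)
    return None
```

Given the examples (2 → 9) and (3 → 16), the search discovers the program "inc then square". Real program-learning systems use far smarter search and richer languages, but the basic move, writing and testing candidate code against sparse data, is the same.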
ONE-SHOT LEARNING AND CONCEPT FORMATION
A key aspect of human learning is the ability to learn from very few examples, or 'one-shot learning'. Research in this area, such as building systems that can learn to draw characters from a few samples, involves developing simple probabilistic programs that capture the generative process. This demonstrates a step towards machines that can form concepts and generalize like humans, moving beyond rote pattern recognition.
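The one-shot flavor of this idea can be shown with a drastically simplified generative sketch. The feature vectors and noise model below are assumptions, not the stroke-based programs used in the actual research: each class is defined by a single example treated as the mean of a simple generative model, and a new item is assigned to the class under which it is most probable.

```python
def log_likelihood(x, prototype, sigma=1.0):
    """Log-probability of x under an isotropic Gaussian centered on the
    single training example (up to an additive constant)."""
    return -sum((a - b) ** 2 for a, b in zip(x, prototype)) / (2 * sigma ** 2)

def classify(x, one_shot_examples):
    """Pick the class whose generative model best explains x."""
    return max(one_shot_examples, key=lambda c: log_likelihood(x, one_shot_examples[c]))

# Hypothetical feature vectors standing in for two characters,
# one example each.
chars = {"A": [0.0, 1.0, 0.0], "B": [1.0, 1.0, 1.0]}
```

With one example per class, `classify([0.1, 0.9, 0.2], chars)` picks "A". The published models go much further, learning probabilistic programs over strokes so that a single example supports classification, generation, and parsing, but the underlying principle is scoring new data under a generative process learned from one instance.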
THE FUTURE: GROWING INTELLIGENCE LIKE HUMANS
The ultimate goal is to build AI systems that grow and learn intelligence organically, much like a human child. This involves understanding the fundamental building blocks of cognition, from early learning about physics and social interaction to the development of complex reasoning abilities. By combining insights from cognitive science with advanced computational tools, there is a path towards creating AGI that can navigate and interact with the human world effectively.
Common Questions
How would true AGI differ from today's AI systems?
Current AI technologies excel at specific tasks like playing Go, but they lack common sense and flexible, general-purpose intelligence. True AGI, unlike specialized AI, would be able to model the world, imagine novel situations, set goals, make plans, and learn efficiently from sparse data, akin to human learning.
Topics
Mentioned in this video
A type of directed graphical model that represents probabilistic relationships among a set of variables, generalized by probabilistic programs.
A famous dataset of handwritten digits (0-9) used extensively in deep learning and pattern recognition research.
The understanding that objects continue to exist even when they cannot be seen, heard, or touched, demonstrated by a baby in a video and explained as a strong form of cognitive ability.
A test proposed by Alan Turing to assess a machine's ability to exhibit intelligent behavior equivalent to, or indistinguishable from, that of a human.
An aspect of visual intelligence where AI systems generate textual descriptions for images, highlighted as an area of apparent success for pattern recognition-based AI but also revealing its limitations.
A type of machine learning that focuses on how intelligent agents should take actions in an environment to maximize cumulative reward.
A concept suggesting that if knowledge is embodied as a program, then learning involves programs that build or modify other programs, analogous to children hacking on their mental code.
A popular metaphor in cognitive development that emphasizes children as active theory builders and their play as a form of causal experimentation.
An algorithm used in deep learning for training neural networks, with origins tracing back to psychological journals.
A model that captures how young children understand other people's actions and goals, using probabilistic programs defined over simple planning and perception programs.
A type of artificial neuron and the earliest feedforward neural network, proposed by Rosenblatt.
A type of stochastic recurrent neural network proposed by Hinton and colleagues.
The idea that infants understand a basic calculus of cost and benefit, where children take costly actions to achieve rewarding goal states, with costs measured in physical terms.
A subset of machine learning, often used in pattern recognition, which has made significant strides but is argued to be insufficient for achieving human-like general intelligence.
The idea that the brain's common-sense knowledge representations are analogous to programs found in modern video game engines, integrating physics engines and simple AI planning models.
A computational model built in the speaker's group that uses probabilistic inference in a game-style physics engine to simulate physical interactions and predict outcomes.
A project from the Probabilistic Computing Group at MIT, aiming to bring together converging AI tools.
'Spatial OS' software developed by Improbable for very large distributed computing environments, enabling more complex and realistic simulations for immersive video games.
A Google DeepMind AI program that plays the game of Go, highlighted as an example of narrow AI that excels at a specific task but lacks general intelligence or common sense.
One of the state-of-the-art industry AI captioning systems, whose outputs are shared by the PicDescBot Twitter account to illustrate current AI capabilities and failures.
A computational abstraction used to capture common-sense knowledge, generalizing Bayesian networks and combining symbolic representation, probabilistic inference, and neural networks for expressive knowledge representation and flexible uncertainty handling.
A modern probabilistic programming library that integrates with PyTorch, combining neural networks with Bayesian inference.
A tool that incorporates aspects of Bayesian inference and flow-based models, likely within the deep learning context.
A standard tool in robotics for planning physically efficient movements, used in the speaker's models to simulate human reaches and actions.
A modern probabilistic programming tool that combines neural networks with Bayesian inference.
A Twitter bot that uses a state-of-the-art industry AI captioning system to caption random images from the web and upload the results, serving as a real-world test for AI limitations.
A probabilistic programming language developed in the speaker's group, built on the lambda calculus via Lisp, providing a Turing-complete framework for probability models.
An individual associated with MIT and founder of Singular Computing, working on brain-inspired, low-power approximate computing for highly parallel hardware.
A psychologist who, along with Felix Warneken, conducted famous experiments on the spontaneous helping behavior of young children.
Professor at MIT leading the computational cognitive science group and the speaker of this talk, focused on how humans learn efficiently and how to apply this to AI systems.
A prominent researcher in deep learning, associated with the MNIST dataset.
A colleague at CSAIL, known for work in programming languages and automatic code synthesis, collaborating with Kevin Ellis on program learning.
A leading figure in deep learning, formerly at Google and OpenAI, then Director of AI Research at Tesla; his blog post on the state of computer vision highlighted current AI limitations regarding human-like understanding.
Founder of Boston Dynamics, a leading company in humanoid and legged robots; an advisor to the CBMM.
A researcher who developed the models for a study on infants' understanding of goals, showing sensitivity to physical work.
Pioneering researchers in reinforcement learning, credited with early work on temporal difference learning.
A researcher who, along with Jess Hamrick, started the work on the intuitive physics engine in the speaker's group.
A researcher who, along with Pete Battaglia, started the work on the intuitive physics engine in the speaker's group.
Former President of the United States, featured in a popular image used by Andrej Karpathy to illustrate the gap between current computer vision and human understanding.
Proposed simple recurrent networks, an earlier and simpler version of recurrent neural network architectures.
First author of a Science paper on human-level concept learning, which gained significant publicity.
A current PhD student working with the speaker and Armando Solar-Lezama on combining programming language techniques with machine learning for program synthesis.
A psychologist who, along with Michael Tomasello, conducted famous experiments on the spontaneous helping behavior of young children.
A pioneer in AI who, like Turing, proposed the idea of building systems that grow into intelligence much like humans do.
A popular video-sharing platform where several videos (e.g., stacking cups baby, cat and cups baby, mouse vs. cracker, orangutan with Legos) are hosted and used as examples of human and animal intelligence.
A major tech company (big tech) that develops AI technologies; discussed in the context of commercial AI progress and research collaboration with academia.
A London-based startup developing a 'Spatial OS' for very large distributed computing environments, creating more complex and immersive video games.
A major tech company (big tech) that develops AI technologies.
A major industry player in computer vision and AI, specifically mentioned for its image captioning systems and researchers who produced the training datasets.
An automotive and energy company where Andrej Karpathy served as Director of AI Research.
A startup in Kendall Square founded by Joe Bates, developing technology for low-power, approximate, brain-like computing with the goal of building machines with billions of cores and reasonable power consumption.
A British artificial intelligence company acquired by Google, known for its work on AI systems like AlphaGo and a potential destination for researchers as discussed by the speaker.
A major tech company (big tech) that develops AI technologies.
A leading company known for its sophisticated humanoid and legged robots, highlighted for impressive hardware but acknowledged to have limited human-like cognition in its robots.
An AI research and deployment company where Andrej Karpathy was one of the founders.
The funding body for the Center for Brains, Minds, and Machines.
The academic institution where Andrej Karpathy received his PhD and wrote his influential blog post.
A research group led by Josh Tenenbaum at MIT, focusing on cognition and intelligence.
An NSF-funded Science and Technology Center that bridges the science and engineering of intelligence, with affiliations at MIT and Harvard.
The academic institution where Josh Tenenbaum is a professor and where the Center for Brains, Minds, and Machines is hosted.
An academic institution partnered with the Center for Brains, Minds, and Machines (CBMM).
A leading journal of theoretical and mathematical psychology, where many foundational papers for deep learning and reinforcement learning were originally published.
A general interest science journal where the backpropagation paper was published by researchers affiliated with an Institute for Cognitive Science.
A notable scientific journal where a paper by Brendan Lake and colleagues on human-level concept learning was published and featured on its cover.
A journal focused on cognitive science, also a publication venue for early AI research.