Key Moments

MIT AGI: Building machines that see, learn, and think like people (Josh Tenenbaum)

Lex Fridman
Science & Technology · 5 min read · 96 min video
Feb 8, 2018
TL;DR

AI needs to move beyond pattern recognition to model the world like humans, requiring a science-engineering approach.

Key Insights

1. Current AI excels at pattern recognition but lacks common sense and flexible, general-purpose intelligence.

2. Human intelligence involves modeling the world, enabling prediction, imagination, planning, and learning beyond pattern recognition.

3. The Center for Brains, Minds, and Machines (CBMM) aims to bridge cognitive science and AI engineering by reverse-engineering human intelligence.

4. Deep learning successes are rooted in decades of research in cognitive science and psychology, highlighting the value of a science-driven approach.

5. Visual intelligence requires more than image recognition; it involves understanding space, objects, physics, and other minds.

6. Probabilistic programs and game engines offer promising computational frameworks for building AI that models the world and performs common-sense reasoning.

THE LIMITATIONS OF CURRENT AI TECHNOLOGIES

While AI has made significant strides, particularly in pattern recognition through deep learning, current systems lack true general intelligence. These systems, often excelling at single tasks like playing Go, fail to possess common sense or the flexible, adaptable intelligence that humans use for everyday learning and problem-solving. They require massive datasets and extensive engineering, highlighting a fundamental gap between specialized AI and human-level cognition.

HUMAN INTELLIGENCE: MODELING THE WORLD

Human intelligence is characterized by the ability to model the world, not just recognize patterns. This includes understanding cause and effect, imagining future scenarios, planning actions, and continuously learning by building and refining mental models. These modeling capabilities go beyond perception, encompassing aspects like consciousness, meaning, and the ability to acquire knowledge accumulated over generations, forming the core of human cognitive prowess.

THE CBMM VISION: SCIENCE AND ENGINEERING

The Center for Brains, Minds, and Machines (CBMM) advocates for a joint pursuit of understanding human intelligence through scientific research and engineering advanced AI systems. By reverse-engineering how intelligence arises in the human mind and brain, the center aims to develop computational theories and engineering applications. This approach treats cognitive science and neuroscience from an engineering perspective, producing insights directly translatable into building intelligent machines.

HISTORICAL ROOTS OF AI PROGRESS

The foundations of current AI techniques, like deep learning and reinforcement learning, have deep historical roots in psychology and cognitive science. Foundational papers on algorithms such as backpropagation and temporal difference learning were often published in psychology journals, demonstrating a long-standing tradition of scientists thinking like engineers. This history underscores the critical role of basic scientific research in driving AI innovation.

VISUAL INTELLIGENCE AND THE BRAIN'S 'OS'

Visual intelligence, a key area of focus, extends beyond mere image recognition. Our perception of the world is a rich representation constructed from foveated glimpses, requiring the brain to integrate information, understand space, objects, physics, and other agents' minds. This process is conceptualized as a 'brain OS' that stitches together bottom-up sensory input with prior knowledge, forming core cognitive representations necessary for higher-level cognition.

LIMITATIONS OF DATASET-DRIVEN APPROACHES

Current AI successes, like image captioning, often suffer from dataset overfitting, performing well on specific training data but failing on novel, real-world scenarios. This is exemplified by captioning bots making errors or providing superficial descriptions. This suggests that simply scaling up pattern recognition on larger datasets is insufficient for achieving true understanding or general intelligence, as demonstrated by Andrej Karpathy's critique.

ROBOTICS AND THE CHILDHOOD OF INTELLIGENCE

The challenges in robotics, particularly in natural object manipulation and control, highlight the gap between current AI and human capabilities. While robots can be precisely controlled at a low level, they lack the intuitive understanding of physics and planning that even young children possess. Studies of infants stacking cups or interacting with objects reveal sophisticated early forms of symbolic cognition and goal-directed planning that are difficult to replicate computationally.

CREATIVITY, OBJECT PERMANENCE, AND SYMBOLIC REASONING

Human intelligence involves creativity, such as a baby stacking cups on a cat, and robust object permanence, enabling a child to represent unseen objects. These abilities, along with the capacity for symbolic reasoning, are crucial for planning and interaction. Understanding these phenomena, like a child's ability to infer the location of an object behind them without looking, requires going beyond simple perception to model the mind and world.

THE ROLE OF PROBABILISTIC PROGRAMS AND GAME ENGINES

Probabilistic programs and game-like physics engines are proposed as key technologies for building AI that can model the world. These frameworks combine symbolic representation, probabilistic inference, and learning, allowing for common-sense reasoning about physical interactions and agent intentions. By simulating plausible physical scenarios and inferring goals, these tools offer a path towards creating systems that understand and interact with the world more like humans.
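The probabilistic-program idea — encode knowledge as a generative program, then invert it by conditioning on evidence — can be illustrated with a toy example. This is a minimal sketch, not the speaker's actual models; the scenario and probabilities (`rain`, `sprinkler`, and their values) are invented for illustration, and inference here is plain rejection sampling:

```python
import random

def sprinkler_model():
    # A generative "program" encoding common-sense knowledge:
    # rain and the sprinkler can each make the grass wet.
    rain = random.random() < 0.2
    sprinkler = random.random() < (0.01 if rain else 0.4)
    wet = rain or sprinkler
    return rain, wet

def prob_rain_given_wet(samples=100_000):
    # Inference by rejection sampling: run the program many times and
    # keep only the runs consistent with the evidence (wet grass).
    hits = [rain for rain, wet in (sprinkler_model() for _ in range(samples)) if wet]
    return sum(hits) / len(hits)

print(round(prob_rain_given_wet(), 2))  # roughly 0.38–0.39 analytically
```

The same pattern — a forward simulator plus conditioning — is what languages like Church, Pyro, and Gen make first-class, with far more efficient inference than rejection sampling.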

INTUITIVE PHYSICS AND PSYCHOLOGY ENGINES

Research is developing 'intuitive physics engines' that simulate physical scenes and predict outcomes, showing correlations with human judgments. Similarly, 'intuitive psychology engines' model agents' goals and intentions. These models, tested with infants, suggest that a foundational understanding of physics and social interaction is present very early in development, providing insights for building more sophisticated AI.
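The intuitive-physics-engine idea — simulate noisy versions of a perceived scene and read predictions off the ensemble — can be sketched in one dimension. This is a drastic simplification of the actual models (which use full 3-D game-style physics); the tower geometry and noise level here are invented for illustration:

```python
import random

def tower_falls(xs):
    # A tower of unit-width blocks falls if, at any level, the combined
    # center of mass of the blocks above lies off the supporting block.
    for i in range(len(xs) - 1):
        above = xs[i + 1:]
        com = sum(above) / len(above)
        if abs(com - xs[i]) > 0.5:
            return True
    return False

def prob_falls(xs, noise=0.15, samples=2000):
    # Monte Carlo over perceptual noise: jitter each perceived block
    # position, simulate, and report the fraction of worlds that fall.
    falls = 0
    for _ in range(samples):
        noisy = [x + random.gauss(0, noise) for x in xs]
        falls += tower_falls(noisy)
    return falls / samples

print(prob_falls([0.0, 0.1, 0.2]))   # well-stacked: low probability
print(prob_falls([0.0, 0.4, 0.8]))   # precarious: high probability
```

Graded judgments like these, rather than binary stable/unstable verdicts, are what correlate with human responses in the studies described above.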

INVERSE PLANNING AND UNDERSTANDING INTENTIONS

Understanding human actions, such as inferring the object someone is reaching for, relies on inverse planning – working backward from observed actions to probable goals. This requires models that integrate physics engines with utility calculations to predict intentions. Such systems can analyze complex social interactions, like helping or hindering, by recursively modeling agents' expectations about each other's utilities.

LEARNING AS PROGRAMMING THE MIND'S ENGINE

True learning, especially for achieving Artificial General Intelligence (AGI), involves more than optimizing differentiable functions; it's about program learning – creating and modifying programs that represent our understanding of the world. This 'hacking' process, analogous to a child making their internal code more awesome, includes writing new code, refactoring, and transferring knowledge, which is fundamentally different from current gradient-based optimization.
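Program learning in its simplest form is a search over compositions of primitives for a program that explains the data. This sketch is far cruder than the program-synthesis work referenced here (it enumerates pipelines over a three-primitive DSL invented for illustration), but it shows the shape of the idea: learning as writing code, not adjusting weights:

```python
from itertools import product

# A tiny DSL of primitives; "learning" searches for a program (a
# pipeline of primitives) that explains every input-output example.
PRIMITIVES = {
    'inc': lambda x: x + 1,
    'dbl': lambda x: x * 2,
    'neg': lambda x: -x,
}

def synthesize(examples, max_depth=3):
    # Enumerate programs shortest-first; return the first consistent one.
    for depth in range(1, max_depth + 1):
        for names in product(PRIMITIVES, repeat=depth):
            def run(x, names=names):
                for n in names:
                    x = PRIMITIVES[n](x)
                return x
            if all(run(i) == o for i, o in examples):
                return list(names)
    return None

# (x + 1) * 2 explains both examples
print(synthesize([(1, 4), (3, 8)]))  # ['inc', 'dbl']
```

Note how the returned program generalizes exactly, and how a second example can disambiguate between competing programs — both properties gradient-based optimization does not naturally provide.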

ONE-SHOT LEARNING AND CONCEPT FORMATION

A key aspect of human learning is the ability to learn from very few examples, or 'one-shot learning'. Research in this area, such as building systems that can learn to draw characters from a few samples, involves developing simple probabilistic programs that capture the generative process. This demonstrates a step towards machines that can form concepts and generalize like humans, moving beyond rote pattern recognition.
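One-shot classification via a generative model can be sketched as: fit a per-concept model from a single example, then classify new instances by likelihood. This prototype-plus-Gaussian-noise toy is far simpler than the character-drawing work described here (which learns structured stroke programs); the feature vectors and labels are invented for illustration:

```python
def log_likelihood(x, prototype, sigma=1.0):
    # Generative model per concept: the prototype plus Gaussian noise
    # on each feature.  One example is enough to set the prototype.
    return sum(-((a - b) ** 2) / (2 * sigma ** 2) for a, b in zip(x, prototype))

def one_shot_classify(x, examples):
    # examples maps each label to its single training vector; pick the
    # concept under which the new instance is most probable.
    return max(examples, key=lambda lbl: log_likelihood(x, examples[lbl]))

# One example per "character" (toy 4-feature sketches)
examples = {'A': [0.0, 1.0, 1.0, 0.0], 'B': [1.0, 0.0, 0.0, 1.0]}
print(one_shot_classify([0.1, 0.9, 0.8, 0.2], examples))  # 'A'
```

The point of the richer models is that the "prototype" is itself a program capturing how the character is drawn, which is what lets one example support classification, generation, and parsing alike.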

THE FUTURE: GROWING INTELLIGENCE LIKE HUMANS

The ultimate goal is to build AI systems that grow and learn intelligence organically, much like a human child. This involves understanding the fundamental building blocks of cognition, from early learning about physics and social interaction to the development of complex reasoning abilities. By combining insights from cognitive science with advanced computational tools, there is a path towards creating AGI that can navigate and interact with the human world effectively.

Common Questions

How do current AI systems differ from true AGI?

Current AI technologies excel at specific tasks like playing Go, but they lack common sense and flexible, general-purpose intelligence. True AGI, unlike specialized AI, would be able to model the world, imagine novel situations, set goals, make plans, and learn efficiently from sparse data, akin to human learning.

Topics

Mentioned in this video

Concepts
Bayesian networks

A type of directed graphical model that represents probabilistic relationships among a set of variables, generalized by probabilistic programs.

MNIST dataset

A famous dataset of handwritten digits (0-9) used extensively in deep learning and pattern recognition research.

Object Permanence

The understanding that objects continue to exist even when they cannot be seen, heard, or touched, demonstrated by a baby in a video and explained as a strong form of cognitive ability.

Turing Test

A test proposed by Alan Turing to assess a machine's ability to exhibit intelligent behavior equivalent to, or indistinguishable from, that of a human.

Image Captioning

An aspect of visual intelligence where AI systems generate textual descriptions for images, highlighted as an area of apparent success for pattern recognition-based AI but also revealing its limitations.

Reinforcement Learning

A type of machine learning that focuses on how intelligent agents should take actions in an environment to maximize cumulative reward.

Programs Learning Programs

A concept suggesting that if knowledge is embodied as a program, then learning involves programs that build or modify other programs, analogous to children hacking on their mental code.

Child as Scientist

A popular metaphor in cognitive development that emphasizes children as active theory builders and their play as a form of causal experimentation.

Backpropagation

An algorithm used in deep learning for training neural networks, with origins tracing back to psychological journals.

Intuitive Psychology Engine

A model that captures how young children understand other people's actions and goals, using probabilistic programs defined over simple planning and perception programs.

Perceptron

A type of artificial neuron and the earliest feedforward neural network, proposed by Rosenblatt.

Boltzmann Machine

A type of stochastic recurrent neural network proposed by Hinton and colleagues.

Naive Utility Calculus

The idea that infants understand a basic calculus of cost and benefit, where children take costly actions to achieve rewarding goal states, with costs measured in physical terms.

Deep Learning

A subset of machine learning, often used in pattern recognition, which has made significant strides but is argued to be insufficient for achieving human-like general intelligence.

Game Engine in the Head

The idea that the brain's common-sense knowledge representations are analogous to programs found in modern video game engines, integrating physics engines and simple AI planning models.

Intuitive Physics Engine

A computational model built in the speaker's group that uses probabilistic inference in a game-style physics engine to simulate physical interactions and predict outcomes.

Software & Apps
Gen

A probabilistic programming system from the MIT Probabilistic Computing Project, aiming to converge AI tools.

SpatialOS

A platform developed by Improbable for very large distributed computing environments, enabling more complex and realistic simulations for immersive video games.

AlphaGo

A Google DeepMind AI program that plays the game of Go, highlighted as an example of narrow AI that excels at a specific task but lacks general intelligence or common sense.

Microsoft Caption Bot

One of the state-of-the-art industry AI captioning systems, whose outputs are shared by the picdescbot Twitter bot to illustrate current AI capabilities and failures.

Probabilistic Programs

A computational abstraction used to capture common-sense knowledge, generalizing Bayesian networks and combining symbolic representation, probabilistic inference, and neural networks for expressive knowledge representation and flexible uncertainty handling.

ProbTorch

A modern probabilistic programming library that integrates with PyTorch, combining neural networks with Bayesian inference.

BayesFlow

A tool that incorporates aspects of Bayesian inference and flow-based models, likely within the deep learning context.

MuJoCo Physics Engine

A standard tool in robotics for planning physically efficient movements, used in the speaker's models to simulate human reaches and actions.

Pyro

A modern probabilistic programming tool that combines neural networks with Bayesian inference.

picdescbot

A Twitter bot that uses a state-of-the-art industry AI captioning system to caption random images from the web and post the results, serving as a real-world test of AI limitations.

Church

A probabilistic programming language developed in the speaker's group, built on the lambda calculus (in the Lisp tradition), demonstrating a Turing-complete framework for probability models.

People
Joe Bates

An individual associated with MIT and founder of Singular Computing, working on brain-inspired, low-power approximate computing for highly parallel hardware.

Michael Tomasello

A psychologist who, along with Felix Warneken, conducted famous experiments on the spontaneous helping behavior of young children.

Josh Tenenbaum

Professor at MIT leading the computational cognitive science group and the speaker of this talk, focused on how humans learn efficiently and how to apply this to AI systems.

Yann LeCun

A prominent researcher in deep learning, associated with the MNIST dataset.

Armando Solar-Lezama

A colleague at CSAIL, known for work in programming languages and automatic code synthesis, collaborating with Kevin Ellis on program learning.

Andrej Karpathy

A leading figure in deep learning, formerly at Google and OpenAI, then Director of AI Research at Tesla; his blog post on the state of computer vision highlighted current AI limitations regarding human-like understanding.

Mark Raibert

Founder of Boston Dynamics, a leading company in humanoid and legged robots; an advisor to the CBMM.

Tomer Ullman

A researcher who developed the models for a study on infants' understanding of goals, showing sensitivity to physical work.

Sutton and Barto

Pioneering researchers in reinforcement learning, credited with early work on temporal difference learning.

Pete Battaglia

A researcher who, along with Jess Hamrick, started the work on the intuitive physics engine in the speaker's group.

Jess Hamrick

A researcher who, along with Pete Battaglia, started the work on the intuitive physics engine in the speaker's group.

Barack Obama

Former President of the United States, featured in a popular image used by Andrej Karpathy to illustrate the gap between current computer vision and human understanding.

Jeff Elman

Proposed simple recurrent networks, an earlier and simpler version of recurrent neural network architectures.

Brendan Lake

First author of a Science paper on human-level concept learning, which gained significant publicity.

Kevin Ellis

A current PhD student working with the speaker and Armando Solar-Lezama on combining programming language techniques with machine learning for program synthesis.

Felix Warneken

A psychologist who, along with Michael Tomasello, conducted famous experiments on the spontaneous helping behavior of young children.

Marvin Minsky

A pioneer in AI who, like Turing, proposed the idea of building systems that grow into intelligence much like humans do.

Companies
YouTube

A popular video-sharing platform where several videos (e.g., stacking cups baby, cat and cups baby, mouse vs. cracker, orangutan with Legos) are hosted and used as examples of human and animal intelligence.

Google

A major tech company (big tech) that develops AI technologies; discussed in the context of commercial AI progress and research collaboration with academia.

Improbable

A London-based startup developing SpatialOS, a platform for very large distributed computing environments, used to create more complex and immersive video games.

Facebook

A major tech company (big tech) that develops AI technologies.

Microsoft

A major industry player in computer vision and AI, specifically mentioned for its image captioning systems and researchers who produced the training datasets.

Tesla

An automotive and energy company where Andrej Karpathy served as Director of AI Research.

Singular Computing

A startup in Kendall Square founded by Joe Bates, developing technology for low-power, approximate, brain-like computing with the goal of building machines with billions of cores and reasonable power consumption.

DeepMind

A British artificial intelligence company acquired by Google, known for its work on AI systems like AlphaGo and a potential destination for researchers as discussed by the speaker.

IBM

A major tech company (big tech) that develops AI technologies.

Boston Dynamics

A leading company known for its sophisticated humanoid and legged robots, highlighted for impressive hardware but acknowledged to have limited human-like cognition in its robots.

OpenAI

An AI research and deployment company where Andrej Karpathy was one of the founders.
