Key Moments
Stanford CS25: Transformers United V6 | Advancing Science and Medicine with Collaborative AI Agents
AI agents can now generate novel scientific hypotheses, but current models rely largely on system-one thinking; scientific breakthroughs will require AI capable of system-two thinking.
Key Insights
The co-scientist AI system, utilizing a multi-agent Gemini-based architecture, assists researchers by systematically generating and refining novel hypotheses for complex scientific challenges.
While current large language models excel at system-one thinking (fast, intuitive correlations), true scientific discovery requires system-two thinking (slow, deliberate, rigorous analysis).
AlphaFold, while revolutionary, is a specialized system, highlighting the need for generality in AI for true scientific superintelligence.
The co-scientist system employs a self-play, self-debate mechanism inspired by AlphaGo and AlphaZero, where agents refine hypotheses through continuous peer review and critique.
A validated hypothesis from the co-scientist system led to a peer-reviewed publication in Advanced Science, demonstrating its potential in real-world research.
The system has shown promise in identifying new drug repurposing candidates for cancers and novel epigenomic targets for liver fibrosis.
AI as a collaborative partner for scientific and medical experts
The core mission is to develop general-purpose AI systems that act as collaborative partners for scientists and doctors. This involves accelerating the pace of scientific discovery and democratizing medical expertise. The co-scientist project, a multi-agent Gemini-based system, aims to assist researchers by systematically generating and refining novel hypotheses for complex scientific challenges. This initiative builds upon earlier work, such as Med-PaLM, an early medically tuned large language model that achieved passing and expert-level scores on US medical license exams, underscoring the potential of AI in specialized domains.
From passing medical exams to hypothesis generation
The co-scientist project grew out of a realization that while LLMs like Med-PaLM were adept at tasks like question answering and summarization, their potential for hypothesis generation was largely untapped. Initially, the idea was met with skepticism because of the perceived limitations of LLMs, particularly their tendency towards hallucinations and surface-level correlations, both characteristic of 'system-one' thinking. The project moved forward nonetheless, on the understanding that scientific discovery often requires a more deliberate and rigorous 'system-two' thinking process, which the team aimed to instill in AI systems.
Defining and pursuing artificial general scientific intelligence
The pursuit of a truly superintelligent scientific AI requires generality – the ability to tackle a wide range of problems. Unlike specialized models like AlphaFold, which excels at protein structure prediction but cannot address other scientific questions, a general AI should understand and make progress on diverse challenges. This generality is akin to the human brain's remarkable ability to engage in various cognitive tasks, from language and art to science and philosophy. The key building block for this generality is argued to be natural language, as evidenced by the broad capabilities of current LLMs like Gemini, Claude, and GPT. While these models demonstrate generality, their application to complex scientific tasks like hypothesis generation is still in its early stages.
The self-play and self-debate mechanism for hypothesis refinement
The co-scientist system adopts an approach inspired by DeepMind's success with AlphaGo and AlphaZero, utilizing a self-play and reinforcement learning framework. Instead of games, however, the environment consists of scientific problems. The core mechanism involves a team of agents engaging in continuous 'scientific debates' and 'self-debates.' These agents generate, refine, review, critique, and rank hypotheses. This multi-agent setup, powered by advanced Gemini models, simulates a structured, rigorous thinking process that mirrors aspects of human scientific thought, aiming for a more deliberate and effective approach to hypothesis generation.
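The critique-and-refine cycle described above can be sketched as a small loop. This is a toy illustration only, not the co-scientist implementation: `toy_critic`, `toy_reviser`, and `self_debate` are hypothetical names, and the keyword checks stand in for what would be model calls in a real system.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in for a critic agent; a real system would query a model.
def toy_critic(hypothesis: str) -> List[str]:
    """Return a list of objections; an empty list means the critic is satisfied."""
    objections = []
    if "mechanism" not in hypothesis:
        objections.append("state a mechanism")
    if "test" not in hypothesis:
        objections.append("propose a test")
    return objections

# Hypothetical stand-in for the agent that revises a hypothesis.
def toy_reviser(hypothesis: str, objections: List[str]) -> str:
    """Address the first objection by appending the missing element."""
    fixes = {
        "state a mechanism": " via a proposed mechanism",
        "propose a test": " with a proposed test",
    }
    return hypothesis + fixes[objections[0]]

def self_debate(
    hypothesis: str,
    critic: Callable[[str], List[str]],
    reviser: Callable[[str, List[str]], str],
    max_rounds: int = 5,
) -> Tuple[str, List[str]]:
    """Alternate critique and revision until the critic has no objections
    or the round budget is exhausted. Returns the final text and all drafts."""
    history = [hypothesis]
    for _ in range(max_rounds):
        objections = critic(hypothesis)
        if not objections:
            break
        hypothesis = reviser(hypothesis, objections)
        history.append(hypothesis)
    return hypothesis, history

final, drafts = self_debate("Drug X reduces fibrosis", toy_critic, toy_reviser)
print(final)
print(len(drafts), "drafts")
```

The round budget matters in practice: without it, a critic that always finds something to object to would keep the loop running indefinitely.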
Architectural overview of the co-scientist system
Co-scientist functions as a general-purpose multi-agent system for scientific discovery. The human scientist remains in the driver's seat, guiding the system through natural language prompts specifying research goals, constraints, and preferences. This input forms the context for the AI's computation, which can dynamically proceed for minutes, hours, or even days. The output is a research report containing hypotheses or solutions. Internally, the system operates on a loop with four primary functions: generating ideas, reviewing them, ranking them, and evolving them. Each agent is configured with specific strategies, drawing from a 'library of strategies' that can be inspired by expert human thinking and refined over time. This architecture allows for a continuous feedback loop and self-improvement.
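As a rough sketch of the four-function loop just described (generate, review, rank, evolve), the following uses random scores in place of real reviews. Every name here (`Hypothesis`, `run_loop`, `pool_size`, `keep`) is an assumption made for illustration; the lecture summary does not specify how the actual agents are implemented.

```python
import random
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Hypothesis:
    text: str
    score: Optional[float] = None  # None until a review has been recorded

def generate(goal: str, n: int, rng: random.Random) -> List[Hypothesis]:
    """Generation step: placeholder ideas; a real agent would call a model."""
    return [Hypothesis(f"{goal}: idea {rng.randint(0, 999)}") for _ in range(n)]

def review(pool: List[Hypothesis], rng: random.Random) -> None:
    """Review step: assign a placeholder score to each unscored hypothesis."""
    for h in pool:
        if h.score is None:
            h.score = rng.random()

def rank(pool: List[Hypothesis]) -> List[Hypothesis]:
    """Ranking step: best score first."""
    return sorted(pool, key=lambda h: h.score, reverse=True)

def evolve(h: Hypothesis) -> Hypothesis:
    """Evolution step: derive a new, not-yet-reviewed variant."""
    return Hypothesis(h.text + " (refined)")

def run_loop(goal: str, rounds: int = 3, pool_size: int = 6,
             keep: int = 3, seed: int = 0) -> List[Hypothesis]:
    rng = random.Random(seed)
    pool = generate(goal, pool_size, rng)
    for _ in range(rounds):
        review(pool, rng)
        survivors = rank(pool)[:keep]
        pool = survivors + [evolve(h) for h in survivors]
    review(pool, rng)  # score the final batch of evolved variants
    return rank(pool)

for h in run_loop("reduce liver fibrosis")[:3]:
    print(round(h.score, 3), h.text)
```

In this sketch the research goal is the only human input; constraints and preferences would enter as further arguments to the generation and review steps.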
The role of the ranking agent and epistemic humility
A crucial component of the co-scientist system is the ranking agent, which uses a debate mechanism to prioritize hypotheses. This is vital because experts often have more ideas than resources, making it essential to surface only the most promising ones. The system computes Elo scores for hypotheses, similar to competitive games, to rank them based on defined criteria. Furthermore, the system emphasizes 'epistemic humility,' conveying its confidence levels and identifying key uncertainties. This ensures that the generated hypotheses are presented with appropriate context, allowing scientists to focus their attention effectively and guiding future research directions.
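For readers unfamiliar with Elo scoring, a minimal sketch of the standard Elo update applied to pairwise debate outcomes follows. The starting rating of 1200 and the K-factor of 32 are assumptions chosen for illustration, and `rank_by_debates` is a hypothetical helper, not part of the system described in the lecture.

```python
from typing import Dict, Iterable, List, Tuple

def expected_score(r_a: float, r_b: float) -> float:
    """Standard Elo expectation that A beats B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def update_elo(r_a: float, r_b: float, a_won: bool,
               k: float = 32.0) -> Tuple[float, float]:
    """Return the new ratings for A and B after one debate."""
    e_a = expected_score(r_a, r_b)
    s_a = 1.0 if a_won else 0.0
    delta = k * (s_a - e_a)
    return r_a + delta, r_b - delta  # total rating is conserved

def rank_by_debates(names: Iterable[str],
                    outcomes: Iterable[Tuple[str, str]],
                    start: float = 1200.0) -> List[Tuple[str, float]]:
    """outcomes is a sequence of (winner, loser) pairs from pairwise debates."""
    ratings: Dict[str, float] = {n: start for n in names}
    for winner, loser in outcomes:
        ratings[winner], ratings[loser] = update_elo(
            ratings[winner], ratings[loser], a_won=True
        )
    return sorted(ratings.items(), key=lambda kv: kv[1], reverse=True)

debates = [("H1", "H2"), ("H1", "H3"), ("H2", "H3")]
for name, rating in rank_by_debates(["H1", "H2", "H3"], debates):
    print(name, round(rating, 1))
```

Because a win against a higher-rated hypothesis moves the ratings more than a win against a lower-rated one, a relatively small number of debates can separate strong candidates from weak ones.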
Validation through real-world scientific discovery examples
The efficacy of the co-scientist system has been demonstrated through several validation studies. In one instance, the system generated hypotheses that closely mirrored a significant, yet unpublished, discovery by researchers at Imperial College London concerning antimicrobial resistance, leading to a peer-reviewed publication. Other examples include identifying new drug repurposing candidates for acute myeloid leukemia, discovering novel epigenomic targets for liver fibrosis, and even designing de novo proteins with specific activities, sometimes integrating tools like AlphaFold. These case studies highlight the system's ability to contribute to genuine scientific breakthroughs across various domains.
Bridging human expertise with AI's broad exploration
The co-scientist system facilitates a powerful synergy between human expertise and AI-driven exploration. For instance, when identifying potential treatments for liver fibrosis, the AI suggested drugs from a cancer research context, a connection a human liver expert might not readily make. Similarly, in analyzing complex data like protein structures, the AI can identify novel patterns that lead to discovering previously unknown biological entities, such as a massive potato immune protein. This complementarity, where AI offers breadth and unexpected connections, and humans provide deep domain expertise for validation and judgment, represents a new paradigm for AI-human scientific collaboration.
Common Questions
Co-Scientist is an AI system designed to act as a collaborative partner for scientists. It uses a multi-agent approach to generate, critique, rank, and refine scientific hypotheses, aiming to accelerate discovery and provide novel insights.
Mentioned in this video
Research laboratory where Vivek works, focusing on AI in science and medicine.
A research program co-led by Vivek aiming to build and democratize medical superintelligence.
The research division Vivek worked at previously, focusing on multimodal assistant systems.
An institution where Vivek is on the faculty for executive education.
A hospital in Texas where physician scientists used Co-Scientist for cancer drug discovery.
A hospital where researchers studied the link between ACE inhibitors and the risk of Alzheimer's.
An early AI system developed by Vivek and his team that achieved passing scores on the US medical license exam.
A subsequent version of the Med-PaLM system that achieved expert-level scores on the US medical license exam.
An AI system designed to act as a collaborative partner for scientists, aiming to accelerate discovery.
A precursor model to Gemini, used in early experiments for hypothesis generation.
Google's advanced AI model, mentioned as the successor to PaLM and a current tool for various tasks.
A popular AI chatbot, mentioned in comparison to Gemini.
A highly specialized AI model for predicting protein structures, cited as an example of capability but not generality.
An AI program that mastered the game of Go, used as an example of self-play and reinforcement learning.
A chess-playing computer that competed against Garry Kasparov in 1996 and 1997.
An advanced version of AlphaGo that demonstrated the power of self-play and reinforcement learning in complex environments.
An AI system developed by DeepMind that achieved high performance in complex strategy games like StarCraft II.
Google's latest AI models, used to power the Co-Scientist agents, possessing long context, multimodal, and agentic tool-use capabilities.
A receptor on brain cells triggered by bradykinin, leading to neurodegeneration.
A gene identified by Co-Scientist as potentially linking neurodegenerative diseases with small cell lung cancer.
A gene identified by Co-Scientist as potentially linking neurodegenerative diseases with small cell lung cancer.
A preprint server that introduced a new policy regarding hallucinated references in submitted papers.
A professor at Stanford who suggested using LLMs for hypothesis generation.
World chess champion, famously competed against Deep Blue, highlighted for his general intelligence.
CEO of DeepMind, who commented on Garry Kasparov's general intelligence compared to Deep Blue.
Mentioned as a source of inspiration for developing strategies for scientific idea generation through podcasts.
Researchers at Imperial College who had a breakthrough in antimicrobial resistance and collaborated on testing Co-Scientist.
Nobel laureate recognized for discovering Yamanaka factors, used for cell rejuvenation.
A professor in Germany who validated Co-Scientist's hypothesis regarding DHX9 and SRRM4 in small cell lung cancer.