Stanford CS25: Transformers United V6 I Advancing Science and Medicine with Collaborative AI Agents

Stanford Online
Education | 5 min read | 67 min video
May 27, 2026 | 1,433 views

TL;DR

AI agents can now generate novel scientific hypotheses, but today's models rely largely on fast, intuitive system-one thinking; scientific breakthroughs will require AI capable of slow, deliberate system-two thinking.

Key Insights

1. The co-scientist AI system, utilizing a multi-agent Gemini-based architecture, assists researchers by systematically generating and refining novel hypotheses for complex scientific challenges.

2. While current large language models excel at system-one thinking (fast, intuitive correlations), true scientific discovery requires system-two thinking (slow, deliberate, rigorous analysis).

3. AlphaFold, while revolutionary, is a specialized system, highlighting the need for generality in AI for true scientific superintelligence.

4. The co-scientist system employs a self-play, self-debate mechanism inspired by AlphaGo and AlphaZero, where agents refine hypotheses through continuous peer review and critique.

5. A validated hypothesis from the co-scientist system led to a peer-reviewed publication in Advanced Science, demonstrating its potential in real-world research.

6. The system has shown promise in identifying new drug repurposing candidates for cancers and novel epigenomic targets for liver fibrosis.

AI as a collaborative partner for scientific and medical experts

The core mission is to develop general-purpose AI systems that act as collaborative partners for scientists and doctors, accelerating the pace of scientific discovery and democratizing medical expertise. The co-scientist project, a multi-agent Gemini-based system, aims to assist researchers by systematically generating and refining novel hypotheses for complex scientific challenges. It builds on earlier work such as Med-PaLM, an early medically tuned large language model that achieved passing scores on US medical licensing exam questions, and its successor Med-PaLM 2, which reached expert-level scores. These results underscored the potential of AI in specialized domains.

From passing medical exams to hypothesis generation

The co-scientist project grew out of the realization that while LLMs like Med-PaLM were adept at tasks such as question answering and summarization, their potential for hypothesis generation was largely untapped. The idea initially met with skepticism because of the perceived limitations of LLMs, particularly their tendency toward hallucinations and surface-level correlations, both characteristic of 'system-one' thinking. The project moved forward on the understanding that scientific discovery often requires a more deliberate and rigorous 'system-two' thinking process, which the team aimed to build into AI systems.

Defining and pursuing artificial general scientific intelligence

The pursuit of a truly superintelligent scientific AI requires generality – the ability to tackle a wide range of problems. Unlike specialized models like AlphaFold, which excels at protein structure prediction but cannot address other scientific questions, a general AI should understand and make progress on diverse challenges. This generality is akin to the human brain's remarkable ability to engage in various cognitive tasks, from language and art to science and philosophy. The key building block for this generality is argued to be natural language, as evidenced by the broad capabilities of current LLMs like Gemini, Claude, and GPT. While these models demonstrate generality, their application to complex scientific tasks like hypothesis generation is still in its early stages.

The self-play and self-debate mechanism for hypothesis refinement

The co-scientist system adopts an approach inspired by DeepMind's success with AlphaGo and AlphaZero, utilizing a self-play and reinforcement learning framework. Instead of games, however, the environment consists of scientific problems. The core mechanism involves a team of agents engaging in continuous 'scientific debates' and 'self-debates.' These agents generate, refine, review, critique, and rank hypotheses. This multi-agent setup, powered by advanced Gemini models, simulates a structured, rigorous thinking process that mirrors aspects of human scientific thought, aiming for a more deliberate and effective approach to hypothesis generation.

Architectural overview of the co-scientist system

Co-scientist functions as a general-purpose multi-agent system for scientific discovery. The human scientist remains in the driver's seat, guiding the system through natural language prompts specifying research goals, constraints, and preferences. This input forms the context for the AI's computation, which can dynamically proceed for minutes, hours, or even days. The output is a research report containing hypotheses or solutions. Internally, the system operates on a loop with four primary functions: generating ideas, reviewing them, ranking them, and evolving them. Each agent is configured with specific strategies, drawing from a 'library of strategies' that can be inspired by expert human thinking and refined over time. This architecture allows for a continuous feedback loop and self-improvement.
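The four-function loop described above can be sketched in a few lines of Python. This is only an illustrative toy, not the actual Co-Scientist implementation: the names `Hypothesis`, `run_loop`, and the `toy_*` functions are hypothetical, the agents are plain functions rather than Gemini-backed agents, and the length-based scoring is a placeholder chosen so the example runs deterministically.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Hypothesis:
    text: str
    critiques: List[str] = field(default_factory=list)
    score: float = 0.0


def run_loop(
    goal: str,
    generate: Callable[[str], List[Hypothesis]],
    review: Callable[[Hypothesis], str],
    rank: Callable[[List[Hypothesis]], List[Hypothesis]],
    evolve: Callable[[Hypothesis], Hypothesis],
    rounds: int = 3,
    keep: int = 2,
) -> List[Hypothesis]:
    """Generate ideas, then repeatedly review, rank, and evolve them."""
    pool = generate(goal)
    for _ in range(rounds):
        for hypothesis in pool:
            hypothesis.critiques.append(review(hypothesis))
        survivors = rank(pool)[:keep]  # keep only the top-ranked ideas
        pool = survivors + [evolve(h) for h in survivors]
    return rank(pool)


# Placeholder agents (assumptions): a real system would call a model here.
def toy_generate(goal: str) -> List[Hypothesis]:
    return [Hypothesis(f"{goal}: idea {i}") for i in range(4)]


def toy_review(hypothesis: Hypothesis) -> str:
    return f"needs supporting evidence ({len(hypothesis.text)} chars)"


def toy_rank(pool: List[Hypothesis]) -> List[Hypothesis]:
    for hypothesis in pool:
        hypothesis.score = float(len(hypothesis.text))  # stand-in for a real score
    return sorted(pool, key=lambda h: h.score, reverse=True)


def toy_evolve(hypothesis: Hypothesis) -> Hypothesis:
    return Hypothesis(hypothesis.text + " (refined)")


if __name__ == "__main__":
    ranked = run_loop("liver fibrosis targets", toy_generate, toy_review,
                      toy_rank, toy_evolve, rounds=2)
    for hypothesis in ranked:
        print(hypothesis.score, hypothesis.text)
```

Passing the four functions in as parameters loosely mirrors the idea of configuring each agent from a library of strategies: swapping a strategy means swapping a function, without touching the loop itself.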

The role of the ranking agent and epistemic humility

A crucial component of the co-scientist system is the ranking agent, which uses a debate mechanism to prioritize hypotheses. This is vital because experts often have more ideas than resources, making it essential to surface only the most promising ones. The system computes Elo scores for hypotheses, similar to competitive games, to rank them based on defined criteria. Furthermore, the system emphasizes 'epistemic humility,' conveying its confidence levels and identifying key uncertainties. This ensures that the generated hypotheses are presented with appropriate context, allowing scientists to focus their attention effectively and guiding future research directions.
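As a rough illustration of Elo-based ranking, the sketch below applies the standard Elo update to pairwise comparisons between hypotheses. The summary only states that Elo scores are computed; the starting rating of 1200, the K-factor of 32, the round-robin pairing, and the `judge` callback (standing in for the outcome of a debate) are all assumptions made for this example.

```python
from itertools import combinations
from typing import Callable, Dict, List, Tuple


def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expectation that A beats B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, a_won: bool,
               k: float = 32.0) -> Tuple[float, float]:
    """Return the new (A, B) ratings after one comparison."""
    expected_a = expected_score(rating_a, rating_b)
    actual_a = 1.0 if a_won else 0.0
    delta = k * (actual_a - expected_a)
    return rating_a + delta, rating_b - delta


def run_tournament(hypotheses: List[str],
                   judge: Callable[[str, str], bool],
                   initial: float = 1200.0,
                   k: float = 32.0) -> Dict[str, float]:
    """Round-robin: judge(a, b) returns True when a wins the comparison."""
    ratings = {h: initial for h in hypotheses}
    for a, b in combinations(hypotheses, 2):
        ratings[a], ratings[b] = update_elo(ratings[a], ratings[b], judge(a, b), k)
    return ratings


if __name__ == "__main__":
    # Toy judge (assumption): prefers the longer statement.
    ratings = run_tournament(["a", "bb", "ccc"], lambda a, b: len(a) > len(b))
    for text, rating in sorted(ratings.items(), key=lambda kv: -kv[1]):
        print(f"{rating:7.1f}  {text}")
```

Because each update adds to one rating exactly what it subtracts from the other, the total rating in the pool stays constant, and only relative standing changes as more comparisons are played.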

Validation through real-world scientific discovery examples

The efficacy of the co-scientist system has been demonstrated through several validation studies. In one instance, the system generated hypotheses that closely mirrored a significant, yet unpublished, discovery by researchers at Imperial College London concerning antimicrobial resistance, leading to a peer-reviewed publication. Other examples include identifying new drug repurposing candidates for acute myeloid leukemia, discovering novel epigenomic targets for liver fibrosis, and even designing de novo proteins with specific activities, sometimes integrating tools like AlphaFold. These case studies highlight the system's ability to contribute to genuine scientific breakthroughs across various domains.

Bridging human expertise with AI's broad exploration

The co-scientist system facilitates a powerful synergy between human expertise and AI-driven exploration. For instance, when identifying potential treatments for liver fibrosis, the AI suggested drugs from a cancer research context, a connection a human liver expert might not readily make. Similarly, in analyzing complex data like protein structures, the AI can identify novel patterns that lead to discovering previously unknown biological entities, such as a massive potato immune protein. This complementarity, where AI offers breadth and unexpected connections, and humans provide deep domain expertise for validation and judgment, represents a new paradigm for AI-human scientific collaboration.

Co-Scientist: Enhancing Scientific Discovery

Practical takeaways from this episode

Do This

Define clear research goals with sufficient detail for the AI.
Provide constraints, rubrics, and preferences to guide hypothesis generation.
Leverage multimodal data like PDFs and experimental results as context.
Utilize the AI's ability to explore diverse scientific domains and make novel connections.
Engage with the detailed reports, focusing on prioritized hypotheses.
Refine the problem or break it down if the AI gets stuck, especially in mathematical problems.
Implement layered safety checks for research prompts and generated ideas.
Collaborate with the AI to leverage its broad search capabilities and your deep expertise.
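Two of the points above, specifying a research goal with constraints and preferences and applying layered safety checks to both prompts and generated ideas, can be sketched as follows. Everything here is hypothetical and for illustration only: `ResearchRequest`, the individual checks, and the placeholder blocked term are not taken from the Co-Scientist system, and real safety screening would be far more involved.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# A check returns a rejection reason, or None when the text passes.
Check = Callable[[str], Optional[str]]


@dataclass
class ResearchRequest:
    goal: str
    constraints: List[str] = field(default_factory=list)
    preferences: List[str] = field(default_factory=list)
    attachments: List[str] = field(default_factory=list)  # e.g. PDF paths


def min_detail_check(text: str) -> Optional[str]:
    """Reject goals too short to guide hypothesis generation."""
    return "goal too short to be actionable" if len(text.split()) < 5 else None


BLOCKED_TERMS = ("placeholder-hazard",)  # assumption: stand-in for a real policy


def blocklist_check(text: str) -> Optional[str]:
    lowered = text.lower()
    if any(term in lowered for term in BLOCKED_TERMS):
        return "contains a blocked term"
    return None


def run_checks(text: str, checks: List[Check]) -> List[str]:
    """Apply every check and collect the rejection reasons."""
    return [reason for check in checks if (reason := check(text)) is not None]


def screen(request: ResearchRequest, ideas: List[str],
           prompt_checks: List[Check],
           idea_checks: List[Check]) -> Tuple[List[str], List[str]]:
    """Layer 1 screens the prompt; layer 2 screens each generated idea."""
    problems = run_checks(request.goal, prompt_checks)
    if problems:
        return [], problems
    kept = [idea for idea in ideas if not run_checks(idea, idea_checks)]
    return kept, []


if __name__ == "__main__":
    request = ResearchRequest(
        goal="Find drug repurposing candidates for acute myeloid leukemia",
        constraints=["candidates must be already-approved drugs"],
    )
    ideas = ["Test drug A in cell lines", "Use placeholder-hazard protocol"]
    print(screen(request, ideas,
                 [min_detail_check, blocklist_check], [blocklist_check]))
```

Keeping the prompt checks and the idea checks as separate lists makes the layering explicit: a request that fails the first layer never reaches generation, and ideas are filtered independently afterwards.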

Avoid This

Do not expect the AI to solve problems beyond current scientific knowledge (e.g., time machines).
Do not rely solely on standard LLMs for complex, structured scientific thinking; agentic systems are more robust.
Do not simply generate many ideas; prioritize and validate them to respect expert time.
Do not ignore the AI's uncertainty estimations or confidence levels.
Do not underestimate the potential for unexpected connections between different scientific fields.

Common Questions

What is Co-Scientist?

Co-Scientist is an AI system designed to act as a collaborative partner for scientists. It uses a multi-agent approach to generate, critique, rank, and refine scientific hypotheses, aiming to accelerate discovery and provide novel insights.

Mentioned in this video

Software & Apps
Med-PaLM

An early medical AI system developed by Vivek Natarajan and his team that achieved passing scores on the US medical license exam.

Med-PaLM 2

A subsequent version of the Med-PaLM system that achieved expert-level scores on the US medical license exam.

Co-Scientist

An AI system designed to act as a collaborative partner for scientists, aiming to accelerate discovery.

PaLM

A precursor model to Gemini, used in early experiments for hypothesis generation.

Gemini

Google's advanced AI model, mentioned as the successor to PaLM and a current tool for various tasks.

ChatGPT

A popular AI chatbot, mentioned in comparison to Gemini.

AlphaFold

A highly specialized AI model for predicting protein structures, cited as an example of capability but not generality.

AlphaGo

An AI program that mastered the game of Go, used as an example of self-play and reinforcement learning.

Deep Blue

IBM's chess-playing computer, which played matches against Garry Kasparov in 1996 and 1997 and won the 1997 rematch.

AlphaZero

A generalized successor to AlphaGo that learned Go, chess, and shogi through self-play, demonstrating the power of self-play and reinforcement learning in complex environments.

AlphaStar

An AI system developed by DeepMind that achieved high performance in complex strategy games like StarCraft II.

Gemini models

Google's latest AI models, used to power the Co-Scientist agents, possessing long context, multimodal, and agentic tool-use capabilities.

B2R receptor

A receptor on brain cells triggered by bradykinin, leading to neurodegeneration.

DHX9

A gene identified by Co-Scientist as potentially linking neurodegenerative diseases with small cell lung cancer.

SRRM4

A gene identified by Co-Scientist as potentially linking neurodegenerative diseases with small cell lung cancer.

arXiv

A preprint server that introduced a new policy regarding hallucinated references in submitted papers.
