How is AI being applied to protein biology?

AI models, like protein language models, are being trained on vast amounts of protein sequence data to predict structures and functions, demonstrating that scaling laws and general ML approaches are effective in this domain.

Can AI models predict protein structure as well as specialized models like AlphaFold?

Recent models reach near-parity with specialized models like AlphaFold using only sequence data, notably in critical areas such as antibody design, which challenges the need for handcrafted features.

What are the challenges with current LLM reinforcement learning?

Current LLM RL often plateaus due to a lack of continually novel and useful learning signals. Collecting RL tasks by hand is also a bottleneck for continuous improvement.

How does 'selfplay' aim to improve LLMs?

Selfplay involves the LLM generating its own tasks (conjecturer) and then attempting to solve them (solver), creating a continuous loop of learning and improvement that can potentially surpass human-level performance.

What are the limitations of basic selfplay for LLMs?

Basic selfplay can fail because the 'conjecturer' might generate overly complex or artificial problems that don't lead to meaningful learning for the 'solver', resulting in stagnation.

How does 'streaming RAG' improve voice AI interactions?

Streaming RAG processes user speech in chunks while it's being spoken, rather than waiting for the utterance to end, significantly reducing latency and making voice AI interactions more natural.

Why is Low-Rank Adaptation (LoRA) useful for LLMs?

LoRA shows impressive performance with lower sample sizes compared to full fine-tuning, offering a more efficient way to adapt LLMs to new data or tasks.

What is 'Lean' and why is it important for AI?

Lean is a formal verification language and functional programming language that allows for fully explicit and verifiable proofs, crucial for ensuring the correctness of AI systems and mathematical research.

How are LLMs being used with formal verification tools like Lean?

LLMs are being combined with formal verification tools to check proofs, generate code, and verify its correctness, moving towards 'verifiable coding' and guaranteed software.

What is the philosophy behind agentic programming assistance like Claude?

The philosophy is to treat software development like real-time strategy games: parallelize tasks, maximize agent output, use minimal human input for course correction, and prioritize speed and throughput over initial perfection.

How can developers leverage AI agents for software engineering?

Developers can use agents by running them in cloud instances, providing them with extensive documentation and context, and setting up systems for parallel task execution with clear monitoring and feedback loops.

5 Papers That Show Where AI Research Is Heading Right Now

Y Combinator

Science & Technology | 8 min read | 77 min video

Jun 12, 2026|20,083 views|770|22

TL;DR

New AI models for protein biology show that scaling laws hold much as they do for language models. These models require vast amounts of data, but they can perform comparably to hand-engineered systems like AlphaFold.

Key Insights

Scaling laws in protein biology, as in language models, show improvements with increased compute and data, with the ESM Cambrian models demonstrating steadily improving performance.

The ESM Cambrian model achieved near-parity with AlphaFold 3 in protein structure prediction without using multiple sequence alignments (MSAs), and even surpassed it in antibody design tasks.

Self-play for LLMs, while promising for unbounded learning, currently plateaus due to the conjecturer generating overly complex and unhelpful tasks, requiring a 'guide' component to ensure relatedness and relevance.

Streaming RAG reduces latency in voice AI by analyzing spoken words in chunks and running retrieval while the query is still being formed, decreasing latency by up to 1.5 seconds without sacrificing accuracy.

Lean, a formal verification system, is enabling a new era of 'verified intelligence' by allowing AI to generate and verify complex mathematical proofs and potentially ensure code correctness and scientific reproducibility.

Agentic programming in software engineering is compared to real-time strategy games, emphasizing maximal parallelization, high visibility, continuous feedback, and satisficing over perfection to increase output by up to 3.5x per engineer per month.

Scaling laws in biology mirror language models, but data is key

The research presented suggests that scaling laws, a fundamental driver of progress in language models, also apply to protein biology. Models like ESM Cambrian show that with more parameters and, crucially, massive datasets (e.g., 2.8 billion sequences compared to 50 million for the prior ESM2 models), performance continues to improve without plateauing. This is an instance of the "bitter lesson" articulated by Richard Sutton: general methods that leverage scale often outperform hand-engineered domain knowledge. In protein biology, that means training models on vast evolutionary sequence data to predict protein structure and function. The paper highlights that biology's "training data" (evolutionary sequences) is orders of magnitude larger than human-generated text, suggesting immense potential for continued scaling. While LLM scaling laws are well understood, their application to biology required validation. The presented work tests this by training models on protein sequences, treating amino acids as tokens. A key metric is the prediction of 'long-distance contacts' in proteins, a proxy for how well the model captures protein structure. The ESM Cambrian model, trained on extensive metagenomic data, showed a clear log-linear improvement curve with compute that extrapolated cleanly from smaller training runs. This suggests that the principles of scaling compute and data transfer to biology, which, like language, benefits from general AI methods. The implication is that continued investment in data collection and model scale should yield predictable improvements on biological AI tasks.
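To make "log-linear improvement with compute, extrapolated from smaller runs" concrete, here is a minimal sketch of fitting metric ≈ a + b·log10(compute) to a few small runs and projecting to a larger budget. The data points are synthetic placeholders chosen for illustration, not results from the paper, and the function names are hypothetical.

```python
import math


def fit_log_linear(compute, metric):
    """Ordinary least-squares fit of metric ~= a + b * log10(compute)."""
    xs = [math.log10(c) for c in compute]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(metric) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, metric))
    b = sxy / sxx            # improvement per 10x increase in compute
    a = mean_y - b * mean_x  # intercept
    return a, b


def predict(a, b, compute):
    """Project the fitted line to a new compute budget."""
    return a + b * math.log10(compute)


# Synthetic, made-up points (NOT from the paper): training FLOPs and a
# contact-prediction-style score for three small runs.
small_run_compute = [1e18, 1e19, 1e20]
small_run_metric = [0.30, 0.40, 0.50]

a, b = fit_log_linear(small_run_compute, small_run_metric)
extrapolated = predict(a, b, 1e21)  # projected score for a 10x larger run
print(f"slope per decade: {b:.3f}, projected score: {extrapolated:.3f}")
```

If a larger run lands near the projected value, that is the "extrapolating cleanly" behavior described above; a run that falls well below the line would suggest a plateau.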

AI models match and surpass specialized protein folding systems

A significant advancement discussed is the ability of general protein language models, trained solely on sequence data, to perform comparably to, or even outperform, highly specialized systems like AlphaFold 3, which rely on hand-engineered features like Multiple Sequence Alignments (MSAs). The ESM fold 2 model, using only per-residue embeddings from the language model as input to a structure predictor, achieved near-parity in general protein complex prediction. More strikingly, it outperformed AlphaFold 3 on antibody design tasks, an area critical for drug development. This success underscores the 'bitter lesson' by showing a generalist model can rival or beat specialist systems when trained at scale. The advantage of this general approach is highlighted by its speed and applicability to areas where MSAs are scarce, such as novel antibody design. The paper also noted that even when MSAs are available, the general model performs well, and performance can be further improved by scaling inference-time compute (e.g., using looped refinement networks). This suggests a paradigm shift where general protein language models can serve as powerful foundation models, reducing reliance on lengthy, specialized feature engineering for many biological tasks. The ability to generate interpretable features within the model's latent space, corresponding to biological motifs and functions, further bolsters this claim, indicating deep learning of biological principles without explicit supervision.

Self-play for LLMs struggles with task generation quality

Self-play, inspired by systems like AlphaZero, offers a path to unbounded learning for LLMs by having the model generate and solve its own tasks, moving beyond human-generated data. However, a paper on 'Scaling Selfplay with Selfguidance' revealed a critical flaw: the 'conjecturer' model, tasked with generating challenging problems, tends to produce overly complex, artificial, and unhelpful tasks. Because it is rewarded simply for difficulty, it drifts toward contrived problems that do not improve the 'solver' model on genuinely useful tasks. In formal mathematics, for instance, the conjecturer generated extremely convoluted problem statements that amounted to noise for genuine problem-solving. As a result, self-play performed no better than standard Reinforcement Learning (RL) baselines and failed to progress beyond an asymptote, such as solving only 60% of formal math problems. To address this, the researchers introduced 'Self-Guided Selfplay' (SGS). SGS adds a 'guide' component that acts as a judge, evaluating whether generated synthetic tasks are genuinely related to a set of target problems (initially unsolved problems) and are not artificially complex. The conjecturer is then updated with a dual reward: one term for task difficulty and another for the guide's score. This grounds synthetic data generation in meaningful problem distributions and penalizes the creation of "junk" tasks. SGS showed improvement, allowing a smaller model to match the performance of a much larger one, but it did not fully solve the problem, indicating that refining task generation remains a significant challenge for self-play in LLMs.
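The dual reward can be sketched as follows. The summary does not say how the paper measures difficulty or combines the two terms, so this sketch assumes difficulty is the solver's failure rate and that the terms are mixed with a weighted sum; `TaskStats`, `guide_weight`, and the numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TaskStats:
    solver_successes: int  # solver attempts that solved the task
    solver_attempts: int   # total solver attempts on the task
    guide_score: float     # hypothetical judge output in [0, 1]: relevance to target problems


def difficulty_reward(stats: TaskStats) -> float:
    """Assumed difficulty signal: fraction of solver attempts that failed."""
    if stats.solver_attempts <= 0:
        raise ValueError("need at least one solver attempt")
    return 1.0 - stats.solver_successes / stats.solver_attempts


def conjecturer_reward(stats: TaskStats, guide_weight: float = 0.5) -> float:
    """Assumed combination: weighted sum of difficulty and guide score."""
    return (1.0 - guide_weight) * difficulty_reward(stats) + guide_weight * stats.guide_score


# A contrived task nobody solves, which the guide rates as unrelated junk.
contrived = TaskStats(solver_successes=0, solver_attempts=8, guide_score=0.1)
# A hard but relevant task that the solver occasionally cracks.
relevant = TaskStats(solver_successes=2, solver_attempts=8, guide_score=0.9)

print(difficulty_reward(contrived), difficulty_reward(relevant))
print(conjecturer_reward(contrived), conjecturer_reward(relevant))
```

Under difficulty alone the contrived task scores higher, which is the failure mode described above; with the guide term included, the relevant task is preferred.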

Streaming RAG enhances voice AI responsiveness

Latency is a major hurdle for natural conversational AI, especially in voice applications where rapid responses are expected. Traditional Retrieval-Augmented Generation (RAG) systems, while reducing hallucinations, add significant delay. A paper on 'Streaming RAG' proposes a solution by analyzing spoken words in real-time and initiating the RAG pipeline *while* the user is still speaking their query. This approach aims to reduce the overall interaction time by overlapping speech recognition, retrieval, LLM processing, and response generation. The core idea is to avoid waiting for the complete utterance before starting the retrieval process. The paper explores two methods: fixed-interval streaming RAG, which runs RAG on sequential audio chunks, and a more sophisticated approach that fine-tunes a model to dynamically decide when to trigger the RAG system based on the relevance and novelty of the incoming speech. This decision-making process can be based on factors like the quality of retrieval from partial queries or the semantic content of the partial utterance. Results showed latency reductions of up to 1.5 seconds on human datasets with comparable accuracy to standard RAG, making conversations feel more natural and responsive. This research highlights the importance of addressing these practical engineering challenges to unlock the full potential of conversational AI.
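The fixed-interval variant can be sketched roughly as below. This is a toy under stated assumptions: the "audio chunks" are replaced by words of an already-transcribed utterance, the retriever is a keyword-overlap stand-in for a real search index, and the function names and documents are invented for illustration.

```python
def retrieve(query, documents, top_k=1):
    """Toy keyword-overlap retriever standing in for a real retrieval system."""
    query_words = set(query.lower().split())
    scored = [(len(query_words & set(doc.lower().split())), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable sort keeps ties in input order
    return [doc for score, doc in scored[:top_k] if score > 0]


def fixed_interval_streaming_rag(word_stream, documents, interval=3):
    """Re-run retrieval on the partial transcript every `interval` words."""
    partial = []
    latest_results = []
    retrieval_calls = 0
    for word in word_stream:
        partial.append(word)
        if len(partial) % interval == 0:
            latest_results = retrieve(" ".join(partial), documents)
            retrieval_calls += 1
    # The utterance may end between intervals; retrieve once more on the full text.
    if partial and len(partial) % interval != 0:
        latest_results = retrieve(" ".join(partial), documents)
        retrieval_calls += 1
    return latest_results, retrieval_calls


documents = [
    "store opening hours and holiday schedule",
    "refund policy for online orders",
    "shipping times for international orders",
]
utterance = "what is your refund policy for orders".split()
results, calls = fixed_interval_streaming_rag(utterance, documents, interval=3)
print(results, calls)
```

In this toy run the relevant document is already retrieved after six of the seven words, which is where the latency saving would come from. The cost is extra retrieval calls on partial queries, which is presumably what the learned trigger in the second method aims to reduce.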

Lean enables verified intelligence and rigorous AI for science

The increasing success of AI on hard mathematical problems, such as IMO gold-medal results and proofs of 80-year-old conjectures, highlights a growing need for formal verification. Lean, a theorem prover and functional programming language, is at the forefront of this movement, enabling what is termed 'verified intelligence.' Unlike informal mathematics, which is flexible but prone to errors, Lean requires fully explicit proofs that are checked mechanically, so a flawed argument cannot slip through. This rigor is crucial not only for validating AI-generated proofs but also for ensuring the correctness and reproducibility of AI in science and software development. Lean's power lies in its expressivity, its support for general programming (including I/O and meta-programming), and its extensive library of formalized mathematics. Tools like 'TorchLean' allow neural networks to be formalized directly within Lean, enabling verification of properties like certified robustness and even the correctness of highly optimized operations like FlashAttention. This capability extends to verifiable code: ensuring that generated code meets rigorous specifications, a significant advance over current LLMs, which primarily generate code without guaranteed correctness. The vision is a future where scientific discoveries and software rest on a foundation of formal guarantees, increasing trust and reliability across AI applications.
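For readers who have not seen Lean, a very small example of what "explicit and mechanically checked" means is shown here. It is a generic Lean 4 illustration using only the core library, not code from the talk or from TorchLean; the theorem name is arbitrary.

```lean
-- A statement and its proof: addition on natural numbers is commutative.
-- Lean accepts this only because `Nat.add_comm` really proves the stated goal.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- A concrete fact checked by computation.
example : 2 + 2 = 4 := rfl
```

If either proof term did not match its statement, the file would fail to compile. That property is what makes Lean useful for checking AI-generated proofs.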

Agentic programming mirrors RTS games for maximum productivity

Agentic programming, which uses AI agents for software development, is likened to playing real-time strategy (RTS) games and demands a shift in traditional programming assumptions. Instead of linear, deliberate design, the focus is on hyper-parallelization, continuous feedback, and 'satisficing': achieving 'good enough' rather than perfect results. This approach, exemplified by Channel AI's workflow, involves numerous autonomous agents working in parallel, managed by an orchestrator. Key practices derived from RTS include running all work on cloud instances for portability, documenting aggressively to aid future agents, and prioritizing high agent "actions per minute" (tool calls per minute) over human intervention speed. The mindset is akin to managing an RTS army: deploy many agents simultaneously, provide minimal but timely course correction, and use audio-visual cues for high-level monitoring rather than deep dives into each agent's progress. Errors are expected and factored into the workflow, with corrections made early to save overall time. The goal is to maximize parallel execution and continuously push work forward, much like constantly producing units and micro-managing tasks across the map. Adopting these principles reportedly led to a significant increase in output, such as 60% growth in PRs per engineer per month, suggesting that optimizing for high throughput and iterative progress is more effective than striving for initial perfection in agent-assisted development.
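The "actions per minute" idea can be sketched as a trailing-window count of tool calls per agent. The talk's actual tooling is not described in this summary, so the function names, the 60-second window, and the idle threshold below are assumptions for illustration.

```python
def tool_calls_per_minute(call_timestamps, now, window_seconds=60.0):
    """Count tool calls in the trailing window and scale to a per-minute rate."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    recent = [t for t in call_timestamps if now - window_seconds < t <= now]
    return len(recent) * 60.0 / window_seconds


def idle_agents(agent_calls, now, min_apm=1.0):
    """Names of agents whose recent tool-call rate is below `min_apm`."""
    return sorted(
        name
        for name, timestamps in agent_calls.items()
        if tool_calls_per_minute(timestamps, now) < min_apm
    )


# Hypothetical tool-call timestamps (seconds since session start) per agent.
agent_calls = {
    "agent-a": [100.0, 110.0, 130.0, 150.0],
    "agent-b": [20.0],
}
now = 160.0
print(tool_calls_per_minute(agent_calls["agent-a"], now))
print(idle_agents(agent_calls, now))
```

An agent flagged here has stalled or is waiting on input, which is the cue for the "minimal but timely course correction" described above.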

Developing AI-Powered Software

Practical takeaways from this episode

Do This

Run all processes, including scripts, from cloud instances for portability.

Utilize Git work trees for parallel development and avoid stepping on each other's toes.

Employ autonomous agents, one or many, for specific workflows.

Minimize human keystrokes to initiate work; course-correct later.

Use agents to push work as far as possible before requiring human feedback.

Aggressively document code and create structured, linked knowledge base files.

Spawn multiple agents and processes in parallel to maximize cognitive capacity.

Utilize audio cues and visual indicators (colors, icons) for better monitoring.

Track agent activity via tool calls per minute (APM) to ensure productivity.

Spend all available AI resources (tokens) efficiently; never leave them unused.

Practice satisficing: aim for 'good enough' rather than perfect solutions.

Mix different ticket sizes to keep processes moving and occupied.
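One way to set up the Git work trees mentioned in the list above is to give each agent task its own branch and checkout directory. The sketch below only builds the `git worktree add` commands and prints them without running anything; the repository path, the `agent/` branch prefix, the directory layout, and the task names are hypothetical.

```python
import shlex


def worktree_commands(repo_dir, task_names, base_branch="main"):
    """Build one `git worktree add` command per task (commands are not executed)."""
    commands = []
    for task in task_names:
        branch = f"agent/{task}"                  # hypothetical branch naming scheme
        path = f"{repo_dir}-worktrees/{task}"     # hypothetical sibling directory layout
        commands.append(
            ["git", "-C", repo_dir, "worktree", "add", "-b", branch, path, base_branch]
        )
    return commands


commands = worktree_commands("/srv/app", ["fix-login", "add-metrics"])
for command in commands:
    print(shlex.join(command))
```

Each agent then works in its own directory on its own branch, so parallel edits do not collide until they are merged.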

Avoid This

Avoid typing anything outside of the cloud instances if possible.

Do not assume agents will adhere rigorously to initial specs; expect them to learn and adapt.

Do not be afraid of agents making assumptions; they can be corrected.

Avoid running code locally if portability and overnight execution are needed elsewhere.

Never trust LLMs to accurately predict task completion time.

Do not rely solely on code as the source of truth; dense documentation is more accessible for agents.

Avoid having idle AI resources (tokens); they represent an inefficient economy.

Do not let agents go in wrong directions without early detection and correction.

Common Questions

What is the 'Bitter Lesson'?

The 'Bitter Lesson' refers to Richard Sutton's observation that AI research progresses most effectively by scaling compute and data, rather than by relying on hand-engineered human knowledge.

Topics

Reinforcement Learning, AI & Machine Learning, Technology & Innovation, Science & Mathematics, Large Language Models, Software Engineering, Protein Folding, Formal Verification, Agentic Programming, Voice Artificial Intelligence

Mentioned in this video

People

Luke Worthwine

AI token maxer, presenter discussing his work.

Robert George

Presenter discussing Lean for science, a PhD student at Caltech.

Magnus Carlsen

Mentioned in the context of continuous learning and monotonic improvement.

Steve Quake

Co-advisor of the speaker at Stanford, known for work in bioengineering and former director of Biohub.

Richard Sutton

Author of the famous 'Bitter Lesson' article in AI.

Dan Fu

Alumnus from the speaker's lab whose work on looped models is built upon in the ESM projection networks.

Companies

Cohere

Company that released the Composer 2 technical report.

Giga

Company where speaker Arnob is a researcher, working on Streaming RAG.

LoRA

Low-Rank Adaptation, discussed as an alternative for fine-tuning, showing impressive performance at lower sample sizes.

DeepMind

Company that released work on solving mathematical problems and uses formal verification.

Harmonic AI

Company whose AI model solved all problems in the POKAM competition.

Math Inc

Company behind the 'Fields Medal work' related to AI and math.

Channel 4

The startup founded by Luke Worthwine, focused on consumer entertainment AI and automating development.

OpenAI

Company that claimed to solve an 80-year-old mathematical problem using AI.

Software & Apps

AlphaGo

Mentioned as a predecessor to AlphaZero and a benchmark in AI development.

ESM2

Previous generation of protein language models from the same group, which showed diminishing returns with parameter scaling.

AlphaFold

A landmark AI model in biology for protein structure prediction, which relies on multiple sequence alignments (MSAs).

Composer 2

Technical report from Cohere illustrating the benefits of scaling RL tasks.

Lean

A theorem prover and programming language used for formal mathematics and verification.

Bridge

A framework using Lean as a functional programming language to help LLMs prove code.

PyTorch

A deep learning framework whose style is used for the TorchLean system.

Claude

Mentioned as an orchestrator agent for software development and as a source for generating presentations and code.

Codex

Mentioned as an alternative orchestrator agent for software development.

Git

Version control system, with its work trees discussed as a useful tool for parallel development.

CSL

A project started by Clark Barrett's group at Stanford for contributing to CS concepts.

TorchLean

A unified framework for writing and verifying neural networks in Lean.

Organizations

Biohub

Research institution in the Bay Area where the paper on AI for biology was developed.

Drugs & Medications

PD-L1

A target protein for immunotherapy, related to a successful medication for cancer treatment.

Media

A two-player game used as an example where selfplay led to performance beyond human capability.

StarCraft

Real-time strategy game used for audio cues in agentic programming.

Books

Warcraft

Real-time strategy game used as an analogy and for audio cues in agentic programming.
