How do diffusion models enhance drug discovery, particularly in 3D structure prediction?

Diffusion models are proving to be a much more useful primitive for 3D structure prediction in drug discovery than earlier generative models such as GANs. They allow predicted structures to be refined iteratively, guided by physics-based feedback, which leads to more accurate and generalizable molecular models for understanding protein-ligand interactions.

What is the Pearl model and what makes it unique for small molecule drug discovery?

Pearl is Genesis's structure prediction model that takes a protein sequence and ligand representation to predict their combined 3D structure. Its uniqueness lies in its focus on the vast search space of small molecules and its ability to go beyond mere pattern matching, creating generalizable models capable of extrapolating to unseen targets, partly by generating synthetic training data and incorporating physical priors.

Why is the one angstrom threshold critical for drug discovery AI models?

The one angstrom threshold is crucial because molecular interactions, especially hydrogen bonds, depend on atomic positions at extremely high resolution. Predictions with larger errors, such as two angstroms, can contain fundamental mistakes (e.g., a mispositioned aromatic ring), which makes the structural hypothesis useless to medicinal chemists and leads to wrong downstream predictions.

How does Genesis address the challenge of data scarcity in drug discovery AI?

Genesis addresses data scarcity by generating synthetic data with physics-based modeling of small-molecule behavior, which is tractable to simulate, unlike protein-to-protein interactions, which are too complex to model computationally. The additional data lets them train better models, an approach analogous to pre-training scaling in LLMs.

What is the role of 'agents' in Genesis Molecular AI's drug discovery platform?

Genesis is developing an agentic platform, code-named Sapphire, for automated 24/7 drug discovery. These agents, powered by LLMs, orchestrate various underlying AI models (for pose prediction, potency, ADMET) and make decisions based on predicted crystal structures, allowing medicinal chemists to focus on strategic direction rather than manual tool operation.

How does Genesis collaborate with pharmaceutical partners like Incyte?

Genesis partners with pharma companies like Incyte to provide AI services. They fine-tune their foundational models on partner data to accelerate drug development (e.g., reaching development candidates faster) and to identify first-in-class chemical matter for challenging targets without known binders. This partnership involves rapid design-make-test-analyze cycles and continuous model retraining.

What are the limitations of robotic labs and automation in current drug discovery?

While automation can perform routine tasks quickly, current robotic labs struggle with the cutting-edge aspects of drug discovery. Synthesizing novel molecules, purifying them, and accurately characterizing them remains complex and labor-intensive. High-throughput screens also suffer from high false-positive rates due to the challenges of finding true outliers and broad chemical space exploration, making human expertise indispensable for complex research.

What is the biggest bottleneck faced by the AI drug discovery industry?

According to the guests, the most significant bottleneck is the shortage of GPUs, as large language model companies are consuming most of the available capacity. They argue that drug discovery is critical for humanity and that chipmakers should invest more in life sciences AI due to sustained demand and diminishing returns in pure LLM applications.

What does 'ADMET' stand for and why is it important in drug discovery?

ADMET stands for Absorption, Distribution, Metabolism, Excretion, and Toxicity. These are critical properties a molecule must have to be a successful drug: it must be orally bioavailable, reach the target tissue, be metabolized safely, be excreted effectively, and not cause adverse side effects. Predicting these properties is as important as predicting crystal structure.

Key Moments

🔬 "The Most Innovative Diffusion Research Is Happening in Drug Discovery, Not Image Generation"

Q: What is the primary focus of Genesis Molecular AI and how has it evolved?

Genesis Molecular AI, formerly Genesis Therapeutics, focuses on applying deep learning to drug discovery. Initially, they aimed to show AI's practical value in biotech, evolving from a pure AI research company to a full-stack entity that partners with pharma while also developing its own pipeline, recognizing the shift in pharma's interest in buying AI tools.

Latent Space Podcast

Science & Technology | 6 min read | 109 min video

Jun 30, 2026 | 904 views

TL;DR

Diffusion models, initially developed for images, are now revolutionizing drug discovery by predicting 3D molecular structures with unprecedented accuracy, enabling the development of medicines for targets previously considered 'undruggable.'

Key Insights

The frontier of diffusion research is shifting from image generation to 3D structure prediction, with diffusion models proving much more useful than GANs for proteins and small molecules.

Genesis's PEARL model predicts protein-ligand complex structures with sub-angstrom accuracy, a level crucial for predicting binding affinity and designing effective drugs.

The pharmaceutical industry increasingly relies on specialized AI tools. Genesis focuses on small and medium-sized molecule discovery, and small molecules alone represent 65% of FDA-approved drugs.

Genesis is leveraging synthetic data, physics-based simulations, and LLM-inspired 'thinking' techniques to train its AI models for drug discovery.

The company emphasizes integrating AI models with lab experiments through partnerships, enabling rapid feedback loops for continuous model improvement.

While AI agents can automate tasks, human expertise remains critical for strategic direction and validating AI-generated hypotheses in drug discovery.

Shifting AI innovation from images to molecular structures

The frontier of AI innovation has notably shifted from image generation to 3D structure prediction, particularly in drug discovery. Historically, generative adversarial networks (GANs) were considered the future for image generation, but they proved less effective for complex biological systems like proteins. The advent of diffusion models, however, has provided a more powerful primitive. Sergey Edunov, who formerly led Llama 2 and 3 pretraining at Meta, highlights that the most innovative diffusion research is now happening in the field of 3D structure prediction, a development few would have predicted. This shift is enabling breakthroughs in understanding how molecules interact with biological targets, a critical step in designing new medicines.

Achieving sub-angstrom accuracy for molecular interactions

Genesis's PEARL (Placing Every Atom in the Right Location) model exemplifies this advancement by accurately predicting the 3D structures of protein-ligand complexes. Achieving sub-angstrom accuracy is paramount because the precise positioning of individual atoms dictates molecular interactions. Traditional benchmarks that count a pose as correct when its error is below two angstroms are insufficient, as they can allow significant errors, such as flipped aromatic rings, which drastically alter a molecule's behavior. PEARL's ability to predict structures with sub-angstrom precision is crucial for accurately assessing binding affinity and designing potent drug candidates, especially for targets previously considered 'undruggable.' This level of detail is essential for any downstream application, from potency prediction to prospective molecular design.
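To make the threshold argument concrete, here is a minimal sketch of a ligand RMSD check. It assumes both poses share the same frame and atom ordering and applies no symmetry correction; the function name and the toy coordinates are illustrative only and do not come from the episode or from any real ligand.

```python
import math

def ligand_rmsd(pred, ref):
    """Root-mean-square deviation (angstroms) between two poses.

    Assumes identical atom ordering, a shared coordinate frame,
    and no symmetry correction.
    """
    if len(pred) != len(ref) or not pred:
        raise ValueError("poses must be non-empty and the same length")
    squared = sum(math.dist(p, r) ** 2 for p, r in zip(pred, ref))
    return math.sqrt(squared / len(pred))

# Toy six-atom "ligand" (made-up coordinates, for illustration only).
ref = [(0.0, 0.0, 0.0), (1.4, 0.0, 0.0), (2.8, 0.0, 0.0),
       (4.2, 0.0, 0.0), (5.6, 0.0, 0.0), (7.0, 0.0, 0.0)]
# Half of the atoms are placed exactly; the other half are off by 2.0 A.
pred = ref[:3] + [(x, y + 2.0, z) for (x, y, z) in ref[3:]]

rmsd = ligand_rmsd(pred, ref)
print(f"RMSD = {rmsd:.2f} A")          # RMSD = 1.41 A
print("passes 2 A cutoff:", rmsd < 2.0)  # True
print("passes 1 A cutoff:", rmsd < 1.0)  # False
```

In this toy case half of the molecule is misplaced by two angstroms, yet the pose still counts as a success under a two-angstrom cutoff. Only the one-angstrom cutoff rejects it.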

The challenge of vast chemical space and synthetic data generation

The search for new drugs is hindered by an astronomically large chemical space, estimated to contain 10^60 drug-like small molecules. To navigate this, AI models require extensive training data. While public databases like the Protein Data Bank (PDB) offer crystal structures, they are limited in size and slow to expand, growing at a 'glacial pace.' Genesis overcomes this by leveraging physics-based simulations to model small molecules, generating synthetic data at a much lower cost. This approach allows for training models on a more comprehensive dataset than currently available from experimental structures alone. This synthetic data, combined with real-world data, forms the foundation for their powerful generative models.
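A quick back-of-the-envelope calculation shows why a space of 10^60 molecules cannot be searched exhaustively. The evaluation rate of one billion molecules per second is an assumption chosen purely for illustration, not a figure from the episode.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # 31,557,600 seconds

def years_to_enumerate(n_molecules, molecules_per_second):
    """Years needed to evaluate every molecule once at a fixed rate."""
    return n_molecules / molecules_per_second / SECONDS_PER_YEAR

# 1e60 drug-like molecules (from the text) at an assumed 1e9 per second.
years = years_to_enumerate(1e60, 1e9)
print(f"{years:.1e} years")  # 3.2e+43 years
```

Even at that optimistic rate, brute-force enumeration would take on the order of 10^43 years, which is why models have to generalize from a comparatively tiny set of training structures.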

Applying LLM scaling principles to drug discovery

The development of Genesis's models draws parallels to the scaling strategies used for large language models (LLMs). This includes pre-training on vast datasets, followed by fine-tuning and inference-time scaling. A key innovation is 'inference time scaling,' analogous to LLMs 'thinking' in tokens. In Genesis's models, this translates to iterative refinement of predicted structures in memory, guided by physics-based principles. This allows the model to 'think' about potential molecular configurations, improving performance significantly. The core of their models also utilizes diffusion-based heads, which are inherently iterative, and can be steered during the refinement process.
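The idea of steering an iterative refinement with physics can be sketched with a toy loop. This is not Genesis's method, not a trained diffusion model, and not a proper diffusion sampler: the "model estimate" is a hard-coded placeholder, the only physics term is a harmonic bond-length penalty, and all names and step sizes are assumptions made for illustration.

```python
import math
import random

def bond_grad(a, b, target):
    """Gradient of 0.5 * (|a - b| - target)**2 with respect to atom a.

    The gradient with respect to atom b is the negative of this vector.
    """
    d = math.dist(a, b)
    if d == 0.0:
        return (0.0, 0.0, 0.0)
    scale = (d - target) / d
    return tuple(scale * (ai - bi) for ai, bi in zip(a, b))

def refine(a, b, estimate, target, steps=50, pull=0.2, guidance=0.25, seed=0):
    """Toy refinement of a two-atom structure over several steps."""
    rng = random.Random(seed)
    est_a, est_b = estimate
    for t in range(steps):
        # 1) Stand-in for a denoising step: move toward the model's estimate.
        a = tuple(x + pull * (e - x) for x, e in zip(a, est_a))
        b = tuple(x + pull * (e - x) for x, e in zip(b, est_b))
        # 2) Physics guidance: gradient step on the bond-length penalty.
        g = bond_grad(a, b, target)
        a = tuple(x - guidance * gi for x, gi in zip(a, g))
        b = tuple(x + guidance * gi for x, gi in zip(b, g))
        # 3) Noise that is annealed to zero by the final step.
        sigma = 0.1 * (1.0 - (t + 1) / steps)
        a = tuple(x + sigma * rng.gauss(0.0, 1.0) for x in a)
        b = tuple(x + sigma * rng.gauss(0.0, 1.0) for x in b)
    return a, b

start_a, start_b = (0.0, 0.0, 0.0), (3.0, 0.0, 0.0)   # bond far too long
estimate = ((0.0, 0.0, 0.0), (1.6, 0.0, 0.0))          # placeholder "model" output
final_a, final_b = refine(start_a, start_b, estimate, target=1.5)
print(f"final bond length: {math.dist(final_a, final_b):.2f} A")
```

The refined bond length settles between the placeholder estimate (1.6) and the physical target (1.5), which is the basic behavior guidance is meant to produce. Running more steps is the loose analogue of spending more compute at inference time.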

Integrating physics and AI for robust and interpretable models

Genesis emphasizes incorporating physical priors into their AI models without overly biasing them. This approach views AI as representation learning, similar to how convolutional neural networks bake in image priors or transformers capture language sequence properties. In drug discovery, this means understanding molecular behavior through physics. The input stage involves generating diverse training data using physics, the model architecture aims to minimize relearning physics principles, and the output stage enforces physicality. This holistic approach enhances both model performance and interpretability, ensuring outputs are physically meaningful and useful for downstream applications.

Focus on small and medium-sized molecules

Genesis is strategically focused on small and medium-sized molecule discovery, which includes traditional pills (small molecules) and modalities like macrocycles and peptides. Small molecules constitute 65% of FDA-approved drugs, making this a critical area. This focus allows for a deep understanding of the specific needs of drug hunters and medicinal chemists. By prioritizing these molecule types, Genesis aims to build models that are directly applicable and usable by human experts, enabling them to discover drugs faster and more effectively than before.

The role of AI agents in automating drug discovery

The company is developing an agentic platform for 24/7 drug discovery, codenamed Sapphire. This builds upon the recent advancements in AI agents, particularly those orchestrated by LLMs. The prerequisite for this platform was ensuring the underlying models for 3D pose prediction, potency, and ADMET properties were robust enough for an agent to use them continuously. These agents can leverage complex tools and orchestrate them without requiring deep expertise from human chemists in every specific parameter. This transforms human roles into grand strategists, guiding the agents which execute complex tasks, significantly boosting productivity and creativity.
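The orchestration layer described above can be pictured as a tool registry that an agent calls on each candidate. The sketch below is hypothetical throughout: it omits the LLM planning step entirely, the three "tools" are placeholder functions returning dummy scores with no chemical meaning, and none of the names reflect Sapphire's actual design.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Candidate:
    smiles: str
    scores: Dict[str, float]

def screen(smiles_list: List[str],
           tools: Dict[str, Callable[[str], float]],
           minimums: Dict[str, float]) -> List[Candidate]:
    """Run every tool on every molecule and keep those meeting all minimums."""
    kept = []
    for smiles in smiles_list:
        scores = {name: tool(smiles) for name, tool in tools.items()}
        if all(scores[name] >= floor for name, floor in minimums.items()):
            kept.append(Candidate(smiles, scores))
    return kept

# Placeholder tools: arbitrary string rules, NOT real predictors.
tools = {
    "pose_confidence": lambda s: 0.9,
    "potency": lambda s: 0.1 * len(s),
    "admet": lambda s: 0.0 if "Cl" in s else 1.0,
}
minimums = {"pose_confidence": 0.8, "potency": 0.5, "admet": 0.5}

hits = screen(["CCO", "c1ccccc1O", "ClCCCCCl"], tools, minimums)
print([c.smiles for c in hits])  # ['c1ccccc1O']
```

In a real system each placeholder would be replaced by a call to a trained model, and an LLM-driven agent would decide which tools to run and how to act on the scores. The chemist's role shifts to setting the goals and thresholds.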

Addressing ADMET properties beyond structure prediction

While 3D structure prediction is crucial, developing a successful drug requires optimizing numerous other properties, collectively known as ADMET (Absorption, Distribution, Metabolism, Excretion, and Toxicity). Genesis has a long history of developing models for these properties, recognizing that a molecule's effectiveness hinges on more than just binding affinity. Predicting over 30 different ADMET properties, such as solubility, oral bioavailability, and potential for cardiotoxicity, is vital. These properties often involve complex biological pathways and are sometimes hard to predict due to small, sparse datasets in the public domain. Genesis's full-stack approach aims to build models for all aspects of drug discovery, not just structure prediction.

The Genesis/Incyte partnership and lab-in-the-loop development

A key partnership with Incyte enables a 'lab-in-the-loop' approach, crucial for continuous model improvement. Incyte's experimental capabilities and data generation provide rapid feedback on Genesis's AI predictions. This synergy allows for rapid design-make-test-analyze cycles, in which new molecules are synthesized based on AI predictions, their properties are measured in the lab, and the results are fed back to retrain and refine the models. This close collaboration is vital, especially given the complexities of chemical synthesis and the need for rigorous validation, which often limit purely robotic automation.

The importance of high-resolution data and evolving benchmarks

Genesis's commitment to sub-angstrom resolution is driven by the practical limitations of lower-resolution predictions in drug discovery. The company highlights that historical benchmarks like RMSD < 2 angstroms are insufficient and that the field is moving towards more rigorous metrics like PoseBusters validity and LDDT. The recent Open-Blink challenge, which featured a dynamic target, further demonstrated PEARL's superior performance over other publicly available models, showcasing its ability to handle complex biological scenarios not seen during training. This focus on high-resolution, clinically relevant benchmarks is essential for developing AI models that truly accelerate medicine development.

Topics

AI Agents, AI & Machine Learning, Science & Mathematics, Drug Discovery, Deep Learning, Structural Biology, Medicine & Clinical, Protein-ligand Binding, Molecular Simulation, Pharmaceutical Development

Mentioned in this video

People

R.J. Haniki

CTO of Mirmix and co-host of the podcast.

Brandon Sanderson

Co-host of the podcast.

Evan Feinberg

Founder and CEO of Genesis Molecular AI, with a background in physics and computer science, focused on AI for medicine.

Sergey Edunov

Former lead of Llama 2 and Llama 3 pre-training at Meta, now CTO of Genesis Molecular AI, bringing his expertise in large language models to drug discovery.

Concepts

diffusion models

A class of generative AI models that superseded GANs in utility for drug discovery, particularly in 3D structure prediction, and are considered a more useful primitive for the space.

OpenBMMT

A benchmark set that Genesis Molecular AI uses, incorporating improved metrics like RMSD, PoseBusters validity, and LDDT, reflecting the evolving standards in the field.

PoseBusters

A metric for evaluating the physical validity of 3D poses, developed by a lab at the University of Oxford. It improved upon traditional RMSD benchmarks.

DNA-encoded libraries

A high-throughput screening method for drug discovery, mentioned for its high false positive rate when predicting actual molecule efficacy.

Software & Apps

Llama 3

A large language model for which Sergey Edunov led pre-training at Meta.

OpenFold 3

A protein folding prediction model, mentioned alongside AlphaFold 3 as having similar ideas.

Claude

A large language model, used as an example of a pattern-matching AI that performs best on data similar to its training set.

PyTorch

A machine learning framework used by Genesis Molecular AI, mentioned in the context of their early work and scaling graph neural networks.

ESMFold

A protein structure prediction model mentioned as having architectural similarities or drawing inspiration from Genesis's model architecture.

LLaMA 2

A large language model for which Sergey Edunov led pre-training at Meta.

ChatGPT

A large language model whose early versions significantly improved over time, serving as an analogy for iterative development in AI for drug discovery.

Pearl

Genesis Molecular AI's structure prediction model that takes a protein sequence and a ligand representation to predict their 3D structure together, noted for its ability to go beyond pattern matching and deliver generalizable models.

AlphaFold 3

A protein folding prediction model, mentioned in context of co-folding, though the discussion clarifies that protein folding itself is not the sole solution to drug discovery.

RoseTTAFold

A protein folding prediction model, mentioned alongside AlphaFold 3 as having similar ideas.

LLMs

Discussed as a parallel for scaling and development stages (pre-training, post-training, inference) for molecular AI, and an analogy for agent systems.

Gemini

A language model mentioned as performing well on SWE-bench, but implicitly criticized as less practically useful for coding than its competitors.

Sapphire

The codename for Genesis Molecular AI's agentic platform for 24/7 drug discovery, designed to orchestrate underlying models for pose prediction, potency, and ADMET.

GitHub Copilot

An AI coding tool, used as an analogy for the progression of AI agents, from basic autocomplete utility to more comprehensive code generation.

ChatGPT Roslyn

An initiative related to large language models (LLMs) and life sciences, cited as an example of LLM companies moving into the life sciences space.
