Key Moments
🔬 "The Most Innovative Diffusion Research Is Happening in Drug Discovery, Not Image Generation"
Diffusion models, originally developed for image generation, are now reshaping drug discovery by predicting 3D molecular structures with sub-angstrom accuracy, opening up targets previously considered 'undruggable.'
Key Insights
The most innovative diffusion research has shifted from image generation to 3D structure prediction, where diffusion models have proved a far more useful primitive than GANs for proteins and small molecules.
Genesis's PEARL model predicts protein-ligand complex structures with sub-angstrom accuracy, a level crucial for predicting binding affinity and designing effective drugs.
The pharmaceutical industry increasingly relies on specialized AI tools. Genesis focuses on small and medium-sized molecule discovery; small molecules alone account for 65% of FDA-approved drugs.
Genesis is leveraging synthetic data, physics-based simulations, and LLM-inspired 'thinking' techniques to train its AI models for drug discovery.
The company emphasizes integrating AI models with lab experiments through partnerships, enabling rapid feedback loops for continuous model improvement.
While AI agents can automate tasks, human expertise remains critical for strategic direction and validating AI-generated hypotheses in drug discovery.
Shifting AI innovation from images to molecular structures
The frontier of AI innovation has notably shifted from image generation to 3D structure prediction, particularly in drug discovery. Historically, generative adversarial networks (GANs) were considered the future for image generation, but they proved less effective for complex biological systems like proteins. The advent of diffusion models, however, has provided a more powerful primitive. Sergey Udov, formerly leading Llama 2 and 3 pretraining at Meta, highlights that the most innovative diffusion research is now happening in the field of 3D structure prediction, a development few would have predicted. This shift is enabling breakthroughs in understanding how molecules interact with biological targets, a critical step in designing new medicines.
Achieving sub-angstrom accuracy for molecular interactions
Genesis's PEARL (Place Every Atom at the Right Location) model exemplifies this advancement by accurately predicting the 3D structures of protein-ligand complexes. Sub-angstrom accuracy matters because the precise position of each atom dictates how molecules interact. Traditional benchmarks that count a pose as correct when its RMSD is under 2 Å are insufficient: they tolerate significant errors, such as flipped aromatic rings, that drastically alter a molecule's behavior. PEARL's sub-angstrom precision is crucial for accurately assessing binding affinity and designing potent drug candidates, especially for targets previously considered 'undruggable.' This level of detail is essential for any downstream application, from potency prediction to prospective molecular design.
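To make the point about the 2 Å cutoff concrete, here is a minimal sketch, not taken from the episode. It builds a hypothetical 25-atom toy ligand (a six-membered ring with one substituent plus an 18-atom scaffold lying on the flip axis), flips the ring by 180 degrees, and computes a plain matched-atom RMSD with no symmetry correction. All coordinates, atom counts, and function names are illustrative assumptions.

```python
import math

def rmsd(pred, ref):
    """Plain RMSD over equally ordered (x, y, z) atoms; no alignment, no symmetry correction."""
    if len(pred) != len(ref):
        raise ValueError("poses must have the same number of atoms")
    sq = sum((p[k] - r[k]) ** 2 for p, r in zip(pred, ref) for k in range(3))
    return math.sqrt(sq / len(ref))

def max_deviation(pred, ref):
    """Largest single-atom displacement between the two poses."""
    return max(math.dist(p, r) for p, r in zip(pred, ref))

def toy_ligand():
    """Hypothetical ligand: ring (radius 1.4 Å), one substituent, 18 scaffold atoms on the x axis."""
    ring = [(1.4 * math.cos(math.radians(a)), 1.4 * math.sin(math.radians(a)), 0.0)
            for a in range(0, 360, 60)]
    substituent = [(2.75 * math.cos(math.radians(60)), 2.75 * math.sin(math.radians(60)), 0.0)]
    scaffold = [(-2.9 - 1.5 * k, 0.0, 0.0) for k in range(18)]
    return ring + substituent + scaffold

def flip_about_x(atoms):
    """Rotate every atom 180 degrees about the x axis; atoms on the axis do not move."""
    return [(x, -y, -z) for x, y, z in atoms]

reference = toy_ligand()
flipped = flip_about_x(reference)
pose_rmsd = rmsd(flipped, reference)
worst_atom = max_deviation(flipped, reference)
print(f"RMSD: {pose_rmsd:.2f} Å, worst atom: {worst_atom:.2f} Å")
```

In this toy case the flipped pose still clears a 2 Å cutoff (RMSD of about 1.36 Å) even though the substituent ends up roughly 4.8 Å from its reference position, which is the kind of error a sub-angstrom criterion would catch.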
The challenge of vast chemical space and synthetic data generation
The search for new drugs is hindered by an astronomically large chemical space, estimated to contain 10^60 drug-like small molecules. To navigate this, AI models require extensive training data. Public databases like the Protein Data Bank (PDB) offer crystal structures, but they are limited in size and grow at a 'glacial pace.' Genesis overcomes this by using physics-based simulations to model small molecules, generating synthetic data at a much lower cost than solving structures experimentally. This allows models to be trained on a more comprehensive dataset than experimental structures alone can provide. The synthetic data, combined with real-world data, forms the foundation for their generative models.
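A back-of-the-envelope calculation shows why a space of 10^60 molecules cannot be searched exhaustively. The throughput of one billion molecules per second is a purely hypothetical figure chosen for illustration; only the 10^60 estimate comes from the summary above.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # about 3.16e7 seconds

def years_to_enumerate(n_molecules, molecules_per_second):
    """Years needed to visit every molecule once at a fixed throughput."""
    return n_molecules / molecules_per_second / SECONDS_PER_YEAR

# Hypothetical throughput: one billion molecules evaluated per second.
years = years_to_enumerate(1e60, 1e9)
print(f"{years:.2e} years")
```

Even at that rate the enumeration would take on the order of 10^43 years, so models have to generalize from a tiny sampled fraction of the space rather than enumerate it.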
Applying LLM scaling principles to drug discovery
The development of Genesis's models parallels the scaling strategies used for large language models (LLMs): pre-training on vast datasets, followed by fine-tuning and inference-time scaling. Inference-time scaling is a key innovation here, analogous to LLMs 'thinking' in tokens. In Genesis's models, it takes the form of iterative refinement of predicted structures in memory, guided by physics-based principles. This lets the model 'think' about candidate molecular configurations and significantly improves performance. The models also use diffusion-based heads, which are inherently iterative and can be steered during refinement.
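The idea of steering an iterative, diffusion-style refinement with a physics term can be sketched on a toy two-atom system. This is an illustration under stated assumptions, not Genesis's method: `model_denoise` is a hypothetical stand-in for a learned denoiser whose preferred bond is too short (1.2 Å), `IDEAL_BOND` is an assumed ideal length, and the guidance term is the gradient of a simple harmonic bond energy.

```python
import math
import random

IDEAL_BOND = 1.5  # Å, assumed ideal bond length for this toy example

def model_denoise(coords):
    """Hypothetical stand-in for a learned denoiser: always proposes a pose with a 1.2 Å bond."""
    return [(0.0, 0.0, 0.0), (1.2, 0.0, 0.0)]

def bond_energy_grad(coords):
    """Gradient of E = (d - IDEAL_BOND)^2 with respect to both atom positions."""
    a, b = coords
    d = math.dist(a, b)
    coef = 2.0 * (d - IDEAL_BOND) / d
    grad_b = tuple(coef * (b[k] - a[k]) for k in range(3))
    grad_a = tuple(-g for g in grad_b)
    return [grad_a, grad_b]

def refine(steps, guidance, seed=0):
    """Iteratively refine a random start; returns the final bond length."""
    rng = random.Random(seed)
    x = [tuple(rng.gauss(0.0, 1.0) for _ in range(3)) for _ in range(2)]
    for t in range(steps):
        noise_scale = 1.0 - (t + 1) / steps  # injected noise decays to zero at the last step
        proposal = model_denoise(x)
        grad = bond_energy_grad(x)
        x = [
            tuple(
                xi[k]
                + 0.5 * (pi[k] - xi[k])               # move toward the denoiser's proposal
                - guidance * gi[k]                    # physics-based steering term
                + 0.1 * noise_scale * rng.gauss(0.0, 1.0)
                for k in range(3)
            )
            for xi, pi, gi in zip(x, proposal, grad)
        ]
    return math.dist(x[0], x[1])

unguided = refine(steps=50, guidance=0.0)
guided = refine(steps=50, guidance=0.25)
print(f"unguided bond: {unguided:.2f} Å, guided bond: {guided:.2f} Å")
```

Without guidance the refinement settles on the denoiser's too-short bond; with guidance it settles between the denoiser's preference and the physical ideal. Spending more refinement steps, or weighting the physics term differently, is one way to trade inference compute for quality in a setup like this.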
Integrating physics and AI for robust and interpretable models
Genesis emphasizes incorporating physical priors into their AI models without overly biasing them. This approach views AI as representation learning, similar to how convolutional neural networks bake in image priors or transformers capture language sequence properties. In drug discovery, this means understanding molecular behavior through physics. The input stage involves generating diverse training data using physics, the model architecture aims to minimize relearning physics principles, and the output stage enforces physicality. This holistic approach enhances both model performance and interpretability, ensuring outputs are physically meaningful and useful for downstream applications.
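As a rough illustration of what enforcing physicality at the output stage could involve, the sketch below flags two simple problems in a predicted pose: non-bonded atoms that sit too close together and bonds with implausible lengths. The cutoffs, the single ideal bond length, and the function names are assumptions made for this example; real validity checks are considerably more detailed.

```python
import math
from itertools import combinations

MIN_NONBONDED_DIST = 2.0  # Å, assumed clash cutoff for illustration
IDEAL_BOND = 1.5          # Å, assumed ideal length for every bond in the toy molecule
BOND_TOLERANCE = 0.25     # Å, assumed allowed deviation from the ideal length

def find_clashes(coords, bonds, min_dist=MIN_NONBONDED_DIST):
    """Return index pairs of non-bonded atoms closer than min_dist."""
    bonded = {frozenset(b) for b in bonds}
    return [
        (i, j)
        for i, j in combinations(range(len(coords)), 2)
        if frozenset((i, j)) not in bonded and math.dist(coords[i], coords[j]) < min_dist
    ]

def bond_length_outliers(coords, bonds, ideal=IDEAL_BOND, tol=BOND_TOLERANCE):
    """Return bonds whose length deviates from the ideal by more than tol."""
    return [(i, j) for i, j in bonds if abs(math.dist(coords[i], coords[j]) - ideal) > tol]

# Toy four-atom chain 0-1-2-3.
bonds = [(0, 1), (1, 2), (2, 3)]
good_pose = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.25, 1.3, 0.0), (3.75, 1.3, 0.0)]
bad_pose = good_pose[:3] + [(0.5, 0.5, 0.0)]  # last atom folded back onto the chain

print(find_clashes(good_pose, bonds), bond_length_outliers(good_pose, bonds))
print(find_clashes(bad_pose, bonds), bond_length_outliers(bad_pose, bonds))
```

A pose that fails checks like these can be rejected or sent back for further refinement before it reaches downstream potency or design models.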
Focus on small and medium-sized molecules
Genesis is strategically focused on discovering small and medium-sized molecules, which include traditional pills (small molecules) and modalities like macrocycles and peptides. Small molecules constitute 65% of FDA-approved drugs, making this a critical area. This focus allows for a deep understanding of the specific needs of drug hunters and medicinal chemists. By prioritizing these molecule types, Genesis aims to build models that human experts can apply directly, enabling them to discover drugs faster and more effectively than before.
The role of AI agents in automating drug discovery
The company is developing an agentic platform for 24/7 drug discovery, codenamed Sapphire. This builds upon the recent advancements in AI agents, particularly those orchestrated by LLMs. The prerequisite for this platform was ensuring the underlying models for 3D pose prediction, potency, and ADMET properties were robust enough for an agent to use them continuously. These agents can leverage complex tools and orchestrate them without requiring deep expertise from human chemists in every specific parameter. This transforms human roles into grand strategists, guiding the agents which execute complex tasks, significantly boosting productivity and creativity.
Addressing ADMET properties beyond structure prediction
While 3D structure prediction is crucial, developing a successful drug requires optimizing numerous other properties, collectively known as ADMET (Absorption, Distribution, Metabolism, Excretion, and Toxicity). Genesis has a long history of developing models for these properties, recognizing that a molecule's effectiveness hinges on more than just binding affinity. Predicting over 30 different ADMET properties, such as solubility, oral bioavailability, and potential for cardiotoxicity, is vital. These properties often involve complex biological pathways and are sometimes hard to predict due to small, sparse datasets in the public domain. Genesis's full-stack approach aims to build models for all aspects of drug discovery, not just structure prediction.
The Genesis/Insight partnership and lab-in-the-loop development
A key partnership with Insight enables a 'lab-in-the-loop' approach, crucial for continuous model improvement. Insight's expertise in experimental capabilities and data generation provides rapid feedback on Genesis's AI predictions. This synergy allows for rapid design-make-test-analyze cycles, where new molecules are synthesized based on AI predictions, their properties measured in the lab, and the results fed back to retrain and refine the models. This close collaboration is vital, especially given the complexities of chemical synthesis and the need for rigorous validation, which often limits purely robotic automation.
The importance of high-resolution data and evolving benchmarks
Genesis's commitment to sub-angstrom resolution is driven by the practical limitations of lower-resolution predictions in drug discovery. The company argues that historical benchmarks like RMSD < 2 Å are insufficient and that the field is moving towards more rigorous metrics like PoseBusters validity and LDDT. The recent Open-Blink challenge, which featured a dynamic target, further demonstrated PEARL's superior performance over other publicly available models, showing that it can handle complex biological scenarios not seen during training. This focus on high-resolution, clinically relevant benchmarks is essential for developing AI models that truly accelerate medicine development.
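For readers unfamiliar with LDDT, the following is a simplified sketch of the idea: it compares inter-atomic distances in the prediction against the reference and averages the fraction preserved within 0.5, 1, 2, and 4 Å, considering only pairs within an inclusion radius in the reference. Because it works on distances, it needs no superposition. This simplified version treats every atom pair alike and omits the residue-level exclusions and stereochemistry checks of the full metric; the function name and the toy coordinates are illustrative.

```python
import math
from itertools import combinations

def simple_lddt(pred, ref, inclusion_radius=15.0, thresholds=(0.5, 1.0, 2.0, 4.0)):
    """Simplified lDDT: mean fraction of reference distances preserved within each threshold."""
    pairs = [
        (i, j)
        for i, j in combinations(range(len(ref)), 2)
        if math.dist(ref[i], ref[j]) < inclusion_radius
    ]
    if not pairs:
        raise ValueError("no atom pairs within the inclusion radius")
    total = 0.0
    for thr in thresholds:
        preserved = sum(
            1
            for i, j in pairs
            if abs(math.dist(pred[i], pred[j]) - math.dist(ref[i], ref[j])) < thr
        )
        total += preserved / len(pairs)
    return total / len(thresholds)

ref = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 3.0, 0.0), (0.0, 0.0, 3.0)]

# Rigidly moved copy (90 degree rotation about z plus a translation): distances unchanged.
moved = [(-y + 5.0, x - 2.0, z + 1.0) for x, y, z in ref]

# Distorted copy: the last atom is displaced by 3 Å along z.
distorted = ref[:3] + [(0.0, 0.0, 6.0)]

print(simple_lddt(moved, ref), simple_lddt(distorted, ref))
```

The rigidly moved copy scores 1.0, while the distorted copy scores 0.625 because the three distances involving the displaced atom are only preserved at the loosest threshold.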
Common Questions
Genesis Molecular AI, formerly Genesis Therapeutics, focuses on applying deep learning to drug discovery. Initially, they aimed to show AI's practical value in biotech, evolving from a pure AI research company to a full-stack entity that partners with pharma while also developing its own pipeline, recognizing the shift in pharma's interest in buying AI tools.
Mentioned in this video
CTO of Mirmix and co-host of the podcast.
Co-host of the podcast.
Founder and CEO of Genesis Molecular AI, with a background in physics and computer science, focused on AI for medicine.
Former lead of LLaMA 2 and LLaMA 3 pre-training at Facebook, now CTO of Genesis Molecular AI, bringing his expertise in large language models to drug discovery.
A class of generative AI models that superseded GANs in utility for drug discovery, particularly in 3D structure prediction, and are considered a more useful primitive for the space.
A benchmark set that Genesis Molecular AI uses, incorporating improved metrics like RMSD, PoseBusters validity, and LDDT, reflecting the evolving standards in the field.
A metric for evaluating the physical validity of 3D poses, developed by a lab at the University of Oxford. It improved upon traditional RMSD benchmarks.
A high-throughput screening method for drug discovery, mentioned for its high false positive rate when predicting actual molecule efficacy.
A large language model that Sergey Udov led the pre-training for at Facebook.
A protein folding prediction model, mentioned alongside AlphaFold 3 as having similar ideas.
A large language model, used as an example of a pattern-matching AI that performs best on data similar to its training set.
A machine learning framework used by Genesis Molecular AI, mentioned in the context of their early work and scaling graph neural networks.
A protein structure prediction model mentioned as having architectural similarities or drawing inspiration from Genesis's model architecture.
A large language model that Sergey Udov led the pre-training for at Facebook.
A large language model whose early versions significantly improved over time, serving as an analogy for iterative development in AI for drug discovery.
Genesis Molecular AI's structure prediction model that takes a protein sequence and a ligand representation to predict their 3D structure together, noted for its ability to go beyond pattern matching and deliver generalizable models.
A protein folding prediction model, mentioned in context of co-folding, though the discussion clarifies that protein folding itself is not the sole solution to drug discovery.
A protein folding prediction model, mentioned alongside AlphaFold 3 as having similar ideas.
Discussed as a parallel for scaling and development stages (pre-training, post-training, inference) for molecular AI, and an analogy for agent systems.
A language model mentioned as performing well on SWEBench, but implicitly criticized for not being as practically useful for coding as other competitors.
The codename for Genesis Molecular AI's agentic platform for 24/7 drug discovery, designed to orchestrate underlying models for pose prediction, potency, and ADMET.
An AI coding tool, used as an analogy for the progression of AI agents, from basic autocomplete utility to more comprehensive code generation.
An initiative related to large language models (LLMs) and life sciences, cited as an example of LLM companies moving into the life sciences space.
The company where Sergey Udov previously worked, leading the LLaMA 2 research team.
An autonomous driving technology company, mentioned as an example of rapid technological advancement.
A company founded by Evan Feinberg, focusing on using AI, particularly diffusion models, for drug discovery and molecular design, including 3D structure prediction.
A large pharmaceutical company with whom Genesis Molecular AI partners, providing AI services for drug discovery.
A company that Genesis Molecular AI expanded its partnership with, known for its experimental capabilities in data generation for drug discovery campaigns.
The former name of Genesis Molecular AI, reflecting their initial focus and desire to show seriousness in the biotech domain.
A technology company that has invested in and collaborated closely with Genesis Molecular AI, particularly on the Pearl model and advocating for life sciences research applications of GPUs.
The institution where Evan Feinberg pursued his PhD and conducted research in AI for molecular systems, which led to the founding of Genesis.
The regulatory body for drug approval, mentioned in the context of drug success rates and the importance of clinical trials.
Facebook AI Research team, where Sergey Udov did a lot of AI research.
The lab at Stanford where Evan Feinberg completed his PhD, involved in early machine learning research applied to molecules.
A public database of historical crystal structures, representing a limited dataset for AI models compared to the internet for LLMs.
The European Medicines Agency, a regulatory body for drug approval, mentioned alongside the FDA.
A lab at this university developed PoseBusters, a metric that improved upon RMSD for evaluating physical validity in pose prediction, acknowledging the limitations of the RMSD < 2 Å criterion.
A scientific journal where a paper was published highlighting the limitations of AlphaFold for traditional docking methods.
An early graph-based network paper published by Evan Feinberg's lab at Stanford, influential in the machine learning for molecular space.
A published work on multitask graph neural networks for ADMET prediction, cited thousands of times and influential in the field.