Scaling Past Informal AI - Carina Hong, Axiom Math
Key Moments
Formal verification, not just for bug fixing but for scaling AI brilliance, is the critical path to superintelligence, as demonstrated by Axiom Math's perfect score on the Putnam exam.
Key Insights
Axiom Math, founded seven months prior, secured a $200 million Series A funding round at a $1.6 billion valuation.
Axiom Math achieved a perfect score of 120/120 on the 2024 Putnam exam, outperforming both human participants and other AI systems.
Formal verification, as advocated by Axiom Math, is positioned not as a defense against AI hallucination or errors, but as a mechanism for compounding and scaling AI brilliance.
The Lean theorem prover is a key technology used by Axiom Math, functioning as a formal language that enables verified generation and rigorous mathematical proofs.
Annual U.S. funding for mathematics research is approximately $250 million, which puts the $200 million Axiom Math raised in a single round in perspective.
Axiom Math's approach to formal verification demonstrates performance gains, higher sample efficiency, and the ability to match or exceed human performance on complex tasks, exemplified by the Putnam exam results.
Formal verification as the path to compounded intelligence
Carina Hong, CEO of Axiom Math, argues that the future of AI, particularly superintelligence, hinges on formal verification, not merely as a tool for identifying and fixing errors, but as a fundamental method for scaling and compounding AI capabilities. This contrasts with the common perception of verification as a compliance or bug-fixing exercise. Axiom Math, a seven-month-old company with a team of 30, has demonstrated this potential by achieving a perfect score of 120/120 on the 2024 Putnam exam, outperforming all human and AI competitors. This feat, coupled with a recent $200 million Series A funding round at a $1.6 billion valuation, underscores the significant market belief in their approach. Hong posits that structured and formal data, exemplified by mathematical proofs, possesses greater horizontal transferability than conventionally trained data, leading to more robust and broadly applicable AI reasoning.
The power of Lean and verified generation
Central to Axiom Math's strategy is the use of the Lean theorem prover. Lean is described as a formal language, akin to other proof assistants like Coq or Isabelle, that allows for the rigorous, step-by-step verification of mathematical proofs. Unlike informal mathematical reasoning or natural language proofs, a Lean proof that compiles has been mechanically checked, so it is correct with respect to the statement it formalizes. This process can be likened to a type checker for mathematical logic. The Curry-Howard correspondence links proofs directly to programs: a proposition corresponds to a type, and a proof of that proposition corresponds to a program of that type. This formal system allows mathematicians to leverage Lean not just for its logic capabilities but also for its functional programming aspects, enabling complex computations and even the development of tools like autograd within Lean itself.
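As a small illustration of the proofs-as-programs idea, the following Lean 4 sketch puts a proof and an ordinary program side by side. The names `and_swap'` and `pairSwap` are hypothetical, chosen only for this example; nothing here is taken from Axiom Math's code.

```lean
-- A proposition is a type; a proof is a term of that type.
-- `and_swap'` is a hypothetical name used only in this sketch.
theorem and_swap' (p q : Prop) (h : p ∧ q) : q ∧ p :=
  ⟨h.2, h.1⟩

-- The same shape as a program on data: swapping the components of a pair.
def pairSwap {α β : Type} (x : α × β) : β × α :=
  (x.2, x.1)

-- The program can be run like any other functional code.
#eval pairSwap (1, "one")
```

If the term `⟨h.2, h.1⟩` did not have the type `q ∧ p`, Lean would reject the file, which is the sense in which proof checking works like type checking.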
Beyond bug fixing: Scaling brilliance, not just eliminating flaws
Hong emphasizes that formal verification, in Axiom Math's view, is not about correcting 'lousiness' or hallucinations but about 'scaling brilliance and compounding brilliance.' Drawing an analogy to Srinivasa Ramanujan, she explains how formal proofs transform intuitive insights into theorems, which then become building blocks for future mathematical advancements. This process of formalization and verification acts as a multiplier for existing intelligence. Traditional human-driven peer review, which can take years, is contrasted with the potential of AI-assisted formal verification. While mathematicians might initially rely on intuition, formal systems aid in handling low-level deductions, freeing them to navigate high-level conceptual spaces more effectively. This is where tools like Lean's 'grind' tactic can handle significant proofs, shocking some observers with their capabilities.
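To give a feel for the kind of low-level deduction that automation can take over, here is a minimal Lean 4 sketch. It assumes a recent Lean 4 toolchain in which the `omega` and `grind` tactics are available; the goals are toy examples invented for illustration, not problems from the talk.

```lean
-- Linear arithmetic over the natural numbers, closed by `omega`.
example (x y : Nat) (h₁ : x < y) (h₂ : y < 10) : x < 9 := by
  omega

-- Equational reasoning through a function application, closed by `grind`.
-- Assumes a Lean 4 release that ships the `grind` tactic.
example (f : Nat → Nat) (a b c : Nat) (h₁ : a = b) (h₂ : b = c) :
    f a = f c := by
  grind
```

In both cases the author states what should hold and the tactic searches for the routine steps, which is the division of labor the paragraph above describes.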
Axiom's performance advantage and future applications
The success on the Putnam exam, where Axiom Math scored 120 points against DeepSeek's 103 and the best human's 110, illustrates the performance benefits of their approach. Hong notes that while frontier labs possess vast resources, startups like Axiom can achieve comparable or superior performance on superhuman tasks through greater sample efficiency derived from formal methods. This approach is not limited to mathematics; Axiom sees formal verification as a foundational element for 'verified AI' applicable across various domains. The company's ambition extends to broadening its scope beyond math, with potential applications in hardware and software verification. For hardware, where partial verification yields no benefit (e.g., a GPU), perfect verification is crucial, making Axiom's technology a significant disruptive force.
The market for formal verification and team composition
The $200 million Series A suggests that investors see a large market for formal verification: a single round approaches the annual U.S. math research budget of approximately $250 million. Axiom Math's strategy leverages structured and formal data, akin to how early AI models demonstrated strong transfer learning from coding to reasoning. Their approach involves a system of models, post-trained using RL or SFT on 'Lean data', meaning data whose correctness is established by the proof checker. This allows them to compete effectively despite potentially smaller compute and data budgets compared to frontier labs. The team at Axiom is highlighted as a key differentiator, comprising expert mathematicians who are also users of the systems they develop, combined with applied ML and codegen experts.
Navigating theoretical limitations and future potential
While theoretical results like Gödel's incompleteness theorems and Rice's theorem rule out verifying every property of every program, Axiom Math focuses on verifying the majority of useful programs. The company's vision is to make verification so performant and accessible that it becomes a standard choice for complex coding tasks, from web development to distributed systems. They are developing 'verified generation' capabilities, where generated code is accompanied by a formal proof of correctness. This is distinct from simply verifying existing code. On the Verina benchmark, formal systems like Axiom's significantly outperform general LLMs at generating code together with proofs. The challenge remains in specification: humans are not always adept at precisely defining all requirements, but Axiom believes formalization tools and interactive processes can bridge this gap.
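A toy picture of code that ships with its own proof of correctness might look like the Lean 4 sketch below. The function `myMax` and the two theorem names are hypothetical, and the specification is deliberately tiny; this is not Axiom Math's system or a benchmark task.

```lean
-- Generated code: the larger of two natural numbers.
-- `myMax` is a hypothetical name used only in this sketch.
def myMax (a b : Nat) : Nat :=
  if a ≤ b then b else a

-- Specification, part 1: the result is at least the first argument.
theorem myMax_ge_left (a b : Nat) : a ≤ myMax a b := by
  unfold myMax
  split <;> omega

-- Specification, part 2: the result is at least the second argument.
theorem myMax_ge_right (a b : Nat) : b ≤ myMax a b := by
  unfold myMax
  split <;> omega
```

The two theorems play the role of the specification. Deciding which theorems to state is the human-facing problem the paragraph above raises: these two alone do not say that the result is one of the inputs, so a complete specification would need a further statement.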
Mathematical discovery and the role of intuition
Axiom Math is also investing in 'mathematical discovery' tools, recognizing that proof is not the only critical step in mathematics; conjecture and intuition are equally important. These tools aim to help mathematicians explore problems by suggesting constructions or identifying patterns, essentially aiding in the creative process before formal proving begins. The company plans to open-source codebases related to mathematical discovery, which have been used to solve long-standing conjectures. This reflects a belief that while formal verification can handle rigorous deduction, the generation of novel ideas and conjectures requires different AI approaches, often informed by human intuition and the exploration of examples. The goal is to make these discovery tools accessible to the broader mathematical and scientific community.
The vision for verified AI and broadened impact
Axiom Math's overarching vision is that 'anything that can be defined can be executed, and anything that can be specified can be proven.' They see verification not as a niche requirement for closed industries but as a path to openness and enhanced collaboration, whether human-AI or AI-AI. This verified AI is expected to lead to significant performance gains, higher sample efficiency, and ultimately, a democratized ability to achieve superhuman performance. The company believes that the path to superintelligence must be verified, and they are committed to building this future. Their approach aims to unlock capabilities not only in mathematics and computer science but also in related fields like science and law, leveraging the foundational advancements in reasoning and verification.
Common Questions
Axiom Math is a startup founded by Carina Hong, focused on leveraging formal verification to build superhuman AI mathematicians. Their core mission is to scale and compound brilliance through verifiable AI, aiming to solve complex mathematical problems and improve code generation.
Mentioned in this video
Mathematician who Kenny (from Axiom Math) worked with to build out mathlib, the Lean mathematics library.
A renowned computer scientist and mathematician, whose results were formalized using Claude and AXL tools, showcasing the practical application of Axiom's tools.
A prominent mathematician whose video about using Lean for collaboration is mentioned. Also cited for his database of Erdős problems.
A core contributor to Frontier Math and a benchmark setter, described as a key talent on the Axiom Math team who brings strong capabilities in proving and discovery.
A mathematician involved in blueprint writing for complex formalization projects, mentioned alongside Terence Tao for his role in organizing collaborative math efforts.
A Harvard Law professor and strong appellate litigator, cited as an example of someone with math training excelling in legal fields like appellate litigation due to its logical structure.
A brilliant self-taught mathematician whose intuitions were solidified into theorems after he learned formal proofs at Cambridge, serving as an example of how verification scales brilliance.
Mathematician who collaborated with Ramanujan and Hardy at Cambridge, mentioned in the context of Ramanujan's development of formal proof writing.
CEO and founder of Axiom Math, with a background in neuroscience (UCL Gatsby) and a brief stint in law school before founding the company. Driven by an obsession to build AI that can do math.
An 'OG' in mathematical discovery and member of technical staff at Axiom Math. Previously disproved a 30-year-old conjecture and found a solution to the 130-year-old global Lyapunov function problem.
Professor whose work in the AI for Math community is highlighted for its interesting approach to conjecturing and theory building, suggesting avenues for self-improvement in AI systems.
Cited as an example of a great deep tech company where talented people join forces for a common mission, contrasting with market fragmentation in other fields.
A startup founded by Carina Hong, focused on formal verification for mathematics and AI. Recently raised $200 million in Series A funding at a $1.6 billion valuation.
A prominent AI research lab, mentioned as a frontier lab that, along with Meta and Anthropic, overlooked the potential of structured data for transfer learning in the early days of coding AI.
Aerospace company also mentioned as using formal verification, similar to Boeing, for its critical systems.
An LLM that scored 103 points out of 120 on the Putnam exam, a strong result that still fell short of Axiom Math's system.
An AI research company that initially focused on coding, demonstrating the potential for transfer learning from structured data to broader reasoning tasks, a concept Axiom Math applies to formalized math.
A competitor of Axiom Math, known for verifying a GPT-generated proof of an Erdős problem and publicizing their solutions to other Erdős problems that were later found to have been solved previously.
Aerospace company using formal verification for its systems, alongside Airbus, highlighting its long-standing use in safety-critical industries.
Mentioned with an anecdote about its mascot, Tim the Beaver, highlighting that theoretical limits (like Rice's Theorem) don't stop practical efforts in engineering and research.
AI research organization, mentioned in the context of other frontier labs that overlooked the power of structured data in AI development.
Researchers from Berkeley, along with Meta, introduced the Verina benchmark in 2025.
Carina Hong was a grad student there, studying math and law, before deciding to leave and found Axiom Math.
Mentioned as an early adopter of formal verification for safety-critical systems, specifically for the Ariane launch program around the time of the Challenger disaster.
AI research company that developed AlphaProof, noted for its initial strong performance in formal math, but subsequent lack of progress in the field due to organizational factors.
A large language model that found a proof to an unsolved Erdős problem, which was then verified by Axiom Math's competitor, Harmonic.
A computer program and formal language for mathematics proofs, crucial for Axiom Math's work. It acts as a type checker for mathematical proofs, based on the Curry-Howard correspondence, and is Turing complete.
A set of proof validation and manipulation tools built for Lean, released to the community for free use. Designed to make large-scale Lean operations more robust and faster.
Harmonic AI's prover system, mentioned for verifying a GPT-generated proof.
A code verification benchmark that is Lean-friendly, where Axiom Math's system achieved 99% accuracy in code with proof generation, significantly outperforming other LLMs.
Has a strong push for automated reasoning due to enterprise customers requiring 100% verified solutions and where general testing is insufficient.
An AI system from Google DeepMind, that achieved significant results in mathematics. Its progress in formal math seemingly stalled due to non-technical reasons at a large organization.
A vast web of conjectures and theorems in mathematics, referred to as an example of highly complex math where human experts capable of understanding and proving non-trivial results are extremely rare.
A theoretical result in computer science stating that any non-trivial property about the language recognized by a Turing machine is undecidable by a general algorithm; discussed in the context of formal verification's limitations.