What skills from trading are transferable to AI research?

Trading, particularly high-frequency trading, rewards attention to detail, relentless optimization, and squeezing every bit of performance out of a system. These skills in hard-core optimization and in working against a 'real world' metric transfer to AI research, which also requires precision and rigorous problem-solving.

Why do some AI models struggle with simple human tasks despite excelling at complex ones like math?

This is attributed to the 'jagged frontier' analogy, where models excel in areas influenced by their training data and discoverable patterns. Humans possess broader context, biological wiring for tasks like vision, and the natural ability to learn from single tasks and apply lessons to future ones, which models are still developing.

What is OpenAI's stance on the 'pre-training is dead' narrative?

OpenAI firmly believes in the power of scaling laws and disagrees with the idea that pre-training is dead. They argue that historical bottlenecks have always been overcome with better engineering and research insights, and continued scaling is key.

How does OpenAI balance risk-taking in research with the need for results?

OpenAI takes many high-risk bets, acknowledging that some will not pan out, which they see as a source of their success. They emphasize a postmortem process to avoid self-delusion, learn from failures, and share insights even from unsuccessful endeavors.

How does OpenAI manage its research roadmap and project prioritization?

The high-level research roadmap remains stable to provide grounding, but implementation details are flexible. Project evaluation and resourcing are reconsidered at key points, such as compute allocation, to ensure priorities align with evolving needs and opportunities.

What is the future of AI research concerning human vs. model contributions?

The future likely involves models handling much of the implementation and orchestration as their capabilities improve. Human researchers will increasingly focus on generating ideas and exercising 'taste' to guide the models' execution, and models may eventually develop research taste of their own.

Key Moments

Cooking with OpenAI’s Research Chief: AGI, o1, Evals, and Scaling Laws — Mark Chen

Latent Space Podcast

Science & Technology | 7 min read | 42 min video

Jun 25, 2026|1,776 views|71|5

TL;DR

OpenAI's research chief believes scaling laws are still valid and crucial for AGI, despite skepticism, and emphasizes the importance of replication and well-designed evals over just benchmark scores.

Key Insights

OpenAI's chief research officer, Mark Chen, remains a strong believer in the continued validity of scaling laws, stating there's no reason they shouldn't continue to hold, as they have for nearly ten orders of magnitude.

Research taste, crucial for AI development, is best developed through rigorous replication of existing papers, aiming to match exact training curves and losses, a practice that taught Chen many often-overlooked techniques.

Reinforcement learning (RL) traditionally faces challenges in subjective fields like creative writing where expert opinions can vary widely, making it more effective in objective domains such as math and computer science where correctness is binary.

The field faces an 'evals crisis' due to a low number of canonical benchmarks and the potential for models to 'overfit' onto existing distributions, necessitating continuous innovation in evaluation methods.

OpenAI deliberately separates teams developing evals from those optimizing models to prevent co-incentivization and maintain an adversarial process where evaluators strive to create genuinely challenging tests.

While models excel at complex tasks like solving IMO problems, they still struggle with mundane human-like capabilities, partly due to a lack of broader context and biological wiring, though long-context learning and engineering shortcuts are being developed.

Scaling laws and the pre-training paradigm remain central to AGI development

Mark Chen, OpenAI's Chief Research Officer, firmly believes in the enduring power of scaling laws, stating there's no reason they should cease to hold, given their consistent validity across nearly ten orders of magnitude of development. He dismisses claims that pre-training is dead, characterizing such narratives as recurring skepticism that has historically been overcome by engineering improvements and new research insights. Chen likens the current advancements to previous phases where bottlenecks were identified and then surpassed, suggesting that continued careful research, data engineering, and scaling will unlock further capabilities. This perspective underpins OpenAI's strategy, emphasizing that fundamental research, even when challenged, is key to pushing AI frontiers. The act of cooking together, a hands-on, real-world activity, serves as a lighthearted backdrop to a serious discussion about the complex research landscape.
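As a rough illustration of what "holding across orders of magnitude" means, a scaling law is often summarized as a power law, loss ≈ a · compute^(-b), which appears as a straight line on a log-log plot. The sketch below fits such a line with ordinary least squares. The data is synthetic and the constants (5.0 and 0.05) are invented for illustration; they are not figures from the episode or from OpenAI, and real fits typically also include an irreducible-loss term that is omitted here.

```python
import math

def fit_power_law(compute, loss):
    """Fit loss = a * compute**(-b) by least squares in log-log space."""
    xs = [math.log(c) for c in compute]
    ys = [math.log(l) for l in loss]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # log(loss) = log(a) - b * log(compute), so a = exp(intercept), b = -slope
    return math.exp(intercept), -slope

# Synthetic, noise-free measurements (hypothetical values for illustration only).
compute = [10.0 ** k for k in range(1, 11)]
loss = [5.0 * c ** -0.05 for c in compute]

a, b = fit_power_law(compute, loss)

# Extrapolate two orders of magnitude beyond the largest measured point.
predicted = a * (1e12) ** -b
```

The practical use of such a fit is the extrapolation step: if small runs fall on the line, the line gives a prediction for a larger run before the compute is spent.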

Developing research taste through replication and learning from the past

For aspiring AI researchers, particularly those without formal training, Chen highlights replication as the most effective mechanism for developing 'research taste.' He emphasizes replicating papers to match exact training curves and loss metrics, a process that reveals crucial, often unstated, techniques. Chen’s own journey into AI was inspired by AlphaGo's matches and subsequently led him to work on Deep Q-Networks (DQN). He notes that the current era feels like witnessing 'move 37s' across various fields, indicating rapid, profound advancements. This sentiment is echoed by the observation that many professionals are now realizing AI agents can perform long-horizon, meaningful work in their domains, signifying a paradigm shift in human-AI collaboration and task execution.
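Matching a paper's training curve can be made concrete with a simple check like the one below. This is a minimal sketch, assuming the reference losses have been read off the paper at the same training steps as the replica; the function names, the 1% tolerance, and the loss values are hypothetical and chosen only for illustration.

```python
def max_relative_gap(reference, replica):
    """Largest relative difference between two loss curves sampled at the same steps."""
    if len(reference) != len(replica):
        raise ValueError("curves must be sampled at the same steps")
    return max(abs(r - p) / abs(r) for r, p in zip(reference, replica))

def curves_match(reference, replica, tol=0.01):
    """True when the replica stays within `tol` (relative) of the reference everywhere."""
    return max_relative_gap(reference, replica) <= tol

# Hypothetical loss values at five checkpoints.
reference = [2.30, 1.80, 1.50, 1.32, 1.21]
close_replica = [2.31, 1.79, 1.50, 1.33, 1.21]
drifting_replica = [2.31, 1.85, 1.60, 1.45, 1.38]

close_ok = curves_match(reference, close_replica)
drifting_ok = curves_match(reference, drifting_replica)
```

A replica that drifts away from the reference over time, like the second one here, usually points to an unstated detail such as a learning-rate schedule or an initialization choice, which is the kind of technique the replication exercise is meant to surface.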

Navigating RL's challenges in subjective domains and the 'evals crisis'

Reinforcement learning (RL), traditionally powerful in objective tasks, faces headwinds in domains where outcomes are subjective, such as creative writing, where expert opinions can differ significantly. This makes grading and direct application difficult. In contrast, RL excels in fields like mathematics and computer science where correctness is clearly defined. This distinction leads to the broader 'evals crisis' impacting the field. The sheer power of current models and their ability to surpass top human performance, even on benchmarks like IMO questions, raises the question of how to evaluate superhuman intelligence. The scarcity of canonical, gold-standard benchmarks, coupled with the risk of models overfitting to existing evaluation distributions, means that newly developed evaluation methods are crucial. Tools like Codex have been instrumental in enabling rapid, high-quality eval creation, allowing for faster iteration and a better understanding of model capabilities in real-world, long-horizon tasks.
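The contrast between objective and subjective domains comes down to whether a reward can be computed mechanically. The sketch below shows a binary reward for short math answers; it is an illustrative assumption about how such a grader could look, not a description of OpenAI's graders, and the normalization rules are deliberately simplistic. No comparable function exists for a piece of creative writing, which is the difficulty the paragraph above describes.

```python
from fractions import Fraction

def normalize(answer: str):
    """Parse numeric answers exactly; fall back to a lowercased string otherwise."""
    text = answer.strip().replace(" ", "")
    try:
        return Fraction(text)  # accepts forms such as "3", "0.5", and "1/2"
    except (ValueError, ZeroDivisionError):
        return text.lower()

def binary_reward(model_answer: str, reference_answer: str) -> float:
    """Return 1.0 for an answer equivalent to the reference, else 0.0."""
    return 1.0 if normalize(model_answer) == normalize(reference_answer) else 0.0

# Equivalent numeric forms earn the same reward; a wrong answer earns none.
reward_equivalent = binary_reward("0.5", "1/2")
reward_wrong = binary_reward("3", "4")
reward_symbolic = binary_reward(" X+1 ", "x+1")
```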

The rationale behind OpenAI's research bets and organizational structure

OpenAI strategically allocates compute to key research bets, with a high-level roadmap remaining stable to provide direction, while implementation details evolve. This roadmap encompasses foundational areas: pre-training for world knowledge, RL for reasoning and insight chaining, and alignment/post-training. The company actively seeks new bets that unlock different or more aggressive scaling properties. To manage this, OpenAI focuses its bets, typically on three to five core initiatives per 'org,' empowering managers with both directed compute for major bets and flexible pools for exploration. This approach balances top-down strategic vision, driven by respected research leaders, with bottom-up innovation, where researchers can bring compelling evidence for new directions. Decisions on resource allocation are critical, prompting regular reviews to ensure compute and talent are applied to the highest-priority areas.

Identifying research potential and the value of 'research taste'

Identifying potential researchers is a challenging but crucial task. While past research output is a primary indicator, experienced managers develop an intuition for a candidate's thinking process, the nature of their ideas, and whether their intuition aligns with established research directions. This 'gut check' is difficult to assess fully at the outset, with clear trajectories often emerging within six to twelve months. OpenAI recognizes diverse forms of impact, from those who execute clear ideas swiftly to those who propose ambitious, 'moonshot' concepts that fundamentally shift perspectives. The distinction between top engineers and top researchers lies in the inherent uncertainty of research. While engineering principles can follow known patterns, research success hinges on 'research taste'—the ability to identify promising directions, articulate their value, and integrate them into the core research strategy.

The challenge of evals: avoiding benchmark overfitting and cultivating new methods

A significant challenge in AI development is avoiding 'benchmark maxing,' where models become overfitted to specific evaluation distributions, leading to an inaccurate representation of their true generalization capabilities. Chen emphasizes the need to operate across diverse and representative eval mixtures and to continuously invest in creating new evaluation methods. The philosophy at OpenAI is that once an eval is widely known, it is no longer truly effective. To combat this, they partner with external organizations to develop novel, high-quality evaluations, particularly in difficult areas like math and science. A key strategy is to separate the teams responsible for creating evals from those optimizing the models. This adversarial approach ensures that evaluators are incentivized to create challenging benchmarks, preventing the self-deception that can arise from internal optimization.
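One simple way to look for benchmark overfitting is to compare accuracy on a widely known eval with accuracy on a freshly written eval covering the same skill. The sketch below is hypothetical: the per-question results are made up, and the 0.1 threshold is an arbitrary choice for illustration rather than a value mentioned in the episode.

```python
def accuracy(results):
    """Fraction of questions answered correctly (results are booleans)."""
    return sum(results) / len(results)

def overfitting_gap(public_results, fresh_results):
    """Accuracy on the well-known eval minus accuracy on the fresh eval."""
    return accuracy(public_results) - accuracy(fresh_results)

def looks_overfit(public_results, fresh_results, threshold=0.1):
    """Flag a model whose score drops by more than `threshold` on unseen questions."""
    return overfitting_gap(public_results, fresh_results) > threshold

# Hypothetical per-question outcomes for one model.
public_results = [True] * 9 + [False] * 1   # 90% on the well-known benchmark
fresh_results = [True] * 6 + [False] * 4    # 60% on newly written questions

gap = overfitting_gap(public_results, fresh_results)
flagged = looks_overfit(public_results, fresh_results)
```

A check like this only works while the fresh eval stays unseen by the people tuning the model, which is one reading of why the eval authors and the model optimizers are kept on separate teams.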

The evolving role of researchers: orchestration and the future of idea generation

The landscape of AI research is shifting towards orchestration, where models are increasingly capable of implementation and execution, placing greater value on human researchers' ability to generate novel ideas. While both idea generation and execution remain important, there's a market shift favoring the conception of many ideas, with models handling the execution. This marks a significant evolution in research methodology. However, there's a recognition that models currently lack 'taste'—the intuitive judgment for what constitutes a good research idea—making the researcher's role in ideation critical. While acknowledging that models may eventually develop taste, the immediate benefit lies in accelerating research through automated execution and orchestration, enabling a more efficient pace toward AGI.

Embracing failure and the importance of post-mortems

OpenAI’s strategy involves taking significant high-risk bets, which inherently means some will not pan out. A crucial part of their 'alpha' is the ability to learn from these failures. When a bet doesn't succeed, it's vital to avoid self-delusion and disengage from the idea. This involves retrospective analysis: determining whether the idea was less important than initially thought, whether a better approach emerged, or whether discoveries were made that were not directly related to the original goal. Even unsuccessful research efforts yield valuable insights. Write-ups from failed projects often become important resources, surfacing ideas that others can build on and saving them from repeating the same work. This positive view of failure is balanced by the expectation that researchers eventually deliver impactful contributions: ambitious, riskier endeavors are justified by periodic major successes built on sound ideas.

Common Questions

Mark Chen suggests that replicating existing research papers is the best way to develop research taste and learn practical techniques. While formal training is valuable, the ability to creatively problem-solve and think outside the box is crucial.

Topics

Reinforcement Learning, AI & Machine Learning, Technology & Innovation, Science & Mathematics, AI Evaluation, Model Training, AI Development, Scaling Laws, Artificial General Intelligence, Machine Learning Research, Research Strategy

Mentioned in this video

Companies

OpenAI

Mark Chen's employer and a leading AI research laboratory, discussed in relation to its research roadmap, hiring philosophy, and approach to AGI development.

Y Combinator

Mentioned in relation to the value of research taste and execution, though the transcript garbles the name to 'Yakami'. The context points to 'Y Combinator' or a similar respected entity in tech.

People

Mark Zuckerberg

Mentioned as an inspiration for Mark Chen bringing soup to researchers to poach them, highlighting a humorous anecdote in the competitive AI landscape.

Sam Altman

Mentioned for a tweet suggesting high-frequency traders consider joining OpenAI, linking technical skills from trading to AI research.

Lee Sedol

A professional Go player who famously lost to AlphaGo, marking a significant milestone in AI development that inspired many researchers.

Software & Apps

ResNet

A deep learning architecture mentioned as an example of a paper that Mark Chen replicated to develop his research skills.

Pixel CNNs

Mentioned alongside ResNet as an example of influential research papers that Mark Chen replicated to learn practical AI techniques.

AlphaGo

The AI system that played Lee Sedol is cited as a pivotal moment that inspired many to enter the field of AI research, including Mark Chen.

DQN

Deep Q-Network, a reinforcement learning algorithm, was Mark Chen's first major project after being inspired by AlphaGo.

Codex

A tool mentioned for enabling rapid iteration of evaluations, allowing for quick creation of high-quality benchmarks in AI development.

Concepts

LLMs

Large Language Models are discussed as a core technology where scaling laws have held true, and the debate around 'pre-training is dead' is addressed.
