Brendan Foody on Teaching AI and the Future of Knowledge Work

Conversations with Tyler
News & Politics | 4 min read | 63 min video
Jan 7, 2026
TL;DR

AI reshapes knowledge work via expert-led evals, long-horizon tasks, and RL-enabled training.

Key Insights

1. Apex measures economic value, not just test scores, by mapping expert time and tasks to real customer outcomes.

2. Expert-driven rubrics and evaluations (e.g., Summers, Sunstein, Topol) help align AI progress with domain needs (law, medicine, finance).

3. Long-horizon tasks and multi-tool use are the current frontiers; the next 6–12 months may unlock substantially more capable models.

4. Most knowledge-work roles will shift to training agents and building RL environments, creating a new class of AI-focused jobs.

5. Education and labor markets will be transformed by personal tutors and improved matching, with privacy and data-use concerns central.

6. Talent assessment benefits from project-based demonstrations over vibes; scalable interviewing with AI-driven analysis is a growing trend.

APEX: MEASURING ECONOMIC VALUE BEYOND TEST SCORES

Brendan Foody describes Apex as an AI productivity index that shifts focus from academic evals to the outcomes customers actually care about. The approach starts with large surveys of industry experts to understand how professionals spend their time, then uses those insights to craft prompts and rubrics that reflect real workflows. The measured value is anchored in economic impact, not merely performance on tests. In practice this means translating a model's capabilities into tangible savings or revenue in domains such as law, medicine, and finance, while acknowledging how hard it is to map output quality to real-world outcomes.
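As a rough illustration of the idea of tying rubric results to how experts spend their time, the sketch below weights each task's rubric score by its share of expert time. This is not Apex's actual method: the `Task` fields, the weighting rule, and all numbers are assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    time_share: float    # hypothetical: fraction of expert time from a survey
    rubric_score: float  # hypothetical: fraction of rubric criteria met, 0..1


def time_weighted_score(tasks):
    """Average rubric scores, weighting each task by its share of expert time."""
    total = sum(t.time_share for t in tasks)
    if total <= 0:
        raise ValueError("time shares must sum to a positive number")
    return sum(t.time_share * t.rubric_score for t in tasks) / total


# Illustrative values only; none of these come from the episode.
tasks = [
    Task("contract review", 0.5, 0.8),
    Task("legal research", 0.3, 0.6),
    Task("client memo", 0.2, 0.5),
]
score = time_weighted_score(tasks)
print(f"time-weighted rubric score: {score:.2f}")
```

A weighting like this makes a model's weakness on a task that dominates the workday count for more than the same weakness on a rarely performed task.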

EXPERTS, RUBRICS, AND THE ART OF JUDGMENT

To navigate subjective domains like poetry or nuanced areas of law, Foody emphasizes pairing top experts with structured rubrics. He cites high-profile collaborators (e.g., Summers, Sunstein, Topol) who help segment industries, define success criteria, and outline evaluation protocols. Rubrics anchor assessments, but they remain in tension with taste and judgment, areas where formal criteria may fall short. Alternatives such as comparing multiple model outputs and collecting preference data (RLHF) are discussed as ways to capture nuanced user preferences beyond rigid criteria.
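The comparison-based alternative can be sketched in a few lines: raters pick the better of two outputs, and the picks are aggregated per model. This is only a minimal win-rate tally under assumed inputs (the model labels and comparison list are hypothetical), not a description of any particular RLHF pipeline, which would typically train a reward model on such pairs.

```python
from collections import defaultdict


def win_rates(comparisons):
    """Return each model's share of pairwise comparisons it won.

    `comparisons` is a list of (winner, loser) pairs from human raters.
    """
    wins = defaultdict(int)
    games = defaultdict(int)
    for winner, loser in comparisons:
        wins[winner] += 1
        games[winner] += 1
        games[loser] += 1
    return {model: wins[model] / games[model] for model in games}


# Hypothetical rater judgments over outputs from three models.
comparisons = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "B")]
rates = win_rates(comparisons)
for model in sorted(rates):
    print(model, round(rates[model], 2))
```

Unlike a rubric, this records only which output a rater preferred, so it can capture taste that nobody managed to write down as criteria.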

LONG-HORIZON TASKS AND MULTI-TOOL WORKFLOWS

A core challenge is handling tasks that unfold over long horizons and require coordinating multiple tools and people. Foody highlights that next-generation evaluations will measure a model's ability to plan across extended timelines, manage complex workflows, and interact with human collaborators. He foresees substantial progress over the next 6–12 months, with models learning to operate more effectively within entire workspaces and to hill-climb through iterative feedback. This points to a future in which AI agents carry out extended projects with increasing independence.

ECONOMIC IMPACT, PROGRESS TIMELINES, AND ELASTICITY

Foody argues that improvement on economically valuable tasks can be rapid, as shown by notable gains since earlier frontier models. He discusses how to quantify impact by linking how experts allocate their time to what customers pay for outcomes, while noting medicine's need for high reliability. He also points to elasticity: demand for software and knowledge-work tooling is highly price-elastic, so efficiency gains can expand the workforce rather than shrink it. His predictions touch on meaningful progress within a few years and a shift in how businesses allocate capital and labor.
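The elasticity point can be made concrete with a small worked example. The sketch below assumes a constant-elasticity demand curve and uses made-up numbers (a 50% price drop and an elasticity of 2); neither figure comes from the episode.

```python
def demand_after_price_change(q0, price_ratio, elasticity):
    """Constant-elasticity demand: Q1 = Q0 * (P1 / P0) ** (-elasticity)."""
    return q0 * price_ratio ** (-elasticity)


q0 = 100.0          # baseline units of work demanded (illustrative)
price_ratio = 0.5   # assumed: efficiency gains halve the price
elasticity = 2.0    # assumed: demand is highly price-elastic (> 1)

q1 = demand_after_price_change(q0, price_ratio, elasticity)
spend_ratio = (price_ratio * q1) / q0  # new total spending relative to old

print(f"quantity demanded: {q0:.0f} -> {q1:.0f}")
print(f"total spending changes by a factor of {spend_ratio:.1f}")
```

Whenever elasticity exceeds 1, a price cut raises total spending, which is the mechanism behind the claim that cheaper knowledge work can grow the workforce rather than shrink it.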

THE FUTURE OF KNOWLEDGE WORK: NEW ROLES AND RL ENVIRONMENTS

A salient theme is the creation of a new job category: people who train agents and build reinforcement learning environments. Instead of performing every analysis themselves, workers will train models to handle repeated tasks, curate data, and optimize workflows. Foody envisions a world where a majority of high-end knowledge workers spend significant time shaping and supervising AI agents rather than doing every routine calculation by hand. The shift mirrors software development, where something is built once and reused across many tasks, now applied to knowledge work.

EDUCATION, PRIVACY, AND PERSONAL DATA DYNAMICS

Education is seen as a major beneficiary, with the prospect of ubiquitous personal tutors that tailor learning at scale. Yet privacy remains a key concern, particularly the question of whether personal data can be used to personalize agents without being absorbed into base models. Foody cites Apple's privacy stance as a competitive advantage and underscores the trade-off between personalization and data governance. The discussion also touches on wearable devices or pendants that could personalize agents while raising questions about who owns the data and who can access it.

RECRUITING, INTERVIEWS, AND AI-ENABLED TALENT SCREENS

In talent acquisition, Foody argues for project-based assessments over vibe-driven interviews. He discusses scaling interviews with AI to evaluate candidates more reliably and notes that programs like the Thiel Fellowship face structural challenges in matching and breadth. The idea is to enable a broader, fairer assessment by analyzing transcripts or outputs from realistic tasks, thereby improving selection without sacrificing diversity or depth of capability.

POETRY, TASTE, AND DATA FOR TRAINING EVALS

Foody dives into the difficulty of teaching models to excel in highly subjective domains like poetry. He argues that data collection should include rubrics that codify desired traits, while recognizing Kant's claim that taste may resist being fully captured in rubrics. Alternatives such as comparing multiple model outputs and using human preferences to shape the model's taste are discussed. The conversation highlights the challenge of balancing expert taste with broad audience appeal when designing evaluation metrics.
