How did Karina Nguyen's background in computer vision influence her work with LLMs?

Her early work applying computer vision to investigative journalism at Berkeley introduced her to AI. That experience, combined with seeing Chris Olah's work on interpretability, led her to explore Anthropic and eventually OpenAI. It also shaped her focus on human-computer interaction with AI.

What were the initial product efforts at Anthropic?

Anthropic's early product development included Claude in Slack, an integration that explored summarizing threads and suggesting new ideas within the Slack interface. Although innovative, it was limited by UX constraints and its dependence on the Slack platform.

Why did ChatGPT launch before Anthropic's similar web UI?

Anthropic had chat interfaces, but leadership lacked conviction about deploying Claude 1.3 because of its hallucinations. ChatGPT launched first and benefited from this hesitation, even though Anthropic's models were notably creative.

What inspired the creation of ChatGPT Canvas?

Inspired by Tom Riddle's diary from Harry Potter, Karina envisioned a shared workspace where humans and AI could collaborate. The idea existed two years earlier, but the novelty of the AI landscape and a lack of focus on UX meant other areas took priority.

How are AI models like Claude 3 trained and evaluated?

Training involves managing numerous models with unique 'personalities' and performance quirks. Evaluation is complex, with high-variance metrics and inconsistencies in how different labs run benchmarks, making direct comparisons difficult.

What are the challenges in developing AI product features like Canvas?

A major challenge is anticipating user behavior. Iterative deployment allows the team to learn from feedback, but it still has to decide between improving core model capabilities and adopting workarounds, such as rewriting an entire document instead of editing it in place to achieve higher accuracy.

What is the vision for ChatGPT Tasks?

Tasks is envisioned as a foundational module that, when coupled with the model's general capabilities (like search or creative writing), becomes powerful. The goal is proactive AI assistance that learns user preferences and suggests actions.

How is trust built with AI agents?

Trust with AI agents is built gradually through collaboration and consistent, reliable task execution, similar to how humans build trust. This starts with one-off actions and progresses to complex, long-horizon delegation in multi-agent environments.

What is the role of computer use in AI agents?

Computer-using agents are crucial for delegated tasks such as booking flights or ordering books. Vision models and execution accuracy are improving, but latency and precise understanding of what is on screen remain key research challenges.

How is ChatGPT evolving towards a generative OS?

ChatGPT is moving beyond text output to generating dynamic interfaces, such as React apps built on the fly, and adapting its UI to user preferences. This evolution into a task-oriented, generative OS is a key direction for future AI interaction.

What are the key cultural differences between OpenAI and Anthropic?

OpenAI is perceived as more willing to take product risks and explore diverse bets, fostering creative freedom. Anthropic appears more focused, potentially prioritizing enterprise over consumer markets, though research cultures are largely similar.

Key Moments

The Agent Reasoning Interface: Claude, ChatGPT Canvas, Tasks, Operator — with Karina Nguyen, OpenAI

Latent Space Podcast

Science & Technology | 4 min read | 67 min video

Feb 1, 2025

TL;DR

Karina Nguyen discusses AI interaction paradigms, Claude, ChatGPT Canvas, Tasks, and the future of agents.

Key Insights

Karina Nguyen's team focuses on human-computer interaction and novel methods for improving LLMs for specific tasks.

The development of ChatGPT Canvas was a collaborative effort between research, design, and engineering from the outset.

Launching Claude 3 involved a small, dedicated post-training fine-tuning team, highlighting the importance of rapid iteration and debugging.

Balancing LLM values like honesty, harmlessness, and helpfulness is a complex art, often requiring careful synthetic data generation.

ChatGPT Canvas moves beyond a simple writing assistant to become a collaborative 'scratch pad' for drafting and iteration.

The future of AI interaction likely involves more proactive, personalized agents and a shift towards task-oriented operating systems.

EVOLVING AI INTERACTION PARADIGMS

Karina Nguyen leads a research team at OpenAI focused on creating new interaction paradigms for reasoning interfaces. Her team's work bridges human-computer interaction (HCI) with the advancement of large language models (LLMs). They explore novel methods to improve model performance on specific tasks, often involving extensive model training and synthetic data generation. This involves a full-stack approach, from training models to deploying novel product features that define the future evolution of AI assistants.

JOURNEY THROUGH LLM DEVELOPMENT

Nguyen’s early career involved computer vision applications for investigative journalism, which led her to AI. Her path to OpenAI included a stint at Anthropic, where she was an early product designer and front-end engineer and helped build Claude. She co-wrote a significant portion of Claude's initial codebase and worked on early products such as the Claude in Slack integration. This background gave her firsthand insight into how different organizations approach the challenges of LLM development and productization.

THE MAKING OF CLAUDE 3

Nguyen was part of the post-training fine-tuning team for Claude 3, working with a small, dedicated group. This role involved developing new evaluations and writing the model card. She emphasized that each model has unique characteristics and 'personality,' necessitating rapid iteration and debugging to address side effects from contradictory training data. Techniques from software engineering, particularly around data management, proved vital in this intensive development process.

BEHAVIORAL DESIGN AND LLM PERSONALITY

The concept of behavioral design for LLMs, extending product design into model behavior, was a key focus. This involves shaping a model’s persona based on its intended context, such as a collaborator in ChatGPT Canvas. Balancing core values like honesty and helpfulness is complex, requiring careful synthetic data generation to align with principles. This process is described as more art than science, involving decomposing core values into specific scenarios to ensure generalization and consistent behavior.

THE INNOVATION OF CHATGPT CANVAS

Canvas emerged from a need to address edge cases that couldn't be fixed through prompt-based tuning alone. The decision to retrain a specific model for Canvas allowed for rapid iteration based on user feedback and faster deployment. Behavioral engineering was crucial in defining how Canvas should act as a collaborator—when to ask follow-up questions, adjust tone, or modify existing content versus rewriting it. Canvas is positioned as a collaborative 'scratch pad' that can morph into powerful writing or coding IDEs based on user intent.

DEVELOPING CHATGPT TASKS AND FUTURE AGENTS

ChatGPT Tasks, developed rapidly, aims to be a foundational module for various user behaviors. When combined with other capabilities like search or creative writing, Tasks becomes powerful. The vision for agents is a gradual progression from one-off actions to long-horizon delegation, building trust through collaboration. Computer use and multi-agent collaboration are seen as core capabilities for future agents, enabling complex tasks like online ordering or code execution within virtual environments.

THE EVOLUTION TOWARDS A GENERATIVE OS

The trajectory of tools like ChatGPT Search, which generate not just text but interactive outputs like charts, points towards the evolution of ChatGPT into a generative operating system. The UI itself is expected to become more dynamic and personalized, adapting to user preferences. This task-oriented OS approach suggests a future where users interact less with websites directly and more through AI models that curate and present information in user-specific formats.

BUILDING TRUST AND SCALABLE AI PRODUCTS

Building trust in AI agents is paramount, especially for sensitive tasks. This trust is cultivated through consistent collaboration and demonstrating reliability, much like human interactions. The development process for features like Canvas and Tasks emphasizes iterative deployment to gather user feedback quickly. This approach allows for rapid learning and improvement, ensuring that models and product features evolve in tandem with user needs and expectations.

Common Questions

Karina leads a research team at OpenAI focused on creating new interaction paradigms for reasoning interfaces and capabilities like ChatGPT Canvas and Tasks. Her team works on improving AI models through novel training methods and developing new product features.

Topics

Human Performance AI & Machine Learning Technology & Innovation Prompt Engineering Model Evaluation Human-AI Collaboration LLM Development AI Product Design Agent Capabilities Generative AI Interfaces

Mentioned in this video

Companies

OpenAI

Karina Nguyen leads a research team at OpenAI, focusing on new interaction paradigms for reasoning interfaces and capabilities like ChatGPT Canvas and Tasks.

Anthropic

Karina first applied to Anthropic and later joined as its first front-end engineer and product designer. She worked on early products such as Claude in Slack.

Software & Apps

GPT-4

Mentioned in the context of comparing model card numbers and evaluation settings, highlighting the difficulty of apples-to-apples comparisons across different model versions.

DALL·E

Mentioned as a potential tool to integrate with Canvas, highlighting the complexities of building evals for multimodal interactions.

Python Advanced Data Analysis

A tool that presents a tricky decision boundary with Canvas, requiring careful derivation of user intent to determine which tool is most appropriate.

ChatGPT Tasks

A recent OpenAI feature that lets users schedule tasks, such as reminders or recurring requests, for ChatGPT to carry out later. It is discussed as a foundational module for proactive AI assistance.

Friend

A startup mentioned as attempting to create proactive AI assistants that act like natural friends, similar to the vision for future AI capabilities.

Claude 1.3

An earlier version of Claude that was noted for being extremely creative but also having a lot of hallucinations, which led to discussions about its deployability.

ChatGPT

The release of ChatGPT influenced Anthropic's product direction, and Karina was challenged to reproduce a similar interface within two weeks.

Stanford HELM

A benchmark evaluation where Claude reportedly performed poorly due to incorrect prompting techniques, illustrating the challenges in consistent model evaluation.

ChatGPT Canvas

A feature developed by OpenAI that supports writing and coding, with ongoing work to enhance its capabilities, requiring custom training and synthetic data generation.

Claude

A family of language models developed by Anthropic. Early product experiments included Claude in Slack, and later releases included the Claude 3 family of models.

Claude 3 Haiku

Mentioned as part of the Claude 3 family, representing smaller, faster models that can improve the performance of computer agents.

E2B

A company specializing in code sandbox solutions, relevant to the discussion on computer use agents and coding environments.

Claude 3

The third generation of Claude models, released as a family (Haiku, Sonnet, Opus), involved in post-training fine-tuning and evaluation efforts by Karina's team.

A model discussed in relation to prompting techniques, where hard constraints help the model select better candidates. It excels at problems requiring specific criteria matching.

CLIP

Karina used CLIP for fashion recommendation search in early prototypes before joining Anthropic.

Locations

Berkeley

Karina Nguyen attended school at Berkeley, where she worked with the Human Rights Center and professors on computer vision applications for investigative journalism.

People

Mira G.

A leader Karina learned from, particularly regarding her interdisciplinary mindset and ability to connect product and research, while balancing safety concerns.

Taylor Swift

Mentioned as an example of a concert that might drive demand for a Facebook Marketplace bot to secure tickets.

Andrew Ng

Karina took a course with Andrew Ng, mentioning a lesson relevant to AI model development processes.

Barrett Z.

Karina's former manager at OpenAI who helped her staff the ChatGPT Canvas project and encouraged its development.

Organizations

New York Times

Karina interned and later worked full-time at The New York Times, focusing on product engineering and R&D prototypes for storytelling features.

Concepts

Constitutional AI

A paper discussing methods for creating model completions that adhere to specific principles, relevant to behavioral design and shaping model behavior.
