How I use LLMs

Andrej Karpathy
Science & Technology · 3 min read · 132 min video
Feb 27, 2025 · 2,297,209 views
TL;DR

Practical guide to using LLMs: models, thinking, tools, search, code, and multimodal flows.

Key Insights

1. There is a diverse, fast-growing LLM ecosystem with incumbents and competitors across major companies and startups.
2. Tokens form a context window that acts as the model's working memory; manage it by starting new chats for topic shifts.
3. Reasoning ("thinking") models trained with reinforcement learning improve difficult tasks like math and coding but may incur delays.
4. Tooling (web search, deep research, data analysis, Python, artifacts) dramatically expands what LLMs can do.
5. Multimodal capabilities (voice, images, video) enable natural, real-time interactions beyond text.

ECOSYSTEM OVERVIEW AND MODEL LANDSCAPE

OpenAI's ChatGPT popularized conversational LLMs in 2022, and the ecosystem has exploded since. The video starts with ChatGPT as the incumbent, feature-rich and long-standing, but it also surveys Gemini, Claude, Grok, and other players from the U.S., Europe, and beyond, including labs such as Anthropic (Claude), xAI (Grok), and various regional engines. The landscape is tracked on leaderboards such as Chatbot Arena and Scale AI's SEAL leaderboard. The takeaway is not to lock in on one tool, but to explore the options and mix models for different tasks.

UNDERSTANDING TOKENS, CONTEXT WINDOWS, AND MODEL VARIANTS

Behind the chat bubbles lie a few core ideas. Text is tokenized into small units; your chat forms a one-dimensional token stream that the model consumes and extends. The context window is the model's working memory, which you should manage by starting new chats to reset it when topics shift. The model itself is a fixed, self-contained "zip file" of parameters shaped by pre-training and post-training: pre-training absorbs internet data, while post-training injects a persona via human demonstrations. Prices and tiers vary by provider and model size, influencing speed and capability.
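The bookkeeping behind that context window can be sketched in a few lines. This is a minimal illustration, not any provider's API: the token estimate is a rough words-based heuristic (real providers expose exact tokenizers), and the message format simply mimics the common role/content shape.

```python
def estimate_tokens(text: str) -> int:
    """Very rough token estimate: roughly 4 tokens per 3 words."""
    return max(1, (len(text.split()) * 4) // 3)

def trim_history(messages: list[dict], budget: int) -> list[dict]:
    """Keep the most recent messages whose combined estimate fits the budget."""
    kept, used = [], 0
    for msg in reversed(messages):          # walk newest -> oldest
        cost = estimate_tokens(msg["content"])
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))             # restore chronological order

chat = [
    {"role": "user", "content": "Tell me about the history of Rome."},
    {"role": "assistant", "content": "Rome was founded, by tradition, in 753 BC ..."},
    {"role": "user", "content": "Now switch topics: help me debug a Python script."},
]
trimmed = trim_history(chat, budget=20)     # only the newest turn fits
```

In practice, simply starting a fresh chat on a topic shift achieves the same effect with zero code.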

REASONING MODES: REINFORCEMENT LEARNING AND THINKING

Thinking models arise from reinforcement learning (RL), a training stage in which the model practices solving problems and discovers problem-solving strategies. These models tend to think for longer, producing step-by-step reasoning that can improve accuracy on hard math or coding tasks but slows down responses. The video demonstrates switching from standard GPT-4-level models to advanced thinking variants (often labeled Pro or "thinking" modes) for stubborn problems like gradient checks. Different providers (GPT-4o, Claude, Grok, Gemini) offer their own thinking options with varying trade-offs in speed and reliability.
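That speed/accuracy trade-off suggests routing: send hard reasoning tasks to a thinking model and everything else to a fast one. The sketch below is a toy keyword heuristic with placeholder model names, not real provider identifiers:

```python
# Keywords that hint the prompt needs careful, step-by-step reasoning.
REASONING_KEYWORDS = {"prove", "derive", "gradient", "debug", "optimize", "integral"}

def pick_model(prompt: str) -> str:
    """Return a placeholder model tier based on a crude keyword heuristic."""
    words = {w.strip(".,?!").lower() for w in prompt.split()}
    if words & REASONING_KEYWORDS:
        return "thinking-model"     # slower, step-by-step reasoning
    return "fast-model"             # quick replies for casual tasks

pick_model("Check the gradient of this loss function")  # -> "thinking-model"
pick_model("Write a short birthday message")            # -> "fast-model"
```

A real router could instead ask a cheap model to classify the prompt's difficulty, but the principle is the same: reserve the slow, expensive mode for prompts that need it.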

TOOLING AND WORKFLOWS: SEARCH, DEEP RESEARCH, ANALYSIS, AND DIAGRAMS

Tooling is the core multiplier here: the model can use internet search to fetch fresh information, pull pages into its context, and cite sources so you can verify outputs. Deep Research extends this with tens of minutes of structured inquiry across multiple sources, producing a polished report akin to a lightweight literature synthesis. The video also shows Advanced Data Analysis plotting data programmatically, and Artifacts generating custom apps or diagrams inside the UI. For coding workflows, tools like Cursor or in-editor prompts let the model write and modify code in your environment.
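Under the hood, tool use is a loop: the model emits either a final answer or a tool call, the harness executes the tool, and the result is appended back into the context. A minimal sketch, with stubbed tools and a scripted stand-in for the model (the call format here is illustrative, not any vendor's schema):

```python
def search_tool(query: str) -> str:
    """Stand-in for a web search; a real tool would call a search API."""
    return f"[search results for: {query}]"

def python_tool(code: str) -> str:
    """Stand-in for a sandboxed interpreter; here only simple arithmetic."""
    return str(eval(code, {"__builtins__": {}}))

TOOLS = {"search": search_tool, "python": python_tool}

def run_with_tools(model_step, prompt: str, max_steps: int = 5) -> str:
    """Loop: ask the model, execute any tool call, append the result."""
    context = prompt
    for _ in range(max_steps):
        action = model_step(context)   # {"tool": ..., "arg": ...} or {"answer": ...}
        if "answer" in action:
            return action["answer"]
        result = TOOLS[action["tool"]](action["arg"])
        context += f"\n[tool:{action['tool']}] {result}"
    return context

# Scripted stand-in for the model: first call a tool, then answer.
def scripted_model(context: str) -> dict:
    if "[tool:python]" not in context:
        return {"tool": "python", "arg": "21 * 2"}
    return {"answer": "The result is 42."}

run_with_tools(scripted_model, "What is 21 * 2?")  # -> "The result is 42."
```

Real harnesses add schemas, sandboxing, and error handling, but this loop is the essential shape behind search, Python execution, and similar tools.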

MULTIMODAL INPUTS AND OUTPUTS: VOICE, IMAGES, VIDEO

Multimodality expands what you can feed the model and what it can return. Speech input can be captured with a mic that transcribes to text, or, on some platforms, advanced voice mode lets the model handle audio tokens directly. Images and videos are uploaded or streamed; images can be captioned, described, or used as prompts, and video input can be treated similarly via camera feeds. The tools also generate images (DALL·E, Ideogram) and even short videos. The experience varies by app and device, but the trend is toward seamless cross‑modal conversation.
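For image input specifically, many chat APIs accept an inline base64 data URL alongside the text. The sketch below builds such a message in an OpenAI-style shape, which is an assumption here; payload formats vary by provider, and no network call is made.

```python
import base64

def image_message(prompt: str, image_bytes: bytes, mime: str = "image/png") -> dict:
    """Build a single user message carrying both text and an inline image."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            # Data URL: the image travels inside the request body itself.
            {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{b64}"}},
        ],
    }

msg = image_message("What ingredients are on this label?", b"\x89PNG...")
```

The same pattern covers the label-scanning demos in the video: photograph the label, attach it as an image part, and ask in the text part.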

PRACTICAL TAKEAWAYS: MEMORIES, CUSTOM GPTS, AND WORKFLOW STRATEGIES

Finally, the video emphasizes practical patterns you can adopt. Use memory to store preferences and tailor responses; leverage custom instructions to set tone and goals; build custom GPTs for language learning or domain tasks to avoid repeating long prompts. The idea of an 'LLM council', pulling from multiple providers to cross-check answers, helps mitigate model bias and coverage gaps. For real work, verify critical outputs with sources, start new chats for topic shifts, and pick tools aligned to the task (search for fresh facts, deep research for literature, or code assistants for development).
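The 'LLM council' pattern can be sketched as a simple majority vote over several providers. Everything below is stubbed; real use would call each vendor's API and might compare answers more carefully than exact-match voting:

```python
from collections import Counter

def council(question: str, providers: dict) -> str:
    """Ask every provider the same question and return the most common answer."""
    answers = [ask(question) for ask in providers.values()]
    return Counter(answers).most_common(1)[0][0]

# Stubbed providers standing in for real API clients.
stub_providers = {
    "provider_a": lambda q: "Paris",
    "provider_b": lambda q: "Paris",
    "provider_c": lambda q: "Lyon",   # a dissenting model
}

council("What is the capital of France?", stub_providers)  # -> "Paris"
```

Even without automation, the same idea works manually: paste the same question into two or three different chat apps and compare.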

LLM Practical Cheat Sheet — Dos & Don'ts

Practical takeaways from this episode

Do This

Do start a new chat when you switch topics to avoid overloading the context window (saves cost and improves relevance).
Do verify important factual outputs (medical, financial, legal) against primary sources — LLMs can be probabilistic and may hallucinate.
Do pick the appropriate model/tier: use reasoning ('thinking') models for hard math/code tasks and fast non‑thinking models for casual writing or brainstorming.
Do use tools (internet search, file upload, Python interpreter) when you need up‑to‑date info, exact computation, or to analyze documents/programs.
Do keep an eye on what tools are available per provider (search, python, deep research, voice, file upload) and use the best fit.

Avoid This

Don't assume every concise answer is correct — always check citations for claims and data.
Don't let a long conversation pile up irrelevant context tokens — prune or start a new chat when needed.
Don't rely on LLMs for high‑stakes diagnoses, legal advice, or unverified scientific claims without consulting experts.
Don't copy generated code or plots without reviewing them — LLMs can make implicit assumptions or errors.

Common Questions

When should I start a new chat?
Start a new chat whenever you change topics or no longer need the prior conversation context; it clears the context window, reduces distraction, and lowers token cost. (See the advice at 984s in the video.)

Mentioned in this video

Tool: GPT-4o mini

A smaller, free-tier variant of GPT-4o mentioned in the discussion of tiers and pricing.

Tool: Chatbot Arena

A leaderboard/ranking site for comparing chat models (mentioned as a way to track models).

Tool: Scale SEAL leaderboard

Another leaderboard/eval site (referred to when discussing ways to monitor model performance).

Tool: Tiktokenizer

A tokenizer app (used to show tokenization and token counts of prompts and responses).

Medication: NyQuil

Over-the-counter nighttime cold medication discussed alongside DayQuil for cold symptoms.

Medication: DayQuil

Over-the-counter medication mentioned when asking the model about remedies for a runny nose.

Product: Longevity Mix

A multi-supplement product (Bryan Johnson's mix) used as a Deep Research example to investigate ingredients.

Tool: Advanced Data Analysis

ChatGPT tool (Python + plotting integration) used to analyze data, create plots, and run code in the conversation.

Tool: Claude Artifacts

Claude feature that can generate runnable in-browser apps (used to produce flashcard apps and Mermaid diagrams).

Tool: Composer (Cursor)

Cursor's higher-level agent/assistant that can modify multiple files, run installs, and autonomously update a codebase.

Product: Colgate toothpaste

Toothpaste label scanned and discussed with the LLM to interpret ingredients and safety.

Tool: Mermaid (diagram generation)

A diagramming syntax/library used by Claude Artifacts to render conceptual diagrams from book chapters and other texts.

Supplement: AKG

One active ingredient in the Longevity Mix that the presenter asks the model to research (mechanism, studies, safety).

Tool: Deep Research

ChatGPT Pro feature that performs long-form research combining internet search and extended reasoning (demoed on supplements).

Tool: Python interpreter

The runtime environment ChatGPT can call to compute exact results and run user-provided code (used to avoid hallucinated math).

Tool: NotebookLM

Google's NotebookLM, demoed for generating on-demand podcasts and interactive audio from documents.

Book: The Wealth of Nations

Adam Smith's 1776 book used as an example of reading a long historical text together with an LLM.

Author: Jack Weatherford

Author of 'Genghis Khan and the Making of the Modern World', referenced during a camera demo (book visible on shelf).

Tool: DALL·E (image generation)

Text-to-image model family referenced when generating images for thumbnails and summarizing headlines (referred to in the transcript as the image generator tied to ChatGPT).
