How I use LLMs

Andrej Karpathy
Science & Technology · 3 min read · 132 min video
Feb 27, 2025 · 2,297,209 views
TL;DR

Practical guide to using LLMs: models, thinking, tools, search, code, and multimodal flows.

Key Insights

1. There is a diverse, fast-growing LLM ecosystem with incumbents and competitors across major companies and startups.
2. Tokens form a context window that acts as the model's working memory; manage it by starting new chats for topic shifts.
3. Reasoning ("thinking") models trained with reinforcement learning improve difficult tasks like math and coding but may incur delays.
4. Tooling (web search, deep research, data analysis, Python, artifacts) dramatically expands what LLMs can do.
5. Multimodal capabilities (voice, images, video) enable natural, real-time interactions beyond text.

ECOSYSTEM OVERVIEW AND MODEL LANDSCAPE

OpenAI's ChatGPT popularized conversational LLMs in 2022, and the ecosystem has exploded since. The video starts with ChatGPT as the incumbent, feature-rich and long-standing, but it also surveys Gemini, Claude, Grok, and other players from the U.S., Europe, and beyond, including labs such as Anthropic (Claude), xAI (Grok), and various regional engines. The landscape is tracked on leaderboards such as Chatbot Arena and Scale AI's SEAL leaderboard. The takeaway is not to lock in on one tool, but to explore the options and mix models for different tasks.

UNDERSTANDING TOKENS, CONTEXT WINDOWS, AND MODEL VARIANTS

Behind the chat bubbles lie a few core ideas. Text is tokenized into small units; your chat forms a one-dimensional token stream that the model consumes and extends. The context window is the model's working memory, which you should manage by starting new chats to reset it when topics shift. The model itself is a fixed, self-contained "zip file" of parameters shaped by pre-training and post-training: pre-training absorbs internet data, while post-training injects a persona via human demonstrations. Prices and tiers vary by provider and model size, influencing speed and capability.
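The bookkeeping behind that context window can be sketched in a few lines. This is a minimal illustration, not any provider's API: the token estimate is a rough words-based heuristic (real providers expose exact tokenizers), and the message format simply mimics the common role/content shape.

```python
def estimate_tokens(text: str) -> int:
    """Very rough token estimate: roughly 4 tokens per 3 words."""
    return max(1, (len(text.split()) * 4) // 3)

def trim_history(messages: list[dict], budget: int) -> list[dict]:
    """Keep the most recent messages whose combined estimate fits the budget."""
    kept, used = [], 0
    for msg in reversed(messages):          # walk newest -> oldest
        cost = estimate_tokens(msg["content"])
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))             # restore chronological order

chat = [
    {"role": "user", "content": "Tell me about the history of Rome."},
    {"role": "assistant", "content": "Rome was founded, by tradition, in 753 BC ..."},
    {"role": "user", "content": "Now switch topics: help me debug a Python script."},
]
trimmed = trim_history(chat, budget=20)     # only the newest turn fits
```

In practice, simply starting a fresh chat on a topic shift achieves the same effect with zero code.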

REASONING MODES: REINFORCEMENT LEARNING AND THINKING

Thinking models arise from reinforcement learning (RL), a training stage in which the model practices solving problems and discovers problem-solving strategies. These models tend to think for longer, producing step-by-step reasoning that can improve accuracy on hard math or coding tasks but slows down responses. The video demonstrates switching from standard GPT-4-level models to advanced thinking variants (often labeled Pro or "thinking" modes) for stubborn problems like gradient checks. Different providers (GPT-4o, Claude, Grok, Gemini) offer their own thinking options with varying trade-offs in speed and reliability.
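That speed/accuracy trade-off suggests routing: send hard reasoning tasks to a thinking model and everything else to a fast one. The sketch below is a toy keyword heuristic with placeholder model names, not real provider identifiers:

```python
# Keywords that hint the prompt needs careful, step-by-step reasoning.
REASONING_KEYWORDS = {"prove", "derive", "gradient", "debug", "optimize", "integral"}

def pick_model(prompt: str) -> str:
    """Return a placeholder model tier based on a crude keyword heuristic."""
    words = {w.strip(".,?!").lower() for w in prompt.split()}
    if words & REASONING_KEYWORDS:
        return "thinking-model"     # slower, step-by-step reasoning
    return "fast-model"             # quick replies for casual tasks

pick_model("Check the gradient of this loss function")  # -> "thinking-model"
pick_model("Write a short birthday message")            # -> "fast-model"
```

A real router could instead ask a cheap model to classify the prompt's difficulty, but the principle is the same: reserve the slow, expensive mode for prompts that need it.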

TOOLING AND WORKFLOWS: SEARCH, DEEP RESEARCH, ANALYSIS, AND DIAGRAMS

Tooling is the core multiplier here: the model can use internet search to fetch fresh information, pull pages into its context, and cite sources so you can verify outputs. Deep Research extends this with tens of minutes of structured inquiry across multiple sources, producing a polished report akin to a lightweight literature synthesis. The video also shows Advanced Data Analysis plotting data programmatically, and Artifacts generating custom apps or diagrams inside the UI. For coding workflows, tools like Cursor or in-editor prompts let the model write and modify code in your environment.
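Under the hood, tool use is a loop: the model emits either a final answer or a tool call, the harness executes the tool, and the result is appended back into the context. A minimal sketch, with stubbed tools and a scripted stand-in for the model (the call format here is illustrative, not any vendor's schema):

```python
def search_tool(query: str) -> str:
    """Stand-in for a web search; a real tool would call a search API."""
    return f"[search results for: {query}]"

def python_tool(code: str) -> str:
    """Stand-in for a sandboxed interpreter; here only simple arithmetic."""
    return str(eval(code, {"__builtins__": {}}))

TOOLS = {"search": search_tool, "python": python_tool}

def run_with_tools(model_step, prompt: str, max_steps: int = 5) -> str:
    """Loop: ask the model, execute any tool call, append the result."""
    context = prompt
    for _ in range(max_steps):
        action = model_step(context)   # {"tool": ..., "arg": ...} or {"answer": ...}
        if "answer" in action:
            return action["answer"]
        result = TOOLS[action["tool"]](action["arg"])
        context += f"\n[tool:{action['tool']}] {result}"
    return context

# Scripted stand-in for the model: first call a tool, then answer.
def scripted_model(context: str) -> dict:
    if "[tool:python]" not in context:
        return {"tool": "python", "arg": "21 * 2"}
    return {"answer": "The result is 42."}

run_with_tools(scripted_model, "What is 21 * 2?")  # -> "The result is 42."
```

Real harnesses add schemas, sandboxing, and error handling, but this loop is the essential shape behind search, Python execution, and similar tools.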

MULTIMODAL INPUTS AND OUTPUTS: VOICE, IMAGES, VIDEO

Multimodality expands what you can feed the model and what it can return. Speech input can be captured with a mic that transcribes to text, or, on some platforms, advanced voice mode lets the model handle audio tokens directly. Images and videos are uploaded or streamed; images can be captioned, described, or used as prompts, and video input can be treated similarly via camera feeds. The tools also generate images (DALL·E, Ideogram) and even short videos. The experience varies by app and device, but the trend is toward seamless cross‑modal conversation.
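For image input specifically, many chat APIs accept an inline base64 data URL alongside the text. The sketch below builds such a message in an OpenAI-style shape, which is an assumption here; payload formats vary by provider, and no network call is made.

```python
import base64

def image_message(prompt: str, image_bytes: bytes, mime: str = "image/png") -> dict:
    """Build a single user message carrying both text and an inline image."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            # Data URL: the image travels inside the request body itself.
            {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{b64}"}},
        ],
    }

msg = image_message("What ingredients are on this label?", b"\x89PNG...")
```

The same pattern covers the label-scanning demos in the video: photograph the label, attach it as an image part, and ask in the text part.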

PRACTICAL TAKEAWAYS: MEMORIES, CUSTOM GPTS, AND WORKFLOW STRATEGIES

Finally, the video emphasizes practical patterns you can adopt. Use memory to store preferences and tailor responses; leverage custom instructions to set tone and goals; build custom GPTs for language learning or domain tasks to avoid repeating long prompts. The idea of an 'LLM council', pulling from multiple providers to cross-check answers, helps mitigate model bias and coverage gaps. For real work, verify critical outputs with sources, start new chats for topic shifts, and pick tools aligned to the task (search for fresh facts, deep research for literature, or code assistants for development).
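The 'LLM council' pattern can be sketched as a simple majority vote over several providers. Everything below is stubbed; real use would call each vendor's API and might compare answers more carefully than exact-match voting:

```python
from collections import Counter

def council(question: str, providers: dict) -> str:
    """Ask every provider the same question and return the most common answer."""
    answers = [ask(question) for ask in providers.values()]
    return Counter(answers).most_common(1)[0][0]

# Stubbed providers standing in for real API clients.
stub_providers = {
    "provider_a": lambda q: "Paris",
    "provider_b": lambda q: "Paris",
    "provider_c": lambda q: "Lyon",   # a dissenting model
}

council("What is the capital of France?", stub_providers)  # -> "Paris"
```

Even without automation, the same idea works manually: paste the same question into two or three different chat apps and compare.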

LLM Practical Cheat Sheet — Dos & Don'ts

Practical takeaways from this episode

Do This

Do start a new chat when you switch topics to avoid overloading the context window (saves cost and improves relevance).
Do verify important factual outputs (medical, financial, legal) against primary sources — LLMs can be probabilistic and may hallucinate.
Do pick the appropriate model/tier: use reasoning ('thinking') models for hard math/code tasks and fast non‑thinking models for casual writing or brainstorming.
Do use tools (internet search, file upload, Python interpreter) when you need up‑to‑date info, exact computation, or to analyze documents/programs.
Do keep an eye on what tools are available per provider (search, python, deep research, voice, file upload) and use the best fit.

Avoid This

Don't assume every concise answer is correct — always check citations for claims and data.
Don't let a long conversation pile up irrelevant context tokens — prune or start a new chat when needed.
Don't rely on LLMs for high‑stakes diagnoses, legal advice, or unverified scientific claims without consulting experts.
Don't copy generated code or plots without reviewing them — LLMs can make implicit assumptions or errors.

Common Questions

When should I start a new chat?
Start a new chat whenever you change topics or no longer need the prior conversation context; it clears the context window, reduces distraction, and lowers token cost. (See the advice at 984s in the video.)

Mentioned in this video

Tool: GPT-4o mini

A smaller, free-tier variant of GPT-4o mentioned in the discussion of tiers and pricing.

Tool: Chatbot Arena

A leaderboard/ranking site for comparing chat models (mentioned as a way to track models).

Tool: Scale SEAL leaderboard

Another leaderboard/eval site (referred to when discussing ways to monitor model performance).

Tool: Tiktokenizer

A tokenizer app (used to show tokenization and token counts of prompts and responses).

Medication: NyQuil

Over-the-counter nighttime cold medication discussed alongside DayQuil for cold symptoms.

Medication: DayQuil

Over-the-counter medication mentioned when asking the model about remedies for a runny nose.

Product: Longevity Mix

A multi-supplement product (Bryan Johnson's mix) used as a Deep Research example to investigate ingredients.

Tool: Advanced Data Analysis

ChatGPT tool (Python + plotting integration) used to analyze data, create plots, and run code in the conversation.

Tool: Claude Artifacts

Claude feature that can generate runnable in-browser apps (used to produce flashcard apps and Mermaid diagrams).

Tool: Composer (Cursor)

Cursor's higher-level agent/assistant that can modify multiple files, run installs, and autonomously update a codebase.

Product: Colgate toothpaste

Toothpaste label scanned and discussed with the LLM to interpret ingredients and safety.

Tool: Mermaid (diagram generation)

A diagramming syntax/library used by Claude Artifacts to render conceptual diagrams from book chapters and other texts.

Supplement: AKG

One active ingredient in the Longevity Mix that the presenter asks the model to research (mechanism, studies, safety).

Tool: Deep Research

ChatGPT Pro feature that performs long-form research combining internet search and extended reasoning (demoed on supplements).

Tool: Python interpreter

The runtime environment ChatGPT can call to compute exact results and run user-provided code (used to avoid hallucinated math).

Tool: NotebookLM

Google's NotebookLM, demoed for generating on-demand podcasts and interactive audio from documents.

Book: The Wealth of Nations

Adam Smith's 1776 book used as an example of reading a long historical text together with an LLM.

Author: Jack Weatherford

Author of 'Genghis Khan and the Making of the Modern World', referenced during a camera demo (book visible on shelf).

Tool: DALL·E (image generation)

Text-to-image model family referenced when generating images for thumbnails and summarizing headlines (referred to in the transcript as the image generator tied to ChatGPT).
