How I use LLMs
Key Moments
Practical guide to using LLMs: models, thinking, tools, search, code, and multimodal flows.
Key Insights
There is a diverse, fast‑growing LLM ecosystem with incumbents and competitors across major companies and startups.
Tokens form a context window that acts as the model's working memory; manage it by starting new chats for topic shifts.
Thinking/reinforcement learning models improve difficult tasks like math and coding but may incur delays.
Tooling (web search, deep research, data analysis, Python, artifacts) dramatically expands what LLMs can do.
Multimodal capabilities (voice, images, video) enable natural, real-time interactions beyond text.
ECOSYSTEM OVERVIEW AND MODEL LANDSCAPE
OpenAI's ChatGPT popularized conversational LLMs in 2022, and since then the ecosystem has exploded. The video starts with ChatGPT as the incumbent, feature-rich and long‑standing, but it also surveys Gemini, Claude, Grok, and other players from the U.S., Europe, and beyond. It notes startups like Anthropic's Claude, xAI's Grok, and various regional engines; the landscape is tracked on leaderboards such as Chatbot Arena and Scale AI's SEAL leaderboard. The takeaway is not to lock in on one tool, but to explore options and mix models for different tasks.
UNDERSTANDING TOKENS, CONTEXT WINDOWS, AND MODEL VARIANTS
Behind the chat bubbles lie a few core ideas. Text is tokenized into small units, and your chat forms a one-dimensional token stream that the model consumes and extends. The context window is the model's working memory; manage it by starting new chats to reset it when topics shift. The model itself is a fixed, self-contained zip file of parameters shaped by pre-training and post-training: pre-training absorbs internet-scale data, while post-training instills an assistant persona via human demonstrations. Prices and tiers vary by provider and model size, influencing speed and capability.
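The "context window as working memory" idea can be sketched in a few lines. This toy helper approximates token counts by splitting on whitespace (real BPE tokenizers differ) and drops the oldest turns once a budget is exceeded, which is roughly why starting a fresh chat keeps the model focused and cheap:

```python
# Sketch: treating the context window as a bounded token budget.
# Token counts are approximated by whitespace splitting; real
# tokenizers (e.g. BPE) behave differently, so this is illustrative.

def approx_tokens(text: str) -> int:
    """Rough token estimate: one token per whitespace-separated word."""
    return len(text.split())

def trim_to_window(messages: list[str], max_tokens: int) -> list[str]:
    """Keep the most recent messages that fit the budget, mimicking
    how older turns fall out of the model's working memory."""
    kept, used = [], 0
    for msg in reversed(messages):
        cost = approx_tokens(msg)
        if used + cost > max_tokens:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))

history = [
    "Explain tokenization please",
    "Now switch topics: plan a trip to Kyoto",
    "Which temples should I visit first?",
]
print(trim_to_window(history, max_tokens=12))
```

With a 12-"token" budget only the most recent turn survives, which is the same effect you get deliberately by opening a new chat when the topic shifts.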
REASONING MODES: REINFORCEMENT LEARNING AND THINKING
Thinking models arise from reinforcement learning (RL), in which the model practices solving verifiable problems and discovers problem-solving strategies of its own. These models tend to think for longer, producing step-by-step reasoning that can improve accuracy on hard math or coding tasks at the cost of slower responses. The video demonstrates switching from standard GPT-4-level models to advanced thinking variants (often labeled Pro or "thinking" modes) for stubborn problems like gradient checks. Different providers (GPT-4o, Claude, Grok, Gemini) offer their own thinking options with varying trade-offs in speed and reliability.
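The speed/accuracy trade-off suggests a simple routing habit: send stubborn math or coding prompts to a thinking model and keep quick lookups on the fast default. A minimal sketch of that habit as code, where the model names and keyword list are hypothetical placeholders, not any provider's real identifiers:

```python
# Sketch: routing prompts between a fast model and a slower
# "thinking" model. Names and heuristics are illustrative only.

HARD_HINTS = ("prove", "gradient", "debug", "integral", "edge case")

def pick_model(prompt: str) -> str:
    """Route prompts that look like hard math/coding work to the
    reasoning model; keep easy asks on the cheaper, faster default."""
    lowered = prompt.lower()
    if any(hint in lowered for hint in HARD_HINTS):
        return "thinking-model"   # slower, step-by-step reasoning
    return "fast-model"           # lower latency, fine for easy asks

print(pick_model("Check the gradient of this loss by hand"))
print(pick_model("What's the capital of France?"))
```

In practice the "router" is you, picking a model in the UI; the point is just to make the decision deliberately rather than paying thinking-model latency for trivial questions.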
TOOLING AND WORKFLOWS: SEARCH, DEEP RESEARCH, ANALYSIS, AND DIAGRAMS
Tooling is the core multiplier here: the model can use internet search to fetch fresh information, pull pages into its context, and cite sources so you can verify outputs. Deep Research extends this with tens of minutes of structured inquiry across multiple sources, producing a polished report akin to a lightweight literature synthesis. The video also shows Advanced Data Analysis, which plots data programmatically, and Artifacts, which generates custom apps or diagrams inside the UI. For coding workflows, tools like Cursor or in-editor prompts let the model write and modify code in your environment.
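Under the hood, tool use is a loop: the model emits a tool request, the host executes it, and the result is appended to the context before the model answers. This sketch stubs out both the model and the search tool to show the shape of that loop; real APIs use structured tool-call objects rather than these ad-hoc dicts:

```python
# Sketch of a tool-use loop. The "model" and the search tool are
# stubs; real providers return structured tool-call objects.

def fake_model(context: list[dict]) -> dict:
    """Stand-in for an LLM: requests a search once, then answers."""
    if not any(m["role"] == "tool" for m in context):
        return {"role": "assistant",
                "tool_call": {"name": "web_search",
                              "query": "LLM leaderboards"}}
    return {"role": "assistant", "content": "Answer with cited sources."}

def web_search(query: str) -> str:
    """Stub search tool; a real one would fetch and summarize pages."""
    return f"Top results for: {query}"

def run_turn(user_msg: str) -> list[dict]:
    """Loop: call the model, run any requested tool, feed the result
    back into the context, and stop once the model answers in text."""
    context = [{"role": "user", "content": user_msg}]
    while True:
        reply = fake_model(context)
        context.append(reply)
        call = reply.get("tool_call")
        if call is None:
            return context
        context.append({"role": "tool",
                        "content": web_search(call["query"])})

transcript = run_turn("What are the best current chat models?")
print(transcript[-1]["content"])
```

Deep Research is essentially this loop run many times over many sources before the final report is written.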
MULTIMODAL INPUTS AND OUTPUTS: VOICE, IMAGES, VIDEO
Multimodality expands what you can feed the model and what it can return. Speech input can be captured with a mic that transcribes to text, or, on some platforms, advanced voice mode lets the model handle audio tokens directly. Images and videos are uploaded or streamed; images can be captioned, described, or used as prompts, and video input can be treated similarly via camera feeds. The tools also generate images (DALL·E, Ideogram) and even short videos. The experience varies by app and device, but the trend is toward seamless cross‑modal conversation.
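On the input side, image uploads usually travel as base64 data inside an ordinary chat message. A minimal sketch of that packaging, using a data-URL content shape similar to what several vision APIs accept (treat the exact field names as illustrative rather than any one provider's contract):

```python
# Sketch: bundling a text prompt and an inline image into one
# multimodal user turn. Field names are illustrative.

import base64

def image_message(text: str, image_bytes: bytes,
                  mime: str = "image/png") -> dict:
    """Pack a prompt plus a base64-encoded image into a single turn."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url",
             "image_url": {"url": f"data:{mime};base64,{b64}"}},
        ],
    }

msg = image_message("What ingredients are on this label?", b"\x89PNG...")
print(msg["content"][0]["text"])
```

This is what happens when you photograph a toothpaste label or a blood panel and ask the model about it: the image becomes tokens in the same stream as your question.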
PRACTICAL TAKEAWAYS: MEMORIES, CUSTOM GPTS, AND WORKFLOW STRATEGIES
Finally, the video emphasizes practical patterns you can adopt. Use memory to store preferences and tailor responses; leverage custom instructions to set tone and goals; build custom GPTs for language learning or domain tasks to avoid repeating long prompts. The idea of an 'LLM council' (polling multiple providers and cross-checking their answers) helps mitigate model bias and coverage gaps. For real work, verify critical outputs against sources, start new chats when topics shift, and pick tools aligned to the task: search for fresh facts, deep research for literature, and code assistants for development.
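The council idea reduces to asking several providers the same question and keeping the consensus. A minimal sketch with stub provider callables (swap in real API clients as needed); answers are lower-cased and trimmed so trivially different strings still agree:

```python
# Sketch of an "LLM council": poll several providers and keep the
# most common normalized answer. Providers here are stub lambdas.

from collections import Counter

def council_answer(question: str, providers: list) -> str:
    """Return the majority answer across providers, after trimming
    whitespace and case so near-identical strings count as one."""
    answers = [p(question).strip().lower() for p in providers]
    winner, _ = Counter(answers).most_common(1)[0]
    return winner

providers = [
    lambda q: "Paris",
    lambda q: " paris ",
    lambda q: "Lyon",
]
print(council_answer("Capital of France?", providers))
```

Majority voting only catches disagreements, not shared blind spots, so it complements rather than replaces checking critical outputs against cited sources.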
LLM Practical Cheat Sheet — Dos & Don'ts
Practical takeaways from this episode
Do This
Start a new chat whenever you change topics or no longer need the prior conversation context — it clears the context window, reduces distraction, and lowers token cost. (See the advice at 984s).
Topics
Mentioned in this video
A smaller, free‑tier variant of GPT-4o mentioned in the discussion of tiers and pricing.
A leaderboard / ranking site for comparing chat models (mentioned as a way to track models).
Another leaderboard/eval site (referred to when discussing ways to monitor model performance).
A tokenizer app (used to show tokenization and token counts of prompts and responses).
Over‑the‑counter night relief medication discussed alongside DayQuil for cold symptoms.
Over‑the‑counter medication mentioned when asking the model about remedies for a runny nose.
A multi‑supplement product (Bryan Johnson's mix) used as a Deep Research example to investigate ingredients.
ChatGPT tool (Python + plotting integration) used to analyze data, create plots and run code in the conversation.
Claude feature that can generate runnable in‑browser apps (used to produce flashcard apps and Mermaid diagrams).
Cursor's higher‑level agent/assistant that can modify multiple files, run installs, and autonomously update a codebase.
Colgate toothpaste label scanned and discussed with the LLM to interpret ingredients and safety.
Used by Claude Artifacts to render conceptual diagrams from book chapters and other texts.
One active ingredient in the Longevity Mix that the presenter asks the model to research (mechanism, studies, safety).
ChatGPT Pro feature that performs long-form research combining internet search + extended reasoning (demoed on supplements).
The runtime environment ChatGPT can call to compute exact results and run user‑provided code (used to avoid hallucinated math).
A diagramming syntax/library Claude used to produce conceptual diagrams from text (e.g., book chapters).
Google's NotebookLM, demoed for generating on‑demand podcasts and interactive audio from documents.
Adam Smith's 1776 book used as an example of reading a long historical text together with an LLM.
Author of 'Genghis Khan and the Making of the Modern World', referenced during a camera demo (book visible on shelf).
Text→image model family referenced when generating images for thumbnails and summarizing headlines (referred to in the transcript as the image generator tied to ChatGPT).
More from Andrej Karpathy
212 min · Deep Dive into LLMs like ChatGPT
242 min · Let's reproduce GPT-2 (124M)
134 min · Let's build the GPT Tokenizer
60 min · [1hr Talk] Intro to Large Language Models