FP8
An 8-bit floating-point precision format used in deep learning to cut memory and compute costs. It requires specific hardware and kernel support (DeepSeek V3, for example, ships native FP8 weights), and native FP8 quantization during training is a growing trend.
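A minimal sketch of the idea in PyTorch (assuming version 2.1+, which ships the `torch.float8_e4m3fn` dtype). The per-tensor scaling recipe below is illustrative only, not DeepSeek V3's actual scheme:

```python
# Hypothetical per-tensor FP8 (E4M3) weight quantization sketch.
# Assumes PyTorch >= 2.1 for the torch.float8_e4m3fn dtype.
import torch

def quantize_fp8(w: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
    """Scale a tensor into E4M3's representable range, then cast to FP8."""
    fp8_max = torch.finfo(torch.float8_e4m3fn).max  # 448.0
    scale = w.abs().max().clamp(min=1e-12) / fp8_max
    return (w / scale).to(torch.float8_e4m3fn), scale

def dequantize_fp8(w_fp8: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Recover an approximate high-precision tensor from FP8 + scale."""
    return w_fp8.to(torch.float32) * scale

w = torch.randn(4096, 4096)
w_fp8, scale = quantize_fp8(w)
print(w_fp8.dtype, w_fp8.element_size())          # torch.float8_e4m3fn, 1 byte/element
print((dequantize_fp8(w_fp8, scale) - w).abs().max())  # worst-case quantization error
```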
Videos Mentioning FP8

Training Llama 2, 3 & 4: The Path to Open Source AGI — with Thomas Scialom of Meta AI
Latent Space
An 8-bit floating-point format that can be used for inference and potentially training, trading precision for higher throughput and lower memory use.

DeepSeek V3, SGLang, and the state of Open Model Inference in 2025 (Quantization, MoEs, Pricing)
Latent Space
An 8-bit floating-point precision format used for DeepSeek V3's weights; it requires specific kernel support for inference and reflects a trend toward native quantization during training.

llm.c's Origin and the Future of LLM Compilers - Andrej Karpathy at CUDA MODE
Latent Space
An 8-bit floating-point format that is being added to llm.c for improved training efficiency.

Beating GPT-4 with Open Source Models - with Michael Royzen of Phind
Latent Space
A floating-point format using 8 bits, implemented in NVIDIA's Transformer Engine for faster, more efficient deep learning inference with minimal quality loss.
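A sketch of how Transformer Engine is typically invoked, closely following NVIDIA's published quickstart; it assumes the `transformer_engine` package and an FP8-capable GPU (Hopper class or newer), and the layer sizes are arbitrary examples:

```python
# FP8 forward/backward through an NVIDIA Transformer Engine linear layer.
# Requires the transformer_engine package and FP8-capable CUDA hardware.
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

model = te.Linear(768, 3072, bias=True)            # drop-in FP8-aware linear layer
inp = torch.randn(2048, 768, device="cuda")

# DelayedScaling tracks per-tensor amax history to pick FP8 scales.
fp8_recipe = recipe.DelayedScaling(margin=0, fp8_format=recipe.Format.E4M3)

with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    out = model(inp)                               # matmul runs in FP8

out.sum().backward()                               # gradients flow as usual
```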

Stanford CS336 Language Modeling from Scratch | Spring 2026 | Lecture 2: PyTorch (einops)
Stanford Online
An 8-bit floating-point format introduced more recently, with two variants (E4M3 and E5M2) offering different dynamic-range and resolution trade-offs, supported by NVIDIA's Transformer Engine.
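To make the range/resolution trade-off concrete, here is a quick inspection of the two FP8 variants PyTorch exposes (assuming PyTorch 2.1+):

```python
# Compare the two FP8 variants' dynamic range vs. resolution via torch.finfo.
import torch

for dtype in (torch.float8_e4m3fn, torch.float8_e5m2):
    info = torch.finfo(dtype)
    print(f"{dtype}: max={info.max}, smallest_normal={info.smallest_normal}, eps={info.eps}")

# float8_e4m3fn: max=448.0   -> less range, more mantissa bits (finer resolution)
# float8_e5m2:   max=57344.0 -> more range, fewer mantissa bits (coarser resolution)
```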

Stanford CS336 Language Modeling from Scratch | Spring 2026 | Lecture 5: GPUs, TPUs
Stanford Online
An 8-bit floating-point format increasingly used in deep learning for training and inference to reduce memory footprint and increase speed.
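As a rough illustration of the memory argument, a toy footprint calculation (the 7B parameter count is a hypothetical example; assumes PyTorch 2.1+ for the FP8 dtype):

```python
# Weight-storage footprint of a 7B-parameter model at different precisions.
import torch

n_params = 7_000_000_000  # hypothetical 7B-parameter model
for dtype in (torch.float32, torch.bfloat16, torch.float8_e4m3fn):
    bytes_per = torch.empty(0, dtype=dtype).element_size()
    print(f"{dtype}: {n_params * bytes_per / 2**30:.1f} GiB")

# float32:        ~26.1 GiB
# bfloat16:       ~13.0 GiB
# float8_e4m3fn:   ~6.5 GiB -> half the memory of 16-bit, at lower precision
```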