Transformer

Concept

Neural network architecture visualized and explained as the core model type for LLMs.

Mentioned in 26 videos

Videos Mentioning Transformer

Deep Dive into LLMs like ChatGPT

Andrej Karpathy

Neural network architecture visualized and explained as the core model type for LLMs.

The 10,000x Yolo Researcher Metagame — with Yi Tay of Reka

Latent Space

Yi Tay's early research at Google Brain focused on efficient Transformers, exploring alternatives to attention mechanisms and their broader applications.

LLM Asia Paper Club Survey Round

Latent Space

The underlying architecture for large language models, discussed in the context of how tokens are used for reasoning and how activations are processed.

2024 in Post-Transformer Architectures: State Space Models, RWKV [Latent Space LIVE! @ NeurIPS 2024]

Latent Space

The primary architecture discussed and contrasted with newer post-Transformer models, known for its quadratic scaling in attention.

Breaking down the OG GPT Paper by Alec Radford

Latent Space

The core architecture used in GPT, which relies on attention mechanisms to process sequential data efficiently.

[Paper Club] Intro to Diffusion Models and OpenAI sCM: Simple, Stable, Scalable Consistency Models

Latent Space

Reference for positional embeddings, similar to Fourier embeddings used in the attention blocks of consistency models.

Yann LeCun: Dark Matter of Intelligence and Self-Supervised Learning | Lex Fridman Podcast #258

Lex Fridman

A neural network architecture, mentioned here in the context of images, where it represents an image as non-overlapping patches that can be masked.

How DeepMind’s New AI Predicts What It Cannot See

Two Minute Papers

Language Agents: From Reasoning to Acting — with Shunyu Yao of OpenAI, Harrison Chase of LangGraph

Latent Space

A neural network architecture that has been influential in the development of large language models.

The Ultimate Guide to Prompting - with Sander Schulhoff from LearnPrompting.org

Latent Space

The underlying architecture of modern LLMs like GPT-3, mentioned in the context of role prompting potentially working better on older models than current Transformers.

Joscha Bach: Artificial Consciousness and the Nature of Reality | Lex Fridman Podcast #101

Lex Fridman

A type of neural network with attentional mechanisms, used in natural language processing to track identity and concepts within text.

The State of Silicon and the GPU Poors - with Dylan Patel of SemiAnalysis

Latent Space

The dominant architecture for LLMs, discussed as likely to continue reigning supreme.

FlashAttention-2: Making Transformers 800% faster AND exact

Latent Space

The foundational architecture for many modern NLP models, introduced in 2017, which popularized the attention mechanism.

Joscha Bach: Nature of Reality, Dreams, and Consciousness | Lex Fridman Podcast #212

Lex Fridman

An AI architecture, invented in 2017 for natural language processing, that GPT-3 and its successors use, enabling them to learn relationships between distant words in a large context.

Ishan Misra: Self-Supervised Deep Learning in Computer Vision | Lex Fridman Podcast #206

Lex Fridman

Discussed as an architecture for AI models, especially in natural language understanding, similar to those used by OpenAI.

Ilya Sutskever: Deep Learning | Lex Fridman Podcast #94

Lex Fridman

A neural network architecture that makes extensive use of attention mechanisms, becoming a foundational element for breakthroughs in natural language processing due to its efficiency on GPUs and non-recurrent nature.

Dmitri Dolgov: Waymo and the Future of Self-Driving Cars | Lex Fridman Podcast #147

Lex Fridman

A type of neural network architecture, noted for its breakthroughs in language models (like GPT-3), and its potential applicability to behavior prediction and decision-making in autonomous driving.

Chris Lattner: The Future of Computing and Programming Languages | Lex Fridman Podcast #131

Lex Fridman

A neural network architecture that GPT models are based on, characterized by its non-recurrent nature and simplicity, yet highly effective for learning language models.

Aravind Srinivas: Perplexity CEO on Future of AI, Search & the Internet | Lex Fridman Podcast #434

Lex Fridman

Dileep George: Brain-Inspired AI | Lex Fridman Podcast #115

Lex Fridman

A novel neural network architecture used in deep learning, particularly for natural language processing, which GPT-3 is based on.

Prelude To Power: 1931 Michael Faraday Celebration

Ri Archives

⚡️ Beyond Transformers with Power Retention

Latent Space

The foundational architecture that has dominated LLMs; its history and evolution are compared to the potential adoption of power retention.

Sergey Brin | All-In Summit 2024

All-In Podcast

The architecture from the foundational paper for language models, written by Google researchers, which Brin believes Google was too timid to deploy initially.

Oriol Vinyals: Deep Learning and Artificial General Intelligence | Lex Fridman Podcast #306

Lex Fridman

A neural network architecture that utilizes attention mechanisms, considered a powerful and stable approach for sequence modeling across various modalities.
