Attention Is All You Need
Study / Research
scientific article published in June 2017
Mentioned in 5 videos
Videos Mentioning Attention Is All You Need

Let's reproduce GPT-2 (124M)
Andrej Karpathy
Original Transformer paper referenced to explain positional encodings and encoder/decoder distinctions.
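The positional encodings discussed here are the fixed sinusoidal ones from Section 3.5 of the paper, where each position is mapped to sines and cosines of geometrically spaced frequencies. A minimal NumPy sketch (function name and shapes are illustrative, not from any particular codebase):

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Fixed sinusoidal encodings: PE(pos, 2i) = sin(pos / 10000^(2i/d)),
    PE(pos, 2i+1) = cos(pos / 10000^(2i/d))."""
    positions = np.arange(seq_len)[:, np.newaxis]      # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[np.newaxis, :]     # (1, d_model/2)
    angles = positions / np.power(10000.0, dims / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)   # even dimensions get sine
    pe[:, 1::2] = np.cos(angles)   # odd dimensions get cosine
    return pe

pe = sinusoidal_positional_encoding(seq_len=50, d_model=16)
print(pe.shape)  # (50, 16)
```

Because the encodings are deterministic functions of position, they need no training and extrapolate to sequence lengths unseen during training, one of the trade-offs versus learned positional embeddings.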

Breaking down the OG GPT Paper by Alec Radford
Latent Space
The paper that introduced the Transformer architecture, which was used by GPT.
[Paper Club] Intro to Diffusion Models and OpenAI sCM: Simple, Stable, Scalable Consistency Models
Latent Space
Mentioned as the source of positional embeddings comparable to the Fourier embeddings used in consistency models.
[Paper Club] Molmo + Pixmo + Whisper 3 Turbo - with Vibhu Sapra, Nathan Lambert, Amgadoz
Latent Space
A seminal paper in machine learning that introduced the Transformer architecture, which is the basis for Whisper's encoder-decoder model.

Information Theory for Language Models: Jack Morris
Latent Space
The paper that introduced the Transformer architecture, the foundation of modern large language models.