Attention Is All You Need
Study / Research
The original 2017 Transformer paper by Vaswani et al., referenced across these videos to explain positional encodings and the encoder/decoder distinction.
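For reference, the sinusoidal positional encodings introduced in the paper can be sketched in a few lines of NumPy (the function name is illustrative; `d_model` is assumed even):

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Positional encodings from 'Attention Is All You Need':
    PE(pos, 2i)   = sin(pos / 10000**(2i / d_model))
    PE(pos, 2i+1) = cos(pos / 10000**(2i / d_model))
    """
    positions = np.arange(seq_len)[:, None]                    # shape (seq_len, 1)
    div_terms = 10000 ** (np.arange(0, d_model, 2) / d_model)  # shape (d_model/2,)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(positions / div_terms)  # even dimensions use sine
    pe[:, 1::2] = np.cos(positions / div_terms)  # odd dimensions use cosine
    return pe

pe = sinusoidal_positional_encoding(seq_len=8, d_model=16)
# Row 0 alternates sin(0)=0 and cos(0)=1; all values lie in [-1, 1].
```

Because each dimension is a sinusoid of a different wavelength, the encoding for any position is a fixed, deterministic vector, which is why these embeddings are comparable to the Fourier embeddings discussed in the diffusion-model video below.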
Mentioned in 5 videos
Videos Mentioning Attention Is All You Need

Let's reproduce GPT-2 (124M)
Andrej Karpathy
Original Transformer paper referenced to explain positional encodings and encoder/decoder distinctions.

Breaking down the OG GPT Paper by Alec Radford
Latent Space
The paper that introduced the Transformer architecture, which was used by GPT.
[Paper Club] Intro to Diffusion Models and OpenAI sCM: Simple, Stable, Scalable Consistency Models
Latent Space
Cited as the source of sinusoidal positional embeddings, which are comparable to the Fourier embeddings used in consistency models.
[Paper Club] Molmo + Pixmo + Whisper 3 Turbo - with Vibhu Sapra, Nathan Lambert, Amgadoz
Latent Space
A seminal paper in machine learning that introduced the Transformer architecture, which is the basis for Whisper's encoder-decoder model.

Information Theory for Language Models: Jack Morris
Latent Space
The paper that introduced the Transformer architecture underlying modern language models.