BERT
deep learning, artificial neural network, language model
Videos Mentioning BERT

The 10,000x Yolo Researcher Metagame — with Yi Tay of Reka
Latent Space
Mentioned as an example of models that researchers were fine-tuning in late 2019, before the focus shifted entirely to foundation models.

Breaking down the OG GPT Paper by Alec Radford
Latent Space
Mentioned as a model that emerged after the GPT-1 paper.
[Paper Club] Embeddings in 2024: OpenAI, Nomic Embed, Jina Embed, cde-small-v1 - with swyx
Latent Space
The architecture underlying Nomic Embed, noted as a de facto standard in embedding-model training. The speaker expressed surprise that embedding models are still largely updated versions of BERT.
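To make the "updated versions of BERT" observation concrete, here is a minimal sketch of the common recipe for deriving embeddings from a BERT-style encoder: tokenize, encode, and mean-pool the final hidden states. The checkpoint name and pooling choice are illustrative assumptions, not Nomic Embed's actual pipeline.

```python
# Minimal sketch: sentence embeddings from a BERT-style encoder via mean pooling.
# "bert-base-uncased" is an illustrative checkpoint, not Nomic Embed's own.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

sentences = ["BERT powers many embedding models.", "Encoders map text to vectors."]
batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    hidden = model(**batch).last_hidden_state  # (batch, seq_len, hidden_dim)

# Mean-pool over real tokens only, using the attention mask to ignore padding.
mask = batch["attention_mask"].unsqueeze(-1).float()
embeddings = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
print(embeddings.shape)  # torch.Size([2, 768])
```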

Supervise the Process of AI Research — with Jungwon Byun and Andreas Stuhlmüller of Elicit
Latent Space
A transformer-based language model, mentioned alongside models like T5 as part of an earlier stage of NLP.

Production AI Engineering starts with Evals
Latent Space
A neural network-based technique for natural language processing pre-training, which significantly accelerated text-based information extraction and began to cannibalize Impira's computer vision-based approach.

Is finetuning GPT4o worth it?
Latent Space
A language model mentioned in the context of OpenAI's progress and early AI models.

Information Theory for Language Models: Jack Morris
Latent Space
A language model, popular in 2019, that Jack Morris was experimenting with at the time.

Beating GPT-4 with Open Source Models - with Michael Royzen of Phind
Latent Space
A foundational transformer-based language model developed by Google, mentioned as an encoder model used by Michael Royzen before Longformer.

Yann LeCun: Deep Learning, ConvNets, and Self-Supervised Learning | Lex Fridman Podcast #36
Lex Fridman
A language model that utilizes self-supervised learning, cited as an example of successful NLP models.

Jeremy Howard: fast.ai Deep Learning Courses and Research | Lex Fridman Podcast #35
Lex Fridman
A transformer-based machine learning technique for natural language processing pre-training developed by Google, demonstrating transfer learning.
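As a concrete illustration of the transfer-learning pattern this entry alludes to, below is a minimal sketch using the Hugging Face transformers library: the pretrained encoder is reused wholesale, and only a small classification head is trained from scratch on labeled data. The model name, data, and hyperparameters are placeholder assumptions, not anything from the episode.

```python
# Minimal sketch of BERT-style transfer learning: reuse pretrained weights,
# then briefly fine-tune on a small labeled task. Values are illustrative.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # pretrained encoder + fresh classifier head
)

texts = ["great movie", "terrible movie"]  # toy stand-in for a real dataset
labels = torch.tensor([1, 0])
batch = tokenizer(texts, padding=True, return_tensors="pt")

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
out = model(**batch, labels=labels)  # cross-entropy loss computed internally
out.loss.backward()
optimizer.step()
```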

Rajat Monga: TensorFlow | Lex Fridman Podcast #22
Lex Fridman
A language model developed by Google, representing the kind of cutting-edge research enabled by TensorFlow.

Oriol Vinyals: Deep Learning and Artificial General Intelligence | Lex Fridman Podcast #306
Lex Fridman
A neural network-based technique for language pre-training, cited as an example of an idea that originated in NLP.

Deep Learning Basics: Introduction and Overview
Lex Fridman
Google's BERT (Bidirectional Encoder Representations from Transformers) model, a breakthrough in natural language processing.

Anthropic Head of Pretraining on Scaling Laws, Compute, and the Future of AI
Y Combinator
A model mentioned as an example of pre-training objectives considered before auto-regressive modeling became dominant.

The Utility of Interpretability — Emmanuel Amiesen
Latent Space
An early Transformer encoder model (BERT is encoder-only, not encoder-decoder) used as an example of how a model's top layers can overfit to the pretraining objective, making layers deeper in the network more useful for general language understanding.
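A minimal sketch of how that layer-depth comparison can be done in practice follows, assuming the Hugging Face transformers API: request hidden states from every layer and read out an intermediate layer alongside the top one. The checkpoint and layer indices are illustrative, not the episode's setup.

```python
# Minimal sketch: compare an intermediate layer with the top layer of BERT.
# Layer indices are illustrative; bert-base has 12 encoder layers.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)

batch = tokenizer("Layers differ in what they capture.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**batch)

# hidden_states: tuple of 13 tensors (embeddings + 12 layers), each (batch, seq, hidden)
hidden_states = outputs.hidden_states
middle = hidden_states[6].mean(dim=1)   # mid-network features, often more general
top = hidden_states[-1].mean(dim=1)     # closest to the pretraining objective
print(middle.shape, top.shape)
```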

Transformers Explained: The Discovery That Changed AI Forever
Y Combinator
A series of models developed using only the encoder part of the transformer architecture for tasks like masked language modeling.
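Masked language modeling, the pretraining task named here, can be demonstrated in a few lines with the Hugging Face fill-mask pipeline; the sketch below is illustrative, and the model name is an assumption rather than anything from the video.

```python
# Minimal sketch of masked language modeling with an encoder-only model:
# the model predicts the hidden [MASK] token from bidirectional context.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")
for pred in fill("The transformer encoder predicts the [MASK] token."):
    print(f'{pred["token_str"]}: {pred["score"]:.3f}')
```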

AI and the Future of Law: The 10 Year "Overnight" Success Story
Y Combinator
A natural language processing model whose paper inspired Casetext to explore large language models for legal applications early on.