BERT
deep learning, artificial neural network, language model
Videos Mentioning BERT

The 10,000x Yolo Researcher Metagame — with Yi Tay of Reka
Latent Space
Mentioned as an example of models that researchers were fine-tuning in late 2019, before the focus shifted entirely to foundation models.

Breaking down the OG GPT Paper by Alec Radford
Latent Space
Mentioned as a model that emerged after the GPT-1 paper.
[Paper Club] Embeddings in 2024: OpenAI, Nomic Embed, Jina Embed, cde-small-v1 - with swyx
Latent Space
The architecture underlying Nomic Embed, noted as a de facto standard in embedding-model training. The speaker expressed surprise that embedding models are still largely updated versions of BERT.
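To make the "updated versions of BERT" observation concrete, here is a minimal sketch of the common recipe for deriving embeddings from a BERT-style encoder: tokenize, encode, and mean-pool the final hidden states. The checkpoint name and pooling choice are illustrative assumptions, not Nomic Embed's actual pipeline.

```python
# Minimal sketch: sentence embeddings from a BERT-style encoder via mean pooling.
# "bert-base-uncased" is an illustrative checkpoint, not Nomic Embed's own.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

sentences = ["BERT powers many embedding models.", "Encoders map text to vectors."]
batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    hidden = model(**batch).last_hidden_state  # (batch, seq_len, hidden_dim)

# Mean-pool over real tokens only, using the attention mask to ignore padding.
mask = batch["attention_mask"].unsqueeze(-1).float()
embeddings = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
print(embeddings.shape)  # torch.Size([2, 768])
```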

Supervise the Process of AI Research — with Jungwon Byun and Andreas Stuhlmüller of Elicit
Latent Space
A transformer-based language model, mentioned alongside models like T5 as part of an earlier stage of NLP.

Production AI Engineering starts with Evals
Latent Space
A neural network-based technique for natural language processing pre-training, which significantly accelerated text-based information extraction and began to cannibalize Impira's computer vision-based approach.

Is finetuning GPT4o worth it?
Latent Space
A language model mentioned in the context of OpenAI's progress and early AI models.

Information Theory for Language Models: Jack Morris
Latent Space
A language model, popular in 2019, that Jack Morris was experimenting with at the time.

Beating GPT-4 with Open Source Models - with Michael Royzen of Phind
Latent Space
A foundational transformer-based language model developed by Google, mentioned as an encoder model used by Michael Royzen before Longformer.

Yann LeCun: Deep Learning, ConvNets, and Self-Supervised Learning | Lex Fridman Podcast #36
Lex Fridman
A language model that utilizes self-supervised learning, cited as an example of successful NLP models.

Jeremy Howard: fast.ai Deep Learning Courses and Research | Lex Fridman Podcast #35
Lex Fridman
A transformer-based machine learning technique for natural language processing pre-training developed by Google, demonstrating transfer learning.
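As a concrete illustration of the transfer-learning pattern this entry alludes to, below is a minimal sketch using the Hugging Face transformers library: the pretrained encoder is reused wholesale, and only a small classification head is trained from scratch on labeled data. The model name, data, and hyperparameters are placeholder assumptions, not anything from the episode.

```python
# Minimal sketch of BERT-style transfer learning: reuse pretrained weights,
# then briefly fine-tune on a small labeled task. Values are illustrative.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # pretrained encoder + fresh classifier head
)

texts = ["great movie", "terrible movie"]  # toy stand-in for a real dataset
labels = torch.tensor([1, 0])
batch = tokenizer(texts, padding=True, return_tensors="pt")

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
out = model(**batch, labels=labels)  # cross-entropy loss computed internally
out.loss.backward()
optimizer.step()
```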

Rajat Monga: TensorFlow | Lex Fridman Podcast #22
Lex Fridman
A language model developed by Google, representing the kind of cutting-edge research enabled by TensorFlow.

Oriol Vinyals: Deep Learning and Artificial General Intelligence | Lex Fridman Podcast #306
Lex Fridman
A neural network-based technique for language pre-training, cited as an example of an idea that originated in NLP.

Deep Learning Basics: Introduction and Overview
Lex Fridman
Google's BERT (Bidirectional Encoder Representations from Transformers) model, a breakthrough in natural language processing.

Anthropic Head of Pretraining on Scaling Laws, Compute, and the Future of AI
Y Combinator
A model mentioned as an example of pre-training objectives considered before auto-regressive modeling became dominant.

The Utility of Interpretability — Emmanuel Amiesen
Latent Space
An early Transformer encoder model (BERT is encoder-only, not encoder-decoder) used as an example of how a model's top layers can overfit to the pretraining objective, making layers deeper in the network more useful for general language understanding.
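A minimal sketch of how that layer-depth comparison can be done in practice follows, assuming the Hugging Face transformers API: request hidden states from every layer and read out an intermediate layer alongside the top one. The checkpoint and layer indices are illustrative, not the episode's setup.

```python
# Minimal sketch: compare an intermediate layer with the top layer of BERT.
# Layer indices are illustrative; bert-base has 12 encoder layers.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)

batch = tokenizer("Layers differ in what they capture.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**batch)

# hidden_states: tuple of 13 tensors (embeddings + 12 layers), each (batch, seq, hidden)
hidden_states = outputs.hidden_states
middle = hidden_states[6].mean(dim=1)   # mid-network features, often more general
top = hidden_states[-1].mean(dim=1)     # closest to the pretraining objective
print(middle.shape, top.shape)
```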

Transformers Explained: The Discovery That Changed AI Forever
Y Combinator
A series of models developed using only the encoder part of the transformer architecture for tasks like masked language modeling.
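Masked language modeling, the pretraining task named here, can be demonstrated in a few lines with the Hugging Face fill-mask pipeline; the sketch below is illustrative, and the model name is an assumption rather than anything from the video.

```python
# Minimal sketch of masked language modeling with an encoder-only model:
# the model predicts the hidden [MASK] token from bidirectional context.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")
for pred in fill("The transformer encoder predicts the [MASK] token."):
    print(f'{pred["token_str"]}: {pred["score"]:.3f}')
```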

AI and the Future of Law: The 10 Year "Overnight" Success Story
Y Combinator
A natural language processing model whose paper inspired Casetext to explore large language models for legal applications early on.