
State of the Art: Training 70B LLMs on 10,000 H100 clusters

Latent Space Podcast
Science & Technology · 3 min read · 93 min video
Jun 25, 2024 · 1,508 views
TL;DR

Imbue and Databricks discuss training large LLMs, infra challenges, and evaluation methods.

Key Insights

1. Training large LLMs (70B+) requires massive infrastructure, with networking and hardware reliability being critical challenges.

2. Imbue is releasing infrastructure scripts, evaluation benchmarks, and a hyperparameter optimizer to aid others in training foundation models.

3. Databricks recently released a text-to-image model trained exclusively on Shutterstock data, emphasizing data provenance.

4. Evaluation of LLMs is complex, with significant effort dedicated to cleaning datasets and developing robust benchmarks beyond simple loss metrics.

5. Tool use and function calling are seen as crucial for interacting with structured data, with code generation and SQL being key approaches.

6. Long context utilization in LLMs presents challenges in evaluation due to annotation costs, with 'needle in a haystack' being a well-known but flawed method.

INTRODUCTION OF GUESTS AND RECENT DEVELOPMENTS

The podcast introduces Josh Albrecht (CTO of Imbue) and Jon Frankle (Chief AI Scientist at Databricks). Frankle, a previous guest, discusses Databricks' acquisition of MosaicML and their latest release: a text-to-image model developed in collaboration with Shutterstock. The model is notable for being trained exclusively on known Shutterstock data, emphasizing data provenance and trust for enterprise customers, although it is currently API-only.

IMBUE'S RELEASE OF TRAINING RESOURCES

Josh Albrecht details Imbue's contributions aimed at democratizing foundation model training. They are releasing infrastructure and training scripts for managing hardware failures, advanced evaluation tools including curated benchmarks and human judgments, and a cost-aware hyperparameter optimizer (CARBS) to improve prediction and scaling. These resources are intended to lower the barrier for companies to train their own models.

INFRASTRUCTURE CHALLENGES IN LARGE-SCALE TRAINING

A significant portion of the discussion revolves around the immense infrastructure challenges. Training on clusters with thousands of H100 GPUs involves complex networking, such as three-tier network architectures, and demands robust fault tolerance. Failures are common, ranging from hardware defects to InfiniBand cable theft, requiring sophisticated monitoring and automated health checks across thousands of machines. Imbue's approach involves collaborating directly with hardware vendors to fix issues at the firmware level.
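To make the health-check idea concrete, here is a minimal sketch of a fleet-wide GPU check, assuming SSH access and nvidia-smi on each node. The hostnames, queried fields, and thresholds are illustrative assumptions, not Imbue's actual scripts.

```python
# Minimal sketch of an automated GPU health check across a cluster.
# Hostnames, thresholds, and the use of nvidia-smi over SSH are
# illustrative assumptions, not Imbue's released tooling.
import subprocess

HOSTS = ["node-001", "node-002"]  # hypothetical cluster hostnames

def check_host(host: str) -> list[str]:
    """Return a list of problems found on one host (empty list = healthy)."""
    problems = []
    try:
        # Query per-GPU uncorrected ECC error counts and temperature.
        out = subprocess.run(
            ["ssh", host, "nvidia-smi",
             "--query-gpu=index,ecc.errors.uncorrected.volatile.total,temperature.gpu",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, timeout=30,
        )
        if out.returncode != 0:
            return [f"{host}: nvidia-smi failed ({out.stderr.strip()})"]
        for line in out.stdout.strip().splitlines():
            idx, ecc, temp = [field.strip() for field in line.split(",")]
            if ecc not in ("0", "[N/A]"):
                problems.append(f"{host} GPU{idx}: {ecc} uncorrected ECC errors")
            if int(temp) > 85:  # arbitrary alert threshold
                problems.append(f"{host} GPU{idx}: {temp}C over temperature")
    except subprocess.TimeoutExpired:
        problems.append(f"{host}: unreachable (possible node failure)")
    return problems

if __name__ == "__main__":
    for host in HOSTS:
        for problem in check_host(host):
            print("ALERT:", problem)
```

In a real deployment this kind of check runs continuously so that bad nodes can be drained and replaced before they stall a multi-thousand-GPU training job.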

THE COMPLEXITY OF MODEL EVALUATION

Both guests emphasize the critical and difficult nature of evaluating LLMs. Imbue has developed cleaned versions of popular benchmarks and internal evaluations, like a code understanding benchmark. They highlight issues with data contamination and ambiguity in standard evaluations, leading them to create their own data and reproduce examples. The focus is on metrics that are both precise and relevant to desired task performance, moving beyond simple loss.
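One widely used contamination check, sketched below, flags an evaluation example when a long n-gram from it appears verbatim in the training corpus. The 13-token window (popularized by GPT-3's decontamination procedure) and the toy corpus are illustrative choices, not necessarily Imbue's pipeline.

```python
# Hedged sketch of a common contamination check: flag an eval example if any
# long n-gram from it appears verbatim in the training corpus.
def ngrams(text: str, n: int = 13) -> set[tuple[str, ...]]:
    toks = text.lower().split()
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def is_contaminated(eval_example: str, train_docs: list[str], n: int = 13) -> bool:
    probe = ngrams(eval_example, n)
    return any(probe & ngrams(doc, n) for doc in train_docs)

train_docs = ["... training document text ..."]  # placeholder corpus
print(is_contaminated("Which planet is known as the red planet? Mars", train_docs))
```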

SCALING LAWS AND HYPERPARAMETER OPTIMIZATION WITH CARBS

Josh Albrecht elaborates on CARBS (Cost-Aware Pareto Region Bayesian Search), Imbue's hyperparameter optimization tool. Unlike standard optimizers, CARBS accounts for the cost of sampling each configuration, allowing it to identify scaling laws for various hyperparameters (number of layers, learning rate, etc.). This predictability is crucial for efficiently training massive models: cheap small-scale experiments guide the configuration so that large-scale runs are accurate from the start.
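A toy illustration of the underlying idea: fit a power law to cheap small runs, then extrapolate before committing a large budget. This sketch shows only the scaling-law principle; CARBS itself is a cost-aware Bayesian optimizer, not a log-log regression, and the run data below is invented.

```python
# Toy illustration of the scaling-law idea behind cost-aware tuning: fit
# loss ~ a * compute^(-b) to small cheap runs and extrapolate to the target
# budget before spending GPUs on it.
import numpy as np

# (compute in GPU-hours, best loss achieved) from hypothetical small runs
runs = np.array([(1e2, 3.10), (1e3, 2.55), (1e4, 2.10), (1e5, 1.73)])
log_c, log_l = np.log(runs[:, 0]), np.log(runs[:, 1])

# Linear fit in log-log space: log(loss) = log(a) + b * log(compute)
b, log_a = np.polyfit(log_c, log_l, 1)
a = np.exp(log_a)
print(f"fit: loss ~ {a:.2f} * compute^({b:.3f})")

# Predict the loss at a 10x larger budget.
target = 1e6
print(f"predicted loss at {target:.0e} GPU-hours: {a * target ** b:.2f}")
```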

THE ROLE OF CODE AND STRUCTURED DATA IN AGENTS

The conversation touches on agent capabilities, emphasizing code generation and tool use. Imbue views robust code writing and execution as the ultimate tool, providing access to virtually infinite functionalities. Databricks focuses on enabling models to interact with structured data like SQL databases, seeing this as vital for enterprise customers. While knowledge graphs are explored, the simplicity and efficacy of tools like SQL for structured data interaction are highlighted as key.
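The SQL-as-a-tool pattern is straightforward to sketch: the model emits a query, a harness executes it against a real database, and the rows come back as the tool result. The stubbed fake_model below stands in for an actual LLM function-calling API and is purely hypothetical.

```python
# Minimal sketch of the "SQL as a tool" pattern: the model generates SQL,
# the harness executes it, and the result is returned as the tool response.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sales (region TEXT, revenue REAL)")
db.executemany("INSERT INTO sales VALUES (?, ?)",
               [("EMEA", 120.0), ("APAC", 95.5), ("AMER", 210.3)])

def fake_model(question: str) -> str:
    # A real system would prompt an LLM with the schema and the question;
    # here we hard-code the query it would plausibly return.
    return "SELECT region, revenue FROM sales ORDER BY revenue DESC LIMIT 1"

def run_sql_tool(query: str) -> list[tuple]:
    # Execute the model-generated SQL and return rows as the tool result.
    return db.execute(query).fetchall()

question = "Which region had the highest revenue?"
rows = run_sql_tool(fake_model(question))
print(f"tool result for {question!r}: {rows}")  # [('AMER', 210.3)]
```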

LONG CONTEXT WINDOWS AND EMERGENT PROPERTIES

Long context windows are discussed as essential for agents, though their evaluation is challenging due to high annotation costs. Methods like 'needle in a haystack' are critiqued for not measuring holistic context utilization. Databricks favors thousand-shot tasks and considers scaling laws. The concept of emergent properties in LLMs is debated, with the idea that some perceived emergence might be an artifact of log-scale evaluation metrics.
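A 'needle in a haystack' trial is simple to construct, which is part of the critique: retrieving one planted fact says little about holistic context use. In the sketch below, query_model is a hypothetical stand-in for any LLM API.

```python
# Sketch of the 'needle in a haystack' eval being critiqued: bury a known
# fact at a chosen depth in filler text, then check whether the model's
# answer contains it.
def build_haystack(needle: str, filler: str, n_paragraphs: int, depth: float) -> str:
    """Insert `needle` at relative position `depth` (0=start, 1=end) in filler text."""
    paragraphs = [filler] * n_paragraphs
    paragraphs.insert(int(depth * n_paragraphs), needle)
    return "\n\n".join(paragraphs)

def run_trial(query_model, depth: float) -> bool:
    needle = "The secret code is 7421."
    prompt = build_haystack(needle, "Lorem ipsum dolor sit amet. " * 40, 200, depth)
    answer = query_model(prompt + "\n\nWhat is the secret code?")
    return "7421" in answer

if __name__ == "__main__":
    dummy = lambda prompt: "The secret code is 7421."  # trivially passing stub
    print([run_trial(dummy, d) for d in (0.0, 0.5, 1.0)])
```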

THE FUTURE OF LLM DEVELOPMENT AND INFRASTRUCTURE

Looking ahead, Imbue is focused on making their models useful for coding and reasoning in daily workflows, with internal prototypes for future product releases. Databricks aims to continue delivering value to their extensive customer base, with plans for more community-facing science sharing and potentially new model releases. Both emphasize the ongoing need for innovation in infrastructure, evaluation, and model capabilities.

Common Questions

What is DBRX?

DBRX is Databricks' Mixture-of-Experts language model. It has 132 billion total parameters, of which 36 billion are active on any given input, and it was pre-trained on 12 trillion tokens of text and code. Databricks also gave it a dinosaur mascot named DB-Rex.
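A quick back-of-envelope check on those numbers, using the expert counts from the public DBRX announcement (16 experts, 4 routed per token); the arithmetic below is illustrative, not an official breakdown.

```python
# Back-of-envelope check on the DBRX numbers quoted above: in a top-k
# mixture-of-experts, only the routed experts (plus shared attention and
# embedding weights) run per token. Expert counts are from the public DBRX
# announcement; the rest is simple arithmetic.
total_params = 132e9          # total parameters
active_params = 36e9          # parameters used per input token
experts, top_k = 16, 4        # DBRX routes each token to 4 of 16 experts

print(f"active fraction: {active_params / total_params:.1%}")  # ~27%
print(f"routed fraction: {top_k / experts:.1%}")               # 25%
# The active fraction slightly exceeds top_k/experts because attention and
# embedding parameters are shared and always active for every token.
```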

