
Ep 18: Petaflops to the People — with George Hotz of tinycorp

Latent Space Podcast
Science & Technology · 4 min read · 84 min video
Jun 20, 2023 · 65,666 views
TL;DR

George Hotz (geohot) discusses tinygrad and tinycorp's mission to democratize AI compute, challenging established players.

Key Insights

1. Tinygrad aims to simplify ML frameworks by using a reduced instruction set (RISC) approach, making it more efficient than complex instruction set (CISC) systems like PyTorch and TensorFlow.

2. Tinycorp's "three theses" focus on combating compute-accessibility gatekeeping, building efficient ML hardware by optimizing software stacks, and avoiding "true completeness" in favor of focused optimization.

3. George Hotz criticizes the closed-source nature of major AI hardware ecosystems (like Google's TPUs) and advocates for open, accessible, and efficient compute solutions.

4. Tinygrad demonstrates significant developer efficiency and has shown competitive performance on Qualcomm GPUs, with ongoing efforts to improve performance on NVIDIA and AMD hardware.

5. Tinycorp's goal is to commoditize petaflops and build accessible, powerful "luxury AI computers" (Tiny Boxes) for individual developers and home users, focusing on inference capabilities.

6. George Hotz believes the future of AI development lies in "tools supercharging" human coders, not full replacement, and emphasizes the importance of accessible hardware and open-source collaboration.

THE BIRTH OF TINYCORP AND TINYGRAD

George Hotz, known for his early tech exploits, discusses the genesis of Tinycorp and its flagship open-source ML framework, tinygrad. Originally a hobby project, tinygrad evolved into a core focus due to concerns about centralized control over AI compute. Hotz was motivated to create structural solutions to prevent potential future gatekeeping of ML compute resources by large organizations or governments, especially in light of geopolitical uncertainties and the challenges of accessing chips from major vendors like NVIDIA and Qualcomm.

TINYGRAD'S ARCHITECTURE AND PHILOSOPHY

Tinygrad is built on the philosophy of a reduced instruction set (RISC) for ML models, contrasting with the complex instruction set (CISC) approach of frameworks like PyTorch and TensorFlow. This simplification aims for a much smaller and more manageable instruction set, enabling greater efficiency. Hotz likens this to the historical shift from CISC to RISC processors, which led to more streamlined and performant designs. The framework emphasizes minimal boilerplate code, making it highly readable and maintainable, with a goal of achieving near-optimal performance by focusing on core operations and efficient compilation.
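The RISC idea described above — a small set of primitive operations from which everything else composes — can be illustrated with a toy scalar autograd engine. This is a hypothetical sketch for illustration, not tinygrad's actual code: the core knows only `add`, `mul`, and `relu`, and composite operations are built from those primitives rather than added to the core.

```python
# Illustrative sketch (NOT tinygrad's code): a scalar autograd engine with a
# deliberately tiny "instruction set" -- add, mul, relu -- showing how a
# reduced core can still express and differentiate larger graphs.

class Value:
    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def backward():
            self.grad += out.grad
            other.grad += out.grad
        out._backward = backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def backward():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = backward
        return out

    def relu(self):
        out = Value(max(0.0, self.data), (self,))
        def backward():
            self.grad += (out.data > 0) * out.grad
        out._backward = backward
        return out

    # Composite ops compose the primitives; the core stays small.
    def sub(self, other):
        return self + other * Value(-1.0)

    def backward(self):
        # Topologically order the graph, then apply the chain rule in reverse.
        order, seen = [], set()
        def visit(v):
            if v not in seen:
                seen.add(v)
                for p in v._parents:
                    visit(p)
                order.append(v)
        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            v._backward()

x = Value(3.0)
y = Value(-2.0)
z = (x * y).relu() + x * x   # relu(-6) + 9
z.backward()
print(z.data, x.grad)        # 9.0 6.0 (the relu branch is dead, so dz/dx = 2x)
```

A real framework generalizes the same structure to tensors and compiles the recorded graph, but the design point is identical: fewer primitives mean fewer cases for the compiler to optimize.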

CHALLENGING THE STATUS QUO: HARDWARE AND ECOSYSTEMS

A central theme for Tinycorp and tinygrad is democratizing AI compute by challenging the dominance of established players and their closed ecosystems. George Hotz criticizes the proprietary nature of hardware like Google's TPUs and the associated closed-source compilers, arguing that true accessibility requires open standards and transparent development. He highlights the difficulty of building performant ML software stacks for custom hardware if one cannot even optimize for existing, well-supported hardware like NVIDIA GPUs. This drives the focus on building efficient, open software that can leverage diverse hardware.

THE QUEST FOR ACCESSIBLE HARDWARE: TINY BOX

Tinycorp's hardware initiative, the Tiny Box, aims to provide powerful, affordable AI computers for developers and consumers. These machines are designed to offer high teraflops for their cost and power consumption, targeting use cases like running large language models locally. Hotz stresses the engineering challenges in building such systems, from accommodating multiple GPUs to managing power and noise, aiming for a silent, under-desk computing experience. The goal is to create a 'personal data center' that competes with cloud offerings in terms of cost-effectiveness and accessibility.

PERFORMANCE AND OPTIMIZATION: QUALCOMM, AMD, AND BEYOND

Tinygrad has demonstrated promising performance, particularly on Qualcomm GPUs, where it is reportedly faster than Qualcomm's native libraries. While initially facing significant challenges with AMD hardware due to driver issues and compilation problems, there's an ongoing effort to improve support and performance. Hotz emphasizes that the competitiveness on NVIDIA hardware is a continuing goal, with planned support for Tensor Cores expected to significantly close the performance gap. The focus remains on optimizing the software stack to extract maximum performance from available hardware, including exploring new architectures and efficient operations.
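One of the optimizations alluded to here — avoiding a memory round trip per operation by deferring execution and fusing elementwise op chains — can be sketched in a few lines. This is a hypothetical illustration of the lazy-fusion idea, not tinygrad's internals; class and method names are invented for the example.

```python
# Hypothetical sketch of lazy op fusion (NOT tinygrad's internals): ops are
# recorded rather than executed, then fused into one pass at realize() time,
# so no intermediate buffer is materialized between ops.

class LazyBuffer:
    def __init__(self, data, ops=None):
        self.data = data          # source buffer
        self.ops = ops or []      # pending elementwise ops, not yet run

    def _chain(self, fn):
        # Record the op; nothing is computed or allocated here.
        return LazyBuffer(self.data, self.ops + [fn])

    def mul(self, k):  return self._chain(lambda x: x * k)
    def add(self, k):  return self._chain(lambda x: x + k)
    def relu(self):    return self._chain(lambda x: x if x > 0 else 0)

    def realize(self):
        # A single fused loop applies every pending op per element, instead
        # of writing out a full buffer after each op in the chain.
        out = []
        for x in self.data:
            for fn in self.ops:
                x = fn(x)
            out.append(x)
        return out

buf = LazyBuffer([-2, -1, 0, 1, 2])
result = buf.mul(3).add(1).relu().realize()
print(result)  # [0, 0, 1, 4, 7]
```

On a GPU the same trick means emitting one kernel for the whole chain rather than three, which is where much of the bandwidth-bound speedup comes from.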

THE FUTURE OF AI: BEYOND BIG MODELS AND INTO REAL-WORLD APPLICATIONS

George Hotz expresses skepticism towards simply scaling up models indefinitely, suggesting that smaller models trained for longer or fine-tuned efficiently may offer better practical performance and cost-effectiveness, especially for inference. He envisions AI as tools that supercharge human capabilities, not replace them, drawing parallels to how Photoshop influenced art. The ultimate goal for Tinycorp involves building products like AI girlfriends, not as a novelty, but as a practical application of merging human desires with advanced machine intelligence, emphasizing accessibility and a shift towards information-based aspirations.

Common Questions

What is tinygrad?

tinygrad is a reduced-instruction-set ML framework developed by George Hotz, designed to be significantly smaller and less complex than frameworks like PyTorch or XLA, aiming for higher performance by optimizing a simplified set of core operations.

Mentioned in this video

Products
iPhone

George Hotz mentions being the first to unlock the iPhone and trading the first unlocked iPhone for a Nissan 350Z and three new iPhones.

AMD GPU

AMD GPUs are mentioned as a potential platform for ML, but the lack of a good software stack and driver issues are highlighted.

Comma 3

The Comma 3 is used in their robotic prototypes, functioning as a brain for basic robotic tasks.

Nissan 350Z

George Hotz humorously mentions trading the first unlocked iPhone for a Nissan 350Z.

Amazon Inferentia

Amazon's clone of Google's TPU, mentioned as having software that doesn't work as well.

Nvidia Tensor Cores

Support for Nvidia Tensor Cores is identified as a key area for improvement that will significantly close the performance gap for tinygrad.

Intel GPU

Intel GPUs are mentioned as having documented hardware and stable kernel drivers, contrasting with AMD's issues.

Apple Silicon M1

M1 chips are mentioned, with PyTorch on M1 being considerably better than on AMD, though still having bugs.

Tiny Box Red

The first version of Tiny Corp's computers will be the 'Tiny Box Red', with potential for other colors if AMD continues to disappoint.

Comma Bodies

Comma Bodies are simple two-wheeled robotic platforms developed at comma, designed to turn robotics into a software problem.

TPU

Tensor Processing Units (TPUs) are mentioned as another successful training chip besides NVIDIA's, with Google also developing its own ML framework (TensorFlow) for them.

Qualcomm GPU

tinygrad is competitive on Qualcomm GPUs, being 2x faster than Qualcomm's library in production within openpilot.

H100

The cost of an H100 box is mentioned as a benchmark for comparison with Tiny Corp's offerings.

Software & Apps
PyTorch

PyTorch is discussed as a complex instruction set (CISC) style framework with significant boilerplate; tinygrad aims to be simpler and more performant.

TensorFlow

TensorFlow is the ML framework Google developed for TPUs, mentioned as crucial for their success in training chips.

NumPy

NumPy's API is mentioned as similar to PyTorch and Pandas, influencing tinygrad's design.

NVIDIA Insight Profiler

The difficulty of using NVIDIA's Nsight profiler is highlighted, contrasting with tinygrad's simpler DEBUG=2 option.

Flash Attention

Flash Attention is highlighted as an algorithmic trick that improves efficiency without increasing compute, similar to Hotz's approach with tinygrad.

CUDA

CUDA is described as a C-like language that compiles through multiple stages, all of which are Turing-complete, unlike the approach favored by George Hotz.

ONNX

ONNX is mentioned as a higher-level frontend for tinygrad, aiming to pass ONNX Runtime compliance tests.

PyTorch Lightning

PyTorch Lightning is seen as a framework around PyTorch that doesn't solve the fundamental issue of unnecessary memory round trips.

Pandas

The Pandas API is mentioned as similar to PyTorch and NumPy, influencing tinygrad's design.

Mojo

Mojo is mentioned briefly by George Hotz, who expresses less interest due to its closed-source nature.

MPS PyTorch

George Hotz compared tinygrad's Metal backend to Apple's MPS PyTorch implementation, noting discrepancies.

GPT-4

GPT-4's training cost is estimated in 'person-years', and its architecture as a mixture of experts is discussed.

Segment Anything

The 'Segment Anything' model from Facebook AI is mentioned as state-of-the-art in computer vision, which George Hotz wants to build upon.

Transformers

Transformers are discussed for their reliance on semi-weight sharing and dynamic weight generation, rather than just 'attention'.

XLA

XLA is mentioned as a complex instruction set (CISC) style framework, contrasted with tinygrad's reduced instruction set approach.

tinygrad

tinygrad is presented as a compiler that generates the fastest possible programs, with the eventual goal of having ML do this program generation.

Core ML

Core ML is mentioned as a benchmark for tinygrad's ONNX support, with tinygrad performing better than Core ML but not yet surpassing ONNX Runtime.

ReLU

George Hotz dislikes the object-oriented implementation of ReLU in PyTorch, preferring a functional approach.

openpilot

tinygrad is used to run models in openpilot, demonstrating its production viability on Qualcomm GPUs.

AMD ROCm

George Hotz received a pre-release version of ROCm 5.x from AMD, which reportedly fixed kernel panics. He highlights the importance of AMD releasing this by month's end.

GGML

GGML is a framework focused on Apple Silicon (M1), which George Hotz initially considered but then decided to focus on a more general approach.

Llama

George Hotz dismisses the idea of a trillion-parameter LLaMA, suggesting GPT-4's 220 billion parameters and mixture-of-experts approach are more relevant but not necessarily superior.

Concepts
ARM

ARM processors are mentioned as the most common processors today, following the shift from CISC to RISC.

Systolic Array

A method for matrix multiplication used in TPUs, which George Hotz believes is the wrong choice due to scheduling difficulties for non-dense matrix multiplies.

Metal Backend

George Hotz wrote a Metal backend for tinygrad and compared its outputs to PyTorch's MPS backend, finding discrepancies.

Billions of parameters

George Hotz uses 'billions of parameters' to describe the internal models of AI, referencing the scale of models like GPT-4.

RISC-V

RISC-V is mentioned as an open-source instruction set architecture that is less complex than ARM.

Rice's Theorem

Mentioned in relation to CPU complexity and the halting problem, which is relevant to why neural networks can be optimized differently.

GPU Shader

Discussed in the context of load dependencies, contrasting with the more predictable loads in neural networks.

ALiBi

ALiBi is mentioned as a complex positional embedding used by LLaMA, which George Hotz contrasts with simpler methods or the potential abolishment of positional embeddings.

LLM

Large Language Models are discussed in various contexts, including their limitations, potential for fine-tuning, and interactions with human programmers.

Effective Accelerationism

George Hotz criticizes Effective Accelerationism, viewing it as an ideology taken less seriously by its adherents, particularly on the right.

Halting Problem

Used as an analogy for the complexity in CPUs related to predicting branch execution, which is simplified in neural networks.

TensorFlow Graph

TensorFlow's graph-based computation is contrasted with tinygrad's local graph optimizations facilitated by laziness.

Mixture-of-Experts

A technique used in models like GPT-4 where multiple smaller models are combined, discussed as a way to scale beyond parameter limits.

KV Cache

The KV cache invalidation problem with large context windows is mentioned as a drawback of some positional embedding techniques.

YOLO

Large YOLO models are mentioned as being impressive in object detection, and George Hotz wants to build something similar for segmentation.

Batch Normalization

Batch Normalization is listed as one of the 'six tricks' in AI development, with George Hotz wondering if phenomena like 'covariate shift' inspired it.

Self-driving Cars

Self-driving cars are used as an example of a solved software problem in robotics, which George Hotz believes robotics in general can become.

Paperclip Maximizer

The paperclip maximizer is used as an analogy for a perfect form of cancer, representing a risk that won't 'win' due to the 'Goddess of Everything Else' (complexity).
