
Ep 18: Petaflops to the People — with George Hotz of tinycorp

Latent Space Podcast
Science & Technology · 4 min read · 84 min video
Jun 20, 2023 · 65,666 views
TL;DR

George Hotz (geohot) discusses tinygrad and tinycorp's mission to democratize AI compute, challenging established players.

Key Insights

1. Tinygrad aims to simplify ML frameworks by using a reduced instruction set (RISC) approach, making it more efficient than complex instruction set (CISC) systems like PyTorch and TensorFlow.

2. Tinycorp's "three theses" focus on combating compute-accessibility gatekeeping, building efficient ML hardware by optimizing software stacks, and avoiding "true completeness" in favor of focused optimization.

3. George Hotz criticizes the closed-source nature of major AI hardware ecosystems (like Google's TPUs) and advocates for open, accessible, and efficient compute solutions.

4. Tinygrad demonstrates significant developer efficiency and has shown competitive performance on Qualcomm GPUs, with ongoing efforts to improve performance on NVIDIA and AMD hardware.

5. Tinycorp's goal is to commoditize petaflops and build accessible, powerful "luxury AI computers" (Tiny Boxes) for individual developers and home users, focusing on inference capabilities.

6. George Hotz believes the future of AI development lies in "tools supercharging" human coders, not full replacement, and emphasizes the importance of accessible hardware and open-source collaboration.

THE BIRTH OF TINYCORP AND TINYGRAD

George Hotz, known for his early tech exploits, discusses the genesis of Tinycorp and its flagship open-source ML framework, tinygrad. Originally a hobby project, tinygrad evolved into a core focus due to concerns about centralized control over AI compute. Hotz was motivated to create structural solutions to prevent potential future gatekeeping of ML compute resources by large organizations or governments, especially in light of geopolitical uncertainties and the challenges of accessing chips from major vendors like NVIDIA and Qualcomm.

TINYGRAD'S ARCHITECTURE AND PHILOSOPHY

Tinygrad is built on the philosophy of a reduced instruction set (RISC) for ML models, contrasting with the complex instruction set (CISC) approach of frameworks like PyTorch and TensorFlow. This simplification aims for a much smaller and more manageable instruction set, enabling greater efficiency. Hotz likens this to the historical shift from CISC to RISC processors, which led to more streamlined and performant designs. The framework emphasizes minimal boilerplate code, making it highly readable and maintainable, with a goal of achieving near-optimal performance by focusing on core operations and efficient compilation.
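The RISC idea described above — a small set of primitive operations from which everything else composes — can be illustrated with a toy scalar autograd engine. This is a hypothetical sketch for illustration, not tinygrad's actual code: the core knows only `add`, `mul`, and `relu`, and composite operations are built from those primitives rather than added to the core.

```python
# Illustrative sketch (NOT tinygrad's code): a scalar autograd engine with a
# deliberately tiny "instruction set" -- add, mul, relu -- showing how a
# reduced core can still express and differentiate larger graphs.

class Value:
    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def backward():
            self.grad += out.grad
            other.grad += out.grad
        out._backward = backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def backward():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = backward
        return out

    def relu(self):
        out = Value(max(0.0, self.data), (self,))
        def backward():
            self.grad += (out.data > 0) * out.grad
        out._backward = backward
        return out

    # Composite ops compose the primitives; the core stays small.
    def sub(self, other):
        return self + other * Value(-1.0)

    def backward(self):
        # Topologically order the graph, then apply the chain rule in reverse.
        order, seen = [], set()
        def visit(v):
            if v not in seen:
                seen.add(v)
                for p in v._parents:
                    visit(p)
                order.append(v)
        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            v._backward()

x = Value(3.0)
y = Value(-2.0)
z = (x * y).relu() + x * x   # relu(-6) + 9
z.backward()
print(z.data, x.grad)        # 9.0 6.0 (the relu branch is dead, so dz/dx = 2x)
```

A real framework generalizes the same structure to tensors and compiles the recorded graph, but the design point is identical: fewer primitives mean fewer cases for the compiler to optimize.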

CHALLENGING THE STATUS QUO: HARDWARE AND ECOSYSTEMS

A central theme for Tinycorp and tinygrad is democratizing AI compute by challenging the dominance of established players and their closed ecosystems. George Hotz criticizes the proprietary nature of hardware like Google's TPUs and the associated closed-source compilers, arguing that true accessibility requires open standards and transparent development. He highlights the difficulty of building performant ML software stacks for custom hardware if one cannot even optimize for existing, well-supported hardware like NVIDIA GPUs. This drives the focus on building efficient, open software that can leverage diverse hardware.

THE QUEST FOR ACCESSIBLE HARDWARE: TINY BOX

Tinycorp's hardware initiative, the Tiny Box, aims to provide powerful, affordable AI computers for developers and consumers. These machines are designed to offer high teraflops for their cost and power consumption, targeting use cases like running large language models locally. Hotz stresses the engineering challenges in building such systems, from accommodating multiple GPUs to managing power and noise, aiming for a silent, under-desk computing experience. The goal is to create a 'personal data center' that competes with cloud offerings in terms of cost-effectiveness and accessibility.

PERFORMANCE AND OPTIMIZATION: QUALCOMM, AMD, AND BEYOND

Tinygrad has demonstrated promising performance, particularly on Qualcomm GPUs, where it is reportedly faster than Qualcomm's native libraries. While initially facing significant challenges with AMD hardware due to driver issues and compilation problems, there's an ongoing effort to improve support and performance. Hotz emphasizes that the competitiveness on NVIDIA hardware is a continuing goal, with planned support for Tensor Cores expected to significantly close the performance gap. The focus remains on optimizing the software stack to extract maximum performance from available hardware, including exploring new architectures and efficient operations.
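One of the optimizations alluded to here — avoiding a memory round trip per operation by deferring execution and fusing elementwise op chains — can be sketched in a few lines. This is a hypothetical illustration of the lazy-fusion idea, not tinygrad's internals; class and method names are invented for the example.

```python
# Hypothetical sketch of lazy op fusion (NOT tinygrad's internals): ops are
# recorded rather than executed, then fused into one pass at realize() time,
# so no intermediate buffer is materialized between ops.

class LazyBuffer:
    def __init__(self, data, ops=None):
        self.data = data          # source buffer
        self.ops = ops or []      # pending elementwise ops, not yet run

    def _chain(self, fn):
        # Record the op; nothing is computed or allocated here.
        return LazyBuffer(self.data, self.ops + [fn])

    def mul(self, k):  return self._chain(lambda x: x * k)
    def add(self, k):  return self._chain(lambda x: x + k)
    def relu(self):    return self._chain(lambda x: x if x > 0 else 0)

    def realize(self):
        # A single fused loop applies every pending op per element, instead
        # of writing out a full buffer after each op in the chain.
        out = []
        for x in self.data:
            for fn in self.ops:
                x = fn(x)
            out.append(x)
        return out

buf = LazyBuffer([-2, -1, 0, 1, 2])
result = buf.mul(3).add(1).relu().realize()
print(result)  # [0, 0, 1, 4, 7]
```

On a GPU the same trick means emitting one kernel for the whole chain rather than three, which is where much of the bandwidth-bound speedup comes from.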

THE FUTURE OF AI: BEYOND BIG MODELS AND INTO REAL-WORLD APPLICATIONS

George Hotz expresses skepticism towards simply scaling up models indefinitely, suggesting that smaller models trained for longer or fine-tuned efficiently may offer better practical performance and cost-effectiveness, especially for inference. He envisions AI as tools that supercharge human capabilities, not replace them, drawing parallels to how Photoshop influenced art. The ultimate goal for Tinycorp involves building products like AI girlfriends, not as a novelty, but as a practical application of merging human desires with advanced machine intelligence, emphasizing accessibility and a shift towards information-based aspirations.

Common Questions

What is tinygrad?

tinygrad is a reduced-instruction-set ML framework developed by George Hotz, designed to be significantly smaller and less complex than frameworks like PyTorch or XLA, aiming for higher performance by optimizing a simplified set of core operations.

Mentioned in this video

Products
iPhone

George Hotz mentions being the first to unlock the iPhone and trading the first unlocked iPhone for a Nissan 350Z and three new iPhones.

AMD GPU

AMD GPUs are mentioned as a potential platform for ML, but the lack of a good software stack and driver issues are highlighted.

Comma 3

The Comma 3 is used in their robotic prototypes, functioning as a brain for basic robotic tasks.

Nissan 350Z

George Hotz humorously mentions trading the first unlocked iPhone for a Nissan 350Z.

Amazon Inferentia

Amazon's clone of Google's TPU, mentioned as having software that doesn't work as well.

Nvidia Tensor Cores

Support for Nvidia Tensor Cores is identified as a key area for improvement that will significantly close the performance gap for tinygrad.

Intel GPU

Intel GPUs are mentioned as having documented hardware and stable kernel drivers, contrasting with AMD's issues.

Apple Silicon M1

M1 chips are mentioned, with PyTorch on M1 being considerably better than on AMD, though still having bugs.

Tiny Box Red

The first version of Tiny Corp's computers will be the 'Tiny Box Red', with potential for other colors if AMD continues to disappoint.

Comma Bodies

Comma Bodies are simple two-wheeled robotic platforms developed at comma, designed to turn robotics into a software problem.

TPU

Tensor Processing Units (TPUs) are mentioned as another successful training chip besides NVIDIA's, with Google also developing its own ML framework (TensorFlow) for them.

Qualcomm GPU

tinygrad is competitive on Qualcomm GPUs, being 2x faster than Qualcomm's library in production within openpilot.

H100

The cost of an H100 box is mentioned as a benchmark for comparison with Tiny Corp's offerings.

Software & Apps
PyTorch

PyTorch is discussed as a complex instruction set (CISC) style framework with significant boilerplate; tinygrad aims to be simpler and more performant.

TensorFlow

TensorFlow is the ML framework Google developed for TPUs, mentioned as crucial for their success in training chips.

NumPy

NumPy's API is mentioned as similar to PyTorch and Pandas, influencing tinygrad's design.

NVIDIA Insight Profiler

The difficulty of using NVIDIA's Nsight profiler is highlighted, contrasting with tinygrad's simpler DEBUG=2 option.

Flash Attention

Flash Attention is highlighted as an algorithmic trick that improves efficiency without increasing compute, similar to Hotz's approach with tinygrad.

CUDA

CUDA is described as a C-like language that compiles through multiple stages, all of which are Turing-complete, unlike the approach favored by George Hotz.

ONNX

ONNX is mentioned as a higher-level frontend for tinygrad, aiming to pass ONNX Runtime compliance tests.

PyTorch Lightning

PyTorch Lightning is seen as a framework around PyTorch that doesn't solve the fundamental issue of unnecessary memory round trips.

Pandas

The Pandas API is mentioned as similar to PyTorch and NumPy, influencing tinygrad's design.

Mojo

Mojo is mentioned briefly by George Hotz, who expresses less interest due to its closed-source nature.

MPS PyTorch

George Hotz compared tinygrad's Metal backend to Apple's MPS PyTorch implementation, noting discrepancies.

GPT-4

GPT-4's training cost is estimated in 'person-years', and its architecture as a mixture of experts is discussed.

Segment Anything

The 'Segment Anything' model from Facebook AI is mentioned as state-of-the-art in computer vision, which George Hotz wants to build upon.

Transformers

Transformers are discussed for their reliance on semi-weight sharing and dynamic weight generation, rather than just 'attention'.

XLA

XLA is mentioned as a complex instruction set (CISC) style framework, contrasted with tinygrad's reduced instruction set approach.

tinygrad

tinygrad is presented as a compiler that generates the fastest possible programs, with the eventual goal of having ML do this program generation.

Core ML

Core ML is mentioned as a benchmark for tinygrad's ONNX support, with tinygrad performing better than Core ML but not yet surpassing ONNX Runtime.

ReLU

George Hotz dislikes the object-oriented implementation of ReLU in PyTorch, preferring a functional approach.

openpilot

tinygrad is used to run models in openpilot, demonstrating its production viability on Qualcomm GPUs.

AMD ROCm

George Hotz received a pre-release version of ROCm 5.x from AMD, which reportedly fixed kernel panics. He highlights the importance of AMD releasing this by month's end.

GGML

GGML is a framework focused on Apple Silicon (M1), which George Hotz initially considered but then decided to focus on a more general approach.

Llama

George Hotz dismisses the idea of a trillion-parameter LLaMA, suggesting GPT-4's 220 billion parameters and mixture-of-experts approach are more relevant but not necessarily superior.

Concepts
ARM

ARM processors are mentioned as the most common processors today, following the shift from CISC to RISC.

Systolic Array

A method for matrix multiplication used in TPUs, which George Hotz believes is the wrong choice due to scheduling difficulties for non-dense matrix multiplies.

Metal Backend

George Hotz wrote a Metal backend for tinygrad and compared its outputs to PyTorch's MPS backend, finding discrepancies.

Billions of parameters

George Hotz uses 'billions of parameters' to describe the internal models of AI, referencing the scale of models like GPT-4.

RISC-V

RISC-V is mentioned as an open-source instruction set architecture that is less complex than ARM.

Rice's Theorem

Mentioned in relation to CPU complexity and the halting problem, which is relevant to why neural networks can be optimized differently.

GPU Shader

Discussed in the context of load dependencies, contrasting with the more predictable loads in neural networks.

ALiBi

ALiBi is mentioned as a complex positional embedding used by LLaMA, which George Hotz contrasts with simpler methods or the potential abolishment of positional embeddings.

LLM

Large Language Models are discussed in various contexts, including their limitations, potential for fine-tuning, and interactions with human programmers.

Effective Accelerationism

George Hotz criticizes Effective Accelerationism, viewing it as an ideology taken less seriously by its adherents, particularly on the right.

Halting Problem

Used as an analogy for the complexity in CPUs related to predicting branch execution, which is simplified in neural networks.

TensorFlow Graph

TensorFlow's graph-based computation is contrasted with tinygrad's local graph optimizations facilitated by laziness.

Mixture-of-Experts

A technique used in models like GPT-4 where multiple smaller models are combined, discussed as a way to scale beyond parameter limits.

KV Cache

The KV cache invalidation problem with large context windows is mentioned as a drawback of some positional embedding techniques.

YOLO

Large YOLO models are mentioned as being impressive in object detection, and George Hotz wants to build something similar for segmentation.

Batch Normalization

Batch Normalization is listed as one of the 'six tricks' in AI development, with George Hotz wondering if phenomena like 'covariate shift' inspired it.

Self-driving Cars

Self-driving cars are used as an example of a solved software problem in robotics, which George Hotz believes robotics in general can become.

Paperclip Maximizer

The paperclip maximizer is used as an analogy for a perfect form of cancer, representing a risk that won't 'win' due to the 'Goddess of Everything Else' (complexity).
