Key Moments
Ep 18: Petaflops to the People — with George Hotz of tinycorp
George Hotz (geohot) discusses tinygrad and tinycorp's mission to democratize AI compute, challenging established players.
Key Insights
Tinygrad aims to simplify ML frameworks by using a reduced instruction set (RISC) approach, making it more efficient than complex instruction set (CISC) systems like PyTorch and TensorFlow.
Tinycorp's "three theses" focus on combating compute accessibility gatekeeping, building efficient ML hardware by optimizing software stacks, and avoiding "true completeness" in favor of focused optimization.
George Hotz criticizes the closed-source nature of major AI hardware ecosystems (like Google's TPUs) and advocates for open, accessible, and efficient compute solutions.
Tinygrad demonstrates significant developer efficiency and has shown competitive performance on Qualcomm GPUs, with ongoing efforts to improve performance on NVIDIA and AMD hardware.
Tinycorp's goal is to commoditize petaflops and build accessible, powerful "luxury AI computers" (Tiny Boxes) for individual developers and home users, focusing on inference capabilities.
George Hotz believes the future of AI development lies in "tools supercharging" human coders, not full replacement, and emphasizes the importance of accessible hardware and open-source collaboration.
THE BIRTH OF TINYCORP AND TINYGRAD
George Hotz, known for his early tech exploits, discusses the genesis of Tinycorp and its flagship open-source ML framework, tinygrad. Originally a hobby project, tinygrad evolved into a core focus due to concerns about centralized control over AI compute. Hotz was motivated to create structural solutions to prevent potential future gatekeeping of ML compute resources by large organizations or governments, especially in light of geopolitical uncertainties and the challenges of accessing chips from major vendors like NVIDIA and Qualcomm.
TINYGRAD'S ARCHITECTURE AND PHILOSOPHY
Tinygrad is built on the philosophy of a reduced instruction set (RISC) for ML models, contrasting with the complex instruction set (CISC) approach of frameworks like PyTorch and TensorFlow. This simplification aims for a much smaller and more manageable instruction set, enabling greater efficiency. Hotz likens this to the historical shift from CISC to RISC processors, which led to more streamlined and performant designs. The framework emphasizes minimal boilerplate code, making it highly readable and maintainable, with a goal of achieving near-optimal performance by focusing on core operations and efficient compilation.
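The RISC-vs-CISC idea above can be illustrated with a short sketch. This is not tinygrad's actual code, just a hypothetical illustration of the principle: a tiny set of primitive operations (elementwise and reduce) is enough to compose the higher-level layers that CISC-style frameworks ship as hundreds of dedicated kernels.

```python
# Hypothetical sketch (not tinygrad's implementation): compose high-level ops
# from a minimal "instruction set" of two primitives.

def ewise(f, a, b=None):
    """Elementwise primitive: apply f across one or two equal-length vectors."""
    return [f(x) for x in a] if b is None else [f(x, y) for x, y in zip(a, b)]

def reduce_sum(a):
    """Reduce primitive: collapse a vector to a scalar sum."""
    total = 0.0
    for x in a:
        total += x
    return total

# Higher-level ops built purely from the primitives above:
def relu(a):
    return ewise(lambda x: max(x, 0.0), a)

def dot(a, b):
    return reduce_sum(ewise(lambda x, y: x * y, a, b))

print(relu([-1.0, 2.0, -3.0, 4.0]))  # [0.0, 2.0, 0.0, 4.0]
print(dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # 32.0
```

The point of keeping the primitive set small is that the compiler only has to optimize a handful of operations well, rather than maintaining a separate hand-tuned kernel for every layer type.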
CHALLENGING THE STATUS QUO: HARDWARE AND ECOSYSTEMS
A central theme for Tinycorp and tinygrad is democratizing AI compute by challenging the dominance of established players and their closed ecosystems. George Hotz criticizes the proprietary nature of hardware like Google's TPUs and the associated closed-source compilers, arguing that true accessibility requires open standards and transparent development. He highlights the difficulty of building performant ML software stacks for custom hardware if one cannot even optimize for existing, well-supported hardware like NVIDIA GPUs. This drives the focus on building efficient, open software that can leverage diverse hardware.
THE QUEST FOR ACCESSIBLE HARDWARE: TINY BOX
Tinycorp's hardware initiative, the Tiny Box, aims to provide powerful, affordable AI computers for developers and consumers. These machines are designed to offer high teraflops for their cost and power consumption, targeting use cases like running large language models locally. Hotz stresses the engineering challenges in building such systems, from accommodating multiple GPUs to managing power and noise, aiming for a silent, under-desk computing experience. The goal is to create a 'personal data center' that competes with cloud offerings in terms of cost-effectiveness and accessibility.
PERFORMANCE AND OPTIMIZATION: QUALCOMM, AMD, AND BEYOND
Tinygrad has demonstrated promising performance, particularly on Qualcomm GPUs, where it is reportedly faster than Qualcomm's native libraries. While initially facing significant challenges with AMD hardware due to driver issues and compilation problems, there's an ongoing effort to improve support and performance. Hotz emphasizes that the competitiveness on NVIDIA hardware is a continuing goal, with planned support for Tensor Cores expected to significantly close the performance gap. The focus remains on optimizing the software stack to extract maximum performance from available hardware, including exploring new architectures and efficient operations.
THE FUTURE OF AI: BEYOND BIG MODELS AND INTO REAL-WORLD APPLICATIONS
George Hotz expresses skepticism towards simply scaling up models indefinitely, suggesting that smaller models trained for longer or fine-tuned efficiently may offer better practical performance and cost-effectiveness, especially for inference. He envisions AI as tools that supercharge human capabilities, not replace them, drawing parallels to how Photoshop influenced art. The ultimate goal for Tinycorp involves building products like AI girlfriends, not as a novelty, but as a practical application of merging human desires with advanced machine intelligence, emphasizing accessibility and a shift towards information-based aspirations.
Common Questions
tinygrad is a reduced instruction set framework developed by George Hotz, designed to be significantly smaller and less complex than frameworks like PyTorch or XLA, aiming for higher performance by optimizing a simplified set of core operations.
Mentioned in this video
George Hotz mentions being the first to unlock the iPhone and trading the first unlocked iPhone for a Nissan 350Z and three new iPhones.
AMD GPUs are mentioned as a potential platform for ML, but the lack of a good software stack and driver issues are highlighted.
The Comma 3 is used in their robotic prototypes, functioning as a brain for basic robotic tasks.
Amazon's clone of Google's TPU is mentioned as having software that doesn't work as well.
Support for Nvidia Tensor Cores is identified as a key area for improvement that will significantly close the performance gap for Tiny Grad.
Intel GPUs are mentioned as having documented hardware and stable kernel drivers, contrasting with AMD's issues.
M1 chips are mentioned, with PyTorch on M1 being considerably better than on AMD, though still having bugs.
The first version of Tiny Corp's computers will be the 'Tiny Box Red', with potential for other colors if AMD continues to disappoint.
Comma Bodies are simple two-wheeled robotic platforms developed at comma, designed to turn robotics into a software problem.
Tensor Processing Units (TPUs) are mentioned as another successful training chip besides NVIDIA's, with Google also developing its own ML framework (TensorFlow) for them.
tinygrad is competitive on Qualcomm GPUs, running 2x faster than Qualcomm's library in production within openpilot.
The cost of an H100 box is mentioned as a benchmark for comparison with Tiny Corp's offerings.
George Hotz was sued by Sony after he jailbroke the PS3.
Triton is mentioned as a technology PyTorch is using to generate kernels on the fly.
Facebook AI Research is presented as an attractive place for researchers who want to build and publish AI, contrasting with OpenAI's perceived ideological leanings.
George Hotz mentions doing grad math classes at Harvard, indicating the limits of his mathematical abilities.
The company George Hotz started, which was an engineering feat until government intervention led to it becoming a research-only project.
George Hotz chose not to go to Tesla to build vision, instead starting comma.
NVIDIA has the best training chips for AI models, and George Hotz is working with them to buy chips. He also mentions the potential for governments to nationalize NVIDIA and clamp down on ML compute accessibility.
Qualcomm has the best inference chips for AI models, and George Hotz is working with them to buy chips.
Cerebras is mentioned as a company that has succeeded in making training chips, though not as widely known as NVIDIA.
OpenAI is praised for its early leadership in deep learning and the 'compute is all you need' thesis, but George Hotz questions why researchers would choose it over established labs like Facebook.
Elon Musk's Neuralink is mentioned as a way to merge with machines, which George Hotz contrasts with his own approach of 'merging' through relationships and uploading consciousness.
George Hotz notes that thousands of hours of his content are on YouTube, suggesting his 'brain' is already partially uploaded.
PyTorch is discussed as a complex instruction set framework with significant boilerplate; tinygrad aims to be simpler and more performant.
TensorFlow is the ML framework Google developed for TPUs, mentioned as crucial for their success in training chips.
Numpy's API is mentioned as similar to PyTorch and Pandas, influencing Tiny Grad's design.
The difficulty of using the NVIDIA Nsight profiler is highlighted, contrasting with tinygrad's simpler DEBUG=2 option.
Flash Attention is highlighted as an algorithmic trick that improves efficiency without increasing compute, similar to Hotz's approach with Tiny Grad.
CUDA is described as a C-like language that compiles through multiple stages, all of which are Turing-complete, unlike the approach favored by George Hotz.
ONNX is mentioned as a higher-level frontend for Tiny Grad, aiming to pass ONNX Runtime compliance tests.
PyTorch Lightning is seen as a framework around PyTorch that doesn't solve the fundamental issue of unnecessary memory round trips.
Pandas API is mentioned as similar to PyTorch and Numpy, influencing Tiny Grad's design.
Mojo is mentioned briefly by George Hotz, who expresses less interest due to its closed-source nature.
George Hotz compared Tiny Grad's Metal backend to Apple's MPS PyTorch implementation, noting discrepancies.
GPT-4's training cost is estimated in 'person-years', and its architecture as a mixture of experts is discussed.
The 'Segment Anything' model from Facebook AI is mentioned as state-of-the-art in computer vision, which George Hotz wants to build upon.
Transformers are discussed for their reliance on semi-weight sharing and dynamic weight generation, rather than just 'attention'.
XLA is mentioned as a complex instruction set framework, contrasted with tinygrad's reduced instruction set approach.
Tiny Grad is presented as a compiler that generates the fastest possible programs, with the eventual goal of having ML do this program generation.
Core ML is mentioned as a benchmark for Tiny Grad's ONNX support, with Tiny Grad performing better than Core ML but not yet surpassing ONNX Runtime.
George Hotz dislikes the object-oriented implementation of ReLU in PyTorch, preferring a functional approach.
tinygrad is used to run models in openpilot, demonstrating its production viability on Qualcomm GPUs.
George Hotz received a pre-release version of ROCm 5.x from AMD, which reportedly fixed kernel panics. He highlights the importance of AMD releasing this by month's end.
GGML is a framework focused on Apple Silicon (M1), which George Hotz initially considered but then decided to focus on a more general approach.
George Hotz dismisses the idea of a trillion-parameter LLaMA, suggesting GPT-4's 220 billion parameters and mixture-of-experts approach are more relevant but not necessarily superior.
ARM processors are mentioned as the most common processors today, following the shift from CISC to RISC.
A method for matrix multiplication used in TPUs, which George Hotz believes is the wrong choice due to scheduling difficulties for non-dense matrix multiplies.
George Hotz uses 'billions of parameters' to describe the internal models of AI, referencing the scale of models like GPT-4.
RISC-V is mentioned as an open-source instruction set architecture that is less complex than ARM.
Mentioned in relation to CPU complexity and the halting problem, which is relevant to why neural networks can be optimized differently.
Discussed in the context of load dependencies, contrasting with the more predictable loads in neural networks.
Alibi is mentioned as a complex positional embedding used by LLaMA, which George Hotz contrasts with simpler methods or the potential abolishment of positional embeddings.
Large Language Models are discussed in various contexts, including their limitations, potential for fine-tuning, and interactions with human programmers.
George Hotz criticizes Effective Accelerationism, viewing it as an ideology taken less seriously by its adherents, particularly on the right.
Used as an analogy for the complexity in CPUs related to predicting branch execution, which is simplified in neural networks.
TensorFlow's graph-based computation is contrasted with Tiny Grad's local graph optimizations facilitated by laziness.
A technique used in models like GPT-4 where multiple smaller models are combined, discussed as a way to scale beyond parameter limits.
The KV cache invalidation problem with large context windows is mentioned as a drawback of some positional embedding techniques.
Large YOLO models are mentioned as being impressive in object detection, and George Hotz wants to build something similar for segmentation.
Batch Normalization is listed as one of the 'six tricks' in AI development, with George Hotz wondering if phenomena like 'covariate shift' inspired it.
Self-driving cars are used as an example of a solved software problem in robotics, which George Hotz believes robotics in general can become.
The paperclip maximizer is used as an analogy for a perfect form of cancer, representing a risk that won't 'win' due to the 'Goddess of Everything Else' (complexity).
John Carmack's philosophy on building instrumentation into code is mentioned in relation to Tiny Grad's debugging features.
Richard Sutton's 'Bitter Lesson' is cited as a foundational concept for AI development, emphasizing the power of scaling computation over hand-engineering.
Elon Musk is frequently mentioned as a comparator to George Hotz, particularly regarding their systems thinking approach and differing ambitions (physics-based vs. information-theory-based).
Sam Altman is described as a genuinely good guy not interested in power-seeking, in contrast to Sam Bankman-Fried.
Marc Andreessen is mentioned as a proponent of 'effective accelerationism', whom George Hotz finds critical of the political world but stuck in complaining.
George Hotz emailed Lisa Su of AMD, and her response and subsequent calls led to improvements in AMD's driver support.
Andrej Karpathy's early RNN work generating Shakespeare is mentioned as a pivotal moment in George Hotz's understanding of neural networks as compressors.
Sam Bankman-Fried (SBF) is called a 'wolf in sheep's clothing' who adopted Effective Altruism for show, in contrast to Sam Altman.
George Hotz credits Venkatesh Rao for the 'API Line' concept, though he admits to adapting it.
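The laziness point above (TensorFlow's graph-based computation contrasted with tinygrad's local graph optimizations facilitated by laziness) can be sketched with a toy example. This is an illustrative sketch only, not tinygrad's implementation: operations build a small expression graph instead of executing immediately, so a compiler gets a chance to optimize (for example, fuse adjacent elementwise ops) before anything runs.

```python
# Toy sketch of lazy evaluation (illustrative, not tinygrad's code):
# arithmetic builds a graph; nothing computes until realize() is called,
# which is where a real compiler would fuse and optimize the graph.

class Lazy:
    def __init__(self, data=None, op=None, srcs=()):
        self.data, self.op, self.srcs = data, op, srcs

    def __add__(self, other):
        return Lazy(op="add", srcs=(self, other))  # record the op, don't run it

    def __mul__(self, other):
        return Lazy(op="mul", srcs=(self, other))

    def realize(self):
        """Walk the recorded graph and compute it in one pass."""
        if self.data is not None:           # leaf node: concrete data
            return self.data
        a = self.srcs[0].realize()
        b = self.srcs[1].realize()
        f = {"add": lambda x, y: x + y, "mul": lambda x, y: x * y}[self.op]
        return [f(x, y) for x, y in zip(a, b)]

x = Lazy([1.0, 2.0, 3.0])
y = Lazy([4.0, 5.0, 6.0])
z = (x + y) * x        # no computation yet: z is just a two-node graph
print(z.realize())     # [5.0, 14.0, 27.0]
```

Because the whole expression is visible as a graph before execution, the runtime can avoid the unnecessary memory round trips between ops that eager frameworks pay for, which is the fusion opportunity discussed in the episode.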