
The Shape of Compute (Chris Lattner of Modular)

Latent Space Podcast
Science & Technology · 4 min read · 79 min video
Jun 13, 2025 · 47,430 views
TL;DR

Modular's ML stack breaks the CUDA monopoly, delivering high performance through the Mojo language and the Max inference framework.

Key Insights

1. Modular aims to democratize AI by providing a vertically integrated, high-performance software stack that isn't tied to a proprietary platform like CUDA.

2. Mojo, a new programming language, is designed for performance-critical applications, offering Python-like syntax with C++- or Rust-level speed and GPU support.

3. Max, Modular's inference framework, is built to be highly performant, flexible, and optimized for GenAI workloads, integrating seamlessly with Mojo.

4. The company strategically focused on inference first, recognizing that it scales with the customer base and AI applications rather than just with research teams.

5. Modular emphasizes composability and modular design in its software stack, allowing easier evolution, adaptation to new hardware, and community contributions.

6. Modular's business model offers its core language and framework for free, with revenue generated from enterprise-level cluster management and support, encouraging widespread adoption.

FROM RESEARCH TO PRODUCTION: MODULAR'S EVOLUTION

Modular, founded over three years ago, is currently in a pivotal phase of its development, transitioning from a secretive R&D phase to open-sourcing its technology and scaling its impact. The initial R&D period, spanning roughly three years, was dedicated to solving complex problems like unlocking heterogeneous compute and simplifying GPU programming. This phase focused on proving the core hypotheses and achieving state-of-the-art performance, even against NVIDIA's best, without relying on CUDA. The goal was to build a system that Chris Lattner himself would deem performant enough before broader release.

THE MOJO PROGRAMMING LANGUAGE: PERFORMANCE AND USABILITY

At the heart of Modular's stack is Mojo, a new programming language designed to address the limitations of existing languages on accelerated hardware. Mojo aims to provide the performance and low-level control needed for GPUs and other specialized chips while maintaining a Python-like syntax and user experience. This blend of usability and performance is intended to let developers write highly efficient code, speed up Python workloads, and target diverse hardware without the complexity of C++ or the constraints of Python's GIL.
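As a purely illustrative sketch (not code from the episode, and syntax that may vary across Mojo releases), the blend of Python-style syntax and static typing described above looks roughly like this:

```mojo
# Hypothetical sketch only: `fn` functions are statically typed and compiled,
# giving systems-level performance, while `def` offers a more dynamic,
# Python-like mode alongside them in the same file.
fn add(a: Int, b: Int) -> Int:
    return a + b

fn main():
    print(add(2, 3))
```

The ability to mix compiled `fn` functions with familiar `def`-style code is how Mojo aims to let Python programmers opt into performance incrementally rather than rewriting in C++.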

MAX: A FLEXIBLE FRAMEWORK FOR GENAI INFERENCE

Complementing Mojo, Modular has developed Max, a GenAI inference-focused framework. Max is engineered for high performance, low latency, and control, specifically targeting the demands of modern AI models. It integrates seamlessly with Mojo, allowing for customized kernels and automatic kernel fusion, thereby reducing complexity for developers. The framework supports a wide array of model families and is designed to be highly hackable, enabling researchers and developers to build and optimize custom models efficiently.

BREAKING THE CUDA MONOPOLY AND HARDWARE AGNOSTICISM

A central tenet of Modular's strategy is to break NVIDIA's CUDA monopoly and provide a hardware-agnostic software stack. The company aims to achieve state-of-the-art performance on various hardware, including AMD GPUs, and future architectures. By developing Mojo and Max as replacements for proprietary stacks, Modular enables developers to write portable code that can run efficiently across different vendors, reducing vendor lock-in and fostering greater innovation in the AI hardware space by lowering the cost and complexity of hardware adoption.

MODULAR'S STRATEGIC FOCUS ON INFERENCE

Modular made a conscious decision early on to focus on inference rather than training, a choice that was considered contrarian at the time. This focus was driven by the understanding that inference scales with the size of the customer base and the applications of AI, not just the research team. As AI models move into production and affect billions of users, inference becomes the critical bottleneck. By optimizing this part of the stack, Modular positions itself to capture immense value as AI adoption grows and moves from research labs into widespread commercial use and applications.

BUSINESS MODEL AND OPEN SOURCE STRATEGY

Modular's go-to-market strategy involves offering the core Mojo language and Max framework completely free, encouraging widespread adoption and community contributions. Revenue is generated through optional enterprise offerings, such as cluster management, specialized support, and multi-hardware orchestration. This approach allows individuals and smaller teams to leverage high-performance AI tools without upfront costs, while providing scalable solutions for larger organizations. The company's commitment to open source, including full version control history, aims to foster transparency, collaboration, and rapid iteration.

THE IMPORTANCE OF MODULARITY AND COMPOSABILITY

The company's name, Modular, reflects its core design philosophy: building systems that are composable, orthogonal, and flexible. This approach, learned from past experiences like the development of LLVM, is crucial for managing complexity and enabling rapid evolution in the fast-paced AI landscape. By avoiding monolithic designs and focusing on clean, well-defined interfaces between components, Modular can more easily adapt to new hardware, integrate cutting-edge research, and allow specialized teams to contribute effectively, accelerating overall progress in AI development.

EMPOWERING DEVELOPERS AND ACCELERATING AI ADOPTION

Ultimately, Modular aims to empower a broader range of developers to work with AI and GPUs by reducing complexity and providing accessible, high-performance tools. The company believes that by making advanced capabilities easier to use and understand, it can accelerate the pace of AI innovation and lead to more impactful products and applications. This vision extends to upskilling the workforce, enabling more people to program GPUs and contribute to the next wave of AI advancements, moving beyond a focus solely on large, well-resourced labs.

Common Questions

What is Modular?

Modular is a company developing a new AI stack with the goal of unlocking heterogeneous compute and simplifying GPU programming. Their primary aim is to drive complexity out of the AI stack.

Topics

Mentioned in this video

concept: continuous batching

An optimization technique for LLM serving, allowing multiple requests to be processed efficiently.

person: Chris Lattner

Co-founder of Modular, the guest on the podcast discussing the shape of compute.

company: NVIDIA

A major GPU manufacturer whose hardware and software (CUDA) are central to the discussion, often serving as a benchmark for Modular's technology.

software: CUDA

NVIDIA's parallel computing platform and API, which Modular aims to replace or abstract away with its own stack.

product: A100

A specific NVIDIA GPU model for which Modular achieved state-of-the-art performance in an early release.

product: H100

A newer NVIDIA GPU architecture for which Modular added support, improving performance and features.

company: AMD

Another GPU manufacturer whose hardware (MI300) Modular is adding support for.

software: Max

Modular's AI inference framework, designed to be efficient and controllable, integrating seamlessly with Mojo.

software: PTX

NVIDIA's parallel thread execution assembly language, which the DeepSeek team programmed directly, highlighting the importance of low-level GPU programming.

tool: band saw

A woodworking tool that Lattner uses with his kids, emphasizing its safety features.

company: Modular

The company founded by Chris Lattner, focusing on AI compute and programming languages.

software: Llama 3

A large language model mentioned as a benchmark for state-of-the-art performance in serving.

product: Blackwell

NVIDIA's upcoming GPU architecture, for which Modular is developing support.

organization: Smol AI

A company founded by the co-host, mentioned at the start of the podcast.

software: OpenBLAS

An open-source math library that Modular sought to surpass on CPUs.

software: Mojo

Modular's programming language designed for AI development, offering high performance and ease of use.

software: Intel MKL

Intel's Math Kernel Library, which Modular aimed to outperform with its early compiler technology.

software: LLVM

A compiler infrastructure project that Chris Lattner was instrumental in developing, serving as a precedent for Modular's approach.

company: Apple

The company where Chris Lattner developed LLVM and Swift, and where he had to navigate internal resistance to new technologies.

software: GCC

The GNU Compiler Collection, whose community resisted LLVM during its early development.

software: SGLang

Another inference project from Berkeley, characterized as focused and goal-oriented.

company: Google

A company where Lattner previously worked, known for its early contributions to AI research and its decision to open-source TensorFlow.

software: vLLM

An open-source inference framework from Berkeley, compared with Modular's approach and noted for its broad but sometimes unreliable hardware support.

software: OpenCL

A framework for parallel programming across heterogeneous systems, mentioned as an example of previous attempts to solve similar problems.

software: TensorFlow

Google's open-source machine learning framework, credited with democratizing AI and setting the stage for PyTorch's open-source approach.

company: Mistral

A prominent AI company whose adoption of Modular's technology would signify wider industry acceptance.

software: PyTorch

A popular machine learning framework that Lattner sees as a model for how Mojo could be adopted.

organization: DeepSeek

A research team that released impressive models and pushed advancements in low-precision training and PTX-level optimization, prompting industry reaction.

software: Kubernetes

An open-source system for automating deployment, scaling, and management of containerized applications, relevant to Modular's cluster-level offerings.

company: Meta

A major technology company whose adoption of Modular's technology would indicate significant industry validation.

tool: Cursor

An AI coding assistant that Chris Lattner uses personally and recommends for its ability to handle large codebases.

person: Sergey Brin

Co-founder of Google, mentioned as actively involved in AI development.

tool: Lego robotics table

A project Lattner built for his kids, modular in design, used for learning programming.
