
The Shape of Compute (Chris Lattner of Modular)

Latent Space Podcast
Science & Technology · 4 min read · 79 min video
Jun 13, 2025 · 47,430 views
TL;DR

Modular's ML stack breaks the CUDA monopoly, delivering high performance through the Mojo language and the Max inference framework.

Key Insights

1. Modular aims to democratize AI by providing a vertically integrated, high-performance software stack that isn't tied to a proprietary platform like CUDA.

2. Mojo, a new programming language, is designed for performance-critical applications, offering Python-like syntax with C++- or Rust-level speed and GPU support.

3. Max, Modular's inference framework, is built to be highly performant, flexible, and optimized for GenAI workloads, integrating seamlessly with Mojo.

4. The company strategically focused on inference first, recognizing that it scales with the customer base and AI applications rather than just with research teams.

5. Modular emphasizes composability and modular design in its software stack, allowing easier evolution, adaptation to new hardware, and community contributions.

6. Modular's business model offers its core language and framework for free, with revenue generated from enterprise-level cluster management and support, encouraging widespread adoption.

FROM RESEARCH TO PRODUCTION: MODULAR'S EVOLUTION

Modular, founded over three years ago, is currently in a pivotal phase of its development, transitioning from a secretive R&D phase to open-sourcing its technology and scaling its impact. The initial R&D period, spanning roughly three years, was dedicated to solving complex problems like unlocking heterogeneous compute and simplifying GPU programming. This phase focused on proving the core hypotheses and achieving state-of-the-art performance, even against NVIDIA's best, without relying on CUDA. The goal was to build a system that Chris Lattner himself would deem performant enough before broader release.

THE MOJO PROGRAMMING LANGUAGE: PERFORMANCE AND USABILITY

At the heart of Modular's stack is Mojo, a new programming language designed to address the limitations of existing languages on accelerated hardware. Mojo aims to provide the performance and low-level control needed for GPUs and other specialized chips while maintaining a Python-like syntax and user experience. This blend of usability and performance is intended to let developers write highly efficient code, speed up Python workloads, and target diverse hardware without the complexity of C++ or the constraints of Python's GIL.
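As a purely illustrative sketch (not code from the episode, and syntax that may vary across Mojo releases), the blend of Python-style syntax and static typing described above looks roughly like this:

```mojo
# Hypothetical sketch only: `fn` functions are statically typed and compiled,
# giving systems-level performance, while `def` offers a more dynamic,
# Python-like mode alongside them in the same file.
fn add(a: Int, b: Int) -> Int:
    return a + b

fn main():
    print(add(2, 3))
```

The ability to mix compiled `fn` functions with familiar `def`-style code is how Mojo aims to let Python programmers opt into performance incrementally rather than rewriting in C++.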

MAX: A FLEXIBLE FRAMEWORK FOR GENAI INFERENCE

Complementing Mojo, Modular has developed Max, a GenAI inference-focused framework. Max is engineered for high performance, low latency, and control, specifically targeting the demands of modern AI models. It integrates seamlessly with Mojo, allowing for customized kernels and automatic kernel fusion, thereby reducing complexity for developers. The framework supports a wide array of model families and is designed to be highly hackable, enabling researchers and developers to build and optimize custom models efficiently.

BREAKING THE CUDA MONOPOLY AND HARDWARE AGNOSTICISM

A central tenet of Modular's strategy is to break NVIDIA's CUDA monopoly and provide a hardware-agnostic software stack. The company aims to achieve state-of-the-art performance on various hardware, including AMD GPUs, and future architectures. By developing Mojo and Max as replacements for proprietary stacks, Modular enables developers to write portable code that can run efficiently across different vendors, reducing vendor lock-in and fostering greater innovation in the AI hardware space by lowering the cost and complexity of hardware adoption.

MODULAR'S STRATEGIC FOCUS ON INFERENCE

Modular made a conscious decision early on to focus on inference rather than training, a choice that was considered contrarian at the time. This focus was driven by the understanding that inference scales with the size of the customer base and the applications of AI, not just the research team. As AI models move into production and affect billions of users, inference becomes the critical bottleneck. By optimizing this part of the stack, Modular positions itself to capture immense value as AI adoption grows and moves from research labs into widespread commercial use and applications.

BUSINESS MODEL AND OPEN SOURCE STRATEGY

Modular's go-to-market strategy involves offering the core Mojo language and Max framework completely free, encouraging widespread adoption and community contributions. Revenue is generated through optional enterprise offerings, such as cluster management, specialized support, and multi-hardware orchestration. This approach allows individuals and smaller teams to leverage high-performance AI tools without upfront costs, while providing scalable solutions for larger organizations. The company's commitment to open source, including full version control history, aims to foster transparency, collaboration, and rapid iteration.

THE IMPORTANCE OF MODULARITY AND COMPOSABILITY

The company's name, Modular, reflects its core design philosophy: building systems that are composable, orthogonal, and flexible. This approach, learned from past experiences like the development of LLVM, is crucial for managing complexity and enabling rapid evolution in the fast-paced AI landscape. By avoiding monolithic designs and focusing on clean, well-defined interfaces between components, Modular can more easily adapt to new hardware, integrate cutting-edge research, and allow specialized teams to contribute effectively, accelerating overall progress in AI development.

EMPOWERING DEVELOPERS AND ACCELERATING AI ADOPTION

Ultimately, Modular aims to empower a broader range of developers to work with AI and GPUs by reducing complexity and providing accessible, high-performance tools. The company believes that by making advanced capabilities easier to use and understand, it can accelerate the pace of AI innovation and lead to more impactful products and applications. This vision extends to upskilling the workforce, enabling more people to program GPUs and contribute to the next wave of AI advancements, moving beyond a focus solely on large, well-resourced labs.

Common Questions

What is Modular?

Modular is a company developing a new AI stack with the goal of unlocking heterogeneous compute and simplifying GPU programming. Their primary aim is to drive complexity out of the AI stack.

Topics

Mentioned in this video

concept: continuous batching

An optimization technique for LLM serving, allowing multiple requests to be processed efficiently.

person: Chris Lattner

Co-founder of Modular, the guest on the podcast discussing the shape of compute.

company: NVIDIA

A major GPU manufacturer whose hardware and software (CUDA) are central to the discussion, often serving as a benchmark for Modular's technology.

software: CUDA

NVIDIA's parallel computing platform and API, which Modular aims to replace or abstract away with its own stack.

product: A100

A specific NVIDIA GPU model for which Modular achieved state-of-the-art performance in an early release.

product: H100

A newer NVIDIA GPU architecture for which Modular added support, improving performance and features.

company: AMD

Another GPU manufacturer whose hardware (MI300) Modular is adding support for.

software: Max

Modular's AI inference framework, designed to be efficient and controllable, integrating seamlessly with Mojo.

software: PTX

NVIDIA's parallel thread execution assembly language, which the DeepSeek team programmed directly, highlighting the importance of low-level GPU programming.

tool: band saw

A woodworking tool that Lattner uses with his kids, emphasizing its safety features.

company: Modular

The company founded by Chris Lattner, focusing on AI compute and programming languages.

software: Llama 3

A large language model mentioned as a benchmark for state-of-the-art performance in serving.

product: Blackwell

NVIDIA's upcoming GPU architecture, for which Modular is developing support.

organization: Smol AI

A company founded by the co-host, mentioned at the start of the podcast.

software: OpenBLAS

An open-source math library that Modular sought to surpass on CPUs.

software: Mojo

Modular's programming language designed for AI development, offering high performance and ease of use.

software: Intel MKL

Intel's Math Kernel Library, which Modular aimed to outperform with its early compiler technology.

software: LLVM

A compiler infrastructure project that Chris Lattner was instrumental in developing, serving as a precedent for Modular's approach.

company: Apple

The company where Chris Lattner developed LLVM and Swift, and where he had to navigate internal resistance to new technologies.

software: GCC

The GNU Compiler Collection, whose community resisted LLVM during its early development.

software: SGLang

Another inference project from Berkeley, characterized as focused and goal-oriented.

company: Google

A company where Lattner previously worked, known for its early contributions to AI research and its decision to open-source TensorFlow.

software: vLLM

An open-source inference framework from Berkeley, compared with Modular's approach and noted for its broad but sometimes unreliable hardware support.

software: OpenCL

A framework for parallel programming across heterogeneous systems, mentioned as an example of previous attempts to solve similar problems.

software: TensorFlow

Google's open-source machine learning framework, credited with democratizing AI and setting the stage for PyTorch's open-source approach.

company: Mistral

A prominent AI company whose adoption of Modular's technology would signify wider industry acceptance.

software: PyTorch

A popular machine learning framework that Lattner sees as a model for how Mojo could be adopted.

organization: DeepSeek

A research team that released impressive models and pushed advancements in low-precision training and PTX-level optimization, prompting industry reaction.

software: Kubernetes

An open-source system for automating deployment, scaling, and management of containerized applications, relevant to Modular's cluster-level offerings.

company: Meta

A major technology company whose adoption of Modular's technology would indicate significant industry validation.

tool: Cursor

An AI coding assistant that Chris Lattner uses personally and recommends for its ability to handle large codebases.

person: Sergey Brin

Co-founder of Google, mentioned as actively involved in AI development.

tool: Lego robotics table

A project Lattner built for his kids, modular in design, used for learning programming.
