Truly Serverless Infra for AI Engineers - with Erik Bernhardsson of Modal

Latent Space Podcast
Science & Technology | 4 min read | 69 min video
Feb 19, 2024 | 998 views
TL;DR

Modal offers a serverless cloud platform for developers, focusing on developer productivity and efficient infrastructure for AI and data workloads.

Key Insights

1. Modal was founded to address developer productivity issues in the cloud, particularly for data and AI teams, by simplifying infrastructure management.

2. The platform leverages a custom container runtime and filesystem to achieve fast container startup times, enabling a seamless local-to-cloud development experience.

3. Modal's Python SDK allows developers to define infrastructure and applications in code, offering a programmable and configuration-free approach.

4. The 'serverless native' approach, particularly with GPUs, is a strong fit for AI workloads like model inference, offering cost and latency benefits.

5. While initially focused on data teams, Modal has proven popular with AI engineers for custom models, complex workflows, and large-scale inference.

6. The 'sandbox' feature is evolving into a platform for platforms, enabling other companies to run code and containers on Modal programmatically.

FROM SPOTIFY TO MODAL: A CAREER IN DATA INFRASTRUCTURE

Erik Bernhardsson's journey began at Spotify, where he developed early open-source tools like Annoy (an approximate nearest-neighbor library for vector search) and Luigi (a workflow engine). He then served as CTO for Better, scaling their engineering team significantly. These experiences, particularly with data infrastructure and managing large teams, laid the groundwork for his current venture, Modal.

THE BIRTH OF MODAL: REVOLUTIONIZING DEVELOPER PRODUCTIVITY

Modal was created to solve the perceived friction and inefficiency in cloud development, especially for data and AI teams. Bernhardsson observed that existing tools often lacked integration and presented challenges with environment management, GPU access, and slow feedback loops. The core idea was to build a high-performance cloud that makes running code feel as seamless as running it locally.

MODAL'S ARCHITECTURE: FAST CONTAINERS AND PROGRAMMABLE INFRASTRUCTURE

A key innovation at Modal is its custom container runtime and virtual filesystem, built on primitives like runc. This allows for exceptionally fast container cold starts by efficiently handling file access and reducing image bloat. The platform's Python SDK enables developers to define their entire application and infrastructure in code, eliminating configuration files and providing a unified, programmable interface.
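To make the "infrastructure in code" idea concrete, here is a minimal, self-contained sketch of how a decorator can attach resource requirements to an ordinary Python function, so that no separate configuration file is needed. It is a toy illustration and not Modal's SDK: `ToyApp`, `FunctionSpec`, and the `gpu`, `memory_mb`, and `pip_packages` parameters are hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Sequence, Tuple


@dataclass(frozen=True)
class FunctionSpec:
    """Resource requirements declared next to the code that needs them."""
    name: str
    gpu: Optional[str] = None
    memory_mb: int = 512
    pip_packages: Tuple[str, ...] = ()


class ToyApp:
    """Hypothetical registry standing in for a cloud SDK's app object."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.specs: Dict[str, FunctionSpec] = {}

    def function(
        self,
        *,
        gpu: Optional[str] = None,
        memory_mb: int = 512,
        pip_packages: Sequence[str] = (),
    ) -> Callable[[Callable], Callable]:
        """Decorator that records a function's requirements and returns it unchanged."""
        def decorator(fn: Callable) -> Callable:
            self.specs[fn.__name__] = FunctionSpec(
                name=fn.__name__,
                gpu=gpu,
                memory_mb=memory_mb,
                pip_packages=tuple(pip_packages),
            )
            return fn
        return decorator


app = ToyApp("image-demo")


@app.function(gpu="A100", memory_mb=16384, pip_packages=["torch"])
def generate(prompt: str) -> str:
    # Placeholder body; a real function would run a model here.
    return f"image for: {prompt}"


@app.function()
def resize(width: int, height: int) -> int:
    # Placeholder body; returns the pixel count of the target size.
    return width * height
```

After the module is imported, `app.specs` holds everything a scheduler would need to know about each function, which is the property that lets the code and its infrastructure live in one place.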

SERVERLESS AND AI: A SERENDIPITOUS FIT

The rise of generative AI, particularly models requiring GPUs for inference, proved a serendipitous match for Modal's serverless architecture. The 'bursty' nature of AI workloads, with large computations triggered by small inputs, aligns well with Modal's ability to quickly provision and de-provision resources, offering both cost savings and low latency for tasks like Stable Diffusion image generation.
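The cost argument for bursty workloads can be checked with simple arithmetic. The sketch below compares an always-on GPU billed by the hour with per-second billing that only charges while requests are running. All prices, request counts, and cold-start figures are made-up assumptions for illustration, not Modal's pricing, and the model assumes cold-start time is billed.

```python
def always_on_cost(hourly_rate: float, hours: float) -> float:
    """Cost of keeping one GPU reserved for the whole period."""
    return hourly_rate * hours


def serverless_cost(
    per_second_rate: float,
    requests: int,
    seconds_per_request: float,
    cold_start_seconds: float = 0.0,
    cold_start_fraction: float = 0.0,
) -> float:
    """Cost when only busy seconds (plus some cold starts) are billed."""
    busy_seconds = requests * seconds_per_request
    cold_seconds = requests * cold_start_fraction * cold_start_seconds
    return per_second_rate * (busy_seconds + cold_seconds)


# Hypothetical day of traffic: 2,000 image generations of 5 seconds each,
# with 10% of requests paying a 10-second cold start.
reserved = always_on_cost(hourly_rate=3.00, hours=24)
on_demand = serverless_cost(
    per_second_rate=0.0012,  # equals 4.32 per hour, a premium over 3.00
    requests=2000,
    seconds_per_request=5.0,
    cold_start_seconds=10.0,
    cold_start_fraction=0.1,
)
print(f"always-on: ${reserved:.2f}  serverless: ${on_demand:.2f}")
```

Under these assumptions the GPU is busy for only a few hours a day, so the per-second option comes out at roughly $14 against roughly $72, even though its hourly-equivalent rate is higher. The advantage shrinks as utilization approaches the ratio of the two rates.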

BEYOND INFERENCE: EXPANDING MODAL'S CAPABILITIES

While model inference became an early killer app, Modal is expanding to support broader AI workloads including training, fine-tuning, and data preprocessing. The platform also caters to diverse use cases beyond AI, such as web scraping, scientific computing, and video processing, demonstrating its generality as a compute platform. The focus remains on providing better compute primitives.

PLATFORM FOR PLATFORMS: THE EVOLVING SANDBOX

Modal's 'sandbox' feature, inspired by code execution environments, is evolving into a 'platform for platforms.' This allows other companies to programmatically leverage Modal's container execution capabilities as a backend service, essentially offering 'Functions as a Service as a Service.' This indicates a future direction where Modal provides core infrastructure that others can build specialized services upon.
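The shape of a programmatic code-execution service can be sketched locally: a caller submits a snippet, and the service runs it in a separate process with a time limit and returns the captured output. The example below is a stand-in built on the Python standard library, not Modal's sandbox API; `run_snippet` and `RunResult` are hypothetical names, and a child process is not a security boundary the way an isolated container is.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class RunResult:
    stdout: str
    stderr: str
    returncode: int
    timed_out: bool


def run_snippet(code: str, timeout_s: float = 5.0) -> RunResult:
    """Run a Python snippet in a child interpreter and capture its output."""
    try:
        proc = subprocess.run(
            # -I starts the child in isolated mode (ignores user site and env vars).
            [sys.executable, "-I", "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising TimeoutExpired.
        return RunResult(stdout="", stderr="", returncode=-1, timed_out=True)
    return RunResult(proc.stdout, proc.stderr, proc.returncode, False)


result = run_snippet("print(sum(range(10)))")
print(result.stdout.strip(), result.returncode)
```

A hosted version would replace the child process with an isolated container and expose the same submit-and-collect interface over an API, which is what lets other platforms build on top of it.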

NAVIGATING THE AI INFRASTRUCTURE LANDSCAPE

Bernhardsson positions Modal as a 'second layer' cloud provider, similar to Snowflake's model of building on top of major cloud providers but offering enhanced developer experience. He contrasts this with traditional PaaS providers like Heroku, noting the 'graduation problem' where companies outgrow them. Modal aims to capture the full spectrum of users, from hobbyists to enterprises, by focusing on customizability and cost-efficiency.

THE ECONOMICS OF AI COMPUTE AND PRICING STRATEGIES

Modal focuses on custom models and workflows, differentiating itself from the price wars in the commodity LLM inference market. The platform's ability to drive high GPU utilization and its custom infrastructure allow it to charge a premium while still offering cost advantages over less optimized solutions. The goal is to build a sustainable business with healthy margins, rather than solely relying on subsidized compute.
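The link between GPU utilization and margins can be made explicit with a back-of-the-envelope model. Assume a provider rents a GPU at a fixed hourly cost and resells it at a higher price, but only for the fraction of each hour that customers actually use. The function names and all numbers below are illustrative assumptions, not Modal's actual costs or prices.

```python
def gross_margin(cost_per_hour: float, price_per_used_hour: float, utilization: float) -> float:
    """Gross margin per rented GPU-hour, given the fraction of it that is sold."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    revenue = price_per_used_hour * utilization
    return (revenue - cost_per_hour) / revenue


def break_even_utilization(cost_per_hour: float, price_per_used_hour: float) -> float:
    """Utilization at which revenue exactly covers the rental cost."""
    return cost_per_hour / price_per_used_hour


# Hypothetical figures: GPU rented at 2.00/hour, resold at 4.00 per used hour.
healthy = gross_margin(2.00, 4.00, utilization=0.75)
subsidized = gross_margin(2.00, 4.00, utilization=0.40)
threshold = break_even_utilization(2.00, 4.00)
print(f"margin at 75%: {healthy:.0%}, at 40%: {subsidized:.0%}, break-even: {threshold:.0%}")
```

In this toy model the same price yields a positive margin at high utilization and a loss at low utilization, which is why scheduling and resource allocation matter so much to the business.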

THE VALUE OF TALENT AND TECHNICAL CHALLENGES

Modal actively recruits highly talented individuals, including competitive programmers, to tackle complex infrastructure challenges like resource allocation and scheduling. Bernhardsson believes this 'talent intensity' is crucial for building robust systems. He also notes that while AI may increase engineer productivity, it is unlikely to replace engineers and will instead shift demand towards more sophisticated application development.

FUTURE DEVELOPMENTS AND UNDERRATED ASPECTS

Looking ahead, Modal is focused on building primitives for more IO-intensive workloads, such as direct TCP tunnels for real-time applications and video processing. The platform also aims to develop higher-level functional primitives for data manipulation. Bernhardsson highlights Oracle Cloud's GPUs as a surprisingly good and cost-effective option, further diversifying Modal's underlying infrastructure.

Common Questions

Modal provides serverless infrastructure designed to improve developer productivity, particularly for data and AI engineers. It aims to simplify running code in the cloud by offering fast container startups, easy access to GPUs, and a programmatic way to define and scale applications.

Mentioned in this video

Software & Apps

Modal

A company building serverless infrastructure for AI and data teams, focusing on developer productivity and efficient compute.

AWS Lambda

A serverless compute service by Amazon, mentioned in the context of providing serverless capabilities.

Erik Bot

A prototype built by Modal that downloads Slack history, fine-tunes a model, and creates a chat interface to 'clone' a person.

Gurobi

Optimization solver software that some Modal users employ for mixed-integer programming problems.

Annoy

A vector search library developed by Erik Bernhardsson, which was used widely in 2012, ahead of the current vector database trend.

Docker

A containerization platform whose image building and startup process is described as resource-inefficient by Erik Bernhardsson.

ffmpeg

A cross-platform solution to record, convert and stream audio and video. Modal can run it for video processing tasks.

Kubernetes

Container orchestration system that Erik Bernhardsson finds less suitable for data teams compared to backend teams.

Pulumi

A tool mentioned as an analogy for AWS CDK, which Modal's approach aims to improve upon.

Stable Diffusion

An AI model that became a key early use case and driver of adoption for Modal.

Heroku

An earlier-generation PaaS provider whose 'graduation problem' (companies outgrowing its capabilities) is a key lesson for Modal.

Luigi

A workflow engine developed by Erik Bernhardsson at Spotify, predating tools like Airflow.

runc

A low-level container runtime primitive that Modal utilizes.

gVisor

A Google product used for container security and isolation within Modal's sandbox feature.

Companies

OpenAI

A leading AI research company whose APIs are a common starting point for many users, but not a primary focus for Modal's competition.

Datadog

A monitoring and analytics platform mentioned as an example of a successful infrastructure company that captures both hobbyists and large enterprise customers.

Spotify

Music streaming service where Erik Bernhardsson started his career, working on scalable music recommendation systems.

Modular

A company with a similar name and a focus on Mojo and an inference engine, though with a different business model (licensed software vs. cloud service).

Oracle Cloud Infrastructure

A cloud provider that Modal uses and finds to be a great value for money, especially for its bare metal GPUs.

Snowflake

A cloud data warehousing company mentioned as an example of a successful second-layer cloud provider.

Replicate

A platform focused on AI model inference that competes with Modal in a small sliver of the market, primarily targeting front-end engineers with off-the-shelf models.

Better

A company where Erik Bernhardsson served as CTO for six years, scaling the engineering team.

Pinecone

A vector database company whose CEO met Erik Bernhardsson and was initially worried that Modal might be building a competing vector database.

Cloudflare

A company whose infrastructure growth story, involving significant investment in physical networks, is offered as a partial analogy to Modal's journey.

Ramp

A user of Modal that fine-tunes 100 models simultaneously and uses Modal for batch embeddings.
