Stanford CME296 Diffusion & Large Vision Models | Spring 2026 | Lecture 3 - Flow matching
Key Moments
Flow matching offers a new way to generate data by learning to 'transport' an initial distribution to a target one, bypassing the expensive likelihood computations of continuous normalizing flows.
Key Insights
Flow matching transports data distributions by learning a time-dependent vector field, which dictates particle movement between an initial (e.g., Gaussian) and target (data) distribution.
The core idea of flow matching is to directly learn this vector field, unlike diffusion models that learn to reverse a noise-adding process or score-based models that learn the score function (gradient of log-density).
A key simplification in flow matching is the 'conditional flow matching loss,' which is computationally tractable and simplifies the learning objective to matching the learned vector field with a known, simple 'conditional vector field' (x1 - x0).
The 'reflow' procedure can be used to refine the learned vector field and make trajectories straighter, improving inference speed by allowing fewer ODE solver steps.
Flow matching, diffusion, and score matching can be unified under the broader framework of stochastic interpolants, where knowing two of the three components (noise, score, velocity) allows deduction of the third.
Bridging discrete and continuous generation paradigms
This lecture introduces flow matching as a novel generation paradigm, building upon previous discussions of diffusion (DDPM) and score matching. While diffusion models learn to reverse a gradual noising process and score-based models learn the gradient of the log-probability density (the 'score'), flow matching focuses on 'transporting' data from an initial, easy-to-sample distribution to a target data distribution. The core problem remains generating new samples by moving from a simple distribution to a complex one.
Understanding trajectories, flows, and probability paths
Flow matching operates by defining a time interval, typically [0, 1], where time 0 represents the initial distribution (e.g., Gaussian noise, p0) and time 1 represents the target data distribution (p1). A 'trajectory' (xt) describes the path of an observation between these times. The 'flow' (st(x0)) is a function mapping an initial condition to its position at time t. The 'probability path' (pt(x)) represents the distribution of observations at any given time t within this interval. A crucial element is the 'vector field' (ut(x)), which provides the direction and speed for particles to move at a given location and time, essentially guiding the transport process.
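As a concrete one-dimensional illustration (a toy sketch, not taken from the lecture), consider the simple flow st(x0) = (1 - t)·x0 + t·μ, which transports a standard Gaussian at t = 0 toward the single point μ at t = 1. Sampling many initial conditions and pushing them through the flow at a fixed t gives samples from the probability path pt = N(t·μ, (1 - t)²); the value μ = 2.0 is an assumption for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
mu = 2.0  # target point (assumption for this toy example)

def flow(x0, t):
    """Flow s_t(x0): position at time t of the particle that started at x0."""
    return (1 - t) * x0 + t * mu

# Sample the probability path p_t by pushing initial samples through the flow.
x0 = rng.standard_normal(100_000)   # p0 = N(0, 1)
t = 0.3
xt = flow(x0, t)                    # samples from p_t = N(t*mu, (1-t)^2)

print(xt.mean(), xt.std())          # approx t*mu = 0.6 and (1-t) = 0.7
```

Each sampled x0 traces a trajectory t ↦ flow(x0, t); freezing t and varying x0 instead gives the distribution pt, which is the macro-level view of the same transport.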
The role of vector fields and ODEs
The vector field is central to flow matching. It defines an ordinary differential equation (ODE): dxt/dt = ut(xt). Solving this ODE from an initial condition x0 traces out the trajectory xt. A key mathematical property is that if the vector field is Lipschitz continuous, the trajectory from a given initial condition is unique, ensuring predictable particle movement. This contrasts with pathological (non-Lipschitz) vector fields, for which multiple distinct trajectories can emanate from the same starting point. The vector field thus acts as a set of instructions for how data points should move through space and time.
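The ODE view can be sketched numerically: given a Lipschitz vector field, a fixed-step Euler loop traces the unique trajectory from an initial condition. The field u(x, t) = -x below is an illustrative choice (not from the lecture) with the known closed-form solution xt = x0·e^(-t), so the numerical answer can be checked.

```python
import numpy as np

def euler_solve(u, x0, n_steps=1000, t1=1.0):
    """Integrate dx/dt = u(x, t) from t=0 to t=t1 with fixed-step Euler."""
    x, dt = x0, t1 / n_steps
    for k in range(n_steps):
        x = x + dt * u(x, k * dt)
    return x

u = lambda x, t: -x          # a Lipschitz vector field (toy example)
x1 = euler_solve(u, x0=1.0)
print(x1)                     # close to exp(-1) ≈ 0.3679
```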
Conservation of mass and the continuity equation
The transport process must conserve 'mass,' meaning density should not be created or destroyed. This principle is captured by the continuity equation, which states that the temporal change in probability density at a location is equal to the net inflow of density into that location. Mathematically, this is expressed as ∂pt/∂t = -∇ ⋅ (pt ⋅ ut), where ∇ ⋅ is the divergence operator and pt ⋅ ut is the probability flux. This equation links the evolution of the probability path (macro perspective) with the vector field governing individual particle movement (micro perspective). The vector field ut(x) is said to 'generate' the probability path pt if it satisfies this continuity equation.
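The continuity equation can be verified numerically for a toy pair (an illustrative construction, not the lecture's example): the Gaussian path pt = N(t·μ, (1 - t)²) is generated by the field ut(x) = (μ - x)/(1 - t), so finite-difference estimates of ∂pt/∂t and ∇ ⋅ (pt ⋅ ut) should cancel at any point.

```python
import numpy as np

mu = 2.0  # target point (assumption for this toy example)

def p(x, t):
    """Density of p_t = N(t*mu, (1-t)^2)."""
    s = 1.0 - t
    return np.exp(-((x - t * mu) ** 2) / (2 * s**2)) / (s * np.sqrt(2 * np.pi))

def u(x, t):
    """Vector field transporting N(0, 1) toward the point mu."""
    return (mu - x) / (1.0 - t)

x, t, h = 0.5, 0.3, 1e-5
dp_dt = (p(x, t + h) - p(x, t - h)) / (2 * h)          # ∂p_t/∂t
div_flux = (p(x + h, t) * u(x + h, t)
            - p(x - h, t) * u(x - h, t)) / (2 * h)     # ∇ ⋅ (p_t u_t)
print(dp_dt + div_flux)                                 # ≈ 0
```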
Estimating the vector field: from maximum likelihood to flow matching
The goal is to learn the unknown vector field ut(x) that maps p0 to p1. Historically, continuous normalizing flows attempted this by maximizing the likelihood of the training data, but this requires integrating along trajectories and is computationally expensive. Flow matching offers a more direct approach: learn the vector field by minimizing a tractable regression loss. Instead of optimizing the likelihood of the target distribution, flow matching matches the learned vector field uθ(x) to a target vector field. The construction starts from 'conditional' probability paths, each transporting the initial distribution to a single data point, which are then aggregated into a 'marginal probability path' with an associated 'marginal vector field'.
Simplifying the loss with conditional flow matching
A significant breakthrough is the 'conditional flow matching loss.' It rests on showing that optimizing the original flow matching loss (which involves expectations over the intractable marginal probability path) is equivalent to optimizing a simpler loss based on conditional probability paths. For the linear interpolation path xt = (1 - t)·x0 + t·x1, the conditional target vector field is simply x1 - x0, where x1 is a data point and x0 a corresponding sample from the initial distribution. The learning objective becomes remarkably tractable: minimize the squared difference between the learned vector field uθ(xt, t) and (x1 - x0), averaged over sampled time steps and data pairs. This simplified loss allows efficient training with standard deep learning techniques.
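A minimal sketch of the conditional flow matching objective, assuming the linear interpolation path xt = (1 - t)·x0 + t·x1 (all names here are illustrative): as a sanity check, for a dataset collapsed to a single point μ, the field u(x, t) = (μ - x)/(1 - t) matches the conditional target x1 - x0 exactly, so it drives the loss to zero.

```python
import numpy as np

rng = np.random.default_rng(0)

def cfm_loss(model, x0, x1, t):
    """Conditional flow matching loss: || u_theta(x_t, t) - (x1 - x0) ||^2."""
    xt = (1 - t) * x0 + t * x1          # linear interpolation path
    target = x1 - x0                    # conditional vector field
    return np.mean((model(xt, t) - target) ** 2)

mu = 2.0                                # toy dataset: all mass at one point
x0 = rng.standard_normal(1000)          # noise samples from p0
x1 = np.full(1000, mu)                  # "data" samples from p1
t = rng.uniform(0, 0.99, size=1000)     # sampled time steps (away from t=1)

optimal = lambda x, t: (mu - x) / (1 - t)   # known optimum for this toy case
print(cfm_loss(optimal, x0, x1, t))         # zero up to float rounding
```

In practice, `model` would be a neural network trained by stochastic gradient descent on this loss; the zero-loss oracle here only checks that the objective is well posed.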
The 'reflow' procedure and practical considerations
At inference time, after training the vector field, one solves the ODE numerically to generate samples. However, initial trajectories might be curved, requiring many steps for accurate ODE solvers and potentially leading to inaccuracies. The 'reflow' procedure addresses this by iteratively refining the learned vector field. It uses generated trajectories to create new target pairs (x0, x1) and retrains the model. This process aims to 'straighten' the trajectories, making them more linear and thus amenable to faster inference with fewer ODE steps, trading off some potential quality for speed. The lecture also touches upon potential issues like averaging of velocities at intersection points and the limitations of standard ODE solvers when dealing with complex learned vector fields.
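The reflow mechanism can be sketched as: sample fresh x0, integrate the current model's ODE to obtain generated endpoints, then retrain on the resulting straight-line pairs. The snippet below is a hypothetical one-round sketch; as a stand-in for a trained model it uses the closed-form field u(x, t) = (μ - x)/(1 - t), which transports N(0, 1) to the point μ (an assumption made for illustration).

```python
import numpy as np

rng = np.random.default_rng(0)
mu = 2.0

def velocity(x, t):
    """Stand-in for a trained vector field (toy single-point case)."""
    return (mu - x) / (1.0 - t)

def generate_endpoint(x0, n_steps=100):
    """Integrate the learned ODE with Euler steps to map x0 to a sample."""
    x, dt = x0, 1.0 / n_steps
    for k in range(n_steps):
        x = x + dt * velocity(x, k * dt)
    return x

# Reflow, one round: build new (x0, x1_hat) pairs from the model itself...
x0 = rng.standard_normal(100)
x1_hat = generate_endpoint(x0)
# ...then retrain with the CFM loss on these pairs, so the new target
# velocity (x1_hat - x0) follows the model's own generated couplings.
print(np.max(np.abs(x1_hat - mu)))    # endpoints reach mu for this toy field
```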
Unifying generative frameworks
The lecture concludes by tying together diffusion, score matching, and flow matching. While distinct in approach (noise prediction, score estimation, velocity learning), all three transform a simple initial distribution into a complex target data distribution; the key difference lies in what is modeled: noise in diffusion, score in score matching, and velocity in flow matching. A unifying theory such as 'stochastic interpolants' shows that these components are deeply related: knowing two of them allows deduction of the third, indicating a shared underlying mathematical structure across these generative modeling techniques.
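The 'two determine the third' claim can be illustrated on the conditional quantities of the linear interpolant xt = (1 - t)·ε + t·x1 with Gaussian noise ε at time 0 (a sketch under these assumptions, not the lecture's derivation). Conditionally on x1, xt ~ N(t·x1, (1 - t)²), so the conditional score is s = -ε/(1 - t) and the conditional velocity is v = x1 - ε; the identities below recover v and ε from (xt, s) alone.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
eps = rng.standard_normal(n)           # noise at t = 0
x1 = rng.standard_normal(n) + 3.0      # "data" samples (toy)
t = rng.uniform(0.1, 0.9, size=n)      # times away from the endpoints

xt = (1 - t) * eps + t * x1            # linear interpolant
v = x1 - eps                           # conditional velocity (x1 - x0)
s = -eps / (1 - t)                     # conditional score of N(t*x1, (1-t)^2)

# Knowing (xt, score) determines both the noise and the velocity:
eps_from_score = -(1 - t) * s          # noise deduced from the score
v_from_score = (xt + (1 - t) * s) / t  # velocity deduced from x_t and score

print(np.allclose(v_from_score, v), np.allclose(eps_from_score, eps))
```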
Common Questions
What is flow matching?
Flow matching is a generative paradigm focused on transporting an initial data distribution (e.g., Gaussian noise) to a target data distribution (e.g., real images) by learning a vector field. It is a third approach alongside diffusion (DDPM) and score matching, aiming for a direct, deterministic transport path.
Topics Mentioned in This Video
●Score matching: A generation paradigm covered in Lecture 2, focusing on learning the score (the gradient of the log probability) to guide sampling toward high-density regions.
●Flow matching: The primary generation paradigm discussed in this lecture, which aims to transport an initial data distribution to a target distribution by learning a vector field.
●Mean squared error (MSE) loss: A loss function used in both DDPM and score matching, measuring the squared difference between predicted and actual values (e.g., noise or score).
●Langevin dynamics: A sampling technique mentioned in the score matching context, used to move toward high-density regions while allowing for diversity through a noise term.
●Stochastic differential equation (SDE): An equation that provides a continuous view linking diffusion and score matching, composed of a drift term (deterministic movement) and a diffusion term (stochasticity).
●Continuity equation: A concept from physics that expresses conservation of mass, asserting that the temporal evolution of density equals the net inflow minus outflow of density in a region.
●Divergence operator: An operator that measures the net outflow of a vector field from an infinitesimal volume, used in the continuity equation to quantify density changes.
●Continuous normalizing flows: An earlier method for learning vector fields by maximizing likelihood, found to be computationally expensive and slow.
●Lipschitz continuity: A mathematical property of functions ensuring that trajectories generated by a vector field are unique, and crucial for a one-to-one mapping between velocity and flow in flow matching models.
●Neural network: The model behind the learned vector field; when composed of matrix multiplications and smooth activation functions, it naturally yields a Lipschitz continuous field.
●Reflow: A retraining or fine-tuning procedure in flow matching that iteratively 'straightens' the learned trajectories by using newly generated pairs, leading to faster inference.
●Dirac delta distribution: A deterministic probability distribution with all probability density concentrated at a single point, used to simplify the problem of transporting to a single point.
●'Flow Matching for Generative Modeling' (Lipman et al.): The foundational paper for the flow matching method, which the speaker highly recommends reading for deeper understanding.
●'Stochastic Interpolants' (Albergo et al.): A paper recommended for students interested in a unified theory linking diffusion, score matching, and flow matching by showing that knowledge of any two of noise, score, or velocity allows deduction of the third.
●Diffusion-specific ODE solver: A solver mentioned in the context of diffusion models that is not applicable to flow matching, because flow matching's ODEs are in general nonlinear.
●Numerical ODE solver (e.g., Euler's method): Used in practice to integrate ordinary differential equations for mapping initial data to target data in flow matching.