Stanford CME296 Diffusion & Large Vision Models | Spring 2026 | Lecture 3 - Flow matching
Key Moments
Flow matching offers a new way to generate data by learning to 'transport' an initial distribution to a target one, bypassing the expensive likelihood computations of continuous normalizing flows.
Key Insights
Flow matching transports data distributions by learning a time-dependent vector field, which dictates particle movement between an initial (e.g., Gaussian) and target (data) distribution.
The core idea of flow matching is to directly learn this vector field, unlike diffusion models that learn to reverse a noise-adding process or score-based models that learn the score function (gradient of log-density).
A key simplification in flow matching is the 'conditional flow matching loss,' which is computationally tractable and simplifies the learning objective to matching the learned vector field with a known, simple 'conditional vector field' (x1 - x0).
The 'reflow' procedure can be used to refine the learned vector field and make trajectories straighter, improving inference speed by allowing fewer ODE solver steps.
Flow matching, diffusion, and score matching can be unified under the broader framework of stochastic interpolants, where knowing two of the three components (noise, score, velocity) allows deduction of the third.
Bridging discrete and continuous generation paradigms
This lecture introduces flow matching as a novel generation paradigm, building upon previous discussions of diffusion (DDPM) and score matching. While diffusion models learn to reverse a gradual noising process and score-based models learn the gradient of the log-probability density (the 'score'), flow matching focuses on 'transporting' data from an initial, easy-to-sample distribution to a target data distribution. The core problem remains generating new samples by moving from a simple distribution to a complex one.
Understanding trajectories, flows, and probability paths
Flow matching operates by defining a time interval, typically [0, 1], where time 0 represents the initial distribution (e.g., Gaussian noise, p0) and time 1 represents the target data distribution (p1). A 'trajectory' (xt) describes the path of an observation between these times. The 'flow' (st(x0)) is a function mapping an initial condition to its position at time t. The 'probability path' (pt(x)) represents the distribution of observations at any given time t within this interval. A crucial element is the 'vector field' (ut(x)), which provides the direction and speed for particles to move at a given location and time, essentially guiding the transport process.
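As a concrete one-dimensional illustration (a toy sketch, not taken from the lecture), consider the simple flow st(x0) = (1 - t)·x0 + t·μ, which transports a standard Gaussian at t = 0 toward the single point μ at t = 1. Sampling many initial conditions and pushing them through the flow at a fixed t gives samples from the probability path pt = N(t·μ, (1 - t)²); the value μ = 2.0 is an assumption for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
mu = 2.0  # target point (assumption for this toy example)

def flow(x0, t):
    """Flow s_t(x0): position at time t of the particle that started at x0."""
    return (1 - t) * x0 + t * mu

# Sample the probability path p_t by pushing initial samples through the flow.
x0 = rng.standard_normal(100_000)   # p0 = N(0, 1)
t = 0.3
xt = flow(x0, t)                    # samples from p_t = N(t*mu, (1-t)^2)

print(xt.mean(), xt.std())          # approx t*mu = 0.6 and (1-t) = 0.7
```

Each sampled x0 traces a trajectory t ↦ flow(x0, t); freezing t and varying x0 instead gives the distribution pt, which is the macro-level view of the same transport.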
The role of vector fields and ODEs
The vector field is central to flow matching. It defines an ordinary differential equation (ODE): dxt/dt = ut(xt). Solving this ODE from an initial condition x0 traces out the trajectory xt. A key mathematical property is that if the vector field is Lipschitz continuous, the trajectory from a given initial condition is unique, ensuring predictable particle movement. This contrasts with pathological (non-Lipschitz) vector fields, for which multiple distinct trajectories can emanate from the same starting point. The vector field thus acts as a set of instructions for how data points should move through space and time.
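The ODE view can be sketched numerically: given a Lipschitz vector field, a fixed-step Euler loop traces the unique trajectory from an initial condition. The field u(x, t) = -x below is an illustrative choice (not from the lecture) with the known closed-form solution xt = x0·e^(-t), so the numerical answer can be checked.

```python
import numpy as np

def euler_solve(u, x0, n_steps=1000, t1=1.0):
    """Integrate dx/dt = u(x, t) from t=0 to t=t1 with fixed-step Euler."""
    x, dt = x0, t1 / n_steps
    for k in range(n_steps):
        x = x + dt * u(x, k * dt)
    return x

u = lambda x, t: -x          # a Lipschitz vector field (toy example)
x1 = euler_solve(u, x0=1.0)
print(x1)                     # close to exp(-1) ≈ 0.3679
```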
Conservation of mass and the continuity equation
The transport process must conserve 'mass,' meaning density should not be created or destroyed. This principle is captured by the continuity equation, which states that the temporal change in probability density at a location is equal to the net inflow of density into that location. Mathematically, this is expressed as ∂pt/∂t = -∇ ⋅ (pt ⋅ ut), where ∇ ⋅ is the divergence operator and pt ⋅ ut is the probability flux. This equation links the evolution of the probability path (macro perspective) with the vector field governing individual particle movement (micro perspective). The vector field ut(x) is said to 'generate' the probability path pt if it satisfies this continuity equation.
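The continuity equation can be verified numerically for a toy pair (an illustrative construction, not the lecture's example): the Gaussian path pt = N(t·μ, (1 - t)²) is generated by the field ut(x) = (μ - x)/(1 - t), so finite-difference estimates of ∂pt/∂t and ∇ ⋅ (pt ⋅ ut) should cancel at any point.

```python
import numpy as np

mu = 2.0  # target point (assumption for this toy example)

def p(x, t):
    """Density of p_t = N(t*mu, (1-t)^2)."""
    s = 1.0 - t
    return np.exp(-((x - t * mu) ** 2) / (2 * s**2)) / (s * np.sqrt(2 * np.pi))

def u(x, t):
    """Vector field transporting N(0, 1) toward the point mu."""
    return (mu - x) / (1.0 - t)

x, t, h = 0.5, 0.3, 1e-5
dp_dt = (p(x, t + h) - p(x, t - h)) / (2 * h)          # ∂p_t/∂t
div_flux = (p(x + h, t) * u(x + h, t)
            - p(x - h, t) * u(x - h, t)) / (2 * h)     # ∇ ⋅ (p_t u_t)
print(dp_dt + div_flux)                                 # ≈ 0
```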
Estimating the vector field: from maximum likelihood to flow matching
The goal is to learn the unknown vector field ut(x) that maps p0 to p1. Historically, continuous normalizing flows attempted this by maximizing the likelihood of the training data, but this requires integrating along trajectories and is computationally expensive. Flow matching offers a more direct approach: learn the vector field by minimizing a tractable regression loss. Instead of optimizing the likelihood of the target distribution, flow matching matches the learned vector field uθ(x) to a target vector field. The construction starts from 'conditional' probability paths, each transporting the initial distribution to a single data point, which are then aggregated into a 'marginal probability path' with an associated 'marginal vector field'.
Simplifying the loss with conditional flow matching
A significant breakthrough is the 'conditional flow matching loss.' It rests on showing that optimizing the original flow matching loss (which involves expectations over the intractable marginal probability path) is equivalent to optimizing a simpler loss based on conditional probability paths. For the linear interpolation path xt = (1 - t)·x0 + t·x1, the conditional target vector field is simply x1 - x0, where x1 is a data point and x0 a corresponding sample from the initial distribution. The learning objective becomes remarkably tractable: minimize the squared difference between the learned vector field uθ(xt, t) and (x1 - x0), averaged over sampled time steps and data pairs. This simplified loss allows efficient training with standard deep learning techniques.
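A minimal sketch of the conditional flow matching objective, assuming the linear interpolation path xt = (1 - t)·x0 + t·x1 (all names here are illustrative): as a sanity check, for a dataset collapsed to a single point μ, the field u(x, t) = (μ - x)/(1 - t) matches the conditional target x1 - x0 exactly, so it drives the loss to zero.

```python
import numpy as np

rng = np.random.default_rng(0)

def cfm_loss(model, x0, x1, t):
    """Conditional flow matching loss: || u_theta(x_t, t) - (x1 - x0) ||^2."""
    xt = (1 - t) * x0 + t * x1          # linear interpolation path
    target = x1 - x0                    # conditional vector field
    return np.mean((model(xt, t) - target) ** 2)

mu = 2.0                                # toy dataset: all mass at one point
x0 = rng.standard_normal(1000)          # noise samples from p0
x1 = np.full(1000, mu)                  # "data" samples from p1
t = rng.uniform(0, 0.99, size=1000)     # sampled time steps (away from t=1)

optimal = lambda x, t: (mu - x) / (1 - t)   # known optimum for this toy case
print(cfm_loss(optimal, x0, x1, t))         # zero up to float rounding
```

In practice, `model` would be a neural network trained by stochastic gradient descent on this loss; the zero-loss oracle here only checks that the objective is well posed.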
The 'reflow' procedure and practical considerations
At inference time, after training the vector field, one solves the ODE numerically to generate samples. However, initial trajectories might be curved, requiring many steps for accurate ODE solvers and potentially leading to inaccuracies. The 'reflow' procedure addresses this by iteratively refining the learned vector field. It uses generated trajectories to create new target pairs (x0, x1) and retrains the model. This process aims to 'straighten' the trajectories, making them more linear and thus amenable to faster inference with fewer ODE steps, trading off some potential quality for speed. The lecture also touches upon potential issues like averaging of velocities at intersection points and the limitations of standard ODE solvers when dealing with complex learned vector fields.
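The reflow mechanism can be sketched as: sample fresh x0, integrate the current model's ODE to obtain generated endpoints, then retrain on the resulting straight-line pairs. The snippet below is a hypothetical one-round sketch; as a stand-in for a trained model it uses the closed-form field u(x, t) = (μ - x)/(1 - t), which transports N(0, 1) to the point μ (an assumption made for illustration).

```python
import numpy as np

rng = np.random.default_rng(0)
mu = 2.0

def velocity(x, t):
    """Stand-in for a trained vector field (toy single-point case)."""
    return (mu - x) / (1.0 - t)

def generate_endpoint(x0, n_steps=100):
    """Integrate the learned ODE with Euler steps to map x0 to a sample."""
    x, dt = x0, 1.0 / n_steps
    for k in range(n_steps):
        x = x + dt * velocity(x, k * dt)
    return x

# Reflow, one round: build new (x0, x1_hat) pairs from the model itself...
x0 = rng.standard_normal(100)
x1_hat = generate_endpoint(x0)
# ...then retrain with the CFM loss on these pairs, so the new target
# velocity (x1_hat - x0) follows the model's own generated couplings.
print(np.max(np.abs(x1_hat - mu)))    # endpoints reach mu for this toy field
```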
Unifying generative frameworks
The lecture concludes by tying together diffusion, score matching, and flow matching. While distinct in approach (noise prediction, score estimation, velocity learning), all three transform a simple initial distribution into a complex target data distribution; the key difference lies in what is modeled: noise in diffusion, score in score matching, and velocity in flow matching. A unifying theory such as 'stochastic interpolants' shows that these components are deeply related: knowing two of them allows deduction of the third, indicating a shared underlying mathematical structure across these generative modeling techniques.
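The 'two determine the third' claim can be illustrated on the conditional quantities of the linear interpolant xt = (1 - t)·ε + t·x1 with Gaussian noise ε at time 0 (a sketch under these assumptions, not the lecture's derivation). Conditionally on x1, xt ~ N(t·x1, (1 - t)²), so the conditional score is s = -ε/(1 - t) and the conditional velocity is v = x1 - ε; the identities below recover v and ε from (xt, s) alone.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
eps = rng.standard_normal(n)           # noise at t = 0
x1 = rng.standard_normal(n) + 3.0      # "data" samples (toy)
t = rng.uniform(0.1, 0.9, size=n)      # times away from the endpoints

xt = (1 - t) * eps + t * x1            # linear interpolant
v = x1 - eps                           # conditional velocity (x1 - x0)
s = -eps / (1 - t)                     # conditional score of N(t*x1, (1-t)^2)

# Knowing (xt, score) determines both the noise and the velocity:
eps_from_score = -(1 - t) * s          # noise deduced from the score
v_from_score = (xt + (1 - t) * s) / t  # velocity deduced from x_t and score

print(np.allclose(v_from_score, v), np.allclose(eps_from_score, eps))
```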
Common Questions
What is flow matching?
Flow matching is a generative paradigm focused on transporting an initial data distribution (e.g., Gaussian noise) to a target data distribution (e.g., real images) by learning a vector field. It is a third approach alongside diffusion (DDPM) and score matching, aiming for a direct, deterministic transport path.
Topics Mentioned in This Video
●Score matching: A generation paradigm covered in Lecture 2, focusing on learning the score (the gradient of the log probability) to guide sampling toward high-density regions.
●Flow matching: The primary generation paradigm discussed in this lecture, which aims to transport an initial data distribution to a target distribution by learning a vector field.
●Mean squared error (MSE) loss: A loss function used in both DDPM and score matching, measuring the squared difference between predicted and actual values (e.g., noise or score).
●Langevin dynamics: A sampling technique mentioned in the score matching context, used to move toward high-density regions while allowing for diversity through a noise term.
●Stochastic differential equation (SDE): An equation that provides a continuous view linking diffusion and score matching, composed of a drift term (deterministic movement) and a diffusion term (stochasticity).
●Continuity equation: A concept from physics that expresses conservation of mass, asserting that the temporal evolution of density equals the net inflow minus outflow of density in a region.
●Divergence operator: An operator that measures the net outflow of a vector field from an infinitesimal volume, used in the continuity equation to quantify density changes.
●Continuous normalizing flows: An earlier method for learning vector fields by maximizing likelihood, found to be computationally expensive and slow.
●Lipschitz continuity: A mathematical property of functions ensuring that trajectories generated by a vector field are unique, and crucial for a one-to-one mapping between velocity and flow in flow matching models.
●Neural network: The model behind the learned vector field; when composed of matrix multiplications and smooth activation functions, it naturally yields a Lipschitz continuous field.
●Reflow: A retraining or fine-tuning procedure in flow matching that iteratively 'straightens' the learned trajectories by using newly generated pairs, leading to faster inference.
●Dirac delta distribution: A deterministic probability distribution with all probability density concentrated at a single point, used to simplify the problem of transporting to a single point.
●'Flow Matching for Generative Modeling' (Lipman et al.): The foundational paper for the flow matching method, which the speaker highly recommends reading for deeper understanding.
●'Stochastic Interpolants' (Albergo et al.): A paper recommended for students interested in a unified theory linking diffusion, score matching, and flow matching by showing that knowledge of any two of noise, score, or velocity allows deduction of the third.
●Diffusion-specific ODE solver: A solver mentioned in the context of diffusion models that is not applicable to flow matching, because flow matching's ODEs are in general nonlinear.
●Numerical ODE solver (e.g., Euler's method): Used in practice to integrate ordinary differential equations for mapping initial data to target data in flow matching.