Lip Forcing distills a 14B bidirectional video diffusion teacher into autoregressive students that achieve real-time lip synchronization at 31 FPS using two denoising steps without CFG.
Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow
Mixed citation behavior. Most common role is background (53%).
abstract
We present rectified flow, a surprisingly simple approach to learning (neural) ordinary differential equation (ODE) models to transport between two empirically observed distributions \pi_0 and \pi_1, hence providing a unified solution to generative modeling and domain transfer, among various other tasks involving distribution transport. The idea of rectified flow is to learn the ODE to follow the straight paths connecting the points drawn from \pi_0 and \pi_1 as much as possible. This is achieved by solving a straightforward nonlinear least squares optimization problem, which can be easily scaled to large models without introducing extra parameters beyond standard supervised learning. The straight paths are special and preferred because they are the shortest paths between two points, and can be simulated exactly without time discretization and hence yield computationally efficient models. We show that the procedure of learning a rectified flow from data, called rectification, turns an arbitrary coupling of \pi_0 and \pi_1 to a new deterministic coupling with provably non-increasing convex transport costs. In addition, recursively applying rectification allows us to obtain a sequence of flows with increasingly straight paths, which can be simulated accurately with coarse time discretization in the inference phase. In empirical studies, we show that rectified flow performs superbly on image generation, image-to-image translation, and domain adaptation. In particular, on image generation and translation, our method yields nearly straight flows that give high quality results even with a single Euler discretization step.
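The objective described in the abstract can be sketched on toy data. The example below is a minimal illustration, not the authors' implementation: it swaps the neural network for a linear model on the hypothetical features [x, t, 1], so the least-squares problem has a closed-form solution via NumPy. The function names (`features`, `fit_velocity`, `euler_sample`) and the toy coupling are assumptions made for this illustration. The coupling is a pure translation, so the paths are already straight, and a single Euler step recovers the targets up to floating-point error.

```python
import numpy as np

def features(x, t):
    # Hypothetical feature map [x, t, 1]; a neural network would replace this.
    return np.concatenate([x, t, np.ones_like(t)], axis=1)

def fit_velocity(x0, x1, rng):
    # Sample one time per pair and form the straight-line interpolant
    # X_t = t * X_1 + (1 - t) * X_0.
    t = rng.uniform(size=(x0.shape[0], 1))
    xt = t * x1 + (1.0 - t) * x0
    # Regress the straight-path direction X_1 - X_0 on (X_t, t) by least squares.
    target = x1 - x0
    weights, *_ = np.linalg.lstsq(features(xt, t), target, rcond=None)
    return weights

def velocity(weights, x, t):
    return features(x, t) @ weights

def euler_sample(weights, z0, num_steps):
    # Integrate dZ_t = v(Z_t, t) dt from t = 0 to t = 1 with forward Euler.
    z = z0.copy()
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = np.full((z.shape[0], 1), k * dt)
        z = z + dt * velocity(weights, z, t)
    return z

rng = np.random.default_rng(0)
x0 = rng.normal(size=(2000, 2))   # samples standing in for pi_0
shift = np.array([3.0, -1.0])     # toy assumption: pi_1 is a translate of pi_0
x1 = x0 + shift                   # deterministic coupling with straight paths

weights = fit_velocity(x0, x1, rng)
one_step = euler_sample(weights, x0, num_steps=1)
max_error = np.abs(one_step - x1).max()
print("max one-step error:", max_error)
```

For a coupling whose paths cross, such as independent draws from \pi_0 and \pi_1, the fitted velocity is in general not constant along trajectories, so more Euler steps or a further round of rectification would be needed.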
representative citing papers
WavTTS is the first raw-waveform diffusion TTS model using DiT flow matching and multi-scale mel supervision that approaches SOTA latent zero-shot performance while beating prior end-to-end models.
AnyFlow enables any-step video diffusion by distilling flow-map transitions over arbitrary time intervals with on-policy backward simulation.
Data geometry makes time identifiable from noisy interpolants at rate O(1/sqrt(d-k)), rendering the time-blindness gap asymptotically negligible relative to coupling variance.
Flux Matching generalizes score-based generative modeling by using a weaker objective that admits infinitely many non-conservative vector fields with the data as stationary distribution, enabling new design choices beyond traditional score matching.
Derives closed-form posterior covariance for flow matching from divergence of velocity field, enabling post-hoc uncertainty on pre-trained models including one-step generators.
FMRG reformulates guidance as deterministic optimal control, deriving a single-trajectory method using the flow map that matches or exceeds baselines on reward-guided generation and inverse problems with 3 NFEs at text-to-image scale.
ReConText3D is the first replay-memory framework for continual text-to-3D generation that prevents catastrophic forgetting on new textual categories while preserving quality on previously seen classes.
OP-GRPO is the first off-policy GRPO method for flow-matching models that reuses trajectories via replay buffer and importance sampling corrections, matching on-policy performance with 34.2% of the training steps.
Flow-GRPO is the first online RL method for flow matching models, raising GenEval accuracy from 63% to 95% and text-rendering accuracy from 59% to 92% with little reward hacking.
Consistency models achieve fast one-step generation with SOTA FID of 3.55 on CIFAR-10 and 6.20 on ImageNet 64x64 by directly mapping noise to data, outperforming prior distillation techniques.
Normalizing flows are constructed by learning the velocity of a stochastic interpolant via a quadratic loss derived from its probability current, yielding an efficient ODE-based alternative to diffusion models.
Introduces a Bridge latent interface that maps mismatched student latents into teacher space, enabling distillation from modern diffusion teachers to compact one-step students and raising SD 1.5 HPSv3 from 5.4 to 9.4 while keeping one-step speed.
FlexiSLM is the first spoken language model supporting dynamic and controllable frame rates on speech input and output, outperforming fixed-rate 7B models at high quality and enabling faster inference at lower rates like 6.25 Hz.
Panel Flow Matching is a generative method to estimate panel densities from longitudinal data with statistical guarantees under irregular sampling, supporting completion, synthetic data, and classification.
MammoFlow adds geometric alignment and EMD tissue-distribution consistency to a pretrained flow-matching model to generate anatomically paired mammograms, reporting superior quality and a 5% downstream AUC gain.
TempAct introduces a planner-executor RL framework with hierarchical group exploration and rewards to improve temporal consistency in autoregressive video diffusion models.
PRA approximates sequential rollout training in parallel for pixel-space AR models via intermediate states and a pixel decoder, achieving FID 2.58 (135M params) and 1.94 (511M params) on ImageNet-1K 256x256, new SOTA among pixel-space AR models.
PolyFlow converts discrete meshes to continuous per-vertex representations using a topology embedder and applies flow matching for parallel artist-style mesh generation that outperforms autoregressive baselines on Toys4K in Chamfer and Hausdorff distances.
SharpMoE is a plug-and-play post-training method that uses clean latent features and a trajectory routing loss to enable accurate saliency-based routing in diffusion MoE models for improved visual generation.
OTF-CBM replaces static cosine similarity in vision-language CBMs with data-driven optimal transport flow to improve concept alignment, accuracy, and faithfulness.
FAPS is a new function-space posterior sampling method built on flow-matching priors that unifies stochastic-process regression and PDE inverse problems while avoiding explicit prior density evaluation.
CoDMD adds a copula-matching regularizer to DMD for distilling 50-step video diffusion models to 4 steps, reporting VBench scores of 84.46/84.87 on 1.3B/14B Wan-2.1-T2V models.
IFM learns deterministic tangent velocity fields on CP^{d-1} via Pancharatnam phase-aligned paths, recovering marginal transport with endpoint and stability guarantees while showing empirical gains over Euclidean flow matching on quantum benchmarks.
citing papers explorer
- Flow Annealing Posterior Sampling for Function-Space Regression and Inverse Problems
  FAPS is a new function-space posterior sampling method built on flow-matching priors that unifies stochastic-process regression and PDE inverse problems while avoiding explicit prior density evaluation.
- Training-Free Generative Sampling via Moment-Matched Score Smoothing
  MM-SOLD is a training-free particle sampler whose large-particle limit converges to a moment-matched Gibbs distribution obtained by exponentially tilting a score-smoothed target.
- Is Flow Matching Just Trajectory Replay for Sequential Data?
  Flow matching on time series targets a closed-form nonparametric velocity field that is a similarity-weighted mixture of observed transition velocities, making neural models approximations to an ideal memory-augmented dynamical system sampler.
- Flow-Based Conformal Predictive Distributions
  Differentiable nonconformity scores induce flows that sample conformal prediction set boundaries, and mixing flows across levels produces conformal predictive distributions whose quantiles match the sets.
- On The Hidden Biases of Flow Matching Samplers
  Empirical flow matching introduces coupled biases from plug-in estimation, including altered statistical targets, non-gradient minimizers, and non-unique dynamics via flux-null fields, with base distribution controlling kinetic energy tails.
- SURGE: Approximation and Training Free Particle Filter for Diffusion Surrogate
  SURGE is an unbiased particle filter that fuses diffusion-model simulations with noisy observations via sequential Monte Carlo reweighting over diffusion trajectories.
- Simple Approximation and Derivative Free Inference-Time Scaling for Diffusion Models via Sequential Monte Carlo on Path Measures
  URGE performs unbiased inference-time scaling for diffusion models by attaching multiplicative path weights from Girsanov estimation and resampling trajectories, with a proven equivalence to prior particle-wise SMC schemes.
- Conditional flow matching for physics-constrained inverse problems with finite training data
  Conditional flow matching learns a velocity field to sample from measurement-conditioned posteriors in physics inverse problems, with early stopping to prevent variance collapse and selective memorization under finite training data.
- Notes on generative modeling: flow matching, diffusion, optimal transport and Schrödinger bridge
  Notes recapitulating high-level principles of generative modeling and showing connections between optimal transport, Schrödinger bridge, and flow matching.