PAINT reframes asynchronous flow-based action chunking as an initial noise selection problem solved via backward Euler inversion and a repainting rule.
Bidirectional decoding: Improving action chunking via closed-loop resampling
11 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
fields
cs.RO 11representative citing papers
Discrete diffusion policies act as natural asynchronous executors for robotics by treating action generation as iterative unmasking, yielding higher success rates and lower computation than flow-matching real-time chunking in dynamic tasks.
VLA-Corrector adds a detect-and-correct inference layer using a latent vision monitor and online gradient guidance to enable adaptive action horizons in chunked VLA policies.
AutoSpeed optimizes visuomotor policies over candidate trajectories at varying speeds using a composite cost of prediction error versus horizon length, with DCT-based modulation, yielding shorter execution times and higher success rates while producing speeds that align with task stages.
ReStruct steers robot policies at inference time by reconfiguring task structure with neural automata and synchronous products, claiming up to 25% gains over VLA models in success and preference adherence.
vla.cpp is a unified C++ runtime that serves multiple VLA architectures with flow-matching and diffusion patterns, matching SOTA performance on LIBERO while running on low-memory embedded hardware.
Legato trains flow-based VLA policies with schedule-shaped action-noise mixtures and randomized conditions to achieve smoother trajectories and ~10% faster task completion than real-time chunking across five real-world manipulation tasks.
Real-time chunking (RTC) allows diffusion- and flow-based action chunking policies to execute smoothly and asynchronously, maintaining high success rates on dynamic tasks even with significant inference latency.
OpenVLA-OFT fine-tuning boosts LIBERO success rate from 76.5% to 97.1%, speeds action generation 26x, and outperforms baselines on real bimanual dexterous tasks.
POTR augments RTC guidance for flow-matching policies by adding a data-prior scale to the weight schedule and constraining the perpendicular component of the guidance vector within a trust region, yielding smoother actions and higher success rates on LIBERO.
DyGRO-VLA is a two-stage optimization framework for cross-task scaling of Vision-Language-Action models via dynamic grouped residual optimization in RL.
citing papers explorer
-
Start Right, Arrive Right: Asynchronous Execution via Initial Noise Selection
PAINT reframes asynchronous flow-based action chunking as an initial noise selection problem solved via backward Euler inversion and a repainting rule.
-
DiscreteRTC: Discrete Diffusion Policies are Natural Asynchronous Executors
Discrete diffusion policies act as natural asynchronous executors for robotics by treating action generation as iterative unmasking, yielding higher success rates and lower computation than flow-matching real-time chunking in dynamic tasks.
-
VLA-Corrector: Lightweight Detect-and-Correct Inference for Adaptive Action Horizon
VLA-Corrector adds a detect-and-correct inference layer using a latent vision monitor and online gradient guidance to enable adaptive action horizons in chunked VLA policies.
-
AutoSpeed: Annotation-Free Stage-Adaptive Motion Speed Learning for Robot Manipulation
AutoSpeed optimizes visuomotor policies over candidate trajectories at varying speeds using a composite cost of prediction error versus horizon length, with DCT-based modulation, yielding shorter execution times and higher success rates while producing speeds that align with task stages.
-
Inference-Time Robot Behavior Steering through Physically-Aware Reconfiguration of Task-Structure
ReStruct steers robot policies at inference time by reconfiguring task structure with neural automata and synchronous products, claiming up to 25% gains over VLA models in success and preference adherence.
-
vla.cpp: A Unified Inference Runtime for Vision-Language-Action Models
vla.cpp is a unified C++ runtime that serves multiple VLA architectures with flow-matching and diffusion patterns, matching SOTA performance on LIBERO while running on low-memory embedded hardware.
-
Learning Native Continuation for Action Chunking Flow Policies
Legato trains flow-based VLA policies with schedule-shaped action-noise mixtures and randomized conditions to achieve smoother trajectories and ~10% faster task completion than real-time chunking across five real-world manipulation tasks.
-
Real-Time Execution of Action Chunking Flow Policies
Real-time chunking (RTC) allows diffusion- and flow-based action chunking policies to execute smoothly and asynchronously, maintaining high success rates on dynamic tasks even with significant inference latency.
-
Fine-Tuning Vision-Language-Action Models: Optimizing Speed and Success
OpenVLA-OFT fine-tuning boosts LIBERO success rate from 76.5% to 97.1%, speeds action generation 26x, and outperforms baselines on real bimanual dexterous tasks.
-
Smoother Action Chunking Flow Policy via Prior-Corrected Orthogonal Trust-Region Guidance
POTR augments RTC guidance for flow-matching policies by adding a data-prior scale to the weight schedule and constraining the perpendicular component of the guidance vector within a trust region, yielding smoother actions and higher success rates on LIBERO.
-
DyGRO-VLA: Cross-Task Scaling of Vision-Language-Action Models via Dynamic Grouped Residual Optimization
DyGRO-VLA is a two-stage optimization framework for cross-task scaling of Vision-Language-Action models via dynamic grouped residual optimization in RL.