HiRT: Enhancing Robotic Control with Hierarchical Robot Transformers. arXiv preprint arXiv:2410.05273

Jianke Zhang, Yanjiang Guo, Xiaoyu Chen, Yen-Jen Wang, Yucheng Hu, Chengming Shi, Jianyu Chen · 2024 · arXiv 2410.05273

7 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background: 1 · other: 1

citation-polarity summary

background: 1 · unclear: 1

representative citing papers

Overcoming Dynamics-Blindness: Training-Free Pace-and-Path Correction for VLA Models

cs.RO · 2026-05-12 · unverdicted · novelty 7.0 · 2 refs

Pace-and-Path Correction decomposes a quadratic cost minimization into orthogonal pace and path channels to correct chunked actions in VLA models, raising success rates by up to 28.8% in dynamic settings.

ROSA: A Robotics Foundation Model Serving System for Robot Factories

cs.RO · 2026-07-01 · unverdicted · novelty 6.0

ROSA introduces shared GPU-pool serving, robotics-aware abstractions for multi-model pipelines, and factory-productivity scheduling that improves output by up to 12.06x over dedicated per-robot systems.

ROAD-VLA: Robust Online Adaptation via Self-Distillation for Vision-Language-Action Models

cs.LG · 2026-06-24 · unverdicted · novelty 6.0

ROAD-VLA constructs an advantage-perturbed proximal teacher in action space to convert sparse rewards into dense supervision for online VLA adaptation and reports outperformance versus PPO across seven manipulation environments.

UniFS: Unified Fast-to-Slow Hierarchical Architecture for Vision-Language-Action Models

cs.RO · 2026-06-22 · unverdicted · novelty 6.0

UniFS achieves 98.3% success on LIBERO with 2.1x lower latency than prior fast-slow VLA models by stratifying VLM layer update frequencies, inverting latent interactions, and applying multi-level supervision.

Veo-Act: How Far Can Frontier Video Models Advance Generalizable Robot Manipulation?

cs.RO · 2026-04-06 · unverdicted · novelty 6.0

Veo-3 video predictions enable approximate task-level robot trajectories in zero-shot settings but require hierarchical integration with low-level VLA policies for reliable manipulation performance.

Ctrl-World: A Controllable Generative World Model for Robot Manipulation

cs.RO · 2025-10-11 · unverdicted · novelty 6.0

A controllable world model trained on the DROID dataset generates consistent multi-view robot trajectories for over 20 seconds and improves generalist policy success rates by 44.7% via imagined trajectory fine-tuning.

Video Prediction Policy: A Generalist Robot Policy with Predictive Visual Representations

cs.CV · 2024-12-19 · unverdicted · novelty 6.0

Video Prediction Policy conditions robot action learning on future-frame predictions inside fine-tuned video diffusion models, yielding 18.6% relative gains on Calvin ABC-D and 31.6% higher real-world success rates.

citing papers explorer

Overcoming Dynamics-Blindness: Training-Free Pace-and-Path Correction for VLA Models · cs.RO · 2026-05-12 · unverdicted · none · ref 51 · 2 links
ROSA: A Robotics Foundation Model Serving System for Robot Factories · cs.RO · 2026-07-01 · unverdicted · none · ref 49
ROAD-VLA: Robust Online Adaptation via Self-Distillation for Vision-Language-Action Models · cs.LG · 2026-06-24 · unverdicted · none · ref 29
UniFS: Unified Fast-to-Slow Hierarchical Architecture for Vision-Language-Action Models · cs.RO · 2026-06-22 · unverdicted · none · ref 51
Veo-Act: How Far Can Frontier Video Models Advance Generalizable Robot Manipulation? · cs.RO · 2026-04-06 · unverdicted · none · ref 44
Ctrl-World: A Controllable Generative World Model for Robot Manipulation · cs.RO · 2025-10-11 · unverdicted · none · ref 47
Video Prediction Policy: A Generalist Robot Policy with Predictive Visual Representations · cs.CV · 2024-12-19 · unverdicted · none · ref 43
