World-in-world: World models in a closed-loop world

Zhang, J · 2025 · arXiv 2510.18135

18 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 4

citation-polarity summary

background 4

representative citing papers

ScratchWorld: Evaluating If World Models Compute Executable Consequences

cs.SE · 2026-06-30 · unverdicted · novelty 7.0

The ScratchWorld benchmark finds that language models achieve at most 13.8% value-aware changed-field F1 on replay-verified Scratch state transitions and frequently ignore executable rules.

Pondering the Way: Spatial-perceiving World Action Model for Embodied Navigation

cs.RO · 2026-06-29 · unverdicted · novelty 7.0

SWAM jointly generates intermediate RGB-D sequences and action trajectories from monocular RGB start/goal observations for embodied navigation.

MemoBench: Benchmarking World Modeling in Dynamically Changing Environments

cs.CV · 2026-06-25 · unverdicted · novelty 7.0 · 4 refs

MemoBench is a new diagnostic benchmark with automated and VQA metrics that evaluates memory consistency in video models when objects disappear and reappear in dynamic environments.

WBench: A Comprehensive Multi-turn Benchmark for Interactive Video World Model Evaluation

cs.CV · 2026-05-25 · unverdicted · novelty 7.0

WBench is a benchmark with 289 test cases and 1,058 turns for evaluating interactive world models using 22 automated metrics validated against human judgments.

EgoTL: Egocentric Think-Aloud Chains for Long-Horizon Tasks

cs.CV · 2026-04-10 · unverdicted · novelty 7.0

EgoTL provides a new egocentric dataset with think-aloud chains and metric labels that benchmarks VLMs on long-horizon tasks and improves their planning, reasoning, and spatial grounding after finetuning.

Target-Bench: Can Video World Models Achieve Mapless Path Planning with Semantic Targets?

cs.CV · 2025-11-21 · unverdicted · novelty 7.0

Target-Bench shows the best off-the-shelf video world model scores only 0.341 on semantic target-approaching and directional consistency, with fine-tuning on a small robot dataset yielding measurable gains.

ReflectiChain: Epistemic Grounding in LLM-Driven World Models for Supply Chain Resilience

cs.AI · 2026-06-09 · unverdicted · novelty 6.0

A framework combining LLM policy interpretation with a physically conserved graph-latent world model and uncertainty-separated learning achieves 33% higher rationale consistency and 82.3% operability on a 10-node semiconductor benchmark under perturbations.

COMAP: Co-Evolving World Models and Agent Policies for LLM Agents

cs.AI · 2026-06-01 · unverdicted · novelty 6.0

COMAP co-evolves textual world models and agent policies for LLMs through on-policy self-distillation, yielding up to 16.75% relative gains on embodied planning, web navigation, and tool-use tasks.

Sensor2Sensor: Cross-Embodiment Sensor Conversion for Autonomous Driving

cs.CV · 2026-05-21 · unverdicted · novelty 6.0 · 2 refs

Sensor2Sensor uses 4D Gaussian Splatting to create synthetic training pairs and a diffusion model to convert monocular dashcam videos into high-fidelity multi-modal AV sensor data.

Data-Asymmetric Latent Imagination and Reranking for 3D Robotic Imitation Learning

cs.RO · 2026-05-11 · unverdicted · novelty 6.0

DALI-R boosts 3D imitation learning success rates by 6.8% on average from suboptimal trajectories via latent imagination and reranking, with under 0.7x inference cost.

What Makes Video World Model Latents Action-Relevant: Prediction over Reconstruction

cs.CV · 2026-06-05 · unverdicted · novelty 5.0

Temporal video pretraining induces stronger action-relevant structure in video world model latents than pixel reconstruction, as shown by inverse-dynamics probing across encoder families.

OptiWorld: Optimal Control for Video World Generation under Physical Constraints

cs.CV · 2026-05-30 · unverdicted · novelty 5.0

OptiWorld inserts a classical optimal-control layer that extracts a world state, plans an optimal trajectory on a geometric manifold under physical constraints, and renders the video conditioned on that trajectory.

Back to Parsimonious Latents: Learning Task-Centric World Models from Visual Foundations

cs.AI · 2026-05-25 · unverdicted · novelty 5.0

TC-WM converts foundation-model visual embeddings into parsimonious task-sufficient world model latents via linear projection, contrastive physical-state alignment, and embedding reconstruction, with a theoretical identification guarantee.

Robo-Cortex: A Self-Evolving Embodied Agent via Dual-Grain Cognitive Memory and Autonomous Knowledge Induction

cs.RO · 2026-05-18 · unverdicted · novelty 5.0

Robo-Cortex proposes a self-evolving embodied navigation agent using dual-grain cognitive memory and autonomous knowledge induction from trajectories, reporting SPL gains on IGNav, AR, AEQA and preliminary real-robot tests.

WorldArena 2.0: Extending Embodied World Model Benchmarking on Modality, Functionality and Platform

cs.RO · 2026-05-18 · unverdicted · novelty 5.0

WorldArena 2.0 extends embodied world model benchmarks to visuotactile perception, interactive policy training, and diverse real and simulated robotic platforms under a unified protocol.

Reconstruction or Semantics? What Makes a Latent Space Useful for Robotic World Models

cs.CV · 2026-05-07 · unverdicted · novelty 5.0

Semantic latent spaces from pretrained encoders outperform reconstruction-based spaces for robotic world models on planning and downstream policy performance.

Matrix-Game 3.0: Real-Time and Streaming Interactive World Model with Long-Horizon Memory

cs.CV · 2026-04-10 · unverdicted · novelty 4.0

Matrix-Game 3.0 delivers 720p real-time video generation at 40 FPS with minute-scale memory consistency by combining residual self-correction training, camera-aware memory injection, and DMD-based autoregressive distillation on a 5B model.

World Model for Robot Learning: A Comprehensive Survey

cs.RO · 2026-04-30 · unverdicted · novelty 3.0

A comprehensive survey that organizes the literature on world models in robot learning, their roles in policy learning, planning, simulation, and video-based generation, with connections to navigation, driving, datasets, and benchmarks.

citing papers explorer

Showing 18 of 18 citing papers.

ScratchWorld: Evaluating If World Models Compute Executable Consequences · cs.SE · 2026-06-30 · unverdicted · none · ref 12
Pondering the Way: Spatial-perceiving World Action Model for Embodied Navigation · cs.RO · 2026-06-29 · unverdicted · none · ref 49
MemoBench: Benchmarking World Modeling in Dynamically Changing Environments · cs.CV · 2026-06-25 · unverdicted · none · ref 77 · 4 links
WBench: A Comprehensive Multi-turn Benchmark for Interactive Video World Model Evaluation · cs.CV · 2026-05-25 · unverdicted · none · ref 61
EgoTL: Egocentric Think-Aloud Chains for Long-Horizon Tasks · cs.CV · 2026-04-10 · unverdicted · none · ref 57
Target-Bench: Can Video World Models Achieve Mapless Path Planning with Semantic Targets? · cs.CV · 2025-11-21 · unverdicted · none · ref 38
ReflectiChain: Epistemic Grounding in LLM-Driven World Models for Supply Chain Resilience · cs.AI · 2026-06-09 · unverdicted · none · ref 14
COMAP: Co-Evolving World Models and Agent Policies for LLM Agents · cs.AI · 2026-06-01 · unverdicted · none · ref 31
Sensor2Sensor: Cross-Embodiment Sensor Conversion for Autonomous Driving · cs.CV · 2026-05-21 · unverdicted · none · ref 53 · 2 links
Data-Asymmetric Latent Imagination and Reranking for 3D Robotic Imitation Learning · cs.RO · 2026-05-11 · unverdicted · none · ref 9
What Makes Video World Model Latents Action-Relevant: Prediction over Reconstruction · cs.CV · 2026-06-05 · unverdicted · none · ref 15
OptiWorld: Optimal Control for Video World Generation under Physical Constraints · cs.CV · 2026-05-30 · unverdicted · none · ref 27
Back to Parsimonious Latents: Learning Task-Centric World Models from Visual Foundations · cs.AI · 2026-05-25 · unverdicted · none · ref 22
Robo-Cortex: A Self-Evolving Embodied Agent via Dual-Grain Cognitive Memory and Autonomous Knowledge Induction · cs.RO · 2026-05-18 · unverdicted · none · ref 41
WorldArena 2.0: Extending Embodied World Model Benchmarking on Modality, Functionality and Platform · cs.RO · 2026-05-18 · unverdicted · none · ref 11
Reconstruction or Semantics? What Makes a Latent Space Useful for Robotic World Models · cs.CV · 2026-05-07 · unverdicted · none · ref 74
Matrix-Game 3.0: Real-Time and Streaming Interactive World Model with Long-Horizon Memory · cs.CV · 2026-04-10 · unverdicted · none · ref 54
World Model for Robot Learning: A Comprehensive Survey · cs.RO · 2026-04-30 · unverdicted · none · ref 69
