ROUGE: A package for automatic evaluation of summaries
2 Pith papers cite this work. Polarity classification is still indexing.
Citing papers by year: 2026 (2)
Representative citing papers
Temporal conditioning in three LLM-based planner architectures for AV scene-to-plan reasoning yields no statistically significant gains on NLP correctness metrics but enables predictive hazard reasoning and stable corrections on BDD-X subsets.
citing papers explorer
- Beyond Pixel Diffs: Benchmarking Image Change Captioning for Web UI Visual Regression Testing
  Proposes WUICC task and WUICC-bench dataset, then evaluates 11 image difference captioning methods plus 2 LLMs on web UI changes.
- From Prompts to Pavement Through Time: Temporal Grounding in Agentic Scene-to-Plan Reasoning
  Temporal conditioning in three LLM-based planner architectures for AV scene-to-plan reasoning yields no statistically significant gains on NLP correctness metrics but enables predictive hazard reasoning and stable corrections on BDD-X subsets.