LLMs meet multimodal generation and editing: A survey. arXiv preprint arXiv:2405.19334
3 Pith papers cite this work.
Citing papers by year: 2026 (3)
Representative citing papers
- Modality-Decoupled Online Recursive Editing
  M-ORE decouples text and visual update statistics in MLLMs and applies recursive low-rank edits in an orthogonal subspace to reduce cross-modal conflict and long-horizon interference.
- Multimodal Knowledge Edit-Scoped Generalization for Online Recursive MLLM Editing
  ScopeEdit decomposes MLLM edits into modality-local and evidence-gated shared branches using orthogonal low-rank spaces and recursive updates to improve scoped cross-modal transfer while preserving locality and efficiency.
- AudioX-Turbo: A Unified Framework for Efficient Anything-to-Audio Generation
  AudioX-Turbo distills a Multimodal Diffusion Transformer into a 4-step student model for efficient multimodal anything-to-audio generation, trained on a new 9.2M-sample dataset IF-caps-Pro.