arXiv preprint arXiv:2505.12082

Model Merging in Pre-training of Large Language Models · 2025 · arXiv 2505.12082

7 Pith papers cite this work. Polarity classification is still in progress.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Humanoid-OmniOcc: Stereo-Based Full-View Occupancy Dataset for Embodied AI

cs.RO · 2026-06-22 · unverdicted · novelty 7.0

Humanoid-OmniOcc delivers a large-scale panoramic stereo occupancy dataset for humanoid robots via Real2Sim2Real, with a model that outperforms monocular baselines in both unseen sim scenes and real settings.

Optimizing Visual Generative Models via Distribution-wise Rewards

cs.LG · 2026-07-02 · unverdicted · novelty 6.0

Distribution-wise rewards with subset-replace strategy and post-hoc merging improve FID-50K on SiT (8.30 to 5.77) and EDM2 (3.74 to 3.52) while preserving diversity.

LLM Evolution as an Industry-Scale Ecosystem: A Lifecycle Perspective on Continual Learning

cs.LG · 2026-06-12 · unverdicted · novelty 5.0

The paper reformulates industrial continual learning for LLMs as a closed-loop ecosystem problem, identifies three core challenges, and organizes solutions around five lifecycle design principles.

Anytime Training with Schedule-Free Spectral Optimization

cs.LG · 2026-05-21 · unverdicted · novelty 5.0

SF-NorMuon is a new schedule-free spectral optimizer that closes the gap with tuned AdamW on 125M-772M parameter models across 1-8x Chinchilla horizons while providing stationarity guarantees.

ScheduleFree+: Scaling Learning-Rate-Free & Schedule-Free Learning to Large Language Models

cs.LG · 2026-05-18 · unverdicted · novelty 5.0

ScheduleFree+ scales schedule-free learning to LLMs with fixes for large batches and models, outperforming Warmup-Stable-Decay schedules by up to 31% at 1000 tokens per parameter.

Kwai Keye-VL-2.0 Technical Report

cs.CV · 2026-06-09 · unverdicted · novelty 4.0

Kwai Keye-VL-2.0-30B-A3B is a 30B MoE model with 3B active parameters using DSA adaptation and MOPD distillation that reports SOTA results on video understanding and agent benchmarks.

Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities

cs.LG · 2024-08-14 · accept · novelty 4.0

The paper introduces a new taxonomy for model merging methods and reviews their applications in LLMs, MLLMs, continual learning, multi-task learning, and other subfields while outlining open challenges.

citing papers explorer

Showing 7 of 7 citing papers.

Humanoid-OmniOcc: Stereo-Based Full-View Occupancy Dataset for Embodied AI · cs.RO · 2026-06-22 · unverdicted · none · ref 65
Optimizing Visual Generative Models via Distribution-wise Rewards · cs.LG · 2026-07-02 · unverdicted · none · ref 18
LLM Evolution as an Industry-Scale Ecosystem: A Lifecycle Perspective on Continual Learning · cs.LG · 2026-06-12 · unverdicted · none · ref 59
Anytime Training with Schedule-Free Spectral Optimization · cs.LG · 2026-05-21 · unverdicted · none · ref 27
ScheduleFree+: Scaling Learning-Rate-Free & Schedule-Free Learning to Large Language Models · cs.LG · 2026-05-18 · unverdicted · none · ref 21
Kwai Keye-VL-2.0 Technical Report · cs.CV · 2026-06-09 · unverdicted · none · ref 89
Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities · cs.LG · 2024-08-14 · accept · none · ref 129
