pith. sign in

hub

Ostquant: Refining large language model quantization with orthogonal and scaling transformations for better distribution fitting.arXiv preprint arXiv:2501.13987, 2025

12 Pith papers cite this work. Polarity classification is still indexing.

12 Pith papers citing it

hub tools

years

2026 10 2025 2

representative citing papers

LoopQ: Quantization for Recursive Transformers

cs.LG · 2026-05-08 · unverdicted · novelty 7.0

LoopQ provides a loop-aware PTQ framework for recursive Transformers that mitigates distribution shift, state reuse, and recursive error accumulation, yielding 68.8% higher average accuracy and 87.7% lower perplexity under W4A4 versus static baselines.

Theory-optimal Quantization Based on Flatness

cs.LG · 2026-05-11 · unverdicted · novelty 6.0

The paper introduces the Flatness metric, derives a theory-optimal quantization solution, and presents BDQ that uses bidirectional diagonal transformations to reduce outlier impact, achieving under 1% drop at W4A4 on LLaMA-3-8B.

MixFP4: Enhancing NVFP4 with Adaptive FP4/INT4 Block Representations

cs.AR · 2026-05-29 · unverdicted · novelty 5.0

MixFP4 extends NVFP4 by adaptively selecting between two FP4 micro-formats per block using repurposed scale sign bits and a unified E2M2 compute path, claiming better accuracy than standard NVFP4 at 3.1% area and 1.5% power overhead.

citing papers explorer

Showing 12 of 12 citing papers.