All you need is log

Akshay Balsubramani

arxiv: 2606.27349 · v2 · pith:KLREFKTOnew · submitted 2026-06-25 · 💻 cs.IT · math.IT · math.PR · math.ST · stat.ML · stat.TH

Pith reviewed 2026-06-30 00:52 UTC · model grok-4.3

classification 💻 cs.IT · math.IT · math.PR · math.ST · stat.ML · stat.TH

keywords multi-distribution divergences · Rényi divergences · data processing monotonicity · product additivity · coincidence divergences · multi-hypothesis testing · information measures

The pith

Every functional on W-tuples of distributions that is monotone under data processing and additive on independent products equals a positive integral of multi-way coincidence divergences over four strata.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that the unique family of divergences for comparing any number of probability distributions at once is obtained by integrating the basic multi-way coincidence divergences C_α. These are defined by C_α(π1,…,πW) = -log ∫ π1^α1 ⋯ πW^αW where the exponents sum to one. The integral runs over a four-part parameter space whose strata each correspond to a distinct limiting regime that cannot be reproduced by the others. The same family emerges from axioms, entropy means, hypothesis-testing exponents, and a betting interpretation, recovering the ordinary Rényi divergences when only two distributions are compared.

Core claim

Every functional of W-tuples of distributions that is monotone under data processing and additive on independent products is a positive integral of the multi-way coincidence divergences C_α(π1,…,πW) := -log∫ π1^α1⋯πW^αW (∑αk=1) over a parameter space with four strata: the simplex interior; mixed-sign exponent cones; a tropical boundary at infinity carrying max-divergences; and pairwise Kullback-Leibler edges at the simplex vertices. Each stratum is necessary because it is the destination of an explicit data-processing-monotone, product-additive divergence that the others cannot reproduce, and each arises as a clean limit of simplex-interior atoms. The two-prior case recovers the standard Rényi result.
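
As a hedged illustration of how a Kullback-Leibler edge can appear as a limit of interior atoms, take exponents concentrated near the vertex e_k with a small weight ε on coordinate l. The first-order expansion below is a sketch on a finite sample space where π_k and π_l share the same support (both are simplifying assumptions made here, not conditions taken from the paper):

```latex
\begin{align*}
C_{(1-\epsilon)e_k+\epsilon e_l}(\pi)
  &= -\log \sum_x \pi_k(x)^{1-\epsilon}\,\pi_l(x)^{\epsilon}
   = -\log \sum_x \pi_k(x)\,\exp\!\Big(\epsilon \log\frac{\pi_l(x)}{\pi_k(x)}\Big) \\
  &= -\log\!\Big(1 - \epsilon\, D_{\mathrm{KL}}(\pi_k\,\|\,\pi_l) + O(\epsilon^2)\Big)
   = \epsilon\, D_{\mathrm{KL}}(\pi_k\,\|\,\pi_l) + O(\epsilon^2).
\end{align*}
```

Dividing by ε and letting ε ↓ 0 gives the pairwise divergence D_KL(π_k‖π_l), with an error of order ε under these assumptions.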

What carries the argument

The multi-way coincidence divergences C_α(π1,…,πW) := -log∫ π1^α1⋯πW^αW (∑αk=1), which act as the generating atoms whose positive integrals over the four-stratum parameter space exhaust all functionals obeying the two structural axioms.
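
To make the atom concrete, here is a minimal Python sketch of C_α for distributions on a finite sample space. The function name `coincidence_divergence`, the list-based representation, and the requirement that every probability be strictly positive are assumptions made for illustration; the paper's own setting (general measures, boundary strata) is broader.

```python
import math

def coincidence_divergence(dists, alphas):
    """C_alpha(pi_1..pi_W) = -log sum_x prod_k pi_k(x)^alpha_k (hypothetical helper).

    dists  : list of W probability vectors over the same finite sample space,
             assumed strictly positive so that logs are finite.
    alphas : list of W real exponents that must sum to one.
    """
    if len(dists) != len(alphas):
        raise ValueError("need one exponent per distribution")
    if abs(sum(alphas) - 1.0) > 1e-12:
        raise ValueError("exponents must sum to one")
    total = 0.0
    for x in range(len(dists[0])):
        # Accumulate the weighted geometric mean at point x in log space.
        log_term = sum(a * math.log(p[x]) for p, a in zip(dists, alphas))
        total += math.exp(log_term)
    return -math.log(total)

# Three illustrative distributions on a three-point space (made-up numbers).
p1 = [0.5, 0.3, 0.2]
p2 = [0.2, 0.5, 0.3]
p3 = [0.3, 0.2, 0.5]

c3 = coincidence_divergence([p1, p2, p3], [1 / 3, 1 / 3, 1 / 3])
c_same = coincidence_divergence([p1, p1, p1], [0.2, 0.3, 0.5])
c2 = coincidence_divergence([p1, p2], [0.5, 0.5])
```

For exponents in the simplex interior, Hölder's inequality gives C_α ≥ 0, with equality when all distributions coincide, so `c_same` is zero up to rounding while `c3` is strictly positive. With W = 2 and α = (1/2, 1/2), the value `c2` is the negative log of the Bhattacharyya coefficient.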

If this is right

When W equals 2 the construction reduces exactly to the classical Rényi family.
Multi-population fairness measures, multi-prior PAC-Bayes bounds, and multi-hypothesis testing error exponents are all instances of the same integral family.
The family can be derived from Kolmogorov-Nagumo means obeying Rényi entropy axioms or from a multi-lottery betting interpretation.
A conditional extension of the same integral representation holds for conditional distributions.
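
The W = 2 reduction in the first item can be checked in one line against the standard definition of the Rényi divergence of order α, D_α(P‖Q) = (α−1)^{-1} log ∫ p^α q^{1−α}. The identity below is a sketch for α in (0, 1); how the paper normalizes atoms when passing to Rényi orders is not visible in this summary, so the factor (1−α) should be read as following from that standard definition only:

```latex
\begin{equation*}
C_{(\alpha,\,1-\alpha)}(P,Q)
  \;=\; -\log \int p^{\alpha} q^{1-\alpha}
  \;=\; (1-\alpha)\, D_{\alpha}(P\,\|\,Q),
  \qquad 0 < \alpha < 1 .
\end{equation*}
```

At α = 1/2 this is the Bhattacharyya distance, which equals half the Rényi divergence of order 1/2.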

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The four-stratum decomposition may supply a systematic way to interpolate between different multi-distribution measures used in ensemble methods.
Numerical checks for small W could verify whether common ad-hoc multi-distribution scores already belong to the integral family.
The tropical boundary stratum suggests possible max-based approximations that remain monotone and additive.
Extending the characterization to continuous parameter spaces or to quantum states would test the robustness of the axiomatic route.

Load-bearing premise

The functional is required to satisfy monotonicity under data processing and additivity on independent products for arbitrary W-tuples.
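
The two axioms can be probed numerically for a single simplex-interior atom. The sketch below is illustrative only: the helper names (`c_alpha`, `product`, `push_forward`, `random_dist`), the random test distributions, and the choice α = (0.2, 0.3, 0.5) are all assumptions, and a passing check on one random instance is of course not a proof.

```python
import math
import random

def c_alpha(dists, alphas):
    # C_alpha = -log sum_x prod_k pi_k(x)^alpha_k on a finite sample space.
    return -math.log(sum(
        math.prod(p[x] ** a for p, a in zip(dists, alphas))
        for x in range(len(dists[0]))
    ))

def product(p, q):
    # Independent product distribution, flattened over pairs (x, y).
    return [a * b for a in p for b in q]

def push_forward(p, kernel):
    # Data processing: kernel[x][y] is the probability of output y given input x.
    return [sum(p[x] * kernel[x][y] for x in range(len(p)))
            for y in range(len(kernel[0]))]

def random_dist(rng, n):
    # Strictly positive random probability vector of length n.
    w = [rng.random() + 0.05 for _ in range(n)]
    s = sum(w)
    return [v / s for v in w]

rng = random.Random(0)
W, n, m = 3, 5, 3
alphas = [0.2, 0.3, 0.5]                              # simplex-interior exponents
pis = [random_dist(rng, n) for _ in range(W)]         # first W-tuple
rhos = [random_dist(rng, 4) for _ in range(W)]        # independent second W-tuple
kernel = [random_dist(rng, m) for _ in range(n)]      # one shared Markov kernel

# Additivity: C_alpha of the product tuple equals the sum of the two parts.
joint = [product(p, r) for p, r in zip(pis, rhos)]
additivity_gap = abs(c_alpha(joint, alphas)
                     - c_alpha(pis, alphas) - c_alpha(rhos, alphas))

# Monotonicity: applying the same kernel to every distribution cannot raise C_alpha.
processed = [push_forward(p, kernel) for p in pis]
dpi_slack = c_alpha(pis, alphas) - c_alpha(processed, alphas)
```

Here `additivity_gap` should vanish up to rounding because the sum over the product space factorizes, and `dpi_slack` should be non-negative because the weighted geometric mean is concave and homogeneous, hence superadditive, for non-negative exponents summing to one.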

What would settle it

Exhibit a concrete functional on three or more distributions that obeys both the data-processing monotonicity and independent-product additivity axioms yet lies outside every positive integral of the C_α over the four strata.

Figures

Figures reproduced from arXiv: 2606.27349 by Akshay Balsubramani.

**Figure 1.** Sym-orbit average vs closed-form symmetric-form representation (Section 7): the relative residual sits near machine precision (10^-16) across all 1,200 trials, every one passing the pre-registered 10^-10 threshold (dashed).

**Figure 2.** Fitted log-log slope versus theoretical slope per (W, X) cell. KL stratum (blue) clusters tightly around +1 (theoretical rate); tropical stratum (vermillion) clusters around −0.97, just above the theoretical −1 due to the sub-exponential log(t)/t correction from the Laplace prefactor.

**Figure 3.** Per-(W, X) cell agreement rate between the spectrum inequality spec and the construction-flag cat (whether π′ was drawn as Kπ). The dashed line marks the pre-registered 95% threshold; the pooled agreement rate is 95.6%, with no false negatives (zero “Kπ violates the inequality”) and a small number of false positives (random π′ accidentally passing the sampled grid).

**Figure 4.** Choquet-linearity sweep across six cells of the atom-family cone. Per-cell pass rates for joint DPI, additivity, and ground state. All three axioms pass in every cell; the pre-registered 99% threshold (dashed) is exceeded uniformly. The structural reading: cone-additivity is genuinely cell-uniform, not C-specific.

**Figure 5.** Sion minimax identity residual vs grid resolution δ at W = 3. Median and maximum relative residual across 50 trials (10 per X ∈ {4, 5, 6, 8, 10}). A decade in δ yields roughly a decade in residual; V6’s coarser δ ≈ 1/30 residual of 2.4% is consistent with the extrapolation of this curve to δ = 3·10^-2.

**Figure 6.** Real-data axiom-stress on natural class-conditional distributions (UCI Adult, UCI Bank, MNIST, CIFAR-10, ImageNet-1K). Per-(dataset, axiom) Wilson 95% confidence interval on the passage rate at relative tolerance 10^-6. The dashed line marks the 99% pre-registered threshold. All fifteen cells achieve passage rate 1.0 with Wilson lower bound at least 0.99.

read the original abstract

Comparing two probability distributions is a basic building block of statistics and machine learning, and the right family is well understood: the Rényi divergences of order $\alpha\in[0,\infty]$ are the unique family monotone under data processing and additive on independent products. Many problems instead compare more than two distributions at once -- multi-population fairness, multi-prior PAC-Bayes bounds, multi-hypothesis testing -- and the right multi-distribution generalization of the Rényi family has been an open question. We characterize it. Every functional of $W$-tuples of distributions that is monotone under data processing and additive on independent products is a positive integral of multi-way coincidence divergences $C_{\alpha}(\pi_1,\dots,\pi_W) := -\log\int \pi_1^{\alpha_1}\cdots\pi_W^{\alpha_W}$ (with $\sum_k \alpha_k = 1$) over a parameter space with four strata: the simplex interior; mixed-sign exponent cones (the analogue of Rényi orders $>1$); a tropical boundary at infinity carrying max-divergences; and pairwise Kullback-Leibler edges at the simplex vertices. Each stratum is necessary -- the destination of an explicit data-processing-monotone, product-additive divergence the others cannot reproduce -- and each is a clean limit of simplex-interior atoms. The same family arises from several independent routes -- the structural axioms, Kolmogorov-Nagumo means with Rényi's entropy axiomatics, classical entropy characterizations, multi-hypothesis testing error exponents, and a multi-lottery betting interpretation -- structural evidence that this is the canonical multi-distribution Rényi calculus rather than an artefact of any one axiomatic input. The two-prior case recovers the standard Rényi result; a worked $W=3$ instance, numerical verification, and a conditional extension round out the treatment.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The paper answers the open multi-distribution Rényi question with an integral characterization over four strata that recovers the two-distribution case and is supported by five independent routes.

The core result is that any functional on W-tuples of distributions obeying data-processing monotonicity and product additivity must be a positive integral of the multi-way coincidence divergences C_α over the four strata (simplex interior, mixed-sign cones, tropical max boundary, and KL edges). This directly answers the open question left by the two-distribution Rényi theory.

What stands out is the multiple independent derivations: structural axioms, Kolmogorov-Nagumo means, classical entropy work, hypothesis-testing exponents, and a betting interpretation. The two-prior reduction matches the known Rényi family exactly, and the paper supplies explicit constructions showing each stratum is required because the others cannot reproduce certain monotone additive functionals. That level of cross-check reduces the chance that the representation is an artifact of one axiom set.

The soft spots are modest. The necessity arguments rest on those explicit constructions, which the abstract states are given but which I have not verified line-by-line from the full text. The assumption that monotonicity plus additivity for arbitrary W is the right defining pair is taken as given; if a reader wants a different base property, the characterization would not apply. No circularity appears in the two-prior anchor or the listed routes.

This is for researchers who need a canonical multi-distribution divergence for fairness, multi-prior PAC-Bayes, or multi-hypothesis testing. It is the kind of clean structural result that deserves a serious referee even if some proofs need tightening.

Referee Report

0 major / 3 minor

Summary. The manuscript claims to characterize every functional of W-tuples of distributions that is monotone under data processing and additive on independent products as a positive integral of multi-way coincidence divergences C_α(π1,…,πW) := -log∫ π1^α1⋯πW^αW (∑αk=1) over a parameter space with four strata (simplex interior, mixed-sign exponent cones, tropical boundary at infinity, and pairwise KL edges). Each stratum is shown necessary via explicit constructions, the family arises from five independent routes (structural axioms, Kolmogorov-Nagumo means, entropy characterizations, testing error exponents, betting), the W=2 case recovers Rényi divergences, and a W=3 instance plus conditional extension are provided.

Significance. If the characterization holds, this supplies the canonical multi-distribution Rényi calculus, unifying several strands in information theory with direct applications to multi-population fairness, multi-prior PAC-Bayes, and multi-hypothesis testing. The five-route derivation and explicit necessity constructions constitute strong structural evidence; the recovery of the classical two-distribution case and the conditional extension are additional strengths.

minor comments (3)

The abstract states that each stratum is a clean limit of simplex-interior atoms, but the precise limiting argument (including any required regularity on the measure) should be stated explicitly in the main text rather than left to the constructions.
Notation for the four strata would benefit from a single summary table or diagram listing the parameter domains, the corresponding divergence forms, and the explicit constructions that demonstrate necessity.
The numerical verification for the W=3 case should report the specific ranges of α vectors tested and the quantitative error metric used to confirm agreement with the integral representation.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the supportive summary, significance assessment, and recommendation of minor revision. No specific major comments appear in the report, so there are no individual points requiring point-by-point rebuttal or revision.

Circularity Check

0 steps flagged

Multiple independent routes support the representation; no load-bearing circularity

The paper characterizes all functionals of W-tuples satisfying monotonicity under data processing and additivity on independent products as positive integrals of the C_α family over four strata. This is derived directly from the stated axioms. The abstract explicitly lists five independent supporting routes (structural axioms, Kolmogorov-Nagumo means, classical entropy characterizations, hypothesis testing exponents, betting interpretation) and notes that the W=2 case recovers the known Rényi result. Each stratum is shown necessary via explicit constructions. No step reduces by construction to a fitted parameter, self-definition, or self-citation chain; the result is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The claim rests on the two structural axioms of data-processing monotonicity and product additivity; the coincidence divergence C_α is introduced as the atomic functional, with the four strata obtained as limits; no fitted numerical parameters appear.

axioms (2)

domain assumption: Monotonicity under data processing for any W-tuple functional
Invoked as the first defining property that forces the integral form (abstract, paragraph 2).
domain assumption: Additivity on independent products for any W-tuple functional
Invoked as the second defining property that forces the integral form (abstract, paragraph 2).

invented entities (1)

multi-way coincidence divergence C_α · no independent evidence
purpose: Atomic building block whose positive integrals over four strata yield all admissible functionals
Defined directly as -log∫ π1^α1⋯πW^αW with ∑αk=1; no external falsifiable evidence supplied beyond the axiomatic derivation.

pith-pipeline@v0.9.1-grok · 5884 in / 1510 out tokens · 40102 ms · 2026-06-30T00:52:13.085611+00:00 · methodology

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Multivariate majorization of continuous statistical experiments
math.ST 2026-06 unverdicted novelty 7.0

Sufficient and almost necessary conditions for large-sample and catalytic majorization of finite statistical experiments on Borel spaces are characterized by inequalities on multivariate Rényi divergences.

Reference graph

Works this paper leans on

49 extracted references · 29 canonical work pages · cited by 1 Pith paper · 3 internal anchors

[1]

On Measures of Information and their Characterizations, volume 115 of Mathematics in Science and Engineering

János Aczél and Zoltán Daróczy. On Measures of Information and their Characterizations, volume 115 of Mathematics in Science and Engineering. Academic Press, New York, 1975

1975
[2]

Functional Equations in Several Variables, volume 31 of Encyclopedia of Mathematics and its Applications

János Aczél and Jean Dhombres. Functional Equations in Several Variables, volume 31 of Encyclopedia of Mathematics and its Applications. Cambridge University Press, 1989

1989
[3]

Why the Shannon and Hartley entropies are ‘natural’

János Aczél, Bruno Forte, and Che Tat Ng. Why the Shannon and Hartley entropies are ‘natural’. Advances in Applied Probability, 6(1):131–146, 1974. doi: 10.2307/1426210

work page doi:10.2307/1426210 1974
[4]

A resource theory of gambling

Maite Arcos, Renato Renner, and Jonathan Oppenheim. A resource theory of gambling. arXiv preprint arXiv:2510.08418, 2025

work page arXiv 2025
[5]

Koenraad M. R. Audenaert and Milán Mosonyi. Upper bounds on the error probabilities and asymptotic error exponents in quantum multiple state discrimination. Journal of Mathematical Physics, 55(10):102201, 2014. doi: 10.1063/1.4898559

work page doi:10.1063/1.4898559 2014
[6]

Information from coincidences: a mixed partition-function calculus for multiscale typicality

Akshay Balsubramani. Information from coincidences: a mixed partition-function calculus for multiscale typicality
[7]

URL https://arxiv.org/abs/2606.25042

work page internal anchor Pith review Pith/arXiv arXiv
[8]

Expected information as expected utility

José M Bernardo. Expected information as expected utility. The Annals of Statistics, pages 686–690, 1979

1979
[9]

Conditional Rényi divergences and horse betting

Cyril Bleuler, Amos Lapidoth, and Christoph Pfister. Conditional Rényi divergences and horse betting. Entropy, 22(3):316, 2020. doi: 10.3390/e22030316

work page doi:10.3390/e22030316 2020
[10]

Equivariant relative submajorization

Gergely Bunth and Péter Vrana. Equivariant relative submajorization. arXiv preprint, 2021. doi: 10.48550/arXiv.2108.13217

work page internal anchor Pith review doi:10.48550/arxiv 2021
[11]

Quantum Relative Lorenz Curves

Francesco Buscemi and Gilad Gour. Quantum relative Lorenz curves and resource theories. Journal of Mathematical Physics, 65(1):012203, 2024. Earlier preprint: arXiv:1607.05735 (2016)

work page internal anchor Pith review Pith/arXiv arXiv 2024
[12]

Disintegration and Bayesian inversion via string diagrams

Kenta Cho and Bart Jacobs. Disintegration and Bayesian inversion via string diagrams. In Mathematical Structures in Computer Science, volume 29, pages 938–971, 2019. doi: 10.1017/S0960129518000488

work page doi:10.1017/s0960129518000488 2019
[13]

Axiomatic characterizations of information measures

Imre Csiszár. Axiomatic characterizations of information measures. Entropy, 10(3):261–273, 2008. doi: 10.3390/e10030261

2008
[14]

Multivariate Rényi divergences characterise betting games with multiple lotteries

Andrés F. Ducuara, Erkka Haapasalo, and Ryo Takakura. Multivariate Rényi divergences characterise betting games with multiple lotteries. arXiv preprint, 2026. doi: 10.48550/arXiv.2601.17850. Report number YITP-25-40

work page doi:10.48550/arxiv.2601.17850 2026
[15]

On the concept of entropy of a finite probabilistic scheme

Dmitrii Konstantinovich Faddeev. On the concept of entropy of a finite probabilistic scheme. Uspekhi Matematicheskikh Nauk, 11(1):227–231, 1956

1956
[16]

Matrix majorization in large samples

Muhammad Usman Farooq, Tobias Fritz, Erkka Haapasalo, and Marco Tomamichel. Matrix majorization in large samples. IEEE Transactions on Information Theory, 70(11):3118–3144, 2024. doi: 10.1109/TIT.2024.3437073

work page doi:10.1109/tit.2024.3437073 2024
[17]

A synthetic approach to Markov kernels, conditional independence and theorems on sufficient statistics

Tobias Fritz. A synthetic approach to Markov kernels, conditional independence and theorems on sufficient statistics. Advances in Mathematics, 370:107239, 2020. doi: 10.1016/j.aim.2020.107239

work page doi:10.1016/j.aim.2020.107239 2020
[18]

A generalization of strict comparison for resource convertibility, with an application to second laws of thermodynamics

Tobias Fritz. A generalization of strict comparison for resource convertibility, with an application to second laws of thermodynamics. Letters in Mathematical Physics, 113(5):99, 2023. doi: 10.1007/s11005-023-01722-7

work page doi:10.1007/s11005-023-01722-7 2023
[19]

Sufficiency of Rényi divergences

Frederik Galke, Lauritz van Luijk, and Henrik Wilming. Sufficiency of Rényi divergences. arXiv preprint, 2024

2024
[20]

Strictly proper scoring rules, prediction, and estimation

Tilmann Gneiting and Adrian E Raftery. Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association, 102(477):359–378, 2007

2007
[21]

Entropy and relative entropy from information-theoretic principles

Gilad Gour and Marco Tomamichel. Entropy and relative entropy from information-theoretic principles. IEEE Transactions on Information Theory, 67(10):6313–6327, 2021. doi: 10.1109/TIT.2021.3078337

work page doi:10.1109/tit.2021.3078337 2021
[22]

Barycentric decompositions for extensive monotone divergences

Erkka Haapasalo. Barycentric decompositions for extensive monotone divergences. arXiv preprint, 2025. doi: 10.48550/arXiv.2509.18725

work page arXiv 2025
[23]

An invitation to quantum incompatibility

Teiko Heinosaari, Takayuki Miyadera, and Mikko Tukiainen. An invitation to quantum incompatibility. Journal of Physics A: Mathematical and Theoretical, 49(12):123001, 2016. doi: 10.1088/1751-8113/49/12/123001. Survey; updated version available 2022

work page doi:10.1088/1751-8113/49/12/123001 2016
[24]

A new theorem of information theory

Arthur Hobson. A new theorem of information theory. Journal of Statistical Physics, 1(3):383–391, 1969. doi: 10.1007/BF01106578

1969
[25]

Frederik B. Jensen. Asymptotic operational interpretations of generalized Rényi divergences. arXiv preprint, 2019

2019
[26]

Axiomatic derivation of the principle of maximum entropy and the principle of minimum cross-entropy

Rodney W. Johnson and John E. Shore. Axiomatic derivation of the principle of maximum entropy and the principle of minimum cross-entropy. IEEE Transactions on Information Theory, 26(1):26–37, 1980. doi: 10.1109/TIT.1980.1056144

work page doi:10.1109/tit.1980 1980
[27]

Mathematical Foundations of Information Theory

Aleksandr Iakovlevich Khinchin. Mathematical Foundations of Information Theory. Dover, New York, 1957

1957
[28]

A. N. Kolmogorov. Sur la notion de la moyenne. Atti della Reale Accademia Nazionale dei Lincei, 12:388–391, 1930

1930
[29]

Minimization problems based on relative α-entropy I: forward projection

M. Ashok Kumar and Rajesh Sundaresan. Minimization problems based on relative α-entropy I: forward projection. IEEE Transactions on Information Theory, 62(9):5063–5080, 2016. doi: 10.1109/TIT.2016.2590465

work page doi:10.1109/tit.2016.2590465 2016
[30]

Asymptotic Methods in Statistical Decision Theory

Lucien Le Cam. Asymptotic Methods in Statistical Decision Theory. Springer Series in Statistics. Springer, 1986. doi: 10.1007/978-1-4612-4946-7

work page doi:10.1007/978-1-4612-4946-7 1986
[31]

On the asymptotics of M-hypothesis Bayesian detection

Chuong B. Leang and Don H. Johnson. On the asymptotics of M-hypothesis Bayesian detection. IEEE Transactions on Information Theory, 43(1):280–282, 1997. doi: 10.1109/18.567705

work page doi:10.1109/18.567705 1997
[32]

Classification based on distance in multivariate Gaussian cases

Kameo Matusita. Classification based on distance in multivariate Gaussian cases. Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, 1:299–304, 1967

1967
[33]

Measures of the value of information

John McCarthy. Measures of the value of information. Proceedings of the National Academy of Sciences, 42(9):654–655, 1956

1956
[34]

Geometric relative entropies and barycentric Rényi divergences

Milán Mosonyi, Gergely Bunth, and Péter Vrana. Geometric relative entropies and barycentric Rényi divergences. Linear Algebra and its Applications, 699:159–276, 2024. doi: 10.1016/j.laa.2024.06.005

work page doi:10.1016/j.laa.2024.06.005 2024
[35]

From Blackwell dominance in large samples to Rényi divergences and back again

Xiaosheng Mu, Luciano Pomatto, Philipp Strack, and Omer Tamuz. From Blackwell dominance in large samples to Rényi divergences and back again. Econometrica, 89(1):475–506, 2021. doi: 10.3982/ECTA17548

work page doi:10.3982/ecta17548 2021
[36]

Monotone additive statistics

Xiaosheng Mu, Luciano Pomatto, Philipp Strack, and Omer Tamuz. Monotone additive statistics. Econometrica, 92(4):995–1031, 2024. doi: 10.3982/ECTA19967

work page doi:10.3982/ecta19967 2024
[37]

Über eine Klasse der Mittelwerte

Mitio Nagumo. Über eine Klasse der Mittelwerte. Japanese Journal of Mathematics, 7:71–79, 1930

1930
[38]

The Chernoff lower bound for symmetric quantum hypothesis testing

Michael Nussbaum and Arleta Szkoła. The Chernoff lower bound for symmetric quantum hypothesis testing. Annals of Statistics, 37(2):1040–1057, 2009. doi: 10.1214/08-AOS593

work page doi:10.1214/08-aos593 2009
[39]

The cost of information: the case of constant marginal costs

Luciano Pomatto, Philipp Strack, and Omer Tamuz. The cost of information: the case of constant marginal costs. American Economic Review, 113(5):1360–1393, 2023. doi: 10.1257/aer.20211094

work page doi:10.1257/aer.20211094 2023
[40]

On measures of entropy and information

Alfréd Rényi. On measures of entropy and information. In Proceedings of the fourth Berkeley symposium on mathematical statistics and probability, volume 1: contributions to the theory of statistics, volume 4, pages 547–562. University of California Press, 1961

1961
[41]

A complete characterisation of conditional entropies

Roberto Rubboli, Erkka Haapasalo, and Marco Tomamichel. A complete characterisation of conditional entropies. arXiv preprint, 2026. doi: 10.48550/arXiv.2601.23213

work page doi:10.48550/arxiv.2601.23213 2026
[42]

N. P. Salikhov. Asymptotic properties of the rate of mistakes in the problem of distinguishing between several statistical hypotheses. Trudy Mat. Inst. Steklov., 124:117–146, 1973. In Russian; English summary in Theory of Probability and its Applications

1973
[43]

Admissible probability measurement procedures

Emir H Shuford Jr, Arthur Albert, and H Edward Massengill. Admissible probability measurement procedures. Psychometrika, 31(2):125–145, 1966

1966
[44]

Information radius

Robin Sibson. Information radius. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, 14:149–160, 1969. doi: 10.1007/BF00537520

work page doi:10.1007/bf00537520 1969
[45]

Comparison of Statistical Experiments

Erik Torgersen. Comparison of Statistical Experiments, volume 36 of Encyclopedia of Mathematics and its Applications. Cambridge University Press, Cambridge, 1991

1991
[46]

Some properties of Matusita’s measure of affinity of several distributions

Godfried T. Toussaint. Some properties of Matusita’s measure of affinity of several distributions. Annals of the Institute of Statistical Mathematics, 26(1):389–394, 1974. doi: 10.1007/BF02479845

work page doi:10.1007/bf02479845 1974
[47]

Rényi divergence and Kullback–Leibler divergence

Tim van Erven and Peter Harremoës. Rényi divergence and Kullback–Leibler divergence. IEEE Transactions on Information Theory, 60(7):3797–3820, 2014. doi: 10.1109/TIT.2014.2320500

work page doi:10.1109/tit.2014.2320500 2014
[48]

Matrix majorization in large samples with varying support restrictions

Frits Verhagen, Marco Tomamichel, and Erkka Haapasalo. Matrix majorization in large samples with varying support restrictions. IEEE Transactions on Information Theory, 71(9):6517–6545, 2025. doi: 10.1109/TIT.2025.3585062

work page doi:10.1109/tit.2025.3585062 2025
[49]

Manzil Zaheer, Satwik Kottur, Siamak Ravanbakhsh, Barnabás Póczos, Ruslan Salakhutdinov, and Alexander J. Smola. Deep sets. In Advances in Neural Information Processing Systems 30 (NIPS 2017), pages 3391–3401, 2017

2017

[1] [1]

On Measures of Information and their Characterizations, volume 115 of Mathematics in Science and Engineering

János Aczél and Zoltán Daróczy. On Measures of Information and their Characterizations, volume 115 of Mathematics in Science and Engineering. Academic Press, New York, 1975

1975

[2] [2]

Functional Equations in Several Variables, volume 31 of Encyclopedia of Mathematics and its Applications

János Aczél and Jean Dhombres. Functional Equations in Several Variables, volume 31 of Encyclopedia of Mathematics and its Applications. Cambridge University Press, 1989

1989

[3] [3]

Why the Shannon and Hartley entropies are ‘natural’.Advances in Applied Probability, 6(1):131–146, 1974

János Aczél, Bruno Forte, and Che Tat Ng. Why the Shannon and Hartley entropies are ‘natural’.Advances in Applied Probability, 6(1):131–146, 1974. doi: 10.2307/1426210

work page doi:10.2307/1426210 1974

[4] [4]

A resource theory of gambling

Maite Arcos, Renato Renner, and Jonathan Oppenheim. A resource theory of gambling. arXiv preprint arXiv:2510.08418, 2025

work page arXiv 2025

[5] [5]

Koenraad M. R. Audenaert and Milán Mosonyi. Upper bounds on the error probabilities and asymptotic error exponents in quantum multiple state discrimination. Journal of Mathematical Physics , 55(10):102201, 2014. doi: 10.1063/1.4898559

work page doi:10.1063/1.4898559 2014

[6] [6]

Information from coincidences: a mixed partition-function calculus for multiscale typicality

Akshay Balsubramani. Information from coincidences: a mixed partition-function calculus for multiscale typicality

[7] [7]

URL https://arxiv.org/abs/2606.25042

work page internal anchor Pith review Pith/arXiv arXiv

[8] [8]

Expected information as expected utility

José M Bernardo. Expected information as expected utility. The Annals of Statistics, pages 686–690, 1979

1979

[9] [9]

Conditional Rényi divergences and horse betting

Cyril Bleuler, Amos Lapidoth, and Christoph Pfister. Conditional Rényi divergences and horse betting. Entropy, 22 (3):316, 2020. doi: 10.3390/e22030316

work page doi:10.3390/e22030316 2020

[10] [10]

Metamorphictestingoflarge languagemodelsfornaturallanguageprocessing.doi:10.48550/arXiv

Gergely Bunth and Péter Vrana. Equivariant relative submajorization. arXiv preprint, 2021. doi: 10.48550/arXiv. 2108.13217

work page internal anchor Pith review doi:10.48550/arxiv 2021

[11] [11]

Quantum Relative Lorenz Curves

Francesco Buscemi and Gilad Gour. Quantum relative Lorenz curves and resource theories. Journal of Mathematical Physics, 65(1):012203, 2024. Earlier preprint: arXiv:1607.05735 (2016)

work page internal anchor Pith review Pith/arXiv arXiv 2024

[12] [12]

Disintegration and Bayesian inversion via string diagrams

Kenta Cho and Bart Jacobs. Disintegration and Bayesian inversion via string diagrams. In Mathematical Structures in Computer Science, volume 29, pages 938–971, 2019. doi: 10.1017/S0960129518000488

work page doi:10.1017/s0960129518000488 2019

[13] [13]

Axiomatic characterizations of information measures

Imre Csiszár. Axiomatic characterizations of information measures. Entropy, 10(3):261–273, 2008. doi: 10.3390/ e10030261

2008

[14] [14]

Ducuara, Erkka Haapasalo, and Ryo Takakura

Andrés F. Ducuara, Erkka Haapasalo, and Ryo Takakura. Multivariate Rényi divergences characterise betting games with multiple lotteries. arXiv preprint, 2026. doi: 10.48550/arXiv.2601.17850. Report number YITP-25-40

work page doi:10.48550/arxiv.2601.17850 2026

[15] [15]

On the concept of entropy of a finite probabilistic scheme

Dmitrii Konstantinovich Faddeev. On the concept of entropy of a finite probabilistic scheme. Uspekhi Matematich- eskikh Nauk, 11(1):227–231, 1956

1956

[16] [16]

Matrix majorization in large samples

Muhammad Usman Farooq, T obias Fritz, Erkka Haapasalo, and Marco T omamichel. Matrix majorization in large samples. IEEE Transactions on Information Theory, 70(11):3118–3144, 2024. doi: 10.1109/TIT.2024.3437073. 41

work page doi:10.1109/tit.2024.3437073 2024

[17] Tobias Fritz. A synthetic approach to Markov kernels, conditional independence and theorems on sufficient statistics. Advances in Mathematics, 370:107239, 2020. doi: 10.1016/j.aim.2020.107239.

[18] Tobias Fritz. A generalization of strict comparison for resource convertibility, with an application to second laws of thermodynamics. Letters in Mathematical Physics, 113(5):99, 2023. doi: 10.1007/s11005-023-01722-7.

[19] Frederik Galke, Lauritz van Luijk, and Henrik Wilming. Sufficiency of Rényi divergences. arXiv preprint, 2024.

[20] Tilmann Gneiting and Adrian E. Raftery. Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association, 102(477):359–378, 2007.

[21] Gilad Gour and Marco Tomamichel. Entropy and relative entropy from information-theoretic principles. IEEE Transactions on Information Theory, 67(10):6313–6327, 2021. doi: 10.1109/TIT.2021.3078337.

[22] Erkka Haapasalo. Barycentric decompositions for extensive monotone divergences. arXiv preprint, 2025. doi: 10.48550/arXiv.2509.18725.

[23] Teiko Heinosaari, Takayuki Miyadera, and Mikko Tukiainen. An invitation to quantum incompatibility. Journal of Physics A: Mathematical and Theoretical, 49(12):123001, 2016. doi: 10.1088/1751-8113/49/12/123001. Survey; updated version available 2022.

[24] Arthur Hobson. A new theorem of information theory. Journal of Statistical Physics, 1(3):383–391, 1969. doi: 10.1007/BF01106578.

[25] Frederik B. Jensen. Asymptotic operational interpretations of generalized Rényi divergences. arXiv preprint, 2019.

[26] Rodney W. Johnson and John E. Shore. Axiomatic derivation of the principle of maximum entropy and the principle of minimum cross-entropy. IEEE Transactions on Information Theory, 26(1):26–37, 1980. doi: 10.1109/TIT.1980.1056144.

[27] Aleksandr Iakovlevich Khinchin. Mathematical Foundations of Information Theory. Dover, New York, 1957.

[28] A. N. Kolmogorov. Sur la notion de la moyenne. Atti della Reale Accademia Nazionale dei Lincei, 12:388–391, 1930.

[29] M. Ashok Kumar and Rajesh Sundaresan. Minimization problems based on relative α-entropy I: forward projection. IEEE Transactions on Information Theory, 62(9):5063–5080, 2016. doi: 10.1109/TIT.2016.2590465.

[30] Lucien Le Cam. Asymptotic Methods in Statistical Decision Theory. Springer Series in Statistics. Springer, 1986. doi: 10.1007/978-1-4612-4946-7.

[31] Chuong B. Leang and Don H. Johnson. On the asymptotics of M-hypothesis Bayesian detection. IEEE Transactions on Information Theory, 43(1):280–282, 1997. doi: 10.1109/18.567705.

[32] Kameo Matusita. Classification based on distance in multivariate Gaussian cases. Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, 1:299–304, 1967.

[33] John McCarthy. Measures of the value of information. Proceedings of the National Academy of Sciences, 42(9):654–655, 1956.

[34] Milán Mosonyi, Gergely Bunth, and Péter Vrana. Geometric relative entropies and barycentric Rényi divergences. Linear Algebra and its Applications, 699:159–276, 2024. doi: 10.1016/j.laa.2024.06.005.

[35] Xiaosheng Mu, Luciano Pomatto, Philipp Strack, and Omer Tamuz. From Blackwell dominance in large samples to Rényi divergences and back again. Econometrica, 89(1):475–506, 2021. doi: 10.3982/ECTA17548.

[36] Xiaosheng Mu, Luciano Pomatto, Philipp Strack, and Omer Tamuz. Monotone additive statistics. Econometrica, 92(4):995–1031, 2024. doi: 10.3982/ECTA19967.

[37] Mitio Nagumo. Über eine Klasse der Mittelwerte. Japanese Journal of Mathematics, 7:71–79, 1930.

[38] Michael Nussbaum and Arleta Szkoła. The Chernoff lower bound for symmetric quantum hypothesis testing. Annals of Statistics, 37(2):1040–1057, 2009. doi: 10.1214/08-AOS593.

[39] Luciano Pomatto, Philipp Strack, and Omer Tamuz. The cost of information: the case of constant marginal costs. American Economic Review, 113(5):1360–1393, 2023. doi: 10.1257/aer.20211094.

[40] Alfréd Rényi. On measures of entropy and information. In Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Contributions to the Theory of Statistics, volume 4, pages 547–562. University of California Press, 1961.

[41] Roberto Rubboli, Erkka Haapasalo, and Marco Tomamichel. A complete characterisation of conditional entropies. arXiv preprint, 2026. doi: 10.48550/arXiv.2601.23213.

[42] N. P. Salikhov. Asymptotic properties of the rate of mistakes in the problem of distinguishing between several statistical hypotheses. Trudy Mat. Inst. Steklov., 124:117–146, 1973. In Russian; English summary in Theory of Probability and its Applications.

[43] Emir H. Shuford Jr., Arthur Albert, and H. Edward Massengill. Admissible probability measurement procedures. Psychometrika, 31(2):125–145, 1966.

[44] Robin Sibson. Information radius. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, 14:149–160, 1969. doi: 10.1007/BF00537520.

[45] Erik Torgersen. Comparison of Statistical Experiments, volume 36 of Encyclopedia of Mathematics and its Applications. Cambridge University Press, Cambridge, 1991.

[46] Godfried T. Toussaint. Some properties of Matusita’s measure of affinity of several distributions. Annals of the Institute of Statistical Mathematics, 26(1):389–394, 1974. doi: 10.1007/BF02479845.

[47] Tim van Erven and Peter Harremoës. Rényi divergence and Kullback–Leibler divergence. IEEE Transactions on Information Theory, 60(7):3797–3820, 2014. doi: 10.1109/TIT.2014.2320500.

[48] Frits Verhagen, Marco Tomamichel, and Erkka Haapasalo. Matrix majorization in large samples with varying support restrictions. IEEE Transactions on Information Theory, 71(9):6517–6545, 2025. doi: 10.1109/TIT.2025.3585062.

[49] Manzil Zaheer, Satwik Kottur, Siamak Ravanbakhsh, Barnabás Póczos, Ruslan Salakhutdinov, and Alexander J. Smola. Deep sets. In Advances in Neural Information Processing Systems 30 (NIPS 2017), pages 3391–3401, 2017.