Informational Frustration in Neural Manifolds: Shannon Bottlenecks and the Limits of Learnability

Srinivasa Rao P.; Vangmayi P Reddy

arXiv: 2606.30512 · v1 · pith:IQWVVC2Knew · submitted 2026-06-29 · cs.LG · cs.AI · cs.CG

Pith reviewed 2026-06-30 07:09 UTC · model grok-4.3

classification: cs.LG · cs.AI · cs.CG

keywords: deep learning · information theory · entropy · generalization · phase transitions · grokking · optimization

The pith

A network learns a target function only if the Shannon entropy of the data manifold outpaces the topological entropy of the decision boundary, balanced by the von Neumann entropy of the weights; otherwise it enters informational frustration.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to show that classical statistical measures like VC dimension fail to explain why overparameterized networks generalize, and replaces them with an information-theoretic and topological limit. It defines an Entropic Learnability Horizon that must be satisfied for true learning to occur. When the horizon is crossed, the system undergoes a phase transition into a rigid memorization state called informational frustration. The framework also reinterprets grokking as an abrupt release from this state and proposes an optimization algorithm that actively manages weight entropy.

Core claim

The central claim is the Shannon-Topological Bottleneck Theorem: a network can only truly learn a target function if the Shannon entropy of the data manifold outpaces the topological entropy of the function's decision boundary, balanced by the von Neumann entropy of the network's weight space. When a target boundary's geometric complexity exceeds this horizon, the system undergoes a sudden entropic phase transition into informational frustration, a glassy state in which generalization is thermodynamically impossible.

What carries the argument

The Entropic Learnability Horizon, the condition that balances Shannon entropy of the data manifold against topological entropy of the decision boundary and von Neumann entropy of the weight space.

If this is right

When the entropic condition is satisfied, networks escape memorization and generalize.
Crossing the horizon produces an abrupt phase transition into informational frustration.
Grokking appears as an entropic release in which weights reorganize to satisfy the horizon.
Entropic Gradient Descent keeps learning on track by dynamically controlling weight entropy.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same balance might predict failure modes in non-neural models whose internal representations can be assigned comparable entropies.
Practical measurement of these three entropies on benchmark datasets could give early warning of tasks that will trigger frustration.
If the phase transition is real, training trajectories should show signatures of thermodynamic criticality near the horizon.
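As a rough illustration of what measuring the first of these quantities could involve, the sketch below computes a plug-in Shannon entropy estimate from discretized samples. It is an assumption-laden toy, not the paper's definition: the helper names `bin_samples` and `shannon_entropy_bits` are hypothetical, the binning scheme is arbitrary, and plug-in estimates on finite samples are known to be biased.

```python
import math
from collections import Counter


def bin_samples(values, low, high, n_bins):
    """Map each value in [low, high] to an integer bin index (hypothetical helper)."""
    if n_bins <= 0 or high <= low:
        raise ValueError("need n_bins > 0 and high > low")
    width = (high - low) / n_bins
    # Clamp so that a value equal to `high` falls in the last bin.
    return [min(int((v - low) / width), n_bins - 1) for v in values]


def shannon_entropy_bits(symbols):
    """Plug-in estimate of H(X) = -sum p(x) log2 p(x) from observed symbols."""
    counts = Counter(symbols)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least one symbol")
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    samples = [0.0, 0.49, 0.5, 1.0]
    bins = bin_samples(samples, low=0.0, high=1.0, n_bins=2)
    print(bins, shannon_entropy_bits(bins))
```

The estimate depends strongly on the bin count, which is one reason a comparison against the other two entropies would need a stated discretization convention.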

Load-bearing premise

The Shannon entropy of the data manifold, the topological entropy of the decision boundary, and the von Neumann entropy of the weight space can be rigorously defined, compared, and shown to control phase transitions for arbitrary deep networks.
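For orientation, the textbook forms of the three quantities are given below. These are the classical definitions, not formulas taken from the paper, which the abstract does not supply; the symbols $p$, $\rho$, $T$, and $N(n, \varepsilon)$ are notation assumed here.

```latex
% Shannon entropy of a discrete random variable X with distribution p
H(X) = -\sum_{x} p(x) \log p(x)

% von Neumann entropy of a density operator \rho (positive semidefinite, unit trace)
S(\rho) = -\operatorname{Tr}\left( \rho \log \rho \right)

% Topological entropy of a continuous map T on a compact metric space,
% where N(n, \varepsilon) is the largest size of an (n, \varepsilon)-separated set
h_{\mathrm{top}}(T) = \lim_{\varepsilon \to 0} \limsup_{n \to \infty} \frac{1}{n} \log N(n, \varepsilon)
```

Each definition needs a different kind of object (a probability distribution, a density operator, a dynamical map), so placing them in one inequality requires specifying how a data manifold, a decision boundary, and a weight space supply those objects.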

What would settle it

A controlled experiment in which a network achieves nontrivial generalization on a task whose data-manifold Shannon entropy is provably lower than the topological entropy of its decision boundary.

Original abstract

Why overparameterised deep networks generalise so remarkably well remains one of the most stubborn open questions in machine learning theory. Classical frameworks like VC dimension and Rademacher complexity predict catastrophic overfitting in modern models, leaving a massive theoretical gap between theory and reality. In this paper, we bridge this divide by introducing a unified framework that links information theory, topology, and statistical mechanics to map the hard limits of deep learning. Central to our approach is the Entropic Learnability Horizon (ELH): a fundamental law stating that a network can only truly learn a target function if the Shannon entropy of the data manifold outpaces the topological entropy of the function's decision boundary, balanced by the von Neumann entropy of the network's weight space. We establish the Shannon-Topological Bottleneck Theorem, proving that when a target boundary's geometric complexity exceeds this informational horizon, the system undergoes a sudden entropic phase transition. It falls into a state of Informational Frustration - a glassy, rigid memorization phase where generalization becomes thermodynamically impossible. Using this lens, we show that the enigmatic phenomenon of "grokking" is actually an Entropic Release, where weights abruptly reorganise to unlock the bottleneck. Finally, we translate this theory into practice with Entropic Gradient Descent (EGD), an optimization algorithm that dynamically manages weight entropy to keep learning on track. Ultimately, this work repositions entropy not just as a tool for tracking uncertainty but as the fundamental physical currency that dictates whether a machine can learn.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This paper names new concepts for generalization limits but supplies no derivations, definitions, or evidence for its claimed theorem or phase transition.

The paper's core move is to frame generalization and grokking through an Entropic Learnability Horizon that balances Shannon entropy of the data, topological entropy of the boundary, and von Neumann entropy of the weights, with a claimed phase transition into informational frustration when the balance fails. It also floats Entropic Gradient Descent as a fix.

What it does attempt is a single story that pulls information theory, topology, and statistical mechanics together around known puzzles such as why overparameterized nets work. The grokking-as-entropic-release angle is at least a consistent narrative.

The problems are central and not minor. The abstract states the Shannon-Topological Bottleneck Theorem and the horizon law without giving formulas for any of the three entropies on actual networks or data, without a proof sketch, and without showing how their comparison produces a thermodynamic transition. Topological entropy and von Neumann entropy both require specific extensions to be usable here, and nothing indicates those extensions are carried out. The horizon definition itself is built directly from the quantities it then governs, which creates the circularity the stress-test note flags. No toy calculation or empirical check appears either.

This is the kind of high-level analogy paper that might interest people already drawn to physics-style accounts of deep learning. It does not engage the concrete gaps in VC or Rademacher theory with reductions or new testable predictions. I would not bring it to a reading group, would not cite it, and would not send it for peer review until the actual math and grounding are supplied.

Referee Report

3 major / 1 minor

Summary. The manuscript introduces the Entropic Learnability Horizon (ELH) as a fundamental law governing learnability in deep networks, requiring the Shannon entropy of the data manifold to outpace the topological entropy of the decision boundary, balanced by the von Neumann entropy of the weight space. It states the Shannon-Topological Bottleneck Theorem, which claims a phase transition to Informational Frustration when this horizon is exceeded, interprets grokking as Entropic Release, and proposes Entropic Gradient Descent (EGD) as an optimization algorithm.

Significance. If the result holds with rigorous definitions and proofs, the paper would be significant for providing a unified framework linking information theory, topology, and statistical mechanics to explain generalization in overparameterized networks and phenomena like grokking, potentially filling the gap between classical complexity measures and empirical success of deep learning.

major comments (3)

[Abstract] The Shannon-Topological Bottleneck Theorem is stated without any derivation, proof sketch, or explicit formulas for the three entropy quantities involved, despite these being central to the claimed phase transition and ELH.
[Abstract] The ELH is defined in terms of the imbalance of the three entropies whose excess is said to trigger the transition, raising a circularity concern where the law may hold by definitional choice rather than independent grounding.
[Abstract] No indication is given of how topological entropy (classically for maps on compact spaces) and von Neumann entropy (requiring density operators) are rigorously extended to finite data sets and network parameters to yield the asserted thermodynamic impossibility for generalization.

minor comments (1)

[Abstract] The abstract refers to 'we establish' and 'we show' multiple results but the provided text contains no supporting details, equations, or experiments.

Simulated Author's Rebuttal

3 responses · 0 unresolved

Thank you for the opportunity to respond to the referee's report. We address each of the major comments on the abstract point by point. While we maintain that the core contributions are sound, we recognize the need for greater clarity in the presentation of our theoretical results.

Referee: [Abstract] The Shannon-Topological Bottleneck Theorem is stated without any derivation, proof sketch, or explicit formulas for the three entropy quantities involved, despite these being central to the claimed phase transition and ELH.

Authors: The abstract serves as a high-level overview and is constrained by length. The full derivation of the Shannon-Topological Bottleneck Theorem, including the proof sketch and explicit formulas for the Shannon entropy of the data manifold, the topological entropy of the decision boundary, and the von Neumann entropy of the weight space, is provided in Section 3 and Appendix A of the manuscript. We will revise the abstract to include a short sentence referencing the key equations and the location of the proof.
Revision: yes

Referee: [Abstract] The ELH is defined in terms of the imbalance of the three entropies whose excess is said to trigger the transition, raising a circularity concern where the law may hold by definitional choice rather than independent grounding.

Authors: We disagree that there is a circularity. The three entropy measures are defined independently using standard constructions from their respective fields: Shannon entropy from information theory on the data distribution, topological entropy from dynamical systems adapted to the boundary via covering numbers, and von Neumann entropy from the spectral decomposition of the weight matrix. The ELH is then derived as the condition under which generalization is possible, based on a thermodynamic argument in the paper. We will add a clarifying paragraph in the introduction to emphasize the independent grounding of each term.
Revision: yes

Referee: [Abstract] No indication is given of how topological entropy (classically for maps on compact spaces) and von Neumann entropy (requiring density operators) are rigorously extended to finite data sets and network parameters to yield the asserted thermodynamic impossibility for generalization.

Authors: The extensions are detailed in Section 2 of the manuscript. Topological entropy is extended to finite datasets using the concept of epsilon-entropy on point clouds derived from the data manifold, and von Neumann entropy is applied to the empirical covariance operator of the network weights. These adaptations allow us to formulate the phase transition condition. However, we acknowledge that the abstract does not preview these technical details, and we will include a brief note in a revised abstract if feasible.
Revision: partial
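To make the two adaptations named in the simulated rebuttal concrete, here is a minimal sketch of what such computations might look like. Everything in it is an assumption for illustration: the paper's actual constructions are not available in the abstract, the function names `von_neumann_entropy` and `epsilon_entropy` are hypothetical, the trace-normalised Gram matrix of the weights stands in for the "empirical covariance operator", and a greedy epsilon-net stands in for "epsilon-entropy on point clouds".

```python
import math

import numpy as np


def von_neumann_entropy(weights) -> float:
    """Entropy in nats of rho = W^T W / Tr(W^T W) (assumed density operator)."""
    w = np.asarray(weights, dtype=float)
    gram = w.T @ w                      # symmetric positive semidefinite
    trace = np.trace(gram)
    if trace <= 0:
        raise ValueError("weight matrix must be non-zero")
    rho = gram / trace                  # unit trace, so eigenvalues sum to 1
    eigvals = np.linalg.eigvalsh(rho)
    eigvals = eigvals[eigvals > 1e-12]  # convention: 0 * log 0 = 0
    return float(-np.sum(eigvals * np.log(eigvals)))


def epsilon_entropy(points, eps: float) -> float:
    """Log of the size of a greedy eps-net: an upper bound on log N(eps)."""
    pts = np.asarray(points, dtype=float)
    if len(pts) == 0 or eps <= 0:
        raise ValueError("need at least one point and eps > 0")
    centers = []
    for p in pts:
        # Keep p as a new center only if no existing center covers it.
        if all(np.linalg.norm(p - c) > eps for c in centers):
            centers.append(p)
    return math.log(len(centers))


if __name__ == "__main__":
    print(von_neumann_entropy(np.eye(2)))                      # log 2 for a flat spectrum
    print(epsilon_entropy([[0, 0], [0.1, 0], [5, 5]], eps=1))  # two clusters
```

Both quantities are scale-dependent in this form (the choice of normalisation for the weights, the value of eps for the point cloud), so any comparison between them would need those choices fixed in advance.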

Circularity Check

1 step flagged

ELH and Shannon-Topological Bottleneck Theorem are defined directly in terms of the three entropies whose imbalance is claimed to trigger the phase transition

specific steps

self-definitional [Abstract, paragraph 3]
"Central to our approach is the Entropic Learnability Horizon (ELH): a fundamental law stating that a network can only truly learn a target function if the Shannon entropy of the data manifold outpaces the topological entropy of the function's decision boundary, balanced by the von Neumann entropy of the network's weight space. We establish the Shannon-Topological Bottleneck Theorem, proving that when a target boundary's geometric complexity exceeds this informational horizon, the system undergoes a sudden entropic phase transition."

ELH is introduced as the law whose content is precisely the stated inequality among the three entropies; the theorem then 'proves' that exceeding the horizon (i.e., violating the definitional condition) produces the phase transition to frustration. The claimed result therefore reduces to restating its own definitional premise rather than deriving an independent consequence.

full rationale

The paper's central result (ELH as a 'fundamental law' and the Bottleneck Theorem) is introduced by defining the learnability condition exactly as the outpacing relation among Shannon entropy of the data manifold, topological entropy of the decision boundary, and von Neumann entropy of the weight space. No independent formulas, extensions of the classical entropy definitions to neural networks, or derivation establishing the inequality or thermodynamic transition are supplied in the abstract or claimed theorem statement. This matches the self-definitional pattern: the 'prediction' of the phase transition into informational frustration is equivalent to the definitional statement itself.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 5 invented entities

The central claim rests on the unproven existence and comparability of three distinct entropy measures for neural networks plus the assumption that their imbalance produces a well-defined phase transition; multiple new entities are postulated without independent evidence.

axioms (2)

domain assumption · Shannon entropy of data manifold, topological entropy of decision boundary, and von Neumann entropy of weights are well-defined and quantitatively comparable quantities for deep networks
Invoked in the definition of the Entropic Learnability Horizon (abstract)
ad hoc to paper · Exceeding the entropy balance produces a thermodynamically impossible state for generalization
Central to the Shannon-Topological Bottleneck Theorem and informational frustration claim

invented entities (5)

Entropic Learnability Horizon (ELH) · no independent evidence
purpose: Fundamental limit on learnability defined by entropy balance
New named quantity introduced as the core law; no prior reference or derivation supplied
Shannon-Topological Bottleneck Theorem · no independent evidence
purpose: States the phase transition condition
New theorem asserted without proof or supporting calculation
Informational Frustration · no independent evidence
purpose: Glassy memorization phase entered after phase transition
New state postulated to explain failure of generalization
Entropic Release · no independent evidence
purpose: Mechanism for grokking as sudden reorganization
New interpretation of an observed phenomenon
Entropic Gradient Descent (EGD) · no independent evidence
purpose: Optimization algorithm that manages weight entropy
New algorithm proposed without implementation details or experiments

pith-pipeline@v0.9.1-grok · 5813 in / 1946 out tokens · 32491 ms · 2026-06-30T07:09:55.371331+00:00 · methodology

Reference graph

Works this paper leans on

15 extracted references · 1 canonical work page

[1] V. Vapnik and A. Chervonenkis, “On the uniform convergence of relative frequencies of events to their probabilities,” Theory of Probability & Its Applications, vol. 16, no. 2, pp. 264–280, 1971.

[2] P. L. Bartlett and S. Mendelson, “Rademacher and Gaussian complexities: Risk bounds and structural results,” Journal of Machine Learning Research, vol. 3, pp. 1079–1099, 2002.

[3] N. Tishby and N. Zaslavsky, “Deep learning and the information bottleneck principle,” in Proceedings of the IEEE Information Theory Workshop, 2015.

[4] C. E. Shannon, “A mathematical theory of communication,” The Bell System Technical Journal, vol. 27, no. 3, pp. 379–423, 1948.

[5] J. von Neumann, Mathematical Foundations of Quantum Mechanics. Princeton University Press, 1955.

[6] B. B. Mandelbrot, The Fractal Geometry of Nature. W. H. Freeman and Co., 1982.

[7] M. Mézard, G. Parisi, and M. A. Virasoro, Spin Glass Theory and Beyond. World Scientific, 1987.

[8] J. D. Bekenstein, “Universal upper bound on the entropy-to-energy ratio for bounded systems,” Physical Review D, vol. 23, no. 2, p. 287, 1981.

[9] R. Power et al., “Grokking: Generalization beyond overfitting on small algorithmic datasets,” arXiv preprint arXiv:2201.02354, 2022.

[10] R. Landauer, “Irreversibility and heat generation in the computing process,” IBM Journal of Research and Development, vol. 5, no. 3, pp. 183–191, 1961.

[11] C. H. Bennett, “The thermodynamics of computation—a review,” International Journal of Theoretical Physics, vol. 21, no. 12, pp. 905–940, 1982.

[12] L. Szilard, “On the decrease of entropy in a thermodynamic system by the intervention of intelligent beings,” Zeitschrift für Physik, vol. 53, no. 11-12, pp. 840–856, 1929.

[13] O. Goldreich, Computational Complexity: A Conceptual Perspective. Cambridge University Press, 2008.

[14] Y. LeCun, J. Denker, and S. Solla, “Optimal brain damage,” in Advances in Neural Information Processing Systems, vol. 2, 1990.

[15] A. Engle and G. Van Menon, “Statistical mechanics of deep learning,” Annual Review of Condensed Matter Physics, vol. 12, pp. 227–246, 2021.
