Informational Frustration in Neural Manifolds: Shannon Bottlenecks and the Limits of Learnability
Pith reviewed 2026-06-30 07:09 UTC · model grok-4.3
The pith
A network learns a target function only if the Shannon entropy of the data manifold outpaces the topological entropy of the decision boundary, balanced by the von Neumann entropy of the weights; otherwise it enters informational frustration.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is the Shannon-Topological Bottleneck Theorem: a network can only truly learn a target function if the Shannon entropy of the data manifold outpaces the topological entropy of the function's decision boundary, balanced by the von Neumann entropy of the network's weight space. When a target boundary's geometric complexity exceeds this horizon, the system undergoes a sudden entropic phase transition into informational frustration, a glassy state in which generalization is thermodynamically impossible.
What carries the argument
The Entropic Learnability Horizon, the condition that balances Shannon entropy of the data manifold against topological entropy of the decision boundary and von Neumann entropy of the weight space.
If this is right
- When the entropic condition is satisfied, networks escape memorization and generalize.
- Crossing the horizon produces an abrupt phase transition into informational frustration.
- Grokking appears as an entropic release in which weights reorganize to satisfy the horizon.
- Entropic Gradient Descent keeps learning on track by dynamically controlling weight entropy.
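The source does not specify how Entropic Gradient Descent controls weight entropy, so the following is only a hypothetical sketch of one way such control could look, not the paper's algorithm. It assumes "weight entropy" means the spectral entropy of a single weight matrix (the von Neumann entropy of W^T W normalized to unit trace) and adds a quadratic penalty that pulls this entropy toward a chosen target. The names `spectral_entropy`, `spectral_entropy_grad`, and `entropy_controlled_step`, and the parameters `lam` and `target`, are all illustrative assumptions.

```python
import numpy as np

def spectral_entropy(W, tiny=1e-12):
    """Entropy of p_i = s_i^2 / sum_j s_j^2, where s are the singular values of W.

    Assumption: this stands in for the "von Neumann entropy of the weights".
    It equals -Tr(rho log rho) for rho = W^T W / Tr(W^T W).
    """
    s = np.linalg.svd(W, compute_uv=False)
    p = s**2 / np.sum(s**2)
    return float(-np.sum(p * np.log(np.maximum(p, tiny))))

def spectral_entropy_grad(W, tiny=1e-12):
    """Analytic gradient of spectral_entropy with respect to W, via the SVD."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    Z = np.sum(s**2)
    p = s**2 / Z
    log_p = np.log(np.maximum(p, tiny))
    S = -np.sum(p * log_p)
    dS_ds = -(2.0 * s / Z) * (log_p + S)  # derivative w.r.t. each singular value
    return (U * dS_ds) @ Vt               # U diag(dS_ds) V^T

def entropy_controlled_step(W, loss_grad, lr=0.1, lam=1.0, target=0.5):
    """One descent step on loss + (lam / 2) * (S(W) - target)^2 (hypothetical)."""
    S = spectral_entropy(W)
    penalty_grad = lam * (S - target) * spectral_entropy_grad(W)
    return W - lr * (loss_grad + penalty_grad)

# Usage: with a zero task gradient, the penalty alone moves the entropy
# of a skewed matrix toward the target (here log 2, the maximum for rank 2).
W = np.array([[3.0, 0.0], [0.0, 1.0]])
target = np.log(2.0)
for _ in range(50):
    W = entropy_controlled_step(W, np.zeros_like(W), lr=0.5, target=target)
```

In a real training loop `loss_grad` would be the task gradient for that layer. Whether a penalty of this kind matches what the authors mean by "dynamically controlling weight entropy" cannot be determined from the abstract.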
Where Pith is reading between the lines
- The same balance might predict failure modes in non-neural models whose internal representations can be assigned comparable entropies.
- Practical measurement of these three entropies on benchmark datasets could give early warning of tasks that will trigger frustration.
- If the phase transition is real, training trajectories should show signatures of thermodynamic criticality near the horizon.
Load-bearing premise
The Shannon entropy of the data manifold, the topological entropy of the decision boundary, and the von Neumann entropy of the weight space can be rigorously defined, compared, and shown to control phase transitions for arbitrary deep networks.
What would settle it
A controlled experiment in which a network achieves nontrivial generalization on a task whose data-manifold Shannon entropy is provably lower than the topological entropy of its decision boundary.
Original abstract
Why overparameterised deep networks generalise so remarkably well remains one of the most stubborn open questions in machine learning theory. Classical frameworks like VC dimension and Rademacher complexity predict catastrophic overfitting in modern models, leaving a massive theoretical gap between theory and reality. In this paper, we bridge this divide by introducing a unified framework that links information theory, topology, and statistical mechanics to map the hard limits of deep learning. Central to our approach is the Entropic Learnability Horizon (ELH): a fundamental law stating that a network can only truly learn a target function if the Shannon entropy of the data manifold outpaces the topological entropy of the function's decision boundary, balanced by the von Neumann entropy of the network's weight space. We establish the Shannon-Topological Bottleneck Theorem, proving that when a target boundary's geometric complexity exceeds this informational horizon, the system undergoes a sudden entropic phase transition. It falls into a state of Informational Frustration - a glassy, rigid memorization phase where generalization becomes thermodynamically impossible. Using this lens, we show that the enigmatic phenomenon of "grokking" is actually an Entropic Release, where weights abruptly reorganise to unlock the bottleneck. Finally, we translate this theory into practice with Entropic Gradient Descent (EGD), an optimization algorithm that dynamically manages weight entropy to keep learning on track. Ultimately, this work repositions entropy not just as a tool for tracking uncertainty but as the fundamental physical currency that dictates whether a machine can learn.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces the Entropic Learnability Horizon (ELH) as a fundamental law governing learnability in deep networks, requiring the Shannon entropy of the data manifold to outpace the topological entropy of the decision boundary, balanced by the von Neumann entropy of the weight space. It establishes the Shannon-Topological Bottleneck Theorem claiming a phase transition to Informational Frustration when this horizon is exceeded, interprets grokking as Entropic Release, and proposes Entropic Gradient Descent (EGD) as an optimization algorithm.
Significance. If the result holds with rigorous definitions and proofs, the paper would be significant for providing a unified framework linking information theory, topology, and statistical mechanics to explain generalization in overparameterized networks and phenomena like grokking, potentially filling the gap between classical complexity measures and empirical success of deep learning.
Major comments (3)
- [Abstract] The Shannon-Topological Bottleneck Theorem is stated without any derivation, proof sketch, or explicit formulas for the three entropy quantities involved, despite these being central to the claimed phase transition and ELH.
- [Abstract] The ELH is defined in terms of the imbalance of the three entropies whose excess is said to trigger the transition, raising a circularity concern where the law may hold by definitional choice rather than independent grounding.
- [Abstract] No indication is given of how topological entropy (classically for maps on compact spaces) and von Neumann entropy (requiring density operators) are rigorously extended to finite data sets and network parameters to yield the asserted thermodynamic impossibility for generalization.
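For reference, the classical definitions behind these comments can be written as follows. These are the standard textbook forms only; the symbols (p, T, d_n, N(n, ε), ρ) are the usual conventions rather than notation taken from the manuscript, and the manuscript's adapted versions are not reproduced in the source.

```latex
% Shannon entropy of a discrete random variable X with distribution p
H(X) = -\sum_{x} p(x) \log p(x)

% Topological entropy (Bowen's definition) of a continuous map T on a
% compact metric space (X, d), where
%   d_n(x, y) = \max_{0 \le k < n} d(T^k x, T^k y)
% and N(n, \varepsilon) is the largest cardinality of a set whose points
% are pairwise more than \varepsilon apart in d_n
h_{\mathrm{top}}(T) = \lim_{\varepsilon \to 0} \limsup_{n \to \infty}
    \frac{1}{n} \log N(n, \varepsilon)

% von Neumann entropy of a density operator \rho
% (positive semidefinite, \operatorname{Tr} \rho = 1)
S(\rho) = -\operatorname{Tr}(\rho \log \rho)
```

The third comment turns on the gap visible here: topological entropy is a property of a map iterated on a compact space, and von Neumann entropy needs a unit-trace positive semidefinite operator, so applying either to a static decision boundary or to a set of network weights requires an explicit construction.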
Minor comments (1)
- [Abstract] The abstract refers to 'we establish' and 'we show' multiple results but the provided text contains no supporting details, equations, or experiments.
Simulated Author's Rebuttal
Thank you for the opportunity to respond to the referee's report. We address each of the major comments on the abstract point by point. While we maintain that the core contributions are sound, we recognize the need for greater clarity in the presentation of our theoretical results.
Point-by-point responses
- Referee: [Abstract] The Shannon-Topological Bottleneck Theorem is stated without any derivation, proof sketch, or explicit formulas for the three entropy quantities involved, despite these being central to the claimed phase transition and ELH.
  Authors: The abstract serves as a high-level overview and is constrained by length. The full derivation of the Shannon-Topological Bottleneck Theorem, including the proof sketch and explicit formulas for the Shannon entropy of the data manifold, the topological entropy of the decision boundary, and the von Neumann entropy of the weight space, is provided in Section 3 and Appendix A of the manuscript. We will revise the abstract to include a short sentence referencing the key equations and the location of the proof.
  Revision: yes
- Referee: [Abstract] The ELH is defined in terms of the imbalance of the three entropies whose excess is said to trigger the transition, raising a circularity concern where the law may hold by definitional choice rather than independent grounding.
  Authors: We disagree that there is a circularity. The three entropy measures are defined independently using standard constructions from their respective fields: Shannon entropy from information theory on the data distribution, topological entropy from dynamical systems adapted to the boundary via covering numbers, and von Neumann entropy from the spectral decomposition of the weight matrix. The ELH is then derived as the condition under which generalization is possible, based on a thermodynamic argument in the paper. We will add a clarifying paragraph in the introduction to emphasize the independent grounding of each term.
  Revision: yes
- Referee: [Abstract] No indication is given of how topological entropy (classically for maps on compact spaces) and von Neumann entropy (requiring density operators) are rigorously extended to finite data sets and network parameters to yield the asserted thermodynamic impossibility for generalization.
  Authors: The extensions are detailed in Section 2 of the manuscript. Topological entropy is extended to finite datasets using the concept of epsilon-entropy on point clouds derived from the data manifold, and von Neumann entropy is applied to the empirical covariance operator of the network weights. These adaptations allow us to formulate the phase transition condition. However, we acknowledge that the abstract does not preview these technical details, and we will include a brief note in a revised abstract if feasible.
  Revision: partial
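The two finite-sample quantities named in the rebuttal can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the manuscript's construction: "epsilon-entropy on point clouds" is taken here to be the log of the size of a greedy ε-net (an upper bound on the minimal covering number), and "von Neumann entropy of the empirical covariance operator" is taken to be -Tr(ρ log ρ) for the covariance of the weight rows normalized to unit trace. The function names are hypothetical.

```python
import numpy as np

def von_neumann_entropy(weights, tiny=1e-12):
    """-Tr(rho log rho) for rho = C / Tr(C), with C the empirical covariance
    of the rows of `weights` (assumed reading of the rebuttal)."""
    weights = np.asarray(weights, dtype=float)
    centered = weights - weights.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / weights.shape[0]
    rho = cov / np.trace(cov)
    eigvals = np.linalg.eigvalsh(rho)   # rho is symmetric
    eigvals = eigvals[eigvals > tiny]   # treat 0 * log 0 as 0
    return float(-np.sum(eigvals * np.log(eigvals)))

def epsilon_entropy(points, eps):
    """Log of the size of a greedy eps-net of a point cloud of shape (n, d).

    Every point ends up within `eps` of some chosen center, so the count
    upper-bounds the minimal number of eps-balls (centered at data points)
    needed to cover the cloud.
    """
    points = np.asarray(points, dtype=float)
    uncovered = np.ones(len(points), dtype=bool)
    n_centers = 0
    while uncovered.any():
        center = points[np.argmax(uncovered)]  # first still-uncovered point
        dists = np.linalg.norm(points - center, axis=1)
        uncovered &= dists > eps               # mark its eps-ball as covered
        n_centers += 1
    return float(np.log(n_centers))

# Usage on toy inputs
line = np.arange(10, dtype=float).reshape(-1, 1)   # ten points spaced 1 apart
coarse = epsilon_entropy(line, eps=100.0)          # one ball covers everything
fine = epsilon_entropy(line, eps=0.5)              # every point needs its own ball
isotropic = von_neumann_entropy([[1, 0], [-1, 0], [0, 1], [0, -1]])
```

Both quantities depend on choices the abstract leaves open (the scale ε, which weights enter the covariance, how layers are combined), which is exactly where the referee asks for rigor.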
Circularity Check
The ELH and the Shannon-Topological Bottleneck Theorem are defined directly in terms of the three entropies whose imbalance is claimed to trigger the phase transition.
Specific steps
- Self-definitional [Abstract, paragraph 3]
"Central to our approach is the Entropic Learnability Horizon (ELH): a fundamental law stating that a network can only truly learn a target function if the Shannon entropy of the data manifold outpaces the topological entropy of the function's decision boundary, balanced by the von Neumann entropy of the network's weight space. We establish the Shannon-Topological Bottleneck Theorem, proving that when a target boundary's geometric complexity exceeds this informational horizon, the system undergoes a sudden entropic phase transition."
ELH is introduced as the law whose content is precisely the stated inequality among the three entropies; the theorem then 'proves' that exceeding the horizon (i.e., violating the definitional condition) produces the phase transition to frustration. The claimed result therefore reduces to restating its own definitional premise rather than deriving an independent consequence.
Full rationale
The paper's central result (ELH as a 'fundamental law' and the Bottleneck Theorem) is introduced by defining the learnability condition exactly as the outpacing relation among Shannon entropy of the data manifold, topological entropy of the decision boundary, and von Neumann entropy of the weight space. No independent formulas, extensions of the classical entropy definitions to neural networks, or derivation establishing the inequality or thermodynamic transition are supplied in the abstract or claimed theorem statement. This matches the self-definitional pattern: the 'prediction' of the phase transition into informational frustration is equivalent to the definitional statement itself.
Axiom & Free-Parameter Ledger
Axioms (2)
- Domain assumption: the Shannon entropy of the data manifold, the topological entropy of the decision boundary, and the von Neumann entropy of the weights are well-defined and quantitatively comparable quantities for deep networks.
- Ad hoc to paper: exceeding the entropy balance produces a state in which generalization is thermodynamically impossible.
Invented entities (5)
- Entropic Learnability Horizon (ELH): no independent evidence
- Shannon-Topological Bottleneck Theorem: no independent evidence
- Informational Frustration: no independent evidence
- Entropic Release: no independent evidence
- Entropic Gradient Descent (EGD): no independent evidence