
arxiv: 1804.08838 · v1 · pith:AW6HF2JFnew · submitted 2018-04-24 · 💻 cs.LG · cs.NE · stat.ML

Measuring the Intrinsic Dimension of Objective Landscapes

keywords: dimension, intrinsic, parameters, problem, many, networks, objective, simple

Many recently trained neural networks employ large numbers of parameters to achieve good performance. One may intuitively use the number of parameters required as a rough gauge of the difficulty of a problem. But how accurate are such notions? How many parameters are really needed? In this paper we attempt to answer this question by training networks not in their native parameter space, but instead in a smaller, randomly oriented subspace. We slowly increase the dimension of this subspace, note at which dimension solutions first appear, and define this to be the intrinsic dimension of the objective landscape. The approach is simple to implement, computationally tractable, and produces several suggestive conclusions. Many problems have smaller intrinsic dimensions than one might suspect, and the intrinsic dimension for a given dataset varies little across a family of models with vastly different sizes. This latter result has the profound implication that once a parameter space is large enough to solve a problem, extra parameters serve directly to increase the dimensionality of the solution manifold. Intrinsic dimension allows some quantitative comparison of problem difficulty across supervised, reinforcement, and other types of learning where we conclude, for example, that solving the inverted pendulum problem is 100 times easier than classifying digits from MNIST, and playing Atari Pong from pixels is about as hard as classifying CIFAR-10. In addition to providing new cartography of the objective landscapes wandered by parameterized models, the method is a simple technique for constructively obtaining an upper bound on the minimum description length of a solution. A byproduct of this construction is a simple approach for compressing networks, in some cases by more than 100 times.
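The subspace procedure described in the abstract can be illustrated on a toy problem. The sketch below is not the authors' code: it assumes a quadratic objective ||A θ - b||² with m random linear constraints in D dimensions, parameterizes θ = θ0 + P z with a random D × d matrix P (columns normalized to unit length), and replaces gradient-based training with an exact least-squares solve, which is possible only because the toy objective is quadratic. All names (`subspace_loss`, `first_solving_dimension`, `P_full`, the tolerance `tol`) are hypothetical. For generic random draws the sweep should stop at d = m, since a d-dimensional random subspace meets the (D - m)-dimensional solution set once d ≥ m.

```python
import numpy as np


def subspace_loss(A, b, theta0, P):
    """Minimum of ||A @ theta - b||^2 over theta = theta0 + P @ z.

    Only the d subspace coordinates z are optimized; theta0 and P stay fixed.
    A least-squares solve stands in for training (toy quadratic objective).
    """
    z, *_ = np.linalg.lstsq(A @ P, b - A @ theta0, rcond=None)
    theta = theta0 + P @ z
    return float(np.sum((A @ theta - b) ** 2))


def first_solving_dimension(A, b, theta0, P_full, tol=1e-8):
    """Grow the subspace one column at a time; return the first d that solves the problem."""
    for d in range(1, P_full.shape[1] + 1):
        if subspace_loss(A, b, theta0, P_full[:, :d]) < tol:
            return d
    return None  # no solution found within the tested dimensions


rng = np.random.default_rng(0)
D, m, d_max = 100, 10, 20           # native dimension, constraints, largest subspace tried

A = rng.standard_normal((m, D))     # toy objective: satisfy m random linear constraints
b = rng.standard_normal(m)
theta0 = rng.standard_normal(D)     # fixed random starting point in the native space

P_full = rng.standard_normal((D, d_max))
P_full /= np.linalg.norm(P_full, axis=0)   # unit-length columns; nested subspaces share columns

d_int = first_solving_dimension(A, b, theta0, P_full)
print("first dimension with a solution:", d_int)
```

Under these assumptions, a solution found this way is fully specified by the d coordinates z together with the random seed that regenerates θ0 and P, which is the sense in which the construction gives a short description of a solution.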



Forward citations

Cited by 24 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. On the Policy Gradient Foundations of Group Relative Policy Optimization: Credit Assignment, Gradient Sparsity, and Rank Collapse

    cs.LG 2026-06 conditional novelty 8.0

    GRPO's group-mean baseline assigns identical advantages to all tokens under output-only rewards, inducing gradient sparsity and an intrinsic rank-2 structure proven from the zero-sum constraint and confirmed by SVD on...

  2. Neural Networks Provably Learn Spectral Representations for Group Composition

    cs.LG 2026-06 unverdicted novelty 8.0

    Two-layer neural networks provably converge almost surely to irreducible representations of finite groups when trained on the group composition task, with the dynamics governed by Riemannian gradient ascent on a repre...

  3. Understanding Geometric Representations in Self-Supervised Vision Transformers via Subspace Intervention

    cs.CV 2026-07 unverdicted novelty 7.0

    The subspace intervention framework reveals that pre-training objectives shape how ViTs encode geometric information in compressible low-rank subspaces, with peak precision at intermediate layers.

  4. Learning to Strategically Acquire Resources in Competition

    cs.GT 2026-06 unverdicted novelty 7.0

    A game-theoretic model for multi-agent resource acquisition establishes BNE existence and computability under common priors, convergence conditions for learning dynamics, and simulations on financial data.

  5. Rank Is Not Capacity: Spectral Occupancy for Latent Graph Models

    cs.LG 2026-05 unverdicted novelty 7.0

    Spectra defines and controls effective capacity in graph embeddings via the Shannon effective rank of a trace-normalized kernel spectrum, making capacity a post-fit property rather than a pre-training hyperparameter.

  6. LoRA: Low-Rank Adaptation of Large Language Models

    cs.CL 2021-06 accept novelty 7.0

    Adapting large language models by training only a low-rank decomposition BA added to frozen weight matrices matches full fine-tuning while cutting trainable parameters by orders of magnitude and adding no inference latency.

  7. Recoverable but Not Stationary: Local Linear Structures in Weights and Activations

    cs.LG 2026-06 unverdicted novelty 6.0

    Local low-rank task-gradient structures exist in weights and activations but are non-stationary, with initial recovery updates forming a basis capturing 77% of LoRA displacement and parameter steps aligning 0.58 cosin...

  8. Generative Criticality in Large Language Model Temperature Scaling

    cs.LG 2026-06 unverdicted novelty 6.0

    A statistical-field treatment of LLM outputs shows a susceptibility peak, order-parameter shift, and intrinsic-dimension minimum near a characteristic temperature Tc in softmax scaling.

  9. Latent-Conditioned Parameterized Quantum Circuits as Universal Approximators for Distributions over Quantum States

    quant-ph 2026-05 unverdicted novelty 6.0

    LPQCs are shown to universally approximate distributions over quantum density operators in 1-Wasserstein distance via a hybrid classical-quantum construction, with added multimodal priors and mixture-of-experts archit...

  10. Pretraining Induces a Reusable Spectral Basis for Downstream Task Adaptation

    cs.LG 2026-05 unverdicted novelty 6.0

    Pretraining induces stable leading singular vectors that form a reusable spectral basis inherited by downstream tasks, enabling competitive performance with 0.2% trainable parameters on GLUE.

  11. TLoRA: Task-aware Low Rank Adaptation of Large Language Models

    cs.CL 2026-04 unverdicted novelty 6.0

    TLoRA jointly optimizes LoRA initialization via task-data SVD and sensitivity-driven rank allocation, delivering stronger results than standard LoRA across NLU, reasoning, math, code, and chat tasks while using fewer ...

  12. HyperAdapt: Simple High-Rank Adaptation

    cs.LG 2025-09 unverdicted novelty 6.0

    HyperAdapt performs parameter-efficient fine-tuning by row- and column-wise diagonal scaling to induce high-rank updates with only n+m trainable parameters.

  13. Little by Little: Continual Learning via Incremental Mixture of Rank-1 Associative Memory Experts

    cs.LG 2025-06 unverdicted novelty 6.0

    MoRAM frames continual learning as incremental addition of rank-1 adapters viewed as self-activating key-value associative memory units in a mixture-of-experts setup.

  14. Enhancing Chat Language Models by Scaling High-quality Instructional Conversations

    cs.CL 2023-05 conditional novelty 6.0

    UltraChat supplies 1.5 million high-quality multi-turn dialogues that, when used to fine-tune LLaMA, produce UltraLLaMA, which outperforms prior open-source chat models including Vicuna.

  15. Language Models (Mostly) Know What They Know

    cs.CL 2022-07 unverdicted novelty 6.0

    Language models show good calibration when asked to estimate the probability that their own answers are correct, with performance improving as models get larger.

  16. A General Language Assistant as a Laboratory for Alignment

    cs.CL 2021-12 conditional novelty 6.0

    Ranked preference modeling outperforms imitation learning for language model alignment and scales more favorably with model size.

  17. Scaling Laws for Transfer

    cs.LG 2021-02 unverdicted novelty 6.0

    Effective data transferred from pre-training to fine-tuning is described by a power law in model parameter count and fine-tuning dataset size, acting like a multiplier on the fine-tuning data.

  18. Multi-task Self-Supervised Learning for Human Activity Detection

    cs.LG 2019-07 unverdicted novelty 6.0

    A multi-task self-supervised approach trains a temporal CNN to detect transformations on sensory data, yielding features that match or exceed fully supervised performance in semi-supervised and transfer settings for s...

  19. BaRA: Bayesian Adaptive Rank Allocation for Parameter-Efficient Fine-Tuning

    cs.LG 2026-06 unverdicted novelty 5.0

    BaRA adds Bayesian adaptive rank allocation to LoRA fine-tuning by activating sparse instance-specific latent factors, with a generalization bound depending on learned joint effective rank rather than fixed maximum rank.

  20. Comparing Classical Simulation and Sample-Based Learning of Quantum Systems: Learning the Hardness of Quantum Systems from Samples

    quant-ph 2026-05 unverdicted novelty 5.0

    Empirical study finds neural-network learning difficulty (via Hessian eigenvalue and random subspace optimization) correlates with classical simulation hardness parameterized by MPS bond dimension and T-gate count.

  21. Using predefined vector systems to speed up neural network multimillion class classification

    cs.LG 2026-04 unverdicted novelty 5.0

    Predefined vector systems structure neural network latent spaces to allow O(1) label prediction via index searches on embedding vectors, delivering up to 11.6x speedup on multimillion-class tasks while preserving accu...

  22. ARIA: Adaptive Retrieval Intelligence Assistant -- A Multimodal RAG Framework for Domain-Specific Engineering Education

    cs.IR 2026-02 conditional novelty 5.0

    ARIA is a multimodal RAG framework that filters domain-specific questions with 97.5% accuracy and outperforms ChatGPT-5 on pedagogical quality for a university civil engineering course.

  23. Geometric Analysis of Neural Regression Collapse via Intrinsic Dimension

    cs.LG 2025-10 unverdicted novelty 5.0

    Neural regression collapse occurs when last-layer feature intrinsic dimension falls below target intrinsic dimension, creating over-compressed and under-compressed regimes that govern generalization based on data quan...

  24. What does it mean to understand a neural network?

    cs.LG 2019-07 unverdicted novelty 4.0

    Simple training code produces complex neural networks, suggesting that brain learning rules may be easier to understand than mature brain properties and that neuroscience should shift focus accordingly.