citations
When Pith reviews a paper, every reference becomes its own page. Works cited by many reviewed papers bubble up as foundational candidates for the next Recognition Review. The graph below is the live result.
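As a rough illustration of how such a ranking could be computed, the sketch below counts how many reviewed papers cite each work and keeps the unreviewed ones. It is a minimal sketch, not Pith's actual pipeline: the function name `rank_foundational_candidates`, the input shapes, and the rule of counting each citing paper once are all assumptions made for the example.

```python
from collections import Counter


def rank_foundational_candidates(bibliographies, reviewed, top_n=10):
    """Rank unreviewed works by how many reviewed papers cite them.

    bibliographies: dict mapping a reviewed paper id to an iterable of cited work ids
                    (hypothetical input shape, assumed for this sketch).
    reviewed:       set of work ids that have already been reviewed.
    """
    counts = Counter()
    for refs in bibliographies.values():
        # set() so a paper that lists the same work twice is counted once.
        counts.update(set(refs))

    # Keep only works that have not been reviewed yet.
    candidates = [(work, n) for work, n in counts.items() if work not in reviewed]

    # Most-cited first; break ties by id so the order is deterministic.
    candidates.sort(key=lambda item: (-item[1], item[0]))
    return candidates[:top_n]
```

Called with three reviewed papers that all cite the same optimizer paper, the function would place that paper at the head of the queue.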
browse foundational works
The most-cited unreviewed papers in Pith's corpus. A live queue for future review.
submit for recognition review
Already reviewed by Pith? Request a deeper Recognition pass. Cite the result.
trace a paper's references
Open any reviewed paper in the home feed to see its bibliography graph.
foundational works · sample
-
Adam: A Method for Stochastic Optimization
A first-order stochastic optimizer that maintains bias-corrected exponential moving averages of the gradient and its square, dividing the former by the square root of the latter to set per-parameter step sizes.
-
Proximal Policy Optimization Algorithms
A clipped surrogate objective L^CLIP = E_t[min(r_t A_t, clip(r_t, 1-ε, 1+ε) A_t)] enables multi-epoch minibatch policy updates with TRPO-like stability while using only first-order optimization.
-
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
DeepSeekMath 7B reaches 51.7% on MATH via continued pretraining on curated web math data and Group Relative Policy Optimization.
-
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Pure reinforcement learning on LLMs produces emergent reasoning patterns and outperforms supervised models trained on human demonstrations on verifiable math, coding, and STEM tasks.
-
Training Verifiers to Solve Math Word Problems
Introduces the GSM8K dataset and demonstrates that verifier-based selection of solutions from multiple candidates outperforms fine-tuning baselines on math word problems.
-
Evaluating Large Language Models Trained on Code
Codex solves 28.8% of HumanEval problems with a single sample (pass@1), rising to 70.2% when 100 samples are drawn per problem.
-
LLaMA: Open and Efficient Foundation Language Models
LLaMA models (7B-65B params) are trained only on public data; LLaMA-13B outperforms GPT-3 (175B) on most benchmarks, and LLaMA-65B is competitive with top closed models like Chinchilla-70B and PaLM-540B.
-
Decoupled Weight Decay Regularization
-
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Vision Transformer (ViT) applies a standard transformer directly to image patches and matches or exceeds state-of-the-art CNN performance on classification benchmarks after large-scale pre-training.
-
Llama 2: Open Foundation and Fine-Tuned Chat Models
Llama 2 introduces pretrained and chat-optimized LLMs up to 70B parameters that surpass other open chat models on standard benchmarks and human evaluations of helpfulness and safety.
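Two of the summaries above describe update rules concretely enough to sketch in code. The following is a minimal, scalar, pure-Python illustration of the Adam step and the PPO clipped objective as summarized in this list. The function names and the default hyperparameters (`lr`, `beta1`, `beta2`, `eps`, `epsilon`) are assumptions chosen for the example, not values stated on this page, and real implementations operate on tensors rather than single floats.

```python
import math


def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar parameter at step t (t starts at 1)."""
    # Exponential moving averages of the gradient and of its square.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    # Bias correction compensates for the zero initialization of m and v.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Per-parameter step: first moment divided by the root of the second.
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


def ppo_clip_objective(ratios, advantages, epsilon=0.2):
    """Mean of min(r_t * A_t, clip(r_t, 1 - eps, 1 + eps) * A_t) over a batch."""
    terms = []
    for r, a in zip(ratios, advantages):
        clipped = max(1 - epsilon, min(r, 1 + epsilon))
        # Taking the minimum keeps the pessimistic (lower) estimate.
        terms.append(min(r * a, clipped * a))
    return sum(terms) / len(terms)
```

In the PPO sketch, a probability ratio far above 1 + ε with a positive advantage contributes only the clipped value, which is what removes the incentive to move the policy too far in one update.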