Optimal scaling of MCMC algorithms: exploiting the symmetry of the Metropolis-Hastings formula
Pith reviewed 2026-07-02 01:57 UTC · model grok-4.3
The pith
Symmetry of the Metropolis-Hastings formula yields MCMC proposals whose variance can scale as O(1/d^μ) for arbitrarily small μ>0.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By exploiting the symmetry of the Metropolis-Hastings formula, one obtains a general method for determining how the optimal proposal variance scales with the dimension d for Metropolised algorithms applied to product targets (including targets whose components are scaled differently). Among other results, the method yields a family of gradient-based proposals whose variance may be chosen as O(1/d^μ) with μ>0 arbitrarily small while still preserving a positive limiting acceptance rate.
What carries the argument
Symmetry of the Metropolis-Hastings acceptance ratio under exchange of the current and proposed states (the ratio for the reverse move is the reciprocal of the ratio for the forward move), which in the high-dimensional product limit reduces the scaling calculation to a calculation on a single factor of the product.
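A short worked version of this step, assuming the symmetry in question is the standard exchange identity for the Metropolis-Hastings ratio (the summary does not write it out; Z, m and s are notation introduced here). Take X ∼ π_d, Y ∼ q(X, ·), positive densities, and Z = Z(X, Y):

```latex
\begin{aligned}
Z(x,y) &= \log\frac{\pi_d(y)\,q(y,x)}{\pi_d(x)\,q(x,y)}, \qquad Z(y,x) = -Z(x,y), \\
\mathbb{E}\big[e^{Z}\big] &= \iint \pi_d(x)\,q(x,y)\,e^{Z(x,y)}\,dx\,dy
  = \iint \pi_d(y)\,q(y,x)\,dx\,dy = 1, \\
Z \approx \mathcal{N}(m, s^2)
  &\;\Longrightarrow\; e^{m + s^2/2} = 1,\ \text{i.e. } m = -\tfrac{s^2}{2}, \qquad
  \mathbb{E}\big[\min(1, e^{Z})\big] \approx 2\,\Phi\!\left(-\tfrac{s}{2}\right).
\end{aligned}
```

For a product target with a componentwise proposal, Z is a sum of d per-factor terms, so a central limit theorem makes the Gaussian approximation plausible (passing E[e^Z] = 1 to the limit needs uniform integrability). The identity pins the mean to -s²/2, so the limiting acceptance rate depends on the single number s, and the optimal exponent μ is the one that keeps s² bounded away from 0 and ∞ as d grows.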
If this is right
- The same symmetry argument supplies optimal scaling for implicit proposals and for proposals obtained from differential-equation integrators.
- The analysis continues to hold when the factors of the product are scaled by different constants.
- Gradient-based proposals can be tuned to a wider range of dimensional scalings than the classical MALA choice μ = 1/3.
- Known results for random-walk Metropolis and standard MALA appear as special cases of the general framework.
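A minimal sketch of how the two classical special cases come out of one formula, assuming the classical Gaussian limits of [18] and [19]: the limiting acceptance rate is 2Φ(-s/2), where s is proportional to ℓ for Random Walk Metropolis and to ℓ³ for MALA (ℓ being the proposal-scale constant), and efficiency is measured by ℓ² times the acceptance rate. The function name `optimal_acceptance` and the ternary-search optimizer are illustrative choices, not the paper's.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF


def optimal_acceptance(p, lo=1e-6, hi=20.0, iters=200):
    """Maximise the efficiency proxy ell^2 * a(ell), where the limiting
    acceptance rate is a = 2 * Phi(-s / 2) with s proportional to ell**p
    (constants absorbed into s). Returns (s_opt, a(s_opt)).

    p = 1: Random Walk Metropolis limit; p = 3: MALA limit.
    """
    def efficiency(s):
        # ell^2 is proportional to s**(2/p); constant factors do not move the maximiser.
        return s ** (2.0 / p) * PHI(-s / 2.0)

    for _ in range(iters):  # ternary search: efficiency is unimodal on (lo, hi)
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if efficiency(m1) < efficiency(m2):
            lo = m1
        else:
            hi = m2
    s_opt = 0.5 * (lo + hi)
    return s_opt, 2.0 * PHI(-s_opt / 2.0)


for name, p in (("RWM", 1), ("MALA", 3)):
    s_opt, acc = optimal_acceptance(p)
    print(f"{name}: s_opt = {s_opt:.3f}, optimal acceptance = {acc:.3f}")
```

The optimal acceptance rate depends only on the exponent p, not on the constants absorbed into s, which is why the familiar values of about 0.234 (RWM) and 0.574 (MALA) do not depend on those constants.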
Where Pith is reading between the lines
- The symmetry reduction may still give useful guidance for targets that are only approximately factorized, such as weakly dependent coordinates in high-dimensional posteriors.
- The approach could be used to design new proposal mechanisms that deliberately preserve a form of reversibility symmetry even when the target is not a product.
Load-bearing premise
The target density must factor exactly as a product of copies of a given (not necessarily univariate) distribution, possibly with differently scaled factors, so that the high-dimensional limit can be reduced to independent calculations on a single factor.
What would settle it
For a product target with d components, generate samples with one of the paper's gradient-based (MALA-like) proposals tuned so that its variance is O(1/d^{0.01}); if the empirical acceptance rate tends to zero as d grows, the claimed scaling fails.
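A minimal simulation harness for that test, assuming a standard Gaussian product target N(0, I_d) and using plain MALA as a stand-in proposal (the paper's MALA-like construction is not reproduced here; the two lines marked in the inner loop are where another mechanism would plug in). For this target the per-component log Metropolis-Hastings ratio has the closed form -(h/8)(y_i² - x_i²). With h = ℓ² d^{-1/3} the acceptance rate should settle near 2Φ(-s/2) with s² ≈ d h³/16 (leading order in h), while with h = ℓ² d^{-0.01} plain MALA's acceptance collapses; a proposal that genuinely supports μ = 0.01 must not show that collapse.

```python
import math
import random
from statistics import NormalDist


def mala_acceptance_rate(d, h, n_trials, rng):
    """Monte Carlo estimate of the stationary acceptance rate of plain MALA
    (a stand-in for the paper's proposals) on the product target N(0, I_d)."""
    total = 0.0
    sqrt_h = math.sqrt(h)
    for _ in range(n_trials):
        log_ratio = 0.0
        for _ in range(d):
            x = rng.gauss(0.0, 1.0)  # stationary state: x_i ~ N(0, 1)
            # Proposal and its exact per-component log MH ratio
            # (replace both lines to test another mechanism):
            y = (1.0 - 0.5 * h) * x + sqrt_h * rng.gauss(0.0, 1.0)
            log_ratio += -(h / 8.0) * (y * y - x * x)
        total += 1.0 if log_ratio >= 0.0 else math.exp(log_ratio)  # min(1, e^Z)
    return total / n_trials


def predicted_rate(d, h):
    """Leading-order (small h) Gaussian limit: Z ~ N(-s^2/2, s^2), s^2 = d h^3 / 16."""
    s = math.sqrt(d * h**3 / 16.0)
    return 2.0 * NormalDist().cdf(-s / 2.0)


if __name__ == "__main__":
    rng = random.Random(0)
    ell = 1.6
    for mu in (1 / 3, 0.01):
        for d in (100, 400):
            h = ell**2 / d**mu
            emp = mala_acceptance_rate(d, h, 400, rng)
            print(f"mu={mu:.2f}  d={d:4d}  h={h:.3f}  "
                  f"empirical={emp:.3f}  predicted={predicted_rate(d, h):.3f}")
```

Drawing a fresh stationary state for every trial estimates the expected acceptance rate at stationarity directly, which is the quantity the scaling limits describe.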
Original abstract
We present a simple, yet general approach to study the scaling properties as the dimensionality of Metropolised MCMC sampling algorithms increases. The study relies ultimately on the symmetry of the Metropolis-Hastings formula. Our findings contain, as particular cases, many known results for the Random Walk Metropolis, MALA and other algorithms. In addition, they provide, in an easy way, new optimal scaling results for a variety of proposal mechanisms, including implicit proposals and proposals generated with the help of differential equation integrators. The analysis applies to targets that are products of a given, not necessarily univariate distribution, and also to cases where the different terms in the product are scaled differently. We show how to construct gradient-based MALA-like proposals where the variance of the proposal as the dimension $d$ increases may be taken as $O(1/d^\mu)$, with $\mu>0$ arbitrarily small, to be compared with the values $\mu = 1$ for Random Walk Metropolis and $\mu=1/3$ for MALA.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript develops a general framework for deriving optimal scaling limits of Metropolis-Hastings MCMC algorithms in high dimensions by exploiting the symmetry property of the MH acceptance ratio. The approach recovers the classical μ=1 scaling for random-walk Metropolis and μ=1/3 for MALA as special cases, and extends to gradient-based, implicit, and integrator-based proposals on product-form targets (possibly with differently scaled components), yielding proposals whose variance can decay as slowly as O(1/d^μ) for arbitrarily small μ>0.
Significance. If the symmetry argument and associated limit theorems are rigorously established, the framework offers a unified and comparatively elementary route to scaling results that previously required separate analyses for each proposal class. The explicit construction of proposals whose variance decays arbitrarily slowly with dimension on product measures would be a notable technical contribution, provided the product-structure assumption is not overly restrictive for the intended applications.
major comments (2)
- [Abstract (final paragraph)] The central claim that μ can be taken arbitrarily small rests on the high-dimensional limit applied componentwise to the product target. The manuscript should supply the precise statement of the limit theorem (including the required regularity on the univariate factors and on the proposal kernel) that justifies passing the symmetry argument to the limit; without this, it is difficult to confirm that the claimed scaling is not an artifact of the product assumption.
- The abstract states that the analysis applies when the terms in the product are scaled differently. The main text must verify that the symmetry exploitation continues to hold under heterogeneous scalings; if the proof relies on identical marginals, the heterogeneous case would require an additional argument that is currently not visible.
minor comments (1)
- Notation for the proposal variance scaling (O(1/d^μ)) should be introduced with an explicit definition of the dimension-dependent proposal covariance in the first section where the new proposals are constructed.
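One explicit form such a definition could take, assuming the isotropic convention common in the optimal-scaling literature (ℓ, h_d and Σ_d are notation introduced here, not taken from the manuscript):

```latex
\Sigma_d = h_d\, I_d \quad\text{(identical factors)}, \qquad
\Sigma_d = h_d\, \operatorname{diag}(\sigma_1^2, \dots, \sigma_d^2) \quad\text{(differently scaled factors)}, \qquad
h_d = \ell^2 d^{-\mu},\ \ell > 0 \text{ fixed as } d \to \infty.
```

With this convention μ = 1 recovers the Random Walk Metropolis scaling and μ = 1/3 the MALA scaling.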
Simulated Author's Rebuttal
We thank the referee for the positive assessment and constructive comments. We address each major point below and will revise the manuscript to incorporate the requested clarifications.
Point-by-point responses
Referee: [Abstract (final paragraph)] The central claim that μ can be taken arbitrarily small rests on the high-dimensional limit applied componentwise to the product target. The manuscript should supply the precise statement of the limit theorem (including the required regularity on the univariate factors and on the proposal kernel) that justifies passing the symmetry argument to the limit; without this, it is difficult to confirm that the claimed scaling is not an artifact of the product assumption.
Authors: We agree that an explicit statement of the limit theorem is required for rigor. In the revised manuscript we will insert a dedicated theorem (new Theorem 2.3) that states the high-dimensional limit under the symmetry argument, with precise regularity assumptions: the univariate factors are C² with positive density and finite fourth moments, and the proposal kernel satisfies local Lipschitz continuity and the symmetry property used in the acceptance ratio. This will confirm that the componentwise application is justified and not an artifact. revision: yes
Referee: The abstract states that the analysis applies when the terms in the product are scaled differently. The main text must verify that the symmetry exploitation continues to hold under heterogeneous scalings; if the proof relies on identical marginals, the heterogeneous case would require an additional argument that is currently not visible.
Authors: For a product target and a proposal that acts independently on each component, the Metropolis-Hastings ratio is a product of componentwise ratios, so its symmetry holds verbatim when the components have different scalings. We will add a short subsection (new Section 3.2) that explicitly repeats the symmetry argument with heterogeneous variance factors σ_i²(d) for each component i, showing that the same limiting acceptance probability is obtained after rescaling each proposal variance separately. This argument does not require identical marginals. revision: yes
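A minimal numerical check of the factorization point, assuming plain MALA on a Gaussian product target N(0, diag(σ_i²)) with per-component steps h_i = σ_i² h (all function names and the scales σ_i ∈ {1, ..., 5} are hypothetical): the log acceptance ratio is a sum of componentwise terms, and each term is unchanged by the rescaling x_i ↦ x_i/σ_i, so the heterogeneous chain reproduces the unit-scale log ratio exactly when both are driven by the same noise.

```python
import math
import random


def mala_log_ratio_1d(x, y, h, sigma):
    """log[pi(y) q(y, x) / (pi(x) q(x, y))] for one N(0, sigma^2) factor under
    a MALA proposal with step h; normalising constants cancel and are omitted."""
    def log_pi(z):
        return -0.5 * z * z / sigma**2

    def drift_mean(z):
        # Euler step of the Langevin SDE: z + (h/2) * d/dz log pi(z)
        return z - 0.5 * h * z / sigma**2

    def log_q(start, end):
        return -((end - drift_mean(start)) ** 2) / (2.0 * h)

    return log_pi(y) - log_pi(x) + log_q(y, x) - log_q(x, y)


def product_log_ratio(xs, ys, hs, sigmas):
    # Product target + componentwise proposal: the MH ratio factorises,
    # so its logarithm is a sum of one-dimensional terms.
    return sum(mala_log_ratio_1d(x, y, h, s) for x, y, h, s in zip(xs, ys, hs, sigmas))


def heterogeneous_vs_unit(d, h, seed=0):
    rng = random.Random(seed)
    sigmas = [1.0 + (i % 5) for i in range(d)]      # hypothetical scales 1..5
    us = [rng.gauss(0.0, 1.0) for _ in range(d)]    # unit-scale state
    xis = [rng.gauss(0.0, 1.0) for _ in range(d)]   # shared proposal noise
    a = 1.0 - 0.5 * h
    # Unit-scale chain: x = u, y = a u + sqrt(h) xi.
    unit_ys = [a * u + math.sqrt(h) * xi for u, xi in zip(us, xis)]
    z_unit = product_log_ratio(us, unit_ys, [h] * d, [1.0] * d)
    # Heterogeneous chain with steps h_i = sigma_i^2 h, driven by the same noise.
    xs = [s * u for s, u in zip(sigmas, us)]
    hs = [s * s * h for s in sigmas]
    ys = [x - 0.5 * hi * x / s**2 + math.sqrt(hi) * xi
          for x, hi, s, xi in zip(xs, hs, sigmas, xis)]
    z_het = product_log_ratio(xs, ys, hs, sigmas)
    return z_unit, z_het


z_unit, z_het = heterogeneous_vs_unit(d=50, h=0.3)
print(z_unit, z_het)  # equal up to rounding
```

Because the two log ratios agree path by path, so do their acceptance probabilities. This covers only the Gaussian case, where rescaling maps each factor exactly onto the unit one.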
Circularity Check
No significant circularity identified
Full rationale
The paper's central derivation applies the symmetry of the Metropolis-Hastings acceptance ratio componentwise to product targets (explicitly scoped in the abstract) in the high-d limit. This produces scaling limits for proposal variances, recovering RWM (μ=1) and MALA (μ=1/3) as special cases while extending to implicit and integrator-based proposals. No step reduces a claimed prediction to a fitted parameter, self-definition, or load-bearing self-citation; the product structure is an external modeling assumption that enables the symmetry argument rather than being derived from it. The approach is presented as first-principles and externally falsifiable via simulation on product measures.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Target distribution factors as a product of copies of a given, not necessarily univariate, distribution (possibly with different scalings)
Reference graph
Works this paper leans on
- [1] M. Bédard. Weak convergence of Metropolis algorithms for non-i.i.d. target distributions. Ann. Appl. Probab., 17(4):1222–1244, 2007.
- [2] M. Bédard. Optimal acceptance rates for Metropolis algorithms: Moving beyond 0.234. Stochastic Process. Appl., 118(12):2198–2222, 2008.
- [3] A. Beskos, N. S. Pillai, G. O. Roberts, J. M. Sanz-Serna, and A. M. Stuart. Optimal tuning of the hybrid Monte Carlo algorithm. Bernoulli, 19(5A):1501–1534, 2013.
- [4] A. Beskos, F. J. Pinski, J. M. Sanz-Serna, and A. M. Stuart. Hybrid Monte Carlo on Hilbert spaces. Stochastic Process. Appl., 121(10):2201–2230, 2011.
- [5] A. Beskos, G. O. Roberts, and A. M. Stuart. Optimal scalings for local Metropolis–Hastings chains on nonproduct targets in high dimensions. Ann. Appl. Probab., 19(3):863–898, 2009.
- [6] A. Beskos and A. M. Stuart. MCMC methods for sampling function space. In Proc. 6th Int. Congr. Ind. Appl. Math., pages 337–364. European Mathematical Society, 2009.
- [7] S. Blanes, F. Casas, and J. M. Sanz-Serna. Numerical integrators for the Hybrid Monte Carlo method. SIAM J. Sci. Comput., 36(4):A1556–A1580, 2014.
- [8] N. Bou-Rabee and J. M. Sanz-Serna. Geometric integrators and the Hamiltonian Monte Carlo method. Acta Numer., 27:113–206, 2018.
- [9] M. P. Calvo, D. Sanz-Alonso, and J. M. Sanz-Serna. HMC: Reducing the number of rejections by not using leapfrog and some results on the acceptance rate. J. Comput. Phys., 437:110333, 2021.
- [10] C. M. Campos and J. M. Sanz-Serna. Palindromic 3-stage splitting integrators, a roadmap. J. Comput. Phys., 346:340–355, 2017.
- [11] F. Casas, J. M. Sanz-Serna, and L. Shaw. Split Hamiltonian Monte Carlo revisited. Stat. Comput., 32:86, 2022.
- [12] P. Dobson, A. Harrison, T. Klatzer, G. O. Roberts, J. M. Sanz-Serna, and K. C. Zygalakis. Metropolis Adjusted Implicit Midpoint Langevin Algorithm: A fully implicit ergodic MCMC method. In preparation.
- [13] A. Durmus, G. O. Roberts, G. Vilmart, and K. C. Zygalakis. Fast Langevin based algorithm for MCMC in high dimensions. Ann. Appl. Probab., 27(4):2195–2237, 2017.
- [14] M. Girolami and B. Calderhead. Riemann manifold Langevin and Hamiltonian Monte Carlo methods. J. R. Stat. Soc. Ser. B Stat. Methodol., 73(2):123–214, 2011.
- [15] P. Neal and G. O. Roberts. Optimal scaling for partially updating MCMC algorithms. Ann. Appl. Probab., 16(2):475–515, 2006.
- [16] N. S. Pillai. Optimal scaling for the proximal Langevin algorithm in high dimensions. J. Mach. Learn. Res., 25(404):1–32, 2024.
- [17] N. S. Pillai, A. M. Stuart, and A. H. Thiéry. Optimal scaling and diffusion limits for the Langevin algorithm in high dimensions. Ann. Appl. Probab., 22(6):2320–2356, 2012.
- [18] G. O. Roberts, A. Gelman, and W. R. Gilks. Weak convergence and optimal scaling of random walk Metropolis algorithms. Ann. Appl. Probab., 7(1):110–120, 1997.
- [19] G. O. Roberts and J. S. Rosenthal. Optimal scaling of discrete approximations to Langevin diffusions. J. R. Stat. Soc. Ser. B Stat. Methodol., 60(1):255–268, 1998.
- [20] G. O. Roberts and J. S. Rosenthal. Optimal scaling for various Metropolis–Hastings algorithms. Statist. Sci., 16(4):351–367, 2001.
- [21] G. O. Roberts and R. L. Tweedie. Exponential convergence of Langevin distributions and their discrete approximations. Bernoulli, 2(4):341–363, 1996.
- [22] J. M. Sanz-Serna and M. P. Calvo. Numerical Hamiltonian Problems. Dover Publications, 2018.
- [23] C. Sherlock and G. O. Roberts. Optimal scaling of the random walk Metropolis on elliptically symmetric unimodal targets. Bernoulli, 15(3):774–798, 2009.
- [24] J. Yang, G. O. Roberts, and J. S. Rosenthal. Optimal scaling of random-walk Metropolis algorithms on general target distributions. Stochastic Process. Appl., 130(10):6094–6132, 2020.
- [25] G. Zanella, M. Bédard, and W. S. Kendall. A Dirichlet form approach to MCMC optimal scaling. Stochastic Process. Appl., 127(12):4053–4082, 2017.