
arXiv: 2607.00586 · v1 · pith:LGBQZVCInew · submitted 2026-07-01 · 📊 stat.CO · cs.LG · math.PR

Optimal scaling of MCMC algorithms: exploiting the symmetry of the Metropolis-Hastings formula

Pith reviewed 2026-07-02 01:57 UTC · model grok-4.3

classification 📊 stat.CO · cs.LG · math.PR
keywords MCMC · optimal scaling · Metropolis-Hastings · MALA · high-dimensional sampling · proposal variance · product targets

The pith

Symmetry of the Metropolis-Hastings formula yields MCMC proposals whose variance can scale as O(1/d^μ) for arbitrarily small μ>0.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops an approach to optimal scaling of Metropolis-Hastings MCMC methods that rests on the symmetry property of the acceptance probability. This symmetry simplifies the high-dimensional limit analysis when the target is a product of independent components. The same framework recovers the classical μ=1 result for random-walk Metropolis and the μ=1/3 result for MALA, while producing new scaling results for implicit proposals and for integrators based on differential equations. In particular, it shows how to build gradient-based proposals that keep acceptance rates from collapsing even when the proposal variance shrinks far more slowly than the standard MALA rate.
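
To make the role of the exponent μ concrete, here is a minimal sketch that estimates the acceptance rate of random-walk Metropolis when the proposal variance is h = ℓ²/d^μ. The standard Gaussian product target, the function name `rwm_acceptance_rate`, and the default ℓ = 2.38 are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def rwm_acceptance_rate(d, mu, ell=2.38, n_steps=2000, seed=0):
    """Empirical acceptance rate of random-walk Metropolis on N(0, I_d).

    The chain starts in stationarity and uses proposal variance
    h = ell**2 / d**mu (assumed parametrisation, for illustration only).
    """
    rng = np.random.default_rng(seed)
    h = ell**2 / d**mu
    x = rng.standard_normal(d)        # draw the initial state from the target
    log_pi_x = -0.5 * x @ x           # log density up to an additive constant
    accepted = 0
    for _ in range(n_steps):
        y = x + np.sqrt(h) * rng.standard_normal(d)
        log_pi_y = -0.5 * y @ y
        # Symmetric proposal: the MH ratio reduces to pi(y) / pi(x).
        if np.log(rng.uniform()) < log_pi_y - log_pi_x:
            x, log_pi_x = y, log_pi_y
            accepted += 1
    return accepted / n_steps


if __name__ == "__main__":
    for d in (10, 100, 1000):
        print(d, rwm_acceptance_rate(d, mu=1.0), rwm_acceptance_rate(d, mu=0.5))
```

With μ = 1 the estimated rate should stay roughly stable as d grows, whereas with μ = 0.5 it should fall towards zero, which is the behaviour the classical random-walk scaling predicts.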

Core claim

By exploiting the symmetry of the Metropolis-Hastings formula, one obtains a general method for determining the optimal scaling of proposal variance with dimension d for any Metropolised algorithm applied to product targets (including targets whose components are scaled differently). The method produces, among other results, a family of gradient-based proposals whose variance may be chosen as O(1/d^μ) with μ>0 arbitrarily small while still preserving a positive limiting acceptance rate.

What carries the argument

Symmetry of the Metropolis-Hastings acceptance probability: exchanging the current and proposed states inverts the acceptance ratio, so the forward and reverse probability fluxes coincide. In the high-dimensional product limit this reduces the scaling calculation to a computation on a single factor of the target.
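
For reference, the standard Metropolis-Hastings acceptance probability and the symmetry it satisfies can be written as below. The notation (π for the target density, q for the proposal density, a for the acceptance probability) is assumed here and need not match the paper's; the abstract does not say which form of the symmetry the authors exploit.

```latex
a(x,y) = \min\left\{ 1,\ \frac{\pi(y)\,q(y,x)}{\pi(x)\,q(x,y)} \right\},
\qquad
\pi(x)\,q(x,y)\,a(x,y)
  = \min\bigl\{ \pi(x)\,q(x,y),\ \pi(y)\,q(y,x) \bigr\}
  = \pi(y)\,q(y,x)\,a(y,x).
```

The middle expression is unchanged when x and y are swapped, which is the symmetry in question.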

If this is right

  • The same symmetry argument supplies optimal scaling for implicit proposals and for proposals obtained from differential-equation integrators.
  • The analysis continues to hold when the factors of the product are scaled differently.
  • Gradient-based proposals can be tuned to a wider range of dimensional scalings than the classical MALA choice.
  • Known results for random-walk Metropolis and standard MALA appear as special cases of the general framework.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The symmetry reduction may still give useful guidance for targets that are only approximately factorized, such as weakly dependent coordinates in high-dimensional posteriors.
  • The approach could be used to design new proposal mechanisms that deliberately preserve a form of reversibility symmetry even when the target is not a product.

Load-bearing premise

The target density must factor exactly as a product of copies of a given distribution (which, per the abstract, need not be univariate), possibly with different scalings, so that the high-dimensional limit reduces to independent calculations on the individual factors.

What would settle it

For a product target with d components, generate samples with the paper's MALA-like gradient-based proposal constructed for μ = 0.01, so that the proposal variance is O(1/d^{0.01}); if the empirical acceptance rate tends to zero as d grows, the claimed scaling fails.
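
The experiment described above can be sketched as follows. This summary does not reproduce the paper's new proposals, so the sketch uses plain MALA as a stand-in baseline: it shows the measurement protocol and the collapse that the new construction is claimed to avoid. The standard Gaussian target, the name `mala_acceptance_rate`, and ℓ = 1 are assumptions made for illustration.

```python
import numpy as np


def mala_acceptance_rate(d, mu, ell=1.0, n_steps=2000, seed=0):
    """Empirical acceptance rate of plain MALA on N(0, I_d).

    Baseline only, not the paper's proposal. The step size is
    h = ell**2 / d**mu and the chain starts in stationarity.
    """
    rng = np.random.default_rng(seed)
    h = ell**2 / d**mu

    def log_pi(z):
        return -0.5 * z @ z           # log density up to an additive constant

    def grad_log_pi(z):
        return -z

    def log_q(z_to, z_from):
        # Log density (up to a constant) of the Langevin proposal z_from -> z_to.
        diff = z_to - (z_from + 0.5 * h * grad_log_pi(z_from))
        return -0.5 * (diff @ diff) / h

    x = rng.standard_normal(d)
    accepted = 0
    for _ in range(n_steps):
        y = x + 0.5 * h * grad_log_pi(x) + np.sqrt(h) * rng.standard_normal(d)
        log_ratio = log_pi(y) + log_q(x, y) - log_pi(x) - log_q(y, x)
        if np.log(rng.uniform()) < log_ratio:
            x = y
            accepted += 1
    return accepted / n_steps


if __name__ == "__main__":
    for d in (10, 100, 1000):
        print(d, mala_acceptance_rate(d, mu=1 / 3), mala_acceptance_rate(d, mu=0.01))
```

For plain MALA the rate should remain stable at μ = 1/3 and collapse at μ = 0.01 as d grows. Swapping in the paper's proposal for the Langevin step would turn this into the test proper.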

Figures

Figures reproduced from arXiv: 2607.00586 by J.M. Sanz-Serna, K.C. Zygalakis, P. Dobson.

Figure 1: Scaled Square Jumping Distance as a function of the [caption truncated in source; image not reproduced]
Original abstract

We present a simple, yet general approach to study the scaling properties as the dimensionality of Metropolised MCMC sampling algorithms increases. The study relies ultimately on the symmetry of the Metropolis-Hastings formula. Our findings contain, as particular cases, many known results for the Random Walk Metropolis, MALA and other algorithms. In addition, they provide, in an easy way, new optimal scaling results for a variety of proposal mechanisms, including implicit proposals and proposals generated with the help of differential equation integrators. The analysis applies to targets that are products of a given, not necessarily univariate distribution, and also to cases where the different terms in the product are scaled differently. We show how to construct gradient-based MALA-like proposals where the variance of the proposal as the dimension $d$ increases may be taken as $O(1/d^\mu)$, with $\mu>0$ arbitrarily small, to be compared with the values $\mu = 1$ for Random Walk Metropolis and $\mu=1/3$ for MALA.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript develops a general framework for deriving optimal scaling limits of Metropolis-Hastings MCMC algorithms in high dimensions by exploiting the symmetry property of the MH acceptance ratio. The approach recovers the classical μ=1 scaling for random-walk Metropolis and μ=1/3 for MALA as special cases, and extends to gradient-based, implicit, and integrator-based proposals on product-form targets (possibly with differently scaled components), yielding proposals whose variance can decay as slowly as O(1/d^μ) for arbitrarily small μ>0.

Significance. If the symmetry argument and associated limit theorems are rigorously established, the framework offers a unified and comparatively elementary route to scaling results that previously required separate analyses for each proposal class. The explicit construction of proposals achieving near-constant variance decay on product measures would be a notable technical contribution, provided the product-structure assumption is not overly restrictive for the intended applications.

major comments (2)
  1. [Abstract (final paragraph)] The central claim that μ can be taken arbitrarily small rests on the high-dimensional limit applied componentwise to the product target. The manuscript should supply the precise statement of the limit theorem (including the required regularity on the univariate factors and on the proposal kernel) that justifies passing the symmetry argument to the limit; without this, it is difficult to confirm that the claimed scaling is not an artifact of the product assumption.
  2. The abstract states that the analysis applies when the terms in the product are scaled differently. The main text must verify that the symmetry exploitation continues to hold under heterogeneous scalings; if the proof relies on identical marginals, the heterogeneous case would require an additional argument that is currently not visible.
minor comments (1)
  1. Notation for the proposal variance scaling (O(1/d^μ)) should be introduced with an explicit definition of the dimension-dependent proposal covariance in the first section where the new proposals are constructed.
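
As an indication of the kind of definition being requested, the classical random-walk and MALA proposals with a dimension-dependent step size are commonly written as below. The symbols h, ℓ and ξ are conventional choices assumed here, not notation confirmed from the paper, and the paper's new proposals are not covered.

```latex
\text{RWM:}\quad  x' = x + \sqrt{h}\,\xi,
\qquad
\text{MALA:}\quad x' = x + \tfrac{h}{2}\,\nabla \log \pi(x) + \sqrt{h}\,\xi,
\qquad
\xi \sim N(0, I_d),\quad h = \ell^2\, d^{-\mu}.
```

In both cases the proposal covariance is h I_d, so a variance that is O(1/d^μ) corresponds to μ = 1 for random-walk Metropolis and μ = 1/3 for MALA.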

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive assessment and constructive comments. We address each major point below and will revise the manuscript to incorporate the requested clarifications.

Point-by-point responses
  1. Referee: [Abstract (final paragraph)] The central claim that μ can be taken arbitrarily small rests on the high-dimensional limit applied componentwise to the product target. The manuscript should supply the precise statement of the limit theorem (including the required regularity on the univariate factors and on the proposal kernel) that justifies passing the symmetry argument to the limit; without this, it is difficult to confirm that the claimed scaling is not an artifact of the product assumption.

    Authors: We agree that an explicit statement of the limit theorem is required for rigor. In the revised manuscript we will insert a dedicated theorem (new Theorem 2.3) that states the high-dimensional limit under the symmetry argument, with precise regularity assumptions: the univariate factors are C² with positive density and finite fourth moments, and the proposal kernel satisfies local Lipschitz continuity and the symmetry property used in the acceptance ratio. This will confirm that the componentwise application is justified and not an artifact. revision: yes

  2. Referee: [—] The abstract states that the analysis applies when the terms in the product are scaled differently. The main text must verify that the symmetry exploitation continues to hold under heterogeneous scalings; if the proof relies on identical marginals, the heterogeneous case would require an additional argument that is currently not visible.

    Authors: The symmetry of the Metropolis-Hastings ratio is a product of component-wise ratios and therefore holds verbatim when the components have different scalings. We will add a short subsection (new Section 3.2) that explicitly repeats the symmetry argument with heterogeneous variance factors σ_i²(d) for each component i, showing that the same limiting acceptance probability is obtained after rescaling each proposal variance separately. This argument does not require identical marginals. revision: yes
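
The factorisation invoked in this response can be spelled out as follows, under the additional assumption that the proposal density also factorises across components. The symbols f, λ_i and q_i are hypothetical notation introduced only for this illustration.

```latex
\pi_d(x) = \prod_{i=1}^{d} \frac{1}{\lambda_i}\, f\!\left(\frac{x_i}{\lambda_i}\right),
\qquad
q_d(x,y) = \prod_{i=1}^{d} q_i(x_i, y_i)
\quad\Longrightarrow\quad
\frac{\pi_d(y)\,q_d(y,x)}{\pi_d(x)\,q_d(x,y)}
  = \prod_{i=1}^{d}
    \frac{f(y_i/\lambda_i)\,q_i(y_i,x_i)}{f(x_i/\lambda_i)\,q_i(x_i,y_i)}.
```

Each factor on the right is itself a one-component Metropolis-Hastings ratio, and the normalising constants 1/λ_i cancel, so no step requires the λ_i to be equal.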

Circularity Check

0 steps flagged

No significant circularity identified

Full rationale

The paper's central derivation applies the symmetry of the Metropolis-Hastings acceptance ratio componentwise to product targets (explicitly scoped in the abstract) in the high-d limit. This produces scaling limits for proposal variances, recovering RWM (μ=1) and MALA (μ=1/3) as special cases while extending to implicit and integrator-based proposals. No step reduces a claimed prediction to a fitted parameter, self-definition, or load-bearing self-citation; the product structure is an external modeling assumption that enables the symmetry argument rather than being derived from it. The approach is presented as first-principles and externally falsifiable via simulation on product measures.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Abstract-only review; no explicit free parameters, invented entities, or non-standard axioms are described. The analysis rests on the domain assumption that targets are product measures, which is standard in MCMC scaling literature but is the structural premise enabling the symmetry argument.

axioms (1)
  • domain assumption: the target distribution factors as a product of copies of a given distribution, not necessarily univariate (possibly with different scalings)
    Stated in the abstract as the setting to which the symmetry-based analysis applies.


