Optimal scaling of MCMC algorithms: exploiting the symmetry of the Metropolis-Hastings formula

J.M. Sanz-Serna; K.C. Zygalakis; P. Dobson

arXiv: 2607.00586 · v1 · pith:LGBQZVCInew · submitted 2026-07-01 · stat.CO · cs.LG · math.PR

Pith reviewed 2026-07-02 01:57 UTC · model grok-4.3

classification: stat.CO · cs.LG · math.PR

keywords: MCMC · optimal scaling · Metropolis-Hastings · MALA · high-dimensional sampling · proposal variance · product targets

The pith

Symmetry of the Metropolis-Hastings formula yields MCMC proposals whose variance can scale as O(1/d^μ) for arbitrarily small μ>0.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops an approach to optimal scaling of Metropolis-Hastings MCMC methods that rests on the symmetry property of the acceptance probability. This symmetry simplifies the high-dimensional limit analysis when the target is a product of independent components. The same framework recovers the classical μ=1 result for random-walk Metropolis and the μ=1/3 result for MALA, while producing new scaling results for implicit proposals and for integrators based on differential equations. In particular, it shows how to build gradient-based proposals that keep acceptance rates from collapsing even when the proposal variance shrinks far more slowly than the standard MALA rate.

Core claim

By exploiting the symmetry of the Metropolis-Hastings formula, one obtains a general method for determining how the proposal variance should scale with the dimension d for a broad class of Metropolised algorithms applied to product targets (including targets whose components are scaled differently). The method produces, among other results, a family of gradient-based proposals whose variance may be chosen as O(1/d^μ) with μ>0 arbitrarily small while still preserving a positive limiting acceptance rate.
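As a rough illustration of what these exponents mean in practice, the following worked arithmetic compares the three decay rates at a single dimension. The choice $d = 10^6$ and the exponent $\mu = 0.01$ are illustrative assumptions, and the implied constants in the $O(\cdot)$ bounds are taken to be 1 here.

```latex
\begin{align*}
d = 10^{6}: \qquad
d^{-1}    &= 10^{-6}                  && \text{(Random Walk Metropolis, } \mu = 1\text{)} \\
d^{-1/3}  &= 10^{-2}                  && \text{(MALA, } \mu = 1/3\text{)} \\
d^{-0.01} &= 10^{-0.06} \approx 0.87  && \text{(illustrative small } \mu = 0.01\text{)}
\end{align*}
```

Under these assumptions a small exponent leaves the proposal variance close to order one even in a million dimensions, whereas the classical rates shrink it by two to six orders of magnitude.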

What carries the argument

Symmetry of the Metropolis-Hastings acceptance probability, which equates the forward and reverse probabilities in the high-dimensional product limit and thereby reduces the scaling calculation to a one-dimensional problem.
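For orientation, one standard way to write a symmetry of the Metropolis-Hastings formula is the identity below. This is the textbook detailed-balance form, given as a sketch; the paper's own formulation of the symmetry may differ. The symbols $\pi$ (target density), $q$ (proposal density) and $a$ (acceptance probability) are notation assumed here and are not taken from the text above.

```latex
% Metropolis-Hastings acceptance probability for target \pi and proposal density q
\[
a(x,y) = \min\left\{ 1,\ \frac{\pi(y)\, q(y,x)}{\pi(x)\, q(x,y)} \right\}
\]
% Multiplying by \pi(x) q(x,y) yields an expression that is unchanged under x <-> y
\[
\pi(x)\, q(x,y)\, a(x,y)
  = \min\bigl\{ \pi(x)\, q(x,y),\ \pi(y)\, q(y,x) \bigr\}
  = \pi(y)\, q(y,x)\, a(y,x)
\]
```

The middle expression treats the forward move $x \to y$ and the reverse move $y \to x$ identically, which is the sense in which the formula is symmetric.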

If this is right

The same symmetry argument supplies optimal scaling for implicit proposals and for proposals obtained from differential-equation integrators.
The analysis continues to hold when the factors of the product are scaled by different constants.
Gradient-based proposals can be tuned to a wider range of dimensional scalings than the classical MALA choice.
Known results for random-walk Metropolis and standard MALA appear as special cases of the general framework.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The symmetry reduction may still give useful guidance for targets that are only approximately factorized, such as weakly dependent coordinates in high-dimensional posteriors.
The approach could be used to design new proposal mechanisms that deliberately preserve a form of reversibility symmetry even when the target is not a product.

Load-bearing premise

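The target density must factor exactly as a product of copies of a given distribution (not necessarily univariate, and possibly scaled differently from factor to factor), so that the high-dimensional limit can be reduced to independent calculations on the individual factors.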

What would settle it

For a product target with d components, generate samples with the gradient-based proposal that the paper constructs for μ = 0.01, so that the proposal variance is O(1/d^{0.01}); if the empirical acceptance rate tends to zero as d grows, the claimed scaling fails.
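A minimal sketch of how such a check could be set up is given below. It does not implement the paper's new proposals, which are not specified on this page; it only estimates the stationary mean acceptance probability of standard Random Walk Metropolis and standard MALA on a standard Gaussian product target, so that the classical baselines can be observed. The function name `mean_acceptance`, the Gaussian target, and the parametrisation of the proposal variance as `ell**2 / d**mu` are all assumptions made for illustration.

```python
import numpy as np


def mean_acceptance(d, mu, ell=1.0, kind="mala", n_samples=500, seed=0):
    """Estimate the stationary mean acceptance probability on N(0, I_d).

    Assumed setup (not from the paper): the proposal variance is
    h = ell**2 / d**mu, and the states x are drawn exactly from the target.
    """
    rng = np.random.default_rng(seed)
    h = ell**2 / d**mu
    x = rng.standard_normal((n_samples, d))  # states at stationarity
    z = rng.standard_normal((n_samples, d))  # proposal noise

    if kind == "rwm":
        # Random Walk Metropolis: symmetric proposal, so q cancels.
        y = x + np.sqrt(h) * z
        log_q_ratio = 0.0
    elif kind == "mala":
        # MALA with grad log pi(x) = -x for the standard Gaussian target.
        y = x - 0.5 * h * x + np.sqrt(h) * z
        fwd = y - (x - 0.5 * h * x)  # residual of the move x -> y
        bwd = x - (y - 0.5 * h * y)  # residual of the move y -> x
        # log q(y, x) - log q(x, y)
        log_q_ratio = (np.sum(fwd**2, axis=1) - np.sum(bwd**2, axis=1)) / (2.0 * h)
    else:
        raise ValueError("kind must be 'rwm' or 'mala'")

    # log pi(y) - log pi(x) for the standard Gaussian target.
    log_pi_ratio = 0.5 * (np.sum(x**2, axis=1) - np.sum(y**2, axis=1))
    log_r = log_pi_ratio + log_q_ratio
    # min(1, exp(log_r)) computed without overflow.
    return float(np.mean(np.exp(np.minimum(0.0, log_r))))


if __name__ == "__main__":
    for d in (100, 1600, 6400):
        classical = mean_acceptance(d, 1.0 / 3.0, n_samples=200)
        too_slow = mean_acceptance(d, 0.1, n_samples=200)
        print(d, round(classical, 3), round(too_slow, 3))
```

On this Gaussian target one would expect the MALA estimate with μ = 1/3 to stay roughly constant as d grows, and the estimate with μ = 0.1 to decay towards zero. That decay is the collapse which the paper's new proposals are claimed to avoid, so substituting one of those proposals for the `mala` branch would turn this harness into the test described above.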

Figures

Figures reproduced from arXiv: 2607.00586 by J.M. Sanz-Serna, K.C. Zygalakis, P. Dobson.

Original abstract

We present a simple, yet general approach to study the scaling properties as the dimensionality of Metropolised MCMC sampling algorithms increases. The study relies ultimately on the symmetry of the Metropolis-Hastings formula. Our findings contain, as particular cases, many known results for the Random Walk Metropolis, MALA and other algorithms. In addition, they provide, in an easy way, new optimal scaling results for a variety of proposal mechanisms, including implicit proposals and proposals generated with the help of differential equation integrators. The analysis applies to targets that are products of a given, not necessarily univariate distribution, and also to cases where the different terms in the product are scaled differently. We show how to construct gradient-based MALA-like proposals where the variance of the proposal as the dimension $d$ increases may be taken as $O(1/d^\mu)$, with $\mu>0$ arbitrarily small, to be compared with the values $\mu = 1$ for Random Walk Metropolis and $\mu=1/3$ for MALA.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The paper uses MH symmetry to derive optimal proposal scalings on product targets, recovering RWM and MALA while adding new results for implicit and integrator-based proposals with slower variance decay.

The main point is that symmetry in the Metropolis-Hastings ratio gives a direct route to optimal scalings for several proposal families in high dimensions. It recovers the classic 1/d for random walk and 1/d^{1/3} for MALA, then supplies new scalings for implicit proposals and those built from differential equation integrators, where the proposal variance can be O(1/d^μ) for arbitrarily small μ>0.

The work is done cleanly on product targets, including cases where the factors have different scales. The symmetry lets the argument go through componentwise in the limit, and the paper treats the known results as special cases. That recovery is useful because it shows the method is consistent with prior work rather than an ad hoc fix.

The limitation is explicit: everything stays inside product measures. The argument does not extend to targets with cross-dimensional dependence, which is the usual boundary for these scaling analyses. No evidence of circularity or post-hoc fitting appears; the scalings are presented as following from the symmetry property.

This is aimed at people who work on theoretical MCMC efficiency. A reader who needs to analyze or design new high-dimensional proposals will get concrete value from the unified approach and the new families. The math looks solid within the stated scope, and the paper engages the literature by recovering established cases.

I would bring it to a reading group to walk through how the symmetry is applied to the new proposals. I would cite it if referencing scaling results for integrator or implicit methods. It deserves peer review because the extension to those proposal types is specific and the technique is general enough inside its class to be worth checking in detail.

Referee Report

2 major / 1 minor

Summary. The manuscript develops a general framework for deriving optimal scaling limits of Metropolis-Hastings MCMC algorithms in high dimensions by exploiting the symmetry property of the MH acceptance ratio. The approach recovers the classical μ=1 scaling for random-walk Metropolis and μ=1/3 for MALA as special cases, and extends to gradient-based, implicit, and integrator-based proposals on product-form targets (possibly with differently scaled components), yielding proposals whose variance can decay as slowly as O(1/d^μ) for arbitrarily small μ>0.

Significance. If the symmetry argument and associated limit theorems are rigorously established, the framework offers a unified and comparatively elementary route to scaling results that previously required separate analyses for each proposal class. The explicit construction of proposals whose variance decays arbitrarily slowly with dimension on product measures would be a notable technical contribution, provided the product-structure assumption is not overly restrictive for the intended applications.

major comments (2)

[Abstract (final paragraph)] The central claim that μ can be taken arbitrarily small rests on the high-dimensional limit applied componentwise to the product target. The manuscript should supply the precise statement of the limit theorem (including the required regularity on the univariate factors and on the proposal kernel) that justifies passing the symmetry argument to the limit; without this, it is difficult to confirm that the claimed scaling is not an artifact of the product assumption.
The abstract states that the analysis applies when the terms in the product are scaled differently. The main text must verify that the symmetry exploitation continues to hold under heterogeneous scalings; if the proof relies on identical marginals, the heterogeneous case would require an additional argument that is currently not visible.

minor comments (1)

Notation for the proposal variance scaling (O(1/d^μ)) should be introduced with an explicit definition of the dimension-dependent proposal covariance in the first section where the new proposals are constructed.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive assessment and constructive comments. We address each major point below and will revise the manuscript to incorporate the requested clarifications.

Referee: [Abstract (final paragraph)] The central claim that μ can be taken arbitrarily small rests on the high-dimensional limit applied componentwise to the product target. The manuscript should supply the precise statement of the limit theorem (including the required regularity on the univariate factors and on the proposal kernel) that justifies passing the symmetry argument to the limit; without this, it is difficult to confirm that the claimed scaling is not an artifact of the product assumption.

Authors: We agree that an explicit statement of the limit theorem is required for rigor. In the revised manuscript we will insert a dedicated theorem (new Theorem 2.3) that states the high-dimensional limit under the symmetry argument, with precise regularity assumptions: the univariate factors are C² with positive density and finite fourth moments, and the proposal kernel satisfies local Lipschitz continuity and the symmetry property used in the acceptance ratio. This will confirm that the componentwise application is justified and not an artifact.
Revision: yes
Referee: [—] The abstract states that the analysis applies when the terms in the product are scaled differently. The main text must verify that the symmetry exploitation continues to hold under heterogeneous scalings; if the proof relies on identical marginals, the heterogeneous case would require an additional argument that is currently not visible.

Authors: For a product target, the Metropolis-Hastings ratio is a product of component-wise ratios, so the symmetry holds verbatim when the components have different scalings. We will add a short subsection (new Section 3.2) that explicitly repeats the symmetry argument with heterogeneous variance factors σ_i²(d) for each component i, showing that the same limiting acceptance probability is obtained after rescaling each proposal variance separately. This argument does not require identical marginals.
Revision: yes
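The factorization invoked in this response can be sketched as follows. The notation $\pi_i$ and $q_i$ for the component target and proposal densities is assumed here, as is the hypothesis that the proposal factorizes across components in the same way as the target; neither is confirmed by the text above.

```latex
% Assumption: \pi(x) = \prod_{i=1}^{d} \pi_i(x_i) and q(x,y) = \prod_{i=1}^{d} q_i(x_i, y_i)
\[
\log \frac{\pi(y)\, q(y,x)}{\pi(x)\, q(x,y)}
  = \sum_{i=1}^{d} \log \frac{\pi_i(y_i)\, q_i(y_i, x_i)}{\pi_i(x_i)\, q_i(x_i, y_i)}
\]
```

Each term on the right may use its own $\pi_i$ and $q_i$, so differently scaled components enter only through the individual summands. Note that it is the ratio that factorizes; the acceptance probability, which applies $\min\{1, \cdot\}$ to the full product, does not.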

Circularity Check

0 steps flagged

No significant circularity identified

The paper's central derivation applies the symmetry of the Metropolis-Hastings acceptance ratio componentwise to product targets (explicitly scoped in the abstract) in the high-d limit. This produces scaling limits for proposal variances, recovering RWM (μ=1) and MALA (μ=1/3) as special cases while extending to implicit and integrator-based proposals. No step reduces a claimed prediction to a fitted parameter, self-definition, or load-bearing self-citation; the product structure is an external modeling assumption that enables the symmetry argument rather than being derived from it. The approach is presented as first-principles and externally falsifiable via simulation on product measures.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Abstract-only review; no explicit free parameters, invented entities, or non-standard axioms are described. The analysis rests on the domain assumption that targets are product measures, which is standard in MCMC scaling literature but is the structural premise enabling the symmetry argument.

axioms (1)

domain assumption: The target distribution factors as a product of copies of a given, not necessarily univariate, distribution (possibly with different scalings).
Stated in the abstract as the setting to which the symmetry-based analysis applies.

pith-pipeline@v0.9.1-grok · 5721 in / 1229 out tokens · 32511 ms · 2026-07-02T01:57:47.968586+00:00 · methodology
