arxiv: 2607.02285 · v1 · pith:CPAH7UAPnew · submitted 2026-07-02 · 📊 stat.ME

Goodness of Fit Tests Based on Joint Densities of Multiple Sample Statistics

Pith reviewed 2026-07-03 07:33 UTC · model grok-4.3

classification 📊 stat.ME
keywords goodness-of-fit tests · joint distributions · sample statistics · principal components · highest density regions · k-nearest neighbors · order statistics · permutation tests

The pith

Goodness-of-fit tests built from joint distributions of multiple sample statistics are competitive with and often more powerful than existing methods.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes goodness-of-fit tests that use simulated confidence sets built from the joint distributions of multiple sample statistics. These tests apply to absolutely continuous null distributions with known parameters. One class employs hyperrectangular confidence sets for principal components of order statistics and related vectors. A second class constructs confidence sets of arbitrary shape through highest density regions estimated by a k-nearest-neighbor method. Simulations show the procedures are competitive with and often more powerful than classical, Zhang, and graphical tests against a wide range of alternatives.

Core claim

The central claim is that goodness-of-fit tests can be constructed from simulated confidence sets for the joint distributions of multiple sample statistics, including order statistics, empirical distribution function values, moments, and combinations of classical statistics. One class uses hyperrectangular sets for principal-component vectors; the second uses highest density regions detected via k-nearest neighbors. These procedures are competitive with and often more powerful than existing methods in simulations, with extensions to two-sample permutation tests and transformations to target distributions such as the standard normal.

What carries the argument

Simulated confidence sets for the joint distribution of multiple sample statistics, which capture dependencies to define rejection regions for testing fit to a known continuous distribution.
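To make the idea of a simulated confidence set concrete, the following Python sketch builds a hyperrectangular 95% set for the order statistics of N(0, 1) samples of size 10 and then counts how often N(1, 1) samples fall outside it. It is an illustration under stated assumptions, not the paper's code: the function names, the rank-based way of calibrating the per-coordinate bounds, and the simulation sizes are all choices made here.

```python
import random

def simulate_order_statistics(sampler, n, n_sim, rng):
    """Simulate n_sim samples of size n and return each one sorted."""
    return [sorted(sampler(rng) for _ in range(n)) for _ in range(n_sim)]

def calibrate_hyperrectangle(null_vectors, alpha=0.05):
    """Per-coordinate (lower, upper) bounds whose joint coverage over the
    simulated null vectors is at least 1 - alpha.
    The rank-based calibration is an assumption of this sketch."""
    n_sim, dim = len(null_vectors), len(null_vectors[0])
    depth = [n_sim] * n_sim  # smallest two-sided rank seen for each vector
    columns = []
    for j in range(dim):
        order = sorted(range(n_sim), key=lambda i: null_vectors[i][j])
        columns.append([null_vectors[i][j] for i in order])
        for rank, i in enumerate(order):
            depth[i] = min(depth[i], rank, n_sim - 1 - rank)
    # Trim the same number of extreme ranks from both tails of every
    # coordinate, so that at most alpha * n_sim vectors leave the box.
    cut = sorted(depth)[int(alpha * n_sim)]
    return [(col[cut], col[n_sim - 1 - cut]) for col in columns]

def inside(vector, bounds):
    """True when every coordinate lies within its (lower, upper) bound."""
    return all(lo <= x <= hi for x, (lo, hi) in zip(vector, bounds))

rng = random.Random(1)
n = 10
null_vectors = simulate_order_statistics(lambda r: r.gauss(0.0, 1.0), n, 4000, rng)
bounds = calibrate_hyperrectangle(null_vectors)

coverage = sum(inside(v, bounds) for v in null_vectors) / len(null_vectors)
shifted = simulate_order_statistics(lambda r: r.gauss(1.0, 1.0), n, 1000, rng)
power = sum(not inside(v, bounds) for v in shifted) / len(shifted)
print(f"in-sample coverage: {coverage:.3f}, rejection rate under N(1, 1): {power:.3f}")
```

Coverage is computed on the same simulated vectors that were used for calibration, so it is at least 0.95 by construction; a fresh null sample would be needed for an honest estimate of the test's size.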

If this is right

  • Under a normal null, the first principal component of the order statistics corresponds to the sample mean, while the second relates to a linear analogue of variance.
  • The methods extend to two-sample testing through permutation procedures based on the joint distributions of several statistics.
  • Mapping data to another target distribution such as the standard normal can be advantageous when powerful tests exist for that distribution.
  • Tests can be formed from combinations of classical goodness-of-fit statistics inside the highest-density-region framework.
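The first implication in the list above can be checked numerically. The Python sketch below simulates order statistics of N(0, 1) samples of size 10, estimates their covariance matrix, and extracts the leading principal direction by power iteration. Power iteration stands in for whatever PCA routine the paper uses, and all names and simulation sizes are assumptions of this sketch. If the claim holds, the loadings should be close to the constant value 1/sqrt(10), and the first-component score should be almost perfectly correlated with the sample mean.

```python
import math
import random

def covariance_matrix(rows):
    """Empirical covariance matrix of a list of equal-length vectors."""
    n_rows, dim = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n_rows for j in range(dim)]
    cov = [[0.0] * dim for _ in range(dim)]
    for r in rows:
        c = [r[j] - means[j] for j in range(dim)]
        for a in range(dim):
            for b in range(a, dim):
                cov[a][b] += c[a] * c[b]
    for a in range(dim):
        for b in range(a, dim):
            cov[a][b] /= n_rows - 1
            cov[b][a] = cov[a][b]
    return cov

def leading_eigenvector(matrix, iterations=200):
    """Unit eigenvector of the largest eigenvalue, found by power iteration."""
    dim = len(matrix)
    v = [float(j + 1) for j in range(dim)]  # deliberately non-constant start
    for _ in range(iterations):
        w = [sum(matrix[a][b] * v[b] for b in range(dim)) for a in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def correlation(a, b):
    """Pearson correlation of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)

rng = random.Random(7)
n = 10
order_stats = [sorted(rng.gauss(0.0, 1.0) for _ in range(n)) for _ in range(20000)]
loadings = leading_eigenvector(covariance_matrix(order_stats))

scores = [sum(w * x for w, x in zip(loadings, row)) for row in order_stats]
means = [sum(row) / n for row in order_stats]
pc1_vs_mean = correlation(scores, means)
print([round(w, 3) for w in loadings], round(pc1_vs_mean, 4))
```

The near-constant loadings are expected for a normal null because the sample mean is independent of the deviations from it, which makes the constant vector an eigenvector of the covariance matrix of the order statistics. The sketch does not examine the second component.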

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The k-nearest-neighbor approach to highest density regions may extend more readily to higher-dimensional statistic vectors than kernel-density methods.
  • The observed correspondence between principal components and mean-variance analogues under normality suggests that analogous geometric interpretations could be derived for other null distributions.
  • If the joint-distribution approach generalizes, it could be applied to settings with estimated parameters by adjusting the simulation step accordingly.

Load-bearing premise

The null distributions are absolutely continuous with known parameters, which allows direct simulation of the joint distributions of the chosen sample statistics.

What would settle it

A simulation study, run against the alternatives examined in the paper, in which the new tests show consistently lower power than the Zhang or classical tests would falsify the claim that they are competitive and often more powerful.

Figures

Figures reproduced from arXiv: 2607.02285 by Roman Guchenko.

Figure 2.1: Histograms for empirical distributions of order statistics for samples from standard normal
Figure 2.2: Red - numerically estimated functional 0…
Figure 2.3: Pairwise contour plots for 2d projections of empirical joint density of order statistics for…
Figure 2.4: Pairwise hexbin plots for 2d projections of empirical joint density of order statistics for…
Figure 2.5: Histograms for empirical distributions of principal components of order statistics for sam…
Figure 2.6: Pairwise contour plots for 2d projections of empirical joint density of principal components
Figure 2.7: Pairwise hexbin plots for 2d projections of empirical joint density of principal components
Figure 2.8: Like figure 2.6, but hyperrectangle bounds are variance-based.
Figure 2.9: Like figure 2.7, but hyperrectangle bounds are variance-based. For var 1 vs var 2 plot we see that the confidence rectangle is significantly closer to the most dense black-painted area than the corresponding confidence rectangle on figure 2.7. For var 9 vs var 10 plot we see the opposite: much looser confidence rectangle that is significantly farther from the most dense black area if compared to the corr…
Figure 2.10: Order statistics from N(1, 1), hyperrectangle bounds for N(0, 1).
Figure 2.11: Order statistics from N(0, 1) (red hexes) and N(1, 1) (blue hexes), hyperrectangle bounds for N(0, 1) (red rectangles). We see that on figures 2.10 and 2.11 all order statistics from N(1, 1) are equally shifted against order statistics from N(0, 1).
Figure 2.12: Principal components of order statistics from…
Figure 2.13: Principal components of order statistics from…
Figure 2.14: Order statistics from N(0, 1.5^2), hyperrectangle bounds for N(0, 1).
Figure 2.15: Order statistics from N(0, 1) (red hexes) and N(0, 1.5^2) (blue hexes), hyperrectangle bounds for N(0, 1) (red rectangles). On figures 2.14 and 2.15 we see that all marginal distributions of order statistics from N(0, 1.5^2) have larger scale than corresponding marginal distributions of order statistics from N(0, 1). First order statistic of N(0, 1.5^2) is significantly shifted to the left relative t…
Figure 2.16: Principal components of order statistics from…
Figure 2.17: Principal components of order statistics from…
Figure 2.18: Order statistics for N(0, 0.6^2), hyperrectangle bounds for N(0, 1).
Figure 2.19: Order statistics from N(0, 1) (red hexes) and N(0, 0.6^2) (blue hexes), hyperrectangle bounds for N(0, 1) (red rectangles). We see that almost all blue hexes are inside red rectangles! That explains near zero power of hyperrectangle bounds for order statistics.
Figure 2.20: Principal components of order statistics for…
Figure 2.21: Principal components of order statistics from…
Figure 2.22: Order statistics; N(0, 1) (red hexes), Gamma(1, 1)−1 (blue hexes); hyperrectangle bounds for N(0, 1) (red rectangles). Here we see that blue hexes lie outside of confidence rectangles mostly for pairwise plots associated with 6th to 10th order statistics.
Figure 2.23: Principal components of order statistics;…
Figure 2.24: Order statistics; N(0, 1) (red hexes), Cauchy(0, 0.2) (blue hexes); hyperrectangle bounds for N(0, 1) (red rectangles). We see that on plots associated with 4th to 7th order stats blue hexes mostly lie inside the bounds. For 1st, 2nd, 3rd, 8th, 9th and 10th order stats that are on the tails we see the most rejections.
Figure 2.25: Principal components of order statistics;…
Figure 2.26: Order statistics; N(0, 1) (red hexes), t(4) (blue hexes); hyperrectangle bounds for N(0, 1) (red rectangles). Here again it looks like tail-associated order statistics play more significant role than center ones when we try to distinguish between N(0, 1) and t(4).
Figure 2.27: Principal components of order statistics;…
Figure 2.28: G(a) distributions for different values of a. Now let us proceed to the powers:

    res.gamma <- calc.rejections.sm.pc.bounds(
      sample.generation.function.H0 = rnorm,
      sample.generation.functions.H1 =
        lapply(
          c(seq(0.1, 0.9, 0.1), 1:5, 10, 100, 1000),
          function(shape)
            function(nsamples)
              (rgamma(nsamples, shape = shape, rate = 1) - shape) / sqrt(shape)
        ),
      get.stat.matrix = get.sorted.sample…

Figure 2.29: Histograms for empirical distributions of order statistics for samples from standard uni…
Figure 2.30: Red - estimated 0.95-confidence bound for empirical cumulative distribution function
Figure 2.31: Pairwise contour plots for 2d projections of empirical joint density of order statistics for…
Figure 2.32: Pairwise hexbin plots for 2d projections of empirical joint density of order statistics for…
Figure 2.33: Histograms for empirical distributions of principal components of order statistics for…
Figure 2.34: Pairwise contour plots for 2d projections of empirical joint density of principal compo…
Figure 2.35: Pairwise hexbin plots for 2d projections of empirical joint density of principal components
Figure 2.36: Beta(a, a), a > 1; is symmetric, has more points in the center. [Spilled histogram panels: Beta(0.8, 0.8), Beta(0.5, 0.5), Beta(0.2, 0.2); axes: x, Frequency]
Figure 2.37: Beta(a, a), 0 < a < 1; is symmetric, has more points near the edges. [Spilled histogram panels: Beta(1, 0.8), Beta(1, 0.5), Beta(1, 0.2); axes: x, Frequency]
Figure 2.38: Beta(1, a), 0 < a < 1; has significant spike near the right edge.
Figure 2.39: Beta(1, a), a > 1; has sloping near the left edge. All histograms were done on beta samples of size 1e6. Finally, let us proceed with calculating the powers of tests based on uniform order statistics:

    res.uniform <- calc.rejections.sm.pc.bounds.with.interval.types(
      sample.generation.function.H0 = runif, # null samples are from uniform
      sample.generation.functions.H1 = c( # alternative samples are b…

Figure 2.40: Beta(a, b), a, b > 1, a ≠ b; asymmetric with more points near the center. [Spilled histogram panels: Beta(0.3, 0.4), Beta(0.3, 0.6), Beta(0.3, 0.8); axes: x, Frequency]
Figure 2.41: Beta(a, b), 0 < a, b < 1, a ≠ b; asymmetric with more points near the edges.
Figure 2.42: Beta(a, b), 0 < a < 1, b > 1; asymmetric with a long slope and a spike near one of the edges. We introduce the following grid for parameters of beta distribution:

    beta.1d.grid <- c(seq(0.1, 0.9, 0.1), 1:10)
    beta.parameters.grid <- expand.grid(V1 = beta.1d.grid, V2 = beta.1d.grid)
    beta.parameters.grid <- beta.parameters.grid[beta.parameters.grid$V1 <= beta.parameters.grid$V2,]

The corresponding set…
Figure 2.43: “OS(U)” test powers on beta grid of alternatives (…
Figure 2.44: “OS.MT(U)” test powers on beta grid of alternatives (…
Figure 2.45: “PC1(U)” test powers on beta grid of alternatives (…
Figure 2.46: “PC2(U)” test powers on beta grid of alternatives (…
Figure 2.47: “OS” test powers on beta grid of alternatives (…
Figure 2.48: “PC1” test powers on beta grid of alternatives (…
Figure 2.49: “PC2” test powers on beta grid of alternatives (…
Figure 2.50: “KS” test powers on beta grid of alternatives (…
Figure 2.51: “CvM” test powers on beta grid of alternatives (…
Figure 2.52: “AD” test powers on beta grid of alternatives (…
Figure 2.53: ZK test powers on beta grid of alternatives (2.31). Tile plot for ZA:

    plot.powers.on.grid(beta.parameters.grid, res.unif.beta.grid.Zhang.table$Za.means)

[Plot residue: title “beta parameters grid”; axes V1 and V2 over the beta grid values 0.1 to 10; colour scale val from 0.00 to 1.00]
Figure 2.54: ZA test powers on beta grid of alternatives (2.31).
Figure 2.55: ZC test powers on beta grid of alternatives (2.31). Figures 2.53–2.55 should be compared to figures 2.43–2.46 (uniform order statistics based tests), figures 2.47–2.49 (normal order statistics based tests), and figures 2.50–2.52 (classical tests). We see that figure 2.53 for ZK test resembles figure 2.44 for “OS.MT(U)” test
Figure 2.56: “ECDFV” test powers on beta grid of alternatives (…
Figure 2.57: “ECDFV.PC1” test powers on beta grid of alternatives (…
Figure 2.58: “ECDFV.PC2” test powers on beta grid of alternatives (…
Figure 2.59: OS(U)+PWD. On figure 2.59 we see that “OS(U)+PWD” test looks better than “OS(U)” test for symmetric Beta(a, a), a > 1 alternatives, and for other alternatives it looks more or less the same as “OS(U)” test. This means that adding pairwise distances as additional statistics to the original uniform order statistics based test fixes its weak point. [Footnote 5] Which stands for “order statistics (uniform) + pairwise di…
Figure 2.60: OS+PWD. On figure 2.60 we see that “OS+PWD” test has even better power for symmetric Beta(a, a), a > 1 alternatives compared to “OS(U)+PWD” test (figure 2.59). Everything else looks the same. For asymmetric Beta(a, b), a, b > 1, a ≠ b alternatives test’s power is less than for symmetric alternatives. We conclude that this “better for symmetric than asymmetric” behavior is a general characteristic of “un…
Figure 3.1: Joint distribution of KS and AD distances to normal CDF for normal samples of size n = 10. [Plot residue: axes KS distance and AD distance; hexbin Counts legend]
Figure 3.3: Histogram for log values of Q for joint distribution of KS and AD distances. [Plot residue: axes KS distance and AD distance; hexbin Counts legend]
Figure 3.6: Zoomed-in version of figure 3.5. Finally, we check the power of principal component based bounds:

    p.comp.H1 <- predict(statistics.H0.p.comp, statistics.H1)
    check.hyperrectangle.bounds(p.comp.H1, res.hyperrectangle.bounds.KS.AD.pc1)
    check.hyperrectangle.bounds(p.comp.H1, res.hyperrectangle.bounds.KS.AD.pc2)

Figure 3.7: OS(U).kNN
Figure 3.8: OS(N).kNN
Figure 3.9: 3DIST.kNN
Original abstract

We propose goodness-of-fit tests based on simulated confidence sets for joint distributions of multiple sample statistics, focusing on absolutely continuous null distributions with known parameters. One class of tests uses hyperrectangular confidence sets for principal components of order statistics and related statistic vectors. Extending earlier work on horizontal and vertical confidence bands for cumulative distribution functions, these tests are compared with some classical, Zhang, and related graphical tests. Simulations show that the proposed procedures are competitive with, and often more powerful than, existing methods. We also study the geometry of principal-component-based statistics; under a normal null distribution, the first principal component corresponds to the sample mean, while the second is related to a linear analogue of variance. A second class of tests uses confidence sets of arbitrary shape constructed through highest density regions. Unlike earlier kernel-density-based approaches, we use a k-nearest-neighbor method for detecting highest density regions, which is better suited to higher-dimensional statistic vectors. We study tests based on order statistics, empirical distribution function values, moments, and combinations of classical goodness-of-fit statistics. The resulting procedures are powerful against a wide range of alternatives. We also outline a two-sample extension via permutation tests based on joint distributions of several statistics and compare moment-based versions with energy-distance permutation tests. Finally, we discuss transformations other than the probability integral transform, showing that mapping data to another target distribution, such as the standard normal, can be advantageous when powerful tests are available for that distribution.
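The closing point of the abstract, mapping data to a target distribution other than the uniform, can be sketched in a few lines of Python. The example composes the probability integral transform under the null with the standard normal quantile function, z = Phi^{-1}(F0(x)), so that data drawn from the null become N(0, 1) scores. The Exp(1) null, the function names, and the clipping constant are assumptions made for illustration; the paper's own transformations may differ.

```python
import math
import random
from statistics import NormalDist

def to_normal_scores(sample, null_cdf, eps=1e-12):
    """Map each observation x to Phi^{-1}(F0(x)), where F0 is the null CDF."""
    std_normal = NormalDist()
    scores = []
    for x in sample:
        # Clip so that inv_cdf never receives exactly 0 or 1.
        u = min(max(null_cdf(x), eps), 1.0 - eps)
        scores.append(std_normal.inv_cdf(u))
    return scores

def exp_cdf(x):
    """CDF of the Exp(1) distribution, 1 - exp(-x), computed stably."""
    return -math.expm1(-x)

rng = random.Random(3)
sample = [rng.expovariate(1.0) for _ in range(5000)]  # data drawn from the null
z = to_normal_scores(sample, exp_cdf)

z_mean = sum(z) / len(z)
z_sd = math.sqrt(sum((v - z_mean) ** 2 for v in z) / (len(z) - 1))
print(f"mean of scores: {z_mean:.3f}, sd of scores: {z_sd:.3f}")
```

When the null is correct the scores behave like a standard normal sample, so any test of N(0, 1) with known parameters can be applied to them in place of a uniformity test on F0(x).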

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 3 minor

Summary. The manuscript proposes goodness-of-fit tests for absolutely continuous null distributions with known parameters that rely on simulated confidence sets for the joint distribution of multiple sample statistics (order statistics, EDF values, moments, and classical GOF statistics). One class constructs hyperrectangular sets from principal components of these vectors; a second uses k-nearest-neighbor highest-density regions. The paper includes geometric analysis under normality (PC1 corresponds to the mean, PC2 to a linear variance analogue), power comparisons via simulation against classical, Zhang, and graphical tests, a permutation-based two-sample extension, and discussion of non-PIT transformations.

Significance. If the simulation results hold, the framework offers a flexible way to combine information across multiple statistics via joint density estimation, potentially yielding higher power against diverse alternatives than single-statistic or graphical methods. Explicit scoping to known-parameter continuous nulls enables direct Monte Carlo simulation of the joint distributions, which is a methodological strength. The k-NN approach for high-dimensional HDRs and the permutation two-sample procedure are practical extensions. The geometric interpretation under normality provides insight into what the PC-based tests are actually detecting.

major comments (1)
  1. The central empirical claim (Abstract) that the proposed procedures are 'competitive with, and often more powerful than, existing methods' rests entirely on simulations whose design (sample sizes, alternatives, number of Monte Carlo replications, handling of multiple statistics, and exact power metrics) is not summarized. Without these details it is impossible to assess whether the reported advantages are robust or sensitive to post-hoc selection of statistics or alternatives.
minor comments (3)
  1. The abstract mentions 'combinations of classical goodness-of-fit statistics' but does not list which classical statistics are included; a brief enumeration would clarify the scope of the second class of tests.
  2. The geometry discussion for the normal case (first PC = sample mean, second related to linear variance analogue) is stated without reference to the precise definition of the statistic vector or the PCA implementation; adding the relevant equation or definition would make the claim self-contained.
  3. The two-sample permutation extension is outlined only briefly; a short statement of how the joint distribution is estimated under the permutation null would improve readability.
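On the third minor comment, one plausible reading of the two-sample extension is sketched below in Python. The pooled data are repeatedly split at random, the vector of statistics is recomputed for each split, and the resulting collection of vectors serves as the joint permutation distribution. The specific statistics (differences in mean and in standard deviation), the max-of-standardized-coordinates combination rule, and every name in the code are assumptions of this sketch and are not taken from the paper.

```python
import math
import random

def mean_and_sd(values):
    """Sample mean and sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    return mean, math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))

def statistic_vector(x, y):
    """Vector of statistics: differences in mean and in standard deviation."""
    (mx, sx), (my, sy) = mean_and_sd(x), mean_and_sd(y)
    return (mx - my, sx - sy)

def permutation_test(x, y, n_perm=999, seed=0):
    """Permutation p-value based on the joint distribution of the vector."""
    rng = random.Random(seed)
    pooled = list(x) + list(y)
    observed = statistic_vector(x, y)
    # Joint permutation distribution of the statistic vector.
    perm_vectors = []
    for _ in range(n_perm):
        rng.shuffle(pooled)
        perm_vectors.append(statistic_vector(pooled[:len(x)], pooled[len(x):]))
    # Scale each coordinate by its permutation spread, then combine by the
    # maximum, which corresponds to a hyperrectangular acceptance region.
    scales = [mean_and_sd([v[j] for v in perm_vectors])[1]
              for j in range(len(observed))]

    def combined(v):
        return max(abs(c) / s for c, s in zip(v, scales))

    obs = combined(observed)
    exceed = sum(combined(v) >= obs for v in perm_vectors)
    return (1 + exceed) / (n_perm + 1)

rng = random.Random(5)
x = [rng.gauss(0.0, 1.0) for _ in range(100)]
y = [rng.gauss(0.0, 3.0) for _ in range(100)]  # same mean, larger scale
p_value = permutation_test(x, y)
print(f"permutation p-value: {p_value:.3f}")
```

The add-one form of the p-value keeps it strictly positive and is a common convention for Monte Carlo permutation tests.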

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the positive assessment of the manuscript's contributions and for the recommendation of minor revision. We address the single major comment below.

  1. Referee: The central empirical claim (Abstract) that the proposed procedures are 'competitive with, and often more powerful than, existing methods' rests entirely on simulations whose design (sample sizes, alternatives, number of Monte Carlo replications, handling of multiple statistics, and exact power metrics) is not summarized. Without these details it is impossible to assess whether the reported advantages are robust or sensitive to post-hoc selection of statistics or alternatives.

    Authors: We agree that the abstract would benefit from a concise summary of the simulation design to allow readers to evaluate the scope and robustness of the empirical results. In the revised manuscript we will expand the abstract to include the following details: sample sizes n = 10, 20, 50, 100; alternatives consisting of location, scale, and shape shifts under normal nulls together with t, chi-squared, uniform, and beta distributions; 10,000 Monte Carlo replications for power estimation (with 5,000 used for critical-value calibration); and handling of multiple statistics via pre-specified fixed combinations (order statistics, EDF ordinates, moments, and classical GOF statistics) chosen on the basis of the geometric analysis in Section 3 rather than data-driven selection. The power metric is the empirical rejection rate at nominal level 0.05. These additions will make the design transparent while preserving the abstract's brevity; the full experimental protocol remains in Section 4.

    Revision: yes
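The protocol described in the rebuttal (calibrate a critical value on null simulations, then report the empirical rejection rate at level 0.05) can be sketched in Python as follows. The Kolmogorov-Smirnov distance to the N(0, 1) CDF is used only as a familiar stand-in statistic, the replication counts are reduced to keep the example fast, and the function names are assumptions of this sketch rather than the paper's.

```python
import random
from statistics import NormalDist

STD_NORMAL_CDF = NormalDist().cdf

def ks_distance(sample, cdf=STD_NORMAL_CDF):
    """Kolmogorov-Smirnov distance between the sample's ECDF and cdf."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

def estimate_power(statistic, null_sampler, alt_sampler, n,
                   n_calib, n_rep, alpha, rng):
    """Return (critical value, empirical rejection rate under alt_sampler)."""
    def draw(sampler):
        return [sampler(rng) for _ in range(n)]
    # Calibration: empirical (1 - alpha) quantile of the statistic under the null.
    null_stats = sorted(statistic(draw(null_sampler)) for _ in range(n_calib))
    critical = null_stats[min(n_calib - 1, int((1 - alpha) * n_calib))]
    # Power: share of alternative samples whose statistic exceeds the critical value.
    rejections = sum(statistic(draw(alt_sampler)) > critical for _ in range(n_rep))
    return critical, rejections / n_rep

rng = random.Random(2)
null = lambda r: r.gauss(0.0, 1.0)
shift = lambda r: r.gauss(0.5, 1.0)

# Size check: the "alternative" is the null itself, so the rate should be near 0.05.
critical, size = estimate_power(ks_distance, null, null, 20, 2000, 2000, 0.05, rng)
_, power = estimate_power(ks_distance, null, shift, 20, 2000, 2000, 0.05, rng)
print(f"critical value: {critical:.3f}, size: {size:.3f}, power: {power:.3f}")
```

Running the same routine with the null as its own alternative is a cheap way to confirm that the calibrated test holds its nominal level before any power comparison is read.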

Circularity Check

0 steps flagged

No significant circularity

The paper proposes Monte Carlo-based goodness-of-fit procedures that simulate joint distributions of chosen statistics (order statistics, EDF values, moments) directly from the known absolutely continuous null. Power claims rest on external simulation comparisons to named existing tests rather than any internal derivation that reduces performance metrics to quantities defined by the method itself. No equations equate a 'prediction' to a fitted input, no self-citation chain supplies a uniqueness theorem or ansatz, and the geometry remarks follow from standard PCA properties. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

The method depends on simulation of joint distributions under the null, which requires choices of which statistics to combine and parameters for density-region detection; the absolute continuity and known-parameter assumptions are stated explicitly.

free parameters (2)
  • choice and number of sample statistics
    The vector of statistics whose joint distribution is used (order statistics, moments, EDF values, etc.) must be selected.
  • k for nearest-neighbor density estimation
    Parameter controlling highest-density-region construction in the second test class.
axioms (1)
  • domain assumption: the null distribution is absolutely continuous with known parameters
    Explicitly stated as the focus of the proposed tests.
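The role of the free parameter k can be illustrated with a small Python sketch of a k-nearest-neighbor highest-density-region test. The distance from a point to its k-th nearest neighbor in a simulated null reference set is used as an inverse density proxy, and the rejection threshold is the 0.95 quantile of that distance over a separate null calibration set. The two-dimensional statistic vector (sample mean, sample standard deviation), k = 10, the set sizes, and all names are assumptions of this sketch and may not match the paper's construction.

```python
import heapq
import math
import random

def stat_vector(sample):
    """Statistic vector for one sample: (mean, standard deviation)."""
    n = len(sample)
    mean = sum(sample) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
    return (mean, sd)

def simulate(sampler, n, n_sim, rng):
    """Statistic vectors for n_sim simulated samples of size n."""
    return [stat_vector([sampler(rng) for _ in range(n)]) for _ in range(n_sim)]

def knn_distance(point, reference, k):
    """Distance from point to its k-th nearest neighbor in reference.
    A large distance indicates a low-density region under the null."""
    return heapq.nsmallest(k, (math.dist(point, r) for r in reference))[-1]

rng = random.Random(11)
n, k = 10, 10
null = lambda r: r.gauss(0.0, 1.0)
wide = lambda r: r.gauss(0.0, 2.0)

reference = simulate(null, n, 1000, rng)    # defines the density estimate
calibration = simulate(null, n, 500, rng)   # sets the rejection threshold
calib_distances = sorted(knn_distance(p, reference, k) for p in calibration)
threshold = calib_distances[int(0.95 * len(calib_distances))]

def rejection_rate(vectors):
    """Share of vectors falling outside the estimated 95% density region."""
    return sum(knn_distance(p, reference, k) > threshold for p in vectors) / len(vectors)

size = rejection_rate(simulate(null, n, 400, rng))
power = rejection_rate(simulate(wide, n, 400, rng))
print(f"size: {size:.3f}, power against N(0, 2^2): {power:.3f}")
```

Using a calibration set separate from the reference set avoids the zero self-distance a point would have to itself. In practice the coordinates would usually be rescaled to comparable spreads before distances are computed.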

Reference graph

Works this paper leans on

49 extracted references · 19 canonical work pages · 2 internal anchors

  1. [1]

    Adler and Jonathan E

    Robert J. Adler and Jonathan E. Taylor. Random Fields and Geometry . Springer, 2007

  2. [2]

    Monte Carlo implementation of a guiding-center Fokker-Planck kinetic equation

    Sivan Aldor-Noiman, Lawrence D. Brown, Andreas Buja, Wolfgang Rolke, and Robert A. Stine. The power to see: A new graphical test of normality. The American Statistician , 67(4):249–260, 2013. doi: 10.1080/00031305.2013.847865

  3. [3]

    T. W. Anderson and D. A. Darling. Asymptotic theory of certain ”goodness-of-fit” criteria based on stochastic processes. Annals of Mathematical Statistics , 23(2):193–212, 1952. doi:10.1214/aoms/ 1177729437

  4. [4]

    T. W. Anderson and D. A. Darling. A test of goodness of fit. Journal of the American Statistical Association, 49(268):765–769, 1954. doi:10.1080/01621459.1954.10501232

  5. [5]

    Testing density forecasts, with applications to risk management

    Jeremy Berkowitz. Testing density forecasts, with applications to risk management. Journal of Business & Economic Statistics , 19(4):465–474, 2001. doi:10.1198/07350010152596718

  6. [6]

    FNN: Fast Nearest Neighbor Search Algorithms and Applications , 2024

    Alina Beygelzimer, Sham Kakadet, John Langford, Sunil Arya, David Mount, and Shengqiao Li. FNN: Fast Nearest Neighbor Search Algorithms and Applications , 2024. R package version 1.1.4.1. URL: https: //CRAN.R-project.org/package=FNN

  7. [7]

    Calibration for simultaneity: (re)sampling methods for simultaneous inference with applications to function estimation and functional data

    Andreas Buja and Wolfgang Rolke. Calibration for simultaneity: (re)sampling methods for simultaneous inference with applications to function estimation and functional data. Working paper, 2006. URL: http://stat.wharton.upenn.edu/~buja/PAPERS/paper-sim.pdf

  8. [8]

    hexbin: Hexagonal Binning Routines, 2024

    Dan Carr, Nicholas Lewin-Koh, Martin Maechler, and Deepayan Sarkar. hexbin: Hexagonal Binning Routines, 2024. R package version 1.28.5. URL: https://CRAN.R-project.org/package=hexbin

  9. [9]

    Covington and Jeffrey W

    Christian T. Covington and Jeffrey W. Miller. A powerful goodness-of-fit test using the probability integral transform of order statistics, 2025. URL: https://arxiv.org/abs/2510.22854, arXiv:2510.22854

  10. [10]

    On the composition of elementary errors.Scandinavian Actuarial Journal, 1928(1):13–74,

    Harald Cram´ er. On the composition of elementary errors.Scandinavian Actuarial Journal, 1928(1):13–74,

  11. [11]

    doi:10.1080/03461238.1928.10416862

  12. [12]

    H. A. David and H. N. Nagaraja. Order Statistics. Wiley, 2003

  13. [13]

    Karhunen-lo` eve expansions of mean-centered wiener processes.IMS Lecture Notes, 2006

    Paul Deheuvels. Karhunen-lo` eve expansions of mean-centered wiener processes.IMS Lecture Notes, 2006

  14. [14]

    Alternative approaches for estimating highest-density regions

    Nina Deliu and Brunero Liseo. Alternative approaches for estimating highest-density regions. Inter- national Statistical Review , 94(1):97–120, 2026. URL: https://onlinelibrary.wiley.com/doi/abs/10. 1111/insr.12592, arXiv:2401.00245, doi:10.1111/insr.12592

  15. [15]

    Omnibus goodness-of-fit tests for univariate continuous distri- butions based on trigonometric moments

    Alain Desgagn´ e and Fr´ ed´ eric Ouimet. Omnibus goodness-of-fit tests for univariate continuous distri- butions based on trigonometric moments. Statistica Neerlandica, 80(2):e70025, 2026. URL: https:// onlinelibrary.wiley.com/doi/abs/10.1111/stan.70025, arXiv:https://onlinelibrary.wiley.com/ doi/pdf/10.1111/stan.70025, doi:10.1111/stan.70025

  16. [16]

    Rcpp: Seamless r and c++ integration

    Dirk Eddelbuettel and Romain Francois. Rcpp: Seamless r and c++ integration. Journal of Statistical Software, 40(8):1–18, 2011. doi:10.18637/jss.v040.i08

  17. [17]

    Dirk Eddelbuettel, Romain Francois, et al. Rcpp: Seamless R and C++ Integration, 2026. R package version 1.1.1. URL: https://CRAN.R-project.org/package=Rcpp

  18. [18]

    Rob J. Hyndman. Computing and graphing highest density regions. The American Statistician, 50(2):120–126, 1996. doi:10.1080/00031305.1996.10474359

  19. [19]

    International Organization for Standardization. Programming languages — C++, 2020

  20. [20]

    Kari Karhunen. Über lineare Methoden in der Wahrscheinlichkeitsrechnung. 1947. In German

  21. [21]

    Maxwell L. King, Xibin Zhang, and Muhammad Akram. Hypothesis testing based on a vector of statistics. Journal of Econometrics, 219(2):425–455, 2020. Annals Issue: Econometric Estimation and Testing: Essays in Honour of Maxwell King. URL: https://www.sciencedirect.com/science/article/pii/S0304407620301056, doi:10.1016/j.jeconom.2020.03.010

  22. [22]

    A. N. Kolmogorov. On the empirical determination of a distribution law. Giornale dell'Istituto Italiano degli Attuari, 4:83–91, 1933. Originally published in Italian as "Sulla determinazione empirica di una legge di distribuzione"

  23. [23]

    Michel Loève. Fonctions aléatoires de second ordre. 1948. In French

  24. [24]

    Michel Loève. Probability Theory I. Springer, 1977

  25. [25]

    Y. Marhuenda, D. Morales, and M. C. Pardo. A comparison of uniformity tests. Statistics, 39(4):315–327, 2005. doi:10.1080/02331880500178562

  27. [27]

    David Novoa-Paradela, Oscar Fontenla-Romero, and Bertha Guijarro-Berdiñas. A one-class classification method based on expanded non-convex hulls. Information Fusion, 89:1–15, 2023. doi:10.1016/j.inffus.2022.07.023

  28. [28]

    OpenAI. ChatGPT (GPT-5.5). https://chatgpt.com/, 2026. Large language model accessed May 15, 2026

  29. [29]

    Viktor Petukhov, Teun van den Brand, and Evan Biederstedt. ggrastr: Rasterize Layers for 'ggplot2'. R package version 1.0.2. URL: https://CRAN.R-project.org/package=ggrastr

  31. [31]

    R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, 2026. URL: https://www.R-project.org/

  32. [32]

    James O. Ramsay and Bernard W. Silverman. Functional Data Analysis. Springer, 2005

  33. [33]

    Wolfgang Rolke. Supplemental studies for simultaneous goodness-of-fit testing, 2020. URL: https://arxiv.org/abs/2007.04727, arXiv:2007.04727

  34. [34]

    Wolfgang Rolke. Simulation studies for goodness-of-fit and two-sample methods for univariate data. arXiv preprint arXiv:2411.05839, November 2024. URL: https://arxiv.org/abs/2411.05839

  35. [35]

    X. Romão, R. Delgado, and A. Costa. An empirical power comparison of univariate goodness-of-fit tests for normality. Journal of Statistical Computation and Simulation, 80(5):545–591, 2010. doi:10.1080/00949650902740824

  36. [36]

    Teemu Säilynoja, Paul-Christian Bürkner, and Aki Vehtari. Graphical test for discrete uniformity and its applications in goodness-of-fit evaluation and multiple sample comparison. Statistics and Computing, 32(2):1–21, 2022. doi:10.1007/s11222-022-10090-6

  37. [37]

    Deepayan Sarkar. Lattice: Multivariate Data Visualization with R. Springer, New York, 2008. URL: http://lmdvr.r-forge.r-project.org

  38. [38]

    Barret Schloerke, Di Cook, Joseph Larmarange, Francois Briatte, Moritz Marbach, Edwin Thoen, Amos Elberg, and Jason Crowley. GGally: Extension to 'ggplot2', 2024. R package version 2.2.1. URL: https://CRAN.R-project.org/package=GGally

  39. [39]

    Galen R. Shorack and Jon A. Wellner. Empirical Processes with Applications to Statistics. SIAM, 1986

  40. [40]

    Galen R. Shorack and Jon A. Wellner. Empirical Processes with Applications to Statistics. Wiley Series in Probability and Mathematical Statistics. Wiley, New York, 1986

  41. [41]

    Michael A. Stephens. EDF statistics for goodness of fit and some comparisons. Journal of the American Statistical Association, 69(347):730–737, 1974. doi:10.1080/01621459.1974.10480196

  42. [42]

    W. N. Venables and B. D. Ripley. Modern Applied Statistics with S. Springer, New York, fourth edition, 2002. ISBN 0-387-95457-0. URL: https://www.stats.ox.ac.uk/pub/MASS4/

  44. [44]

    Richard von Mises. Wahrscheinlichkeitsrechnung und ihre Anwendung auf die Statistik und theoretische Physik. F. Deuticke, Leipzig, 1931. In German; English translation published as Probability, Statistics and Truth (1939, Macmillan)

  45. [45]

    Richard von Mises. Probability, Statistics and Truth. Macmillan, New York, 1939. English translation of Wahrscheinlichkeitsrechnung und ihre Anwendung auf die Statistik und theoretische Physik (1931)

  46. [46]

    Eric Weine, Mary Sara McPeek, and Mark Abney. Application of equal local levels to improve Q-Q plot testing bands with R package qqconf. Journal of Statistical Software, 106(10):1–33, 2023. URL: https://www.jstatsoft.org/article/view/v106i10, doi:10.18637/jss.v106.i10

  47. [47]

    Hadley Wickham. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York, 2016. URL: https://ggplot2.tidyverse.org

  48. [48]

    Jin Zhang. Powerful Goodness-of-Fit Tests and Multi-Sample Tests. PhD thesis, York University, Toronto, Canada, 2001

  49. [49]

    Jin Zhang. Powerful goodness-of-fit tests based on the likelihood ratio. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 64(2):281–294, 2002. doi:10.1111/1467-9868.00337