stat — Pith

Top Pith

1

cs.IT 2026-06-26

Multi-distribution functionals reduce to integrals of coincidence divergences

by Akshay Balsubramani

Monotonicity under data processing and additivity on independent products force every such functional to an integral over four strata

Comparing two probability distributions is a basic building block of statistics and machine learning, and the right family is well understood: the Rényi divergences of order $\alpha\in[0,\infty]$ are the unique family monotone under data processing and additive on independent products. Many problems instead compare more than two distributions at once -- multi-population fairness, multi-prior PAC-Bayes bounds, multi-hypothesis testing -- and the right multi-distribution generalization of the Rényi family has been an open question. We characterize it. Every functional of $W$-tuples of distributions that is monotone under data processing and additive on independent products is a positive integral of multi-way coincidence divergences $C_{\alpha}(\pi_1,\dots,\pi_W) := -\log\int \pi_1^{\alpha_1}\cdots\pi_W^{\alpha_W}$ (with $\sum_k \alpha_k = 1$) over a parameter space with four strata: the simplex interior; mixed-sign exponent cones (the analogue of Rényi orders $>1$); a tropical boundary at infinity carrying max-divergences; and pairwise Kullback-Leibler edges at the simplex vertices. Each stratum is necessary -- the destination of an explicit data-processing-monotone, product-additive divergence the others cannot reproduce -- and each is a clean limit of simplex-interior atoms. The same family arises from several independent routes -- the structural axioms, Kolmogorov-Nagumo means with Rényi's entropy axiomatics, classical entropy characterizations, multi-hypothesis testing error exponents, and a multi-lottery betting interpretation -- structural evidence that this is the canonical multi-distribution Rényi calculus rather than an artefact of any one axiomatic input. The two-prior case recovers the standard Rényi result; a worked $W=3$ instance, numerical verification, and a conditional extension round out the treatment.
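As a rough illustration of the coincidence divergence defined in the abstract, the following sketch evaluates $C_{\alpha}$ for distributions on a finite set, where the integral becomes a sum. The function names are hypothetical, and the example assumes exponents in the simplex interior (all positive, summing to 1); it does not cover the other three strata.

```python
import math

def coincidence_divergence(dists, alphas):
    """C_alpha = -log sum_x prod_k pi_k(x)^alpha_k for finite distributions.

    Assumes every alpha_k > 0 and sum(alphas) == 1 (simplex interior).
    """
    if abs(sum(alphas) - 1.0) > 1e-9:
        raise ValueError("exponents must sum to 1")
    total = 0.0
    for x in range(len(dists[0])):
        term = 1.0
        for pi, a in zip(dists, alphas):
            term *= pi[x] ** a
        total += term
    return -math.log(total)

def renyi_divergence(p, q, alpha):
    """Standard Renyi divergence of order alpha (0 < alpha < 1 here)."""
    s = sum(pi ** alpha * qi ** (1 - alpha) for pi, qi in zip(p, q))
    return math.log(s) / (alpha - 1)

p = [0.5, 0.3, 0.2]
q = [0.2, 0.5, 0.3]
r = [0.4, 0.4, 0.2]

three_way = coincidence_divergence([p, q, r], [0.2, 0.3, 0.5])
two_way = coincidence_divergence([p, q], [0.7, 0.3])
```

For $W=2$ with exponents $(\alpha, 1-\alpha)$, the value equals $(1-\alpha)$ times the Rényi divergence of order $\alpha$, which is one way to see how the two-prior case connects to the standard family.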

Top Pith

5

cs.LG 2026-05-22 2 theorems

Stronger backdoor triggers can raise clean accuracy in high dimensions

by Donald Flynn, Hadas Yaron Goldhirsh +2 more

When Stronger Triggers Backfire: A High-Dimensional Theory of Backdoor Attacks

Proportional-regime analysis shows attack success peaks then falls while clean performance improves with training trigger strength.

Backdoor poisoning attacks behave counter-intuitively in high dimensions: stronger training triggers can help the defender. We study regularised generalised linear models on Gaussian-mixture data in the proportional regime ($p/n \to \kappa$), varying the training trigger strength $\alpha$ against a fixed test trigger. Three phenomena emerge: (i) clean test accuracy increases with $\alpha$; (ii) attack success peaks at a finite $\alpha$ and then declines; and (iii) the most damaging trigger direction is the minimum eigenvector of the data covariance. We prove all three results in closed form for the squared loss, and extend (i) and (ii) to general convex GLM losses via a Gaussian-proxy fixed-point system. We identify a finite-sample noise floor proportional to $\kappa$ as the mechanism behind (i), invisible to classical $n \gg p$ analysis. Experiments on CIFAR-10 and Gaussian surrogates match the theory closely; ResNet-18 experiments show the same phenomena beyond the convex setting.

Top Pith

5

stat.ML 2026-05-20 2 theorems

Contradiction graph decides VC dimension threshold for any m

by Jesse Campbell, Daniel Ibaibarriaga +1 more

Contradiction Graphs Determine VC Dimension

Vertices are realizable label sequences of length m; edges mark label disagreements on shared points, fixing whether dimension meets or tops

We study the contradiction graphs associated with binary concept classes. For a class $H \subseteq \{0,1\}^X$, the order-$m$ contradiction graph $G_m(H)$ has as vertices the $H$-realizable labeled sequences of length $m$, with two vertices adjacent when the two sequences assign opposite labels to some common domain point. Our main result is that the single graph $G_m(H)$ determines the threshold predicate $\mathrm{VCdim}(H)\ge m$. Consequently, the full sequence $(G_m(H))_{m \ge 1}$ determines the exact VC dimension and, in particular, detects finite versus infinite VC dimension, answering a question posed by Alon et al. (2024).
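To make the definition of $G_m(H)$ concrete, here is a minimal sketch that builds the contradiction graph for a small finite class and computes the VC dimension by brute force. The helper names and the threshold-class example are illustrative assumptions; the abstract does not say how the paper reads the predicate $\mathrm{VCdim}(H)\ge m$ off the graph, so that step is not attempted here.

```python
from itertools import combinations, product

def realizable_sequences(domain, concepts, m):
    """All H-realizable labeled sequences of length m (concepts are dicts x -> 0/1)."""
    seqs = set()
    for xs in product(domain, repeat=m):
        for h in concepts:
            seqs.add(tuple((x, h[x]) for x in xs))
    return sorted(seqs)

def contradiction_graph(domain, concepts, m):
    """Vertices: realizable sequences. Edge: opposite labels on some common point."""
    vertices = realizable_sequences(domain, concepts, m)
    edges = set()
    for u, v in combinations(vertices, 2):
        labels_u = dict(u)  # a realizable sequence labels each point consistently
        if any(x in labels_u and labels_u[x] != y for x, y in v):
            edges.add((u, v))
    return vertices, edges

def vc_dimension(domain, concepts):
    """Brute-force VC dimension of a finite class on a finite domain."""
    d = 0
    for k in range(1, len(domain) + 1):
        if any(len({tuple(h[x] for x in S) for h in concepts}) == 2 ** k
               for S in combinations(domain, k)):
            d = k
    return d

# Illustrative class: thresholds h_t(x) = 1 if x >= t, on the domain {0, 1, 2}.
domain = [0, 1, 2]
concepts = [{x: int(x >= t) for x in domain} for t in range(4)]
vertices, edges = contradiction_graph(domain, concepts, m=1)
```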

cs.AI 2026-07-03

Simple threshold monitor matches advanced LLM safety checks

by Mona Schirmer, Metod Jazbec +4 more

Online Safety Monitoring for LLMs

Risk-calibrated thresholding on external verifier signals performs competitively on reasoning and red teaming tasks.

Despite alignment training, LLMs remain prone to generating unsafe outputs at deployment time. Monitoring outputs online and raising an alarm when safety can no longer be assumed is therefore critical. We study a simple real-time monitor that turns a verifier signal from an external model into an alarm decision by thresholding, with the threshold calibrated via risk control. In experiments on mathematical reasoning and red teaming datasets, we show that this simple design is competitive with more advanced monitors based on sequential hypothesis testing.
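The thresholding idea can be sketched as follows. The abstract does not specify the calibration procedure, so this example assumes a conformal-risk-control style correction, a calibration set of verifier scores for outputs known to be unsafe, and the convention that a higher score means more likely unsafe. All names are hypothetical.

```python
import math

def calibrate_threshold(unsafe_scores, alpha):
    """Largest threshold t with corrected miss rate (misses + 1) / (n + 1) <= alpha.

    A miss is an unsafe calibration output whose score falls below t.
    """
    scores = sorted(unsafe_scores)
    n = len(scores)
    k_max = math.floor(alpha * (n + 1)) - 1  # number of misses we can afford
    if k_max < 0:
        return float("-inf")  # too little data for this alpha: always alarm
    return scores[min(k_max, n - 1)]

def alarm(score, threshold):
    """Raise an alarm when the verifier score reaches the calibrated threshold."""
    return score >= threshold

calibration = [0.9, 0.2, 0.7, 0.4, 0.8, 0.6, 0.5]
t = calibrate_threshold(calibration, alpha=0.25)
```

With seven calibration scores and `alpha=0.25`, one miss is affordable, so the threshold lands on the second-smallest score. The guarantee this kind of correction gives relies on the calibration and deployment scores being exchangeable.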

stat.AP 2026-07-03

CESM low-energy matches contrast at 0.874 patient AUC

by Sara Antonijevic, Brani Vidakovic

Masked complex non-decimated wavelet features for patient-level classification of contrast-enhanced mammography

Leakage-free wavelet classifier shows the two image types are equivalent yet use separate phase and magnitude channels.

Contrast-enhanced spectral mammography (CESM) acquires two images of each breast, a low-energy image and a recombined contrast image, but two questions central to building a classifier on them remain unsettled: whether the two image types carry comparable malignancy signal, and how a patient's several images should be combined into a single decision. Both are hard to answer reliably, because most published CESM classifiers split cross-validation folds at the image level, letting images of the same patient fall in both training and test sets and inflating reported performance. We pair a masked complex non-decimated wavelet feature bank with an elastic-net logistic classifier, evaluated under repeated patient-grouped nested cross-validation with patient-cluster bootstrap inference on the CDD-CESM dataset (1,880 images, 308 patients); under this leakage-free evaluation the inflation from testing on previously seen patients is negligible. On normal-versus-malignant detection, the two acquisitions are statistically indistinguishable in patient-level AUC under the proposed evaluation framework. Under single-image fusion the contrast image reaches a patient-level AUC of 0.874 (95% CI 0.827-0.918) and the low-energy image is statistically indistinguishable from it, yet the two encode malignancy through disjoint, interpretable channels: phase coherence on the low-energy image and magnitude distribution on the contrast image. The framework matches a pretrained ResNet-50 representation at the patient level, but whereas the frozen deep representation is not directly interpretable at the level of individual predictors, every predictor in the wavelet representation carries an explicit physical meaning. The result is a transparent, leakage-free baseline against which future CESM classifiers can be measured.
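The leakage issue comes down to how folds are assigned. A minimal sketch of patient-level fold assignment, with a hypothetical function name and toy patient IDs, might look like this; it shows only the grouping step, not the repeated nested cross-validation or the bootstrap inference described in the abstract.

```python
import random

def patient_grouped_folds(patient_ids, n_folds, seed=0):
    """Return one fold index per image so all images of a patient share a fold."""
    patients = sorted(set(patient_ids))
    random.Random(seed).shuffle(patients)
    # Deal shuffled patients round-robin into folds.
    fold_of_patient = {p: i % n_folds for i, p in enumerate(patients)}
    return [fold_of_patient[p] for p in patient_ids]

# Toy example: seven images from four patients.
image_patients = ["a", "a", "b", "c", "c", "c", "d"]
folds = patient_grouped_folds(image_patients, n_folds=2)
```

Because the split is made over patients rather than images, no patient can contribute images to both the training and the test side of a fold.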

stat.ME 2026-07-03

The paper develops design-based inference for experiments that randomly assign…

by Jiawei Fu, Cyrus Samii +1 more

Inference for Group Interaction Experiments

In a sparse-sampling regime, standard cluster methods account for dependencies from interference and group formation.

A common experimental research design is one in which individuals are randomly allocated into groups that then interact under different group-level treatment conditions. We develop design-based inference for such "group interaction" experiments, covering scenarios in which groups are either fixed or randomly formed and in which potential outcomes are either fixed relative to others' group assignments or subject to interference. For each scenario, we characterize the causal estimand that the design targets and the inferential strategy appropriate to it. Working in a sparse-sampling asymptotic regime, we show that cluster-robust inference remains consistent and accounts for dependencies from various sources when interference is present, delivering valid inference on marginalized exposure effects. When interference is absent and groups are formed randomly, the design reduces to an individually randomized experiment, and individual-level heteroskedasticity-robust inference suffices for the average treatment effect. Our results on the asymptotic distribution of commonly used estimators rely on a novel coupling strategy that may be useful for design-based inference in other complex experiments.

stat.ML 2026-07-03

LLM personas split into frame-stable aggregates and frame-sensitive geometry

by Yuan Yuan

The Dual Nature of LLM Persona: Aggregated Tendencies and Frame-Dependent Geometry

Aggregate trait scores resist frame changes while correlation structure drops 42% on mismatch and recovers with alignment.

Evaluations of LLM personas via psychometric questionnaires typically rely on aggregate scores, discarding within-instance correlation structure. We test whether this geometric structure is intrinsic or frame-dependent. Constructing within-instance correlation matrices from IPIP-50 responses, we analyze geometry on SPD manifolds under manipulated question orderings in GPT-4o simulating American and Chinese-American personas. We find that persona expression comprises two dissociable components: aggregated features (Big Five scores) degrade under randomization (21% drop) but are frame-robust; geometric features (SPD manifold) collapse under frame misalignment (42% drop) but recover substantially (to 84%) under shared frames, surpassing aggregated features (76%). This collapse-recovery pattern reveals that persona geometry is not intrinsic but a frame-dependent coordination pattern encoding information invisible to aggregation. Our findings establish a dual-nature framework for LLM personas, frame-dependent geometry versus frame-robust aggregates, necessitating frame-aware evaluation and challenging static trait conceptions.

stat.ME 2026-07-03

Bayesian and quasi-Bayesian estimates merge for Poisson decisions

by Stefano Favaro, Sandra Fortini

Merging of Bayes and quasi-Bayes empirical Bayes procedures for Poisson compound decisions

Concentration rates of marginal PMFs produce matching regret decay, so the faster quasi-Bayesian method performs equivalently in the multidi

The Poisson compound decision problem is a long-standing problem in statistics, in which empirical Bayes methods are used to estimate Poisson means under a mixture model. We study this problem from the viewpoint of $g$-modeling, comparing two nonparametric strategies for estimating the unknown mixing distribution: a Bayesian empirical Bayes strategy, based on the Dirichlet process posterior, and a quasi-Bayesian empirical Bayes strategy, based on Newton's algorithm. The latter is computationally attractive, but its relationship with the Bayesian strategy requires theoretical justification. Under a Poisson mixture model with a "true", or oracle, mixing distribution, we establish concentration rates for the marginal probability mass functions induced by the Bayesian and quasi-Bayesian estimates. These rates are then translated into rates of decay for the corresponding regrets, interpreted as excess Bayes risks, and used to prove a frequentist merging result between the Bayesian and quasi-Bayesian empirical Bayes strategies. We also extend the analysis to the multidimensional Poisson compound decision problem. Numerical experiments on synthetic data illustrate that the quasi-Bayesian strategy achieves accuracy comparable to the Bayesian strategy, while requiring substantially fewer computational resources, especially in the multidimensional setting.

stat.ME 2026-07-03

CAP matches empirical risk at first order and removes second-order bias

by Yijian Huang

Cross-Audit Projection for Model Risk Prediction

Resampling audit plus asymptotic projection corrects over-optimism in binary classification risk estimates without sacrificing leading accur

For training-data-based model risk prediction, $K$-fold cross-validation (CV) is widely used to mitigate the well-known over-optimism of the empirical risk and is often regarded as reliable. However, for binary classification via empirical risk minimization, our numerical studies reveal a surprising phenomenon: $K$-fold CV may perform poorly in estimating class-specific risks, even worse than the empirical estimator. We perform a higher-order asymptotic analysis showing that $K$-fold CV may converge at a slower rate, whereas the empirical estimator exhibits a second-order asymptotic bias that explains its over-optimism. These findings motivate a novel two-step procedure for model risk prediction, termed cross-audit projection (CAP). The cross-audit step adopts the same resampling scheme as $K$-fold CV to estimate over-optimism in subsamples, while the asymptotic-theory-informed projection step adjusts for the reduced sample size in bias correction of the empirical risk. The resulting CAP estimator is first-order asymptotically equivalent to the empirical risk while achieving second-order asymptotic unbiasedness. An accompanying inference procedure is also developed. Simulation studies support the theoretical advantages of CAP and demonstrate favorable finite-sample performance. An application to breast cancer detection further illustrates the proposed method.

stat.AP 2026-07-03

iDiD estimator lets time-constant direct effects count as valid instruments

by Tran Trong Khoi Le, Emilie Sbidian +1 more

Instrumented difference-in-differences under case-control sampling

After modeling retrospective sampling bias, instruments whose direct outcome effect does not change over time identify trend effects in case

Case-control designs are fundamental in epidemiology for the efficient study of rare outcomes. Although instrumental variable (IV) methods have been extended to this setting to address unmeasured confounding, they typically rely on the exclusion restriction assumption, which may be violated when the IV candidates directly affect the outcome through pathways independent of the exposure. In this paper, we propose a novel instrumented difference-in-differences (iDiD) approach tailored to case-control designs. Grounded in structural mean modeling, the proposed method accommodates IV candidates that have a time-invariant direct effect on the outcome. When retrospective case-control datasets are collected, such a candidate can still be used as a valid instrument on the trend scale, provided that the selection bias induced by retrospective sampling is efficiently taken into account. We assess the finite-sample performance of this method through extensive simulations, then apply it to evaluate the risk of serious infection associated with biologic treatments for psoriasis, using the French national claims database.

stat.AP 2026-07-03

EVPI extensions show global validation of ADNEX model is complete

by Laure Wynants, Kim Zhipei Wang +6 more

Value-of-Information Analysis for External Validation of Risk Prediction Models in Multicenter Studies and Systematic Reviews

Accounting for center differences reveals when local adoption decisions still need more data to confirm net benefit.

External validation studies have finite sample sizes, creating uncertainty about whether a prediction model's Net Benefit (NB) exceeds default strategies' NB. The expected value of perfect information (EVPI) quantifies the consequences of this uncertainty. Current EVPI methods focus on single studies, ignoring between-center heterogeneity. We extend EVPI and expected value of partial perfect information (EVPPI) to account for between-cluster heterogeneity in multicenter studies and meta-analyses. We distinguish between the global and local optimal strategy and between observed and unobserved clusters. We define EVPIglobal, EVPIcluster_j, EVPIcluster, and EVPPIcluster,prevalence, implemented in the MetaNB R package, and illustrate them using a systematic review across 36 centers of the ADNEX model for ovarian cancer diagnosis. Assuming one global decision regarding ADNEX adoption, there is no need for further data to confirm ADNEX is superior overall (EVPIglobal 0). Meta-analysis borrows information across observed clusters, resulting in consistent local superiority of ADNEX and nonzero but typically lower EVPIcluster_j than when considering local data alone. There is a 0.03 probability that default strategies are superior in unobserved centers. Eliminating uncertainty on performance and prevalence in each center (EVPIcluster) would gain 1134 net avoided false positives (FP) per year, assuming 350000 tumors annually with 20% malignancies. Determining only local prevalence with certainty (EVPPIcluster,prevalence) would gain 158 net avoided FP per year. EVPI extensions disentangle sources of uncertainty and quantify the need for further validation to determine the globally or locally optimal strategy. Considering uncertainty and heterogeneity in clinical utility across clusters is essential to decide whether additional validation studies are warranted.
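For orientation, the single-study quantities that the abstract builds on can be sketched as below. This assumes the standard decision-curve definition of Net Benefit and the textbook EVPI (expected best-per-draw value minus the value of the best strategy on average). The function names and toy numbers are hypothetical; this is not the MetaNB package's interface, and it does not implement the cluster-level extensions.

```python
def net_benefit(tp, fp, n, threshold):
    """Decision-curve Net Benefit: TP/n - FP/n * t / (1 - t)."""
    return tp / n - (fp / n) * threshold / (1 - threshold)

def evpi(nb_draws):
    """EVPI from uncertainty draws (e.g. bootstrap or posterior samples).

    nb_draws: list of dicts mapping strategy name -> Net Benefit in that draw.
    """
    strategies = list(nb_draws[0].keys())
    m = len(nb_draws)
    expected = {s: sum(d[s] for d in nb_draws) / m for s in strategies}
    value_current = max(expected.values())                     # decide once, on averages
    value_perfect = sum(max(d.values()) for d in nb_draws) / m  # decide per draw
    return value_perfect - value_current

# Toy draws: the model is best on average but not in every draw.
draws = [
    {"model": 0.10, "treat_all": 0.05, "treat_none": 0.0},
    {"model": 0.02, "treat_all": 0.06, "treat_none": 0.0},
]
toy_evpi = evpi(draws)
```

When one strategy is best in every draw, the two terms coincide and the EVPI is zero, which is the situation in which further data cannot change the decision.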

stat.ME 2026-07-03

Multipliers create nondegenerate tests for Fréchet regression

by Leheng Cai, Xu Guo +1 more

MATCH: Multiplier-Assisted Tests for Conditional Hypotheses in Non-Euclidean Data

Sample splitting and random multipliers on held-out losses produce Gaussian limits without tangent coordinates or residual terms.

We propose MATCH (Multiplier-Assisted Tests for Conditional Hypotheses), a general framework for significance and specification testing in Fréchet regression that tests whether non-Euclidean data match a target model. MATCH covers global significance, partial significance, and the adequacy of global Fréchet regression, providing a unified way to compare unrestricted conditional Fréchet means with restricted alternatives. A key challenge is that the ordinary held-out loss difference is first-order degenerate under the null: the oracle losses coincide, and plug-in statistics are dominated by nuisance estimation error. MATCH uses sample splitting and independent random multipliers on held-out losses to create a nondegenerate Gaussian leading term without residuals or tangent-space coordinates. To improve data use and stability, we further develop cross-fitted tests and repeated cross-fitting with p-value merging. We establish asymptotic null validity, consistency under fixed alternatives, and local power guarantees. Simulations for distributional, symmetric positive-definite (SPD) matrix-valued, and spherical responses support the theoretical findings, and applications to county-level household income distributions and North Atlantic tropical-cyclone locations demonstrate the practical use of the proposed tests.

stat.ME 2026-07-03

Joint distributions of sample statistics power new GOF tests

by Roman Guchenko

Goodness of Fit Tests Based on Joint Densities of Multiple Sample Statistics

Simulations show the procedures match or exceed classical and Zhang methods for continuous nulls with known parameters.

We propose goodness-of-fit tests based on simulated confidence sets for joint distributions of multiple sample statistics, focusing on absolutely continuous null distributions with known parameters. One class of tests uses hyperrectangular confidence sets for principal components of order statistics and related statistic vectors. Extending earlier work on horizontal and vertical confidence bands for cumulative distribution functions, these tests are compared with some classical, Zhang, and related graphical tests. Simulations show that the proposed procedures are competitive with, and often more powerful than, existing methods. We also study the geometry of principal-component-based statistics; under a normal null distribution, the first principal component corresponds to the sample mean, while the second is related to a linear analogue of variance. A second class of tests uses confidence sets of arbitrary shape constructed through highest density regions. Unlike earlier kernel-density-based approaches, we use a k-nearest-neighbor method for detecting highest density regions, which is better suited to higher-dimensional statistic vectors. We study tests based on order statistics, empirical distribution function values, moments, and combinations of classical goodness-of-fit statistics. The resulting procedures are powerful against a wide range of alternatives. We also outline a two-sample extension via permutation tests based on joint distributions of several statistics and compare moment-based versions with energy-distance permutation tests. Finally, we discuss transformations other than the probability integral transform, showing that mapping data to another target distribution, such as the standard normal, can be advantageous when powerful tests are available for that distribution.

math.ST 2026-07-03

AEW achieves T log(M)/(n+1) excess risk in expectation

by Mikael Møller Høgsgaard, Patrick Rebeschini +1 more

Aggregation with Exponential Weights is Optimal in Expectation

The bound holds for large constant temperatures on bounded Lipschitz strongly convex losses without Bernstein assumptions

The aggregation with exponential weights (AEW) estimator is not fully understood in the basic setting of model selection aggregation with squared loss. In particular, whether it is minimax-rate optimal in expectation for large enough fixed temperatures and under random design has been an open problem since its introduction, which was explicitly posed by Lecué and Mendelson (2013). In this paper, we settle this problem by showing that without requiring a Bernstein-type assumption, the AEW indeed achieves the excess risk $T \log (M) / (n+1)$ in expectation, whenever the temperature $T$ satisfies $(L^2/T)\exp(B/T)\leq \mu /2$. Here, the number of dictionary elements is $M$, the estimator has observed $n$ i.i.d. samples from any distribution, and the loss is assumed to be bounded by $B$, $L$-Lipschitz continuous and $\mu$-strongly convex. For squared loss, we show that $T\geq 4 b^2$ suffices when the predictions and labels are $[0,b]$-valued. Because AEW is known to be suboptimal in expectation for temperatures below some constant, this shows that AEW has a sharp phase transition when the temperature is large enough but constant, as conjectured by Lecué and Mendelson.
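A minimal sketch of exponential-weights aggregation for squared loss is given below. It assumes the common form in which dictionary element $j$ receives weight proportional to $\exp(-\text{cumulative loss}_j / T)$; the abstract does not spell out the paper's exact normalization, and the function names and toy dictionary are hypothetical. The temperature in the example is set to $4b^2$ with $b=1$, matching the sufficient condition quoted for $[0,b]$-valued predictions and labels.

```python
import math

def aew_weights(cumulative_losses, temperature):
    """Weights proportional to exp(-loss_j / T), shifted by the minimum for stability."""
    lo = min(cumulative_losses)
    raw = [math.exp(-(c - lo) / temperature) for c in cumulative_losses]
    z = sum(raw)
    return [r / z for r in raw]

def aew_predict(dictionary, xs, ys, x_new, temperature):
    """Aggregate dictionary predictions at x_new using squared loss on (xs, ys)."""
    losses = [sum((f(x) - y) ** 2 for x, y in zip(xs, ys)) for f in dictionary]
    weights = aew_weights(losses, temperature)
    return sum(w * f(x_new) for w, f in zip(weights, dictionary))

# Toy dictionary of two constant predictors with values in [0, 1], so b = 1.
dictionary = [lambda x: 0.0, lambda x: 1.0]
xs, ys = [0, 1, 2], [1.0, 1.0, 1.0]
b = 1.0
prediction = aew_predict(dictionary, xs, ys, x_new=3, temperature=4 * b ** 2)
```

Lower temperatures concentrate the weights on the empirically best element, while higher temperatures keep the aggregate closer to a uniform average.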

stat.ML 2026-07-03

Additive MLP-GNN separates chemical and structural solubility drivers

by Sampreeti Bhattacharya, Arkaprava Roy

An Additive MLP-GNN Framework for Characterizing Chemical and Structural Contributions to Aqueous Solubility

MLP and GNN branches stay separate until the final step, enabling direct inspection of each contribution after pretraining on larger data.

Aqueous solubility is a key property in early-stage drug discovery, but most predictive models merge physicochemical descriptors and molecular graph information into a single representation, obscuring whether a prediction is driven by global chemistry, molecular structure, or both. We present an additive deep-learning framework that keeps these two sources of information separate throughout training: physicochemical descriptors are encoded by a multilayer perceptron (the chemical branch) and molecular graph topology by a graph neural network (the structural branch), with the two outputs combined only at the prediction stage through an additive model with an optional multiplicative interaction. This design provides a direct decomposition of chemical and structural components that can be examined separately after training. Furthermore, pretraining on the larger AqSolDB dataset and fine-tuning on the smaller BigSolDB2 dataset substantially improve accuracy and reduce run-to-run variations, indicating generalizability of the learned features from the data-rich settings. We further interpret the fitted model using best linear projections of the branch outputs, molecule-level embedding summaries across solubility classes, and atom-level GNNExplainer masks aggregated over functional groups. These analyses show that the chemical branch aligns with familiar physicochemical descriptors, while the structural branch captures graph-topological and functional-group patterns associated with solubility. Across both datasets, the framework attains competitive predictive performance while making the distinct roles of chemical and structural information more transparent.

stat.ML 2026-07-03

Policy-coupled coverage optimizes counterfactual prediction sets

by Yurui Zheng, Ying Jin

Prediction Sets for Counterfactual Decisions: Coverage, Optimality, and Conformal Prediction

Equivalence to risk-averse optimization produces explicit optimal sets and a conformal method with finite-sample coverage guarantees.

Predictions are increasingly used to guide high-stakes decisions, from treatment selection to policy making. To ensure reliability with imperfect predictions, uncertainty quantification methods such as conformal prediction build prediction sets with coverage guarantees. However, statistical validity alone does not immediately determine the decisions to take, nor the optimality thereof. This gap is especially delicate in counterfactual settings where the outcome that materializes depends on the action taken, so uncertainty cannot be specified independently of the decision rule. We develop a decision-theoretic framework for uncertainty-informed counterfactual decisions. We identify a novel notion of policy-coupled coverage -- namely, coverage of the realized outcome under the action induced by the prediction sets themselves -- as the optimal and lossless interface between uncertainty and action. It plays three roles. First, it justifies acting via a natural max-min rule as minimax-optimal under distributional ambiguity. Second, optimizing prediction sets under policy-coupled coverage is equivalent both to a stronger universal-coverage formulation and to the direct risk-averse optimization over policies and utility certificates; this equivalence yields the explicit form of the population-optimal prediction sets. Third, it admits a two-stage procedure, Policy-Coupled Risk-Averse Conformal Prediction (PC-RACP), that approximates these optimal sets with rigorous finite-sample coverage. Simulations and a real email-marketing experiment confirm that PC-RACP delivers higher utility than existing approaches while maintaining valid coverage, and that ignoring the counterfactual structure of the decision problem is suboptimal for both validity and utility.

stat.ME 2026-07-03

Weighted tilt restores coverage for censored label shift

by Seungjin Choi

Conformal Bayes for Two-Sided Censored Gaussian Regression under Label Shift

Mixed atom-density calibration weights yield smaller valid sets than source-score methods in two-sided censored Gaussian regression.

Prediction under label shift becomes nonstandard when responses are censored. In a two-sided censored Gaussian model, latent values below $L$ and above $U$ are recorded at the boundary values, so the observed predictive distribution is mixed, with atoms at $L$ and $U$ and a continuous density on $(L,U)$. In this paper we develop conformal Bayes for this mixed-space setting by combining posterior predictive tilting with weighted conformal calibration. Under a two-sided Tobit Gaussian Bayesian prediction head with a Laplace posterior approximation, the tilted predictive distribution has left-atom, interior, and right-atom components, with a three-term closed-form normalizer. The resulting prediction set is a mixed highest density region that can combine boundary atoms with an interior interval and can reduce to atom-only sets under strong censoring. The main technical issue is that latent label shift does not directly give an ordinary density ratio on the observed censored scale. A latent exponential tilt induces tail-averaged atom weights at the censored boundaries, while the interior ratio remains density based. This yields a mixed observed-space calibration weight with two atom ratios and one interior density ratio. The weight corrects the calibration measure, while predictive tilting gives target-adapted mixed-HDR geometry. Synthetic experiments show that weighted tilted conformal Bayes restores marginal coverage with smaller sets than weighted source-score calibration, while revealing a trade-off between marginal coverage and component-wise behavior across atoms and interior observations.
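The mixed structure of the observed distribution can be illustrated with a short sketch. It assumes a plain Gaussian latent response with known mean and standard deviation, and it returns the two boundary atoms and the interior density; the names are hypothetical, and the posterior predictive tilting and conformal weighting from the abstract are not implemented.

```python
import math

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def censored_gaussian(mu, sigma, lower, upper):
    """Observed law of a N(mu, sigma^2) latent value censored to [lower, upper].

    Returns (mass at lower, mass at upper, density on the open interior).
    """
    p_lower = norm_cdf((lower - mu) / sigma)        # latent values below L pile up at L
    p_upper = 1.0 - norm_cdf((upper - mu) / sigma)  # latent values above U pile up at U

    def density(y):
        if not (lower < y < upper):
            return 0.0
        return norm_pdf((y - mu) / sigma) / sigma

    return p_lower, p_upper, density

p_lo, p_hi, dens = censored_gaussian(mu=0.0, sigma=1.0, lower=-1.0, upper=1.0)
```

The two atoms plus the integral of the interior density sum to one, which is why a density ratio alone cannot describe a shift on the observed scale.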

math.ST 2026-07-03

Weaker matrix condition extends simplex volume theorems to AR(1) models

by Shan Xizheng, Li Yanpeng

A note on "The volume of random simplices from elliptical distributions in high dimension"

Central and stable limits for log-volumes of high-dimensional random simplices now hold under relaxed assumptions on the population matrix.

Recent work by Gusakova et al. (Stochastic Process. Appl. 164 (2023) 357-382) has shown a central and a stable limit theorem for the logarithmic volume of random simplices and random convex bodies under an elliptical framework in the high-dimensional regime, that is, if p and n tend to infinity in such a way that their ratio tends to some $\gamma \in (0,1)$. A technical condition (Equation (2.6) of Assumption (B) therein) requires that the population matrix AA* is close in Frobenius norm to a multiple of the identity matrix, which is rather restrictive and rules out various settings for statistical applications, such as spiked models and dependent structure models. In this note we offer a general relaxation of this condition, which yields a reasonable condition covering numerous scenarios, and we derive consequences for the volume of general random simplices and random convex bodies. In particular, our results cover the Toeplitz/AR(1) covariance structures studied by Jiang and Pham (Ann. Stat. 53 (2025) 907-928), giving a concrete application of our theorem to high-dimensional dependent covariance models.

stat.AP 2026-07-03

Quaternion wavelets classify breast histology into four classes

by Sara Antonijevic, Brani Vidakovic

Quaternion Nondecimated Wavelet Descriptors for Multiclass Breast Histology Classification

Color-coupled nondecimated transforms produce balanced accuracy on BACH data without pretrained networks or external data.

Breast histology images carry diagnostic information in color, texture, orientation, and tissue architecture across a range of scales. In H&E microscopy this information is inherently chromatic and is not fully recovered when the red, green, and blue (RGB) channels are reduced to grayscale or transformed as independent scalar images. We propose an interpretable quaternion nondecimated wavelet framework for breast histology classification. Each RGB image is encoded as a pure quaternion field, and a quaternion nondecimated wavelet transform in two dimensions (QNDWT2D) produces multiscale, directional, color-coupled coefficient fields on the original image grid, keeping color as a single vector quantity rather than three separate channels. From these coefficients we build interpretable feature families summarizing stain balance, wavelet energy, amplitude heterogeneity, quaternion phase concentration, color-axis geometry, directional anisotropy, orientation entropy, and scale-dependent energy decay, each tied to a histopathological property such as nuclear density or glandular organization. We evaluate the descriptors on the BreAst Cancer Histology (BACH) challenge, a balanced four-class set of normal, benign, in situ, and invasive tissue, using a radial-kernel support vector machine (SVM) with repeated nested cross-validation. The descriptors yield balanced recognition across classes, with errors concentrated among adjacent categories while normal and invasive are rarely reversed. Permutation importance shows that directional, phase-concentration, anisotropy, scale, and amplitude-variability groups all contribute, indicating that the classifier draws on genuine quaternion and multiscale geometry rather than global color alone. The framework uses no pretrained networks, learned filters, or external databases, offering a reproducible, interpretable baseline for computational pathology.

cs.LG 2026-07-03

Conformal prediction flags 15 percent of samples but detects zero traction incidents

by Varshith Roy Kotla

Predictive Conformal Slip Monitoring: An Empirical Evaluation of Rolling Split Conformal Prediction for Pre-Incident Traction Loss Detection

Evaluation across 19 drivers shows the rolling-volatility method matches a simple threshold while violating its core exchangeability assumpt

Conventional traction control architectures intervene only after the adhesion limit of a tire has already been breached. This paper investigates whether Rolling Split Conformal Prediction, monitoring the volatility of non-conformity residuals from a per-driver Random Forest model of expected slip behavior, can serve as a statistically grounded pre-incident warning signal, ahead of gross traction loss. Unlike an earlier internal draft of this work, the evaluation reported here corrects a confound in the slip proxy (vehicle speed is included as an explicit model feature, not left implicit in the target's denominator), uses every racing lap for each driver rather than only the fastest lap, and is scored against real, timestamped incident labels extracted from FIA Race Control Messages and track-limits lap deletions rather than narrated post-hoc. The result is negative: across 19 drivers and 55,563 test-phase telemetry samples, the rolling-volatility detector achieves a mean precision of essentially 0.0 and mean recall of 0.0 against 14 ground-truth incidents, while flagging on average 15.3% of all samples as anomalous, too high a false-alarm rate for any early-warning use. A static 95th-percentile threshold baseline performs no better in any way that would justify the added complexity of the conformal-volatility formulation. Residual autocorrelation diagnostics show the split-conformal exchangeability assumption is violated for every driver (Ljung-Box p < 0.001, n = 19/19), which is one plausible driver of the high false-alarm rate. We report this as a methodologically rigorous negative finding, diagnose its likely causes, and outline what a genuinely predictive version of this approach would require.
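
A minimal sketch of a rolling split-conformal flagging rule may help fix ideas. It assumes a generic sequence of non-conformity scores and a trailing calibration window. The paper's detector works on the volatility of residuals from a Random Forest model, which is not reproduced here, and the function names and window length are hypothetical.

```python
import math
from typing import List, Sequence


def conformal_threshold(calibration: Sequence[float], alpha: float) -> float:
    """Split-conformal quantile: the ceil((n + 1)(1 - alpha))-th smallest score."""
    n = len(calibration)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(calibration)[rank - 1]


def rolling_flags(scores: Sequence[float], window: int, alpha: float) -> List[bool]:
    """Flag score t when it exceeds the threshold from the previous `window` scores."""
    flags = []
    for t in range(window, len(scores)):
        threshold = conformal_threshold(scores[t - window:t], alpha)
        flags.append(scores[t] > threshold)
    return flags


# Hypothetical scores: a steadily rising series exceeds every earlier value.
flag_rate = sum(rolling_flags(list(range(100)), window=20, alpha=0.1)) / 80
```

The nominal flag rate of roughly alpha holds only when scores are exchangeable. Autocorrelated residuals, as the Ljung-Box diagnostics above indicate, can push the realized rate away from that level.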

stat.ME 2026-07-03

Expert portfolio detects PDE model errors missed by residuals

by Ieva Kazlauskaite

Sequential Structure-Sensitive Residual Diagnostics for PDE Inverse Problems

Sequential e-process rejects bad fits early using spatial residual patterns with anytime-valid error control.

Computational models in science and engineering are often assessed by checking whether the residual norm is consistent with the assumed noise level. This can be misleading in smoothing inverse problems: structured model errors may be attenuated in observation space, leaving residual magnitudes below practitioner discrepancy thresholds while coherent residual patterns remain. As a result, residual-norm diagnostics can accept fitted models that still give biased parameters, predictions, or quantities of interest. We propose a structure-sensitive sequential diagnostic based on e-processes. The method uses a portfolio of spatial residual-pattern experts, updates their likelihood-ratio wealth as observations are processed, and rejects the fitted model when the aggregate wealth crosses a prescribed threshold, giving anytime-valid type-I error control for a fixed fitted model. We compare the method with Morozov discrepancy checks, fixed-sample residual tests, and batch projection tests. Across three inverse problems (elliptic diffusion, two-dimensional Stokes flow, and a glaciological ice-stream inversion implemented in the community finite-element model icepack) we demonstrate how standard discrepancy checks accept misspecified fits that produce materially wrong quantities of interest. Structure-sensitive batch tests detect these failures using the full dataset, while the e-process detects them earlier from a fraction of the observations. After rejection, the expert wealth attributes the evidence to residual patterns in the chosen dictionary and provides a basis for exploratory model correction.
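
The wealth-update logic can be sketched as follows, under simplifying assumptions that are not taken from the paper: residuals are independent Gaussian with known standard deviation under the null, each expert is a fixed mean-shift pattern chosen in advance, and experts are weighted equally. All names are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def expert_portfolio_eprocess(
    residuals: Sequence[float],
    patterns: Sequence[Sequence[float]],
    sigma: float,
    alpha: float,
) -> Optional[int]:
    """Return the number of observations used at first rejection, else None."""
    log_wealth: List[float] = [0.0] * len(patterns)
    threshold = 1.0 / alpha
    for i, r in enumerate(residuals):
        for k, pattern in enumerate(patterns):
            mu = pattern[i]
            # Log likelihood ratio of N(mu, sigma^2) against N(0, sigma^2) at r.
            log_wealth[k] += (r * mu - 0.5 * mu * mu) / sigma ** 2
        # Equal-weight average of expert wealths; the cap avoids float overflow.
        aggregate = sum(math.exp(min(lw, 700.0)) for lw in log_wealth) / len(patterns)
        if aggregate >= threshold:
            return i + 1
    return None


# Hypothetical example: residuals that follow the first expert's pattern exactly.
stop = expert_portfolio_eprocess([1.0] * 20, [[1.0] * 20, [-1.0] * 20], 1.0, 0.05)
```

Under the null each expert's wealth is a nonnegative martingale starting at one, so the average is too, and Ville's inequality bounds the probability of ever crossing 1/alpha by alpha. The final per-expert wealths indicate which pattern carried the evidence.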

econ.EM 2026-07-03

GIV consistent at √T only when few units dominate aggregate

by Gokul Gopalan Ramachandran

Granular Instrumental Variables in Large Panels: Identification and Inference Across Strong, Nearly Weak, and Weak GIV

Three regimes of instrument strength arise from unit sizes in growing panels, dictating rates and valid inference methods.

I develop the asymptotic theory of instrument strength for Granular Instrumental Variables (GIV) in large panels with both $N$ and $T$ growing. The strength of the GIV depends on the presence of dominant units. I formalise what dominance means and characterise three regimes of instrument strength. When a few units dominate the aggregate, the instrument is strong. The GIV estimator is consistent and asymptotically normal at the standard $\sqrt{T}$ rate. When large units stand out but do not dominate, the instrument weakens. But I show that the parameter of interest remains recoverable. The GIV estimator remains consistent and asymptotically normal, now at a rate slower than $\sqrt{T}$. When units are comparable in size and none stands out, the instrument is weak in the standard sense. The GIV estimator is inconsistent and has a non-standard distribution. Wald inference is reliable only outside the weak regime. When the instrument is weak, I recommend Anderson-Rubin confidence sets. In practice, the instrument must be constructed in a first stage. I show that the feasible estimator attains the same rate, but its asymptotic variance picks up an additional term from the first-stage estimation. Valid inference must use standard errors that account for this term. I apply the GIV estimator with the correct standard errors to recover the short-run demand elasticities of three commodities: refined copper, crude oil, and natural gas.

stat.ME 2026-07-03

Orthogonal arrays and difference schemes create larger grouped arrays

by Meixin Liu, Chunyan Wang +2 more

Grouped Orthogonal Arrays from Orthogonal Arrays and Difference Schemes

Provides new designs with more groups and larger sizes for experiments assuming negligible cross-group interactions.

Grouped orthogonal arrays were introduced to address experimental design problems arising in computer experiments with grouped inputs, as well as in physical experiments where interactions between factors from different groups are assumed to be negligible. Motivated by the growing need for flexible and efficient designs under such settings, this article develops several constructions to expand the existing catalogs of grouped orthogonal arrays. The proposed constructions provide a large collection of new grouped orthogonal arrays with significantly larger numbers of groups and group sizes.

math.PR 2026-07-03

Geometric graphs indistinguishable from random above (nh(p))^3 dims

by Hang Du, Cheng Mao +3 more

Resolution of the Detection Threshold Conjecture for Random Geometric Graphs in the d>n Regime

Proves conjecture by showing total variation distance to Erdős–Rényi vanishes when d ≫ (nh(p))^3 and d > n.

A random geometric graph (RGG) is generated by first sampling latent points $x_1,\ldots,x_n$ independently and uniformly from the unit sphere in $\mathbb{R}^d$, and then connecting each pair $(i,j)$ if $\langle x_i,x_j\rangle$ exceeds some threshold $\tau$. We study the sharp detection threshold -- the largest dimension at which the RGG can be statistically distinguished from the Erd\H{o}s--R\'enyi graph with the same edge density $p$. This threshold is conjectured to be $d \asymp (nh(p))^3$, where $h(p)=p \log \frac{1}{p} + (1-p) \log \frac{1}{1-p}$ is the binary entropy function. Previous works proved this conjecture for dense graphs with constant $p$ and, up to polylogarithmic factors, very sparse graphs with $p=\Theta(1/n)$. In this paper, we prove that detection is impossible when $d\gg (nh(p))^3$ and $d\ge (1+\epsilon) n$ for any constant $\epsilon>0$, thereby resolving the conjecture in the regime $p\gtrsim n^{-2/3}/\log n$ and improving upon the state of the art in the regime $1/n \ll p \ll n^{-2/3}/\log n$. The key to our proof is a sharp analysis of the posterior distribution of the latent points given the observed graph, obtained through an information-theoretic comparison argument combined with strong log-concavity.
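
For orientation, the conjectured threshold scale $(nh(p))^3$ can be evaluated directly. The sketch below assumes the natural logarithm in $h(p)$, which the abstract does not specify and which changes the result only by a constant factor. The function names are hypothetical.

```python
import math


def binary_entropy(p: float) -> float:
    """h(p) = p log(1/p) + (1 - p) log(1/(1 - p)), natural log assumed."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return p * math.log(1.0 / p) + (1.0 - p) * math.log(1.0 / (1.0 - p))


def conjectured_threshold(n: int, p: float) -> float:
    """Dimension scale (n h(p))^3 from the detection threshold conjecture."""
    return (n * binary_entropy(p)) ** 3


# Hypothetical comparison: a dense graph versus a sparse one on n = 1000 vertices.
dense = conjectured_threshold(1000, 0.5)
sparse = conjectured_threshold(1000, 1.0 / 1000)
```

The sparse value is far smaller than the dense one, reflecting that sparser graphs carry less information about the latent geometry.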

stat.ML 2026-07-03

Shallow network optimum recovered by one linear solve

by Matej Benko, Pierre Bousquet +2 more

Born Discrete, Made Smooth: Variational Formulation of Shallow Neural Networks

A continuum variational problem on parameter densities turns training convex and yields the minimizer directly from a linear system with exp

Although neural networks are remarkably effective, their underlying optimization principles remain theoretically elusive, often characterized by non-convex landscapes and stochastic heuristics. In this work, we propose a paradigm shift by replacing the discrete training problem of shallow neural networks with a well-posed continuum variational surrogate. We identify a family of $\lambda$-convex functionals over parameter densities in weighted Sobolev spaces and prove that these variational problems are globally well-posed, stable, and exhibit unexpected almost $C^3$ regularity. Unlike existing Wasserstein-based or Mean-Field approaches, which often face limited regularity and discretization challenges, our formulation provides direct access to elliptic regularity and convex analysis. This allows us to prove that the optimal parameter density can be obtained by solving a single linear system, bypassing iterative optimization entirely. We establish explicit generalization error controls at a rate of $1/\alpha$ relative to the regularization parameter, and prove that finite-width networks of size $N$ achieve the continuum optimum at an $O(1/N)$ rate. This perspective bridges the gap between the Neural Tangent Kernel (NTK) and feature-learning regimes, providing a principled framework for understanding over-parameterization through the lens of variational calculus.

stat.AP 2026-07-03

Probit BKMR fits in bkmr rarely converge

by Akifumi Eguchi, Takayuki Kawashima +2 more

Convergence fragility in probit Bayesian kernel machine regression implemented in the bkmr R package for binary-outcome environmental mixture analyses: a simulation study

Simulation of 431 tasks found only 30 met R-hat ≤ 1.01 and ESS ≥ 400, so report full diagnostics instead of fit success alone.

Background. Bayesian kernel machine regression (BKMR) is widely used for exposure-mixture analyses with binary outcomes through a probit extension. Because a bkmr fit can complete without providing adequate effective posterior information, simulation studies should separate execution success from MCMC convergence diagnostics. Methods. We evaluated the public bkmr probit workflow using bkmr::SimData() for data generation, bkmr::kmbayes() for model fitting, and posterior for convergence diagnostics. The balanced generator used family = "binomial", hfun = 2, beta.true = 0.5, ind = 1:2, and M = 4. SimData() generated the covariate as X = 3*cos(z1) + 2*rnorm(n). Four chains were initialized with chain-specific randomized starting values generated reproducibly from the fixed initial-value base seed 20260621. These values affected only the initial state of the sampler and did not alter the BKMR model, default priors, or Metropolis-Hastings proposals. Results. Of 431 prespecified tasks, 430 returned fitted objects and one task had a numerical non-completion. Diagnostic adequacy was limited: the rank-normalized R-hat <= 1.01 threshold was met in 55/431 tasks, bulk-ESS >= 400 in 85/431, tail-ESS >= 400 in 44/431, and both ESS criteria in 44/431. The primary diagnostic criterion, R-hat at or below the 1.01 threshold with both bulk-ESS and tail-ESS >= 400, was met in 30/431 prespecified tasks, corresponding to 30/430 completed fits. Conclusions. Completion of probit BKMR fits in bkmr should not be equated with convergence of the retained MCMC draws. Applied analyses should report the number of chains, warmup and retained iterations, rank-normalized R-hat, bulk-ESS, and tail-ESS rather than rely on a fixed iteration count or on fit completion alone.
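
To make the diagnostic criterion concrete, here is a small stdlib-only sketch. It computes the classic split R-hat, which is a simplification: the study uses the rank-normalized variant from the R package posterior, and ESS estimation is not implemented here. The helper names are hypothetical; only the thresholds (1.01 and 400) come from the abstract.

```python
import statistics
from typing import List, Sequence


def split_rhat(chains: Sequence[Sequence[float]]) -> float:
    """Classic split R-hat for one scalar parameter (not rank-normalized)."""
    half = min(len(chain) for chain in chains) // 2
    halves: List[List[float]] = []
    for chain in chains:
        # Split each chain in two so within-chain drift also inflates R-hat.
        halves.append(list(chain[:half]))
        halves.append(list(chain[half:2 * half]))
    means = [statistics.fmean(h) for h in halves]
    within = statistics.fmean([statistics.variance(h) for h in halves])
    between_over_n = statistics.variance(means)  # equals B / n
    var_plus = (half - 1) / half * within + between_over_n
    return (var_plus / within) ** 0.5


def meets_primary_criterion(rhat: float, bulk_ess: float, tail_ess: float) -> bool:
    """R-hat <= 1.01 together with bulk-ESS and tail-ESS both >= 400."""
    return rhat <= 1.01 and bulk_ess >= 400 and tail_ess >= 400
```

A fit that returns an object can still fail `meets_primary_criterion`, which is the distinction between execution success and diagnostic adequacy drawn in the abstract.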

stat.ME 2026-07-03

Plausibility enables exact inference in general parametric families

by Stefan Böhringer, Jesse Swen

Plausibility: Exact inference in R

R package implements the framework for regression models and supports exact tests on data examples.

Plausibility is a theoretical framework for conducting exact inference in general parametric families. We introduce the R package {\em plausibility}, which implements this framework for a wide class of regression models. Plausibility can also be used to test penalized regression models such as those estimated by the package {\em glmnet}. We illustrate the package using a number of R data sets. Through a class-based mechanism, the package can be easily extended. We illustrate and discuss computational aspects of the implementation and their impact on real-data analysis.

stat.ME 2026-07-03

Moment method selects random effects consistently in mixed models

by Yifan Chen, Yuedong Wang +1 more

Moment-Based Selection of Multiresponse Linear Mixed-Effects Models

It reduces the problem to convex optimization using cross-moment identities and establishes finite-sample guarantees under sub-Weibull error

We propose MOMENT (\textbf{MO}ment-Based \textbf{M}ixed-\textbf{E}ffects Selectio\textbf{N} and Es\textbf{T}imation), a stage-wise moment-based framework that exploits second-order cross-moment identities to select and estimate the random-effects covariance matrix and fixed-effects coefficients. By inducing sparsity through its diagonal under a positive semidefinite constraint, the random-effects selection problem reduces to a smooth constrained convex optimization problem that can be solved efficiently by projected gradient descent. We further establish finite-sample theoretical guarantees for the proposed procedure, including random-effects selection consistency and fixed-effects selection consistency under joint sub-Weibull errors. Simulation studies show that MOMENT performs competitively overall and can substantially outperform separate univariate analyses when responses are correlated. An application to the hemodialysis dataset demonstrates that the proposed method yields an interpretable and flexible approach for multivariate longitudinal data.

stat.AP 2026-07-03

IRT model extracts rider skill and condition difficulty from binary outcomes

by Fabio Carucci

Inverse Suitability: Identifying Condition Difficulty and Rider Skill from Behavioural Outcomes via Continuous-Item Response Theory

Continuous-item formulation recovers skill at r=0.96 and improves Brier score by 0.33 over expert curves on synthetic cohort of 80 riders.

Suitability scoring for outdoor activities (kitesurfing, paragliding, ski touring) maps environmental conditions to a go/no-go verdict via expert-defined curves. These curves conflate two distinct quantities: the intrinsic difficulty of a condition and the skill of the person facing it. We introduce Inverse Suitability, a continuous-item Item Response Theory (IRT) model that identifies both from behavioural outcomes alone. Each outcome is a triple (rider r, condition metric x at site s, binary outcome y); we model P(y=1) = sigma(a (theta_r - delta(x, s))), where theta_r is latent rider skill, delta(x, s) is a latent difficulty function anchored to a physics-derived expert curve as its prior, and a is a discrimination parameter. The formulation is strictly more general than a single suitability curve, which it recovers exactly when skill is integrated out under the population distribution. Parameters are estimated by marginal maximum likelihood with Gauss-Hermite quadrature; identification holds when the rider-by-condition incidence graph is connected, with a documented single-curve fallback otherwise. We validate via synthetic recovery: on a reference cohort (80 riders times 30 outcomes) the model recovers latent skill at r = 0.96, locates the difficulty minimum within 3 units of ground truth, and improves held-out Brier Skill Score by +0.33 over the expert-curve baseline. The recovered difficulty function defines a measurable, site-level construct, an intrinsic difficulty atlas, that existing meteorological observation networks do not capture. All results reproduce from a single command on synthetic data, requiring no proprietary observations.
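
The response model P(y=1) = sigma(a (theta_r - delta(x, s))) can be sketched in a few lines. The U-shaped difficulty curve and all parameter values below are hypothetical stand-ins for the physics-derived expert curve, which the abstract does not specify, and the site argument s is omitted for brevity.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def hypothetical_difficulty(x: float, optimum: float = 20.0,
                            curvature: float = 0.01, floor: float = 0.0) -> float:
    """Illustrative difficulty delta(x): lowest at `optimum`, rising on both sides."""
    return floor + curvature * (x - optimum) ** 2


def success_probability(theta: float, difficulty: float, a: float) -> float:
    """P(y = 1) = sigma(a * (theta - delta)) for rider skill theta."""
    return sigmoid(a * (theta - difficulty))


# Hypothetical rider with skill 1.0 facing conditions x = 20 and x = 35.
p_easy = success_probability(1.0, hypothetical_difficulty(20.0), a=2.0)
p_hard = success_probability(1.0, hypothetical_difficulty(35.0), a=2.0)
```

Separating theta from delta is what lets the same condition be rated differently for riders of different skill, instead of folding both into one expert curve.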

stat.ML 2026-07-03

Autorelevance function recovers lag structure in time series forecasts

by Julian Cardenas, Jamie Arjona +1 more

Autorelevance function and other feature relevance measures for univariate time series

Shapley-based measures with one-step forecast replacement for missing lags identify expected patterns across ARMA and neural models.

We propose a model-agnostic methodology to measure lag relevance in machine learning forecasting models applied to univariate time series. Specifically, we use the frameworks of Ghost variables and Shapley values, together with additive importance measures, to introduce the auto-relevance and partial auto-relevance functions as the lag importance values. Additionally, we propose a novel method to replace absent features in coalition-based methods with a one-step forecast from the same model. We evaluate these proposals under different simulations and real data cases. This combined framework perspective is particularly suitable for time series. To illustrate our findings, we use a pool of models from the seasonal ARMA family and recurrent neural networks. We found that the calculated relevance measures successfully recover the expected lag structure in almost all cases.

stat.ML 2026-07-03

K-means centers from MCAR data converge at √n rate

by Xin Guan

Statistical Properties of k-means Clustering for Data Missing Completely at Random

Recovery of the true centers holds under a missing-probability and separation condition, provided centers differ in every dimension.

The classical $k$-means clustering cannot be directly applied to incomplete data, and existing $k$-means-based clustering methods for missing data primarily focus on improving the practical accuracy of clustering, whereas most of them lack theoretical guarantees in the asymptotic sense. In this paper, we investigate the statistical properties of $k$-means clustering in the presence of missing data. We first establish the $\sqrt{n}$-excess risk bound and prove the consistency of the estimated cluster centers under general missing mechanisms. For the Missing Completely at Random (MCAR) mechanism, we further derive the $\sqrt{n}$-convergence rate and asymptotic normality of the estimated cluster centers. Moreover, we study in what cases the cluster centers estimated from incomplete data converge to the true cluster centers of the original fully observed data, and give a sufficient condition on the missing probability and the separation among true clusters. These results provide a theoretical guarantee for missing-data-$k$-means. Notably, our analysis reveals that under the MCAR mechanism, both achieving the $\sqrt{n}$-rate and converging to the true cluster centers require the $k$ true centers to be distinct in every dimension, highlighting the significant challenges of application in high-dimensional regimes. Finally, we conduct numerical simulations on synthetic incomplete datasets to support our theoretical analysis results.

math.ST 2026-07-03

Perturbation theory transfers sup-norm rates to functional principal components

by Hajo Holzmann, Kevin Wilk

Transferring supremum-norm rates and weak convergence of covariance kernel estimators to functional principal components

L2-perturbation theory converts existing covariance kernel rates into optimal sup-norm and normality results for the associated eigenfunctio

We show that $L_2$-perturbation theory can be used to transfer rates of convergence in the supremum norm as well as weak convergence in the space of continuous functions from covariance kernel estimators to the associated functional principal components (FPCs). As an application we obtain optimal rates of convergence in sup-norm, including minimax lower bounds, as well as asymptotic normality for estimating the FPCs in a discrete observational model with errors under fixed, synchronous design. The sparse-to-dense transition which has previously been observed for mean function and covariance kernel estimators also applies to the FPCs. Surprisingly, eigenvalue estimation exhibits a discretization-dominated regime under sparse designs, too. Our results further apply to estimators of cross-covariance and long-run covariance kernels, as well as to covariance kernels of derivative processes. We also present results of numerical experiments in which we use the Nystr\"om method to compute FPCs and eigenvalues, and give an empirical illustration on a series of daily temperature curves.

stat.ME 2026-07-03

Closed-form wrapped Gaussians replace Laplace for posterior approximation

by Marcelo Hartmann, Luu Hoang Phuc Hau +6 more

Beyond Laplace: Closed-form wrapped Gaussian posterior approximations on statistical manifolds

Contrast functions approximate maps on statistical manifolds, removing geodesic solvers and curvature calculations.

In Bayesian statistics, the Laplace approximation provides a computationally efficient approximation to posterior distributions. However, its Gaussian form restricts it to elliptical shapes, limiting its ability to capture important posterior features such as skewness, heavy tails, and narrow high-probability regions. Recent work has addressed this limitation by exploiting Riemannian geometry to push forward Gaussian distributions from the tangent space to the manifold, referred to as wrapped Gaussians. While offering greater flexibility, they introduce substantial computational challenges. Sampling requires solving geodesic equations through the exponential map, and density evaluation additionally depends on the logarithmic map and Jacobi fields, involving costly differential equation solvers and geometric quantities such as inverse matrices, Christoffel symbols, and curvature tensors. To overcome these limitations, we employ the theory of contrast functions to derive tractable approximations of the logarithmic and exponential maps on statistical manifolds endowed with the Fisher--Rao metric and the prior distribution geometry. The resulting methodology bypasses the need to compute these geometric quantities and to run numerical solvers, thereby removing the principal computational bottlenecks of existing wrapped Gaussian approaches. Empirical results across a range of models demonstrate that the proposed approximation captures complex posterior geometries while remaining orders of magnitude faster than current state-of-the-art approximations.

cs.LG 2026-07-03

Variational estimator beats spectral for density ratios with abundant data

by Francis Bach (SIERRA)

Regularized Variational and Spectral Log-Density-Ratio Estimation in the Gaussian Location Model

In the Gaussian location model, the risk ordering reverses with the observation-to-dimension ratio under ridge regularization.

We study ridge-regularized log-density-ratio estimation in the Gaussian location model with a common covariance matrix. By affine invariance, the model is written as q $\sim$ N(0, I), p $\sim$ N($\Delta$, I), with linear features, where $\Delta$ is a mean vector. The variational estimator is the empirical Kullback-Leibler (KL) log-normalized fit with a squared L2-penalty on its nonconstant coefficient, and the spectral estimator recently introduced in [1] replaces a single variational problem by a continuum of ridge-regularized least-squares problems. We derive high-dimensional deterministic asymptotic equivalents when the numbers of observations and dimension tend to infinity with fixed ratios. The regularized variational limit is characterized by a scalar entropy minimization problem derived from the convex-Gaussian-min-max theorem (CGMT), while the regularized spectral limit follows from deterministic equivalents for resolvents of weighted sums of two independent Gaussian sample covariance matrices. We use these formulas to compare population risks, with experiments focused on fixed-signal aspect-ratio sweeps and optimized regularization. Our conclusion is that with many observations, under the criteria and asymptotic regimes analyzed here, the well-specified variational estimator has the smaller risk, while with fewer observations, the spectral estimator is favored because its covariance-based construction has lower variance. We also study how a nuclear penalty can be used and partially analyzed to perform feature learning.
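
For reference, the population target in this model has a closed form: with q = N(0, I) and p = N($\Delta$, I), the log-density ratio is log p(x)/q(x) = $\Delta^\top x - \|\Delta\|^2/2$, which is linear in x. The sketch below evaluates only this population quantity, not either estimator, and the function name is hypothetical.

```python
from typing import Sequence


def log_density_ratio(x: Sequence[float], delta: Sequence[float]) -> float:
    """log p(x)/q(x) for p = N(delta, I) and q = N(0, I)."""
    inner = sum(xi * di for xi, di in zip(x, delta))
    squared_norm = sum(di * di for di in delta)
    return inner - 0.5 * squared_norm


# Hypothetical mean vector; the ratio vanishes at the midpoint delta / 2.
delta = [1.0, -2.0, 0.5]
midpoint = [0.5 * d for d in delta]
value_at_midpoint = log_density_ratio(midpoint, delta)
```

The linear form is why linear features make the model well specified, so differences between the two estimators reflect estimation error and not approximation error.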

stat.ME 2026-07-03

Calibration lets multimodal predictors borrow across missingness patterns

by Junhan Yu, Kejian Zhang +2 more

Pattern-Calibrated Multimodal Prediction under Blockwise Missingness

Bounds decompose error into overlap size, calibration gap, and representation error, showing when borrowing beats local fitting.

Blockwise missingness in multimodal data is usually treated as an incomplete-input problem. We instead focus on prediction for a prespecified observed-modality pattern, where the observed modality set determines the information on which the prediction rule can condition. A procedure that imputes missing modalities, zero-fills unobserved modalities, or trains a single pooled predictor may borrow information across patterns, but it can also mix pattern-specific prediction rules. We propose Multimodal Overlap-aware Shared-specific Alignment and Inter-pattern Calibration (MOSAIC), a pattern-calibrated framework for borrowing across missingness patterns without collapsing their prediction rules. MOSAIC learns shared and modality-specific representations, uses the available representations that overlap with the target pattern to fit a first-stage predictor, and then estimates the calibration gap from target-pattern data. We establish non-asymptotic bounds that decompose the error into overlap effective sample size, calibration gap, and representation-learning error, clarifying when cross-pattern borrowing improves over local fitting and when the improvement is controlled by rule mismatch or representation-learning error. Simulations examine representation recovery and target-pattern correction, and applications to ICU mortality prediction, emotion recognition, and glaucoma classification show gains when target-pattern samples are limited or pattern-specific rules differ.

cs.LG 2026-07-03

Role projections lift directional accuracy on semantic benchmarks

by He Huang, Lu Shen +2 more

Role-Aware Neural Convex Divergence Heads for Asymmetric Representation Learning

Source and target projections before convex neural divergences model asymmetric relations like entailment while keeping scores nonnegative.

Many representation learning problems involve directed relations, such as lexical entailment, sentence entailment, ontology hierarchy, and citation links. Standard Euclidean, cosine, and Mahalanobis heads are symmetric, while generic neural scorers can model directionality but provide limited geometric structure. This paper proposes a role-aware neural convex divergence head for asymmetric representation learning. The head applies source- and target-role projections before evaluating an input-convex neural Bregman divergence, yielding a nonnegative structured score in the role-projected space. We characterize its projected-space identity, source-role convexity, directional-gap decomposition, and Hessian-based local curvature. Experiments on lexical, sentence, ontology, and directed graph benchmarks compare symmetric distances, unstructured asymmetric scorers, order/hyperbolic baselines, plain ICNN-Bregman heads, and the proposed role-aware variant. Across ten random seeds on the main semantic and ontology benchmarks, role-aware projections consistently improve directional accuracy over plain ICNN-Bregman heads while preserving zero observed negative divergence rate. The results also identify a boundary case: on large fixed-feature citation prediction, specialized symmetric or hyperbolic baselines remain stronger in ranking accuracy. Overall, the proposed head is best understood as a structured and interpretable plug-in distance module for tasks where directional relations matter.

q-bio.QM 2026-07-03

Point source restores identifiability of spatial dynamics from snapshots

by Rujie Gu, Ray Zirui Zhang +1 more

Identifiability Limits of Physics-Informed Inference for Spatial Stochastic Dynamics from Static Snapshots

Distributed sources cannot be uniquely recovered from static patterns, but a transcription site allows physics-informed methods to infer the

Despite increasing scale and resolution, many biological measurements remain destructive, revealing only spatial information rather than the dynamics it encodes. By combining flexible representations with mechanistic constraints, physics-informed machine learning offers a promising route to inferring these dynamics from static snapshots. Motivated by subcellular imaging of gene expression, we ask when a static spatial pattern of molecules can identify spatially varying diffusivity, creation, destruction, and boundary exchange, and how different inference schemes perform on the task. A structural identifiability analysis shows that distributed sources are non-identifiable, whereas a point source such as a transcription site can restore identifiability. These limits are further shaped by seemingly innocuous modeling choices: the boundary conditions, the spatial regularity of the underlying dynamics, and even the stochastic calculus convention. We then adapt several physics-informed schemes, differing in how they represent the solution and enforce the governing equations, and demonstrate effective inference from a single snapshot. Physics-informed approaches can thus recover spatial heterogeneities of biological dynamics from static data, but their use should be accompanied and guided by careful identifiability analysis for meaningful interpretation of the results.

stat.ML 2026-07-03

Method brings full Bayesian inference to RL without likelihoods

by Stefano Masini, Cecilia Viscardi +1 more

Full Bayesian Reinforcement Learning via LF-IBIS

LF-IBIS approximates posteriors over parameters and policies from simulation data alone to support uncertainty-aware decisions.

Reinforcement Learning (RL) is a sequential decision-making framework in which an agent learns optimal policies through interaction with an environment by maximizing cumulative rewards. Among RL methods, Bayesian Reinforcement Learning (BRL) addresses common practical challenges related to data scarcity by leveraging prior knowledge about the environment and sequential belief updates. However, most BRL approaches require an explicit likelihood function, which is frequently inaccessible or intractable in real-world settings. We propose Likelihood-Free Iterated Batch Importance Sampling (LF-IBIS), a novel algorithm for BRL that updates the agent's beliefs online as new interactions become available. By combining Approximate Bayesian Computation with Iterated Batch Importance Sampling, LF-IBIS enables full Bayesian inference in settings where the environment dynamics are not described by an explicit or tractable likelihood. The method yields approximate posterior distributions over both environment parameters and optimal policies, providing a quantification of policy uncertainty useful for a Bayesian treatment of the exploration-exploitation trade-off. We test the method on a simulation study in response-adaptive randomization in clinical trials, where closed-form posteriors enable validation. Additional experiments address settings where the posterior has no closed form and illustrate online policy updating based on the posterior distribution of the optimal policy.
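
The likelihood-free idea, and the kind of closed-form validation mentioned above, can be illustrated with plain ABC rejection on a single Bernoulli arm. This is not LF-IBIS itself: it omits the sequential importance-sampling updates and the policy posterior, and the function name and settings are hypothetical.

```python
import random
from typing import List


def abc_rejection_bernoulli(successes: int, trials: int,
                            n_draws: int, rng: random.Random) -> List[float]:
    """Approximate the posterior of a success rate using simulation only."""
    accepted = []
    for _ in range(n_draws):
        theta = rng.random()  # prior draw from Uniform(0, 1), i.e. Beta(1, 1)
        simulated = sum(rng.random() < theta for _ in range(trials))
        # Keep theta only if the simulated data match the observed summary.
        if simulated == successes:
            accepted.append(theta)
    return accepted


# Hypothetical arm: 7 successes observed in 10 pulls.
draws = abc_rejection_bernoulli(7, 10, 20000, random.Random(0))
abc_mean = sum(draws) / len(draws)
exact_mean = (7 + 1) / (10 + 2)  # mean of the conjugate Beta(8, 4) posterior
```

Because the success count is a sufficient statistic, exact matching makes the accepted draws samples from the true Beta posterior, so `abc_mean` can be checked against `exact_mean`.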

stat.AP 2026-07-03

Glicko-2 extended with margin and draw models for football

by Bich Van Nguyen, Nam Anh Tran

An Adaptive Glicko-2 Rating Framework for Probabilistic Football Forecasting and Season Simulation

Dynamic ratings plus ordered-logit probabilities feed Monte Carlo simulations of remaining league fixtures.

Football match outcome prediction is a challenging problem because team strength changes over time, match outcomes contain a high level of randomness, and draws play a central role in the result structure. Classical rating systems such as Elo provide simple and interpretable dynamic summaries of team ability, but they do not explicitly model uncertainty and often ignore football-specific contextual information. This paper proposes an adaptive Glicko-2-based rating framework for probabilistic football forecasting and league-level season simulation. The proposed framework extends the standard Glicko-2 model by incorporating football-specific mechanisms, including margin-of-victory adjustment, dominance weighting, structural shocks, home advantage modelling, and an ordered-logit draw model. The framework estimates latent team strength dynamically, converts rating differences into win-draw-loss probabilities, and uses these probabilities to simulate the remaining part of a league season through Monte Carlo sampling.
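The last two steps, turning a rating difference into win-draw-loss probabilities and simulating fixtures, can be sketched as follows. This is a minimal illustration and not the paper's model: the strengths are assumed to be on a logit scale already, and `home_adv`, `draw_width`, and both function names are hypothetical. The Glicko-2 updates, margin-of-victory adjustment, and structural shocks are not included.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def outcome_probs(strength_diff, home_adv=0.3, draw_width=0.6):
    """Ordered-logit (home win, draw, away win) probabilities.

    A latent score eta + logistic noise is cut at -draw_width and +draw_width.
    """
    eta = strength_diff + home_adv
    p_away = sigmoid(-draw_width - eta)      # latent score below the lower cut
    p_not_home = sigmoid(draw_width - eta)   # latent score below the upper cut
    return 1.0 - p_not_home, p_not_home - p_away, p_away


def simulate_points(fixtures, strengths, n_sims=2000, seed=0):
    """Average league points per team over Monte Carlo replays of the fixtures."""
    rng = random.Random(seed)
    totals = {team: 0.0 for team in strengths}
    for _ in range(n_sims):
        for home, away in fixtures:
            p_home, p_draw, _ = outcome_probs(strengths[home] - strengths[away])
            u = rng.random()
            if u < p_home:
                totals[home] += 3
            elif u < p_home + p_draw:
                totals[home] += 1
                totals[away] += 1
            else:
                totals[away] += 3
    return {team: pts / n_sims for team, pts in totals.items()}


expected = simulate_points([("A", "B"), ("B", "A")], {"A": 1.0, "B": 0.0})
```

Widening `draw_width` moves probability mass from both win outcomes into the draw, which is the role the ordered-logit cut points play.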

stat.ME 2026-07-03

Transportability estimates effect changes from modifier prevalence shifts

by Michael Cheung, Candus Shi +4 more

From Subgroups to Population Composition: A Transportability Approach to Effect Heterogeneity

Modeling effects in hypothetical populations with shifted prevalences ranks characteristics by their link to differential vulnerability.

Identifying heterogeneous populations across which exposure effects vary is essential for transportability applications, cost-benefit analyses, and intervention prioritization. Traditional methods for heterogeneity analyses rely on parametric regression with prespecified subgroups, which may fail to capture complex patterns of effect modification. While recent data-adaptive methods improve high-dimensional heterogeneous effect prediction, they add methodological complexity to analyses and may offer limited insight into key drivers of heterogeneity. In this paper, we propose a novel, conceptual approach for heterogeneity analyses that considers how exposure effects would differ in populations with different compositions by modeling the population-level effect surface as a function of the distribution of effect modifiers. The approach consists of three steps: i) selecting confounders and effect modifiers based on prior knowledge (or alternatively using data-adaptive methods to learn effect modifiers), ii) estimating exposure effects in hypothetical populations with different effect modifier prevalences using transportability methods, and iii) modeling the estimated effects as a function of prevalence values. This approach provides two types of outputs: estimation of the change in the population-level exposure effects attributable to increases in effect modifier prevalence and ranking of effect estimates across multiple effect modifiers and prevalences to identify population characteristics most strongly associated with differential vulnerability. We demonstrate the approach using Demographic and Health Surveys data to examine heterogeneous effects of drought on child stunting and provide a Shiny application to implement this approach in any setting.

stat.ME 2026-07-03

Lancaster copulas arise from orthogonal expansions of Lancaster probabilities

by Angelo Efoevi Koudou, Yves I. Ngounou Bakam +1 more

Lancaster copulas

The construction supplies infinite series for the copula and density whose low-order truncations already match target dependence in numerica

We introduce a new copula class, called Lancaster copulas, built from orthogonal expansions of continuous Lancaster probabilities. We derive infinite-series representations for the copula and its density, study truncation effects, and show in numerical experiments that low-order truncations already provide accurate approximation.

stat.ME 2026-07-03

Spike-and-slab prior recovers multimorbidity clusters in EHR data

by Oyebayo R. Olaniran, Soumya S. Paria +3 more

Continuous-Time Bayesian Networks with Structured Shrinkage Priors for Modelling Multimorbidity Trajectories in Large-Scale Electronic Health Records

Structured CTBN model on 33,558 UK Biobank participants identifies cardiometabolic and inflammatory disease modules.

Multiple long-term conditions (MLTCs) arise through complex, time-dependent interactions among diseases, yet existing methods often struggle to jointly model disease progression, multimorbidity networks, and high-dimensional risk factors. We propose a structured Bayesian continuous-time Bayesian network (CTBN) framework for learning directed disease-dependency networks from longitudinal electronic health records. The model allows disease transition intensities to depend on existing conditions, pairwise disease interactions, and exogenous covariates. To control the combinatorial growth of interaction parameters, we introduce order-dependent shrinkage priors that increasingly penalise higher-order effects while preserving clinically interpretable main effects. We compare four sparsity-inducing priors (spike-and-slab, structured normal, Bayesian LASSO, and regularised horseshoe) through extensive simulation studies. Across multiple data-generating scenarios, the spike-and-slab prior achieved the best network recovery, variable-selection accuracy, and false-discovery control, while continuous shrinkage priors were less effective for hard variable selection. The proposed framework was applied to UK Biobank primary care records, focusing on data from 33,558 participants who were free of the ten selected most prevalent conditions at age 40 and who subsequently developed at least one of these conditions during the follow-up period. The selected spike-and-slab model identified two dominant disease modules: a cardiometabolic cluster centred on diabetes and an inflammatory cluster linking respiratory and atopic conditions.

physics.comp-ph 2026-07-02

Soliton dynamics recovered from scattering data without equations

by Seth Minor, Vanja Dukic +1 more

Learning Effective Soliton Dynamics from Scattering Data

Weak-form identification inside the inverse scattering framework yields low-dimensional models that hold in perturbed regimes.

The inverse scattering transform (IST) provides the standard theoretical framework for deriving soliton dynamics. Traditionally, such derivations have been of an analytical, rather than data-driven, nature. In this paper, we combine the conceptual framework of the IST with weak-form system identification methods to discover effective soliton dynamics directly from observed scattering data, without assuming prior knowledge of the scattering equations. Our method avoids parameterizing solitary waves via ad hoc curve-fitting by working in the scattering domain, yielding interpretable low-dimensional models that remain valid in perturbed and near-integrable regimes. We demonstrate the performance of the proposed approach on synthetic and experimental data governed by shallow-water equations of Korteweg--de Vries-type and recover models that are consistent with canonical IST theory.

stat.AP 2026-07-02

Kernel norm spots PV faults at 99% accuracy without labels

by Victoria Jorry, Zina-Sabrina Duma +3 more

An unsupervised kernel norm monitoring for fault detection in a time series photovoltaic system

KNM maps normal data windows into kernel space to flag sensor and shading issues in solar systems better than standard baselines.

Grid-connected photovoltaic systems (GCPVS) are generally robust but remain susceptible to faults that can compromise energy conversion efficiency or raise safety concerns. Promptly and automatically detecting such anomalies is therefore essential for maintaining system reliability and performance. However, in practice, labeled fault data are rarely available in real-world deployments, which limits the applicability of supervised approaches. Conventional unsupervised baseline models, including a one-class support vector machine (OCSVM), isolation forest (iForest), and local outlier factor (LOF), are trained on normal operation data and assign anomaly scores reflecting how closely new observations resemble that baseline. Although these methods already accommodate non-linear behavior to varying degrees, kernel-based formulations offer further flexibility in shaping the decision boundary; however, tuning the kernel hyperparameters ordinarily requires some prior knowledge of the fault regime. We overcome this limitation by proposing kernel-based norm monitoring (KNM), a non-linear, unsupervised, window-based fault-detection method designed for continuous processes. Although the paper focuses on the GCPVS as a case study, KNM is a general-purpose monitoring framework applicable to a wide range of industrial processes. Using the Grid-connected PV System Faults (GPVS-Faults) dataset operating in intermediate power point tracking (IPPT) mode, KNM is evaluated in two fault scenarios, sensor faults and partial shading, against three benchmark techniques: OCSVM, iForest, and LOF. KNM achieves up to 99.1% and 98.3% accuracy on the two fault scenarios, respectively, using the Cauchy kernel, compared to 93.5% for the best-performing benchmark. The method is interpretable, and variable contribution plots are proposed to support fault identification.

cs.AI 2026-07-02

AI agents reproduce 72% of human ideological gaps in data analysis

by Jiacheng Miao, Jonathan K Pritchard +1 more

The Agentic Garden of Forking Paths

Different personas cause agents to reach opposing conclusions from the same dataset, showing selective reporting among valid paths is the co

Empirical research rarely admits a unique analysis. Different analytical choices can lead to different conclusions from the same data, yet these hidden forking paths are difficult to observe. We show that AI agents capture much of the analytical variation among human researchers while making these paths explicit. Across four high-stakes domains, assigning different personas is sufficient for AI agents to report divergent, often opposing, conclusions from the same data and question, with findings systematically aligned with each persona's beliefs. In a study in which 42 human research teams analyzed the same immigration dataset, AI agents reproduced 72% of the human ideological gap in reported effect estimates. Although the agents reach opposing conclusions, it is difficult to identify clear issues in any single analysis from the final AI reports: 86% passed independent AI review and 78% passed majority human expert review. These findings suggest that the central challenge is often not flawed analyses, but selective exploration and reporting from a large space of methodologically defensible analyses. AI agents may amplify this longstanding problem by making such exploration inexpensive and scalable. To address this, we introduce the m-value (multiverse value), the probability that an analysis path would produce a claim at least as extreme as the reported one. We further introduce Agentic Bootstrap, which estimates the m-value by using AI agents to sample plausible analysis paths. Applied to the human immigration study, 13.5% of reported human analyses fell in the most extreme 5% of the analysis space (m<0.05). Scientific evidence should therefore be evaluated not only by a single reported analysis but also by its position within the distribution of analyses that could reasonably have been reported. Agentic Bootstrap makes this distribution observable and turns it into a criterion for scientific credibility.
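Given a set of effect estimates from sampled analysis paths, the m-value as defined above reduces to an empirical tail proportion. The sketch below assumes that "at least as extreme" means at least as far from a reference value in absolute terms; the abstract does not specify this, and the function name `m_value` and the `reference` parameter are hypothetical. How the agents generate the sampled paths is outside the sketch.

```python
def m_value(reported, sampled_estimates, reference=0.0):
    """Share of sampled analysis paths at least as extreme as the reported estimate."""
    if not sampled_estimates:
        raise ValueError("need at least one sampled analysis path")
    threshold = abs(reported - reference)
    hits = sum(1 for est in sampled_estimates if abs(est - reference) >= threshold)
    return hits / len(sampled_estimates)


# Four hypothetical sampled paths; only one is as extreme as the reported 0.5.
example = m_value(0.5, [0.1, -0.2, 0.6, 0.3])
```

A small value indicates that the reported estimate sits in the tail of what comparable analyses would have produced.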

cs.LG 2026-07-02

Unveiling the Non-Monotonic Effect of Privacy on Generalization under Byzantine Robustness

by Thomas Boudou, Batiste Le Bars +2 more

The benefit reverses when privacy is weaker and the usual tension with robustness returns.

Recent work has established a fundamental trilemma between Byzantine robustness, local differential privacy (LDP), and optimization error in distributed learning. We show that this trilemma does not universally extend to generalization error, but instead depends critically on the privacy regime. Specifically, in the high-noise regime (strong privacy), we prove that increasing privacy reduces the generalization error, i.e., there is no tension between robustness and privacy. In the low-noise regime (weaker privacy), however, the tension between robustness and privacy reappears and increasing privacy indeed degrades generalization. Our theory explains this surprising non-monotonic behavior of the generalization error via matching lower and upper bounds on the algorithmic stability of Byzantine-robust distributed learning under LDP constraints. We corroborate and further analyze these theoretical findings with empirical evaluations.

cs.LG 2026-07-02

Three-term law recovers optimal batch size from suboptimal runs

by Fabian Schaipp

How to Allocate Your Tokens? Scaling Laws with Training Steps and Batch Size

Splitting training data into steps and batch size lets the scaling law fit with fewer experiments while predicting best token use.

We propose a scaling law that takes into account model size and training data while explicitly splitting the latter into training steps and batch size (which we call the three-term law). Fitting the proposed law on a large set of training runs, we find that it correctly recovers the scaling of the optimal batch size. Moreover, because it makes use of training runs with suboptimal batch size, our proposed law can be robustly fit with a significantly smaller number of training runs. We further show that the three-term law can be used to derive scaling laws for suboptimal batch sizes, and that it matches previous empirical findings related to the critical batch size.

stat.AP 2026-07-02

Parameter uncertainty reduces epidemic sensitivity and yields conservative policies

by Nicholas R. Wu, Michael C. Fu

Sensitivity Analysis and Optimization of Stochastic Epidemic Models under Parameter Uncertainty

Unbiased estimators for stochastic models show weaker herd-immunity effects and more cautious intervention levels when parameters are drawn

To address sensitivity analysis and optimization for a discrete-time stochastic epidemic model, we derive unbiased gradient estimators that accommodate uncertainties represented as distributions over the parameters of interest, such as those arising from Bayesian calibration. Specifically, we estimate the sensitivity of total infections over a finite time horizon with respect to the proportion immunized ($v$) and the contact rate ($\beta$). Comparing the proposed estimators with deterministic limit approximations based on large populations reveals differences due to the finite population and time horizon. The estimators exhibit lower variance than finite-difference estimators for the derivative with respect to $\beta$, but higher variance for the derivative with respect to $v$. Simulation experiments indicate parameter uncertainty reduces sensitivity to the parameters of interest. In particular, indirect effects of vaccination, such as herd immunity, are less pronounced compared to when parameters are known. For optimization problems balancing intervention and infection costs, incorporating parametric uncertainty leads to more conservative policies.

stat.AP 2026-07-02

20-50 sampled points per region lift spatial scan power

by Foad Namjoo, Drew McClelland +2 more

Sampling for Region-Aggregated Spatial Scan Statistics

Uniform sampling from each area's geometry and even value spreading recovers most detection ability lost by using centroids alone.

Anomaly detection in geospatial data is a crucial tool in geographic information science (GIS), with applications ranging from national security to public-health surveillance to the study of societal disparities. This work focuses on spatial scan statistics and addresses a key mismatch: spatial counts are typically aggregated into predefined regions (census tracts, zip codes, counties), whereas the most efficient scan algorithms operate on spatial point data. The standard remedy -- collapsing each region to its centroid, as in widely used tools such as SaTScan -- is convenient but, as we show, discards the region's spatial extent and causes a significant loss in statistical power. To resolve this, we propose a simple yet scalable fix: replace each spatial region with 20-50 points sampled uniformly from its geometry and spread the region's values evenly across them. This approach improves statistical power while maintaining computational tractability. A convergence analysis explains why so few samples per region suffice. We recommend this sampling-based conversion as the default way to apply point-based spatial scan statistics to region-aggregated data for anomaly detection.
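The proposed conversion can be sketched in a few lines. The version below is an illustration under simplifying assumptions: a region is a single simple polygon given as a vertex list, uniform sampling is done by rejection from the bounding box, and the names `point_in_polygon` and `region_to_points` are hypothetical. Real regions with holes or multiple parts would need a geometry library.

```python
import random


def point_in_polygon(x, y, polygon):
    """Ray-casting test: count edge crossings of a ray going right from (x, y)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def region_to_points(polygon, value, k=30, seed=0):
    """Replace a region by k uniform points, each carrying value / k."""
    rng = random.Random(seed)
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    points = []
    while len(points) < k:  # assumes the polygon has positive area
        x = rng.uniform(min(xs), max(xs))
        y = rng.uniform(min(ys), max(ys))
        if point_in_polygon(x, y, polygon):
            points.append((x, y, value / k))
    return points


triangle = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
weighted_points = region_to_points(triangle, value=90.0, k=30)
```

The region's total is preserved because the k weights sum back to `value`, so a point-based scan statistic sees the same counts spread over the region's extent rather than concentrated at a centroid.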

cs.LG 2026-07-02

Conditional forests rank 3rd-4th as feature selectors

by Robert Milletich, Justin Downes +2 more

Conditional Inference Trees and Forests for Feature Selection

Benchmarks on 30 datasets show competitive top-k performance with bias-controlled split selection.

Conditional inference trees (CIT) and conditional inference forests (CIF) reduce split-selection bias by testing features before choosing split thresholds, but repeated permutation tests and threshold searches can make these methods computationally expensive. We study CIT and CIF as top-$k$ feature-ranking methods for downstream prediction using real-data benchmarks, runtime ablations, and synthetic feature-recovery experiments. At a fixed node, if the features and permutation budget do not depend on the node responses, Bonferroni-corrected $+1$ Monte Carlo permutation $p$-values control nodewise rejection under the complete permutation null. CIF ranks 4th among 17 classification methods on 22 datasets and 3rd among 18 regression methods on 8 datasets. With Bonferroni correction held fixed, the CIF runtime ablations indicate that adaptive stopping and the number of thresholds searched have the largest measured effect on runtime: turning off adaptive stopping and using exact threshold search increase fitting time by 4.0--8.4$\times$ and 1.9--10.8$\times$, respectively, while downstream score changes are at most 0.011. Sparse high-$p$ simulations indicate that forest feature sampling can leave informative features out of many split decisions. Overall, the results support CIF as a top-$k$ feature-ranking method in the evaluated downstream prediction benchmarks.
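The Bonferroni-corrected $+1$ Monte Carlo permutation $p$-value mentioned above has a simple form: with $B$ permutations, $p = (1 + \#\{T_b \ge T_{\text{obs}}\}) / (B + 1)$, multiplied by the number of features tested and capped at 1. The sketch below assumes an absolute-covariance test statistic, which is a stand-in rather than the statistic used in the paper, and the function names are hypothetical. Adaptive stopping is not shown.

```python
import random


def plus_one_perm_pvalue(x, y, n_perm=999, seed=0):
    """+1 Monte Carlo permutation p-value for association between x and y."""
    rng = random.Random(seed)

    def stat(xs, ys):
        # Assumed statistic: absolute (unnormalised) covariance.
        mx = sum(xs) / len(xs)
        my = sum(ys) / len(ys)
        return abs(sum((a - mx) * (b - my) for a, b in zip(xs, ys)))

    observed = stat(x, y)
    y_perm = list(y)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if stat(x, y_perm) >= observed:
            exceed += 1
    # The +1 in numerator and denominator counts the observed arrangement itself.
    return (1 + exceed) / (n_perm + 1)


def bonferroni(pvalues):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, m * p) for p in pvalues]


x = list(range(20))
p_strong = plus_one_perm_pvalue(x, [2 * v for v in x])
```

The $+1$ keeps the $p$-value strictly positive, so the smallest attainable value with 999 permutations is 0.001.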

stat.ME 2026-07-02

Score test distinguishes non-nested cure-fraction survival models

by Cynthia A. V. Tojeiro, Francisco Cribari-Neto +2 more

J- and MJ-Type Tests for Non-Nested Parametric Survival Models with a Cure Fraction: A Score Test Approach

The MJ statistic checks whether at least one candidate model is correct and supplies a selection rule, all from null-model estimates alone.

We propose specification tests for discriminating among non-nested parametric survival models with a cure fraction, focusing on models that differ only in their baseline distributions. The proposed approach augments the null log-likelihood with information from competing models and applies a score test to assess whether the additional information is redundant. Because the test relies only on restricted maximum likelihood estimates, it avoids fitting augmented models. For two competing models, the score statistic reduces to a quadratic form in the sample mean of the individual log-likelihood differences. We show that its signed square root coincides with Vuong's test statistic, although our framework differs in three important respects: it tests the specific null hypothesis that a given model is the true data-generating process, it uses an unsigned statistic that extends naturally to $M \ge 2$ competing models, and it estimates the Kullback-Leibler bias by parametric bootstrap. The resulting MJ statistic combines the individual J tests to assess the global null hypothesis that at least one candidate model is correctly specified, while also providing a model-selection criterion.
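For reference, the classical Vuong statistic that the signed square root is said to coincide with is computed from per-observation log-likelihood differences $d_i$ as $\sqrt{n}\,\bar d / \hat\omega$, where $\hat\omega^2$ is the sample variance of the $d_i$. The sketch below shows only this uncorrected form; the function name is hypothetical, and the paper's bootstrap estimate of the Kullback-Leibler bias and the MJ combination are not reproduced.

```python
import math


def vuong_statistic(loglik_a, loglik_b):
    """sqrt(n) * mean(d) / sd(d) for d_i = loglik_a[i] - loglik_b[i]."""
    d = [a - b for a, b in zip(loglik_a, loglik_b)]
    n = len(d)
    mean = sum(d) / n
    var = sum((v - mean) ** 2 for v in d) / n  # 1/n variance
    return math.sqrt(n) * mean / math.sqrt(var)


# Two observations with differences d = [1, 3]: mean 2, variance 1.
z = vuong_statistic([-1.0, -1.0], [-2.0, -4.0])
```

Squaring `z` gives an unsigned quadratic form in the sample mean of the differences, which is the shape the abstract attributes to the two-model score statistic.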

math.ST 2026-07-02

Motif signatures distinguish latent positions where degrees match

by Roland Boniface Sogan, Tabea Rebafka

Beyond Degree: Rooted Motif Signatures for Latent Position Identifiability in Graphon Models

In generic finite-rank graphons, higher-order rooted patterns recover unique connectivity profiles even when degrees are identical.

Graphon estimation requires structural assumptions to address its intrinsic non-identifiability. A standard approach is degree-based identifiability, where the degree function is assumed to be strictly monotonic. This assumption is rather restrictive and fails for graphons with constant or non-injective degree function, even when distinct latent positions have different connectivity profiles. In this paper, we introduce \emph{rooted motif signatures} as higher-order node-level representations for graphons. They extend the degree function by recording, at each latent position, the densities of rooted motifs such as triangles, cycles, paths, and other local subgraph patterns. We study the extent to which these signatures can distinguish latent positions beyond degree information. For generic finite-rank graphons, we prove that suitable rooted motif signatures determine the connectivity profiles of latent positions. We also explain why such a property cannot hold for arbitrary graphons without additional assumptions, since different latent positions may have identical rooted motif signatures. On the statistical side, we define empirical rooted motif signatures from a single observed graph and prove uniform concentration bounds for these estimators. Simulation experiments illustrate that rooted motif signatures can reveal latent structure in settings where degree-based representations are uninformative, including graphons with constant or non-injective degree functions and stochastic block models with equal block degrees.

cs.LG 2026-07-02

One narrative links approximation to emergence in deep learning

by Zhilin Zhao

From Approximation to Emergence: A Theory of Deep Learning

The account organizes results on optimization, scaling, transformers, and alignment by what each explains and what each leaves out.

Deep learning has outgrown any single mathematical explanation. From Approximation to Emergence develops a unified, proof-oriented account of modern deep learning theory, tracing a path from the classical foundations of approximation, optimization, and generalization to the contemporary mechanisms of overparameterization, robustness, generative modeling, transformers, in-context learning, scaling laws, interpretability, alignment, and emergence. Rather than presenting isolated results, the book organizes a broad literature into a coherent research narrative: each theory is examined through the object it controls, the assumptions that make it valid, and the phenomena it leaves unexplained. Written for researchers, graduate students, and mathematically trained practitioners, this monograph offers a rigorous map of deep learning theory as it stands today: powerful, incomplete, and increasingly centered on the question of how learned mechanisms arise from scale, data, architecture, and training.

cs.LG 2026-07-02

Decision loss augments energy score for cost-aware forecasts

by Kornelius Raeth, Nicole Ludwig

Decision-Aware Training for Sample-Based Generative Models

Sample-based generative models learn to penalize downstream decision costs while retaining full probabilistic output.

Sample-based generative models are increasingly used for probabilistic forecasting in high-stakes decision settings, yet their training objectives are blind to the decision maker's cost structure. These models are commonly trained with strictly proper scoring rules, such as the energy score, which allocate their training signal in proportion to data density, with no awareness of where forecast errors are most costly for downstream decisions. We therefore propose decision-aware training for sample-based generative models, augmenting the energy score objective with a differentiable decision loss that directly penalises the cost incurred by acting on the model's forecast. This combined loss is theoretically grounded, as the decision loss is itself a proper scoring rule. We validate our method on one synthetic and two real-world tasks, showing targeted improvements in cost-sensitive regions while retaining full probabilistic forecasts.
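The structure of the combined objective can be sketched with the standard sample-based energy score estimate, $\frac{1}{m}\sum_i \lVert x_i - y\rVert - \frac{1}{2m^2}\sum_{i,j}\lVert x_i - x_j\rVert$, plus a weighted decision term. Everything beyond the energy score is an assumption here: `shortage_cost`, its cost parameters, and the `weight` argument are hypothetical stand-ins, the stand-in cost is not claimed to be a proper scoring rule, and plain Python is used instead of a differentiable framework.

```python
import math


def energy_score(samples, y):
    """Sample-based energy score of forecast samples against observation y."""
    m = len(samples)
    term_obs = sum(math.dist(s, y) for s in samples) / m
    term_spread = sum(math.dist(a, b) for a in samples for b in samples) / (2 * m * m)
    return term_obs - term_spread


def shortage_cost(samples, y, under=4.0, over=1.0):
    """Hypothetical 1-D cost: act on the sample mean; shortfalls cost more."""
    action = sum(s[0] for s in samples) / len(samples)
    gap = y[0] - action
    return under * gap if gap > 0 else -over * gap


def combined_loss(samples, y, decision_loss, weight=1.0):
    """Energy score plus a weighted decision loss."""
    return energy_score(samples, y) + weight * decision_loss(samples, y)


forecast = [(0.0,), (2.0,)]
loss_hit = combined_loss(forecast, (1.0,), shortage_cost)
loss_short = combined_loss(forecast, (3.0,), shortage_cost, weight=0.5)
```

In the second call the energy score alone is 1.5, while the decision term adds a further 4.0 because the forecast led to a shortfall, which is the kind of cost-sensitive emphasis the abstract describes.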

stat.ML 2026-07-02

Separable graphs unify independence models in mixed graphs

by Christopher Meek, Kayvan Sadeghi

Characterizing and Identifying Separable Graphical Models

Missing edges always admit separating sets, enabling canonical representations and an identification algorithm for equivalence classes

We study a broad class of graphical models whose independencies correspond to vertex separation in mixed graphs with directed, undirected, and bidirected edges, which are capable of encoding independence structures arising from feedback, latent, and selection mechanisms. In particular, we introduce separable graphs, in which each missing edge implies the existence of a separating set for its endpoints, and essentially separable graphs, those graphs that are separation equivalent to a separable graph. We show that these models include many existing graph families used to define graphical models and provide several characterizations of separable graphs and essentially separable graphs. We also provide multiple characterizations of separation equivalence for separable graphs. One is a graphical characterization in terms of ordinary graph properties, extending earlier results for specific subfamilies. Another is a separational characterization depending only on graph separation properties. Finally, we provide a canonical representation for the equivalence classes of essentially separable graphs and develop an algorithm that, under suitable assumptions, identifies the equivalence class of any essentially separable graph.

stat.ML 2026-07-02

Refined assumption gives dichotomy counts for low-dimensional data

by Konstantin Häberle, Helmut Bölcskei

Function-Counting Theory for Low-Dimensional Data Structures

Extending Cover's counting theory shows how manifold structure shapes classification capacity and generalization.

The success of deep learning models in classification and regression is widely attributed to the low-dimensional structure that real-world data tend to exhibit, despite their high-dimensional representation. This work attempts to provide a mathematical framework for binary classification on low-dimensional data, building on Cover's (1965) function-counting theory. With our framework, we aim to address the question of how the low-dimensional structure of the data affects the classification capabilities of learning models. Cover's theory relies on a general position assumption that blinds it to the underlying data structure. We refine this assumption to account for the low-dimensionality of the data and derive dichotomy counts that reflect the data structure. We further extend Cover's separation capacity and problem of generalization to the low-dimensional setting, enabling the impact of the underlying data structure on both to be analyzed.
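As background, Cover's classical count for $N$ points in general position in $\mathbb{R}^d$ is $C(N, d) = 2\sum_{k=0}^{d-1}\binom{N-1}{k}$ dichotomies separable by a hyperplane through the origin. The sketch below computes this baseline only; the refined counts for low-dimensional data derived in the paper are not reproduced, and the function name is hypothetical.

```python
import math


def cover_count(n_points, dim):
    """Cover's (1965) number of homogeneously linearly separable dichotomies."""
    return 2 * sum(math.comb(n_points - 1, k) for k in range(dim))


# Fraction of all 2**N dichotomies that are separable at N = 2d.
fraction_at_capacity = cover_count(4, 2) / 2 ** 4
```

All $2^N$ dichotomies are separable when $N \le d$, and exactly half are at $N = 2d$, which is the separation capacity the abstract refers to.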

stat.ML 2026-07-02

Monotone transforms unify multitask learning for mixed outcomes

by Huichao Li, Tong Wang +2 more

Deep Multitask Learning for Mixed-Type Outcomes with Shared Sparsity

Shared first layer and group Lasso yield excess risk bounds plus consistent selection of common predictors.

Most existing multitask learning approaches are limited by their reliance on task-specific loss functions tailored to the scale and type of each outcome. When outcomes differ across tasks, these losses are generally not directly comparable, which makes it difficult to formulate a unified objective and may limit information sharing across tasks. We propose a multitask transformation framework in which task-specific responses may differ through unknown monotone transformations. Motivated by high-dimensional biological applications in which the predictor dimension may diverge with the sample size while only a common subset of predictors is informative, we consider shared sparsity across tasks. Under this framework, we estimate the target functions and identify important predictors by optimizing a smoothed rank-based criterion with a group-Lasso penalty, implemented through a multitask deep neural network with a shared first layer. We establish nonasymptotic excess-risk bounds and variable-selection consistency for the proposed estimator. Simulation studies show that the proposed method achieves competitive prediction and variable-selection performance compared with competing approaches. Analyses of gene-expression studies with continuous, binary, and mixed outcomes further illustrate that the proposed method improves prediction and identifies biologically meaningful shared predictors.

stat.ME 2026-07-02

Baseline treatment identifies effects despite informative switching

by Yang Liu, Andrew Ying +2 more

An Instrumental Variable Approach to Account for Informative Treatment Switching in Real-world Evidence

The doubly robust estimator uses an instrumental variable and martingale residuals, without needing a no-switching subset, and applies to mu

Reproducible and generalizable assessment of treatment decisions requires principled handling of subsequent treatment switching that may inform expected outcomes and shift across cohorts and over time. To effectively account for informative treatment switching, we propose an instrumental variable approach that characterizes the poorly documented expected outcomes at switching as unmeasured confounding. After establishing the baseline treatment as a viable instrumental variable, we constructed an estimating equation based on the association between the centered instrumental variable and a martingale-style residual process that identifies the treatment effect under a structural cumulative survival model. Our proposed method is doubly robust, i.e., valid whenever either the baseline propensity model or the no-switching outcome model is consistently estimated. A co-training of the treatment effect parameter and the survival outcome regression model eliminated the requirement of observing a no-switching subset under semi-parametric additive hazards models. We further developed a baseline-survival-corrected cross-fitting approach to incorporate general machine learning models for estimating nuisance models. Numerical results demonstrated the validity of our method in various settings when a basket of benchmark solutions produced biased or contradictory results. We applied our method to a comparison of high-efficacy vs standard-efficacy disease-modifying treatments as the second-line therapy for multiple sclerosis.

stat.ME 2026-07-02

GGMNIRA quantifies node influence via KL after mean manipulations

by Yiming Wu, Fei Wang +1 more

Simulating Node Manipulations in Gaussian Graphical Models: The GGMNIRA Framework for Continuous and Ordinal Psychological Network Data

The algorithm simulates conditional mean changes to quantify how node manipulations alter psychological network distributions.

In psychological network analysis, centrality indices are commonly used to evaluate the importance of nodes within a network. However, centrality only captures the static topological position of a node, and there is no sufficient theoretical justification for assuming that it reflects a node's influence on network dynamics. The NodeIdentifyR Algorithm (NIRA) offers an alternative by systematically applying simulated manipulations to node intercepts within the Ising model to evaluate nodes' projected importance, but this algorithm is restricted to binary data, and the manipulated parameter lacks a clear theoretical meaning outside the context of psychopathology. To address these limitations, we propose the Gaussian Graphical Model NodeIdentifyR Algorithm (GGMNIRA), which manipulates a node's conditional mean and uses Kullback-Leibler (KL) divergence to quantify the change in network distribution before and after manipulation, thereby extending this simulated manipulation logic to the Gaussian graphical model framework, which is applicable to continuous and ordinal data. Around this algorithm, we further developed a correlation stability coefficient and a nonparametric bootstrap difference test for KL divergence, with corresponding interpretive thresholds established through simulation studies. The framework was also extended to bridge Gaussian graphical models and moderated Gaussian graphical models, enabling its application to multi-construct comorbidity networks and to contexts involving moderation effects. All methods are implemented in the R package "GGMNIRA".
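As a rough illustration of the idea of scoring a node by the KL divergence induced by a mean manipulation, the sketch below uses the textbook closed form for the KL divergence between two multivariate Gaussians. It is not the GGMNIRA package's implementation: the zero baseline mean, the helper name `kl_after_conditional_mean_shift`, and the choice to realise the manipulation as an intercept change that moves the node's conditional mean by `shift` are all assumptions made for this example.

```python
import numpy as np


def gaussian_kl(mu_p, cov_p, mu_q, cov_q):
    """KL( N(mu_p, cov_p) || N(mu_q, cov_q) ) for multivariate Gaussians."""
    mu_p, mu_q = np.asarray(mu_p, float), np.asarray(mu_q, float)
    cov_p, cov_q = np.asarray(cov_p, float), np.asarray(cov_q, float)
    k = mu_p.size
    diff = mu_q - mu_p
    cov_q_inv = np.linalg.inv(cov_q)
    _, logdet_p = np.linalg.slogdet(cov_p)
    _, logdet_q = np.linalg.slogdet(cov_q)
    return 0.5 * (
        np.trace(cov_q_inv @ cov_p) + diff @ cov_q_inv @ diff - k + logdet_q - logdet_p
    )


def kl_after_conditional_mean_shift(precision, node, shift):
    """Hypothetical helper: KL between the network before and after one node's
    conditional mean is shifted by `shift` (baseline mean assumed to be zero).

    With density proportional to exp(-x'Kx/2 + h'x), the conditional mean of
    node i is (h_i - sum_{j != i} K_ij x_j) / K_ii, so adding shift * K_ii to
    h_i moves that conditional mean by `shift` and moves the joint mean by
    Sigma @ delta_h.
    """
    K = np.asarray(precision, float)
    cov = np.linalg.inv(K)
    delta_h = np.zeros(K.shape[0])
    delta_h[node] = shift * K[node, node]
    mu_before = np.zeros(K.shape[0])
    mu_after = mu_before + cov @ delta_h
    return gaussian_kl(mu_before, cov, mu_after, cov)


if __name__ == "__main__":
    K = np.array([[2.0, -1.0], [-1.0, 2.0]])  # toy precision matrix
    for i in range(K.shape[0]):
        print(i, kl_after_conditional_mean_shift(K, i, 1.0))
```

Ranking nodes by this quantity gives one possible "influence" ordering; under the stated assumptions the score reduces to `0.5 * shift**2 * K[i, i]**2 * Sigma[i, i]`.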

stat.AP 2026-07-02

Bayesian model clusters Venice micro-mobility users into eight profiles

by Vanshika Keshwani, Stefano Mazzuco

Beyond the Flow: A Bayesian Latent Clustering Framework for Shared Micro-mobility Users in Venice

Users are grouped from raw trip sequences rather than summaries, separating localized, commuter, and tourist patterns.

The study on shared micro-mobility is based on trip modeling and user data. User segmentation in shared micromobility systems is traditionally studied by aggregating trip-level observations into user-specific summary measures before applying clustering techniques. Such aggregation can obscure trip-level variability and lead to ecological fallacies if results are interpreted as applying to individual records. We propose a Bayesian finite mixture model for multivariate categorical count data that clusters users directly from repeated trip-level observations while preserving the full categorical structure of individual travel behavior. This approach focuses on identifying heterogeneous mobility users from high-dimensional categorical trip behavior while accounting for uncertainty in cluster assignments. Users are the fundamental unit of analysis for exploring latent cluster patterns. The model represents each user with a product-multinomial likelihood with latent cluster membership. The methodology is illustrated using a one-year trip record of shared bikes and e-bikes from the Municipality of Venice, Italy, comprising over 220,000 trips made by more than 11,000 recurrent users. The analysis identifies eight distinct latent mobility profiles corresponding to localized, commuter-oriented, tourist-oriented, central, and inter-zonal travel behaviors. The proposed framework provides a flexible and computationally scalable approach for clustering repeated categorical observations and is readily applicable to other large-scale behavioral and transportation datasets.

cs.IT 2026-07-02

Planted subgraph recovery threshold set by minimal max density

by Wasim Huleihel

Recovery of Planted Subgraphs

The smallest balanced induced subgraph's densest part determines when exact recovery from a random graph becomes possible with high probabil

Understanding the fundamental limits of recovering planted subgraphs in random graphs is a central challenge in high-dimensional statistics and theoretical computer science. While existing work has largely focused on special subgraph families such as cliques, bicliques, or dense blocks, the exact recovery of a general planted subgraph in Erd\H{o}s--R\'enyi random graphs remains poorly understood. In this paper, we study the exact recovery of an arbitrary planted subgraph $\Gamma = \Gamma_n$ embedded in a dense Erd\H{o}s--R\'enyi random graph $\mathcal{G}(n,q_n)$, where edges within $\Gamma$ are present independently with probability $p_n > q_n$. Our main results identify sharp conditions under which exact recovery is possible with high probability, and we establish matching lower bounds showing the necessity of these conditions. The resulting statistical threshold is characterized by a new graph-theoretic quantity, which we term the \emph{minimal maximum subgraph density}. This quantity is defined as the maximum subgraph density of the smallest induced balanced subgraph of $\Gamma$. We then turn to the problem of recovery under polynomial-time constraints. We propose a computationally efficient recovery algorithm that applies to arbitrary planted subgraphs and analyze its performance in terms of certain spectral properties of the adjacency matrix. In addition, we derive computational lower bounds for recovery using the low-degree polynomial framework, establishing regimes where recovery is statistically possible but computationally hard. Finally, we consider several extensions of our setting, including recovery in semi-random models and weaker notions of recovery.

stat.ML 2026-07-02

Surrogate variable enables explicit process noise modeling in Kalman filters

by Shilei Li, Dawei Shi +2 more

Hierarchical Variational Kalman Filtering

Reformulated inference cuts iterations and supports higher-order trackers that become zero-phase over full history.

Traditional variational Kalman filtering with unknown noise statistics suffers from inconsistent process covariance estimation and slow convergence, limiting its practical utility. To address these issues, we introduce a surrogate variable representing the process-noise-free state, which enables explicit modeling and inference of process noise statistics. In addition, we reformulate the conventional coordinate ascent variational inference (CAVI) as a marginalized maximum a posteriori problem, followed by a single-step hyperparameter fitting. This reformulation obviates the need for multiple inner iterations inherent to CAVI and decouples the design of the covariance tracking filters. Consequently, this architecture permits the deployment of higher-order filters for covariance tracking and enables sliding-window hyperparameter estimation. Notably, when this window encompasses all historical data, the covariance tracking estimator intrinsically operates as a zero-phase filter. Numerical simulations validate the theoretical framework, demonstrating enhanced convergence speed and superior estimation accuracy compared with existing methods.

stat.ME 2026-07-02

Transfer learning yields consistent quantile regression estimators

by Gabriela Ciuperca

Transfert learning and adaptive LASSO quantile

Two L1 penalties from a source database estimator deliver sparsity and faster computation than standard adaptive LASSO.

For quantile regression, we propose an estimation method that transfers knowledge using two $L_1$ penalties based on an estimator obtained from a source database. The proposed transfer learning estimator satisfies the properties of consistency and sparsity. Its convergence rate and asymptotic behavior are studied in several scenarios. This knowledge transfer results in a shorter computation time than that of the standard adaptive LASSO estimator. Another advantage of our method is that it can be applied to models with non-Gaussian errors. In addition, we propose an algorithm for computing the adaptive transfer LASSO quantile estimator. The simulations confirm the theoretical results and demonstrate that the adaptive learning estimator, calculated using the proposed algorithm, is more competitive than the LASSO estimators. Finally, we illustrate the practical utility of the proposed transfer learning estimator and algorithm using a real-data application involving the physicochemical properties of protein tertiary structures.

stat.ME 2026-07-02

Latent achievement enters self-efficacy regression via conditional copula

by Sarah Lee, Matias Quiroz +1 more

How does academic performance affect self-efficacy? Interpretable modelling through latent academic achievement

Formulation yields interpretable link and faster variable selection than joint model in mixed-scale data

There is increasing evidence of a directional relationship from academic performance to self-efficacy. We develop a Bayesian model for investigating this relationship when academic performance is measured on an ordinal scale and self-efficacy on a continuous scale. The model allows latent academic achievement to enter the self-efficacy regression as a predictor, while Bayesian variable selection identifies factors associated with either response. The resulting conditional formulation yields an interpretable regression characterisation of how latent academic achievement relates to self-efficacy. Furthermore, it enables a tailored partially collapsed Gibbs sampler that analytically integrates out the regression coefficients when updating the variable inclusion indicators. Simulation studies demonstrate that the proposed conditional formulation and tailored sampler improve sampling efficiency and variable-selection performance relative to a recent, more general joint Gaussian copula regression formulation. We apply the methodology to data from the Longitudinal Study of Australian Children, a landmark national cohort study covering children's education, social and emotional wellbeing, health and family circumstances. The model and analysis shed light on how latent academic achievement relates to self-efficacy in Australian children, and reveal that the two outcomes differ markedly in the range of covariates associated with each outcome.

math.NA 2026-07-02

Symmetric CAEs yield more accurate latent trajectories in PDE models

by G. Li Causi, N. Tonicello +2 more

Convolutional Symmetric AutoEncoders: enhancing latent stability via differential geometry

Extending representation consistency to convolutional layers cuts reconstruction errors and boosts robustness on advection, Burgers and Kura

Autoencoders (AEs) have emerged as powerful tools for non-linear dimensionality reduction, often surpassing traditional linear methods such as Proper Orthogonal Decomposition (POD) in scenarios characterized by slowly decaying Kolmogorov $n$-widths. In the realm of Reduced-Order Modelling (ROM), these models are increasingly utilized to learn low-dimensional representations of solution manifolds associated with parametric Partial Differential Equations (PDEs). However, the high expressivity of AEs presents a challenge: although trained networks typically minimize reconstruction error, they often struggle to capture the essential properties necessary for building accurate and robust ROMs. Recent works (arXiv:2307.15288v2 and arXiv:2506.11641v1) have tackled this challenge in fully connected AEs by proposing representation-consistent architectures, which preserve some of the properties belonging to POD. This study builds upon that concept by extending representation consistency to convolutional layers. We introduce a novel class of symmetric Convolutional AutoEncoders (CAEs) designed to embody the primary properties of manifold parametrization mappings. When integrated into a ROM framework, this architecture demonstrates significantly improved predictive capabilities. Specifically, we compared the performance of the ROMs based on classical and symmetric CAEs on three one-dimensional academic test cases, namely the linear advection, viscous Burgers, and Kuramoto-Sivashinsky equations. Numerical results demonstrate that our proposed symmetric approach consistently yields more accurate latent trajectories, lower reconstruction errors, and enhanced model robustness.

math.ST 2026-07-02

Approximating region contains full-conformal set in multi-task regression

by Davidson Lova Razafindrakoto (SAMM), Alain Celisse (SAMM) +1 more

Approximate full-conformal multi-task regression with reproducing kernels

The construction yields a computable region guaranteed to contain the exact full-conformal one, with a volume bound when task covariances ar

Multi-task regression aims at jointly solving multiple regression problems, called tasks. Compared to solving each task separately, better performance can be achieved as long as the tasks are sufficiently related. Full-conformal prediction is a framework that formulates a data-dependent prediction-region containing the unknown output-vector at any prescribed confidence level. However, explicit computation of this prediction-region is intractable in general since it requires training infinitely many predictors. The present work focuses on multi-task regression in a Reproducing Kernel Hilbert Space (RKHS) of vector-valued functions. This computational issue is addressed by designing an approximating prediction-region containing the full-conformal one. This construction is carried out in two scenarios: (i) when the inter-task covariance-matrix is known, and (ii) when this matrix is estimated. In terms of volume, the tightness of this approximation is assessed theoretically by means of an upper-bound in the first scenario. It is also shown empirically to improve upon split-conformal prediction on synthetic data in both scenarios.

stat.CO 2026-07-02

MCMC proposals achieve near dimension-independent variance

by P. Dobson, J.M. Sanz-Serna +1 more

Optimal scaling of MCMC algorithms: exploiting the symmetry of the Metropolis-Hastings formula

Symmetry in the acceptance rule lets gradient-based proposals use variance O(1/d^μ) with μ arbitrarily small instead of the MALA rate of 1/3

We present a simple yet general approach to studying the scaling properties of Metropolised MCMC sampling algorithms as the dimensionality increases. The study relies ultimately on the symmetry of the Metropolis-Hastings formula. Our findings contain, as particular cases, many known results for the Random Walk Metropolis, MALA and other algorithms. In addition, they provide, in an easy way, new optimal scaling results for a variety of proposal mechanisms, including implicit proposals and proposals generated with the help of differential equation integrators. The analysis applies to targets that are products of a given, not necessarily univariate distribution, and also to cases where the different terms in the product are scaled differently. We show how to construct gradient-based MALA-like proposals where the variance of the proposal as the dimension $d$ increases may be taken as $O(1/d^\mu)$, with $\mu>0$ arbitrarily small, to be compared with the values $\mu = 1$ for Random Walk Metropolis and $\mu=1/3$ for MALA.
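To make the quantities in this abstract concrete, here is a minimal sketch of a standard MALA step with proposal variance $h = \ell^2/d^\mu$. The standard normal product target, the function names, and the constant `ell` are assumptions chosen for illustration, and the new proposals from the paper are not reproduced. The sketch also exposes one elementary symmetry of the Metropolis-Hastings formula: the log acceptance ratio changes sign when the current and proposed points are swapped. Whether this is exactly the symmetry the authors exploit is not stated in the abstract.

```python
import numpy as np


def log_target(x):
    """Log density (up to a constant) of a standard normal product target (assumed)."""
    return -0.5 * np.dot(x, x)


def grad_log_target(x):
    return -x


def mala_log_q(x_from, x_to, h):
    """Log density, up to a constant, of the MALA proposal x_to given x_from."""
    mean = x_from + 0.5 * h * grad_log_target(x_from)
    return -np.sum((x_to - mean) ** 2) / (2.0 * h)


def log_mh_ratio(x, y, h):
    """Log Metropolis-Hastings ratio for a move from x to y."""
    return log_target(y) + mala_log_q(y, x, h) - log_target(x) - mala_log_q(x, y, h)


def step_variance(d, mu, ell=1.0):
    """Proposal variance ell^2 / d^mu (mu = 1 for RWM, mu = 1/3 for MALA)."""
    return ell ** 2 / d ** mu


def mala_step(x, h, rng):
    """One MALA transition; returns the new state and whether the move was accepted."""
    y = x + 0.5 * h * grad_log_target(x) + np.sqrt(h) * rng.standard_normal(x.shape)
    if np.log(rng.uniform()) < log_mh_ratio(x, y, h):
        return y, True
    return x, False


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 64
    h = step_variance(d, 1.0 / 3.0)
    x = rng.standard_normal(d)
    accepted = 0
    for _ in range(2000):
        x, ok = mala_step(x, h, rng)
        accepted += ok
    print("acceptance rate:", accepted / 2000)
```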

cs.LG 2026-07-02

Active-GRPO raises average SRxSim to 0.1773 by updating references

by Xuefeng Liu, Mingxuan Cao +4 more

Active-GRPO: Adaptive Imitation and Self-Improving Reasoning for Molecular Optimization

The method switches from imitation to self-reinforcement per instance and replaces the reference with better policy candidates to exceed pri

Scientific reasoning is an increasingly important capability of large language models, yet improving the robustness and efficiency of training such reasoning remains a key open challenge. We study this problem in instruction-based molecular optimization, where answer-only supervised fine-tuning (SFT) collapses multi-step reasoning and reinforcement learning with verifiable rewards (RLVR) suffers from sparse feedback. Reference-guided Policy Optimization mitigates both by anchoring policy updates to dataset-provided references, but its effectiveness is tightly coupled to reference quality: weak or misaligned references impose a performance ceiling. To overcome this ceiling, we propose active reasoning, a paradigm in which the policy actively decides, on a per-instance basis, when to imitate a reference and when to reinforce its own discoveries, while continuously upgrading what it imitates. We instantiate this paradigm as Active Group Relative Policy Optimization (Active-GRPO), realized through two coupled mechanisms: active imitate-reinforce and active referencing. The former performs imitation learning when the reference still outperforms the policy's own candidates, and shifts to self-improvement via reinforcement learning once the policy has generated molecules that surpass the reference. The latter continuously upgrades the reference itself by replacing it with the best policy-generated candidate discovered so far, progressively raising the imitation target and ensuring that reference guidance remains informative, rather than restrictive, throughout training. Across TOMG-Bench MOLOPT, Active-GRPO improves average SRxSim from 0.0959 for GRPO and 0.1665 for RePO to 0.1773 under matched three-seed evaluation, with statistically significant gains on LogP, MR, and QED.

cs.LG 2026-07-02

A staged framework uses SEM then OLS then double machine learning to check which survey…

by Ka Ching Chan, Qiana Liu +2 more

From Structural Equation Modelling to Double Machine Learning: Robustness Analysis for Survey-Based Research

Framework tests stability of relationships when moving from structural models to machine learning adjustments on construct scores

Structural equation modelling (SEM) is widely used in survey-based business and information systems research to assess latent constructs and theory-driven structural relationships. However, SEM path significance is obtained within a particular model specification and may not show whether findings remain stable under alternative estimation frameworks. This study develops and demonstrates a staged robustness analysis framework that connects SEM, ordinary least squares (OLS) regression, and Double Machine Learning (DML). SEM is first used to refine the measurement structure and estimate the robustness-baseline SEM model, in which the full theory-specified structural path system is retained for downstream robustness analysis before final structural path evaluation. OLS regression is then applied to SEM-derived construct scores as a transparent regression benchmark. Finally, DML-style residualisation is used to examine whether each tested focal relationship remains stable after flexible machine-learning-based adjustment for observed controls. Learner-sensitivity checks compare Random Forest, Gradient Boosting, and Support Vector Machine learners, and selected reverse-direction diagnostics are used to examine directional sensitivity. The framework is demonstrated using a FinTech Digital Customer Intimacy survey model. The findings identify which relationships are stable across SEM, OLS, and DML-style checks, and which require more cautious interpretation. A reproducible Google Colab workbook and generated result files are publicly available, providing a reusable template that researchers and students can adapt to other survey-based latent-construct studies. The paper contributes a practical robustness workflow and interpretation guide for survey-based researchers seeking to complement SEM with conventional and machine-learning-based robustness checks.
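The DML-style residualisation step can be sketched as a cross-fitted "residual on residual" regression. This is a minimal illustration, not the authors' Colab workbook: it assumes a partially linear model, and it swaps in plain OLS nuisance learners (instead of the Random Forest, Gradient Boosting, and Support Vector Machine learners named in the abstract) so that the example needs only NumPy. The function names and the simulated data are hypothetical.

```python
import numpy as np


def ols_fit_predict(X_train, y_train, X_test):
    """Fit OLS with an intercept on the training fold and predict on the test fold."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_test)), X_test]) @ coef


def dml_partial_linear(y, d, X, n_folds=2, seed=0):
    """Cross-fitted estimate of theta in y = theta * d + g(X) + noise.

    Both the outcome y and the focal predictor d are residualised on the
    controls X using models fitted on the other folds; theta is then the
    slope of the outcome residuals on the predictor residuals.
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    y_res = np.empty(len(y))
    d_res = np.empty(len(y))
    for k in range(n_folds):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        y_res[test] = y[test] - ols_fit_predict(X[train], y[train], X[test])
        d_res[test] = d[test] - ols_fit_predict(X[train], d[train], X[test])
    return (d_res @ y_res) / (d_res @ d_res)


if __name__ == "__main__":
    # Simulated construct scores with a known focal effect of 0.7 (illustrative only).
    rng = np.random.default_rng(1)
    n = 2000
    X = rng.standard_normal((n, 3))
    d = X @ np.array([1.0, 0.5, -0.5]) + rng.standard_normal(n)
    y = 0.7 * d + X @ np.array([0.3, -0.2, 0.1]) + rng.standard_normal(n)
    print("estimated focal effect:", dml_partial_linear(y, d, X))
```

Replacing `ols_fit_predict` with a flexible learner is what turns this into a learner-sensitivity check of the kind the abstract describes.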

cs.LG 2026-07-02

Prototype LMs match dense baselines while attributing data 500x faster

by Dan Ley, Giang Nguyen +2 more

Prototype Language Models

Sparse mixtures of learned prototypes keep accuracy within 2.5 points and localize influence to training neighborhoods.

Knowing which training examples drive outputs is fundamental to auditing, correcting, and understanding language models, yet for modern LLMs this remains expensive, approximate, and largely post-hoc. Standard language models generate tokens through a dense network pathway, causing training data's influence to be distributed across parameters rather than organized along explicit, traceable components. We introduce a prototype language model architecture, Prototypes for Interpretable Sequence Modeling (PRISM), that forms each prediction via a sparse, non-negative mixture of learned prototypes, trained with clustering objectives that anchor each prototype to coherent neighborhoods of training examples. Across architectures from 130M to 1.6B parameters trained on up to 50B tokens, prototype language models either surpass or remain within 2.5 percentage points on average downstream accuracy of matched dense baselines. We show that sparse prototype structure localizes curvature in the loss landscape, yielding a more tractable Hessian and enabling training data attribution that is ~500x faster than post hoc baselines when consuming equivalent memory. Calibrating linear prototype controllers can improve downstream accuracy by roughly 3 points while tracing those corrections back to training neighborhoods, and targeted prototype suppression can remove model behaviors without finetuning or measurable loss in generation quality.

cs.LG 2026-07-02

Linear transformers map context distributions to responses at dim-free rates

by Peilin Liu, Ding-Xuan Zhou

Ghost in the Kernel: In-Context Learning with Efficient Transformers via Domain Generalization

Domain generalization analysis shows how linear attention achieves in-context learning and informs activation choices for large-model linear

Transformer-based large models have demonstrated remarkable generalization abilities across different tasks by leveraging a context-aware attention module for in-context learning. With richer context, transformers adapt more effectively to the current use case without any parameter updates. However, the quadratic computational and memory complexity with respect to context length significantly slows data processing in softmax transformers. Linear transformers were proposed to address this issue by reducing the complexity to linear dependence on context length, but the design and understanding of the feature mapping in linear attention, from a theoretical viewpoint, remain unclear. In this paper, we investigate the approximation and generalization abilities of linear transformers under a two-staged sampling process from domain generalization. We show that linear transformers perform in-context learning as learning a mapping from context distributions to response functions. A dimension-independent convergence rate is obtained for our generalization analysis, which also exhibits the tradeoff between the regularities of data distributions and latent features. Guided by our theoretical framework, we propose a new perspective on activation and loss design for linearizing pretrained softmax large language models.

stat.ML 2026-07-02

Neural network recovers time-varying AR coefficients for forecasts

by Agnieszka Kopeć, Paweł Przybyłowicz +1 more

Neural Network-Based Estimation of Time-Dependent Parameters in AR(p) Processes

The method keeps an explicit parametric structure while estimating changing coefficients and producing intervals under two noise distributio

We investigate a forecasting framework based on a simple discrete-time dynamic model with coefficients varying in time. The parameters of the model are recovered within a deep learning framework, which makes it possible to retain a transparent parametric structure while simultaneously accounting for complex and nonstationary patterns in the observed phenomenon. Our analysis covers two specifications of the noise process. Besides the standard Gaussian setting, we also consider Laplace-distributed noise, which can offer a more adequate description in the presence of heavier tails and sharper local fluctuations. For both cases, we formulate the predictive scheme of the model and analyze the associated uncertainty quantification, including the construction of prediction intervals. The results illustrate that a relatively simple model, when combined with time-dependent parameter estimation, can serve as a mathematically tractable and practically flexible tool for forecasting complex dynamics under different noise assumptions. The general model is stated for TVAR($p$), while the prediction-interval formulas and the numerical experiments are developed for the TVAR(1) case.
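For the TVAR(1) case, a one-step prediction interval under each noise assumption can be sketched as follows. The model form $x_t = a_t x_{t-1} + \varepsilon_t$, the function name, and the treatment of the coefficient `a_t` and noise scale as known inputs (for example, outputs of the fitted network) are assumptions for this illustration. These are the textbook quantile-based intervals, which ignore estimation uncertainty and need not match the formulas derived in the paper.

```python
import math
from statistics import NormalDist


def tvar1_interval(x_prev, a_t, scale, level=0.95, noise="gaussian"):
    """Symmetric one-step prediction interval for x_t = a_t * x_prev + noise.

    `scale` is the standard deviation for Gaussian noise and the scale
    parameter b for Laplace(0, b) noise.
    """
    center = a_t * x_prev
    alpha = 1.0 - level
    if noise == "gaussian":
        half_width = NormalDist().inv_cdf(1.0 - alpha / 2.0) * scale
    elif noise == "laplace":
        # P(|eps| > q) = exp(-q / b) = alpha  =>  q = b * log(1 / alpha)
        half_width = scale * math.log(1.0 / alpha)
    else:
        raise ValueError("noise must be 'gaussian' or 'laplace'")
    return center - half_width, center + half_width


if __name__ == "__main__":
    print(tvar1_interval(2.0, 0.5, 1.0, noise="gaussian"))
    print(tvar1_interval(2.0, 0.5, 1.0, noise="laplace"))
```

At the same scale parameter the Laplace interval is wider, which reflects the heavier tails mentioned in the abstract.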

stat.ME 2026-07-02

Distributed estimator reaches two-phase minimax rates for unidentifiable prediction

by Erbo Li, Zhaojun Hu +3 more

Distributed Prediction under Heterogeneity with Unidentifiable Parameter

Trace-similarity penalty and invex relaxation deliver model-free bounds with lower communication cost.

Predicting a response based on covariates is a fundamental problem in statistics and machine learning. However, profound difficulties arise when the underlying low-dimensional structural parameters are unidentifiable, as typified in dimension reduction contexts. Specifically, estimating these non-identifiable parameters inherently introduces severe nonconvexity. In distributed settings, this difficulty is further compounded by the challenges of data heterogeneity and communication cost. To overcome these intertwined barriers, we propose a novel distributed semiparametric framework. We formulate an adaptive homogeneity pursuit utilizing a trace-similarity penalty to effectively address data heterogeneity. To resolve the ensuing severe nonconvexity and communication bottlenecks, we introduce an invex relaxation technique coupled with a multi-step local update algorithm, ensuring stable convergence to global optimality with significantly reduced communication overhead. Theoretically, we establish a non-asymptotic model-free prediction error bound and prove that our estimator achieves a two-phase minimax optimal convergence rate and a sharper model-free prediction error bound. Furthermore, we provide theoretical guarantees for algorithmic convergence and communication efficiency. Extensive simulations and a real-world multi-center medical application validate the superiority of our method.

stat.ME 2026-07-02

Intervals achieve nominal coverage for risk difference in paired data

by Jia Zhou, Chang-Xing Ma

Confidence Intervals for the Risk Difference in Combined Unilateral and Bilateral Data Incorporating a Distribution-Based Approach

The distribution-based method matches existing widths while reflecting skewness that asymptotic approaches miss in small samples.

Combined unilateral and bilateral binary outcomes frequently arise in studies involving paired organs. The risk difference is a clinically interpretable measure for comparing treatment effects between groups. Existing confidence interval methods are primarily based on asymptotic normality and may fail to adequately reflect finite-sample distributional features, particularly skewness. To address this issue, we propose a distribution-based confidence interval derived from the probability distribution of the risk difference estimator and a modified MOVER procedure that accounts for intra-subject correlation. Their performances are compared with those of commonly used asymptotic methods through extensive simulation studies. Across a broad range of parameter settings, all methods exhibited satisfactory performance as sample size increased. The proposed distribution-based interval achieved coverage probabilities close to the nominal level with interval widths comparable to those of existing procedures. In small sample settings, it was able to capture skewness in the sampling distribution that was not reflected by methods relying on asymptotic normality. Analyses of two real-world datasets demonstrated the practical applicability of the competing methods and yielded consistent inferential conclusions. The proposed approach provides an alternative framework for interval estimation of the risk difference in studies involving combined unilateral and bilateral binary outcomes.

stat.ME 2026-07-02

Selective borrowing lets hybrid trials use external data safely

by Ke Zhu, Hairong Huang +2 more

Robust Estimation and Inference with Selective Borrowing in Hybrid Controlled Trials: A Tutorial with SelectiveIntegrative and intFRT

Tutorial workflow covers eligibility alignment, matching and selective strategies with R packages to improve efficiency while keeping infere

Hybrid controlled trials (HCTs) augment randomized controlled trials (RCTs) with external controls (ECs) to improve statistical efficiency when RCTs face limited sample sizes, slow accrual, or ethical constraints. However, valid use of ECs requires careful adjustment for covariate shift and outcome drift, as inappropriate borrowing may introduce bias and compromise inference. This tutorial provides a practical workflow for estimation and inference in HCTs. We first present a statistical analysis roadmap covering estimands, identification assumptions, eligibility alignment, matching, full and selective borrowing strategies, and both asymptotic inference and randomization tests. We then demonstrate step-by-step implementation using the SelectiveIntegrative and intFRT packages. The workflow is illustrated using a synthetic lung cancer dataset included in the intFRT package that mimics the CALGB 9633 trial and ECs from the National Cancer Database. The tutorial aims to help applied statisticians conduct transparent, interpretable, and reproducible HCT analyses that improve efficiency while maintaining valid inference.

stat.AP 2026-07-02

Reverse-martingale RNN matches forecast skill while warning of drought ahead of SPI-3

by Hui-Mean Foo, Yuan-chin Ivan Chang

Coupling Precipitation Forecasting and Early Warning with Reverse-Martingale Recurrent Neural Networks

The reconstruction defect from the backward-coherence penalty acts as a change detector that precedes the standard index in several climates

Precipitation forecasts are judged by accuracy, but the decisions they support -- when to restrict water, when to warn of drought -- turn on noticing when a local regime is becoming abnormal, which forecast scores alone do not reveal. We ask whether one recurrent model can do both with little or no loss in forecast skill. We add a backward-coherence (reverse-martingale) penalty that keeps the network's hidden state smooth when read backward in time; the size of the resulting reconstruction defect becomes an online warning signal, monitored by a sequential change-point detector. The design is deliberately conservative. On real daily station data from four contrasting climates -- monsoonal Taiwan, semi-arid Texas, temperate Germany, and Mediterranean Anatolia (Turkey) -- the model matches a standard network's forecast skill everywhere, and makes the hidden state markedly steadier in every region. The novelty is the added information: on these real droughts the signal can alarm well ahead of the operational SPI-3 index, giving lead that neither the forecast nor the index provides. This benefit is not uniform across the four regions -- large in one, partial in two others, and near-absent in the fourth. We offer the hydroclimatic character of drought onset, whether it precedes or merely coincides with the rainfall deficit, as a plausible explanation to be tested in future work, supported by a controlled synthetic study with known onset times. The contribution is thus a new and conservative way to read precipitation records: no loss in forecast skill, a steadier model, and an early-warning signal beyond the standard index.

math.ST 2026-07-02

Three-stage estimator consistent for hybrid Lévy switching SDEs

by Yuzhong Cheng

Ergodicity and High-Frequency Inference for Hybrid Switching Lévy-Driven Stochastic Differential Equations

Joint normality couples drift and scale via third moment of Lévy noise while switching rates stay uncorrelated

Hybrid switching Lévy-driven stochastic differential equations with pure-jump noise and state-dependent switching rates are studied under high-frequency observation. A three-stage inference procedure is proposed for the drift, scale, and switching-rate parameters, combining a staged Gaussian quasi-likelihood with an intensity-type contrast. Checkable sufficient conditions for weighted exponential ergodicity are established for the hybrid process; the proof does not rely on Brownian smoothing, but uses a fixed skeleton-chain argument combining small-jump accessibility and regime connectivity. Under ergodicity and the high-frequency sampling scheme, consistency, joint asymptotic normality, and a polynomial-type large deviation inequality are proved for the full estimator. The joint limit exhibits a transparent covariance structure: the drift and scale blocks are coupled through the third moment of the driving Lévy noise, whereas the switching-rate block is asymptotically uncorrelated with the continuous-coefficient blocks. Numerical experiments for models driven by normal inverse Gaussian noise illustrate the finite-sample behavior of the proposed estimators.

stat.ML 2026-07-02

FNOs achieve polynomial sample complexity on dissipative PDE operators

by Nisha Chandramoorthy, Daniel Sanz-Alonso +1 more

From Spectral Methods to Sample Complexity Bounds for Fourier Neural Operators

Bounds hold uniformly over families of equations when operators admit spectral discretizations, with rates set by smoothness and dimension.

We establish approximation and learning guarantees for Fourier neural operators (FNOs) applied to time-$T$ solution operators of dissipative evolution equations. The analysis builds on the premise that FNOs can efficiently approximate and learn solution operators whenever these operators admit stable and accurate spectral discretizations. To formalize this idea, we introduce classes of evolution operators defined through spectral methods and derive FNO approximation bounds and polynomial sample complexity guarantees for these classes. For equations with polynomial nonlinearities, the learning rates depend primarily on the smoothness of the input space and the dimension of the physical domain. Our results hold uniformly over broad families of dissipative equations, rather than for a single fixed PDE, and apply in particular to the Navier--Stokes, Allen--Cahn, and Cahn--Hilliard equations. For equations with non-polynomial smooth nonlinearities, we prove that polynomial sample complexity still holds with rates that now additionally depend on the smoothness of the nonlinear terms and the dissipation strength. Overall, we connect classical spectral approximation theory with modern operator learning and explain when FNOs can learn nonlinear evolution operators efficiently.

stat.AP 2026-07-02

Economic variables predict suicide rates in western counties

by Noah Jackson, Sergey Lapin

Economic Disparities and Their Relationship to Destructive Health Behaviors in Five Western U.S. States

LASSO and correlation analysis on five-state data rank predictors and highlight key links to adverse health outcomes

In this paper, we examine how economic variables relate to adverse health outcomes in the western counties of Washington, Idaho, Oregon, California, and Nevada, with specific emphasis on how suicide rate relates to these economic variables. Data were first gathered from the Census and County Health Rankings for the entire United States (for use on a website and in future research), then cleaned and regression-imputed. We then applied several exploratory data analysis methods: PCA, clustering, correlation analysis, linear fits, and LASSO. PCA and clustering suggested that counties may group according to broader state-level economic patterns, although political interpretations would require additional electoral data. Correlation analysis, together with LASSO and linear fits, identified the destructive variables most strongly associated with economic variables (in terms of $R^2$ and correlation values), the economic variables that are most and least important in predicting suicide rate, and the possible relationships between suicide rate and these economic variables.

econ.EM 2026-07-02

Valid intervals for network densities survive group selection from data

by Eric Auerbach, Jonathan Auerbach +1 more

Post-selection inference for network structure

Two methods ensure coverage when communities or markets are identified using the observed connections themselves.

Researchers often use the density of connections between groups of agents, such as communities, blocs, or markets, to characterize the structure of a social or economic network. In many cases, these groups are selected using the network data, making conventional fixed-group inference procedures potentially invalid. To address this issue, we develop two new confidence intervals that are universally valid post-selection in the sense that they guarantee simultaneous coverage asymptotically over all pairs of groups whose relative sizes do not vanish. Our first interval builds on a strategy of Berk et al. (2013). Our second interval is based on a Talagrand-type concentration inequality for empirical processes. Both intervals are simple to compute and scalable to large networks, but a key technical contribution of our paper is to show that only the second interval achieves the best-possible width asymptotically up to a constant factor. Three empirical illustrations show that accounting for selection can matter in practice. Some evidence for homophily in a social network and a hub-and-spoke structure in a trade network survives our correction, while evidence for disjoint market segments in a worker transition network does not.
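
To make the target quantity concrete, the sketch below computes the plug-in density of connections between two fixed, disjoint groups in an undirected network: the number of observed cross-group edges divided by the number of possible cross-group pairs. This is only the point estimate for groups fixed in advance. It does not implement either of the paper's post-selection intervals. The function name `block_density` and the toy edge list are hypothetical.

```python
def block_density(edges, group_a, group_b):
    """Fraction of possible cross-group pairs that are connected.

    edges   : iterable of (u, v) pairs for an undirected network
    group_a : collection of node labels
    group_b : collection of node labels, assumed disjoint from group_a
    """
    a, b = set(group_a), set(group_b)
    if a & b:
        raise ValueError("groups are assumed to be disjoint")
    if not a or not b:
        raise ValueError("both groups must be non-empty")
    # Store each cross-group edge as an unordered pair so duplicates count once.
    cross = {
        frozenset((u, v))
        for u, v in edges
        if (u in a and v in b) or (u in b and v in a)
    }
    return len(cross) / (len(a) * len(b))


# Toy network: two of the four possible pairs between {0, 1} and {2, 3} are linked.
edges = [(0, 1), (0, 2), (1, 3), (2, 3), (4, 5)]
density = block_density(edges, group_a={0, 1}, group_b={2, 3})
print(density)
```

When the groups are themselves chosen by looking at `edges`, a naive interval around this estimate ignores the selection step, which is the problem the abstract addresses.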
