A scalable tree boosting system
Mixed citation behavior. Most common role is method (54%).
citing papers explorer
-
A multimodal dataset of photoplethysmography and continuous behavioral responses to ASMR and nature videos
Introduces REST-ASMR multimodal dataset of PPG, stimuli, and continuous annotations for ASMR research, validated with 97% responder rate, significant agreement, PPG deceleration, and BiLSTM achieving 75.51% frame-level accuracy under strict subject-video independent 4-fold CV.
-
FLOATBench: A Dataset and Benchmark for Floating Offshore Wind Turbine Tower Fatigue
FLOATBench is a tabular benchmark dataset with 582,120 fatigue labels from 19,404 OpenFAST simulations of three 22 MW FOWT towers, featuring alpha-shape regime partitioning and three evaluation protocols for surrogate models.
-
EnergyAgentBench: Benchmarking LLM Agents on Live Energy Infrastructure Data
EnergyAgentBench is a new benchmark with 70 task variants that evaluates LLM agents on live energy data for datacenter siting, long-horizon optimization, and causal grid diagnosis.
-
STRABLE: Benchmarking Tabular Machine Learning with Strings
A new corpus of 108 mixed string-numeric tables shows that advanced tabular learners with basic string embeddings perform well on most real-world data, while large LLM encoders help on free-text heavy tables.
-
RAVEN: A Regime-Aware Variable-context Expert Network for Financial Time Series Forecasting
RAVEN proposes a regime-aware MoE architecture with cumulative importance thresholding and correlation-aware weighting to adaptively select temporal context for non-stationary financial forecasting.
-
Self-Stigma Is Not a Monolith, but Generic Empathy Is: Persona-Conditioned LLM Support for People Who Use Drugs
Four self-stigma personas identified via LPA on 1,174 Reddit users; persona-conditioned LLMs achieve targeted shifts but experts prefer generic empathy baselines.
-
The Role of Domain-Specific Features in Malware Detection: A macOS Case Study
Domain-specific macOS features enable an ML detector to reach 98.5% accuracy on 41k samples and 99.5% on 9k fresh samples, beating prior methods by 16-50%.
-
A Systematic Evaluation of Molecular Mixture Behavior Prediction
Strong absolute accuracy on mixture properties often masks poor recovery of non-ideal behavior, with large drops under strict molecule splits, making transfer to unseen molecules the central challenge.
-
Learning Causal Orderings for In-Context Tabular Prediction
TabOrder learns unsupervised causal variable orderings and enforces them with order-constrained attention for tabular prediction and imputation under distribution shifts.
-
TIDAL: Recovering Temporal Phase for Cloud Block Storage Placement from LLM-Derived Semantics
TIDAL recovers temporal phase signals from LLM-derived semantics of provisioning metadata to enable complementary CVD placement, reducing overload frequency by 79.1% on production traces.
-
TabPFN-MT: A Natively Multitask In-Context Learner for Tabular Data
TabPFN-MT is a multitask in-context learner for tabular data that sets a new state-of-the-art on deep multitask learning for datasets under 1000 samples while reducing inference cost from O(T) to O(1) passes.
-
Probabilistic Seasonal Streamflow Forecasting Across California's Sierra Nevada Watersheds with Agentic AI
An agentic AI workflow evolves an adaptive XGBoost quantile regression ensemble that reduces watershed-averaged forecast error by up to 29% versus California's operational forecasts for April-July runoff at 1-6 month leads across 23 Sierra Nevada sites.
-
Quantifying Sensitivity for Tree Ensembles: A symbolic and compositional approach
A compositional algebraic decision diagram algorithm quantifies sensitivity in decision tree ensembles with certified error and confidence bounds, outperforming model counters on benchmarks.
-
GHGbench: A Unified Multi-Entity, Multi-Task Benchmark for Carbon Emission Prediction
GHGbench supplies a harmonized dataset and multi-task benchmark for company and building carbon emission prediction, with baselines showing large OOD gaps and benefits from multimodal embeddings.
-
V4FinBench: Benchmarking Tabular Foundation Models, LLMs, and Standard Methods on Corporate Bankruptcy Prediction
V4FinBench is a new million-record benchmark where imbalance-aware finetuned TabPFN matches or beats gradient boosting on long-horizon bankruptcy prediction while Llama-3-8B lags, with evidence of transferable patterns to US data.
-
MulTaBench: Benchmarking Multimodal Tabular Learning with Text and Image
MulTaBench is a new collection of 40 image-tabular and text-tabular datasets designed to test target-aware representation tuning in multimodal tabular models.
-
TFM-Retouche: A Lightweight Input-Space Adapter for Tabular Foundation Models
TFM-Retouche is an architecture-agnostic input-space residual adapter that improves tabular foundation model accuracy on 51 datasets by learning input corrections through the frozen backbone, with an identity guard to fall back to the original model.
-
Beyond ECE: Calibrated Size Ratio, Risk Assessment, and Confidence-Weighted Metrics
Introduces Calibrated Size Ratio (CSR) and confidence-weighted metrics to better detect overconfidence risk and calibration issues beyond the limitations of ECE.
-
Probabilistic Spectral Reconstruction of Trans-Neptunian Objects from Sparse Photometry: A Framework for Taxonomy, Survey Optimization, and Outlier Detection
Probabilistic PCA latent-space model with Bayesian inference reconstructs TNO near-IR spectra from photometry, achieving 95% credible-interval coverage and supporting taxonomy plus survey optimization.
-
Lund Plane to Bloch (LP2B) Encoding for Object and Polarization Tagging with Quantum Jet Substructure
LP2B encoding converts Lund plane jet representations into Bloch sphere qubit states, enabling a QTTN that matches classical LundNet performance on polarization tagging and W/top tagging with three orders of magnitude fewer parameters and improved low-data regime results.
-
Measurements of electroweak production of a photon in association with two jets in proton-proton collisions at $\sqrt{s}$ = 13 TeV
First observation, at greater than 5 sigma significance, of electroweak photon plus two jets production; the measured cross section of 202 fb is consistent with the standard model prediction of 177 fb.
-
Search for light pseudoscalar bosons, pair-produced in Higgs boson decays in the four-electron final state in proton-proton collisions at $\sqrt{s}$ = 13 TeV
No excess observed; first LHC search sets 95% CL upper limits on H to AA to 4e branching fraction down to $10^{-5}$ for 10-100 MeV masses and short lifetimes.
-
FeDa4Fair: Client-Level Federated Datasets for Fairness Evaluation
FeDa4Fair is a new library and benchmark for creating federated datasets with heterogeneous client-level biases to standardize evaluation of fairness methods in federated learning.
-
Probabilistic Low-Voltage Peak Load Forecasting with Time Series Foundation Models Evaluated on Application-Oriented Metrics
Compares foundation models for probabilistic low-voltage load forecasting on 200 real feeders and introduces a grid-planning metric that scores peak prediction by its effect on asset cost-risk decisions.
-
A Stationary-Distribution Theory for Triplet-Based Plateau Search in Random Forest Ensemble-Size Selection
Derives the stationary distribution and asymptotic scaling $O(\varepsilon^{-2})$ for ensemble size in a Markov chain model of triplet-based plateau tuning for random forests.
-
TAVR-VLM: Risk-Conditioned Causal Grounding for Hallucination-Resistant Report Generation
TAVR-VLM introduces Risk-Conditioned Causal Grounding Attention to achieve SOTA AUROC 0.896, CIDEr 0.936, and 8.1% hallucination rate on a 1,482-patient TAVR cohort.
-
A welding penetration prediction model for laser welding process based on self-supervised learning using physics-informed neural networks
SimPhysNet achieves 96.06% accuracy classifying laser welding penetration states using self-supervised contrastive learning with a physics-informed neural network and prototypical networks on only 200 labeled images.
-
PRecover 1.0: Process Rate Recovery with Machine Learning
Machine learning models recover most warm-rain and ice microphysical process rates from standard ICON model outputs for accumulation intervals of 10 minutes or less using a two-step classification-regression approach with calibrated uncertainty.
-
The Chandra-Gaia Catalog of Counterparts: Resolving ambiguous Gaia matches to X-ray sources in the Chandra Source Catalog using Machine Learning
A LightGBM classifier trained on NWAY Bayesian matches identifies true Chandra-Gaia counterparts for 113k X-ray sources, flags 7k ambiguous cases, and attributes half of 20k separation-only matches to chance coincidences, validated at 95% on COUP without positional features.
-
FlowCLIP: Contrastive Pretraining Using Domain Names for Encrypted Traffic Classification
FlowCLIP applies contrastive pretraining with domain-name text supervision to learn transferable representations from QUIC traffic side-channel features, matching supervised performance on time-split evaluation.
-
When is Your LLM Steerable?
Early hidden state features from the first few tokens allow a GBDT classifier to predict activation steering success, under-steering, or over-steering with 0.7 macro-F1 on unseen concepts.
-
CRUMB: Efficient Prior Fitted Network Inference via Distributionally Matched Context Batching
CRUMB speeds up PFN inference on large tabular datasets by clustering queries and selecting MMD-matched context subsets, outperforming prior selection methods on the 51-dataset TabArena benchmark across three architectures while handling covariate drift.
-
SwiftCTS: Fast Cross-Design Prediction and Pareto Optimization of Clock Tree Metrics via Few-Shot Calibration
SwiftCTS combines physics-informed gradient-boosted models with K-shot multiplicative calibration to enable fast, low-error prediction and Pareto optimization of clock tree metrics on unseen macro architectures without retraining.
-
Beyond Explaining Predictions: Logic-Based Explanations for Confidence in Machine Learning Models
Defines MCT as the weakest confidence an abductive explanation can guarantee and proposes an optimization-based algorithm to generate minimal explanations meeting a target confidence threshold for boosted tree classifiers.
-
GIFT: LLM-Guided State-Reward Interface for Financial Reinforcement Learning
GIFT uses LLMs for factor-guided state enhancement, risk-rule reward shaping, and diagnostic refinement in PPO financial RL, then fixes the interface to improve out-of-sample risk-adjusted performance.
-
Differentially Private Synthetic Data via APIs 4: Tabular Data
Tab-PE extends Private Evolution to tabular data with heuristic operators, outperforming AIM by up to 10% classification accuracy and 28x speed on high-order correlation datasets under differential privacy.
-
Reactivity-Informed Machine Learning for Performance Prediction and Design Space Exploration of Alkali-Activated Slag
Machine learning on the largest curated alkali-activated slag dataset shows that average metal oxide dissociation energy serves as a compact, physically interpretable reactivity descriptor enabling strength prediction and low-emission design space exploration.
-
Towards Unified and Data-Efficient Prognostics and Health Management with Tabular Foundation Models
Tabular foundation models applied to PHM via signal-to-table conversion achieve the best average ranks across prognostic and diagnostic tasks and remain competitive in low-data regimes.
-
OpenRFM: Dissecting Relational In-Context Learning
OpenRFM combines a relational transformer backbone with a batch-level ICL layer and homophily-aware synthetic-plus-real pre-training to improve relational in-context learning by ~30% over prior open models and surpass KumoRFMv1.
-
Traj-Evolve: A Self-Evolving Multi-Agent System for Patient Trajectory Modeling in Lung Cancer Early Detection
Traj-Evolve combines non-parametric experience retrieval and multi-agent RL with a leave-one-out unification strategy to outperform baselines on lung cancer prediction from up to five years of multimodal EHRs, including in never-smokers.
-
FlagGAM: Rule-Basis Generalized Additive Models for Explainable Tabular Prediction
FlagGAM builds sparse univariate rule bases from features and feeds them into a restricted additive model, achieving competitive accuracy with superior robustness to missingness and noise on tabular benchmarks.
-
Influence-Guided Symbolic Regression: Scientific Discovery via LLM-Driven Equation Search with Granular Feedback
IGSR pairs LLM term generation with marginal influence scoring inside MCTS to discover symbolic equations, reporting gains on benchmarks and a novel DNA-methylation / RNA-Pol-II-pausing link in genomic data that wet-lab work later supported.
-
AutoScientists: Self-Organizing Agent Teams for Long-Running Scientific Experimentation
Decentralized AI agent teams self-organize around hypotheses, critique proposals, and share knowledge to outperform single-agent baselines on biomedical ML, language-model optimization, and protein fitness tasks.
-
Natural Language Query to Configuration for Retrieval Agents
BRANE maps queries to optimal retrieval pipeline configurations using LLM-derived features and per-configuration correctness predictors, improving the cost-quality Pareto frontier on three benchmarks.
-
Geometry-Aware Tabular Diffusion
GATD adds explicit geometric relational supervision to tabular diffusion, achieving SOTA benchmark wins with substantially fewer parameters across ten datasets.
-
Proxy-Based Approximation of Shapley and Banzhaf Interactions
ProxySHAP approximates higher-order Shapley and Banzhaf interactions via tree proxies plus residual correction and a polynomial-time interventional TreeSHAP generalization for tree ensembles.
-
Spectra as Language: Large Language Models for Scalable Stellar Parameter and Abundance Inference
Two-stage LLM framework infers stellar parameters and ~20 elemental abundances from spectra, showing performance gains with increasing data volume.
-
Set-Valued Policy Learning
The paper develops set-valued policies and conformal policy learning methods that output treatment sets with marginal coverage guarantees for robust decision-making under uncertainty.
-
Nonparametric inference for sublevel-set probabilities of conditional average treatment effect functions
Develops Grenander-type and debiased machine learning estimators for the sublevel-set probability curve of the CATE function, shown to be non-pathwise differentiable, along with its piecewise linear approximation.
-
Semantic Feature Segmentation for Interpretable Predictive Maintenance in Complex Systems
Semantic segmentation decomposes monitoring features into canonical and residual components that concentrate fault-predictive information while preserving operational meaning in predictive maintenance.