Predictive Conformal Slip Monitoring: An Empirical Evaluation of Rolling Split Conformal Prediction for Pre-Incident Traction Loss Detection

Varshith Roy Kotla

arxiv: 2607.02124 · v1 · pith:BHIUFXA4new · submitted 2026-07-02 · 💻 cs.LG · stat.AP

Pith reviewed 2026-07-03 17:14 UTC · model grok-4.3

keywords: conformal prediction, traction loss, slip monitoring, racing telemetry, negative result, exchangeability, random forest

The pith

Rolling split conformal prediction on slip residuals yields zero precision and recall for pre-incident traction loss detection.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether monitoring volatility in non-conformity residuals from per-driver random forest models of expected slip can flag traction loss before it occurs. Evaluation on 19 drivers and 55,563 telemetry samples against 14 real incidents produces mean precision and recall of essentially zero while labeling 15.3 percent of samples as anomalous. A simpler static percentile threshold matches this performance. The exchangeability assumption required by split conformal prediction is violated in every driver's time series.

Core claim

Across 19 drivers and 55,563 test-phase telemetry samples the rolling-volatility detector achieves mean precision of essentially 0.0 and mean recall of 0.0 against 14 ground-truth incidents while flagging on average 15.3 percent of all samples as anomalous. A static 95th-percentile threshold baseline performs no better. Residual autocorrelation diagnostics show the split-conformal exchangeability assumption is violated for every driver.
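To make these numbers concrete, the sketch below scores a binary alarm series against incident positions. It assumes a sample-level convention in which an alarm counts as a true positive only if it falls inside a fixed lead window before an incident. The paper's actual matching rule is not given in this summary, so `score_alarms`, `lead_window`, and the toy data are hypothetical.

```python
from typing import Sequence, Tuple


def score_alarms(
    alarms: Sequence[bool], incident_idx: Sequence[int], lead_window: int
) -> Tuple[float, float, float]:
    """Return (precision, recall, flag_rate) for a binary alarm series.

    Assumed convention (not taken from the paper): an alarm at sample t is a
    true positive if some incident i satisfies i - lead_window <= t < i, and
    an incident is detected if at least one alarm lands in its window.
    """
    n = len(alarms)
    flagged = [t for t in range(n) if alarms[t]]
    true_pos = sum(
        1 for t in flagged if any(i - lead_window <= t < i for i in incident_idx)
    )
    detected = sum(
        1
        for i in incident_idx
        if any(alarms[t] for t in range(max(0, i - lead_window), min(i, n)))
    )
    precision = true_pos / len(flagged) if flagged else 0.0
    recall = detected / len(incident_idx) if incident_idx else 0.0
    flag_rate = len(flagged) / n if n else 0.0
    return precision, recall, flag_rate


# Toy series: 20 samples, one incident at index 15, three alarms that fire too early.
alarms = [t in (1, 2, 3) for t in range(20)]
precision, recall, flag_rate = score_alarms(alarms, incident_idx=[15], lead_window=5)
```

In this toy case 15% of samples are flagged while precision and recall are both zero, which is the same qualitative pattern as the reported result: many alarms, none of them near an incident.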

What carries the argument

Rolling split conformal prediction that tracks volatility of non-conformity residuals from a per-driver random forest model of expected slip behavior.
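A minimal sketch of one way such a detector could be put together, under stated assumptions. The summary does not give the paper's window length, non-conformity score, or volatility aggregation, so the choices here (absolute residual as the score, trailing population standard deviation as the volatility, a split-conformal quantile of calibration volatilities as the threshold) are illustrative. Predictions would come from the per-driver random forest; here they are passed in as plain lists, and all function names are hypothetical.

```python
import math
from statistics import pstdev
from typing import List, Sequence


def rolling_volatility(scores: Sequence[float], window: int) -> List[float]:
    """Population std of the trailing `window` scores, once the window is full."""
    return [
        pstdev(scores[t - window + 1 : t + 1]) for t in range(window - 1, len(scores))
    ]


def conformal_quantile(cal_values: Sequence[float], alpha: float) -> float:
    """Split-conformal quantile: the ceil((n + 1) * (1 - alpha))-th smallest value."""
    n = len(cal_values)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(cal_values)[k - 1]


def volatility_alarms(y_cal, yhat_cal, y_test, yhat_test, window=5, alpha=0.05):
    """Flag test windows whose residual volatility exceeds the calibration quantile."""
    cal_scores = [abs(y - p) for y, p in zip(y_cal, yhat_cal)]
    test_scores = [abs(y - p) for y, p in zip(y_test, yhat_test)]
    threshold = conformal_quantile(rolling_volatility(cal_scores, window), alpha)
    return [v > threshold for v in rolling_volatility(test_scores, window)]


# Toy data: constant-magnitude calibration residuals, one spike in the test series.
y_cal = [0.1, -0.1] * 30
y_test = [0.1] * 10 + [0.9] + [0.1] * 4
flags = volatility_alarms(y_cal, [0.0] * 60, y_test, [0.0] * 15)
```

The first alarm corresponds to the window ending at test index 4, so the returned list is shorter than the test series by `window - 1`. The quantile step is where exchangeability matters: the threshold is only calibrated if calibration and test volatilities can be treated as interchangeable draws.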

If this is right

The conformal-volatility formulation adds no value over a static threshold for early-warning use.
High false-alarm rates render the detector unsuitable for pre-incident intervention.
Violation of exchangeability in every driver is a plausible cause of the observed false-alarm rate.
Correcting the slip proxy and using all laps rather than only fastest laps still produces a negative result.
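For comparison, a static percentile baseline can be sketched in a few lines. The summary does not say which quantity the paper thresholds or which percentile convention it uses, so this version assumes absolute residual scores and the nearest-rank percentile; `static_percentile_alarms` is a hypothetical name.

```python
import math
from typing import List, Sequence


def static_percentile_alarms(
    cal_scores: Sequence[float], test_scores: Sequence[float], pct: float = 95.0
) -> List[bool]:
    """Flag test scores above the nearest-rank `pct` percentile of calibration scores."""
    ordered = sorted(cal_scores)
    rank = max(1, math.ceil(pct * len(ordered) / 100.0))  # nearest-rank, 1-indexed
    threshold = ordered[rank - 1]
    return [s > threshold for s in test_scores]


cal = [i / 100 for i in range(1, 101)]  # 0.01, 0.02, ..., 1.00
flags = static_percentile_alarms(cal, [0.50, 0.95, 0.96, 1.20])
```

The threshold is fixed once from calibration data and involves no windowing, which is why a detector that cannot beat it gains nothing from the extra machinery.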

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Methods that explicitly model temporal dependence would be needed before conformal residuals could serve as reliable early signals in sequential telemetry.
The same rolling-volatility construction might succeed in domains where exchangeability holds more closely than in vehicle dynamics.
Alternative non-conformity scores or windowing schemes could be tested on the same labeled incidents to isolate the role of the exchangeability violation.
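One candidate for such a follow-up is adaptive conformal inference (Gibbs and Candès, 2021, listed among the paper's references), which adjusts the miscoverage level online instead of relying on exchangeability. The sketch below is not something the paper evaluates. It shows only the update rule, alpha_{t+1} = alpha_t + gamma * (alpha - err_t), applied to a stream of non-conformity scores, and it simplifies by using all earlier scores as the calibration set. The function names and the values of `warmup` and `gamma` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def empirical_quantile(values: Sequence[float], level: float) -> float:
    """Nearest-rank quantile; +inf if level >= 1, -inf if level <= 0."""
    if level >= 1.0:
        return math.inf
    if level <= 0.0:
        return -math.inf
    ordered = sorted(values)
    return ordered[max(1, math.ceil(level * len(ordered))) - 1]


def adaptive_conformal(
    scores: Sequence[float], warmup: int, alpha: float = 0.1, gamma: float = 0.05
) -> Tuple[List[int], List[float]]:
    """Online update alpha_{t+1} = alpha_t + gamma * (alpha - err_t).

    err_t is 1 when the score at time t exceeds the current quantile of all
    earlier scores, else 0. Returns the error indicators and the alpha path.
    """
    alpha_t = alpha
    errors: List[int] = []
    alphas: List[float] = []
    for t in range(warmup, len(scores)):
        q = empirical_quantile(scores[:t], 1.0 - alpha_t)
        err = 1 if scores[t] > q else 0
        errors.append(err)
        alphas.append(alpha_t)
        alpha_t += gamma * (alpha - err)
    return errors, alphas


# Toy stream: ten quiet scores, one spike, then a quiet score again.
errors, alphas = adaptive_conformal([1.0] * 10 + [5.0, 1.0], warmup=10)
```

After the spike the working level drops below its target, so the next threshold is wider. This targets long-run coverage under drift; whether it would turn residuals into a usable early warning on these incidents is an open question that the paper does not address.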

Load-bearing premise

The telemetry time series satisfies the exchangeability assumption required for valid split conformal prediction.

What would settle it

A dataset of driving telemetry in which the Ljung-Box test does not reject the null hypothesis of uncorrelated residuals, so that exchangeability is at least plausible, and the rolling-volatility detector reaches usable precision and recall on timestamped traction-loss events.

Original abstract

Conventional traction control architectures intervene only after the adhesion limit of a tire has already been breached. This paper investigates whether Rolling Split Conformal Prediction, monitoring the volatility of non-conformity residuals from a per-driver Random Forest model of expected slip behavior, can serve as a statistically grounded pre-incident warning signal ahead of gross traction loss. Unlike an earlier internal draft of this work, the evaluation reported here corrects a confound in the slip proxy (vehicle speed is included as an explicit model feature, not left implicit in the target's denominator), uses every racing lap for each driver rather than only the fastest lap, and is scored against real, timestamped incident labels extracted from FIA Race Control Messages and track-limits lap deletions rather than narrated post-hoc. The result is negative: across 19 drivers and 55,563 test-phase telemetry samples, the rolling-volatility detector achieves a mean precision of essentially 0.0 and mean recall of 0.0 against 14 ground-truth incidents, while flagging on average 15.3% of all samples as anomalous, too high a false-alarm rate for any early-warning use. A static 95th-percentile threshold baseline performs no better in any way that would justify the added complexity of the conformal-volatility formulation. Residual autocorrelation diagnostics show the split-conformal exchangeability assumption is violated for every driver (Ljung-Box p < 0.001, n = 19/19), which is one plausible driver of the high false-alarm rate. We report this as a methodologically rigorous negative finding, diagnose its likely causes, and outline what a genuinely predictive version of this approach would require.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This paper gives a careful negative result: rolling split conformal prediction on slip residuals fails to flag real traction incidents in racing telemetry, with essentially zero precision and recall while flagging 15.3% of samples. Residual autocorrelation that violates exchangeability is a plausible cause.

The core finding is that this conformal-volatility detector adds nothing useful over a simple percentile threshold. On 55,563 test samples from 19 drivers it misses all 14 ground-truth incidents while raising alarms on 15.3% of the data. The Ljung-Box test flags residual autocorrelation for every driver, which shows that the exchangeability assumption does not hold and plausibly explains the excess of false positives.

The work improves on the author's earlier internal version by adding vehicle speed as an explicit feature, using every lap rather than only the fastest, and scoring against timestamped real incident labels instead of narrated ones. Those corrections make the negative outcome more credible. The comparison to the static baseline is straightforward and shows the conformal wrapper brings no gain here.

The main limitation is narrow scope: the paper tests one specific nonconformity score and one base model on this dataset. It does not try alternative scores or ways to handle the dependence, so readers cannot tell whether the failure is fundamental to conformal prediction on autocorrelated telemetry or just to this implementation. A bit more detail on random-forest training and exact threshold selection would also help anyone who wants to reproduce or extend the experiment.

The paper is worth a referee's time. It is a clean empirical boundary on applying standard rolling conformal methods to dependent vehicle-dynamics data, and the diagnostics are honest. Researchers working on time-series conformal prediction or safety monitoring in motorsport will get value from seeing exactly where the usual assumptions break.

Referee Report

0 major / 3 minor

Summary. The manuscript evaluates Rolling Split Conformal Prediction for pre-incident traction-loss detection in motorsport telemetry. Per-driver Random Forest models predict expected slip (with vehicle speed as an explicit feature), non-conformity residuals are monitored via rolling volatility, and the resulting alarms are scored against real FIA Race Control incident labels. Across 19 drivers and 55,563 test samples the method yields mean precision and recall of essentially 0.0 on 14 ground-truth incidents while flagging 15.3% of samples; a static 95th-percentile baseline performs no better. Ljung-Box tests reject the null hypothesis of uncorrelated residuals for all 19 drivers (p < 0.001), contradicting exchangeability, which the author identifies as a likely cause of the high false-alarm rate. The work is presented as a methodologically corrected negative result.

Significance. If the negative finding holds, the paper supplies a clear, externally validated demonstration that split-conformal volatility monitoring does not yield usable pre-incident warnings under realistic autocorrelated telemetry conditions. Strengths include the use of timestamped FIA labels rather than post-hoc narration, explicit correction of the speed-feature confound, evaluation on every lap, and direct statistical diagnosis of the exchangeability violation. The result therefore functions as a useful cautionary benchmark for conformal methods applied to safety-critical time series.

minor comments (3)

§4 (or equivalent methods section): the precise definition of the rolling window length, the non-conformity score, and the volatility aggregation function should be stated explicitly with equations or pseudocode so that the implementation is fully reproducible from the text alone.
Table 1 or results section: report the exact per-driver precision and recall values (not only the means of “essentially 0.0”) together with the number of incidents per driver to allow readers to assess variability.
The Ljung-Box diagnostics are reported only as p < 0.001; adding the test statistic and lag order used would strengthen the claim that autocorrelation is the dominant driver of the false-alarm rate.
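As an illustration of what the last comment asks for, the sketch below computes the Ljung-Box statistic Q = n(n + 2) Σ ρ_k² / (n − k) over lags k = 1..h, together with a p-value from a chi-square reference with h degrees of freedom. The paper's lag order is not reported in this summary, so the lag values used here are hypothetical. To stay within the standard library, the sketch uses the closed-form chi-square tail that is exact only for an even number of lags.

```python
import math
from typing import Sequence, Tuple


def ljung_box(x: Sequence[float], lags: int) -> Tuple[float, float]:
    """Ljung-Box Q statistic and p-value under a chi-square(lags) reference.

    The closed-form tail used here is exact only for an even number of lags.
    """
    if lags % 2 != 0 or not 0 < lags < len(x):
        raise ValueError("this sketch needs an even lag count smaller than len(x)")
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)  # zero for a constant series (not handled here)
    q = 0.0
    for k in range(1, lags + 1):
        rho_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        q += rho_k * rho_k / (n - k)
    q *= n * (n + 2)
    # Chi-square survival function for even dof 2m: exp(-q/2) * sum_{i<m} (q/2)^i / i!
    half = q / 2.0
    p_value = math.exp(-half) * sum(
        half**i / math.factorial(i) for i in range(lags // 2)
    )
    return q, p_value


# A smooth, strongly autocorrelated toy series stands in for model residuals.
residuals = [math.sin(t / 10) for t in range(200)]
q_stat, p_val = ljung_box(residuals, lags=10)
```

Reporting `q_stat` and the lag count alongside the p-value lets a reader judge how strong the dependence is. The test addresses autocorrelation specifically: a rejection is evidence against exchangeability, while a non-rejection would not by itself establish it.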

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive assessment of the manuscript, accurate summary of the negative result, and recommendation to accept. The work is presented as a methodologically corrected negative finding on rolling split conformal prediction for pre-incident detection, and we appreciate the recognition of its value as a cautionary benchmark under realistic autocorrelated conditions.

Circularity Check

0 steps flagged

No significant circularity identified

The paper presents an empirical negative result on a conformal prediction detector for traction loss, evaluated against external FIA incident labels across 19 drivers and 55k samples. The reported zero precision/recall and 15.3% false-alarm rate follow directly from applying the detector to held-out telemetry and comparing to ground-truth timestamps; the Ljung-Box test (p<0.001) is a standard external diagnostic on residuals. No equations, fitted parameters, or self-citations are invoked to derive the outcome by construction, and the central claim remains falsifiable by the data rather than tautological.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The negative claim rests on the validity of the real incident labels extracted from FIA messages, the per-driver Random Forest models, and the Ljung-Box test for autocorrelation; the method itself assumes exchangeability which the paper shows is violated.

axioms (1)

domain assumption: Telemetry samples satisfy the exchangeability assumption required for valid split conformal prediction intervals.
Invoked in the rolling split conformal prediction formulation and explicitly tested (and rejected) via Ljung-Box p < 0.001 for n = 19/19 drivers.

pith-pipeline@v0.9.1-grok · 5839 in / 1299 out tokens · 22906 ms · 2026-07-03T17:14:52.905265+00:00 · methodology

Reference graph

Works this paper leans on

9 extracted references · 9 canonical work pages

[1] Angelopoulos, A. N., & Bates, S. (2023). Conformal prediction: A gentle introduction. Foundations and Trends in Machine Learning, 16(4), 494–591.
[2] Shafer, G., & Vovk, V. (2008). A tutorial on conformal prediction. Journal of Machine Learning Research, 9, 371–421.
[3] Barber, R. F., Candès, E. J., Ramdas, A., & Tibshirani, R. J. (2023). Conformal prediction beyond exchangeability. The Annals of Statistics, 51(2), 816–845.
[4] Gibbs, I., & Candès, E. (2021). Adaptive conformal inference under distribution shift. Advances in Neural Information Processing Systems, 34.
[5] Politis, D. N., & Romano, J. P. (1994). The stationary bootstrap. Journal of the American Statistical Association, 89(428), 1303–1313.
[6] Ljung, G. M., & Box, G. E. P. (1978). On a measure of lack of fit in time series models. Biometrika, 65(2), 297–303.
[7] Pedregosa, F., et al. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, 2825–2830.
[8] FastF1 Documentation. (2023). FastF1: A Python package for F1 telemetry and timing data. https://theoehrly.github.io/Fast-F1/
[9] Milliken, W. F., & Milliken, D. L. (1995). Race Car Vehicle Dynamics. SAE International.

Appendix A: Data and Code Availability. The corrected analysis pipeline (proxy construction, ground-truth extraction from Race Control Messages, chronological split, block-bootstrap calibration, autocorrelation diagnostics, baseline detector, and sensitivity sweep) and...
