Predictive Conformal Slip Monitoring: An Empirical Evaluation of Rolling Split Conformal Prediction for Pre-Incident Traction Loss Detection
Pith reviewed 2026-07-03 17:14 UTC · model grok-4.3
The pith
Rolling split conformal prediction on slip residuals yields zero precision and recall for pre-incident traction loss detection.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Across 19 drivers and 55,563 test-phase telemetry samples the rolling-volatility detector achieves mean precision of essentially 0.0 and mean recall of 0.0 against 14 ground-truth incidents while flagging on average 15.3 percent of all samples as anomalous. A static 95th-percentile threshold baseline performs no better. Residual autocorrelation diagnostics show the split-conformal exchangeability assumption is violated for every driver.
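To make the headline numbers concrete: if the pooled flag rate is close to the per-driver mean of 15.3 percent, roughly 8,500 of the 55,563 test samples are flagged, set against only 14 incidents. The summary does not state the paper's exact scoring rule, so the sketch below assumes one plausible convention: a flagged sample counts as a true positive when an incident follows within `horizon` samples, and an incident counts as detected when any flag precedes it within that window. The function name `score_alarms` and the `horizon` parameter are hypothetical.

```python
def score_alarms(flags, incident_indices, horizon):
    """Return (precision, recall, flag_rate) for pre-incident alarms.

    Assumed convention (not taken from the paper):
    - precision is computed per flagged sample,
    - recall is computed per incident,
    - a flag at index i matches an incident at index t when 0 < t - i <= horizon.
    """
    flagged = [i for i, is_flagged in enumerate(flags) if is_flagged]

    def precedes_incident(i):
        return any(0 < t - i <= horizon for t in incident_indices)

    true_positives = sum(1 for i in flagged if precedes_incident(i))
    detected = sum(
        1 for t in incident_indices
        if any(0 < t - i <= horizon for i in flagged)
    )

    precision = true_positives / len(flagged) if flagged else 0.0
    recall = detected / len(incident_indices) if incident_indices else 0.0
    flag_rate = len(flagged) / len(flags) if flags else 0.0
    return precision, recall, flag_rate
```

Under this convention a detector can flag a large share of samples and still score zero on both metrics if none of its flags land inside a lead window, which matches the pattern the core claim reports.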
What carries the argument
Rolling split conformal prediction that tracks volatility of non-conformity residuals from a per-driver random forest model of expected slip behavior.
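The summary does not give the window length, the non-conformity score, or the volatility function, so the following is only a minimal sketch under assumed choices: absolute residuals as the score, a trailing population standard deviation as the volatility, and a fixed calibration split rather than a rolling one. The names `conformal_quantile`, `rolling_volatility`, and `detect` are hypothetical, and this is not the authors' implementation.

```python
import math
from statistics import pstdev


def conformal_quantile(scores, alpha):
    """Finite-sample split-conformal quantile of the calibration scores."""
    n = len(scores)
    if n == 0:
        raise ValueError("need at least one calibration score")
    # Rank ceil((n + 1) * (1 - alpha)), clipped to n when it exceeds the sample.
    k = min(n, math.ceil((n + 1) * (1 - alpha)))
    return sorted(scores)[k - 1]


def rolling_volatility(values, window):
    """Population std. dev. over each trailing window; None until the window fills."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(pstdev(values[i + 1 - window:i + 1]))
    return out


def detect(calib_residuals, test_residuals, window=5, alpha=0.05):
    """Flag test samples whose rolling volatility of |residual| exceeds the
    conformal quantile of the volatilities seen on the calibration split.

    window and alpha are illustrative defaults, not values from the paper.
    """
    calib_scores = [abs(r) for r in calib_residuals]
    calib_vol = [v for v in rolling_volatility(calib_scores, window) if v is not None]
    threshold = conformal_quantile(calib_vol, alpha)

    test_scores = [abs(r) for r in test_residuals]
    test_vol = rolling_volatility(test_scores, window)
    flags = [v is not None and v > threshold for v in test_vol]
    return threshold, flags
```

With `alpha=0.05` and exchangeable scores, about 5 percent of test samples would be expected to exceed the threshold, so a much higher flag rate is itself a hint that the calibration and test volatilities are not exchangeable.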
If this is right
- The conformal-volatility formulation adds no value over a static threshold for early-warning use.
- High false-alarm rates render the detector unsuitable for pre-incident intervention.
- Violation of exchangeability in every driver is a plausible cause of the observed false-alarm rate.
- Correcting the slip proxy and using all laps rather than only fastest laps still produces a negative result.
Where Pith is reading between the lines
- Methods that explicitly model temporal dependence would be needed before conformal residuals could serve as reliable early signals in sequential telemetry.
- The same rolling-volatility construction might succeed in domains where exchangeability holds more closely than in vehicle dynamics.
- Alternative non-conformity scores or windowing schemes could be tested on the same labeled incidents to isolate the role of the exchangeability violation.
Load-bearing premise
The telemetry time series satisfies the exchangeability assumption required for valid split conformal prediction.
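For reference, the standard split-conformal guarantee that rests on this premise can be written as below. The symbols are assumptions introduced here, not notation from the paper: $s_1, \dots, s_n$ are the calibration non-conformity scores, $s_{(k)}$ is the $k$-th smallest of them, $s_{n+1}$ is the score of a new sample, and $\alpha$ is the target miscoverage rate, with $\lceil (n+1)(1-\alpha) \rceil \le n$.

```latex
\hat{q} = s_{\left(\lceil (n+1)(1-\alpha) \rceil\right)}, \qquad
\Pr\left( s_{n+1} \le \hat{q} \right) \ge 1 - \alpha
\quad \text{if } s_1, \dots, s_{n+1} \text{ are exchangeable.}
```

When consecutive telemetry residuals are strongly autocorrelated, the exchangeability condition fails and the inequality is no longer guaranteed, so the nominal exceedance rate $\alpha$ need not match the observed flag rate.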
What would settle it
A dataset of driving telemetry in which the Ljung-Box test does not reject the null hypothesis of uncorrelated residuals (the diagnostic the paper uses for exchangeability) and the rolling-volatility detector reaches usable precision and recall on timestamped traction-loss events.
Original abstract
Conventional traction control architectures intervene only after the adhesion limit of a tire has already been breached. This paper investigates whether Rolling Split Conformal Prediction, monitoring the volatility of non-conformity residuals from a per-driver Random Forest model of expected slip behavior, can serve as a statistically grounded pre-incident warning signal, ahead of gross traction loss. Unlike an earlier internal draft of this work, the evaluation reported here corrects a confound in the slip proxy (vehicle speed is included as an explicit model feature, not left implicit in the target's denominator), uses every racing lap for each driver rather than only the fastest lap, and is scored against real, timestamped incident labels extracted from FIA Race Control Messages and track-limits lap deletions rather than narrated post-hoc. The result is negative: across 19 drivers and 55,563 test-phase telemetry samples, the rolling-volatility detector achieves a mean precision of essentially 0.0 and mean recall of 0.0 against 14 ground-truth incidents, while flagging on average 15.3% of all samples as anomalous, too high a false-alarm rate for any early-warning use. A static 95th-percentile threshold baseline performs no better in any way that would justify the added complexity of the conformal-volatility formulation. Residual autocorrelation diagnostics show the split-conformal exchangeability assumption is violated for every driver (Ljung-Box p < 0.001, n = 19/19), which is one plausible driver of the high false-alarm rate. We report this as a methodologically rigorous negative finding, diagnose its likely causes, and outline what a genuinely predictive version of this approach would require.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript evaluates Rolling Split Conformal Prediction for pre-incident traction-loss detection in motorsport telemetry. Per-driver Random Forest models predict expected slip (with vehicle speed as an explicit feature), non-conformity residuals are monitored via rolling volatility, and alarms are scored against real FIA Race Control incident labels. Across 19 drivers and 55,563 test samples the method yields mean precision and recall of essentially 0.0 on 14 ground-truth incidents while flagging 15.3% of samples; a static 95th-percentile baseline performs no better. Ljung-Box tests reject exchangeability for all drivers (p < 0.001), which the authors identify as a likely cause of the high false-alarm rate. The work is presented as a methodologically corrected negative result.
Significance. If the negative finding holds, the paper supplies a clear, externally validated demonstration that split-conformal volatility monitoring does not yield usable pre-incident warnings under realistic autocorrelated telemetry conditions. Strengths include the use of timestamped FIA labels rather than post-hoc narration, explicit correction of the speed-feature confound, evaluation on every lap, and direct statistical diagnosis of the exchangeability violation. The result therefore functions as a useful cautionary benchmark for conformal methods applied to safety-critical time series.
minor comments (3)
- §4 (or equivalent methods section): the precise definition of the rolling window length, the non-conformity score, and the volatility aggregation function should be stated explicitly with equations or pseudocode so that the implementation is fully reproducible from the text alone.
- Table 1 or results section: report the exact per-driver precision and recall values (not only the means of “essentially 0.0”) together with the number of incidents per driver to allow readers to assess variability.
- The Ljung-Box diagnostics are reported only as p < 0.001; adding the test statistic and lag order used would strengthen the claim that autocorrelation is the dominant driver of the false-alarm rate.
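The last comment asks for the test statistic and lag order. As a reminder of what those quantities are, the Ljung-Box statistic for a series of length n and lag order h is Q = n(n + 2) Σ ρ̂_k² / (n − k) summed over k = 1..h, compared against a chi-square distribution with h degrees of freedom. The sketch below computes Q in plain Python; the lag order of 10 and the helper names are assumptions for illustration, since the summary does not report which lag order the paper used.

```python
def autocorr(x, lag):
    """Sample autocorrelation of x at the given lag (x must not be constant)."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t - lag] - mean) for t in range(lag, n))
    return num / denom


def ljung_box_q(x, max_lag):
    """Ljung-Box Q statistic using lags 1..max_lag."""
    n = len(x)
    return n * (n + 2) * sum(
        autocorr(x, k) ** 2 / (n - k) for k in range(1, max_lag + 1)
    )


# Upper 0.1% critical value of the chi-square distribution with 10 degrees
# of freedom, taken from standard tables (matches a p < 0.001 decision rule
# only when max_lag is 10).
CHI2_CRIT_10DF_P001 = 29.588


def rejects_at_p001_lag10(x):
    """True when Q over 10 lags exceeds the 0.1% critical value."""
    return ljung_box_q(x, 10) > CHI2_CRIT_10DF_P001
```

In practice a library routine such as `acorr_ljungbox` from `statsmodels.stats.diagnostic` would report both Q and its p-value for each lag, which is the level of detail the comment requests.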
Simulated Author's Rebuttal
We thank the referee for their positive assessment of the manuscript, accurate summary of the negative result, and recommendation to accept. The work is presented as a methodologically corrected negative finding on rolling split conformal prediction for pre-incident detection, and we appreciate the recognition of its value as a cautionary benchmark under realistic autocorrelated conditions.
Circularity Check
No significant circularity identified
full rationale
The paper presents an empirical negative result on a conformal prediction detector for traction loss, evaluated against external FIA incident labels across 19 drivers and 55k samples. The reported zero precision/recall and 15.3% false-alarm rate follow directly from applying the detector to held-out telemetry and comparing to ground-truth timestamps; the Ljung-Box test (p<0.001) is a standard external diagnostic on residuals. No equations, fitted parameters, or self-citations are invoked to derive the outcome by construction, and the central claim remains falsifiable by the data rather than tautological.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Telemetry samples satisfy the exchangeability assumption required for valid split conformal prediction intervals
Reference graph
Works this paper leans on
- [1] Angelopoulos, A. N., & Bates, S. (2023). Conformal prediction: A gentle introduction. Foundations and Trends in Machine Learning, 16(4), 494–591.
- [2] Shafer, G., & Vovk, V. (2008). A tutorial on conformal prediction. Journal of Machine Learning Research, 9, 371–421.
- [3] Barber, R. F., Candès, E. J., Ramdas, A., & Tibshirani, R. J. (2023). Conformal prediction beyond exchangeability. The Annals of Statistics, 51(2), 816–845.
- [4] Gibbs, I., & Candès, E. (2021). Adaptive conformal inference under distribution shift. Advances in Neural Information Processing Systems, 34.
- [5] Politis, D. N., & Romano, J. P. (1994). The stationary bootstrap. Journal of the American Statistical Association, 89(428), 1303–1313.
- [6] Ljung, G. M., & Box, G. E. P. (1978). On a measure of lack of fit in time series models. Biometrika, 65(2), 297–303.
- [7] Pedregosa, F., et al. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, 2825–2830.
- [8] FastF1 Documentation. (2023). FastF1: A Python package for F1 telemetry and timing data. https://theoehrly.github.io/Fast-F1/
- [9] Milliken, W. F., & Milliken, D. L. (1995). Race Car Vehicle Dynamics. SAE International.
Appendix A: Data and Code Availability
The corrected analysis pipeline (proxy construction, ground-truth extraction from Race Control Messages, chronological split, block-bootstrap calibration, autocorrelation diagnostics, baseline detector, and sensitivity sweep) and...