
arxiv: 2606.09478 · v1 · pith:7RIKRGQFnew · submitted 2026-06-08 · 💱 q-fin.TR · q-fin.CP · q-fin.MF

Volatility Forecasting and Return Prediction under Market Regimes: Evidence from High-Frequency Chinese Equity Data

Pith reviewed 2026-06-27 13:55 UTC · model grok-4.3

keywords: volatility forecasting, market regimes, return prediction, high-frequency data, Chinese equity, HARQ models, XGBoost, trading strategies

The pith

Regime-aware volatility models outperform standard forecasts on high-frequency Chinese equity data, while return prediction stays weak except in low-volatility states.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether identifying market regimes with a switching model and feeding those regimes into volatility and return models can improve forecasts and trading results. It shows that adding regime information to HARQ volatility models raises accuracy over plain versions across standard metrics. Return forecasts remain unreliable overall and work mainly when markets are calm, so only filtered strategies that scale volatility, gate on low regimes, and control turnover deliver better results after costs. The work implies that prediction systems gain practical value by turning state-dependent signals into allocation rules rather than seeking strong unconditional forecasts.

Core claim

Using high-frequency CSI 300 Index data from 2005 to 2023, a two-stage framework first models realized volatility with regime-augmented HARQ specifications combined with Markov-switching GJR-GARCH to capture regimes, then inputs the volatility forecasts, regime indicators, and other predictors into an XGBoost model for return prediction under a strict walk-forward out-of-sample setup. Regime-aware volatility forecasting consistently outperforms baseline HARQ models across metrics and passes formal tests, while return predictability is weak, state-dependent, and concentrated in low-volatility regimes. Naive trading strategies fail after transaction costs, but versions with volatility scaling, low-volatility gating, threshold calibration, and turnover controls can improve defensive economic performance.

What carries the argument

The sequential two-stage framework that augments HARQ volatility models with Markov-switching GJR-GARCH regime filtering and then uses those outputs plus regime indicators inside an XGBoost return predictor estimated via walk-forward validation.
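
The walk-forward validation named above can be sketched as an index generator. This is a minimal illustration rather than the paper's code: the function name `walk_forward_splits`, the window sizes, and the rolling-versus-expanding switch are all assumptions made for the example.

```python
from typing import Iterator, Tuple


def walk_forward_splits(n_obs: int, train_size: int, test_size: int,
                        expanding: bool = False) -> Iterator[Tuple[range, range]]:
    """Yield (train, test) index ranges in which every test index follows its train block."""
    start = train_size
    while start + test_size <= n_obs:
        # Rolling window by default; an expanding window always starts at index 0.
        train_start = 0 if expanding else start - train_size
        yield range(train_start, start), range(start, start + test_size)
        start += test_size  # step forward by one test block


# Hypothetical sizes: 10 observations, 4 for training, 2 for each test block.
splits = list(walk_forward_splits(n_obs=10, train_size=4, test_size=2))
for train, test in splits:
    print(list(train), "->", list(test))
```

Both the regime model and the return model would be refit on each `train` range and scored only on the matching `test` range, which is what keeps the evaluation out-of-sample.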

Load-bearing premise

The premise has two parts: that the Markov-switching GJR-GARCH model correctly identifies distinct market regimes that are stable enough to be useful for out-of-sample forecasting, and that the walk-forward procedure with the chosen hyperparameters does not overfit the regime classification or the XGBoost model to the specific Chinese data period.

What would settle it

Re-running the identical walk-forward procedure on a later hold-out period of CSI 300 high-frequency data and finding that the regime-augmented HARQ models no longer outperform baseline HARQ models on volatility forecast accuracy metrics would falsify the main claim.
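
A hold-out comparison of this kind is usually scored with a loss such as QLIKE and a Diebold-Mariano statistic on the loss differential. The sketch below assumes that setup; the function names, the lag choice, and the synthetic data are illustrative and are not taken from the paper.

```python
import numpy as np


def qlike(rv, forecast):
    """Per-observation QLIKE loss: rv/f - log(rv/f) - 1, which is zero when f equals rv."""
    ratio = np.asarray(rv, float) / np.asarray(forecast, float)
    return ratio - np.log(ratio) - 1.0


def dm_statistic(loss_a, loss_b, lags=0):
    """Diebold-Mariano statistic for mean(loss_a - loss_b).

    Uses a Bartlett-weighted (Newey-West) long-run variance; negative values favor model A.
    """
    d = np.asarray(loss_a, float) - np.asarray(loss_b, float)
    n = d.size
    d_c = d - d.mean()
    lrv = d_c @ d_c / n
    for k in range(1, lags + 1):
        weight = 1.0 - k / (lags + 1.0)
        lrv += 2.0 * weight * (d_c[k:] @ d_c[:-k]) / n
    return d.mean() / np.sqrt(lrv / n)


# Synthetic illustration only: model A has small forecast errors, model B large ones.
rng = np.random.default_rng(0)
rv = np.exp(rng.normal(size=500))
forecast_a = rv * np.exp(0.05 * rng.normal(size=500))
forecast_b = rv * np.exp(0.50 * rng.normal(size=500))
stat = dm_statistic(qlike(rv, forecast_a), qlike(rv, forecast_b), lags=5)
print(stat)
```

Under the falsification test described above, model A would be the regime-augmented HARQ and model B the baseline; a statistic that is no longer significantly negative on the later period would count against the main claim.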

Figures

Figures reproduced from arXiv: 2606.09478 by Robert Ślepaczuk, Xinyue Fang.

Figure 1. Distribution of daily returns. Note: The histogram shows the empirical distribution of daily returns together with a fitted normal density.
Figure 2. QQ plot of daily returns. Note: The QQ plot compares empirical return quantiles with theoretical normal quantiles. Substantial tail deviations indicate strong non-normality. The distribution of realized variance is highly right-skewed and characterized by extreme kurtosis. After applying a logarithmic transformation, the distribution becomes substantially more symmetric and approximately Gaussian. Skewness …
Figure 3. Distribution of log-realized variance. Note: The histogram shows the empirical distribution of log-realized variance together with a fitted normal density.
Figure 4. QQ plot of log-realized variance. Note: The QQ plot compares empirical quantiles of log-realized variance with theoretical normal quantiles. Beyond distributional characteristics, volatility exhibits strong temporal dependence.
Figure 5. Time series of log-realized variance. Note: The figure illustrates the time-series evolution of log-realized variance and highlights volatility clustering.
Figure 6. Autocorrelation function of log-realized variance. Autocorrelations remain positive and decay slowly across lags, providing strong evidence of long-memory behavior in volatility dynamics.
Figure 7. Schematic representation of the walk-forward framework. Panel A illustrates the volatility-forecasting design, while Panel B illustrates the return-prediction design. In both cases, the orange segments denote out-of-sample evaluation periods.
Figure 8. Time-series dynamics of realized volatility together with the filtered probability of the high-volatility regime, denoted by p_t.
Figure 9. Average feature importance across walk-forward estimation windows.
Figure 10. Predicted returns plotted against realized returns over the out-of-sample period.
Figure 11. Rolling information coefficient of predicted returns.
Figure 12. Direct evidence that predictive stability is regime dependent.
Figure 13. Out-of-sample equity curve. Note: The figure presents out-of-sample cumulative net wealth from 2014-12-17 to 2023-05-26 for the Low-vol Gated Weekly Signal×Risk strategy and benchmark strategies. All strategy returns are reported net of 5 bp transaction costs. The Low-vol Gated Weekly Signal×Risk strategy does not maximize cumulative wealth in every subperiod. Instead, its main advantage lies in smoother w…
Figure 14. Annual returns of the proposed strategy compared with Buy-and-Hold.
Figure 15. Comparison of annual maximum drawdowns.
Figure 16. q × b sensitivity of net Sharpe. Note: The figure reports net Sharpe ratios across threshold quantiles q and no-trade bands b. The baseline specification q = 0.60, b = 0.02 is retained as a conservative ex ante choice rather than selected as the ex post best-performing parameter combination. White Reality Check-style and Hansen SPA-style procedures are applied across the entire q × b grid. The resulting p-…
Original abstract

This study investigates whether regime-dependent volatility forecasting and machine-learning-based return prediction can be jointly integrated to improve both statistical forecasting performance and economic strategy outcomes in equity markets. Using high-frequency CSI 300 Index data from 2005 to 2023, a sequential two-stage framework is developed. In the first stage, realized volatility is modeled using regime-augmented HARQ specifications combined with Markov-switching GJR-GARCH filtering to capture long-memory dynamics, asymmetry, and structural market regimes. In the second stage, volatility forecasts, regime indicators, and return-related predictors are incorporated into an XGBoost return-prediction model estimated through a strictly walk-forward out-of-sample procedure. The empirical results demonstrate that regime-aware volatility forecasting consistently outperforms baseline HARQ models across forecast evaluation metrics and is generally supported by formal forecast comparison tests. In contrast, return predictability remains weak, state-dependent, and concentrated primarily in low-volatility regimes. Although naive predictive trading strategies generally fail after accounting for realistic transaction costs, carefully designed implementations incorporating volatility scaling, low-volatility gating, threshold calibration, and turnover controls can improve defensive economic performance. The findings suggest that the practical value of predictive systems in financial markets may depend less on generating strong unconditional return forecasts and more on transforming weak state-dependent signals into economically robust portfolio allocation rules. Overall, the study contributes by integrating econometric volatility modeling, regime classification, machine-learning return prediction, and implementation realism within a unified framework.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper develops a two-stage framework on high-frequency CSI 300 data (2005–2023): Markov-switching GJR-GARCH is used to identify regimes, which are then incorporated into regime-augmented HARQ models for realized-volatility forecasting; the resulting forecasts, regime indicators, and other predictors feed an XGBoost model for returns, all estimated via strictly walk-forward out-of-sample procedures. The central claims are that the regime-aware volatility specifications outperform plain HARQ models on standard forecast metrics and are supported by formal comparison tests, that return predictability is weak overall but stronger in the low-volatility regime, and that naïve trading strategies fail after transaction costs while carefully tuned versions (volatility scaling, low-vol gating, threshold calibration, turnover controls) deliver improved defensive economic performance.

Significance. If the regime labels remain informative out-of-sample and the reported economic improvements survive further robustness checks, the work would usefully illustrate how regime-dependent econometric modeling can be combined with machine-learning return prediction and realistic implementation constraints in an emerging-market setting. The emphasis on the limits of unconditional return forecasts versus the value of state-dependent allocation rules is a constructive contribution to the literature on predictability under structural breaks.

major comments (2)
  1. [first-stage Markov-switching GJR-GARCH filtering and walk-forward procedure] The claim that regime-aware HARQ specifications consistently outperform baseline HARQ models rests on the stability and out-of-sample informativeness of the two-state Markov-switching GJR-GARCH regime classification. Because the sample contains multiple structural breaks (2008, 2015, 2020), the estimated transition probabilities and volatility parameters may be dominated by crisis episodes; the manuscript does not report whether the MS-GJR-GARCH is re-estimated inside each walk-forward window or fitted once on the full sample, nor does it provide regime-persistence diagnostics or sensitivity checks to the number of regimes. This is load-bearing for both the volatility-forecasting and the return-prediction results.
  2. [economic strategy results and implementation details] The economic-performance conclusions depend on post-hoc choices of volatility-scaling factors, low-volatility gates, return thresholds, and turnover controls. The manuscript should demonstrate that these choices are either pre-specified or subjected to a formal robustness exercise (e.g., grid search reported in an appendix or out-of-sample validation of the tuning parameters themselves); otherwise the reported improvement in defensive performance after costs cannot be distinguished from in-sample optimization.
minor comments (2)
  1. [Abstract] The abstract states that regime-aware models “consistently outperform” and are “generally supported by formal forecast comparison tests,” yet supplies no numerical values for RMSE, QLIKE, or Diebold-Mariano statistics; these metrics should appear in the abstract, or at least be summarized there with effect sizes.
  2. [volatility-model section] Notation for the regime-augmented HARQ specification (e.g., how the regime dummy enters the HARQ equation) is introduced only descriptively; an explicit equation would improve reproducibility.
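
As an illustration of the kind of explicit equation requested in the last comment, the standard HARQ regression can be extended with a regime term. The first four terms follow the usual HARQ form; the additive term in the filtered high-volatility probability $p_{t-1}$ and its coefficient $\gamma$ are an assumption made here for illustration, since the manuscript's exact specification is not stated on this page (the regime could equally enter through interactions with the HAR coefficients).

```latex
RV_t = \beta_0
     + \bigl(\beta_1 + \beta_{1Q}\, RQ_{t-1}^{1/2}\bigr)\, RV_{t-1}
     + \beta_2\, RV_{t-1|t-5}
     + \beta_3\, RV_{t-1|t-22}
     + \gamma\, p_{t-1}
     + u_t
```

Here $RV_{t-1|t-5}$ and $RV_{t-1|t-22}$ denote weekly and monthly averages of daily realized variance, and $RQ_{t-1}$ is realized quarticity, which lets the daily coefficient shrink when measurement error is large.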

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. These highlight important aspects of our methodology and implementation that require clarification and additional robustness checks. We address each major comment below and will incorporate the suggested revisions into the manuscript.

Point-by-point responses
  1. Referee: [first-stage Markov-switching GJR-GARCH filtering and walk-forward procedure] The claim that regime-aware HARQ specifications consistently outperform baseline HARQ models rests on the stability and out-of-sample informativeness of the two-state Markov-switching GJR-GARCH regime classification. Because the sample contains multiple structural breaks (2008, 2015, 2020), the estimated transition probabilities and volatility parameters may be dominated by crisis episodes; the manuscript does not report whether the MS-GJR-GARCH is re-estimated inside each walk-forward window or fitted once on the full sample, nor does it provide regime-persistence diagnostics or sensitivity checks to the number of regimes. This is load-bearing for both the volatility-forecasting and the return-prediction results.

    Authors: We agree that explicit documentation of the regime-identification procedure is essential. The MS-GJR-GARCH model was re-estimated at the start of each walk-forward window using only data available up to that point, consistent with the strictly out-of-sample protocol described for the overall framework. We will revise the methodology section to state this explicitly. In addition, we will add (i) regime-persistence statistics (average duration and transition probabilities) computed out-of-sample and (ii) a sensitivity table comparing results under two versus three regimes. These diagnostics will be placed in a new appendix and referenced in the main text. revision: yes

  2. Referee: [economic strategy results and implementation details] The economic-performance conclusions depend on post-hoc choices of volatility-scaling factors, low-volatility gates, return thresholds, and turnover controls. The manuscript should demonstrate that these choices are either pre-specified or subjected to a formal robustness exercise (e.g., grid search reported in an appendix or out-of-sample validation of the tuning parameters themselves); otherwise the reported improvement in defensive performance after costs cannot be distinguished from in-sample optimization.

    Authors: We acknowledge that the specific parameter values used for volatility scaling, gating, thresholds, and turnover controls require stronger justification. While the core predictors and models are estimated strictly out-of-sample, the strategy hyperparameters were calibrated on a preliminary subsample. We will add a formal robustness appendix that reports a grid search over plausible ranges of these parameters and shows the distribution of Sharpe ratios and maximum drawdowns across the grid. Only parameter combinations that would have been feasible at the time of each rebalancing are considered, thereby addressing the concern about ex-post optimization. revision: yes
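
To make the tuning parameters under discussion concrete, the sketch below combines a signal threshold, a low-volatility gate, volatility scaling, a no-trade band, and proportional costs in one long-or-flat rule. It is a hypothetical reading of those ingredients, not the authors' strategy: the function name, the 0.5 gate cutoff, the 10% target volatility, and the timing convention (all inputs at index t are known before `returns[t]` is realized) are assumptions. The 5 bp cost and the band of 0.02 echo the values quoted in the figure notes.

```python
import numpy as np


def run_strategy(signal, p_high, vol_forecast, returns, threshold,
                 target_vol=0.10, band=0.02, cost_bp=5.0):
    """Net per-period returns of a gated, volatility-scaled, long-or-flat rule."""
    signal, p_high, vol_forecast, returns = (
        np.asarray(x, float) for x in (signal, p_high, vol_forecast, returns))
    position = 0.0
    net = np.empty(len(returns))
    for t in range(len(returns)):
        gate = 1.0 if p_high[t] < 0.5 else 0.0           # trade only in the low-volatility regime
        active = 1.0 if signal[t] > threshold else 0.0    # require a sufficiently strong forecast
        scale = min(1.0, target_vol / vol_forecast[t])    # volatility scaling, capped at no leverage
        trade = gate * active * scale - position
        if abs(trade) <= band:                            # no-trade band suppresses small rebalances
            trade = 0.0
        position += trade
        net[t] = position * returns[t] - abs(trade) * cost_bp / 1e4
    return net


# Toy inputs chosen by hand; the threshold would normally be a quantile of past signals.
net = run_strategy(signal=[0.01, 0.01, -0.01], p_high=[0.1, 0.9, 0.1],
                   vol_forecast=[0.2, 0.2, 0.2], returns=[0.01, -0.02, 0.03],
                   threshold=0.0)
print(net)
```

A robustness grid of the kind promised in the response would call this function over a range of `threshold` and `band` values and report the spread of net Sharpe ratios rather than a single configuration.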

Circularity Check

0 steps flagged

No significant circularity: walk-forward OOS framework keeps derivation self-contained

full rationale

The paper's core claims rest on a two-stage pipeline with Markov-switching GJR-GARCH regime filtering followed by HARQ-XGBoost forecasting, all evaluated via a strictly walk-forward out-of-sample procedure on the 2005-2023 CSI 300 data. No step re-uses fitted parameters or regime labels as both input and output on the same observations; volatility forecasts and return predictions are generated on held-out periods, and economic strategy results are assessed after explicit transaction-cost adjustments. Because the evaluation protocol separates estimation from testing and no self-citation chain or definitional loop is invoked to justify the regime labels or model superiority, the reported outperformance does not reduce to its own inputs by construction.

Axiom & Free-Parameter Ledger

3 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard time-series assumptions plus data-driven parameter estimation in volatility and ML models; no new entities postulated. Review limited to abstract so ledger is incomplete.

free parameters (3)
  • HARQ parameters
    Coefficients in the heterogeneous autoregressive model with quadratic variation terms are estimated from data.
  • GJR-GARCH parameters
    Asymmetric GARCH parameters including regime-specific values fitted via maximum likelihood.
  • XGBoost hyperparameters
    Learning rate, tree depth, and other tuning parameters chosen during walk-forward optimization.
axioms (2)
  • domain assumption Market regimes can be adequately captured by a two-state Markov-switching process
    Invoked in the first stage to filter volatility dynamics.
  • domain assumption Walk-forward out-of-sample procedure prevents look-ahead bias
    Used for both stages to ensure realistic forecasting.
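
The two-state assumption can be illustrated with a bare Hamilton filter, which produces the filtered regime probabilities that a plot such as Figure 8 would show. This sketch is deliberately simplified: it assumes zero-mean Gaussian returns with a constant standard deviation per regime, whereas the paper uses a Markov-switching GJR-GARCH. The parameter values are made up for the example.

```python
import numpy as np


def hamilton_filter(returns, sigmas, P, init=None):
    """Filtered regime probabilities for zero-mean Gaussian returns.

    sigmas: standard deviation in each regime.
    P[i, j]: probability of moving from regime i to regime j.
    """
    returns = np.asarray(returns, float)
    sigmas = np.asarray(sigmas, float)
    P = np.asarray(P, float)
    k = sigmas.size
    prob = np.full(k, 1.0 / k) if init is None else np.asarray(init, float)
    out = np.empty((returns.size, k))
    for t, r in enumerate(returns):
        prior = prob @ P                                   # predicted regime probabilities
        dens = np.exp(-0.5 * (r / sigmas) ** 2) / (sigmas * np.sqrt(2.0 * np.pi))
        post = prior * dens                                # Bayes update with today's return
        prob = post / post.sum()
        out[t] = prob
    return out


# Hypothetical parameters: calm regime sd 1%, turbulent regime sd 4%, persistent states.
probs = hamilton_filter(returns=[0.001, 0.002, 0.05, -0.06],
                        sigmas=[0.01, 0.04],
                        P=[[0.95, 0.05], [0.05, 0.95]])
print(probs[:, 1])  # filtered probability of the high-volatility regime
```

Because each probability uses only returns up to time t, the filter is compatible with a walk-forward design as long as `sigmas` and `P` are themselves estimated on past data only.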


