Systematic trend following has, on average, been profitable for at least two centuries; yet since approximately 2009, short-term trends have ceased to deliver reliable returns. Using a cross-section of roughly 100 liquid futures contracts spanning 1995-2025, together with an industry-representative CTA proxy, we document the break and characterise its dependence on signal speed and asset class. We evaluate four candidate explanations - capacity constraints, market electronification, a regime change in CTA-versus-order-flow interactions, and a microstructural mechanism - and find that the first three fail on grounds of timing, magnitude, or cross-sectional heterogeneity.
Our central empirical finding is that the cross-sectional variable distinguishing degraded from surviving trends is the volatility-normalised tick size: post-2008 trend PnL has collapsed on small-tick contracts across all signal horizons, while remaining essentially intact on large-tick ones. Neither asset class nor liquidity replicates this dichotomy.
We interpret this result through a self-fulfilling feedback loop that, in our view, lies at the heart of the trend anomaly itself: trend signals trigger directional trades, whose market impact reinforces the very price moves that generated the signal. Both the profitability and the persistence of trend are sustained by this impact channel, which requires that trend followers can execute aggressively at reasonable cost. We argue that the post-crisis transition to HFT-dominated market making, whose liquidity-withdrawal behaviour in front of predictable directional flow has sharply contrasting consequences for sparse (small-tick) and dense (large-tick) limit order books, has broken this loop on small-tick contracts. On large-tick contracts, residual depth remains sufficient, and the loop continues to operate.
We estimate Kyle's (1985) price-impact coefficient $\lambda$ directly from daily equity order flow and test its ability to forecast the cross-section of subsequent stock returns. Using CRSP data from 2020 to 2025, we construct firm-month measures of signed order flow and two estimators of $\hat\lambda_{it}$: a within-month price-impact regression and an Amihud-style ratio. Signed order flow strongly predicts contemporaneous and one-month-ahead returns, while volume volatility predicts lower subsequent returns, consistent with widening price impact degrading price discovery. Fama-MacBeth regressions confirm that our order-flow signal carries significant cross-sectional return information after Newey--West adjustment. Theoretically, we resolve the liquidity premium puzzle of Constantinides (1986) through an adverse-selection mechanism: low order flow widens $\lambda$ and depresses prices today; subsequent normalization restores prices, generating the illiquidity premium without risk-based compensation.
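The two firm-month estimators named above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the price-impact regression is an OLS of daily returns on daily signed order flow within one month, and that the Amihud-style ratio is the monthly mean of absolute return over dollar volume. The function names and input layout are hypothetical.

```python
from statistics import mean


def price_impact_slope(returns, signed_flow):
    """OLS slope (with intercept) of daily returns on signed order flow.

    Assumption: this slope plays the role of lambda-hat for one firm-month.
    """
    r_bar, q_bar = mean(returns), mean(signed_flow)
    cov = sum((q - q_bar) * (r - r_bar) for q, r in zip(signed_flow, returns))
    var = sum((q - q_bar) ** 2 for q in signed_flow)
    if var == 0:
        raise ValueError("signed order flow has no variation in this month")
    return cov / var


def amihud_ratio(returns, dollar_volume):
    """Monthly mean of |daily return| / daily dollar volume."""
    return mean(abs(r) / v for r, v in zip(returns, dollar_volume))


# Toy firm-month in which returns are exactly 1e-4 times signed flow.
toy_returns = [0.01, -0.02, 0.03]
toy_flow = [100.0, -200.0, 300.0]
toy_volume = [1e6, 2e6, 3e6]
toy_lambda = price_impact_slope(toy_returns, toy_flow)
toy_amihud = amihud_ratio(toy_returns, toy_volume)
```

Both measures rise when a given amount of trading moves the price more, which is why they are natural alternatives for the same quantity.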
Student-t uninformed demand keeps imbalances ambiguous, flattening impact and slowing price discovery from order flow.
When is a large trade news, and when is it a liquidity shock? We study this question in a sequential competitive limit order book with asymmetric information. In our model, liquidity suppliers observe aggregate order flow but not its decomposition into informed demand and uninformed liquidity demand. We model uninformed order flow with Student-$t$ tails, interpreted as a reduced form for rare liquidity regimes. The tail index of liquidity demand determines how informative large trades are. With thin-tailed noise, large order imbalances are quickly interpreted as private information. With heavy-tailed liquidity demand, the same imbalances remain plausibly liquidity-driven. This liquidity-tail ambiguity flattens and concavifies price impact, slows learning from order flow, and delays the decline of adverse-selection premia. We characterize equilibrium through a fixed-point equation for the marginal-cost schedule. Heavy-tailed liquidity demand changes the mathematics of equilibrium: the Gaussian monotonicity and compactness arguments fail because remote liquidity states remain pricing-relevant at polynomial order. We construct fixed points on a tail-controlled compact class and study learning and large-order asymptotics along selected monotone branches. Repeated order flow reveals the fundamental value under stable information-rate conditions, but heavier liquidity tails slow finite-horizon price discovery. Large-order impact obeys regular-variation asymptotics whose exponents depend on the liquidity-tail index, informed competition, and posterior beliefs. The model identifies liquidity tail risk as a state variable for market impact, spread resilience, and the informativeness of large trades.
Timing-based tilts across asset classes can drive much of the risk and return of a diversified cross-asset portfolio. The standard approach forecasts returns and then optimizes weights. We instead study an end-to-end AI-based policy that maps market states directly to portfolio weights, and we then ask when this one-step modeling approach outperforms simple rules-based strategies. We train these policies on the sixteen most liquid CME futures, where an edge is unlikely to be due to illiquidity, using a differentiable Sharpe ratio loss function, and we benchmark them against equal weighting, risk parity, and time-series momentum. The learned policies rank above the rules on the pooled cross-asset portfolio and in several sub-asset classes, but not uniformly. In gross terms, an LSTM and a transformer-based architecture perform comparably out-of-sample, but diverge when we consider transaction costs. The transformer generates the stronger learned policy, trades far less than the LSTM, and matches or exceeds equal weighting through moderate cost.
Five-minute Bitcoin markets show settlement spikes and reversals while fifteen-minute versions do not, confirming the model's horizon remedy.
Prediction markets increasingly list contracts settling on an asset price that holders can move by trading the underlying. We build a model showing that such contracts transfer wealth from prediction-market liquidity traders to manipulators and harm price discovery in the underlying, even as it becomes more liquid. After the launch of Polymarket's five-minute Bitcoin contract, settlement-time spot order flow spikes, causing large price reversals after settlement. Manipulators capture a large amount of profit, mostly from retail. Manipulation is largely absent in the fifteen-minute contracts: lengthening the contract horizon removes it, providing the market-design remedy our model and evidence support.
We develop a signature-based framework for optimal execution in statistical arbitrage strategies with path-dependent predictive signals. Both the alpha process and the trading speed are modelled as linear functionals of the truncated signature of a time-augmented market path, placing signal generation and execution on the same truncated signature basis. This allows the trading rule to react to the realised history of the signal while accounting for temporary impact, inventory exposure, terminal liquidation, and approximate dollar neutrality. The main contribution is a quadratic reduction theorem: within the class of signature-linear trading speeds, the restricted path-dependent execution problem becomes a finite-dimensional concave quadratic programme in the policy coefficients. In synthetic experiments under a mean-reverting log-spread model, the fitted policy achieves a higher return on turnover than a classical z-score threshold benchmark. We show how the same workflow can be deployed on a historical equity pairs-trading backtest, where the fitted signature policy again outperforms the benchmark in accounting terms.
A decomposition isolates magnitude shrinkage as the sole driver, matching bid-ask bounce rather than directional reversal.
SPY's lag-1 return autocorrelation ($\hat\rho(1)=-0.081$, $z=-7.4$) is among the most significant regularities in empirical equity finance, yet the standard variance-ratio (VR) test cannot determine whether it reflects directional reversal or magnitude shrinkage - phenomena with entirely different trading implications. We develop the Fourier-Residue Identity (FRI), which decomposes return autocorrelation into a sign ($k=2$) and a magnitude ($k=4$) channel, each independently testable and neither redundant. Applied to six US instruments over 1993--2026 and a 21-instrument cross-asset panel, the FRI delivers a sharp microstructure diagnosis. The lag-1 autocorrelation in SPY is driven entirely by magnitude: the FRI sign test is insignificant ($p=0.11$) while the full test achieves $p<10^{-12}$. A large move yesterday predicts a smaller move today regardless of direction - the fingerprint of bid-ask bounce and non-synchronous constituent staleness, not directional reversal. At lag 3, a significant directional reversal ($p=0.02$) invisible to the scalar ACF reveals a separate partial-price-adjustment channel. We prove the Fejér identity $\mathrm{VR}(q)=1+2C_q$ (confirmed to $<0.001$ on all series), giving the Lo-MacKinlay test a spectral interpretation, and introduce a subsample diagnostic $R_N=G_{N/2}/G_N$ that classifies equity autocorrelation as structural ($R_N\to 1$) rather than sampling noise ($R_N\to\sqrt{2}$). The cross-asset panel shows mean reversion confined to exchange-traded equities and sovereign bonds; credit ETFs, commodities, FX, and crypto are indistinguishable from random walks. All estimators pass 27 unit tests; Monte Carlo confirms correct 5% size under GARCH.
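The variance-ratio identity quoted above can be illustrated with a short sketch. The abstract does not define $C_q$; the code assumes it is the Fejér-weighted (triangular) sum of autocorrelations, $C_q=\sum_{k=1}^{q-1}(1-k/q)\,\rho(k)$, which is the standard form of the variance ratio. The function name is hypothetical.

```python
def variance_ratio_from_acf(rho, q):
    """Return VR(q) = 1 + 2 * C_q from autocorrelations rho(1), rho(2), ...

    Assumption: C_q is the triangular-weighted sum
    sum_{k=1}^{q-1} (1 - k/q) * rho(k); rho[k - 1] holds rho(k).
    """
    if q < 2:
        raise ValueError("q must be at least 2")
    if len(rho) < q - 1:
        raise ValueError("need autocorrelations up to lag q - 1")
    c_q = sum((1 - k / q) * rho[k - 1] for k in range(1, q))
    return 1 + 2 * c_q


# With only the reported lag-1 value, VR(2) = 1 + rho(1).
vr2_spy = variance_ratio_from_acf([-0.081], 2)
```

At $q=2$ the identity collapses to $\mathrm{VR}(2)=1+\rho(1)$, so a negative lag-1 autocorrelation pushes the ratio below one whether it comes from the sign channel or the magnitude channel.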
We study whether a scaling-law-style inference-compute frontier appears in limit order book prediction. Using FI-2010 and a suite of models ranging from small decision trees to neural LOB architectures, we find that the realized empirical frontier of predictive loss versus structural forward work is well summarized by a power law. In particular, with MLPLOB held out as an architecture family, a power-law fit to the low- and mid-compute non-MLPLOB frontier extrapolates across multiple orders of magnitude and attains $R^2=0.941$ on the excluded high-compute MLPLOB target frontier.
A similar exercise in latency space gives substantially weaker results, showing that latency is not merely noisy compute. We use this gap to motivate FastBiNLOB, a dense axis-separable LOB mixer built from hardware-friendly temporal and feature mixing operations. In a five-seed experiment, FastBiNLOB exceeds the published $y_{10}$ and $y_{100}$ macro-F1 targets at notably lower latency than existing published SOTA architectures.
Commodity futures can be represented hierarchically, with underlying assets at the upper level and individual futures contracts at the lower level. Entities at each level can be connected by edges reflecting inherent correlations, with cross-level edges capturing contract-to-underlying asset connections. Building on our observations of these structures, we propose a hierarchical graph learning approach for calendar spread (CS) strategies in commodity futures markets, addressing two significant gaps in the machine-learning literature: (i) the absence of learning-based methods for CS strategies in futures markets, and (ii) the lack of consideration of maturity-dependent interrelationships across commodity futures. We first establish the efficacy of CS strategies by analytically showing that CS strategies can possess higher risk-adjusted returns, measured by the information ratio, and lower risk, measured by variance and delta, than long-only strategies. We then introduce a method to convert learning-based predictions into CS positions. Next, we develop a hierarchical graph learning method that predicts futures price movements by utilizing the maturity-dependent interrelationships, thereby yielding a CS trading algorithm. Empirical results on commodity futures markets traded on the Chicago Mercantile Exchange Group demonstrate that our method outperforms benchmark models in both prediction and trading performance. We find that maturity-dependent interrelationships across commodity futures are instrumental in prediction and that CS trading based on hierarchical graph learning is effective for statistical arbitrage.
We investigate the evolving structure of interactions in cryptocurrency markets using a network-based framework constructed from high-frequency price data spanning 2020-2025. Directed and weighted networks are constructed from statistically significant Granger causal relationships between cryptocurrency log-returns, enabling us to quantify the flow of influence across assets. We find that normalized returns exhibit heavy-tailed distributions, consistent with the presence of large intermittent fluctuations and in line with stylized facts of financial markets. The resulting networks display pronounced heterogeneity in link weights and nodal strengths, indicating that a small subset of cryptocurrencies contributes disproportionately to market dynamics. By ranking cryptocurrencies based on their nodal out-strength, we uncover a dynamically evolving hierarchy of influence. Ethereum consistently emerges as the most influential asset, while Bitcoin shows a gradual decline in its relative importance. The ranking structure exhibits substantial temporal variability, with multiple cryptocurrencies entering and exiting the top positions over time. Our findings reveal a highly competitive and non-stable organization of the cryptocurrency ecosystem.
We test the square-root law (SRL) of market impact on a single U.S. large-capitalisation equity, Apple Inc. (AAPL), using the full Nasdaq TotalView-ITCH market-by-order feed over 178 trading days (2 December 2024 -- 19 August 2025; ~0.5 billion events). Without broker-tagged parent orders, we reconstruct metaorders from the anonymous tape and calibrate impact as $I/\sigma_D = c\,(Q/V_D)^{1/2}$ with the exponent fixed at the universal value $1/2$. We find $c_{\rm raw} = 0.69$ (bias-corrected $c_{\rm eff} = 0.34$), conditional impact tracking $Q^{1/2}$, and a size-distribution tail exponent $\beta = 1.54 \pm 0.15$ -- both consistent with the worldwide cross-section. A direct model comparison decisively prefers the square-root form over linear ($\Delta{\rm AIC}=22$) and logarithmic impact, and the prefactor holds ($c_{\rm raw} \in [0.63, 0.77]$) across every reconstruction setting. Two structural tests confirm the impact is genuine: shuffling trade signs collapses directional impact to chance (86% to 51%); and scrambling event chronology destroys the SRL (0 of 80 calibrations remain viable). The underlying order flow is long-memory ($\gamma=0.66$) while the price stays diffusive (Hurst 0.49) -- the two ingredients of the universality theories. The prefactor is stable across 32 weekly walk-forward re-calibrations. To our knowledge this is the first confirmation of the square-root law on a U.S. equity derived purely from anonymous order flow, without broker-tagged parent orders.
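With the exponent fixed at $1/2$, calibrating the prefactor $c$ is a one-parameter least-squares problem. The sketch below is one plausible reading, not the paper's procedure: it regresses normalised impact $I/\sigma_D$ on $(Q/V_D)^{1/2}$ through the origin. The function name and the toy numbers are hypothetical.

```python
import math


def fit_srl_prefactor(participation, normalised_impact):
    """Least-squares c in y = c * sqrt(x), with no intercept.

    participation:      Q / V_D for each reconstructed metaorder
    normalised_impact:  I / sigma_D for the same metaorders
    """
    num = sum(y * math.sqrt(x) for x, y in zip(participation, normalised_impact))
    den = sum(participation)  # equals the sum of sqrt(x) ** 2
    if den <= 0:
        raise ValueError("participation rates must be positive")
    return num / den


# Noise-free toy data generated with c = 0.69 recovers that value.
toy_x = [0.001, 0.01, 0.04]
toy_y = [0.69 * math.sqrt(x) for x in toy_x]
toy_c = fit_srl_prefactor(toy_x, toy_y)
```

Fixing the exponent puts all the fitting freedom in $c$, which is what makes the prefactor comparable across reconstruction settings.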
Automated Market Makers based on concentrated liquidity, such as Uniswap v3, significantly improve capital efficiency but expose Liquidity Providers (LPs) to adverse selection costs, formalized as Loss-Versus-Rebalancing (LVR). While theoretical literature quantifies these costs, the interplay between realistic blockchain microstructure and endogenous pricing mechanisms remains under-explored. This paper develops a granular Agent-Based Model of a Uniswap v3 pool interacting with a stochastic reference market governed by Heston volatility dynamics. The framework incorporates discrete block propagation, mempool latency, and a heterogeneous population of agents, including latency-sensitive arbitrageurs, smart routers, Maximal Extractable Value searchers, and active LPs benchmarked against a frictionless rebalancing strategy. We propose and evaluate dynamic fee schedules driven by volatility and order-flow toxicity proxies intended to compensate LPs for adverse-selection losses. Our simulations investigate the conditions under which LPs can achieve positive hedged Profit and Loss (fees minus LVR). The analysis suggests that dynamic fee adjustments can improve hedged LP profitability mainly by increasing fee income in states associated with stale-price risk. Depending on the configuration, these rules may also affect realized LVR, but the current aggregate results support compensation for LVR more directly than a reduction of LVR itself.
We propose a deterministic adversarial market model in which apparent randomness emerges endogenously from the interaction between a market mechanism and a population of predictive traders. Unlike a classical generative adversarial network, the model does not attempt to imitate an external empirical data distribution and does not inject random noise into a generator. The market is represented by a deterministic binary return path, while traders learn predictive strategies from observed in-sample history and trade on an out-of-sample continuation. The market then adapts against the traders by reducing their predictive and trading edge.
The central experiment begins with a smooth, highly predictable market path. Traders with multiple lookback windows and multiple holding periods learn to predict future cumulative returns. Initially, these traders earn large out-of-sample profits. After adversarial market adaptation, their out-of-sample profitability collapses toward zero. Importantly, in the final clean specification, no explicit sign-balance, transition-rate, or autocorrelation penalties are imposed. Nevertheless, the out-of-sample return sequence becomes balanced, has transition rate close to one half, has low autocorrelation, and passes block-based distributional diagnostics. In a medium-size experiment with $T_{\mathrm{IS}}=2000$ and $T_{\mathrm{OOS}}=10000$, the out-of-sample positive-return fraction is $0.5010$, the transition rate is $0.4896$, and the maximum absolute autocorrelation is $0.0275$. Binary return blocks transformed into dyadic variables are close to uniform on $[0,1]$, and normalized block sums are broadly consistent with a standard normal law. These results support the hypothesis that market randomness can arise as the endogenous residue of arbitrage pressure rather than from exogenous stochastic shocks.
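The three headline diagnostics can be computed from a binary return path as in the sketch below. It assumes the path is coded as $+1/-1$, that "transition rate" means the fraction of consecutive steps on which the sign flips, and that autocorrelation is the usual sample estimate; the abstract does not spell out these definitions.

```python
def binary_path_diagnostics(signs, max_lag=5):
    """Return (positive fraction, transition rate, max |autocorrelation|)."""
    n = len(signs)
    pos_frac = sum(1 for s in signs if s > 0) / n
    # Fraction of adjacent pairs whose signs differ.
    flips = sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    transition_rate = flips / (n - 1)
    m = sum(signs) / n
    var = sum((s - m) ** 2 for s in signs) / n
    if var == 0:
        raise ValueError("constant path: autocorrelation is undefined")

    def acf(k):
        return sum((signs[i] - m) * (signs[i + k] - m) for i in range(n - k)) / (n * var)

    max_abs_acf = max(abs(acf(k)) for k in range(1, max_lag + 1))
    return pos_frac, transition_rate, max_abs_acf


# A perfectly alternating path is balanced but maximally predictable.
alt_stats = binary_path_diagnostics([1, -1] * 50, max_lag=1)
```

The alternating example shows why a balanced sign fraction alone is not enough: the transition rate and the autocorrelation both need to be near their random-walk values.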
We study the evolution of transaction speed and fees from January 2024 through March 2026, comparing Ethereum Mainnet and its Layer 2 (L2) networks, as well as Solana and Polygon. Ethereum has undergone upgrades that have increased block size and blob count. These upgrades have doubled transactions per second (TPS) on both the Mainnet and the L2 networks. Mainnet median fees have fallen from over \$2 to under \$0.02, and L2 median fees have fallen more than 95% from \$0.05 to \$0.0015. We forecast that Mainnet median fees will converge with Solana in August 2027, but TPS will remain below 100 until 2034. The L1 Strawmap, proposing EIP-7938, a potential exponential increase in the gas limit, brings the Mainnet to only 100 TPS in January 2028. With continued blob expansion, L2s will surpass Solana TPS in March 2029 and have lower median fees by October 2026.
Simulating financial markets at scale with multi-agent (Agent-Based) models is critical for market design, regulatory stress-testing, and reinforcement learning, but traditional CPU simulators are bottlenecked by sequential processing while vectorized GPU frameworks suffer from kernel-launch overhead and redundant global-memory round-trips. We formalize, analyze, and evaluate a reusable parallel design pattern: persistent, state-carrying clearing for iterative multi-agent reductions. By caching mutable simulation state in thread-block shared memory across step boundaries, aggregating agent actions via shared-memory atomics, and resolving the clearing function cooperatively, the pattern reduces the per-step critical-path depth from Theta(L+A) for sequential clearing (L price-grid ticks, A agents) to Theta(log L + ceil(A/L)) and makes global-memory traffic independent of the step count. We implement this in KineticSim, a lightweight GPU execution engine that simulates massive ensembles of limit-order books in parallel, reaching a peak throughput of over 54.7 billion agent-events per second. On a fixed workload it delivers speedups of 3406x over CPU (NumPy), 27.8x over PyTorch GPU, 42.8x over JAX GPU, and 8.4x over a naive custom CUDA baseline, while using roughly an order of magnitude less GPU memory than PyTorch. Across 53 configurations the two custom CUDA engines produce bitwise-identical order books, and aggregate statistics match the CPU reference to within 0.1%. The pattern generalizes to other iterative multi-agent workloads requiring state-persistent, block-localized reductions.
We study the fee policy of a liquidity provider (LP) in a constant-product automated market maker (AMM) whose fee can be adjusted continuously, as enabled by programmable hooks. Building on the loss-versus-rebalancing (LVR) framework of Milionis et al. (2022) and its extension to nonzero fees by Milionis et al. (2024), we model the LP's wealth relative to the continuously rebalanced benchmark as a controlled process in which the fee governs two opposing forces: it raises revenue per uninformed trade while discouraging uninformed volume, and it widens the no-arbitrage band, which lowers the rate at which arbitrageurs extract value. Because the fee enters only the drift of relative wealth and never its diffusion, the LP's expected-utility problem reduces to an ergodic control problem whose solution is a pointwise volatility feedback. We prove that the growth-optimal fee is independent of the LP's wealth and of its constant relative risk aversion, that it collapses to a static constant when volatility is constant, and that it is strictly increasing in instantaneous variance, so that the optimal schedule is pro-cyclical. When volatility is stochastic, we characterise the optimal fee through a scalar ergodic Hamilton-Jacobi-Bellman equation and a linear Poisson equation, solved by a finite-difference scheme. We further show that the optimal fee is invariant to price jumps under logarithmic preferences, relate the optimal fee to a stylised model of competition among venues, and treat gas costs through an impulse-control dead-band. In a calibration to liquid large-capitalisation conditions, the optimal dynamic fee weakly dominates every static and volatility-linked heuristic fee on each simulated path, improving the LP's growth rate over the best static fee by a modest but uniformly positive margin, with a dead-band rendering gas costs negligible.
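The way a fee widens the no-arbitrage band can be illustrated with a stylised calculation. The sketch assumes a proportional fee charged on the input amount of a constant-product pool and ignores gas costs; the paper's exact convention may differ, and the function names are hypothetical.

```python
def no_arbitrage_band(external_price, fee):
    """Pool prices at which no arbitrage trade is profitable.

    Assumption: buying from the pool costs pool_price / (1 - fee) at the
    margin, and selling to it yields pool_price * (1 - fee).
    """
    if not 0 <= fee < 1:
        raise ValueError("fee must lie in [0, 1)")
    lower = external_price * (1 - fee)
    upper = external_price / (1 - fee)
    return lower, upper


def arbitrage_is_profitable(pool_price, external_price, fee):
    """True when the pool price sits outside the no-arbitrage band."""
    lower, upper = no_arbitrage_band(external_price, fee)
    return pool_price < lower or pool_price > upper


band_30bp = no_arbitrage_band(100.0, 0.003)
band_100bp = no_arbitrage_band(100.0, 0.01)
```

A higher fee gives a wider band, so the pool price can drift further from the external price before arbitrageurs trade against it.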
The digitization of financial markets has produced two classes of platforms that price, in principle, the same state-contingent payoffs: centralized crypto-option exchanges and blockchain-based prediction markets. This paper provides the first option-implied benchmark test of prediction-market pricing for cryptocurrency threshold contracts. For each hour in a matched sample, we compare the Polymarket Yes price with the discounted risk-neutral binary value implied by a listed Binance call option on the same underlying, strike, and maturity, and study the gap between them. In the main September 2023 Bitcoin contract, the mean pricing gap equals 5.6 percentage points across 214 hourly observations ($t = 6.46$, $p < 10^{-9}$). Pooling three Binance-compatible Bitcoin threshold markets yields a mean gap of 6.3 percentage points across 287 observations, robust to HAC and block-bootstrap inference. The gap is persistent, with an AR(1) half-life of roughly four hours, yet mean-reverting, consistent with slow information transmission between segmented venues rather than mechanical noise. Cross-sectional regressions reveal that the wedge is largest at low option-implied probabilities and long maturities, a pattern consistent with speculative demand for prediction-market contracts rather than measurement error. A delta-hedged arbitrage proxy remains profitable after conservative transaction costs, though with marginal statistical precision. A Deribit extension on the same three Bitcoin contracts produces a larger pooled gap of 11 percentage points, while a smaller Ethereum exercise yields mixed evidence. The results demonstrate that digital fragmentation of financial markets generates systematic, persistent pricing wedges even for economically identical payoffs.
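Two quantities in this abstract have simple closed forms, sketched below. The abstract does not say how the binary value is extracted from the call option; the first function assumes Black-Scholes with the call's implied volatility, under which the discounted binary value is $e^{-rT}N(d_2)$. The second converts an AR(1) coefficient into a half-life. Function names and inputs are hypothetical.

```python
import math


def bs_binary_call(spot, strike, vol, t_years, rate=0.0):
    """Discounted risk-neutral probability that S_T > K under Black-Scholes."""
    d2 = (math.log(spot / strike) + (rate - 0.5 * vol ** 2) * t_years) / (
        vol * math.sqrt(t_years)
    )
    n_d2 = 0.5 * (1.0 + math.erf(d2 / math.sqrt(2.0)))  # standard normal CDF
    return math.exp(-rate * t_years) * n_d2


def ar1_half_life(phi):
    """Periods for a shock to decay by half in x_t = phi * x_{t-1} + e_t."""
    if not 0 < phi < 1:
        raise ValueError("half-life is defined here only for 0 < phi < 1")
    return math.log(0.5) / math.log(phi)


def pricing_gap(yes_price, option_implied_value):
    """Prediction-market price minus the option-implied benchmark."""
    return yes_price - option_implied_value


atm_binary = bs_binary_call(100.0, 100.0, 0.2, 1.0)
hourly_phi = 0.5 ** 0.25  # coefficient implying a four-hour half-life
```

For hourly data, a half-life of about four hours corresponds to an AR(1) coefficient near 0.84.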
A reaction-diffusion model unifies LMF long memory with square-root meta-order impact while distinguishing event and physical time.
Starting from a coupled discrete reaction--diffusion formulation for the lit and latent order books, with non-uniformly sampled event times and meta-order source terms, we show how two familiar market-microstructure regularities can emerge from this framework: the long memory of trade signs associated with the Lillo--Mike--Farmer (LMF) theory and the square-root law (SQRL) of meta-order impact. We use the locally linear order book and constant participation-rate execution in the front dynamics to reduce the dynamics to a Volterra equation, whose leading-order solution yields the well-known results of a concave impact trajectory and a completion impact proportional to the square root of the meta-order size. We then use the interface representation to show how heavy-tailed Pareto meta-order lengths generate power-law trade-sign autocorrelations through the source term. These derivations are familiar; what differs here is that we reinterpret them to make clear that the LMF law is an event-time sign-memory statement, whereas the square-root law is a physical-time viability statement, in which subordination can alter the calendar-time impact trajectories depending on the mappings and interpolation used to set continuum operational time.
An important question for an algo trader working an order is to understand if their actions are moving the market against them -- i.e., causing market impact. The conventional answer usually is one of two: (i) monitor price slippage in real-time, potentially reducing adverse activity with increased slippage, or (ii) do away with dynamic trading adjustments and rely on semi-static rules based on ex-post estimates of slippage over a large sample of events.
Real-time monitoring fails because reliably estimating slippage is statistically expensive -- it requires hundreds of fills before it can be told apart from background volatility. More fundamentally, however, it does not establish causality. Observed adverse price moves may be caused by the trader's own actions, or by an unrelated participant competing for the same liquidity and capturing the same alpha. The optimal response (say, slow down vs. speed up) is opposite in the two cases.
We propose a method that detects price impact, on a per-action basis, by measuring the timing synchronicity between a trader's actions and subsequent adverse market events. At heart, the method is a test for statistical surprise in the timing of adverse events following a trader's action.
To be clear, we make a leap of faith here: we assume that surprisingly fast adverse market events are evidence of causation, that is, that the action triggered them -- a direct signature of impact and information leakage.
Validating it requires real execution data; we set out the empirical tests that would do so.
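One simple way to make "statistical surprise in timing" concrete is sketched below. This is an assumption for illustration only, since the abstract does not specify the test: adverse events are taken to arrive as a Poisson process with a known background rate, so the chance that one lands within a delay $\tau$ of an unrelated action is $1-e^{-\lambda\tau}$. The function names and the choice of bits as the unit are hypothetical.

```python
import math


def timing_p_value(delay, background_rate):
    """P(an adverse event occurs within `delay`) under a Poisson null.

    Assumption: adverse events arrive independently of the trader at
    `background_rate` events per unit time.
    """
    if delay <= 0 or background_rate <= 0:
        raise ValueError("delay and background_rate must be positive")
    return -math.expm1(-background_rate * delay)  # 1 - exp(-rate * delay)


def surprise_bits(delay, background_rate):
    """Surprise of so fast an adverse event, in bits: -log2(p-value)."""
    return -math.log2(timing_p_value(delay, background_rate))


# An event arriving at the median waiting time carries one bit of surprise.
median_surprise = surprise_bits(math.log(2.0), 1.0)
fast_surprise = surprise_bits(0.001, 1.0)
```

Under this null, the shorter the delay relative to the background waiting time, the larger the surprise, and repeated large surprises after one's own actions are what the causal reading would treat as evidence of impact.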
We consider the role of a continuum operational time u and its mapping to calendar time t and how these relate to event time for option pricing problems. We derive option-pricing equations from an operational-time Markov lattice rather than from a calendar-time diffusion. The primitive model is a nearest-neighbour log-price lattice with state- and time-dependent transition probabilities. Its Chapman-Kolmogorov decomposition yields discrete forward and backward equations, which converge under local finite-variance scaling to the usual continuum adjoint pair. In price variables, the backward equation gives a generalized European pricing PDE and reduces to Black-Scholes-Merton under the risk-neutral drift restriction and constant volatility. Interpreted as a reaction-boundary model for limit-order-book mid-prices, the construction identifies local volatility with an activity-rescaled risk-neutral bid-ask reaction-boundary variance. The framework separates the operational kernel, calendar-time projection, and pricing-measure choice, to clarify how unspanned clock, jump, or renewal risks can lead to incomplete-market pricing.
This study investigates whether regime-dependent volatility forecasting and machine-learning-based return prediction can be jointly integrated to improve both statistical forecasting performance and economic strategy outcomes in equity markets. Using high-frequency CSI 300 Index data from 2005 to 2023, a sequential two-stage framework is developed. In the first stage, realized volatility is modeled using regime-augmented HARQ specifications combined with Markov-switching GJR-GARCH filtering to capture long-memory dynamics, asymmetry, and structural market regimes. In the second stage, volatility forecasts, regime indicators, and return-related predictors are incorporated into an XGBoost return-prediction model estimated through a strictly walk-forward out-of-sample procedure. The empirical results demonstrate that regime-aware volatility forecasting consistently outperforms baseline HARQ models across forecast evaluation metrics and is generally supported by formal forecast comparison tests. In contrast, return predictability remains weak, state-dependent, and concentrated primarily in low-volatility regimes. Although naive predictive trading strategies generally fail after accounting for realistic transaction costs, carefully designed implementations incorporating volatility scaling, low-volatility gating, threshold calibration, and turnover controls can improve defensive economic performance. The findings suggest that the practical value of predictive systems in financial markets may depend less on generating strong unconditional return forecasts and more on transforming weak state-dependent signals into economically robust portfolio allocation rules. Overall, the study contributes by integrating econometric volatility modeling, regime classification, machine-learning return prediction, and implementation realism within a unified framework.
This paper axiomatizes the bid-ask market maker's quoting rule. A quoting rule maps the maker's state, namely inventory, belief, variance, trade intensity, and informed-trader fraction, to a bid-ask pair. Eight natural axioms, together with six environmental assumptions on the maker's inventory cost, force a unique three-parameter family: the mid-quote is linear in inventory, and the spread decomposes additively into inventory and adverse-selection components. Each of the three parameters is identified from a distinct moment of the observable quoting rule, with the three identifications mutually decoupled. The eight axioms partition into a four-axiom indispensable core, one structural choice, and three modularity extensions. Two structural corollaries follow: the latent inventory cost function is recoverable from the limit order book, and a sharp phase transition separates a functioning regime from a frozen one. A closing meta-theorem identifies four features invariant across all admissible structural primitives within the axiom system. To our knowledge, this is the first forced-uniqueness axiomatization of the quoting rule.
We consider the interaction between centralized trading and decentralized Proof of Stake (PoS) blockchain ecosystems. Motivated by the increasing dominance of centralized exchanges and the institutionalization of crypto markets, we study how trading activities on centralized exchanges affect staking behavior, token allocation, and decentralization within a PoS blockchain. We formulate a continuous-time mean field model, where the miners simultaneously act as validators in the PoS protocol and traders in a centralized market with price impact. Under suitable assumptions, we establish the local well-posedness of the mean field system, and derive a semi-explicit characterization of the equilibrium trading strategy. Numerical results suggest that centralized trading activities may enhance staking participation, and promote decentralization of the staking distribution through market incentives. We also study the effects of transaction costs and token supply mechanisms on the equilibrium staking ratio and concentration profile. These results illustrate how market microstructure and centralized liquidity provision can exert significant influence on decentralized blockchain protocols.
This study addresses the optimal execution of large stock sell programs by introducing TT-DAC-PS (Twin-Target Deterministic Actor-Critic with Policy Smoothing), a deterministic actor-critic architecture that combines twin exponential-moving-average critic targets with pessimistic min backup, TD3-style target policy smoothing noise, delayed actor updates, and conservative Q regularisation to curb overestimation. Exploration uses Ornstein-Uhlenbeck (OU) noise with a hybrid schedule: deterministic episode-wise decay, variance-guided adjustment based on recent reward dispersion, and a Soft Actor-Critic (SAC)-style temperature that is learned and mapped to the noise scale. The environment integrates Almgren-Chriss (AC) trade impact with Limit Order Book (LOB) prices and volumes, normalised state features, per-step volume participation caps, and a utility-based reward. The trade execution algorithm is applied to LOB data for ten U.S. stocks. Performance is assessed against reinforcement-learning baseline algorithms, including Proximal Policy Optimisation (PPO), Soft Actor-Critic (SAC), and Advantage Actor-Critic (A2C), as well as alternative trade execution algorithms, including Time-Weighted Average Price (TWAP), Volume-Weighted Average Price (VWAP), and AC. The proposed model consistently reduces mean implementation shortfall percentage with competitive variance, outperforming classical baselines and standard reinforcement-learning benchmark models.
Large language models (LLMs) and agentic systems are increasingly proposed for financial trading, yet their reported performance remains difficult to compare because studies vary in data provenance, temporal split discipline, execution timing, turnover treatment, and transaction-cost modeling. This article presents a targeted topical review and reproducibility audit of execution realism in LLM-based trading research. A coded evidence matrix covering 30 trade-relevant primary studies is used to assess point-in-time controls, split transparency, held-out evaluation, cost and turnover treatment, execution semantics, universe definition, and artifact release. Across the audited sample, architecture reporting is generally clearer than the evaluation assumptions needed to judge whether a trading result is economically interpretable or reproducible. A 10-equity worked example is included only as a methodological scaffold to illustrate how explicit friction and timing choices can materially compress active-strategy results. The main conclusion is that the next useful step for LLM trading research is not only better agent design, but also clearer reporting standards for execution realism, reproducibility, and evaluation comparability.
We test whether large language models (LLMs) add value in commodity portfolio construction when the information set and implementation rules are held fixed across strategies. A Hawkish Agent (inflation-tightening prior), a Dovish Agent (growth-easing prior), a Debate Agent, and a deterministic z-score Rule Agent each receive identical FRED macro z-scores and route their tilt signals through the same portfolio engine. Across 124 weekly rebalancing dates spanning the 2023 U.S. rate peak and the 2024-2025 soft landing, all three LLM strategies outperform the Rule Agent in Sharpe terms; the Hawkish and Debate Agents record the largest gains ($\Delta$ Sharpe = +0.044 and +0.040, both $p < 0.10$ under a block bootstrap) and preserve a net-of-cost advantage over the passive inverse-volatility benchmark at one-way trading costs up to 30 basis points, while the Rule Agent's thin margin over passive disappears at approximately 5 basis points. The Debate Agent does not outperform the best single agent ($\Delta$ Sharpe = -0.004, $p = 0.769$); its contribution is bias correction -- averaging out the Dovish Agent's miscalibrated prior -- rather than deliberation-generated return. The performance advantage is concentrated in the soft-landing sub-period, the evaluation window spans a single rate cycle, and the reported $p$-values are unadjusted for multiple comparisons. Within these limits, the results suggest that an LLM acting as a constrained macro-interpretation function can add modest but economically meaningful value over a transparent rule layer, though the margin is small and its persistence beyond this sample is unknown.
Rejection criteria net positive on 4874 observations but profits hinge on three trades
This paper measures hour-of-day effects, filter precision, fragility, and realised yield in a 15-day paper-traded deployment of an autonomous memecoin trading system on Solana decentralised exchanges. The 190-trade sample (March 29 to April 12, 2026) shows a 40.5 percent win rate, mean per-trade return of +0.62 percent, cumulative +117.7 percent (net SOL +0.039), skewness -1.21, excess kurtosis 6.61. A Mann-Whitney U test of three poorest-performing UTC hours (2, 13, 23) against the others yields U = 1,274, p = 0.22; directional but not significant at n = 190. The three hours were selected in-sample, so the comparison is exploratory, not confirmatory. A parallel counterfactual rejection-tracking system collected 4,874 forward-sample observations across 184 distinct rejection events. Of those events, 17.9 percent reached a 50 percent drawdown from reference within 24 hours; 26.0 percent of forward samples recorded the rejected token below half-reference. The filter stack avoided these realised drawdowns: evidence that the rejection criteria are net-positive against forward-market outcomes. Fragility is the principal caveat. Removing the top three trades (1.6 percent of sample) flips cumulative return unprofitable. Profitability rests on a small number of large winners and is structurally fragile. The dataset and audit script are deposited under CC-BY-4.0 (Zenodo DOI 10.5281/zenodo.20043302).
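The fragility check described in this abstract (drop the largest winners, then recompute the cumulative return) can be sketched as below. This is a minimal illustration, not the paper's audit script: the trade returns are synthetic placeholders, the function name `cumulative_without_top` is hypothetical, and summing per-trade returns is an assumption, although it is consistent with the reported figures (190 trades at a mean of +0.62 percent gives roughly +117.7 percent).

```python
def cumulative_without_top(returns, k):
    """Sum of per-trade returns after dropping the k largest ones."""
    ordered = sorted(returns)
    kept = ordered[:len(ordered) - k] if k > 0 else ordered
    return sum(kept)


# Synthetic per-trade returns in percent (illustrative only, not the paper's data).
trades = [-2.0] * 6 + [1.0] * 3 + [5.0, 6.0, 7.0]

full = cumulative_without_top(trades, 0)     # all trades included
trimmed = cumulative_without_top(trades, 3)  # three largest winners removed

print("cumulative, all trades:      ", full)
print("cumulative, top three removed:", trimmed)
```

A sign flip between the two numbers is the pattern the abstract calls structurally fragile: the total depends on a handful of outsized winners.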
Method records price and liquidity of filtered candidates to judge trading filters against observed results rather than backtests.
Algorithmic trading systems on decentralised exchanges (DEXs) reject most candidate tokens they evaluate. The counterfactual outcome of rejected candidates (what would have happened had the system entered) is rarely measured. This paper introduces Post-Rejection Follow-up Sampling (PRFS). A separate tracking subsystem samples each rejected token's price and liquidity at a configurable cadence, over a horizon of up to twenty-four hours. PRFS produces the data needed to evaluate filter precision against actual market outcomes of rejected candidates, not against synthetic backtest reconstructions. The methodology, data architecture, and deposit format are described in Section III. The companion dataset contains 67,000 forward-outcome observation rows across 2,997 rejection events spanning 457 unique mints, collected over a continuous eight-day window (2026-04-10 to 2026-04-19, UTC). Approximately 55 percent of rejection events receive at least one forward observation; coverage at the mint level is complete. The principal binding constraint on downstream classification is per-event horizon density, not event-level coverage. PRFS is dataset-independent. It generalises to any algorithmic decision system in which rejections substantially outnumber executions.
Identity from realized minus counterfactual returns constrains scaling at participant level and recovers square-root law when neutral
Decomposing impact as the difference between realized and counterfactual returns and requiring both to be diffusive yields an identity that restricts admissible impact scaling at the level of individual participants. This constraint implies the square-root law in the information-neutral regime and a crossover to linear impact under strong informational coupling, consistent with empirical observations. In the weak-coupling regime, cumulative market impact is itself diffusive -- a diagnostic that many propagator and latent liquidity models fail to satisfy.
We model a market with multiple dealers who compete for client order flow by dynamically updating their bid and ask quotes for a risky asset. Dealers aim to maximise expected profits while controlling inventory risk by skewing their quotes to attract offsetting order flow (internalisation) or by directly offloading positions in the market (externalisation). Using a variational approach, we derive a closed-form equilibrium for the resulting Nash competition, shedding light on key features of dealer market dynamics. We show that dealers relying on internalisation are compelled to increase their externalisation activity when competing with externalising dealers. This strategic shift in equilibrium leads to significantly higher hedging costs for all dealers and substantially wider spreads for clients.
Simulations find price discovery benefits can offset adverse selection costs despite local fluctuations
This paper studies how market informedness affects market makers' profitability in a computational market environment with heterogeneous learning agents. We develop an agent-based market model in which market makers differ in their information sets and inventory-risk aversion, prices form endogenously, fundamental values evolve exogenously, and market-taker order flow follows a state-dependent self-exciting process. The model provides a controlled computational laboratory for analyzing the interaction between informed trading, adverse selection, price discovery, and liquidity provision. We establish finite-horizon stability properties of the market-taker order-flow process and solve the market-making problem using multi-agent reinforcement learning with centralized training and decentralized execution. The results show that informed market order flow is particularly harmful when aggregate market informedness is low, exposing market makers to severe adverse-selection risk. However, as market informedness increases, market-maker profitability displays an overall upward trend despite local non-monotonicities arising from complex market dynamics and stochastic learning. This suggests that the price-discovery benefits of informed trading can offset its adverse-selection costs. The findings contribute to computational economics by showing how agent heterogeneity, endogenous price formation, and learning-based liquidity provision jointly shape market outcomes.
Arrovian fairness requires the weighted geometric mean while strategy-proofness requires the median, leaving only dictatorship.
No deployed automated market maker lets its liquidity providers vote on the trading function. We show this is structural, not an oversight. On the weighted-product family with $n \geq 3$ assets, no aggregation rule is at once fair and strategy-proof. Arrovian fairness forces a unique form, the weighted Aitchison centroid, the weighted geometric mean of the providers' preferred pools. But fairness forces mean-type aggregation and strategy-proofness forces median-type, and the only rule that is both is a single-provider dictator. The obstruction is sharp: it vanishes at $n = 2$, where a fair strategy-proof rule exists. Under the Frongillo--Papireddygari--Waggoner equivalence, the centroid is Genest's logarithmic opinion pool, and the impossibility transfers to externally Bayesian pooling.
This study aims to determine whether the application of Deep Reinforcement Learning (DRL) as a specialized execution overlay can enhance pair trading in highly volatile cryptocurrency markets. Although classical implementations of the strategy have proven successful in traditional equities, they frequently exhibit rigidity and suffer from severe divergence risks when applied to high-variance environments. To address this, we develop a hierarchical "Filter-then-Rank" pair selection methodology and a proprietary "Fixed Risk, Adaptive Mean" execution model. The system employs a Proximal Policy Optimization (PPO) agent with a Long Short-Term Memory (LSTM) layer to govern execution decisions within strict deterministic risk management boundaries. Evaluated on 1-hour interval data from the Binance USD-M Futures market, the optimized RL policy achieved an out-of-sample performance that substantially outperformed the heuristic baseline. A stationary circular block bootstrap robustness check confirms that the agent's risk-adjusted outperformance is statistically significant at the 10 percent level. Although falling marginally short of the stricter 5 percent threshold, this result highlights the extreme idiosyncratic variance characteristic of digital assets. Ultimately, this thesis contributes to the quantitative finance literature by introducing a hybrid architecture that combines statistical arbitrage with DRL execution policies. Furthermore, it delivers a novel framework for safe reinforcement learning via deterministic shielding, proving that anchoring a neural policy to statistically robust boundaries successfully mitigates severe divergence risks.
We introduce the Polymarket-v1 Database: the complete on-chain trade archive of Polymarket's first-generation CTF Exchange on Polygon, spanning 2022-11-21 to 2026-04-28 and covering the full contract lifecycle from first settlement to natural termination. The dataset comprises 1.20 billion trade records across 1.30 million markets with $61 billion in nominal volume. Its defining feature is 100% ground-truth aggressor direction derived from the blockchain settlement layer, a property unavailable in existing prediction market archives, which rely on heuristic inference. We use this truth-aligned archive to benchmark standard microstructure tools and document three findings. First, the tick rule and bulk volume classification achieve near-random aggregate accuracy (49.83% and 50.51%), but this masks a systematic, correctable price-level gradient driven by positive trade direction autocorrelation and concentrated market-making -- two structural features of prediction markets that violate the mean-reversion assumption embedded in classical classifiers. Second, these classification errors propagate into downstream metrics: inferred VPIN diverges substantially from ground-truth VPIN, and OFI estimates are directionally biased, with material consequences for Transaction Cost Analysis. Third, ground-truth microstructure quality predicts forecasting performance in ways that classification-based proxies cannot recover: True VPIN positively predicts Brier scores, while Gibbs spread negatively predicts them -- a selection effect reflecting that high-spread niche markets attract informed specialists rather than noise traders. Replacing ground-truth metrics with classified proxies attenuates both relationships, illustrating that measurement accuracy at the transaction level is a prerequisite for reliable inference about prediction market design and probability calibration.
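For readers unfamiliar with the tick rule benchmarked in this abstract, the following is a minimal sketch of the textbook classifier and of how its accuracy could be scored against ground-truth aggressor signs. The prices and true signs are made-up placeholders, the function names are hypothetical, and excluding unclassifiable trades from the score is an assumption; the paper's exact evaluation protocol is not reproduced.

```python
def tick_rule(prices):
    """Classify trades as +1 (buy) or -1 (sell) from successive price changes.

    An uptick is a buy, a downtick is a sell, and a zero tick inherits the
    previous classification. Trades before the first price change get 0.
    """
    signs = []
    last_sign = 0
    previous_price = None
    for price in prices:
        if previous_price is not None:
            if price > previous_price:
                last_sign = 1
            elif price < previous_price:
                last_sign = -1
        signs.append(last_sign)
        previous_price = price
    return signs


def accuracy(predicted, truth):
    """Share of classified trades (nonzero prediction) matching the true sign."""
    scored = [(p, t) for p, t in zip(predicted, truth) if p != 0]
    if not scored:
        return float("nan")
    return sum(p == t for p, t in scored) / len(scored)


# Illustrative trade prices and ground-truth aggressor signs (placeholders).
prices = [0.50, 0.51, 0.51, 0.49, 0.50]
truth = [1, 1, -1, -1, 1]

predicted = tick_rule(prices)
print(predicted, accuracy(predicted, truth))
```

The zero-tick branch is where persistent trade direction matters: the classifier simply carries forward its last guess, so any systematic pattern in repeated-price trades feeds directly into the error rate.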
We study the robustness of AMM-based on-chain price oracles to strategic manipulation. An attacker trades against constant product automated market makers (CPMMs) to distort an on-chain oracle, arbitrageurs restore cross-pool and cross-venue consistency, and an oracle designer chooses how to aggregate pool quotes.
Taking an efficient-market-hypothesis (EMH) view of the off-chain "true" price, we define the \emph{cost of manipulation} as the minimal mark-to-market loss that an attacker must incur to move the oracle by a given multiplicative factor. For independent CPMMs, we derive closed-form single-pool manipulation formulas and solve the attacker-designer game for weighted means and weighted medians, showing that liquidity weights maximize the minimum cost of manipulation within these classes for weighted medians (for any distortion level) and, for weighted means, locally as the distortion tends to zero. For larger distortions, weighted means become more fragile: optimal weights can depend on the target distortion and no single choice is uniformly optimal across distortion levels. In a frictionless CPMM model with cross-pool arbitrage, the manipulation cost depends only on the total quote depth and coincides across symmetric aggregators.
We extend this framework to multi-asset star architectures, confirming that liquidity weights remain optimal in the same sense. Finally, we bridge theory and practice by incorporating dwell times and rate limits, providing a quantitative yardstick to size oracles against the explicit economic costs of attack.
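To make the cost-of-manipulation notion concrete, here is a sketch of the textbook single-pool calculation for a fee-free constant-product pool, assuming the pool starts at the true price. Under those assumptions, moving the pool price by a factor $k$ costs the attacker $y(\sqrt{k}-1)^2/\sqrt{k}$ in quote units, where $y$ is the quote reserve. This is a standard derivation and is not claimed to be the paper's exact formula; the function names are hypothetical.

```python
import math


def cpmm_manipulation_loss(y_reserve, k):
    """Closed-form mark-to-market loss (quote units) of moving the price by factor k.

    Assumes a fee-free x * y = const pool whose initial price y / x equals
    the true off-chain price.
    """
    root = math.sqrt(k)
    return y_reserve * (root - 1.0) ** 2 / root


def cpmm_loss_by_simulation(x, y, k):
    """Same loss, obtained by explicitly trading along the invariant."""
    true_price = y / x                    # quote per unit of base
    root = math.sqrt(k)
    x_new, y_new = x / root, y * root     # keeps x * y fixed, price becomes k * true_price
    quote_paid = y_new - y                # negative if the attacker receives quote
    base_received = x - x_new             # negative if the attacker pays base
    return quote_paid - true_price * base_received


closed_form = cpmm_manipulation_loss(200.0, 4.0)
simulated = cpmm_loss_by_simulation(100.0, 200.0, 4.0)
print(closed_form, simulated)
```

The loss scales linearly with the reserve, which gives some intuition for why liquidity-based weights are a natural candidate when aggregating quotes across pools.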
We consider the problem of estimating the true Sharpe ratio of an asset selected for having the highest observed in-sample Sharpe ratio among many assets. We discuss estimators based on the polyhedral lemma, James Stein shrinkage, debiasing the expected maximum Sharpe ratio, thresholding and empirical Bayes. We test these estimators in simulations, computing bias and root mean square error across different values of sample size, number of assets, and spread and shape of population Sharpe ratios. We also compute rank correlation of the estimators against the underlying quantity, simulating how these estimators might be used to compare or rank the output of different teams which perform this selection process. We find that the James Stein estimator provides the best performance across many different realistic values of the relevant parameters, followed by the GMLEB estimator of Jiang and Zhang. These results are fairly robust to correlation of asset returns, with some caveats.
In inventory market making, the running-penalty coefficient $\phi$ of the Cartea-Jaimungal framework and the risk-aversion parameter $\gamma$ of the Avellaneda-Stoikov framework are typically treated as independent free parameters, calibrated separately. We show that they are in fact not independent. A small set of axioms on the market maker's dynamic preference functional, namely cash-additivity, normalization, concavity, strong dynamic consistency, and law-invariance, forces the preference functional to be the entropic certainty-equivalent on liquidation-adjusted terminal wealth, parametrized by a single positive scalar $\gamma$. The Avellaneda-Stoikov framework is the unique representative of this axiom class. The Cartea-Jaimungal framework is its second-order Taylor expansion in inventory magnitude, with the running coefficient forced to $\phi = \gamma\sigma^2/2$ and (under a mild regularity condition on the liquidation cost) the terminal coefficient forced to $\alpha = \frac{1}{2}L''(0)$. The two frameworks, typically presented as competing alternatives with the choice between them driven by tractability, are different manifestations of a single underlying object. The forced relation is invertible, $\gamma = 2\phi/\sigma^2$, giving a consistency cross-check on independently calibrated desk parameters.
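The forced relation $\phi = \gamma\sigma^2/2$ and its inverse $\gamma = 2\phi/\sigma^2$ suggest a simple cross-check on separately calibrated desk parameters. The sketch below is one hypothetical way to run it; the function names and the numerical values are illustrative assumptions, and only the two formulas come from the abstract.

```python
def phi_from_gamma(gamma, sigma):
    """Running-penalty coefficient implied by risk aversion: phi = gamma * sigma^2 / 2."""
    return gamma * sigma ** 2 / 2.0


def gamma_from_phi(phi, sigma):
    """Risk aversion implied by the running penalty: gamma = 2 * phi / sigma^2."""
    return 2.0 * phi / sigma ** 2


def relative_gamma_gap(gamma_calibrated, phi_calibrated, sigma):
    """Relative gap between the phi-implied gamma and the calibrated gamma."""
    implied = gamma_from_phi(phi_calibrated, sigma)
    return (implied - gamma_calibrated) / gamma_calibrated


# Illustrative values only (not taken from the paper).
sigma = 2.0
gamma = 0.1
phi = phi_from_gamma(gamma, sigma)

print("implied phi:", phi)
print("gap for a consistent pair:", relative_gamma_gap(gamma, phi, sigma))
print("gap for an inconsistent pair:", relative_gamma_gap(gamma, 0.3, sigma))
```

A gap near zero indicates that the two calibrations describe the same preference; a large gap flags parameters that cannot both be right under the stated axioms.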
This paper analyzes transaction fees on blockchains by considering that they form a priority queue and users play a queueing game. Using an M/G^K/1 priority queue model, we provide new insights into the dynamics governing transaction fees and their impact on user behavior. We derive semi-closed form expressions for steady-state quantities and extend the relationship between user delay costs and transaction fees to general block generation times. We apply the model to the Bitcoin network and simulate user responses under various scenarios. Cross-chain analysis across Bitcoin, Dogecoin, and Litecoin reveals similarities in normalized cost structures.
Hit ratio is a common service metric for electronic corporate bond market making, but raw hit-ratio targets can be economically misleading when client flow has heterogeneous adverse-selection content. This paper extends a stochastic-control framework for OTC bond RFQ market making with hit-ratio constraints by replacing raw hit ratio with a residual-quality-adjusted hit ratio. The key modelling distinction is that adverse post-trade markouts are first decomposed into observable credit factors, carry/rolldown, issuer-relative-value effects, index or ETF demand effects, and residual adverse selection. Only the residual component is treated as client-flow toxicity. The resulting control problem remains tractable: after dualizing the quality-hit-ratio penalty, the HJB retains separable Hamiltonians, and the dual variable is the solution of an exact one-dimensional nonlinear fixed point for each targeted tier. Under a quadratic value-function approximation, optimal quotes decompose into a riskless spread, inventory skew, credit-alpha skew, residual-toxicity charge, and quality-hit-ratio subsidy. Synthetic multi-bond simulations with nonlinear dual solves illustrate that raw hit-ratio targeting can subsidize residual-toxic flow, while residual-quality targeting reallocates service toward low-residual-toxicity flow and improves the attained service/economics frontier. A final reduced-form extension studies inventory-recycling value through risk-aware style-aligned client-flow warehousing. Sweep or portfolio-trade opportunities fill randomly, and participation is sized using the same quadratic value approximation as the RFQ quoting problem. A passive/index-demand experiment is reported in the appendix as a special case of forecastable client flow. The numerical evidence is synthetic and mechanism-oriented; no proprietary RFQ data are used.
Cryptocurrencies are increasingly adopted as investment assets, making their interactions with traditional financial markets central to cross-asset diversification and systemic risk. This paper studies the integration of cryptocurrencies, fiat currencies, and S&P500 equities using a balanced panel of 381 assets from October 2017 to February 2024. We combine rolling correlation networks, community structure, market-specific and system-wide Turbulence Indices, and VAR-based connectedness analysis to examine how market stress, network structure, and shock transmission vary across financial regimes. The results show that cross-asset integration is episodic. In calm periods, the three asset classes remain relatively segmented, whereas under stress, local clustering increases, modular separation weakens, and communities become more compositionally mixed across asset classes. Connectedness analysis further shows that regime shifts alter the structure of transmission rather than simply increasing spillover magnitudes. In high-turbulence states, fiat-market turbulence becomes the dominant propagation channel, while network clustering and modularity play a greater role in transmitting forecast uncertainty. These findings support the interpretation of network structure as an emergent, state-dependent transmission layer rather than a persistent exogenous driver of turbulence. The results highlight the need for regime-aware risk monitoring, since full-sample connectedness estimates can understate the cross-asset coupling that emerges precisely when diversification benefits are most fragile.
Evaluating whether large language model (LLM) agents can profit in capital markets is increasingly framed as end-to-end trading: place an agent in a historical market, let it trade, and measure portfolio returns. This setup is vulnerable to two evaluation failures. First, long backtests often overlap with the knowledge cutoffs of frontier LLMs, allowing memorized tickers, dates, prices, and market narratives to substitute for investment reasoning. Second, raw returns are a noisy proxy for stock-selection ability, since positive performance may come from market beta, style exposure, or favorable regimes rather than genuine alpha.
We introduce KTD-Fin (Knowing-To-Doing Financial Benchmark), an end-to-end stock-market trading benchmark that addresses both issues. KTD-Fin uses a data-side masking protocol to anonymize key identifiers and calendar information consistently across prompts and tools, separating historical market memory from investment decision-making. It also incorporates a Barra-style performance attribution framework that decomposes portfolio returns into market, style, and stock-selection alpha components.
Across ten frontier LLM agents evaluated on the Chinese CSI300 over a 2024--2026 window, masking substantially changes agent rationales, pushing them towards anonymized factor-based reasoning. Attribution analysis further shows that LLM agents' cumulative returns under leakage-controlled evaluation are largely explained by passive market and style exposure, with limited evidence of persistent stock-selection alpha. These findings suggest that financial LLM benchmarks should evaluate not only whether an agent makes money, but also whether the source of returns reflects transferable investment skill. We release KTD-Fin as a reproducible template for leakage-controlled and attribution-aware evaluation of LLM trading agents.
Motivated by the emergence of local groundwater exchanges, we construct and analyze stochastic models of dynamic groundwater markets. Our primary focus is endogenizing the price formation and groundwater pumping strategies in a closed market with stochastic groundwater allocations and opportunities for intertemporal transfer through rights banking. In our model, several agents, interpreted as farmers or agricultural districts, make competitive decisions on water consumption to produce a basket of goods, as well as on trading allocations among themselves, or banking them for future periods. We define the respective discrete-time non-zero-sum non-cooperative game and construct its sub-game perfect Nash equilibria characterized by the groundwater price process $\{p^\circ(t)\}$. We furthermore construct an algorithm to determine equilibrium strategies and prices through a machine learning approach on top of best-response iterations. Extensive numerical experiments illustrate dynamic phenomena, including the role of groundwater recharge dynamics, agents' risk aversion and groundwater allocations. Our model provides insights into competitive effects in environmental markets with banking features.
Cumulative subsidy to traders equals σ_v σ_ε² / √(σ_u² + σ_ε²) and matches LVR gap under order-flow observation
We extend the closed-form privacy-subsidy result of Nakamura (2026, arXiv:2605.15746) from the single-period Kyle model to continuous time. A committed Bayesian automated market maker observes the aggregate order flow perturbed by an independent Brownian privacy channel of diffusion intensity $\sigma_\varepsilon$. Under the Markovian linear equilibrium, the price-impact coefficient is $\lambda = \sigma_v / \sqrt{\sigma_u^2 + \sigma_\varepsilon^2}$ -- constant in time -- and the cumulative expected transfer from the protocol's liquidity pool to traders over $[0,1]$ is $|\Pi_M| = \sigma_v \sigma_\varepsilon^2 / \sqrt{\sigma_u^2 + \sigma_\varepsilon^2}$. We then establish a structural correspondence between this cumulative privacy subsidy and Loss-Versus-Rebalancing (Milionis et al. 2022), identifying privacy-noise welfare as the order-flow observation analog of LVR's price observation gap. The result completes the continuous-time Kyle leg of the program of quantifying break-even fees for committed-AMM exchanges under privacy-aggregated information environments.
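The two closed forms quoted in this abstract are easy to evaluate numerically. The sketch below codes them directly; the function names and the parameter values are illustrative assumptions. One observation that follows from the formulas alone is $|\Pi_M| = \lambda\,\sigma_\varepsilon^2$.

```python
import math


def price_impact(sigma_v, sigma_u, sigma_eps):
    """lambda = sigma_v / sqrt(sigma_u^2 + sigma_eps^2)."""
    return sigma_v / math.sqrt(sigma_u ** 2 + sigma_eps ** 2)


def privacy_subsidy(sigma_v, sigma_u, sigma_eps):
    """|Pi_M| = sigma_v * sigma_eps^2 / sqrt(sigma_u^2 + sigma_eps^2)."""
    return sigma_v * sigma_eps ** 2 / math.sqrt(sigma_u ** 2 + sigma_eps ** 2)


# Illustrative parameters (not from the paper).
sigma_v, sigma_u, sigma_eps = 1.0, 3.0, 4.0

lam = price_impact(sigma_v, sigma_u, sigma_eps)
subsidy = privacy_subsidy(sigma_v, sigma_u, sigma_eps)

print("lambda :", lam)
print("subsidy:", subsidy)
print("lambda * sigma_eps^2:", lam * sigma_eps ** 2)
```

Setting `sigma_eps` to zero gives a zero subsidy and $\lambda = \sigma_v/\sigma_u$, so the subsidy is attributable entirely to the privacy noise in these expressions.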
We study a finite-inventory risk-sensitive market making problem in which a dealer controls bid and ask quotes, faces Brownian midprice risk, and receives liquidity-taking orders through point processes with quote-dependent intensities. The objective is the certainty equivalent induced by exponential utility with terminal and running inventory penalties. We introduce an exact discrete entropy-regularized Bellman operator that applies log-sum-exp regularization to deterministic-action certainty-equivalent scores, rather than to a risk-neutral one-step reward. This distinction is essential because the exponential certainty equivalent does not commute with quote randomization.
For time step \(h\) and entropy parameter \(\lambda\), we prove uniform convergence to the unregularized continuous-time risk-sensitive value at rate \(O\bigl(h+\lambda(1+|\log\lambda|)\bigr)\). We also prove certainty-equivalent performance bounds for the induced Gibbs policies under a fresh-sampling relaxed implementation, in which quote marks are sampled at potential fill events rather than frozen over a time step. Under a quadratic growth condition on the Hamiltonian in the relevant quote coordinates, these policies concentrate around the unregularized optimal quote set. Finally, we show that a lower-cost Hamiltonian-Gibbs proxy satisfies a certainty-equivalent performance bound of the same order as the exact Bellman Gibbs policy. Numerical experiments in an Avellaneda--Stoikov specification support the predicted scaling for discretization error, entropy bias, policy gap, quote concentration, and exact-versus-proxy consistency.
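As a rough illustration of log-sum-exp regularization and the Gibbs policy it induces, the sketch below works on a finite grid of action scores. This is a simplification made for illustration: the paper's continuous quote space, its certainty-equivalent scores, and its convergence rate are not reproduced here. On a grid of $n$ actions the regularized value lies between the maximum score and the maximum plus $\lambda\log n$, which is the elementary analogue of an entropy bias that vanishes as $\lambda \to 0$. All names and numbers are hypothetical.

```python
import math


def soft_max_value(scores, lam):
    """lam * log(sum(exp(s / lam))), computed stably by factoring out the max."""
    top = max(scores)
    return top + lam * math.log(sum(math.exp((s - top) / lam) for s in scores))


def gibbs_weights(scores, lam):
    """Gibbs (softmax) probabilities proportional to exp(s / lam)."""
    top = max(scores)
    raw = [math.exp((s - top) / lam) for s in scores]
    total = sum(raw)
    return [r / total for r in raw]


# Placeholder scores for three candidate quote actions.
scores = [1.0, 2.0, 3.0]

for lam in (1.0, 0.1, 0.01):
    value = soft_max_value(scores, lam)
    weights = gibbs_weights(scores, lam)
    print(lam, round(value, 6), [round(w, 4) for w in weights])
```

As `lam` shrinks, the value approaches the maximum score and the weights pile onto the best action, mirroring the concentration behaviour the abstract describes for the Gibbs policies.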
This paper develops a unified explicit solution theory for optimal execution through sequential limit-order placement in a limit order book. Rather than controlling only the trading speed of a metaorder, we determine how individual limit orders should be quoted over time. The model incorporates signal-dependent drift, price impact, inventory risk, and execution risk, with fills modeled by point processes whose intensities depend on the submitted quotes. We formulate four execution criteria: expected terminal wealth, expected terminal wealth with running inventory penalty, CARA utility of terminal wealth, and CARA utility with running inventory penalty. For general price-impact and inventory-penalty functions, we derive the corresponding HJB equations and show that all four problems reduce to a triangular finite-dimensional structure which can be solved explicitly, leading to fully explicit value functions and optimal quotes across all cases. We also prove well-posedness, admissibility, and verification results. The explicit formulas reveal connections between quoting strategies under different criteria, support long-horizon asymptotic analysis, and show numerically that signal-dependent drift can substantially affect optimal execution.
We explore the application of LLM-driven algorithm optimization to several common tasks in quantitative finance. MadEvolve, a general-purpose algorithm optimization framework inspired by DeepMind's Alpha-Evolve, was recently developed to optimize algorithms in computational cosmology. Here we demonstrate the utility of MadEvolve for optimizing algorithmic trading strategies and alpha generation, using Bitcoin trading as an example. In our simulation and backtesting setup, we achieve significant improvements on all tasks we considered, such as evolving feature sets for signal generation, optimizing separate components of the trading strategy, and jointly evolving the feature pipeline together with the execution strategy. Additionally, we compare our method to other agentic search approaches, specifically Claude Code, and carefully evaluate p-hacking probabilities in our simulation setup. Our findings strongly support the utility of AI-driven agentic and evolutionary algorithms for algorithmic trading and quantitative finance.
Ethereum block builders run sealed auctions among searchers, but nothing in the protocol forces a builder to honor the auction outcome after observing submitted bundles. This paper studies the commitment problem. We model a builder who defects with probability $\varepsilon$ and, upon defection, replicates a type-specific fraction $\gamma(\tau)$ of the winning MEV opportunity. Searchers anticipate this behavior and choose between a risky first-price bid and a safe deterrence bid that makes frontrunning unprofitable. The resulting equilibrium is piecewise, with the cost of imperfect commitment depending jointly on replicability and competition. Using the libMEV dataset, we estimate $\gamma(\tau)$ from right-tail bribe plateaus and decompose observed auction revenue against the surplus a defecting builder could capture. The results show sharp heterogeneity across MEV types: sandwich opportunities are already highly competitive, while naked arbitrage and liquidations leave substantially more surplus exposed to builder defection. Credible MEV auctions, therefore, require not only an auction format, but also constraints on the builder's ability to use observed bid and payload information ex post.
Awareness of self-reinforcing dynamics raises directional accuracy unevenly for three frontier models on dot-com and GFC data.
We study how frontier large language models (LLMs) behave as financial forecasters during boom-bust market cycles when made progressively aware of Soros's theory of reflexivity. Standard AI-assisted forecasting treats the market as an exogenous system. Reflexivity theory holds otherwise: prices shape fundamentals, and every forecaster is a participative agent in the loop it analyzes. We evaluate three frontier models - GPT5, Claude Sonnet 4.6, and Gemini 3 Pro - under four accumulating zero-shot conditions across two historically distinct episodes: the dot-com bubble (1996-2001) and the global financial crisis (2004-2009). The primary metric is directional forecasting accuracy; we also report the Sharpe ratio of an implied long/cash strategy to capture the risk-adjusted economic value of the forecasts. All inputs are anonymized and normalized to guard against memorization. We find that conditions incorporating reflexivity awareness improve forecasting accuracy differently across models and context windows, revealing that the same theoretical awareness can produce qualitatively different forecasting behavior across frontier LLMs.
This paper investigates whether machine learning forecasts of hourly BTC-USDT returns can be converted into economically meaningful trading performance after transaction costs. Using approximately 70,000 hourly observations from 2018-2026, XGBoost, LSTM, and iTransformer are evaluated in a 27-fold walk-forward protocol. All three models produce positive gross trading performance in selected configurations, but naive sign-based strategies fail once transaction costs of ten basis points are imposed. A cost-aware execution filter, which permits trades only when the forecast magnitude exceeds a transaction-cost-based threshold, sharply reduces turnover and restores profitability in selected configurations. The strongest long-only XGBoost strategy produces annualised returns above 65% with a Sharpe ratio above one. Additional tests show that technical indicators improve performance in selected cases, EGARCH-derived features do not provide uniformly robust gains, and XGBoost is descriptively stronger than the neural alternatives, although bootstrap evidence does not support formal statistical dominance. Loss-function and model-selection effects are secondary and statistically fragile. The results show that the main obstacle in hourly cryptocurrency trading is not only weak predictability, but also the way forecasts are converted into trades.
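A cost-aware execution filter of the kind described above can be sketched in a few lines. This is an illustrative sketch only, not the paper's implementation: the function names, the `multiplier` parameter, the choice to go flat whenever the forecast falls inside the threshold band, and the convention of charging the cost once per unit of turnover are all assumptions. The cost of 0.001 corresponds to the ten basis points mentioned in the abstract.

```python
def cost_aware_position(forecast_return, cost=0.001, multiplier=1.0, long_only=True):
    """Target position (-1, 0 or 1) for one bar.

    A trade is allowed only when the forecast magnitude exceeds
    multiplier * cost (hypothetical threshold rule).
    """
    threshold = multiplier * cost
    if forecast_return > threshold:
        return 1
    if forecast_return < -threshold and not long_only:
        return -1
    return 0  # forecast too small to cover costs: stay flat


def net_returns(forecasts, realized, cost=0.001, **filter_kwargs):
    """Per-bar net returns, charging `cost` per unit of position change."""
    position = 0
    out = []
    for forecast, ret in zip(forecasts, realized):
        target = cost_aware_position(forecast, cost=cost, **filter_kwargs)
        turnover = abs(target - position)
        out.append(target * ret - turnover * cost)
        position = target
    return out


if __name__ == "__main__":
    forecasts = [0.0005, 0.002, 0.003, -0.002]   # made-up forecasts
    realized = [0.01, 0.01, -0.005, 0.004]       # made-up realized returns
    print(net_returns(forecasts, realized))
```

Raising `multiplier` widens the no-trade band, which lowers turnover at the price of skipping weaker signals; how the threshold is calibrated in the paper is not stated in the abstract.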
A per-trade transfer of μηΔ subsidizes traders at the expense of the liquidity pool under committed Bayesian pricing on noisy direction sign
We derive a closed-form bid-ask spread and welfare decomposition for the Glosten-Milgrom 1985 sequential-trading model when the market maker observes the trade direction perturbed by a binary flip channel of probability $\eta$ -- a natural information-theoretic model of privacy mechanisms acting on the direction signal. Under a committed Bayesian market-maker pricing rule, the equilibrium spread is $\mu(1-2\eta)\Delta$, where $\mu$ is the informed-trader fraction and $\Delta = v_H - v_L$ the value range. The welfare decomposition identifies a per-trade transfer $\mu\eta\Delta$ from the protocol's liquidity pool to traders -- the "privacy subsidy", mirroring the Gaussian-Kyle analog established in prior work. The result extends the privacy-subsidy concept from continuous Gaussian to discrete two-state microstructure, demonstrating robustness across both classical models. Primary application: MPC-based matching engines with $\varepsilon$-differentially-private direction disclosure, where the engine prices on a noisy direction signal.
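The spread formula $\mu(1-2\eta)\Delta$ can be checked numerically with a direct Bayes update. The sketch below rests on assumptions that the abstract does not state: a prior of 1/2 on the high value, informed traders who buy exactly when the value is high, and uninformed traders who buy or sell with equal probability. The function name and arguments are hypothetical.

```python
def noisy_gm_quotes(mu, eta, v_low, v_high, prior_high=0.5):
    """Bid and ask when the market maker sees the trade direction
    flipped with probability eta (binary flip channel)."""
    delta = v_high - v_low
    # Probability of a true buy in each state (assumed trader behaviour).
    p_buy_h = mu + (1 - mu) / 2
    p_buy_l = (1 - mu) / 2
    # Probability that the *observed* direction is "buy" after the flip channel.
    q_buy_h = (1 - eta) * p_buy_h + eta * (1 - p_buy_h)
    q_buy_l = (1 - eta) * p_buy_l + eta * (1 - p_buy_l)
    # Posterior probability of the high value given the observed direction.
    post_buy = prior_high * q_buy_h / (
        prior_high * q_buy_h + (1 - prior_high) * q_buy_l)
    post_sell = prior_high * (1 - q_buy_h) / (
        prior_high * (1 - q_buy_h) + (1 - prior_high) * (1 - q_buy_l))
    ask = v_low + post_buy * delta
    bid = v_low + post_sell * delta
    return bid, ask


if __name__ == "__main__":
    mu, eta, v_low, v_high = 0.3, 0.1, 90.0, 110.0
    bid, ask = noisy_gm_quotes(mu, eta, v_low, v_high)
    closed_form = mu * (1 - 2 * eta) * (v_high - v_low)
    print(ask - bid, closed_form)
```

Under these assumptions the half-spread falls by $\mu\eta\Delta$ relative to the noiseless case ($\eta = 0$), which equals the per-trade transfer quoted in the abstract; whether the paper derives the transfer this way is not confirmed by the abstract.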
Walk-forward tests on MNQ five-minute data show accuracies stuck at the 51.8 percent base rate.
This paper compares gradient boosting and long short-term memory (LSTM) architectures for intraday directional prediction in Micro E-Mini Nasdaq 100 futures (MNQ). Motivated by recent foundation-model research on financial candlestick data, including the Kronos architecture, we test whether five-minute OHLCV bar sequences contain exploitable sequential predictive structure at the scale of a single instrument dataset. Using 944 trading days from 2021-2025, four model configurations are evaluated under strict expanding-window walk-forward validation across three out-of-sample periods. The target variable is whether the session close exceeds the 10:30 AM open by more than ten points. No configuration produces statistically significant out-of-sample accuracy above the 51.8% base rate. Combined OOS accuracies range from 50.00% to 50.89% across gradient boosting variants, while the LSTM achieves 50.59%. Permutation tests yield p-values of 0.135 for the best gradient boosting model and 0.515 for the LSTM, indicating no statistically significant predictive edge. Feature importance instability across walk-forward folds suggests noise fitting rather than stable structural signal capture. The results indicate that four years of single-instrument five-minute OHLCV data are insufficient for reliable sequential ML-based intraday forecasting. The primary contribution is a documented evaluation of a Kronos-inspired architecture on a constrained real-world dataset, providing an empirical lower bound on data scale requirements for sequential financial ML.
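A label-permutation test of directional accuracy, in the spirit of the permutation tests reported above, might look like the following. This is a generic sketch rather than the paper's procedure: the number of permutations, the seed, and the add-one p-value correction are assumptions.

```python
import random


def accuracy(preds, labels):
    """Fraction of predictions that match the labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def permutation_p_value(preds, labels, n_perm=2000, seed=0):
    """One-sided p-value: how often shuffled labels give accuracy
    at least as high as the observed accuracy."""
    rng = random.Random(seed)
    observed = accuracy(preds, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between preds and labels
        if accuracy(preds, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    labels = [1, 0] * 20                 # made-up balanced labels
    always_up = [1] * len(labels)        # constant "majority class" predictor
    print(permutation_p_value(always_up, labels))
```

A constant predictor scores exactly the base rate under every shuffle, so its p-value is 1. This is why accuracies near the base rate, such as those reported in the abstract, carry no evidence of an edge.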
This paper develops a model to evaluate the viability of blockchain markets as the sole venue for price formation. Blockchains clear at discrete intervals called block time, and transactions are executed sequentially according to priority fees paid by traders who compete for queue position. We show that these features undermine the viability of markets. Paid-priority ordering induces endogenous selection, where only traders with sufficiently high valuations participate. The participation cutoff rises with competition, which intensifies with lower information costs or higher liquidity demand. This hinders price discovery and biases prices. It also impairs liquidity: the cutoff concentrates trading among aggressive traders and increases adverse selection that liquidity suppliers absorb in a single clearing round. Although longer block times enhance consensus security, they amplify these effects and can cause markets to shut down.
This study develops and evaluates a deep reinforcement learning framework for dynamic portfolio allocation across global equity markets. The Soft Actor-Critic algorithm is used to learn continuous portfolio weights within a Markov Decision Process, incorporating transaction costs, turnover penalties, and diversification constraints into the reward function. Five model configurations are compared, varying in reward formulation, policy structure (flat versus hierarchical Dirichlet), portfolio constraints, and temporal encoder (LSTM versus Transformer), and evaluated via walk-forward optimization across sixteen out-of-sample folds spanning 2003-2026 on the Nasdaq-100, Nikkei 225, and Euro Stoxx 50. Results show that RL strategies achieve competitive risk-adjusted performance primarily in the Euro Stoxx 50, where statistically significant abnormal returns are observed, but the central hypothesis is only partially confirmed: no strategy achieves statistically significant excess returns relative to Buy and Hold under HAC-robust inference across all markets. Regime analysis reveals that RL adds the most value during periods of elevated uncertainty, while ensemble aggregation across markets improves risk-adjusted performance and confirms the benefits of geographic diversification.
Privacy-preserving cryptocurrency exchanges alter what the pricing mechanism observes about order flow. We derive the unique linear Kyle equilibrium when a committed Bayesian market maker observes order flow perturbed by independent Gaussian privacy noise. The price-impact coefficient and informed-trader strategy rescale by reciprocal factors of the privacy parameter (one down, one up), so their product is invariant. A welfare decomposition then identifies a closed-form per-period transfer from the protocol's LP pool to traders -- the "privacy subsidy", the break-even fee any privacy-aggregated exchange must charge. The result is the single-period closed-form privacy-noise analog of Loss-Versus-Rebalancing (Milionis et al. 2022). The primary application is shielded AMMs with explicit additive-noise injection (e.g., differential privacy); related designs (batched swaps, sealed-bid auctions, oracle-pegged crossings) require separate frameworks that we leave to future work.
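The reciprocal rescaling can be illustrated with a textbook single-period Kyle setup. The sketch below is an assumption-laden illustration, not the paper's model: it supposes that the market maker prices on $y = x + u + \varepsilon$ with independent Gaussian privacy noise of standard deviation `sigma_eps`, sets $p = p_0 + \lambda y$ with $\lambda$ equal to the regression coefficient of value on $y$, and actually fills only the true flow $x + u$. The paper's privacy parameter and its closed-form subsidy may be defined differently.

```python
import math


def kyle_with_privacy_noise(sigma_v, sigma_u, sigma_eps):
    """Linear equilibrium when the pricing rule sees flow plus privacy noise.

    Returns (lam, beta, lp_profit): price impact, informed-trader intensity,
    and the market maker's expected profit per period (assumed setup).
    """
    total_noise = math.sqrt(sigma_u ** 2 + sigma_eps ** 2)
    lam = sigma_v / (2 * total_noise)   # impact falls as noise grows
    beta = total_noise / sigma_v        # informed intensity rises
    # Expected profit on the true flow x + u when priced on x + u + eps:
    # E[(p - v)(x + u)] = lam * (beta^2 sigma_v^2 + sigma_u^2) - beta * sigma_v^2
    lp_profit = lam * (beta ** 2 * sigma_v ** 2 + sigma_u ** 2) - beta * sigma_v ** 2
    return lam, beta, lp_profit


if __name__ == "__main__":
    lam0, beta0, profit0 = kyle_with_privacy_noise(2.0, 3.0, 0.0)
    lam, beta, profit = kyle_with_privacy_noise(2.0, 3.0, 4.0)
    print(lam0 * beta0, lam * beta)   # product is the same in both cases
    print(profit0, profit)            # zero without noise, negative with it
```

In this setup $\lambda$ and $\beta$ rescale by $1/k$ and $k$ with $k = \sqrt{1 + \sigma_\varepsilon^2/\sigma_u^2}$, so $\lambda\beta = 1/2$ regardless of the noise, and the market maker's expected loss works out to $\lambda\sigma_\varepsilon^2$ per period.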
In algorithmic markets, predictive models become part of the data-generating process they aim to forecast. Once their outputs are converted into trades, allocations, execution schedules, or risk controls, they change the future data on which they are evaluated. I introduce algometrics, a framework for time series whose evolution depends on the predictive algorithms forecasting them. The framework distinguishes historical risk, measured under passive forecasting, from deployment risk, measured when forecasts drive actions. I prove three results. First, deployment risk is not identifiable from passive historical data alone: even in a one-step linear feedback model, infinitely many algorithm-mediated environments induce the same historical law while implying different deployment risks for the same forecaster. Second, historical model rankings can invert under crowding, so a predictor with lower passive error can have higher deployment error once similar algorithms are adopted. Third, randomized or instrumented actions identify short-horizon linear feedback, and I derive a finite-sample bound for deployment-risk estimation. These results suggest that time-series benchmarks in algorithmic markets should report feedback sensitivity alongside predictive accuracy.
RED-2400 is a public benchmark of 6,660 algorithmically-rejected trading events from a live Solana decentralised-exchange filter stack, observed continuously over 22 calendar days (2026-04-10T21:10Z through 2026-05-02T21:48Z, UTC). Each rejection event is linked to its post-rejection price-and-liquidity trajectory. The deposit contains 169,123 forward-outcome observations and 1,837 graveyard-tracker lifecycle snapshots, covering 1,076 distinct mints in the rejection registry and 1,075 in the forward-observation file. Outcome labels follow the five-tier classification rule introduced by a related methodology paper [Kamat 2026c]. The deposit includes a lifecycle-tracker file that permits external validation of any subset of those labels against observed token-lifecycle ground truth. Filter labels are anonymised to filter_1 through filter_8; source-collector identifiers to source_a and source_b. Liquidity and 24-hour volume are quantised to the nearest power of two, preserving heavy-tailed shape while preventing operational-threshold inference. This is the first window of a planned series; subsequent windows will extend the time horizon and enable regime-stratified analysis. "RED-2400" is a brand name, not a count; current cohort sizes are listed below and do not equal 2,400.
Prediction markets cannot exist without market makers, arbitrageurs, and other non-retail liquidity providers, yet the supply-side microstructure of Polymarket-class venues has not been characterized at on-chain pseudonymous-address scale. This paper studies non-retail participation on Polymarket using an empirical run on the PMXT v2 archive over 2026-04-21 through 2026-04-27 (13,356,931 OrderFilled events; 77,204 addresses with five+ fills; 43,116 markets).
We report three findings. First, Polymarket's off-chain CLOB architecture renders address-level quote-lifecycle attribution permanently unavailable: OrderPlaced and OrderCancelled events are off-chain and absent from public archives, so quote-intensity, two-sided-ratio, and posted-spread features cannot be built at address level. We document this as a structural validity-gate failure (G-QUOTE-LIFE universal fail) and restrict analysis to a six-feature fill-side vector. Second, density-based clustering (DBSCAN, fifteen sensitivity configurations) on the fill-side vector produces a single dense cluster with zero noise: fill-side behavior in the empirical window is uni-modal under the six-feature vector, contradicting the pre-registered hypothesis of four-to-five separable archetypes. Third, robust retail vs non-retail separation is achievable through clustering-independent feature-tier stratification: whale-tier, high-frequency-operator, and power-trader tiers jointly hold 81.4% of total notional across 12.6% of addresses.
Address-level market-making and liquidity-provision claims are withdrawn per the G-QUOTE-LIFE failure; spoof-by-non-fill manipulation detection is downgraded to market-level book diagnostics. A privacy-respecting derived-dataset deposit accompanies the paper as Bundle 3 of the PMXT family. Fourth paper in a four-paper programme on event-linked perpetuals and leveraged prediction-market microstructure.
Pre-market conditions mark mornings with drift and afternoons with reversal, yet every tested rule fails after transaction costs and year-by
This paper constructs and validates a composite day-classification system for Micro E-Mini Nasdaq 100 futures (MNQ) using three pre-market observable conditions: first-30-minute return magnitude, overnight gap magnitude, and abnormal opening-bar volume relative to a rolling baseline. Using 947 regular trading days of five-minute data from 2021-2025, we find that classifier-positive days exhibit statistically distinct intraday behavior, including directional morning drift followed by systematic late-session reversal. Despite these descriptive characteristics, all tested directional trading strategies fail institutional validation standards after transaction costs and multi-year consistency requirements are applied. The highest-performing configuration achieves T = 1.46 and mean net +7.80 points but fails year-stability criteria. The primary contribution is the validation of the Volatility-Volume-Gap (VVG) classifier as a descriptive regime-identification framework and the documentation of failed attempts to convert these statistical patterns into deployable trading signals under realistic execution constraints.
We introduce When Alpha Disappears, a paired evaluation benchmark for diagnosing decision-time leakage in financial machine-learning backtests. Rather than treating leakage as a binary property, the benchmark estimates protocol-induced inflation by toggling one evaluation convention at a time around a clean $t{+}1$-open reference, while holding the data panel, walk-forward split, model family, horizon, portfolio rule, and cost convention fixed. Across two daily-OHLCV equity panels, six model families, and yearly tests from 2016--2024, we find that inflation is highly selective: centered temporal features and same-day-open execution with post-open daily-bar information cause large and stable increases in both predictive and trading metrics, whereas global normalization, future-informed graph structure, and same-day-close execution are weak in most settings. The benchmark is diagnostic rather than a claim of tradable alpha, and is intended to make evaluation assumptions, failure modes, and protocol fragility directly measurable.
We show that under mild assumptions, the total value of information to informed traders in the market can be measured by the covariance between price changes and order flow. This covariance captures noise trader losses, which equal informed trader gains when market making is competitive. We estimate the value of information using high frequency data on US equities at about $3.5 million per year for the average stock. The aggregate value of information is about 0.04% of market cap, which is considerably lower than the 0.67% in fees investors pay each year searching for superior returns (French 2008). We discuss potential resolutions for these puzzling findings.
In event-linked markets the two-axis taxonomy shows linear scaling for price manipulation, threshold shifts for outcome manipulation, and a
The introduction of leverage on prediction-market event contracts raises three structurally distinct questions that have not been addressed jointly: how leverage changes manipulation incentives, how it interacts with informed-trading rents, and how regulatory frameworks should respond. This paper develops a theoretical framework for the first two and a synthesis of the existing regulatory landscape for the third. The principal analytical move is a two-axis manipulation taxonomy distinguishing market-price manipulation from real-world outcome manipulation, where the manipulator affects the underlying event itself. Continuous-underlying derivative markets generally do not make outcome manipulation a venue-level payoff channel; event-linked markets do. Within this taxonomy, leverage plays asymmetric roles: it scales market-price manipulation linearly but shifts the cost-benefit threshold for outcome manipulation, and it scales informed-trading rents in three ways (direct multiplication, Sharpe-ratio preservation, detection-cost amortization). Section 7 connects Paper 1's pre-emption and halt-protocol findings (CC-007b, CC-008) to three manipulation channels: pre-emption introduced by the dynamic-margin engine, halt-arbitrage introduced by the resolution-zone halt protocol, and strategic bad-debt-shifting that no engine in Paper 1's framework family addresses. The framework's manipulation-resistance contribution is a re-allocation of attack surface, not a net reduction. The regulatory synthesis covers principal jurisdictions (US, EU, UK, Singapore, offshore) and identifies three regulatory-arbitrage pathways. The paper concludes with 14 recommendations for venue operators, regulatory bodies, and the research community, separated into framework-independent and framework-conditional categories.
Organized by four design axes, each with payoff rules, inheritance maps and test criteria for historical data.
Paper 1 of this research programme develops a resolution-aware risk-design framework for the simplest event-linked perpetual: a contract whose underlying tracks a single binary prediction-market probability through resolution. The instrument class is broader. Variants span conditional probabilities P(A|B), spreads p^A - p^B, weighted baskets sum w_i p^(i), derivatives on variance or entropy of the probability process, contracts on liquidity itself, perpetual-on-expiring-event roll structures, and funding-only derivatives with no settlement. Each variant inherits some framework components from the single-market binary case and requires its own design adaptations. This paper develops a formal taxonomy of seven pure-form canonical variants beyond the probability-index perpetual of Paper 1, organised along four orthogonal design axes: underlying geometry, temporal structure, settlement structure, and venue composition. The list is not exhaustive; combinations are not treated separately. For each variant we provide a precise payoff definition; an inheritance map identifying which Paper 1 components carry over, are modified, or fail; variant-specific design constraints; microstructure properties; empirical evaluability on the PMXT v2 archive; and limitations. Notable findings: the conditional variant admits a candidate non-portability proposition (denominator instability as the conditioning event becomes improbable); the spread variant requires a three-channel decomposition of resolution risk; the volatility/entropy variant avoids random binary terminal-collapse but introduces estimator-convention and entropy-decay issues; the basket variant requires multi-period jump-aware margin whose aggregation is correlation-dependent. The paper is theoretical primarily; it specifies how demonstrative time series can be constructed and provides evaluability criteria to guide future work.
Counterfactual tests on 13k Polymarket archives show standard designs fail on resolution jumps; new framework distinguishes execution risk (
We develop and counterfactually evaluate a resolution-aware risk-design framework (PIRAP) for perpetual futures whose underlying tracks a single binary prediction-market probability through resolution. The framework specifies six components: an index estimator combining mid-price, depth-weighted mid, and time-decayed VWAP; jump-aware tiered margin sized against bounded-event terminal-collapse magnitude; leverage compression schedule contracting toward resolution; resolution-aware funding rule with boundary-aware correction; a multi-stage halt protocol; and an eligibility framework. Two formal non-portability propositions establish that standard basis-only funding paired with continuous-vol static margin fails on bounded-event underlyings. Empirical evaluation uses Polymarket's PMXT v2 archive for 2026-04-21 to 2026-04-27 (13,298-market analysis sample passing adequacy gates from 61,087 ingested; 13,115 resolved within the empirical window for E3). E1 evaluates two pre-registered stylized facts; E2 conducts counterfactual replay across three engine configurations; E3 isolates the resolution-zone protocol's contribution. Results are mixed. Five pre-registered floors: stylized-fact floors (boundary depth asymmetry, terminal-jump magnitude) PASS; welfare-side directional floors (final-hour liquidation -6%, drawdown -5.1% pooled, median PnL +14%) two FAIL one PASS; E3 mechanic floors (final-hour liquidation -80% by halt construction PASS; bad-debt frequency +2.4% FAIL). Three of five materiality floors fail: the framework as specified does not validate deployment, but the empirical record establishes a halt-versus-margin scope distinction (halt addresses execution-channel risk; terminal-jump bad-debt remains margin-side) and documents a pre-emption trade-off constraining the dynamic-margin component. The paper concludes with structural recommendations and explicit non-deployable status.
Accurate stock price forecasting has consistently remained a pivotal yet challenging FinTech task that underpins quantitative trading and investment decision making. Recent efforts have been dedicated to modeling various complex relationships among stocks in the stock market toward more reliable stock price forecasting. These methods depend heavily on strong static prior assumptions by modeling either temporal dependencies within individual stocks or spatial dependencies across different stocks based on predefined structures, while the complex market dynamics that drive stock price movements remain unexplored. To alleviate this issue, we propose a novel game-theoretic modeling method that captures heterogeneous investor interactions for stock price forecasting. The core idea is to embed game-theoretic mechanisms into the heterogeneous graph structure to finely model the dynamic strategic interactions among heterogeneous investors with respect to target stocks. Additionally, temporal positional encoding is adopted to reflect the differentiated influences of each game event at different time steps within the time window on future stock price movements. Leveraging heterogeneous graph networks, we proxy the intricate dynamics of the stock market through investor games and enable real-time information propagation and node updates among all nodes. Extensive experiments conducted on two real-world benchmark datasets demonstrate that our method effectively outperforms state-of-the-art stock price forecasting methods.
We study permissionless spot--perpetual basis trading in decentralized finance as a collateral control problem. The strategy holds spot inventory, hedges directional exposure with a short perpetual, and allocates capital between spot inventory and derivative margin under on-chain liquidity and execution frictions.
The paper delivers three results. First, it solves a static control problem for the collateral share and shows that the risk-constrained formulation provides a more robust operating benchmark than the economic optimum. In comparative calibration, the required collateral rises monotonically under volatility stress. Required collateral is lowest for BTC and increases significantly for long-tail assets such as LINK and DOGE. Second, the paper derives an asymmetric dynamic extension in which the lower boundary of intervention is solvency driven, and the upper boundary is determined by a trade-off between carry loss and the cost of rebalancing. Monte Carlo simulation shows that the lower boundary remains structurally relevant, whereas meaningful interior upper triggers survive mainly in regimes with high carry and low costs. Third, the paper validates an execution-aware implementation with live routed execution and historical backtests. The execution layer shows that the realized wedges are significant and widen when selling the basis. This justifies a minimum effective rebalancing size and a positive execution buffer. The historical validation shows that, under a fixed control rule, realized performance is predominantly explained by the funding environment.
947 days of five-minute bars show gross edges capped below two-point round-trip costs for all fourteen signal families tested
This paper tests whether intraday momentum signals derived from open-high-low-close-volume (OHLCV) data produce a statistically significant trading edge in Micro E-mini Nasdaq 100 futures (MNQ) under realistic execution constraints. Using 947 trading days of five-minute data (2021-2025), fourteen signal families are evaluated, including opening range breakouts, gap strategies, volume signals, cross-session momentum, liquidity grabs, volatility-conditioned classifiers, and news-driven strategies. All signals are assessed using strict institutional criteria: out-of-sample walk-forward validation, minimum T-statistic of 2.0, at least 30 trades, positive net return after a fixed two-point round-trip cost, and multi-year stability. No signal satisfies all criteria simultaneously. The gross edge available to next-bar-open execution is constrained to approximately 0.07-1.50 points per trade, insufficient to overcome transaction costs. A gap-continuation signal achieves T = 3.23 and +14.52 points but fails minimum sample requirements (N = 22). Two validated signals from a separate research program are included as positive controls, confirming the methodology detects genuine edge when present. The primary contribution is a reproducible falsification framework and a documented null result, highlighting structural limits of OHLCV-based intraday strategies.
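The acceptance criteria listed above lend themselves to a simple gate function. The sketch below is a hypothetical reading of those criteria, not the paper's code: it takes per-trade gross points grouped by year, subtracts the fixed two-point round-trip cost, and interprets "multi-year stability" as a positive mean net result in every year, which is an assumption. The out-of-sample walk-forward requirement is not checked here and would have to be enforced upstream.

```python
import math
import statistics


def gate_failures(gross_points_by_year, cost=2.0, min_t=2.0, min_trades=30):
    """Return the list of criteria a signal fails (empty list = passes)."""
    net_by_year = {
        year: [g - cost for g in points]
        for year, points in gross_points_by_year.items()
    }
    net = [x for points in net_by_year.values() for x in points]
    n = len(net)
    mean_net = statistics.fmean(net) if n else 0.0
    sd = statistics.stdev(net) if n > 1 else 0.0
    # One-sample t-statistic of the mean net result against zero.
    t_stat = mean_net / (sd / math.sqrt(n)) if sd > 0 else 0.0

    failures = []
    if t_stat < min_t:
        failures.append("t_stat")
    if n < min_trades:
        failures.append("min_trades")
    if mean_net <= 0:
        failures.append("net_return")
    if any(statistics.fmean(p) <= 0 for p in net_by_year.values() if p):
        failures.append("year_stability")  # assumed definition of stability
    return failures


if __name__ == "__main__":
    # Made-up trades: strong per-trade edge but only 24 trades in total.
    trades = {2021: [5.0, 6.0, 7.0] * 4, 2022: [5.0, 6.0, 7.0] * 4}
    print(gate_failures(trades))
```

A signal with a high t-statistic and too few trades fails the gate on sample size alone, which mirrors the gap-continuation result reported in the abstract (T = 3.23, N = 22).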
Multi-agent LLM systems fail in production at rates between 41% and 87%, mostly due to coordination defects rather than base-model capability. Existing responses split between cataloguing failure modes empirically and shipping declarative orchestration frameworks as engineering tools; neither delivers a principled mapping from coordination configuration to predictable failure-mode signature. We argue that coordination should be treated as a configurable architectural layer, separable from agent logic and from information access, enabling architectural reasoning rather than only engineering productivity.
We instantiate this with an information-controlled design on prediction markets: a single LLM, fixed tools, fixed per-call output cap, and fixed prompt template across five reference coordination configurations, with total compute per question treated as an endogenous architectural output. The Murphy decomposition of the Brier score separates calibration from discriminative power, so configurations leave distinguishable signatures even when aggregate scores coincide.
On 100 Polymarket binary markets resolved after the model's training cutoff (claude-opus-4-6) we report Murphy signatures, a cost-quality Pareto frontier, category-conditioned analysis, and a bootstrap power-projection. Three of five pre-specified predictions are upheld in direction; two configurations dominate the Pareto frontier within this regime; exploratory bootstrap intervals separate consensus alignment from others, though pairwise tests do not survive Bonferroni correction at n=100. We also deploy the same configurations as live agents on Foresight Arena under web-search-enabled conditions, as an on-chain replication channel accumulating in parallel. Harness, trace dataset, and production agents are released. We position this as a methodology-validating first instantiation, not a general cross-model claim.
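The Murphy decomposition mentioned above writes the Brier score as reliability minus resolution plus uncertainty. A minimal sketch follows, assuming forecasts are grouped by their value rounded to `ndigits` decimals (a hypothetical binning choice; the abstract does not say how the paper bins forecasts). With this grouping the identity holds exactly for the rounded forecasts.

```python
from collections import defaultdict


def murphy_decomposition(forecasts, outcomes, ndigits=2):
    """Return (reliability, resolution, uncertainty) for binary outcomes."""
    n = len(forecasts)
    base_rate = sum(outcomes) / n
    groups = defaultdict(list)
    for f, o in zip(forecasts, outcomes):
        groups[round(f, ndigits)].append(o)
    # Reliability: squared gap between stated probability and observed frequency.
    reliability = sum(
        len(os) * (f - sum(os) / len(os)) ** 2 for f, os in groups.items()) / n
    # Resolution: how far group frequencies move away from the base rate.
    resolution = sum(
        len(os) * (sum(os) / len(os) - base_rate) ** 2 for os in groups.values()) / n
    uncertainty = base_rate * (1 - base_rate)
    return reliability, resolution, uncertainty


def brier_score(forecasts, outcomes):
    """Mean squared error of probability forecasts."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)


if __name__ == "__main__":
    forecasts = [0.2, 0.2, 0.8, 0.8, 0.8]   # made-up forecasts
    outcomes = [0, 1, 1, 1, 0]              # made-up resolutions
    rel, res, unc = murphy_decomposition(forecasts, outcomes)
    print(brier_score(forecasts, outcomes), rel - res + unc)
```

Two configurations with the same Brier score can still differ in the (reliability, resolution) pair, which is the sense in which the decomposition separates calibration from discriminative power.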
Current post-trade clearing systems rely almost exclusively on cash or cash-like collateral, leaving vast reserves of short-term liquidity embedded in trade credit outside formal settlement infrastructures. A key barrier to integrating this liquidity is the near-universal dependence of clearing services on novation, which imposes institutional overhead that restricts accessibility and limits the range of obligations that can be brought into settlement.
This paper introduces the Cycles Protocol: a distributed, multilateral clearing mechanism based on double-entry accounting and atomic cycle execution that maximizes balance sheet compression. Unlike novation-based clearing, Cycles does not redistribute counterparty risk; it can thus be applied generally to existing financial networks, without any change in counterparty relations, allowing it to complement existing clearing systems and Central Counterparties (CCPs).
By representing commitments as edges on a unified directed graph, Cycles surfaces liquidity hiding within existing network structure. We focus here on two applications of Cycles to deepening secondary market liquidity: first, as a compression layer between existing clearing participants and CCPs; and second, as a means to incorporate the liquidity of the trade credit network into formal settlement, extending market clearing beyond financial obligations and into real-economy financing.
Sign-randomization checks account skill, heuristics flag insiders, and leakage scores measure per-market front-loading for a more precise
April 2026 saw notable methodological convergence in the academic study of informed trading on decentralized prediction markets. Three approaches surfaced almost simultaneously: Mitts and Ofir (2026) apply a composite screen to over 210,000 wallet-market pairs; Gomez-Cram et al. (2026) apply an event-level sign-randomization test to Polymarket's complete transaction history, classifying 3.14% of accounts as "skilled winners" and separately flagging 1,950 accounts as "insiders" via a lifecycle heuristic; Nechepurenko (2026) develops the Information Leakage Score (ILS) framework, which quantifies per-market information front-loading at an article-derived public-event timestamp. This paper provides a methodological comparison. The central claim is that these are three distinct layers of detection, not competing methods on a single layer. Sign-randomization is best understood as an account-level test of persistent directional skill conditional on opportunity selection -- not a direct test of insider trading, and not a per-market measure. The heuristic insider flag is separate from the skill classifier, applies to a population the classifier excludes by design, and has unknown precision. The Polymarket sample pools politics, sports, crypto, and other categories with different information technologies, so a platform-wide "skilled winner" classification is mechanism-ambiguous. The January 2026 U.S.-Venezuela operation cluster, where the DOJ indictment of Master Sergeant Gannon Van Dyke provides a rare external enforcement benchmark, illustrates how the layers stack: lifecycle heuristics identify suspicious accounts; legal investigation addresses non-public-information possession; per-market scoring would quantify how much information was leaked into each contract. A combined pipeline gains in precision because each layer filters a different dimension.
On the largest Polymarket case, the public-event timestamp shifts the score by 0.444, across zero, relative to the resolution proxy.
This paper reports an end-to-end empirical evaluation of the deadline-Information Leakage Score (ILS-dl) extension introduced in the companion methodology paper. The deadline-ILS extends the original ILS to deadline-resolved prediction-market contracts, the dominant structural form of publicly documented insider trading on Polymarket. We anchor the evaluation in the 2026 U.S.-Iran conflict cluster of the ForesightFlow Insider Cases (FFIC) inventory, the largest documented deadline cluster. The evaluation has four parts: per-category exponential-hazard estimation, a single-case ILS-dl computation, cross-market wallet analysis, and methodological refinements.
Hazard-rate estimation produces an adequate exponential fit for military-geopolitics markets (KS p = 0.426, half-life 2.9 days, n = 18) and a preliminary fit for corporate-disclosure markets (n = 5). The regulatory-decision category is rejected as bimodal (p = 0.023). On the largest applicable FFIC contract ("US forces enter Iran by April 30," $269M volume), the article-derived public-event timestamp yields ILS-dl = +0.113 versus a resolution-anchored proxy value of -0.331: a 0.444 shift in magnitude on opposite sides of zero, demonstrating that the extension distinguishes signal from proxy artefact. Pre-event drift is mild, and short-window variants (30-min, 2-hour) are exactly zero. Cross-market wallet analysis identifies 332 wallets active in both major Iran-cluster markets, but the available trade history covers only the resolution-settlement window.
v2 (May 2026) corrects the hazard fit to the full Tier-3 population; the v1 estimate lies inside the v2 95% CI.
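The per-category hazard step can be illustrated with a minimal sketch, assuming the constant-hazard model is fitted by maximum likelihood and checked with a one-sample Kolmogorov-Smirnov distance. The function names and the sample durations are hypothetical and are not taken from the paper. No p-value is computed here, because a KS p-value with an estimated rate needs a correction that this sketch does not attempt.

```python
import math

def fit_exponential(durations):
    """MLE for a constant-hazard (exponential) model: rate = n / sum(t)."""
    if not durations or any(t <= 0 for t in durations):
        raise ValueError("durations must be non-empty and strictly positive")
    rate = len(durations) / sum(durations)
    half_life = math.log(2) / rate  # time by which half the events have occurred
    return rate, half_life

def ks_statistic(durations, rate):
    """KS distance between the empirical CDF and F(t) = 1 - exp(-rate * t)."""
    xs = sorted(durations)
    n = len(xs)
    d = 0.0
    for i, t in enumerate(xs):
        cdf = 1.0 - math.exp(-rate * t)
        # Compare the model CDF with the empirical CDF just after and just before t.
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d

# Hypothetical times-to-event in days (illustrative only, not the paper's data).
sample = [0.5, 1.2, 2.0, 3.1, 4.4, 6.0, 9.5]
rate, half_life = fit_exponential(sample)
d_stat = ks_statistic(sample, rate)
```

A small KS distance is consistent with an adequate exponential fit; a bimodal category, such as the one the abstract rejects, would typically produce a large distance.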
Financial markets such as bond, derivatives, and repo markets form networks of interdependent obligations. Existing multilateral netting methods typically trade off the extent of netting against preservation of counterparty exposure: central clearing reallocates exposure to a central counterparty, while trade compression may alter bilateral counterparty relationships. TradeMech is a mechanism for markets in which one or two homogeneous fungible objects are traded. The mechanism transforms a network of initial bilateral contracts into chains and cycles, nets the designated object multilaterally on those chains and cycles, and replaces initial contracts with multiparty contracts whose assigned trades remain fractions of the original bilateral trades. The construction achieves maximal multilateral netting of the designated object while preserving each agent's contractual profit and preserving the location of counterparty risk. When a party fails to pre-commit a required object, the affected assigned trade is recovered as a bilateral contract between the same original counterparties and the remaining assigned trades are re-netted on residual chains, so no new counterparty exposure is created.
VGRSI derived from visibility graphs on asset prices generates trading signals yielding ~$340,000 total profit across DJI30, EUR/USD and…
Traditional technical analysis indicators, although widely used by market participants, are often not sufficiently effective. We propose the Visibility Graphs Relative Strength Index (VGRSI), based on backward visibility relations in the price of a financial instrument. Rescaled to the 0--100 range, it can generate profitable trading signals. The performance of the indicator was evaluated using an automated trading strategy based on a 30-day optimisation window and a 7-day test window for three instruments representing different asset classes: DJI30, EUR/USD and XAU/USD over the 2024--2025 period (503 trading days). The strategy based on VGRSI signals generated a profit of USD~146,000 for DJI30, USD~69,000 for EUR/USD, and USD~125,000 for XAU/USD. This gives a total result of USD$\sim$340,000, which corresponds to an average profit of USD$\sim$676 per trading day, with a fixed investment of USD~1,000 to open a single trade. For all three assets, the strategy generated substantial profits while maintaining a moderate drawdown (10--18\% relative to a portfolio value of USD~10,000), a relatively low trading intensity (3.3--4.8 trades per day) and high Sharpe ratio values (2.55--3.6). These results indicate that VGRSI constitutes a promising technical analysis tool that goes beyond the classical trend-following approach by exploiting the geometric properties of asset price fluctuations.
Conventional algorithmic trading systems are grounded in deterministic heuristics or offline-trained statistical models that cannot adapt to the semantic complexity of rapidly shifting market regimes. This paper introduces AGENTICAITA, an agentic AI framework that replaces the traditional signal-then-execute paradigm with a fully autonomous deliberative loop in which multiple specialized Large Language Model agents reason, negotiate, and act in concert - without any offline training or human intervention. The framework proposes four architectural contributions: (i) an Adaptive Z-Score Trigger Engine that acts as a cognitive resource allocator, gating LLM inference exclusively on statistically anomalous market conditions; (ii) a Sequential Deliberative Pipeline - the core agentic contribution - in which an Analyst agent, a Risk Manager agent, and an Executor agent form a structured reasoning chain governed by typed JSON contracts and a deterministic hard-gate safety layer; (iii) an Inference Gating Protocol, a mutex-based cognitive resource scheduler that serializes concurrent agent activations and ensures fully reproducible audit trails; and (iv) a Correlation-Break Diversification composite score that operationalizes portfolio-level idiosyncratic signal prioritization within individual agent reasoning. Validated over a five-day autonomous dry-run session under live market conditions, the framework demonstrates operational correctness of the deliberative pipeline, achieving 157 zero-intervention invocations across 76 assets with an 11.5% agentic friction rate that confirms non-trivial inter-agent negotiation. This preliminary proof-of-concept establishes the feasibility of training-free, deterministic safety-constrained multi-agent orchestration in financial decision loops, with statistically robust performance evaluation and execution cost modeling deferred to extended live deployment.
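The gating idea in contribution (i) can be sketched as a rolling z-score filter that only lets an observation through to the expensive inference step when it is statistically unusual. This is a minimal sketch under assumptions: the class name, window, and fixed threshold are hypothetical, and the paper's engine is described as adaptive, which this simplified version is not.

```python
import math
from collections import deque

class ZScoreTrigger:
    """Fires when a new observation is far from the rolling mean (hypothetical gate)."""

    def __init__(self, window=20, threshold=3.0):
        self.buf = deque(maxlen=window)
        self.threshold = threshold

    def update(self, x):
        fire = False
        # Only evaluate once the window is full, using past values only.
        if len(self.buf) == self.buf.maxlen:
            mean = sum(self.buf) / len(self.buf)
            var = sum((v - mean) ** 2 for v in self.buf) / (len(self.buf) - 1)
            std = math.sqrt(var)
            if std > 0 and abs(x - mean) / std >= self.threshold:
                fire = True
        self.buf.append(x)
        return fire

trigger = ZScoreTrigger(window=20, threshold=3.0)
quiet = [trigger.update(x) for x in [0.0, 1.0] * 10]  # warm-up, no anomalies
spike = trigger.update(10.0)                          # clear outlier
```

Here `spike` is the only call that would wake the downstream agents; the twenty warm-up observations pass through without triggering inference.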
ForesightFlow is an Information Leakage Score (ILS) framework for detecting informed trading on decentralized prediction markets. For an event-resolved binary market, the score quantifies the fraction of the terminal information move priced in before the public news event. Three operational scope conditions (edge effect, non-trivial total move, anchor sensitivity) are stated as preconditions for interpretation. The score admits a Murphy-decomposition reading that connects label generation to the proper-scoring-rule literature.
A pilot empirical evaluation surfaces three findings. First, a resolution-anchored proxy for the public-event timestamp does not separate event-resolved markets from a matched control population (Mann-Whitney p = 1e-6, separation reversed), demonstrating that proxy quality is itself a binding constraint. Second, the article-derived timestamp on a single high-stakes case shifts the score by 0.444 in magnitude relative to the proxy and lies on the opposite side of zero. Third, an audit of the publicly documented Polymarket insider record reveals that documented cases are systematically deadline-resolved, falling outside the original ILS scope (0 of 24 FFIC inventory markets satisfied original scope conditions).
This last finding motivates a deadline-ILS extension introduced in Section 7, anchored at the public-event timestamp rather than the news timestamp, and equipped with a per-category exponential hazard baseline for the time-to-event distribution. The extension closes the gap between the methodology and the population in which insider trading has been empirically documented. An end-to-end evaluation of the extension on the 2026 U.S.-Iran conflict cluster is reported in a companion paper. We release the FFIC inventory, the resolution-typology classification of the 911,237-market corpus, and all code at github.com/ForesightFlow.
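The score's verbal definition, the fraction of the terminal information move priced in before the public event, can be written as a short sketch. The formula, argument names, and the 0.05 minimum-move threshold below are assumptions made for illustration; the abstract does not give the exact expression or the threshold used for the "non-trivial total move" scope condition.

```python
def leakage_score(p_start, p_pre_event, p_terminal, min_total_move=0.05):
    """Share of the start-to-terminal price move already realised before the event.

    Returns None when the total move is too small to interpret (assumed
    stand-in for the non-trivial-total-move scope condition).
    """
    total = p_terminal - p_start
    if abs(total) < min_total_move:
        return None
    return (p_pre_event - p_start) / total

# Hypothetical contract: opens at 0.20, trades at 0.60 just before the news,
# resolves YES at 1.00, so half of the move was priced in early.
half_leaked = leakage_score(0.20, 0.60, 1.00)
# A pre-event drift away from the eventual outcome gives a negative score.
negative = leakage_score(0.40, 0.30, 1.00)
```

The sketch also shows why the timestamp matters: moving the anchor changes `p_pre_event`, which can move the score across zero, as in the single-case result reported above.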
Population-scale test shows resolution ambiguity blocks analysis of nearly all markets and shifts focus to clearer rules.
We carry the deadline-resolved Information Leakage Score (ILS-dl) framework of Nechepurenko (2026a, 2026b) from a single-case proof of concept to a population-scale evaluation across 12,708 Polymarket markets, October 2020 to April 2026. We frame the paper as a scope-discovery study: scaling reveals that the framework's effective domain is materially narrower than initial framing suggested, and the principal obstacle is not score computation but resolution semantics.
We report four findings. First, only 88 of 12,708 candidate markets (0.7%) yield computable ILS-dl values; only 1 of 32 markets in the ForesightFlow Insider Cases (FFIC) inventory is in scope, and 14 of 32 FFIC markets are flagged unclassifiable due to genuine resolution-criterion ambiguity. Second, only 12 of the 88 computed markets (13.6%) satisfy anchor-sensitivity, and an independent-second-pass T_event validation reaches 57.8% exact-date agreement, below the 90% ex-ante criterion. Third, raw ILS-dl medians are negative across all six (sub-bucket by period) cells, but a hazard-decay baseline correction we introduce yields a heterogeneous result: regulatory_formal post-2024 shifts to near-zero (-0.21 to -0.02), while regulatory_announcement post-2024 retains a 95% bootstrap CI entirely below zero. Fourth, the constant-hazard exponential of Nechepurenko (2026b) is rejected in favor of Weibull on the pooled post-2024 cell, but a per-subcategory check confirms the preference reflects category mixture rather than within-cell duration dependence.
The implication is that detection of informed flow requires methodological refinement on the resolution-typology and score-baseline axes, not only on the score-computation axis where prior work concentrated.
Prediction-market price moves are widely treated as informationally equivalent: a price jump is read the same way regardless of whether it reflects durable Bayesian updating, transient liquidity pressure, strategic position adjustment, or genuine disagreement. This paper formalizes the Signal Credibility Index (SCI) introduced in Nechepurenko (2026) as a stand-alone diagnostic. We make four contributions: (i) a revised persistence component using the persistence ratio PR(t,w) on logit prices, well-defined on short rolling windows; (ii) a weighted Cobb-Douglas form SCI($\alpha$) with flow-based concentration HHI_flow; (iii) a time-varying specification SCI(t; w) for real-time monitoring; and (iv) Monte Carlo validation including an out-of-distribution stress test, coordinated multi-wallet manipulation, and a logistic-regression benchmark. The validation establishes discrimination among designed microstructure regimes, not external evidence of downstream coordination effects. We document two failure modes consistent with the index targeting coordination credibility rather than pure information content: a Type II error on informed-but-concentrated whale repricing, and a Type I error on coordinated multi-wallet manipulation.
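A weighted Cobb-Douglas aggregate with a flow-based Herfindahl-Hirschman index can be sketched as follows. This is an illustration under assumptions: the three components (persistence, two-sidedness, and one minus concentration), their scaling to [0, 1], and the weights are hypothetical and are not the paper's calibrated specification.

```python
import math

def hhi(flows):
    """Herfindahl-Hirschman index of absolute flow shares (1/n to 1)."""
    total = sum(abs(f) for f in flows)
    if total == 0:
        raise ValueError("total flow must be non-zero")
    return sum((abs(f) / total) ** 2 for f in flows)

def cobb_douglas(components, weights):
    """Weighted geometric aggregate: product of c_i ** alpha_i, weights summing to 1."""
    if len(components) != len(weights):
        raise ValueError("components and weights must have the same length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    if any(c < 0 or c > 1 for c in components):
        raise ValueError("components are assumed to lie in [0, 1]")
    return math.prod(c ** w for c, w in zip(components, weights))

# Hypothetical inputs: per-wallet net flows and two pre-computed diagnostics.
wallet_flows = [400.0, -250.0, 150.0, 200.0]
concentration = hhi(wallet_flows)
sci = cobb_douglas([0.8, 0.6, 1.0 - concentration], [0.4, 0.3, 0.3])
```

The multiplicative form means a single near-zero component pulls the whole index toward zero, which is consistent with the Type II failure mode on concentrated flow that the abstract documents.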
LLM agents are promising tools for empirical discovery, but their flexibility can also turn discovery into uncontrolled search. We study how to use agents under a reproducible protocol through cryptocurrency factor discovery. Our framework casts the task as sequential hypothesis search: an agent reads an append-only experiment trace, proposes falsifiable factor hypotheses, and maps them to executable recipes, while a deterministic engine enforces fixed data splits, selection gates, transaction costs, and portfolio tests. Candidate actions are restricted to a point-in-time factor DSL, making both successful and failed hypotheses auditable. A ridge-combined portfolio trained only on 2020--2022 data achieves a 44.55% annualized return and Sharpe ratio of 1.55 in the 2024--2026 pure out-of-sample period after a 5 basis point one-way trading cost.
Traditional moving average convergence divergence (MACD) trading rules are often constrained by signal lag and susceptibility to false signals. To address these limitations, this study develops a volume-price-adjusted MACD (VP-MACD) framework that incorporates volume, volatility, and intraday price structure into the conventional indicator, and introduces a sensitivity parameter to allow earlier trade entry and improve responsiveness to market movements. Using the S&P 500, Nasdaq-100, and Dow Jones Industrial Average as representative U.S. equity indices, the model is calibrated over historical records from 2018 to 2022 and evaluated out of sample over 2023 to February 2026. The results indicate that the proposed framework generally delivers better economic performance than the baseline MACD strategy in terms of profitability, risk-adjusted return, and downside-risk control, while generating fewer but more selective trading signals. These findings suggest that incorporating additional market information into technical trading rules may enhance signal quality in U.S. equity index markets.
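For reference, the conventional MACD baseline that the study modifies can be sketched in a few lines. This shows only the standard indicator with the usual 12/26/9 spans, not the VP-MACD adjustments, whose formulas are not given in the abstract. Seeding each exponential average with the first observation is an assumption of this sketch.

```python
def ema(values, span):
    """Exponential moving average with smoothing factor 2 / (span + 1)."""
    alpha = 2.0 / (span + 1)
    out = [values[0]]  # seed with the first observation (assumed convention)
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out

def macd(prices, fast=12, slow=26, signal=9):
    """Return the MACD line, its signal line, and the histogram."""
    macd_line = [f - s for f, s in zip(ema(prices, fast), ema(prices, slow))]
    signal_line = ema(macd_line, signal)
    histogram = [m - s for m, s in zip(macd_line, signal_line)]
    return macd_line, signal_line, histogram

# Hypothetical steadily rising price path.
rising = [100.0 + i for i in range(60)]
macd_line, signal_line, histogram = macd(rising)
```

On the rising path the fast average sits above the slow one, so the MACD line is positive; the lag visible in the signal line is the kind of delay the sensitivity parameter in the abstract is meant to reduce.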
We study the microstructure of Polymarket, the largest on-chain prediction market, using a continuous tick-level archive of the public order-book feed (30 billion events over 52 days) joined to the authoritative on-chain trade record. On a pre-registered stratified panel of 600 markets we report eight stylized facts: a longshot spread premium; a depth profile closer to uniform than to top-of-book; a null block-clock alignment effect; broad maker-wallet diversity with a concentrated tail; category-conditional effective-spread differences; a sub-50 ms median archive-ingestion delay with a multi-second tail; a self-counterparty wash share with median 1% and a 22% upper tail (well below Cong et al. 2023's 25-70% for unregulated crypto venues -- a sanity bound, not an apples-to-apples reference); and a cross-sectional depth profile explained by market duration, price level, and volume, with no residual time-to-close effect. The paper also contributes a measurement result: trade direction inferred from Polymarket's public order-book feed agrees with on-chain ground truth on only ~59% of buckets (panel mean 0.615, 95% CI [0.58, 0.65]), well below the ~80% Lee-Ready accuracy on Nasdaq. The effective half-spread changes sign between feed- and on-chain trade directions on 67%/50% of markets across two 7-day windows; Kyle's lambda on 60%/43%. Microstructure work on Polymarket therefore needs to source trade direction from on-chain OrderFilled events; we release a replication package that performs the join.
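The sensitivity of Kyle's lambda to trade-direction labels can be seen in a minimal sketch, assuming lambda is estimated as the ordinary least squares slope of price changes on signed order flow. The function name and the data are hypothetical, and the paper's bucketing and estimation details are not reproduced here.

```python
def kyle_lambda(price_changes, signed_flow):
    """OLS slope (with intercept) of price change on signed order flow."""
    n = len(price_changes)
    if n != len(signed_flow) or n < 2:
        raise ValueError("need two equally long series with at least 2 points")
    mean_dp = sum(price_changes) / n
    mean_q = sum(signed_flow) / n
    cov = sum((dp - mean_dp) * (q - mean_q)
              for dp, q in zip(price_changes, signed_flow))
    var = sum((q - mean_q) ** 2 for q in signed_flow)
    if var == 0:
        raise ValueError("signed flow has zero variance")
    return cov / var

# Hypothetical buckets: price moves 0.002 per unit of signed flow.
flow = [100.0, -50.0, 200.0, -150.0, 80.0]
dp = [0.002 * q for q in flow]
lam = kyle_lambda(dp, flow)
# Mislabelling every trade direction flips the sign of the estimate.
lam_flipped = kyle_lambda(dp, [-q for q in flow])
```

With direction labels that agree with ground truth only slightly more often than chance, the estimated slope can land on either side of zero, which is the failure the abstract reports for feed-inferred directions.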
Prediction markets are widely treated as forecasting devices that reveal collective expectations about uncertain futures. This article argues that under specifiable conditions they also function as coordination mechanisms: public probabilities that organize the behavior of voters, donors, journalists, traders, and institutions in ways that can be self-fulfilling or self-defeating. Most existing work asks whether prediction markets forecast accurately; this paper asks whether accurate forecasting is even the right criterion for a market that has become a public coordination device. Drawing on transaction-level evidence from the 2024 U.S. presidential election, we show that the social force of a market signal depends less on its size than on its persistence, the breadth of responding trader types, and cross-platform consensus. We introduce a Signal Credibility Index (SCI) -- combining the variance ratio VR(6), a two-sidedness diagnostic, and a trader-concentration adjustment -- as a microstructure-grounded criterion for predicting when price moves acquire behavioral traction. Applied to three major 2024 political shocks, the framework reveals that superficially similar events generated qualitatively distinct signal types with different implications for elite coordination. A cross-platform comparison establishes a systematic decoupling of social authority from epistemic robustness: the most visible market produced the least accurate forecasts. The framework carries direct implications for regulating prediction markets as democratic information infrastructure.
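The persistence component, the variance ratio VR(6), can be illustrated with a basic estimator. This is a sketch under assumptions: it uses overlapping q-period differences without small-sample bias corrections, and the input is taken to be a level series such as logit prices. The function name and the test series are hypothetical.

```python
def variance_ratio(series, q):
    """VR(q) = Var(q-period differences) / (q * Var(1-period differences))."""
    if len(series) < q + 2:
        raise ValueError("series too short for the requested horizon")

    def sample_var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    one_step = [b - a for a, b in zip(series, series[1:])]
    q_step = [series[i + q] - series[i] for i in range(len(series) - q)]
    v1 = sample_var(one_step)
    if v1 == 0:
        raise ValueError("one-step differences have zero variance")
    return sample_var(q_step) / (q * v1)

# Hypothetical mean-reverting path: every move is immediately undone.
reverting = [0, 1] * 10
vr_reverting = variance_ratio(reverting, 2)

# Hypothetical path with short-run momentum: moves come in pairs.
steps = [1, 1, -1, -1] * 5
momentum = [0]
for s in steps:
    momentum.append(momentum[-1] + s)
vr_momentum = variance_ratio(momentum, 2)
```

Values near 1 are what a random walk would produce, values below 1 indicate moves that tend to reverse, and values above 1 indicate moves that persist.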