
Combinatorial Purged Cross-Validation

Combinatorial Purged Cross-Validation (CPCV) is a framework for estimating how likely it is that a strategy's backtest result is an artefact of data mining rather than genuine edge. It summarises this risk in a single number, the Probability of Backtest Overfitting (PBO).

Why standard cross-validation fails for strategies

Standard k-fold cross-validation splits data into k folds and rotates which fold is held out for testing. The problem in financial time series is that adjacent bars share information through overlapping labels, serial correlation, and regime persistence. Training on bars immediately surrounding the test set therefore leaks information about the test period into the model, and standard cross-validation overstates out-of-sample performance as a result.

CPCV addresses this with two modifications: purging (removing training bars adjacent to the test boundary, whose labels overlap the test period) and embargo (dropping a buffer of training bars after the test period ends, since signal from the test period may persist into subsequent training bars).
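Purging can be sketched as follows. This is a minimal illustration, not Kestrel Signal's implementation: the function name `purge_train` and the `label_horizon` parameter are hypothetical, and the sketch assumes each bar's label is built from that bar and the next `label_horizon` bars, with a single contiguous test block.

```python
def purge_train(n_bars, test_start, test_end, label_horizon):
    """Return training bar indices for a test block [test_start, test_end).

    Assumption (not from the source): the label of bar i depends on
    bars i .. i + label_horizon inclusive.
    """
    # Last bar touched by any test label.
    span_end = test_end - 1 + label_horizon
    train = []
    for i in range(n_bars):
        if test_start <= i < test_end:
            continue  # bar belongs to the test block
        label_end = i + label_horizon
        # Purge when the training label window overlaps the test label span.
        if i <= span_end and label_end >= test_start:
            continue
        train.append(i)
    return train


# 20 bars, test block covering bars 8..11, labels looking 2 bars ahead.
train = purge_train(n_bars=20, test_start=8, test_end=12, label_horizon=2)
print(train)  # bars 6, 7 (before) and 12, 13 (after) are purged
```

With these inputs the training set keeps bars 0 to 5 and 14 to 19. The embargo would then remove a further buffer after the test block.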

The C(N, k) structure

Divide the full time series into N equal-length groups. CPCV generates all C(N, k) combinations of groups, where k groups form the out-of-sample (OOS) set and the remaining N − k groups form the in-sample (IS) set.

Number of folds = C(N, k) = N! / (k! × (N−k)!)

For N = 6, k = 2: C(6, 2) = 15 folds. Each fold produces one IS Sharpe and one OOS Sharpe. Unlike walk-forward, where each bar appears in only one OOS window, CPCV places each group in the OOS set of C(N − 1, k − 1) folds (5 of the 15 here), which yields a rich distribution of OOS Sharpe ratios across all possible IS/OOS splits.
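The fold structure can be enumerated directly with the standard library. The sketch below is illustrative only; `cpcv_splits` is a hypothetical helper name, and groups are identified by their index 0 to N − 1.

```python
from itertools import combinations
from math import comb


def cpcv_splits(n_groups, k_test):
    """Yield (is_groups, oos_groups) for every C(N, k) combination."""
    groups = range(n_groups)
    for oos_groups in combinations(groups, k_test):
        is_groups = tuple(g for g in groups if g not in oos_groups)
        yield is_groups, oos_groups


splits = list(cpcv_splits(n_groups=6, k_test=2))
print(len(splits), comb(6, 2))  # 15 15
print(splits[0])                # ((2, 3, 4, 5), (0, 1))

# How many folds hold group 0 out of sample: C(5, 1) = 5.
print(sum(1 for _, oos in splits if 0 in oos))  # 5
```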

Probability of Backtest Overfitting

PBO is the fraction of the C(N, k) folds where the OOS Sharpe ratio is below zero.

PBO = #{folds where OOS Sharpe < 0} / C(N, k)

A PBO of 0.50 means half of all possible IS/OOS splits produce a strategy that loses money out-of-sample. Even if the in-sample result looks excellent, a PBO above 0.50 is strong evidence that you've fit noise.
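As a sketch of the definition above, PBO is a simple count over the fold-level OOS Sharpe ratios. The function name `pbo` is hypothetical and the fold values below are made up purely to illustrate the arithmetic.

```python
def pbo(oos_sharpes):
    """Fraction of folds whose OOS Sharpe ratio is below zero."""
    if not oos_sharpes:
        raise ValueError("need at least one fold")
    failing = sum(1 for s in oos_sharpes if s < 0)
    return failing / len(oos_sharpes)


# Illustrative OOS Sharpe ratios for 15 folds (not real results).
example_folds = [1.2, 0.8, -0.3, 0.5, 1.1, 0.2, -0.1, 0.9,
                 0.4, 0.7, -0.6, 1.3, 0.6, 0.3, 1.0]
print(pbo(example_folds))  # 3 of 15 folds are negative -> 0.2
```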

PBO measures overfitting risk for a single strategy on a fixed parameter set. It does not account for multiple strategies being compared; for that, the Deflated Sharpe Ratio (DSR) adjustment applies. CPCV and DSR are complementary, not alternatives.

Interpreting PBO

PBO < 0.10: Very low overfitting risk. OOS performance is robust across splits.

PBO 0.10–0.30: Moderate risk. Strategy generalises in most splits but not all.

PBO 0.30–0.50: Meaningful overfitting risk. Treat results with caution.

PBO > 0.50: More folds fail than pass. Strong evidence of curve-fitting.

The embargo

After each OOS period, an embargo window (5 bars by default in Kestrel Signal) is dropped from the start of the subsequent IS range. This prevents the model from learning on bars where market impact or regime continuation from the test period could still be present. Skipping the embargo overstates OOS performance, particularly for strategies with autocorrelated signals.
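One way to picture the embargo on a combinatorial split is sketched below. This is an assumption-laden illustration rather than Kestrel Signal's code: `embargoed_train_indices` is a hypothetical name, the bar count is assumed to divide evenly into groups, and purging is left out so the embargo is visible on its own.

```python
def embargoed_train_indices(n_bars, n_groups, oos_groups, embargo=5):
    """Return IS bar indices after dropping `embargo` bars behind each OOS group."""
    size = n_bars // n_groups  # assumes n_bars is a multiple of n_groups

    # Collect every OOS bar first, so adjacent OOS groups act as one block.
    oos = set()
    for g in oos_groups:
        oos.update(range(g * size, (g + 1) * size))

    # Drop the first `embargo` IS bars that follow each OOS group.
    dropped = set()
    for g in oos_groups:
        end = (g + 1) * size
        for i in range(end, min(end + embargo, n_bars)):
            if i not in oos:
                dropped.add(i)

    return [i for i in range(n_bars) if i not in oos and i not in dropped]


# 60 bars in 6 groups of 10; groups 1 and 4 are held out.
train = embargoed_train_indices(60, 6, oos_groups=(1, 4), embargo=5)
print(len(train))  # 30: bars 20..24 and 50..54 are embargoed
```

When two OOS groups are adjacent, only the bars after the combined block are dropped, and an OOS group at the very end of the series needs no embargo because no IS bars follow it.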

Requirements

CPCV requires enough data for the groups to have meaningful IS and OOS lengths. With N = 6 groups, each group covers roughly 1/6 of the data. With 5 years of daily data (≈1260 bars) and k = 2, each IS set covers about 4/6 × 1260 = 840 bars and each OOS set covers 2/6 × 1260 = 420 bars before purging and embargo, which is entirely workable.

Strategies with very low trade frequency are problematic: if OOS windows contain fewer than 10–20 trades, the Sharpe estimate is too noisy to interpret. CPCV is most informative for strategies that trade at least a few times per month.
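The sizing arithmetic and the trade-count check can be combined into a quick feasibility estimate. The sketch below is a rough guide only: `cpcv_sizes` is a hypothetical helper, it ignores bars lost to purging and embargo, and it assumes about 21 trading days per month.

```python
def cpcv_sizes(n_bars, n_groups, k_test, trades_per_month, bars_per_month=21):
    """Return (IS bars, OOS bars, expected OOS trades) for one fold."""
    group = n_bars // n_groups
    oos_bars = k_test * group
    is_bars = (n_groups - k_test) * group
    # Expected trades in the OOS set, assuming an even trade rate over time.
    oos_trades = trades_per_month * oos_bars / bars_per_month
    return is_bars, oos_bars, oos_trades


# 5 years of daily bars, N = 6, k = 2, about 3 trades per month.
print(cpcv_sizes(1260, 6, 2, trades_per_month=3))    # (840, 420, 60.0)

# A strategy trading once every two months sits at the noisy end of the range.
print(cpcv_sizes(1260, 6, 2, trades_per_month=0.5))  # (840, 420, 10.0)
```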

CPCV is available on Lab tier and above

Kestrel Signal runs CPCV automatically when the feature is enabled in your backtest configuration. The PBO score and fold-level Sharpe distribution appear alongside your main result metrics. Free and Researcher tier users see the DSR and walk-forward analysis; CPCV is a Lab+ feature due to its compute cost (15 full backtests per run with default settings).