Why Optimising for Sharpe Ratio Produces Fragile Strategies
Selecting strategies on Sharpe ratio systematically favours overfit, negatively-skewed configurations whose apparent quality is an artefact of the metric.
Optimising a strategy for Sharpe ratio is one of the most common ways to produce a backtest that looks excellent and a live system that disappoints. The objective function rewards smoothness, penalises deviations from the mean equally in both directions, and is only a complete description of risk when returns are approximately normal. Each of these properties biases a parameter search toward overfitted, fragile parameterisations. The result is selection bias toward strategies whose apparent quality is an artefact of the metric itself.
The metric punishes the wrong kind of variance
The Sharpe ratio treats upside volatility identically to downside volatility. A strategy that occasionally produces large positive returns is penalised in the denominator the same as one that occasionally blows up. When you grid-search parameters, the optimiser learns to suppress all variance — including the right-tail variance that generates most of the long-run edge in trend-following, breakout, and convexity-seeking systems.
Because the variance in the denominator, Var[r] = E[(r − μ)²], weights squared deviations above and below the mean identically, maximising Sharpe pushes the search toward configurations that clip both tails. In practice this means tighter stops, smaller positions during regime transitions, and faster exits on winners. Each of these reduces measured volatility while quietly reducing expectancy.
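The symmetry can be checked numerically. The following is a minimal sketch, assuming per-period returns with no risk-free adjustment; the helper names `sharpe` and `downside_deviation` and the toy return series are illustrative, not taken from any particular library. Two series are built as mirror images around the same mean, so one carries its large moves in the right tail and the other in the left.

```python
import numpy as np

def sharpe(returns):
    """Per-period Sharpe ratio: mean over sample standard deviation."""
    r = np.asarray(returns, dtype=float)
    return r.mean() / r.std(ddof=1)

def downside_deviation(returns, target=0.0):
    """Root mean square of shortfalls below `target` (a lower partial moment)."""
    r = np.asarray(returns, dtype=float)
    shortfall = np.minimum(r - target, 0.0)
    return np.sqrt(np.mean(shortfall ** 2))

# Zero-mean shocks: nine small moves one way, one large move the other way.
shocks = np.array([-0.01] * 9 + [0.09])
mu = 0.002  # hypothetical per-period drift, identical for both series

right_tail = mu + shocks  # small losses, occasional large gain
left_tail = mu - shocks   # small gains, occasional large loss

print("Sharpe (right tail):", sharpe(right_tail))
print("Sharpe (left tail): ", sharpe(left_tail))
print("Downside dev (right tail):", downside_deviation(right_tail))
print("Downside dev (left tail): ", downside_deviation(left_tail))
```

Both series receive the same Sharpe ratio because their means and variances match, while the downside deviation separates the one that occasionally blows up from the one that occasionally pays off.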
Why higher-Sharpe configurations are usually overfit
For any reasonable parameter grid, the configuration with the highest in-sample Sharpe is very often the one that happened to avoid the largest drawdowns in the sample. That is not skill: it is the parameter set that best memorised where the historical adverse events fell. The smoother the equity curve looks, the more likely it is that the parameters have absorbed sample-specific noise.
This effect compounds with the number of configurations searched. The expected maximum Sharpe across N independent configurations grows roughly with sqrt(2 ln N), measured in units of the Sharpe estimator's standard error, even when no configuration has any real edge. Selecting on Sharpe therefore produces, in expectation, a gap between in-sample and out-of-sample performance, and that gap is largest for the apparent winner.
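A small simulation illustrates the growth of the best Sharpe under the null of no edge. This is a sketch under stated assumptions: every configuration is independent Gaussian noise with zero mean, Sharpe is measured per period (not annualised), and the function name `mean_max_sharpe_under_null` and the choice of 252 observations are hypothetical. Under these assumptions the Sharpe estimator has a standard error of roughly 1/sqrt(T), so the reference level is sqrt(2 ln N) / sqrt(T).

```python
import math
import numpy as np

def mean_max_sharpe_under_null(n_configs, n_obs, n_trials=50, seed=0):
    """Average, over repeated searches, of the best in-sample Sharpe
    among `n_configs` strategies that are pure zero-mean noise."""
    rng = np.random.default_rng(seed)
    maxima = np.empty(n_trials)
    for i in range(n_trials):
        r = rng.standard_normal((n_configs, n_obs))
        sr = r.mean(axis=1) / r.std(axis=1, ddof=1)
        maxima[i] = sr.max()
    return float(maxima.mean())

n_obs = 252
results = {}
for n_configs in (10, 100, 1000):
    observed = mean_max_sharpe_under_null(n_configs, n_obs)
    reference = math.sqrt(2.0 * math.log(n_configs)) / math.sqrt(n_obs)
    results[n_configs] = (observed, reference)
    print(f"N={n_configs:5d}  mean best Sharpe={observed:.3f}  "
          f"sqrt(2 ln N)/sqrt(T)={reference:.3f}")
```

The winning Sharpe rises with N although every true Sharpe is zero. The sqrt(2 ln N) term is an asymptotic upper reference, so the simulated averages should sit somewhat below it at these sizes.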
The normality assumption breaks where it matters
Sharpe ratio is a sufficient statistic for risk-adjusted return only when returns are independent and normally distributed. Real strategy returns are not. They exhibit skew, excess kurtosis, autocorrelation in volatility, and regime dependence. A strategy with Sharpe 2.0 and skew of -3 does not have the same risk profile as a strategy with Sharpe 1.2 and skew of +1, but the metric simply reports the first as better.
Negative-skew strategies (short volatility, mean reversion in liquid instruments, carry trades) are systematically advantaged by Sharpe-based selection. They produce the steady drip of small wins that the variance term rewards, while the eventual left-tail loss may not appear in any in-sample window, so its magnitude stays hidden. The August 2007 quant crisis, the February 2018 XIV unwind, and countless smaller blow-ups share this signature.
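The effect of a missing tail event on measured Sharpe can be sketched with synthetic data. The numbers below are invented for illustration only: a hypothetical strategy earns a small, steady per-period return for 500 periods, and a second copy of the same series has a single 15% loss appended. The helpers `sharpe` and `skewness` are plain NumPy implementations, not calls into a statistics library.

```python
import numpy as np

def sharpe(returns):
    """Per-period Sharpe ratio: mean over sample standard deviation."""
    r = np.asarray(returns, dtype=float)
    return r.mean() / r.std(ddof=1)

def skewness(returns):
    """Sample skewness: third moment of the standardised returns."""
    r = np.asarray(returns, dtype=float)
    z = (r - r.mean()) / r.std(ddof=0)
    return float(np.mean(z ** 3))

rng = np.random.default_rng(1)

# Hypothetical "steady drip": small positive drift with low noise.
calm = 0.001 + 0.002 * rng.standard_normal(500)

# Same history plus the one crash the calm window never contained.
with_tail = np.append(calm, -0.15)

for name, series in (("calm window", calm), ("with tail event", with_tail)):
    print(f"{name:16s} Sharpe={sharpe(series):.3f}  skew={skewness(series):.2f}")
```

A search that only ever sees the calm window scores this strategy on the first line. The second line shows how much of that score depended on the crash being absent from the sample.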
What to optimise instead
The correct objective depends on the strategy family, but the general principle is to use metrics that are either robust to tail behaviour or that explicitly model it. Probabilistic Sharpe ratio adjusts for sample size and higher moments. Deflated Sharpe ratio further corrects for the number of trials. For convexity-seeking strategies, optimising on Calmar (return over maximum drawdown) or on a ratio built from the lower partial moment, such as Sortino, preserves the upside variance that Sharpe destroys.
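For an observed Sharpe ratio SR estimated from n returns and a benchmark level SR*, the probabilistic Sharpe ratio is
PSR(SR*) = Φ( (SR − SR*) · sqrt(n − 1) / sqrt(1 − γ3·SR + ((γ4 − 1)/4)·SR²) )
where Φ is the standard normal CDF. The expression above incorporates skewness γ3 and kurtosis γ4 directly, so a strategy cannot earn a high score by hiding fat tails. When the optimiser is forced to account for the shape of the distribution it produces, the selected parameterisations tend to look less impressive on a tearsheet and degrade less out of sample. This is the trade you want.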
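The probabilistic Sharpe ratio can be sketched in a few lines. This is an illustrative implementation assuming the standard form of the statistic, with γ3 as sample skewness and γ4 as raw (not excess) kurtosis, so that γ4 = 3 for a normal distribution. The function name `probabilistic_sharpe` and the toy return series are hypothetical.

```python
import math
import numpy as np

def probabilistic_sharpe(returns, sr_benchmark=0.0):
    """Probability that the true per-period Sharpe exceeds `sr_benchmark`,
    adjusting for sample length, skewness and kurtosis."""
    r = np.asarray(returns, dtype=float)
    n = r.size
    sr = r.mean() / r.std(ddof=1)
    z = (r - r.mean()) / r.std(ddof=0)
    g3 = float(np.mean(z ** 3))  # skewness
    g4 = float(np.mean(z ** 4))  # raw kurtosis (3 for a normal)
    denom = math.sqrt(1.0 - g3 * sr + (g4 - 1.0) / 4.0 * sr ** 2)
    stat = (sr - sr_benchmark) * math.sqrt(n - 1) / denom
    # Standard normal CDF written with math.erf to avoid extra dependencies.
    return 0.5 * (1.0 + math.erf(stat / math.sqrt(2.0)))

# Two toy series with the same mean and variance but mirrored tails.
shocks = np.tile(np.array([-0.01] * 9 + [0.09]), 10)
mu = 0.002  # hypothetical per-period drift
psr_right = probabilistic_sharpe(mu + shocks)  # positive skew
psr_left = probabilistic_sharpe(mu - shocks)   # negative skew

print("PSR, positive skew:", round(psr_right, 3))
print("PSR, negative skew:", round(psr_left, 3))
```

Both series have the same plain Sharpe ratio, yet the negatively skewed one receives the lower score, which is the behaviour a selection objective needs if it is to stop rewarding hidden left tails.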
Practical discipline
Treat Sharpe as a descriptive statistic, not an objective function. Use it to summarise a strategy after selection, not to drive selection itself. When running parameter sweeps in Kestrel Signal, define the objective in terms of out-of-sample robustness: median performance across walk-forward folds, deflated Sharpe across the search space, or a multi-metric Pareto frontier that exposes the tradeoffs between smoothness and tail behaviour explicitly.
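One of these objectives, median performance across walk-forward folds, can be sketched generically. The code below is not Kestrel Signal's API; it is a standalone NumPy illustration that assumes a precomputed array of per-period returns with one row per configuration. The function name `walk_forward_oos_sharpes`, the window lengths, and the pure-noise test data are all hypothetical.

```python
import numpy as np

def sharpe(r):
    """Per-period Sharpe ratio; returns 0.0 for a constant series."""
    sd = r.std(ddof=1)
    return float(r.mean() / sd) if sd > 0 else 0.0

def walk_forward_oos_sharpes(config_returns, train_len, test_len):
    """For each fold, pick the configuration with the best Sharpe on the
    training window and record its Sharpe on the following test window."""
    x = np.asarray(config_returns, dtype=float)  # shape (n_configs, n_periods)
    oos = []
    start = 0
    while start + train_len + test_len <= x.shape[1]:
        train = x[:, start:start + train_len]
        test = x[:, start + train_len:start + train_len + test_len]
        best = int(np.argmax([sharpe(row) for row in train]))
        oos.append(sharpe(test[best]))
        start += test_len  # roll the window forward by one test block
    return oos

# Pure-noise configurations: any in-sample winner has no real edge.
rng = np.random.default_rng(7)
noise = 0.01 * rng.standard_normal((50, 1000))

oos = walk_forward_oos_sharpes(noise, train_len=250, test_len=125)
median_oos = float(np.median(oos))
print("folds:", len(oos), " median out-of-sample Sharpe:", round(median_oos, 3))
```

Scoring the search by the median of the out-of-sample values, rather than by the best in-sample value, means a configuration only ranks well if the selection procedure itself generalises across folds.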
The strategies that survive contact with live markets are rarely the ones with the highest backtested Sharpe. They are the ones whose selection process did not actively reward the appearance of safety. Optimising for the right thing is harder, produces worse-looking equity curves in research, and is the difference between a system that compounds and one that quietly bleeds the edge that the metric pretended was there.