The Calmar Ratio

The Calmar ratio measures the compound annual return of a strategy relative to its worst drawdown. It is a tail-aware performance metric — unlike the Sharpe ratio, it cares about how deep the equity curve fell, not how much it wiggled.

Traders who survive long enough to compound returns care about maximum drawdown because that is the experience that ends careers and triggers redemptions. The Calmar ratio condenses that survival concern into a single number.

Calmar = CAGR / |Max Drawdown|

CAGR is the compound annual growth rate of the equity curve. Max Drawdown is the largest peak-to-trough decline observed over the evaluation window, expressed as a positive number. Both inputs are typically computed over the same trailing window — the canonical convention from Young (1991) uses 36 months.
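As a rough illustration of the formula above, the following sketch computes CAGR, max drawdown, and Calmar from a list of month-end equity values. The function names, the `periods_per_year` parameter, and the sample equity curve are all hypothetical choices made for this example, not part of any particular library.

```python
def cagr(equity, periods_per_year=12):
    """Compound annual growth rate of an equity curve sampled at a fixed frequency."""
    n_periods = len(equity) - 1
    if n_periods < 1:
        raise ValueError("need at least two equity points")
    years = n_periods / periods_per_year
    return (equity[-1] / equity[0]) ** (1.0 / years) - 1.0


def max_drawdown(equity):
    """Largest peak-to-trough decline, returned as a positive fraction."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)                    # running high-water mark
        worst = max(worst, (peak - value) / peak)  # decline from that mark
    return worst


def calmar(equity, periods_per_year=12):
    """CAGR divided by max drawdown.

    If the curve never declined, returns inf for positive growth and nan otherwise.
    """
    mdd = max_drawdown(equity)
    growth = cagr(equity, periods_per_year)
    if mdd == 0.0:
        return float("inf") if growth > 0 else float("nan")
    return growth / mdd


# Hypothetical equity curve: 13 month-end values, i.e. 12 monthly periods (1 year).
equity = [100, 104, 110, 99, 102, 108, 112, 109, 115, 118, 116, 119, 121]

print(f"CAGR:         {cagr(equity):.4f}")
print(f"Max drawdown: {max_drawdown(equity):.4f}")
print(f"Calmar:       {calmar(equity):.4f}")
```

In this sample the curve grows from 100 to 121 over one year (CAGR of 21%) and its worst decline is from 110 to 99 (10%), so the Calmar ratio comes out at roughly 2.1.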

Interpretation and typical ranges

A Calmar above 1.0 means the strategy earns more per year than its worst observed loss. Below 0.5, you are accepting drawdowns more than twice the size of your annual return — a difficult psychological and capital-allocation position. Above 3.0 is rare for live, capacity-constrained strategies and should invite suspicion rather than celebration.

For systematic equity and futures strategies, sustainable live Calmar ratios cluster between 0.5 and 1.5. Backtests routinely produce values of 3 to 10; the gap between these and live performance is one of the cleanest diagnostics of overfitting available. High-frequency and short-holding-period strategies can sustain higher Calmars due to faster mean reversion of drawdowns, but capacity falls accordingly.

Max drawdown is non-decreasing in window length: extending the evaluation period can only leave the worst drawdown unchanged or uncover a deeper one, so over long horizons Calmar tends to drift down as the sample grows. Comparing strategies therefore requires aligned windows.
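The effect of window length on max drawdown can be checked directly. The sketch below recomputes max drawdown on progressively longer prefixes of a hypothetical equity curve (the data and the `prefix_mdd` name are made up for illustration) and shows that the figure never shrinks as more history is included.

```python
def max_drawdown(equity):
    """Largest peak-to-trough decline, returned as a positive fraction."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


# Hypothetical month-end equity values.
equity = [100, 105, 103, 110, 108, 115, 101, 107, 118, 122, 95, 104, 120]

# Max drawdown over the first n periods, for n = 1 .. len(equity) - 1.
prefix_mdd = [max_drawdown(equity[: n + 1]) for n in range(1, len(equity))]

for n, mdd in enumerate(prefix_mdd, start=1):
    print(f"first {n:2d} periods: max drawdown = {mdd:.4f}")
```

Because each longer prefix contains every peak and trough of the shorter one, the sequence is non-decreasing by construction. CAGR has no such property, which is why the ratio as a whole only tends to drift down rather than falling at every step.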

What the Calmar ratio does not capture

Max drawdown is a single-point statistic. It tells you the depth of the worst loss but nothing about how often deep drawdowns occur, how long recovery took, or the distribution of smaller drawdowns underneath the maximum. Two strategies with identical Calmar values can have radically different drawdown frequencies.
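To make the point concrete, here is a minimal sketch that splits an equity curve into individual drawdown episodes and compares two hypothetical curves. Both curves are invented for this example and are constructed to share the same start value, end value, length, and max drawdown (hence the same Calmar), while differing in how often they draw down. The `drawdown_episodes` helper is an illustrative name, not an established API.

```python
def drawdown_episodes(equity):
    """Split an equity curve into drawdown episodes.

    Each episode is a dict with the index of the prior peak, the index of the
    trough, the depth as a positive fraction, and the index at which the prior
    peak was regained (None if the curve has not recovered by the end).
    """
    episodes = []
    peak_idx = 0
    current = None  # episode in progress, if any
    for i, value in enumerate(equity):
        if value >= equity[peak_idx]:
            # Prior peak regained or exceeded: close any open episode.
            if current is not None:
                current["recovery"] = i
                episodes.append(current)
                current = None
            peak_idx = i
        else:
            depth = (equity[peak_idx] - value) / equity[peak_idx]
            if current is None:
                current = {"peak": peak_idx, "trough": i,
                           "depth": depth, "recovery": None}
            elif depth > current["depth"]:
                current["trough"] = i
                current["depth"] = depth
    if current is not None:
        episodes.append(current)  # still under water at the end of the sample
    return episodes


# Hypothetical curves with identical endpoints and identical max drawdown (10%).
curve_a = [100, 110, 99, 104, 112, 116, 120, 124, 128, 130, 132, 134, 136]
curve_b = [100, 110, 99, 111, 100, 113, 102, 116, 105, 120, 108, 125, 136]

episodes_a = drawdown_episodes(curve_a)
episodes_b = drawdown_episodes(curve_b)

for name, episodes in (("A", episodes_a), ("B", episodes_b)):
    worst = max(e["depth"] for e in episodes)
    print(f"curve {name}: {len(episodes)} episode(s), worst depth {worst:.4f}")
```

Curve A has a single 10% drawdown followed by steady gains, while curve B repeatedly falls close to 10% below its peak. Calmar cannot tell them apart.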

The metric is also estimator-unstable. Max drawdown is the extreme of a path-dependent statistic, so its sampling variance is large: a single bad month can swing Calmar by 30% or more. This is the opposite failure mode of Sharpe. Sharpe is too stable because it averages over every observation; Calmar is too jumpy because its denominator is set by a single extreme.
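A small sensitivity check illustrates this instability. The sketch below builds a hypothetical 36-month return stream, then replaces one month with an 8% loss and recomputes Calmar. The return pattern, the size of the shock, and the month chosen are all assumptions made purely for illustration.

```python
def equity_from_returns(returns, start=100.0):
    """Compound a list of periodic returns into an equity curve."""
    equity = [start]
    for r in returns:
        equity.append(equity[-1] * (1.0 + r))
    return equity


def calmar(equity, periods_per_year=12):
    """CAGR divided by max drawdown (inf if the curve never declined)."""
    years = (len(equity) - 1) / periods_per_year
    growth = (equity[-1] / equity[0]) ** (1.0 / years) - 1.0
    peak, worst = equity[0], 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return growth / worst if worst > 0 else float("inf")


# Hypothetical 36-month return stream: a repeating six-month pattern.
base_returns = [0.02, -0.01, 0.015, 0.01, -0.02, 0.02] * 6

# Same stream with a single month (month 18) replaced by an 8% loss.
shocked_returns = list(base_returns)
shocked_returns[17] = -0.08

calmar_base = calmar(equity_from_returns(base_returns))
calmar_shocked = calmar(equity_from_returns(shocked_returns))
relative_change = (calmar_shocked - calmar_base) / calmar_base

print(f"Calmar, base stream:    {calmar_base:.2f}")
print(f"Calmar, one bad month:  {calmar_shocked:.2f}")
print(f"Relative change:        {relative_change:.0%}")
```

The single changed month hits both sides of the ratio at once: it lowers CAGR in the numerator and sets a new max drawdown in the denominator, which is why the effect on Calmar is so much larger than the effect on the average return.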

Calmar says nothing about return skew, tail kurtosis, or the conditional distribution of losses beyond the maximum. A strategy with bounded losses by construction (long-only equity) and one with unbounded tail risk (short volatility) can show the same Calmar in-sample while having entirely different risk profiles out-of-sample.

A backtest with Calmar above 5 over a multi-year window almost always reflects look-ahead bias, survivorship bias, or parameter overfitting rather than genuine edge. Treat extreme values as a red flag for methodological review, not as a quality signal.

Finally, Calmar uses realized drawdown, not expected drawdown. The maximum observed in your sample is a lower bound on the maximum the strategy can produce — extreme value theory suggests true expected max drawdown over a longer horizon is materially deeper than the historical worst case.
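One way to get a feel for this is a small Monte Carlo experiment. The sketch below simulates many hypothetical equity paths and compares the max drawdown seen in the first 36 months of each path with the max drawdown over the full 120 months. It assumes i.i.d. normally distributed monthly returns with an arbitrary mean and volatility, which is a strong simplification: real return streams are fat-tailed and autocorrelated, so this likely understates the gap.

```python
import random


def max_drawdown(equity):
    """Largest peak-to-trough decline, returned as a positive fraction."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


def simulate_equity(rng, n_months, mean, vol):
    """One equity path from i.i.d. normal monthly returns (an assumption)."""
    equity = [1.0]
    for _ in range(n_months):
        equity.append(equity[-1] * (1.0 + rng.gauss(mean, vol)))
    return equity


rng = random.Random(7)  # fixed seed so the run is repeatable
short_mdds, long_mdds = [], []

for _ in range(2000):
    # Hypothetical parameters: 0.8% mean and 4% volatility per month.
    path = simulate_equity(rng, n_months=120, mean=0.008, vol=0.04)
    short_mdds.append(max_drawdown(path[:37]))  # first 36 months only
    long_mdds.append(max_drawdown(path))        # full 120 months

mean_short = sum(short_mdds) / len(short_mdds)
mean_long = sum(long_mdds) / len(long_mdds)

print(f"average max drawdown, first 36 months: {mean_short:.3f}")
print(f"average max drawdown, full 120 months: {mean_long:.3f}")
```

On every simulated path the 120-month max drawdown is at least as deep as the 36-month one, so a Calmar computed from the shorter sample is optimistic relative to what the same process produces over a longer horizon.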

How Kestrel Signal presents the Calmar ratio

Kestrel Signal reports Calmar alongside MAR (return over maximum drawdown measured since inception rather than over a trailing window) and Ulcer Performance Index in every backtest summary. Computing all three discourages the common error of optimizing toward a single drawdown statistic, since each metric weighs the drawdown history differently.
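For readers unfamiliar with the Ulcer Performance Index, the sketch below shows one common formulation: the Ulcer Index is the root-mean-square of the drawdown from the running peak, and the Ulcer Performance Index divides annualized excess return by it. How Kestrel Signal computes the figure internally is not specified here, so treat the function names, the zero risk-free rate, and the sample data as assumptions of this example.

```python
import math


def drawdown_series(equity):
    """Drawdown from the running peak at each point, as positive fractions."""
    peak = equity[0]
    out = []
    for value in equity:
        peak = max(peak, value)
        out.append((peak - value) / peak)
    return out


def ulcer_index(equity):
    """Root-mean-square of the drawdown series (fractions, not percent)."""
    dds = drawdown_series(equity)
    return math.sqrt(sum(d * d for d in dds) / len(dds))


def cagr(equity, periods_per_year=12):
    years = (len(equity) - 1) / periods_per_year
    return (equity[-1] / equity[0]) ** (1.0 / years) - 1.0


def ulcer_performance_index(equity, periods_per_year=12, risk_free=0.0):
    """Annualized excess return divided by the Ulcer Index."""
    ui = ulcer_index(equity)
    if ui == 0.0:
        return float("inf")
    return (cagr(equity, periods_per_year) - risk_free) / ui


# Hypothetical equity curve: 13 month-end values (12 monthly periods).
equity = [100, 104, 110, 99, 102, 108, 112, 109, 115, 118, 116, 119, 121]

ui = ulcer_index(equity)
upi = ulcer_performance_index(equity)
calmar = cagr(equity) / max(drawdown_series(equity))

print(f"Ulcer Index:             {ui:.4f}")
print(f"Ulcer Performance Index: {upi:.2f}")
print(f"Calmar (for comparison): {calmar:.2f}")
```

Because the Ulcer Index averages over every point spent under water, it responds to the depth and duration of all drawdowns, not only the worst one. That is the sense in which it complements Calmar instead of duplicating it.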

The platform also displays Calmar computed over rolling 36-month windows, exposing how stable the ratio is across regimes rather than collapsing performance into one point estimate. When you walk-forward a strategy in Kestrel Signal, the out-of-sample Calmar is reported separately from the in-sample value — the ratio between the two is typically more informative than either number alone.
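The rolling-window idea can be sketched in a few lines. The example below is not Kestrel Signal's implementation. It is a minimal illustration that computes Calmar over every trailing 36-month window of a simulated equity curve, then compares an in-sample segment with an out-of-sample segment. The simulated data, the 48/24-month split, and the function names are all hypothetical.

```python
import random


def calmar(equity, periods_per_year=12):
    """CAGR divided by max drawdown (inf if the curve never declined)."""
    years = (len(equity) - 1) / periods_per_year
    growth = (equity[-1] / equity[0]) ** (1.0 / years) - 1.0
    peak, worst = equity[0], 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return growth / worst if worst > 0 else float("inf")


def rolling_calmar(equity, window=36, periods_per_year=12):
    """Calmar over each trailing window of `window` periods (window + 1 points)."""
    return [
        calmar(equity[end - window : end + 1], periods_per_year)
        for end in range(window, len(equity))
    ]


# Hypothetical data: 72 months of simulated returns (i.i.d. normal, an assumption).
rng = random.Random(11)
equity = [100.0]
for _ in range(72):
    equity.append(equity[-1] * (1.0 + rng.gauss(0.008, 0.04)))

rolling = rolling_calmar(equity)
print(f"{len(rolling)} rolling windows, "
      f"min {min(rolling):.2f}, max {max(rolling):.2f}")

# Walk-forward style split: first 48 months in-sample, last 24 out-of-sample.
calmar_in = calmar(equity[:49])
calmar_out = calmar(equity[48:])
ratio = calmar_out / calmar_in if calmar_in != 0 else float("nan")
print(f"in-sample {calmar_in:.2f}, out-of-sample {calmar_out:.2f}, "
      f"ratio {ratio:.2f}")
```

The spread between the minimum and maximum rolling values gives a quick sense of how regime-dependent the ratio is, and an out-of-sample to in-sample ratio well below 1 suggests the in-sample figure was flattered by fitting.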