Model performance

Kolmogorov-Smirnov (KS) statistic

KS

Definition

The maximum vertical distance between two empirical cumulative distribution functions; in credit modeling, the maximum separation between the cumulative score distributions of good (non-default) and bad (default) borrowers, used as a measure of how well a model discriminates between the two populations.

Formally, the two-sample KS statistic is the supremum of the absolute difference between the empirical CDFs of the two populations [1][2]:

KS = sup_x |F_bad(x) − F_good(x)|

F_bad(x) = empirical CDF of model scores for defaulters

F_good(x) = empirical CDF of model scores for non-defaulters

A KS of 0 means the two score distributions are identical: the model has no ability to distinguish good borrowers from bad. A KS of 1 means the two distributions are perfectly separated: every bad scores lower than every good. Real credit and fraud models fall somewhere in between, and the KS value summarizes how much daylight a model puts between the two populations. The statistic was introduced by Kolmogorov (1933) [3] and tabulated in two-sample form by Smirnov (1948) [4], with Massey (1951) bringing it into mainstream applied use [5].
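The definition above can be computed directly from the two empirical CDFs. The sketch below is a minimal NumPy-only illustration (the function name `ks_statistic` and the simulated score distributions are assumptions for the example, not part of any library):

```python
import numpy as np

def ks_statistic(scores_bad, scores_good):
    """Two-sample KS: maximum vertical gap between the empirical CDFs
    of bad (defaulter) and good (non-defaulter) score distributions."""
    # Evaluate both ECDFs on the pooled set of observed scores.
    grid = np.sort(np.concatenate([scores_bad, scores_good]))
    cdf_bad = np.searchsorted(np.sort(scores_bad), grid, side="right") / len(scores_bad)
    cdf_good = np.searchsorted(np.sort(scores_good), grid, side="right") / len(scores_good)
    return np.max(np.abs(cdf_bad - cdf_good))

# Toy example: hypothetical scores with bads concentrated at the low end.
rng = np.random.default_rng(0)
bad = rng.normal(400, 50, 1000)
good = rng.normal(550, 50, 4000)
print(round(ks_statistic(bad, good), 3))
```

In production, `scipy.stats.ks_2samp` computes the same quantity; the explicit ECDF version is shown only to make the supremum in the formula concrete.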

[Figure] KS statistic = maximum vertical distance between the cumulative score distributions of good and bad borrowers.

Practitioners read the KS at the score cutoff where the curves are furthest apart, which is equivalently the cutoff that maximizes the Youden index J = sensitivity + specificity − 1 [9]. Geometrically, KS equals the maximum vertical distance from the ROC curve to the diagonal, which makes it directly related to AUC and the Gini coefficient: all three are rank-only discrimination measures derivable from the same score ranking.
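The ROC-geometry reading can be sketched as follows: sweep the cutoff up the score axis, track TPR and FPR, and take the largest vertical gap to the diagonal. The helper name `ks_from_roc` and the label convention (1 = bad/default, bads tending to score low, ties ignored) are assumptions for this sketch:

```python
import numpy as np

def ks_from_roc(y_true, scores):
    """KS read off the ROC curve: the maximum over cutoffs of |TPR - FPR|,
    i.e. the maximum Youden index J = sensitivity + specificity - 1.
    Convention: y_true = 1 marks bads, and bads tend to score low."""
    order = np.argsort(scores, kind="stable")   # ascending score
    y = np.asarray(y_true)[order]
    n_bad = y.sum()
    n_good = len(y) - n_bad
    # "score <= cutoff" predicts bad; cumulative counts along the sorted
    # scores give the TPR and FPR at every observed cutoff.
    tpr = np.cumsum(y) / n_bad          # bads captured at or below cutoff
    fpr = np.cumsum(1 - y) / n_good     # goods wrongly flagged
    return np.max(np.abs(tpr - fpr))
```

Because TPR and FPR at each cutoff are exactly the two ECDFs evaluated at that score, this returns the same number as the ECDF-based definition, and the argmax cutoff is the Youden-optimal operating point.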

Interpretation bands (consumer credit)

The bands below are practitioner heuristics widely used across North American consumer-credit practice; they are not a published standard. Concrete acceptance thresholds vary by portfolio, segment, and bad rate (Hand & Henley 1997 [8]; Siddiqi 2017 [10]). KS values are commonly reported as percentages (0–100) or as decimals (0–1); the bands below assume the percentage convention.

KS range   Typical interpretation
0–20       Weak; little practical discrimination
20–40      Acceptable for consumer credit underwriting
40–50      Strong; competitive for prime portfolios
50+        Very strong; common in fraud and well-tuned credit models

Fraud models often score higher because event labels are typically more sharply separable than default labels — a qualitative observation, not a published constant.

KS has two well-known limitations. First, it summarizes the model at a single point on the score distribution, the maximum-separation cutoff, and therefore ignores performance everywhere else; this is the same critique Hand (2009) levies against single-number classifier summaries more broadly [6]. The goodness-of-fit literature has long noted that KS is less powerful than the Anderson–Darling or Cramér–von Mises statistics, which weight the full distribution rather than a single supremum [7]. Second, a high KS does not imply good calibration: a model can rank borrowers correctly while assigning poorly calibrated probabilities. For these reasons, KS is typically reported alongside the AUC, the Gini coefficient, and a calibration measure such as the Brier score, expected calibration error, or a reliability diagram.
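The calibration blind spot is easy to demonstrate: any strictly monotone distortion of the scores preserves the ranking exactly, so KS (like AUC and Gini) cannot see it, while the Brier score degrades. The helper names `ks` and `brier` and the simulated portfolio below are assumptions for this illustration:

```python
import numpy as np

def ks(y, p):
    """Rank-only two-sample KS between bad and good score ECDFs."""
    grid = np.sort(p)
    f_bad = np.searchsorted(np.sort(p[y == 1]), grid, side="right") / (y == 1).sum()
    f_good = np.searchsorted(np.sort(p[y == 0]), grid, side="right") / (y == 0).sum()
    return np.max(np.abs(f_bad - f_good))

def brier(y, p):
    """Mean squared error of predicted probabilities: calibration-sensitive."""
    return np.mean((p - y) ** 2)

# Simulated portfolio: true default probabilities and realized outcomes.
rng = np.random.default_rng(1)
p_true = rng.uniform(0.01, 0.6, 5000)
y = (rng.uniform(size=5000) < p_true).astype(int)

# Cubing is strictly monotone on (0, 1): same ranking, badly shrunken
# probabilities. KS is unchanged; the Brier score gets worse.
p_warped = p_true ** 3
```

This is why discrimination metrics and a calibration measure are reported together: each sees a failure mode the other is blind to.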

Sources

  1. [1] "1.3.5.16. Kolmogorov-Smirnov Goodness-of-Fit Test", NIST/SEMATECH e-Handbook of Statistical Methods, current (retrieved 2026-05-15)
     "D measures the largest distance between the empirical distribution function F_data(x) and the theoretical function F_0(x), measured in a vertical direction."
  2. [2] "Kolmogorov-Smirnov 2-Sample Goodness of Fit Test", NIST/SEMATECH DataPlot Reference Manual, current (retrieved 2026-05-15)
  3. [3] Kolmogorov, A.N., "Sulla determinazione empirica di una legge di distribuzione" ("On the empirical determination of a distribution law"), Giornale dell'Istituto Italiano degli Attuari 4, 1933 (retrieved 2026-05-15)
  4. [4] Smirnov, N.V., "Table for Estimating the Goodness of Fit of Empirical Distributions", Annals of Mathematical Statistics 19(2), June 1948 (retrieved 2026-05-15)
  5. [5] Massey, F.J. Jr., "The Kolmogorov-Smirnov Test for Goodness of Fit", Journal of the American Statistical Association 46(253), March 1951 (retrieved 2026-05-15)
  6. [6] Hand, D.J., "Measuring classifier performance: a coherent alternative to the area under the ROC curve", Machine Learning 77(1), 2009 (retrieved 2026-05-15)
  7. [7] Stephens, M.A., "EDF Statistics for Goodness of Fit and Some Comparisons", Journal of the American Statistical Association 69(347), 1974 (retrieved 2026-05-15)
  8. [8] Hand, D.J. & Henley, W.E., "Statistical Classification Methods in Consumer Credit Scoring: A Review", Journal of the Royal Statistical Society Series A 160(3), 1997 (retrieved 2026-05-15)
  9. [9] Adeodato, P.J.L. & Melo, S.B.M., "On the equivalence between Kolmogorov-Smirnov and ROC curve metrics", arXiv:1606.00496, 2016 (retrieved 2026-05-15)
  10. [10] Siddiqi, N., Intelligent Credit Scoring: Building and Implementing Better Credit Risk Scorecards (2nd ed.), Wiley/SAS, 2017 (retrieved 2026-05-15)