Model performance

Kolmogorov-Smirnov (KS) statistic

KS

Definition

The maximum vertical distance between two empirical cumulative distribution functions; in credit modeling, the maximum separation between the cumulative score distributions of good (non-default) and bad (default) borrowers, used as a measure of how well a model discriminates between the two populations.

Formally, the two-sample KS statistic is the supremum of the absolute difference between the empirical CDFs of the two populations [1][2]:

KS = sup_x |F_bad(x) − F_good(x)|

F_bad(x) = empirical CDF of model scores for defaulters

F_good(x) = empirical CDF of model scores for non-defaulters

A KS of 0 means the two score distributions are identical: the model has no ability to distinguish good borrowers from bad. A KS of 1 means the two distributions are perfectly separated: every bad scores lower than every good. Real credit and fraud models fall somewhere in between, and the KS value summarizes how much daylight a model puts between the two populations. The statistic was introduced by Kolmogorov (1933) [3] and tabulated in two-sample form by Smirnov (1948) [4], with Massey (1951) bringing it into mainstream applied use [5].
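The definition above can be computed directly from the two empirical CDFs. The sketch below is a minimal NumPy-only illustration (the function name `ks_statistic` and the simulated score distributions are assumptions for the example, not part of any library):

```python
import numpy as np

def ks_statistic(scores_bad, scores_good):
    """Two-sample KS: maximum vertical gap between the empirical CDFs
    of bad (defaulter) and good (non-defaulter) score distributions."""
    # Evaluate both ECDFs on the pooled set of observed scores.
    grid = np.sort(np.concatenate([scores_bad, scores_good]))
    cdf_bad = np.searchsorted(np.sort(scores_bad), grid, side="right") / len(scores_bad)
    cdf_good = np.searchsorted(np.sort(scores_good), grid, side="right") / len(scores_good)
    return np.max(np.abs(cdf_bad - cdf_good))

# Toy example: hypothetical scores with bads concentrated at the low end.
rng = np.random.default_rng(0)
bad = rng.normal(400, 50, 1000)
good = rng.normal(550, 50, 4000)
print(round(ks_statistic(bad, good), 3))
```

In production, `scipy.stats.ks_2samp` computes the same quantity; the explicit ECDF version is shown only to make the supremum in the formula concrete.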

[Figure] KS statistic = maximum vertical distance between the cumulative score distributions of good and bad borrowers.

Practitioners read the KS at the score cutoff where the curves are furthest apart, which is equivalently the cutoff that maximizes the Youden index J = sensitivity + specificity − 1 [9]. Geometrically, KS equals the maximum vertical distance from the ROC curve to the diagonal, which makes it directly related to AUC and the Gini coefficient: all three are rank-only discrimination measures derivable from the same score ranking.
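The ROC-geometry reading can be sketched as follows: sweep the cutoff up the score axis, track TPR and FPR, and take the largest vertical gap to the diagonal. The helper name `ks_from_roc` and the label convention (1 = bad/default, bads tending to score low, ties ignored) are assumptions for this sketch:

```python
import numpy as np

def ks_from_roc(y_true, scores):
    """KS read off the ROC curve: the maximum over cutoffs of |TPR - FPR|,
    i.e. the maximum Youden index J = sensitivity + specificity - 1.
    Convention: y_true = 1 marks bads, and bads tend to score low."""
    order = np.argsort(scores, kind="stable")   # ascending score
    y = np.asarray(y_true)[order]
    n_bad = y.sum()
    n_good = len(y) - n_bad
    # "score <= cutoff" predicts bad; cumulative counts along the sorted
    # scores give the TPR and FPR at every observed cutoff.
    tpr = np.cumsum(y) / n_bad          # bads captured at or below cutoff
    fpr = np.cumsum(1 - y) / n_good     # goods wrongly flagged
    return np.max(np.abs(tpr - fpr))
```

Because TPR and FPR at each cutoff are exactly the two ECDFs evaluated at that score, this returns the same number as the ECDF-based definition, and the argmax cutoff is the Youden-optimal operating point.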

Interpretation bands (consumer credit)

The bands below are practitioner heuristics widely used across North American consumer-credit practice; they are not a published standard. Concrete acceptance thresholds vary by portfolio, segment, and bad rate (Hand & Henley 1997 [8]; Siddiqi 2017 [10]). KS values are commonly reported as percentages (0–100) or as decimals (0–1); the bands below assume the percentage convention.

KS range   Typical interpretation
0–20       Weak; little practical discrimination
20–40      Acceptable for consumer credit underwriting
40–50      Strong; competitive for prime portfolios
50+        Very strong; common in fraud and well-tuned credit models

Fraud models often score higher because event labels are typically more sharply separable than default labels — a qualitative observation, not a published constant.

KS has two well-known limitations. First, it summarizes the model at a single point on the score distribution, the maximum-separation cutoff, and therefore ignores performance everywhere else; this is the same critique Hand (2009) levies against single-number classifier summaries more broadly [6]. The goodness-of-fit literature has long noted that KS is less powerful than the Anderson–Darling or Cramér–von Mises statistics, which weight the full distribution rather than a single supremum [7]. Second, a high KS does not imply good calibration: a model can rank borrowers correctly while assigning poorly calibrated probabilities. For these reasons, KS is typically reported alongside the AUC, the Gini coefficient, and a calibration measure such as the Brier score, expected calibration error, or a reliability diagram.
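The calibration blind spot is easy to demonstrate: any strictly monotone distortion of the scores preserves the ranking exactly, so KS (like AUC and Gini) cannot see it, while the Brier score degrades. The helper names `ks` and `brier` and the simulated portfolio below are assumptions for this illustration:

```python
import numpy as np

def ks(y, p):
    """Rank-only two-sample KS between bad and good score ECDFs."""
    grid = np.sort(p)
    f_bad = np.searchsorted(np.sort(p[y == 1]), grid, side="right") / (y == 1).sum()
    f_good = np.searchsorted(np.sort(p[y == 0]), grid, side="right") / (y == 0).sum()
    return np.max(np.abs(f_bad - f_good))

def brier(y, p):
    """Mean squared error of predicted probabilities: calibration-sensitive."""
    return np.mean((p - y) ** 2)

# Simulated portfolio: true default probabilities and realized outcomes.
rng = np.random.default_rng(1)
p_true = rng.uniform(0.01, 0.6, 5000)
y = (rng.uniform(size=5000) < p_true).astype(int)

# Cubing is strictly monotone on (0, 1): same ranking, badly shrunken
# probabilities. KS is unchanged; the Brier score gets worse.
p_warped = p_true ** 3
```

This is why discrimination metrics and a calibration measure are reported together: each sees a failure mode the other is blind to.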

Sources

  1. [1] "1.3.5.16. Kolmogorov-Smirnov Goodness-of-Fit Test", NIST/SEMATECH e-Handbook of Statistical Methods, current (retrieved 2026-05-15)
     "D measures the largest distance between the empirical distribution function F_data(x) and the theoretical function F_0(x), measured in a vertical direction."
  2. [2] "Kolmogorov-Smirnov 2-Sample Goodness of Fit Test", NIST/SEMATECH DataPlot Reference Manual, current (retrieved 2026-05-15)
  3. [3] Kolmogorov, A.N., "Sulla determinazione empirica di una legge di distribuzione" ("On the empirical determination of a distribution law"), Giornale dell'Istituto Italiano degli Attuari 4, 1933 (retrieved 2026-05-15)
  4. [4] Smirnov, N.V., "Table for Estimating the Goodness of Fit of Empirical Distributions", Annals of Mathematical Statistics 19(2), June 1948 (retrieved 2026-05-15)
  5. [5] Massey, F.J. Jr., "The Kolmogorov-Smirnov Test for Goodness of Fit", Journal of the American Statistical Association 46(253), March 1951 (retrieved 2026-05-15)
  6. [6] Hand, D.J., "Measuring classifier performance: a coherent alternative to the area under the ROC curve", Machine Learning 77(1), 2009 (retrieved 2026-05-15)
  7. [7] Stephens, M.A., "EDF Statistics for Goodness of Fit and Some Comparisons", Journal of the American Statistical Association 69(347), 1974 (retrieved 2026-05-15)
  8. [8] Hand, D.J. & Henley, W.E., "Statistical Classification Methods in Consumer Credit Scoring: A Review", Journal of the Royal Statistical Society Series A 160(3), 1997 (retrieved 2026-05-15)
  9. [9] Adeodato, P.J.L. & Melo, S.B.M., "On the equivalence between Kolmogorov-Smirnov and ROC curve metrics", arXiv:1606.00496, 2016 (retrieved 2026-05-15)
  10. [10] Siddiqi, N., Intelligent Credit Scoring: Building and Implementing Better Credit Risk Scorecards (2nd ed.), Wiley/SAS, 2017 (retrieved 2026-05-15)