Population stability index

PSI compares how a population is distributed across a set of bins between two snapshots. The binned quantity is most commonly the model’s output score, but it can be any individual feature. A typical pairing is the training-time distribution against the most recent month of production scoring, with the score range partitioned into deciles [5]. The statistic accumulates a non-negative contribution from every bin, and a strictly positive one wherever the population share has shifted, regardless of direction, so a small drift across many bins and a large drift in one bin can produce comparable PSI values. PSI is sensitive to bin choice and to empty bins, which leave the logarithm term undefined and are typically handled with a small additive smoothing constant.

PSI = ∑_i (c_i − b_i) × ln(c_i / b_i)

c_i = proportion of the current population in bin i (a fraction between 0 and 1, not a 0–100 percentage)

b_i = proportion of the baseline population in bin i (same convention)

Sum runs over all n bins (typically score deciles).
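The formula can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name psi, the use of raw per-bin counts as input, and the pseudo-count of 0.5 used as the smoothing constant are all assumptions made for the example.

```python
import math

def psi(baseline_counts, current_counts, pseudo_count=0.5):
    """PSI between two populations, given per-bin counts over the same bins.

    pseudo_count is an assumed additive smoothing constant: it is added to
    every bin before normalizing, so an empty bin never produces ln(0).
    """
    if len(baseline_counts) != len(current_counts):
        raise ValueError("both populations must be counted over the same bins")
    k = len(baseline_counts)
    b_total = sum(baseline_counts) + pseudo_count * k
    c_total = sum(current_counts) + pseudo_count * k
    total = 0.0
    for b_n, c_n in zip(baseline_counts, current_counts):
        b = (b_n + pseudo_count) / b_total  # b_i: baseline proportion in bin i
        c = (c_n + pseudo_count) / c_total  # c_i: current proportion in bin i
        total += (c - b) * math.log(c / b)  # each term is >= 0
    return total

# Hypothetical counts: a decile baseline and a current month that has drifted
# slightly toward the upper bins.
baseline = [100] * 10
current = [80, 90, 100, 100, 100, 100, 100, 100, 110, 120]
print(round(psi(baseline, current), 4))
```

Because (c − b) and ln(c / b) always share the same sign, every term in the sum is non-negative, which is why shifts in opposite directions cannot cancel.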

[Figure: Baseline and Current distributions across score bins from 0–100 to 700–800.]

PSI measures how far the current population’s score distribution has drifted from the baseline used at model build.

Mathematical foundation

PSI is mathematically identical to the Jeffreys divergence, that is, the symmetrized Kullback–Leibler divergence [6] J(p, p₀) = D_KL(p ∥ p₀) + D_KL(p₀ ∥ p) [7], applied to histogram bins, with p the current bin proportions c_i and p₀ the baseline proportions b_i. The symmetry is what gives PSI its direction-invariance: swapping the baseline and current populations yields the same value, in contrast to raw KL divergence [1].
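The identity follows from expanding the two KL terms: ∑ c_i ln(c_i / b_i) + ∑ b_i ln(b_i / c_i) = ∑ (c_i − b_i) ln(c_i / b_i). A small numerical check, using made-up bin proportions and hypothetical helper names kl and psi, illustrates both the identity and the symmetry:

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence D_KL(p || q) for strictly positive proportions."""
    return sum(p_i * math.log(p_i / q_i) for p_i, q_i in zip(p, q))

def psi(current, baseline):
    """PSI computed directly from bin proportions (no smoothing needed here)."""
    return sum((c - b) * math.log(c / b) for c, b in zip(current, baseline))

# Illustrative proportions only; each list sums to 1 and has no empty bin.
baseline = [0.10, 0.20, 0.40, 0.20, 0.10]
current = [0.05, 0.15, 0.40, 0.25, 0.15]

jeffreys = kl(current, baseline) + kl(baseline, current)

# PSI equals the symmetrized KL divergence.
print(math.isclose(psi(current, baseline), jeffreys))
# Swapping baseline and current leaves PSI unchanged.
print(math.isclose(psi(current, baseline), psi(baseline, current)))
# Raw KL divergence does change when the arguments are swapped.
print(math.isclose(kl(current, baseline), kl(baseline, current)))
```

The first two checks print True and the third prints False for these inputs.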

Interpretation thresholds

The widely used 0.10 / 0.25 thresholds were originally proposed by Lewis (1994) in An Introduction to Credit Scoring [3] and propagated through Karakoulas (2004) [4] and Siddiqi (2017) [5]. They are practitioner conventions, not derived from a formal statistical test. Yurdakul & Naranjo (2020) demonstrated that the thresholds are sample-size and bin-count dependent: the standard cutoffs over-trigger refresh on small samples and under-trigger on large production volumes [2]. For high-volume monitoring, treat them as guidelines rather than tests.

PSI value	Action
< 0.10	No significant population change — no action
0.10 – 0.25	Moderate shift — investigate cause; closer monitoring
≥ 0.25	Material shift — evaluate model refresh
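The sample-size dependence can be seen in a small simulation. The sketch below draws both snapshots from the same uniform ten-bin distribution, so any PSI above 0.10 is pure sampling noise, and counts how often the table’s bands would still raise an alert. The helper names, the pseudo-count of 0.5, the sample sizes, and the trial count are all assumptions chosen for illustration; this is not the procedure used in [2].

```python
import math
import random

def psi(baseline_counts, current_counts, pseudo_count=0.5):
    """PSI from per-bin counts, with an assumed additive pseudo-count."""
    k = len(baseline_counts)
    b_total = sum(baseline_counts) + pseudo_count * k
    c_total = sum(current_counts) + pseudo_count * k
    total = 0.0
    for b_n, c_n in zip(baseline_counts, current_counts):
        b = (b_n + pseudo_count) / b_total
        c = (c_n + pseudo_count) / c_total
        total += (c - b) * math.log(c / b)
    return total

def band(value):
    """Map a PSI value onto the conventional action bands from the table."""
    if value < 0.10:
        return "no action"
    if value < 0.25:
        return "investigate"
    return "evaluate refresh"

def sample_counts(rng, n, bins):
    """Bin counts for n records drawn uniformly over the bins."""
    counts = [0] * bins
    for _ in range(n):
        counts[rng.randrange(bins)] += 1
    return counts

def false_alert_rate(n, bins=10, trials=100, seed=0):
    """Share of no-shift trials whose PSI still lands at or above 0.10."""
    rng = random.Random(seed)
    alerts = 0
    for _ in range(trials):
        value = psi(sample_counts(rng, n, bins), sample_counts(rng, n, bins))
        if band(value) != "no action":
            alerts += 1
    return alerts / trials

small_rate = false_alert_rate(200)    # two snapshots of 200 records each
large_rate = false_alert_rate(5000)   # two snapshots of 5,000 records each
print(small_rate, large_rate)
```

With only 200 records per snapshot, a noticeable share of identical populations crosses 0.10 by chance, while at 5,000 records per snapshot the no-shift PSI stays far below the cutoff. This is the sense in which fixed cutoffs over-trigger on small samples.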

Production monitoring typically computes PSI at two levels: on the model’s output score (population-level PSI, or what Siddiqi calls the “system stability report”), and on every top-importance feature individually. Per-feature PSI is sometimes called the Characteristic Stability Index (CSI) in scorecard literature [5]. A breach on a single high-importance feature is often a leading indicator of an approaching model refresh, since input drift can be measured immediately while outcome metrics depend on label realization (which lags by the loan’s vintage horizon). PSI monitoring is part of the ongoing-monitoring expectation under the SR 11-7 model risk framework [10].
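A two-level report of this kind could be sketched as follows. The variable names (score, utilization, income_band), the bin counts, and the stability_report helper are hypothetical and exist only to show the shape of the calculation; real monitoring would use the model’s own binning and feature list.

```python
import math

def psi(baseline_counts, current_counts, pseudo_count=0.5):
    """PSI from per-bin counts, with an assumed additive pseudo-count."""
    k = len(baseline_counts)
    b_total = sum(baseline_counts) + pseudo_count * k
    c_total = sum(current_counts) + pseudo_count * k
    total = 0.0
    for b_n, c_n in zip(baseline_counts, current_counts):
        b = (b_n + pseudo_count) / b_total
        c = (c_n + pseudo_count) / c_total
        total += (c - b) * math.log(c / b)
    return total

def stability_report(baseline, current):
    """Return (name, PSI) pairs, most shifted first.

    baseline and current map each monitored variable (the score or a feature)
    to its per-bin counts, using the same bins in both snapshots.
    """
    rows = [(name, psi(baseline[name], current[name])) for name in baseline]
    return sorted(rows, key=lambda row: row[1], reverse=True)

# Made-up counts: the score and income_band are stable, utilization has moved
# toward its upper bins.
baseline = {
    "score": [100, 100, 100, 100, 100],
    "utilization": [200, 150, 100, 50],
    "income_band": [120, 130, 125, 125],
}
current = {
    "score": [98, 101, 100, 102, 99],
    "utilization": [100, 120, 140, 140],
    "income_band": [125, 125, 125, 125],
}

report = stability_report(baseline, current)
for name, value in report:
    print(f"{name:12s} {value:.4f}")
```

In this made-up example the score-level PSI stays well under 0.10 while the utilization feature exceeds 0.25, the pattern described above in which a single feature breaches before the score does.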

Sources

[1]Statistical Properties of Population Stability Index — Yurdakul, B. — PhD dissertation, Western Michigan University, April 2018 (retrieved 2026-05-15)
[2]Statistical properties of the population stability index — Yurdakul, B. & Naranjo, J. — Journal of Risk Model Validation 14(4), December 2020 (retrieved 2026-05-15)
“PSI > 0.25 seems reasonable for sample sizes between 100 and 200, but it is too conservative for larger sample sizes.”
[3]An Introduction to Credit Scoring (2nd ed.) — Lewis, E.M. — Athena Press, San Rafael CA, 1994 (retrieved 2026-05-15)
[4]Empirical Validation of Retail Credit-Scoring Models — Karakoulas, G. — The RMA Journal 87, September 2004 (retrieved 2026-05-15)
[5]Intelligent Credit Scoring: Building and Implementing Better Credit Risk Scorecards (2nd ed.) — Siddiqi, N. — Wiley/SAS, 2017 (retrieved 2026-05-15)
[6]On Information and Sufficiency — Kullback, S. & Leibler, R.A. — Annals of Mathematical Statistics 22(1), March 1951 (retrieved 2026-05-15)
[7]An invariant form for the prior probability in estimation problems — Jeffreys, H. — Proc. R. Soc. London A 186(1007), September 1946 (retrieved 2026-05-15)
[8]A critical review of existing and new population stability testing procedures in credit risk scoring — du Pisanie, Allison, Visagie & Budde — arXiv:2303.01227, March 2023 (retrieved 2026-05-15)
[9]Open Risk Manual — Portfolio Stability Index — Open Risk Manual, current (retrieved 2026-05-15)
[10]SR 11-7: Guidance on Model Risk Management (ongoing monitoring framework) — Board of Governors of the Federal Reserve System, April 4, 2011 (retrieved 2026-05-15)