Modeling techniques

Feature importance vs. SHAP values

Definition

Two complementary approaches to attributing a model's output to its input features: feature importance summarizes how much each feature contributes to predictions across the entire dataset, while SHAP values attribute each individual prediction to its features using game-theoretic Shapley values that satisfy a local-accuracy (additivity) axiom.

Feature importance is a global summary: for each feature, a single number describing how much that feature contributes to the model's predictions across the entire dataset. The most common implementations are tree-based gain (and total gain) and split-count importance for gradient-boosted trees and random forests[5], permutation importance as introduced with random forests[4], and model-agnostic permutation methods. These metrics carry known biases: impurity-based importance inflates high-cardinality and continuous features[6], and permutation importance can be distorted by feature correlation because permuted features extrapolate into off-manifold regions[8]. Feature importance is useful for understanding which inputs the model relies on overall, for pruning low-signal features, and for sanity-checking that domain-intuitive features appear at the top of the list.
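
The sketch below shows both flavors of global importance side by side in Python with scikit-learn: the impurity/gain-based importances stored on a fitted gradient-boosted model, and model-agnostic permutation importances computed on held-out data. The synthetic dataset, model choice, and settings are illustrative assumptions rather than anything prescribed by the sources above.

    # Sketch: two global importance views for a gradient-boosted model on synthetic data.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.inspection import permutation_importance
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=2000, n_features=8, n_informative=4,
                               random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    model = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

    # Impurity/gain-based importance: one number per feature, derived from how much
    # that feature's splits reduce loss inside the fitted trees. Subject to the
    # high-cardinality/continuous-feature bias noted above.
    gain_importance = model.feature_importances_

    # Model-agnostic permutation importance: drop in held-out score when one feature
    # column is shuffled. Correlated features can make this misleading because the
    # shuffled rows fall off the data manifold.
    perm = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
    perm_importance = perm.importances_mean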

SHAP values work at the level of an individual prediction. For a single applicant, the SHAP value of each feature is the average contribution of that feature to the prediction across all possible orderings in which features could enter the model, a construction drawn from cooperative game theory (the Shapley value[2]). Lundberg & Lee (2017) proved SHAP is the unique additive attribution method satisfying three desirable axioms: local accuracy (the sum of SHAP values plus the baseline equals the prediction), missingness, and consistency[1]. The local-accuracy property is what makes SHAP the standard input to adverse-action reason-code generation: each denied application decomposes additively into per-feature attributions that can be mapped to ECOA-compliant reason strings[10]. One important caveat: Kernel SHAP's exact theoretical guarantees assume feature independence, so correlated features produce attributions that may be misleading without a conditional-expectation formulation[7]. TreeSHAP, the polynomial-time exact algorithm for tree ensembles, handles feature dependence via its path-aware computation over the fitted trees[3].
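
A minimal per-prediction sketch, assuming the shap and xgboost packages and a pre-existing feature matrix X with labels y (both placeholders): it fits a tree ensemble, computes TreeSHAP attributions for one row, and checks the local-accuracy identity numerically.

    # Sketch: per-prediction TreeSHAP attributions and the local-accuracy check.
    import shap
    import xgboost as xgb

    # X is assumed to be a NumPy feature matrix and y the binary labels
    # (e.g. applicant features and approve/deny outcomes); both are placeholders.
    model = xgb.XGBClassifier(n_estimators=200, max_depth=4).fit(X, y)

    explainer = shap.TreeExplainer(model)      # polynomial-time exact for trees
    explanation = explainer(X)                 # attributions for every row

    i = 0                                      # one individual prediction
    phi = explanation.values[i]                # per-feature SHAP values (log-odds space)
    baseline = explanation.base_values[i]      # expected model output

    # Local accuracy: baseline + sum of SHAP values matches the model's raw
    # margin output for this row (up to numerical tolerance).
    margin = model.predict(X[i:i + 1], output_margin=True)[0]
    print(baseline + phi.sum(), margin)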

Side-by-side

Dimension | Feature importance | SHAP values
Granularity | Global (one number per feature) | Local (one number per feature per prediction)
Method dependence | Tied to the model class (gain, splits, permutation) | Model-agnostic in principle; tree-optimized in practice (TreeSHAP)
Additivity | Not additive | Additive: sum of attributions plus baseline equals the prediction (local accuracy)
Known biases / caveats | Impurity importance inflates high-cardinality features; permutation distorts under correlation | Kernel SHAP assumes feature independence; correlated features bias attributions
Best use | Model debugging, feature selection, sanity-checking what the model relies on | Per-decision explanations, adverse-action reason codes, audit trails

The two views are complementary, not competing. A model build typically reports both: feature importance to characterize the overall structure of the model, and SHAP values for the per-prediction explanations that model risk teams, adverse-action generators, and fair-lending audits need. Aggregating SHAP values across the population, typically as the mean absolute SHAP value per feature, produces a third, related view sometimes called "SHAP feature importance," which is useful for global interpretation while remaining mathematically consistent with the per-prediction attributions[3][9].
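
Continuing the earlier TreeSHAP sketch, aggregating to the mean absolute SHAP value per feature yields that third view; the explanation object is the one produced by the previous sketch, and feature_names is an assumed list of column labels.

    # Sketch: aggregate per-prediction attributions into a global ranking via
    # the mean absolute SHAP value per feature.
    import numpy as np

    # `explanation` comes from the earlier TreeSHAP sketch; `feature_names` is an
    # assumed list of column labels with one entry per feature.
    mean_abs_shap = np.abs(explanation.values).mean(axis=0)

    ranking = sorted(zip(feature_names, mean_abs_shap),
                     key=lambda pair: pair[1], reverse=True)
    for name, score in ranking:
        print(f"{name}: {score:.4f}")

    # The shap package draws the same view directly:
    # shap.plots.bar(explanation)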

Sources

  1. [1] A Unified Approach to Interpreting Model Predictions. Lundberg & Lee, NeurIPS 2017, December 2017 (retrieved 2026-05-15).
    Local accuracy: f(x) = φ₀ + Σᵢ₌₁ᴹ φᵢ, i.e. the model output equals the baseline plus the sum of the per-feature SHAP values.
  2. [2] A Value for n-Person Games. Shapley, L.S., RAND P-295 / Contributions to the Theory of Games II (Princeton University Press, 1953), 1953 (retrieved 2026-05-15).
  3. [3] From Local Explanations to Global Understanding with Explainable AI for Trees. Lundberg, Erion, Chen et al., Nature Machine Intelligence 2, January 2020 (retrieved 2026-05-15).
  4. [4] Random Forests. Breiman, L., Machine Learning 45, October 2001 (retrieved 2026-05-15).
  5. [5] Greedy Function Approximation: A Gradient Boosting Machine. Friedman, J.H., Annals of Statistics 29(5), October 2001 (retrieved 2026-05-15).
  6. [6] Bias in random forest variable importance measures: Illustrations, sources and a solution. Strobl, Boulesteix, Zeileis & Hothorn, BMC Bioinformatics 8:25, January 2007 (retrieved 2026-05-15).
    The Gini importance is most strongly biased… variables with more categories are obviously preferred.
  7. [7] Explaining individual predictions when features are dependent: More accurate approximations to Shapley values. Aas, Jullum & Løland, Artificial Intelligence 298, September 2021 (retrieved 2026-05-15).
  8. [8] Please Stop Permuting Features: An Explanation and Alternatives. Hooker & Mentch, arXiv:1905.03151, 2019 (retrieved 2026-05-15).
  9. [9] Interpretable Machine Learning (2nd ed.), Permutation Feature Importance and SHAP chapters. Molnar, C., open textbook, 2022 (retrieved 2026-05-15).
  10. [10] CFPB Circular 2022-03: Adverse action notification requirements in connection with credit decisions based on complex algorithms. Consumer Financial Protection Bureau, May 26, 2022 (retrieved 2026-05-15).