Feature importance is a global summary: for each feature, a single number describing how much that feature contributes to the model’s predictions across the entire dataset. The most common implementations are gain (or total gain) and split-count importance for gradient-boosted trees and random forests[5], permutation importance as introduced with random forests[4], and model-agnostic permutation methods. These metrics carry known biases: impurity-based importance inflates high-cardinality and continuous features[6], and permutation importance can be distorted by feature correlation because permuted features force the model to extrapolate into off-manifold regions[8]. Feature importance is useful for understanding which inputs the model relies on overall, for pruning low-signal features, and for sanity-checking that domain-intuitive features appear at the top of the list.
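As a rough illustration of the two most common global measures, the sketch below fits a scikit-learn gradient-boosted classifier on synthetic data and prints both its impurity/gain-based importances and held-out permutation importances. The dataset, model settings, and repeat counts are placeholders for illustration, not a prescribed recipe.

```python
# Minimal sketch: impurity/gain-based importance vs. permutation importance.
# Synthetic data and default hyperparameters; all settings are illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

# Impurity/gain-based importance: computed from the training process itself;
# known to favor high-cardinality and continuous features.
print("impurity-based:", model.feature_importances_)

# Permutation importance: measured on held-out data by shuffling one column at
# a time; can be distorted when features are strongly correlated.
perm = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
print("permutation:   ", perm.importances_mean)
```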
SHAP values work at the level of an individual prediction. For a single applicant, the SHAP value of each feature is that feature’s average contribution to the prediction across all possible orderings in which features could enter the model, a result drawn from cooperative game theory (the Shapley value[2]). Lundberg & Lee (2017) proved that SHAP is the unique additive attribution method satisfying three desirable axioms: local accuracy (the baseline plus the sum of SHAP values equals the prediction), missingness, and consistency[1]. The local-accuracy property is what makes SHAP the standard input to adverse-action reason-code generation: each denied application decomposes additively into per-feature attributions that can be mapped to ECOA-compliant reason strings[10]. One important caveat: Kernel SHAP’s exact theoretical guarantees assume feature independence, so correlated features produce attributions that may be misleading without a conditional-expectation formulation[7]. TreeSHAP, the polynomial-time exact algorithm for tree ensembles, handles dependence via a path-dependent conditional-expectation computation[3].
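To make the local-accuracy property concrete, here is a minimal sketch using the shap and xgboost packages on synthetic data; the dataset, model settings, and tolerance are illustrative assumptions, not part of the cited papers. TreeSHAP produces one attribution per feature per row, and the baseline plus those attributions reconstructs the model’s raw margin output for that row.

```python
# Minimal sketch: per-prediction SHAP attributions via TreeSHAP, plus a check of
# local accuracy (baseline + sum of SHAP values == model output). Synthetic data;
# shap and xgboost are assumed to be installed. XGBoost classifiers are explained
# in margin (log-odds) space by default, so the check compares against output_margin.
import numpy as np
import shap
import xgboost
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=2000, n_features=8, random_state=0)
model = xgboost.XGBClassifier(n_estimators=100).fit(X, y)

explainer = shap.TreeExplainer(model)      # polynomial-time exact TreeSHAP
shap_values = explainer.shap_values(X)     # shape: (n_samples, n_features)

# expected_value may be a scalar or a length-1 array depending on shap version.
base = np.ravel(explainer.expected_value)[0]

i = 0  # a single "applicant"
reconstructed = base + shap_values[i].sum()
margin = model.predict(X, output_margin=True)[i]
assert np.isclose(reconstructed, margin, atol=1e-3), "local accuracy should hold"
```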
Side-by-side
| | Feature importance | SHAP values |
|---|---|---|
| Granularity | Global (one number per feature) | Local (one number per feature per prediction) |
| Method dependence | Tied to the model class (gain, splits, permutation) | Model-agnostic in principle; tree-optimized in practice (TreeSHAP) |
| Additivity | Not additive | Additive: baseline plus per-feature attributions equals the prediction (local accuracy) |
| Known biases / caveats | Impurity importance inflates high-cardinality features; permutation distorts under correlation | Kernel SHAP assumes feature independence; correlated features bias attributions |
| Best use | Model debugging, feature selection, sanity-checking what the model relies on | Per-decision explanations, adverse-action reason codes, audit trails |
The two views are complementary, not competing. A model build typically reports both: feature importance to characterize the overall structure of the model, and SHAP values for the per-prediction explanations that model risk teams, adverse-action generators, and fair-lending audits need. Aggregating SHAP values across the population, typically as the mean absolute SHAP value per feature, produces a third, related view sometimes called “SHAP feature importance,” which is useful for global interpretation while remaining mathematically consistent with the per-prediction attributions[3][9].
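A minimal sketch of that aggregation, using the same synthetic setup as above (all names and settings are illustrative): take the absolute SHAP value per feature per row and average over rows to get one global number per feature. The shap library’s bar-style summary plot (shap.summary_plot(shap_values, X, plot_type="bar")) renders the same aggregation.

```python
# Minimal sketch: "SHAP feature importance" as the mean absolute SHAP value per
# feature, aggregated from the per-prediction attributions. Synthetic data.
import numpy as np
import shap
import xgboost
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=2000, n_features=8, random_state=0)
model = xgboost.XGBClassifier(n_estimators=100).fit(X, y)
shap_values = shap.TreeExplainer(model).shap_values(X)   # (n_samples, n_features)

# One global number per feature, consistent with the local attributions it sums.
mean_abs_shap = np.abs(shap_values).mean(axis=0)
for idx in np.argsort(mean_abs_shap)[::-1]:
    print(f"feature_{idx}: {mean_abs_shap[idx]:.4f}")
```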
Sources
- [1]A Unified Approach to Interpreting Model Predictions — Lundberg & Lee — NeurIPS 2017, December 2017 (retrieved 2026-05-15)
“Local accuracy: f(x) = φ₀ + Σᵢ₌₁ᴹ φᵢ — the model output equals the baseline plus the sum of feature SHAP values.”
- [2]A Value for n-Person Games — Shapley, L.S. — RAND P-295 / Contributions to the Theory of Games II (Princeton University Press, 1953), 1953 (retrieved 2026-05-15)
- [3]From Local Explanations to Global Understanding with Explainable AI for Trees — Lundberg, Erion, Chen et al. — Nature Machine Intelligence 2, January 2020 (retrieved 2026-05-15)
- [4]Random Forests — Breiman, L. — Machine Learning 45, October 2001 (retrieved 2026-05-15)
- [5]Greedy Function Approximation: A Gradient Boosting Machine — Friedman, J.H. — Annals of Statistics 29(5), October 2001 (retrieved 2026-05-15)
- [6]Bias in random forest variable importance measures: Illustrations, sources and a solution — Strobl, Boulesteix, Zeileis & Hothorn — BMC Bioinformatics 8:25, January 2007 (retrieved 2026-05-15)
“The Gini importance is most strongly biased… variables with more categories are obviously preferred.”
- [7]Explaining individual predictions when features are dependent: More accurate approximations to Shapley values — Aas, Jullum & Løland — Artificial Intelligence 298, September 2021 (retrieved 2026-05-15)
- [8]Please Stop Permuting Features: An Explanation and Alternatives — Hooker & Mentch — arXiv:1905.03151, 2019 (retrieved 2026-05-15)
- [9]Interpretable Machine Learning (2nd ed.) — Permutation Feature Importance and SHAP chapters — Molnar, C. — open textbook, 2022 (retrieved 2026-05-15)
- [10]CFPB Circular 2022-03: Adverse action notification requirements in connection with credit decisions based on complex algorithms — Consumer Financial Protection Bureau, May 26, 2022 (retrieved 2026-05-15)