SHAP

Automate and integrate SHAP for explainable AI and machine learning model insights

SHAP is a community skill for model interpretability using SHAP values, covering feature importance, prediction explanations, interaction effects, model comparison, and visualization of machine learning model behavior.

What Is This?

Overview

SHAP provides tools for explaining machine learning model predictions using Shapley values from cooperative game theory. Feature importance ranks input variables by their average impact on model output. Prediction explanations show how each feature pushes an individual prediction above or below the baseline. Interaction effects reveal how pairs of features combine to affect predictions. Model comparison evaluates multiple models through consistent explanation metrics. Visualization creates summary plots, waterfall charts, and dependence graphs. The skill helps data scientists understand model decisions.

Who Should Use This

This skill serves data scientists debugging model behavior, ML engineers validating model fairness, and analysts communicating prediction logic to stakeholders.

Why Use It?

Problems It Solves

Complex models like gradient boosting and neural networks produce predictions without transparent reasoning. Feature importance from tree-based models can be inconsistent across methods. Stakeholders require explanations for individual predictions to trust model decisions. Debugging model failures requires understanding which features drive incorrect predictions.

Core Highlights

Feature ranker computes global importance from Shapley values. Prediction explainer shows per-feature contributions for each output. Interaction detector reveals feature pair effects on predictions. Visualization builder creates summary and waterfall explanation plots.

How to Use It?

Basic Usage

import shap
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Train a gradient boosting model on the breast cancer dataset
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

model = GradientBoostingClassifier(n_estimators=100)
model.fit(X_train, y_train)

# TreeExplainer computes exact SHAP values efficiently for tree ensembles
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)

# Global importance: mean absolute SHAP value per feature
importance = np.abs(shap_values).mean(axis=0)
top_features = np.argsort(importance)[::-1][:5]
names = load_breast_cancer().feature_names
for idx in top_features:
    print(f'{names[idx]}: {importance[idx]:.4f}')

Real-World Examples

import shap
import numpy as np

class ModelExplainer:
    """Wraps a SHAP TreeExplainer with instance-level and global helpers."""

    def __init__(self, model, X_train):
        self.explainer = shap.TreeExplainer(model)
        self.background = X_train  # kept for reference and model-agnostic fallbacks

    def explain_instance(self, instance, feature_names: list) -> dict:
        """Return per-feature contributions sorted by absolute magnitude."""
        sv = self.explainer.shap_values(instance.reshape(1, -1))
        contributions = {
            name: float(sv[0][i]) for i, name in enumerate(feature_names)
        }
        return dict(sorted(contributions.items(),
                           key=lambda x: abs(x[1]), reverse=True))

    def global_importance(self, X_test, feature_names: list) -> list:
        """Return (feature, mean |SHAP|) pairs, most important first."""
        sv = self.explainer.shap_values(X_test)
        imp = np.abs(sv).mean(axis=0)
        return sorted(zip(feature_names, imp), key=lambda x: x[1], reverse=True)

exp = ModelExplainer(model, X_train)
contrib = exp.explain_instance(X_test[0], names)
for feat, val in list(contrib.items())[:5]:
    print(f'{feat}: {val:+.4f}')

Advanced Tips

Use TreeExplainer for tree-based models and KernelExplainer for model-agnostic explanations; TreeExplainer is exact and fast for tree ensembles, while KernelExplainer works with any model at a much higher computational cost. Compute SHAP interaction values to detect feature pairs that have non-additive effects. Save computed SHAP values to disk to avoid recomputation during iterative analysis. The sketch below illustrates all three tips.
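A minimal sketch of these tips, reusing model, X_train, and X_test from the Basic Usage example; the 25-cluster background and the batch of 10 explained rows are illustrative choices, not recommendations.

import numpy as np
import shap

# Interaction values (TreeExplainer only): one matrix per sample,
# shape (n_samples, n_features, n_features)
tree_explainer = shap.TreeExplainer(model)
interactions = tree_explainer.shap_interaction_values(X_test)

# Model-agnostic fallback: summarize the background with k-means so
# KernelExplainer stays tractable, then explain a small batch of rows
background = shap.kmeans(X_train, 25)
kernel_explainer = shap.KernelExplainer(model.predict_proba, background)
kernel_sv = kernel_explainer.shap_values(X_test[:10])

# Cache SHAP values to skip recomputation across analysis sessions
np.save('shap_values.npy', tree_explainer.shap_values(X_test))
shap_values = np.load('shap_values.npy')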

When to Use It?

Use Cases

Explain why a credit scoring model approved or denied a specific application. Identify the most influential features driving a churn prediction model globally. Compare feature importance between two model versions to validate that a retrained model uses sensible patterns.
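As a sketch of the model-comparison use case: compute mean absolute SHAP values for each model on the same test set and compare the rankings side by side. Here model_v1 and model_v2 are hypothetical stand-ins for two trained tree models; X_test and names come from the Basic Usage example.

import numpy as np
import shap

def mean_abs_shap(model, X, feature_names):
    # Global importance for one model: mean |SHAP| per feature
    sv = shap.TreeExplainer(model).shap_values(X)
    return dict(zip(feature_names, np.abs(sv).mean(axis=0)))

imp_v1 = mean_abs_shap(model_v1, X_test, names)  # hypothetical model version 1
imp_v2 = mean_abs_shap(model_v2, X_test, names)  # hypothetical model version 2
for feat in sorted(imp_v1, key=imp_v1.get, reverse=True)[:5]:
    print(f'{feat}: v1={imp_v1[feat]:.4f}  v2={imp_v2[feat]:.4f}')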

Related Topics

Model interpretability, SHAP values, feature importance, explainable AI, Shapley values, model debugging, and ML fairness.

Important Notes

Requirements

The SHAP Python package installed, along with numpy, and matplotlib for visualization. A trained machine learning model compatible with one of SHAP's explainer types. A training or background dataset for computing the baseline expected value, plus feature names for readable output.
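One way to check that the baseline is wired up correctly is SHAP's additivity property: the expected value plus the sum of a row's SHAP values should reproduce the model's raw output. A small sketch, reusing explainer, shap_values, model, and X_test from the Basic Usage example (for a binary GradientBoostingClassifier the raw output is log-odds):

import numpy as np

# expected_value can be a scalar or a length-1 array depending on SHAP version
base = float(np.ravel(explainer.expected_value)[0])
print(f'baseline + SHAP sum: {base + shap_values[0].sum():.4f}')
print(f'model raw output:    {model.decision_function(X_test[:1])[0]:.4f}')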

Usage Recommendations

Do: use the explainer type matched to your model for accurate and fast computation. Visualize SHAP values with summary plots for global understanding and waterfall plots for individual predictions. Validate explanations against domain knowledge to catch model issues.
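A minimal sketch of both recommended plots, reusing model, X_test, and names from the Basic Usage example and assuming a recent SHAP release where explainers are callable and return Explanation objects:

import shap

explainer = shap.TreeExplainer(model)

# Global view: summary (beeswarm) plot ranked by mean |SHAP|
shap.summary_plot(explainer.shap_values(X_test), X_test,
                  feature_names=list(names))

# Local view: waterfall plot for a single prediction
explanation = explainer(X_test)
shap.plots.waterfall(explanation[0])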

Don't: use KernelExplainer on large datasets without sampling since computation scales quadratically. Interpret SHAP values as causal effects since they measure prediction contribution rather than causation. Rely on SHAP alone for model validation without checking standard accuracy metrics.

Limitations

KernelExplainer is computationally expensive for high-dimensional data and large sample sizes. SHAP values explain model behavior but do not indicate whether the model is correct. Interaction values add significant computation time and may not be feasible for all model types.