A machine learning explainability algorithm
Explainability is a hot topic in the machine learning research community these days. Over the past few years, many methods have been introduced to understand how a machine learning model makes a prediction. However, explainability is not an entirely new concept; work on it started a few decades ago. In this blog post, I will introduce a little-known but simple technique that was proposed almost 20 years ago, called Contextual Importance and Utility (CIU), and show how it can be used to explain any type of machine learning model. The method relies on the notion that context is important.
For example, imagine we are trying to predict house prices from a set of features such as the number of bedrooms and pools. If no house in the dataset has a pool (the current context), then the pool feature has no usefulness and no importance for the model's prediction. On the other hand, in a city where most houses have one or two bedrooms (again the current context), houses with three or more bedrooms are more unusual, so the number of bedrooms matters more for the prediction there.
CIU is a model-agnostic method, so it can explain the output of any “black-box” machine learning model.
It produces local explanations, which means that the explanations are generated for individual instances (not the whole model), and they show which features are more important for an individual observation.
It gives us post-hoc explanations, as it processes the output of a machine learning model after training.
Unlike LIME and many other techniques, CIU does not approximate or transform what a model predicts but instead explains the predictions directly. It can also provide contrastive explanations, for instance: why did the model predict rainy and not cloudy?
CIU estimates two values that aim to explain the context in which a machine learning model predicts:
Contextual Importance (CI) measures how much of the range of output values can be attributed to one (or several) input variables. CI is based on the notion that a variable which results in a broader range of output values is more important. Formally, CI is defined as follows:
CI = (Cmax - Cmin)/(absmax - absmin)
Contextual Utility (CU) indicates how favorable the current value of one (or several) input variables is for a high output value. CU is computed using the following formula:
CU = (out - Cmin)/(Cmax - Cmin)
Cmax and Cmin are the highest and lowest values that the output of an ML model can take when the input feature(s) are varied. Obtaining Cmax and Cmin is not a trivial task, either computationally or mathematically. In the original paper, these values are estimated with a Monte Carlo simulation that generates a large number of observations. absmax and absmin indicate the absolute range of values that the output can take. For example, in classification problems the outputs are predicted probabilities, so the absolute minimum and maximum are 0 and 1.
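To make the two formulas concrete, here is a minimal sketch of the sampling idea in plain Python. It is not the py-ciu implementation: the function `estimate_ciu`, the toy model `toy_model`, and all parameter names are made up for illustration, and the feature is simply sampled uniformly within an assumed range while the other inputs stay fixed.

```python
import random


def estimate_ciu(predict, instance, feature, feature_range,
                 absmin=0.0, absmax=1.0, n_samples=2000, seed=0):
    """Estimate (CI, CU) for one feature of one instance by random sampling."""
    rng = random.Random(seed)
    low, high = feature_range
    out = predict(instance)
    # Start from the current output so that Cmin <= out <= Cmax always holds.
    outputs = [out]
    for _ in range(n_samples):
        perturbed = dict(instance)                    # keep the other features (the context) fixed
        perturbed[feature] = rng.uniform(low, high)   # vary only the feature of interest
        outputs.append(predict(perturbed))
    cmin, cmax = min(outputs), max(outputs)
    ci = (cmax - cmin) / (absmax - absmin)
    # CU is undefined when the output never changes; 0.0 is an arbitrary choice in this sketch.
    cu = (out - cmin) / (cmax - cmin) if cmax > cmin else 0.0
    return ci, cu


def toy_model(x):
    """Hypothetical model with output in [0, 1]: the effect of 'a' depends on 'b'."""
    return x["a"] * x["b"]


# Same feature, same value, two different contexts (values of 'b').
ci_low, cu_low = estimate_ciu(toy_model, {"a": 0.5, "b": 0.1}, "a", (0.0, 1.0))
ci_high, cu_high = estimate_ciu(toy_model, {"a": 0.5, "b": 0.9}, "a", (0.0, 1.0))
print(f"b=0.1: CI={ci_low:.2f}, CU={cu_low:.2f}")
print(f"b=0.9: CI={ci_high:.2f}, CU={cu_high:.2f}")
```

In this toy setting the contextual importance of `a` comes out close to 0.1 when `b` is 0.1 and close to 0.9 when `b` is 0.9, while its utility stays near 0.5 in both cases because `a` sits in the middle of its range. The same feature value is therefore much more important in one context than in the other.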
CIU is implemented in both Python and R. For simplicity, I will use its Python implementation (the py-ciu library) in this blog post.
You can install py-ciu using the pip command:
pip install py-ciu
I will use the breast cancer dataset in scikit-learn to show how we can use CIU. I will train three different machine learning models, including a decision tree, a random forest, and a gradient boosting algorithm on this dataset and compute CI and CU values for a single instance from the test dataset.
First, we need to load the necessary libraries and modules.
from ciu import determine_ciu
from sklearn.ensemble import GradientBoostingClassifier,RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# for reproducibility
np.random.seed(123)
Then we split the dataset into a training and test set. We train our machine learning models on the training dataset and evaluate their performance on the test dataset. Note that for explaining ML models, we should use samples from the test dataset and not the training dataset.
X = pd.DataFrame(load_breast_cancer()['data'])
y = load_breast_cancer()['target']
X.columns = load_breast_cancer()['feature_names']
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y)
def fit_evaluate_model(clf):
    clf = clf.fit(X_train, y_train)
    print(' Accuracy on test dataset {}'.format(clf.score(X_test, y_test)))
    return clf
As mentioned before, CIU only generates local explanations and does not give us a global overview of how a model makes a prediction. To gain a better understanding of the global importance of each feature, we can compute permutation feature importance scores:
def print_permutation_importance(model):
    imp_features = []
    pi = permutation_importance(model, X_test, y_test,
                                n_repeats=30,
                                random_state=0)  # seed value assumed; the original value was lost
    for i in pi.importances_mean.argsort()[::-1]:
        # report only features whose mean importance is at least two standard deviations above zero
        if pi.importances_mean[i] - 2 * pi.importances_std[i] > 0:
            print(f"{X_test.columns[i]:<8} "
                  f"{pi.importances_mean[i]:.3f} "
                  f" +/- {pi.importances_std[i]:.3f}")
            imp_features.append(pi.importances_mean[i])
    if len(imp_features) == 0:
        print('no important features')
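For readers who have not met permutation importance before, the idea behind the scikit-learn call can be sketched in a few lines of plain Python: shuffle one column at a time and record how much the accuracy drops. This is an illustrative sketch only. The helper `permutation_importance_sketch`, the toy data, and `toy_classifier` are invented for this example and are not part of scikit-learn or py-ciu.

```python
import random


def permutation_importance_sketch(predict, X, y, n_repeats=30, seed=0):
    """Mean drop in accuracy per column when that column is shuffled."""
    rng = random.Random(seed)

    def accuracy(rows):
        return sum(predict(row) == target for row, target in zip(rows, y)) / len(y)

    baseline = accuracy(X)
    importances = []
    for col in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            shuffled = [row[col] for row in X]
            rng.shuffle(shuffled)  # break the link between this column and the target
            permuted = [row[:col] + [value] + row[col + 1:]
                        for row, value in zip(X, shuffled)]
            drops.append(baseline - accuracy(permuted))
        importances.append(sum(drops) / n_repeats)
    return importances


# Toy data: the label depends only on column 0; column 1 is pure noise.
data_rng = random.Random(1)
X_toy = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y_toy = [int(row[0] > 0.5) for row in X_toy]


def toy_classifier(row):
    """Hypothetical 'trained' model that only looks at column 0."""
    return int(row[0] > 0.5)


toy_importances = permutation_importance_sketch(toy_classifier, X_toy, y_toy)
print(toy_importances)
```

Shuffling the noise column leaves the accuracy untouched, so its importance is zero, while shuffling the informative column costs roughly half of the accuracy. The function above applies the same logic to the real models, with an extra filter on the standard deviation of the drops.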
Since this is just a toy example, I will not be picky about the models' hyper-parameters and will leave them at their scikit-learn defaults.
dt = DecisionTreeClassifier()
dt_fit = fit_evaluate_model(dt)
print_permutation_importance(dt_fit)
Accuracy on test dataset 0.9370629370629371
worst perimeter 0.173 +/- 0.019
worst concave points 0.145 +/- 0.023
worst concavity 0.135 +/- 0.017
worst area 0.063 +/- 0.014
radius error 0.036 +/- 0.014
worst smoothness 0.018 +/- 0.008
mean area 0.017 +/- 0.006
rf = RandomForestClassifier()
rf_fit = fit_evaluate_model(rf)
print_permutation_importance(rf_fit)
Accuracy on test dataset 0.972027972027972
worst texture 0.023 +/- 0.004
mean texture 0.013 +/- 0.006
worst smoothness 0.010 +/- 0.004
mean concavity 0.010 +/- 0.005
worst fractal dimension 0.006 +/- 0.003
gb = GradientBoostingClassifier()
gb_fit = fit_evaluate_model(gb)
print_permutation_importance(gb_fit)
Accuracy on test dataset 0.9790209790209791
worst concave points 0.023 +/- 0.011
mean concave points 0.021 +/- 0.010
The random forest and gradient boosting classifiers have similar accuracy scores (about 0.97 and 0.98); however, their most important features are different.
Now let us explain how each model predicts a single example (observation) from the test dataset.
example = X_test.iloc[1, :]
example_prediction = gb.predict(example.values.reshape(1, -1))
example_prediction_prob = gb.predict_proba(example.values.reshape(1, -1))
prediction_index = 0 if example_prediction > 0.5 else 1
print(f'Prediction {example_prediction}; Probability: {example_prediction_prob}')
Prediction [1]; Probability: [[0.10952357 0.89047643]]
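As a quick cross-check of this output, the position of the most probable class can be read directly from the probability vector. The small sketch below copies the two probabilities printed above and assumes the columns follow the class order 0, 1, which is how scikit-learn orders `predict_proba` columns (by `classes_`). The variable names are illustrative only.

```python
# Probabilities copied from the printed output: [P(class 0), P(class 1)]
probabilities = [0.10952357, 0.89047643]

# Index of the largest probability, i.e. the column of the predicted class.
predicted_index = max(range(len(probabilities)), key=probabilities.__getitem__)
print(predicted_index, probabilities[predicted_index])
```

The largest probability sits in the second column, which agrees with the predicted label of 1.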
To obtain a CIU score, we need to compute the minimum and maximum observed value of each feature in the dataset.
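def min_max_features(X_train):
    min_max = dict()
    for i in range(len(X_train.columns)):
        # the right-hand side was lost in the source; the observed [min, max] is assumed here
        min_max[X_train.columns[i]] = [X_train.iloc[:, i].min(), X_train.iloc[:, i].max()]
    return min_max

min_max = min_max_features(X_train)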
def explain_ciu(example, model):
    ciu = determine_ciu(
        # the arguments of this call are missing from the source
    )
    return ciu
dt_ciu = explain_ciu(example, dt_fit)
rf_ciu = explain_ciu(example, rf_fit)
gb_ciu = explain_ciu(example, gb_fit)
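We can obtain a textual explanation from CIU that indicates which feature(s) are important for our test example. The three outputs below appear in the order in which the explanations were computed: decision tree, random forest, and gradient boosting.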
['The feature "mean radius", which is not important (CI=0.0%), is not typical for its class (CU=0.1%).', 'The feature "mean texture", which is not important (CI=0.0%), is not typical for its class (CU=0.1%).', 'The feature "mean perimeter", which is not important (CI=0.0%), is not typical for its class (CU=0.1%).', 'The feature "mean area", which is not important (CI=0.0%), is not typical for its class (CU=0.1%).', 'The feature "mean smoothness", which is not important (CI=0.0%), is not typical for its class (CU=0.1%).', 'The feature "mean compactness", which is not important (CI=0.0%), is not typical for its class (CU=0.1%).', 'The feature "mean concavity", which is not important (CI=0.0%), is not typical for its class (CU=0.1%).', 'The feature "mean concave points", which is not important (CI=0.0%), is not typical for its class (CU=0.1%).', 'The feature "mean symmetry", which is not important (CI=0.0%), is not typical for its class (CU=0.1%).', 'The feature "mean fractal dimension", which is not important (CI=0.0%), is not typical for its class (CU=0.1%).', 'The feature "radius error", which is not important (CI=0.0%), is not typical for its class (CU=0.1%).', 'The feature "texture error", which is not important (CI=0.0%), is not typical for its class (CU=0.1%).', 'The feature "perimeter error", which is not important (CI=0.0%), is not typical for its class (CU=0.1%).', 'The feature "area error", which is not important (CI=0.0%), is not typical for its class (CU=0.1%).', 'The feature "smoothness error", which is not important (CI=0.0%), is not typical for its class (CU=0.1%).', 'The feature "compactness error", which is not important (CI=0.0%), is not typical for its class (CU=0.1%).', 'The feature "concavity error", which is not important (CI=0.0%), is not typical for its class (CU=0.1%).', 'The feature "concave points error", which is not important (CI=0.0%), is not typical for its class (CU=0.1%).', 'The feature "symmetry error", which is not important 
(CI=0.0%), is not typical for its class (CU=0.1%).', 'The feature "fractal dimension error", which is not important (CI=0.0%), is not typical for its class (CU=0.1%).', 'The feature "worst radius", which is not important (CI=0.0%), is not typical for its class (CU=0.1%).', 'The feature "worst texture", which is not important (CI=0.0%), is not typical for its class (CU=0.1%).', 'The feature "worst perimeter", which is highly important (CI=100.0%), is very typical for its class (CU=100.0%).', 'The feature "worst area", which is not important (CI=0.0%), is not typical for its class (CU=0.1%).', 'The feature "worst smoothness", which is not important (CI=0.0%), is not typical for its class (CU=0.1%).', 'The feature "worst compactness", which is not important (CI=0.0%), is not typical for its class (CU=0.1%).', 'The feature "worst concavity", which is highly important (CI=100.0%), is very typical for its class (CU=100.0%).', 'The feature "worst concave points", which is not important (CI=0.0%), is not typical for its class (CU=0.1%).', 'The feature "worst symmetry", which is not important (CI=0.0%), is not typical for its class (CU=0.1%).', 'The feature "worst fractal dimension", which is not important (CI=0.0%), is not typical for its class (CU=0.1%).']
['The feature "mean radius", which is important (CI=32.26%), is very typical for its class (CU=90.0%).', 'The feature "mean texture", which is important (CI=35.48%), is unlikely for its class (CU=27.27%).', 'The feature "mean perimeter", which is not important (CI=12.9%), is typical for its class (CU=50.0%).', 'The feature "mean area", which is not important (CI=16.13%), is unlikely for its class (CU=40.0%).', 'The feature "mean smoothness", which is not important (CI=12.9%), is typical for its class (CU=50.0%).', 'The feature "mean compactness", which is not important (CI=9.68%), is unlikely for its class (CU=33.33%).', 'The feature "mean concavity", which is not important (CI=16.13%), is not typical for its class (CU=20.0%).', 'The feature "mean concave points", which is not important (CI=19.35%), is not typical for its class (CU=16.67%).', 'The feature "mean symmetry", which is important (CI=38.71%), is very typical for its class (CU=100.0%).', 'The feature "mean fractal dimension", which is not important (CI=6.45%), is not typical for its class (CU=0.1%).', 'The feature "radius error", which is not important (CI=22.58%), is typical for its class (CU=71.43%).', 'The feature "texture error", which is not important (CI=22.58%), is very typical for its class (CU=85.71%).', 'The feature "perimeter error", which is not important (CI=22.58%), is unlikely for its class (CU=42.86%).', 'The feature "area error", which is important (CI=38.71%), is unlikely for its class (CU=33.33%).', 'The feature "smoothness error", which is not important (CI=3.23%), is very typical for its class (CU=100.0%).', 'The feature "compactness error", which is not important (CI=12.9%), is typical for its class (CU=50.0%).', 'The feature "concavity error", which is not important (CI=6.45%), is very typical for its class (CU=100.0%).', 'The feature "concave points error", which is not important (CI=9.68%), is typical for its class (CU=66.67%).', 'The feature "symmetry error", which is not 
important (CI=6.45%), is typical for its class (CU=50.0%).', 'The feature "fractal dimension error", which is not important (CI=16.13%), is very typical for its class (CU=100.0%).', 'The feature "worst radius", which is very important (CI=51.61%), is very typical for its class (CU=87.5%).', 'The feature "worst texture", which is very important (CI=67.74%), is unlikely for its class (CU=33.33%).', 'The feature "worst perimeter", which is very important (CI=70.97%), is typical for its class (CU=63.64%).', 'The feature "worst area", which is very important (CI=61.29%), is typical for its class (CU=57.89%).', 'The feature "worst smoothness", which is not important (CI=6.45%), is typical for its class (CU=50.0%).', 'The feature "worst compactness", which is not important (CI=9.68%), is unlikely for its class (CU=33.33%).', 'The feature "worst concavity", which is very important (CI=64.52%), is very typical for its class (CU=85.0%).', 'The feature "worst concave points", which is important (CI=38.71%), is not typical for its class (CU=16.67%).', 'The feature "worst symmetry", which is important (CI=25.81%), is typical for its class (CU=50.0%).', 'The feature "worst fractal dimension", which is not important (CI=3.23%), is not typical for its class (CU=0.1%).']
['The feature "mean radius", which is not important (CI=16.49%), is not typical for its class (CU=0.65%).', 'The feature "mean texture", which is highly important (CI=90.14%), is not typical for its class (CU=3.76%).', 'The feature "mean perimeter", which is not important (CI=2.63%), is not typical for its class (CU=0.1%).', 'The feature "mean area", which is not important (CI=3.36%), is very typical for its class (CU=100.0%).', 'The feature "mean smoothness", which is not important (CI=0.0%), is not typical for its class (CU=0.1%).', 'The feature "mean compactness", which is important (CI=37.26%), is not typical for its class (CU=0.1%).', 'The feature "mean concavity", which is not important (CI=4.0%), is not typical for its class (CU=8.92%).', 'The feature "mean concave points", which is important (CI=38.25%), is not typical for its class (CU=3.57%).', 'The feature "mean symmetry", which is not important (CI=8.91%), is very typical for its class (CU=100.0%).', 'The feature "mean fractal dimension", which is not important (CI=1.54%), is not typical for its class (CU=0.1%).', 'The feature "radius error", which is not important (CI=9.35%), is not typical for its class (CU=0.1%).', 'The feature "texture error", which is not important (CI=6.53%), is very typical for its class (CU=100.0%).', 'The feature "perimeter error", which is not important (CI=1.48%), is not typical for its class (CU=0.1%).', 'The feature "area error", which is very important (CI=57.97%), is not typical for its class (CU=0.1%).', 'The feature "smoothness error", which is not important (CI=16.51%), is not typical for its class (CU=0.1%).', 'The feature "compactness error", which is not important (CI=4.39%), is not typical for its class (CU=0.1%).', 'The feature "concavity error", which is not important (CI=4.03%), is not typical for its class (CU=0.1%).', 'The feature "concave points error", which is not important (CI=5.76%), is very typical for its class (CU=100.0%).', 'The feature "symmetry 
error", which is not important (CI=0.0%), is not typical for its class (CU=0.1%).', 'The feature "fractal dimension error", which is not important (CI=21.47%), is not typical for its class (CU=17.33%).', 'The feature "worst radius", which is not important (CI=1.27%), is very typical for its class (CU=100.0%).', 'The feature "worst texture", which is very important (CI=60.61%), is not typical for its class (CU=13.75%).', 'The feature "worst perimeter", which is important (CI=41.37%), is not typical for its class (CU=23.17%).', 'The feature "worst area", which is not important (CI=19.51%), is typical for its class (CU=67.91%).', 'The feature "worst smoothness", which is not important (CI=18.24%), is unlikely for its class (CU=48.97%).', 'The feature "worst compactness", which is not important (CI=0.0%), is not typical for its class (CU=0.1%).', 'The feature "worst concavity", which is not important (CI=10.79%), is very typical for its class (CU=100.0%).', 'The feature "worst concave points", which is important (CI=42.94%), is not typical for its class (CU=4.32%).', 'The feature "worst symmetry", which is not important (CI=5.86%), is not typical for its class (CU=0.1%).', 'The feature "worst fractal dimension", which is not important (CI=0.0%), is not typical for its class (CU=0.1%).']
Although CIU is a brilliant and simple technique, I believe it has the following drawbacks:
In regression problems, the range of possible values for the target variable can be unbounded, in which case absmax - absmin, and hence CI, is not well defined. The authors say that they put a limit on the range of values.
Computing the range of values can be misleading, especially when there are outliers in the dataset.
It is not clear how we can get a global explanation for the model using CIU.
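The outlier issue in the list above is easy to reproduce. The sketch below uses made-up feature values and compares the raw min/max range with a percentile-based range. Clipping to percentiles is only one possible mitigation that I am assuming here for illustration; it is not something prescribed by CIU or py-ciu, and the helper names are hypothetical.

```python
import statistics


def raw_range(values):
    """Observed minimum and maximum, as used for the feature ranges."""
    return min(values), max(values)


def percentile_range(values, lower_pct=5, upper_pct=95):
    """Range between two percentiles, which is less sensitive to outliers."""
    cuts = statistics.quantiles(values, n=100)  # 99 cut points: 1st to 99th percentile
    return cuts[lower_pct - 1], cuts[upper_pct - 1]


# Made-up feature: 99 ordinary values and a single extreme outlier.
feature_values = [float(v) for v in range(1, 100)] + [1000.0]

raw_low, raw_high = raw_range(feature_values)
robust_low, robust_high = percentile_range(feature_values)
print(f"raw range:        [{raw_low}, {raw_high}]")
print(f"percentile range: [{robust_low:.2f}, {robust_high:.2f}]")
```

With the raw range, a single outlier stretches the interval that gets sampled by roughly a factor of ten, so most sampled values fall where no real observation lives. The percentile range stays close to the bulk of the data, at the cost of ignoring genuinely extreme cases.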
Text and figures are licensed under Creative Commons Attribution CC BY-SA 4.0. Source code is available at https://github.com/mcnakhaee, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Nakhaee (2021, Nov. 1). Muhammad Nakhaee: Explaining Machine Learning Models Using Contextual Importance and Contextual Utility. Retrieved from https://mcnakhaee.com/posts/2021-05-22-contextual-importance-and-contextual-utility/
BibTeX citation
@misc{nakhaee2021explaining,
  author = {Nakhaee, Muhammad Chenariyan},
  title = {Muhammad Nakhaee: Explaining Machine Learning Models Using Contextual Importance and Contextual Utility},
  url = {https://mcnakhaee.com/posts/2021-05-22-contextual-importance-and-contextual-utility/},
  year = {2021}
}