Introduction to XAI: data-driven models

IN104 – Explainability in AI Project – April 2021

Above is the introductory presentation to the practical explainability module of the Computer Science course led by Professor Natalia Díaz Rodríguez at ENSTA in Paris. I had the opportunity to teach in-person classes and hands-on sessions on the topic of Explainability in AI. Students were asked to work on a coding project and a report.

Transcript of the Presentation

Explaining the inner workings of decision-making systems is quickly gathering importance in artificial intelligence.

Machine learning is a branch of artificial intelligence (AI), also defined as a capability of a system to adapt to new circumstances and to detect and extrapolate patterns (Stuart Russell and Peter Norvig, Artificial Intelligence: A Modern Approach).

To understand what I am going to introduce next, it’s important to have an idea of the three machine learning paradigms: supervised learning, unsupervised learning, and reinforcement learning. 

I’ll give you a high-level overview of them, but feel free to ask questions if needed. 

  • Supervised learning methods aim at making predictions starting from a dataset of labeled data, learning a relationship between the inputs and the outputs (e.g., the size of a house and its price, or spam vs. non-spam). Problems that can be solved by supervised learning methods can be divided into regression and classification problems. The former produces a continuous output, while the latter a discrete one.
  • Unsupervised learning methods try to find some structure in the data (e.g. clusters), given a dataset without labels.
  • Reinforcement learning methods use a trial-and-error approach. Starting from a system that is capable of sampling data, the goal is to find an optimal policy (i.e., behavior or a sequence of actions) by receiving positive or negative feedback (i.e., reward).
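To make the supervised case concrete, here is a minimal sketch of supervised regression: fitting a line to labeled data with ordinary least squares and using it to predict a new input. The house-price numbers are illustrative only, chosen so the relationship is exactly linear.

```python
# Minimal supervised-learning sketch: fit a line (regression) to labeled
# data with ordinary least squares, then predict the output for a new input.
# Toy data: house size (m^2) -> price (k EUR); numbers are illustrative only.

def fit_line(xs, ys):
    """Return slope a and intercept b minimizing the squared error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b

sizes = [50, 70, 90, 110]          # inputs
prices = [150, 210, 270, 330]      # labels (exactly 3 * size)
a, b = fit_line(sizes, prices)
print(a, b)                        # -> 3.0 0.0
print(a * 80 + b)                  # predicted price for an 80 m^2 house -> 240.0
```

A classification method would follow the same recipe but produce a discrete label instead of a continuous number.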

We can now define explainability as the general problem of explaining the inner workings of AI algorithms. The challenging part is to make these explanations meaningful for the observer.

Explainability seems to be closely tied to the type of models and their objectives. 


Due to this, we can distinguish between data-driven models and goal-driven models. 

  • In data-driven models, such as neural networks, explainability typically describes what aspects of the input suggest a given output (i.e., interpretation). 
  • In goal-driven models, such as reinforcement learning agents, explainability provides information about what allows the system to behave the way it does and why (i.e., explanation or causation). 

In both cases, explainability refers to the capability of an agent, or of a module of a system, to explain its internal workings, considering the human as the target of the explanations.

Other terms such as transparency, expressivity, understandability, predictability, and communicability are also used to refer to this concept. I suggest you read the paper written by Alejandro Barredo Arrieta et al., and some of the other references I’ll add at the end of the presentation to have a broader idea about the topic.

Since human intuition is grounded in causal, not statistical, logic, there has been a growing interest in causal inference in explainability.

Causal inference concerns the study of how and why causes influence their effects. For example:

  • We can observe a symptom and infer something about a disease – the conditional probability of Y given that we observe X, P(y | x) (association).
  • We can perform an action – such as taking an aspirin – and infer the effect of that action – the probability of Y if we do X, P(y | do(x)) (intervention).
  • We can ask ourselves about other possible causes, or what would have happened under different circumstances – the probability of Yx given that we actually observed x′ and y′, P(Yx | x′, y′) (counterfactual).
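The difference between the first two levels can be seen in a small simulation. The sketch below uses a made-up structural causal model with a hidden confounder Z that drives both X and Y, while X itself has no causal effect on Y: observing X changes our belief about Y, but forcing X does not change Y at all.

```python
import random

random.seed(0)

def sample(do_x=None):
    """One draw from a toy structural causal model with a confounder Z.
    Z -> X and Z -> Y; X has no causal effect on Y at all."""
    z = random.random() < 0.5
    x = (random.random() < (0.8 if z else 0.2)) if do_x is None else do_x
    y = random.random() < (0.9 if z else 0.1)
    return x, y

N = 100_000
obs = [sample() for _ in range(N)]
p_y_given_x = sum(y for x, y in obs if x) / sum(x for x, _ in obs)

interv = [sample(do_x=True) for _ in range(N)]
p_y_do_x = sum(y for _, y in interv) / N

print(round(p_y_given_x, 2))  # ~0.74: seeing X tells us about Y (via Z)
print(round(p_y_do_x, 2))     # ~0.50: forcing X does not change Y
```

The gap between P(y | x) and P(y | do(x)) is exactly the confounding that purely statistical models cannot distinguish from causation.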

If you are interested in causal inference approaches in explainability I suggest you have a look at the lecture of Judea Pearl on this topic.

Revealing the internal states of a robot, explaining the reason for a prediction, or acting according to humans’ expectations, explainability can help humans provide more informative examples to a robot learner, build trust, and facilitate collaboration with the AI system (human trust: McKinney et al. 2020, Wang et al. 2016; teamwork and ethical decision-making: Huang et al. 2018, Schaefer et al. 2016; interactive machine/robot learning: Chao et al. 2010, Kwon et al. 2018).

I am reporting here an example of interactive robot learning, which is probably a less well-known example when we talk about explainability in AI.

Interactive machine/robot Learning
Chao et al. 2010

Motivated by human social learning, the authors of this research believe that a transparent learning process can help guide the human teacher to provide the most informative instruction. 

They claim that active learning is an inherently transparent machine learning approach because by querying the teacher, the learner reveals what is known and what is unclear.
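The querying step can be sketched with a simple uncertainty-based rule: the learner asks about the unlabeled instance whose predicted probability is closest to 0.5, thereby revealing where its knowledge is weakest. The probability model below is a hypothetical stand-in, not from the paper.

```python
import math

def predict_proba(x):
    # Hypothetical 1-D model: probability of the positive class rises
    # smoothly with x (a logistic curve centered at 5).
    return 1 / (1 + math.exp(-(x - 5)))

unlabeled = [0.5, 2.0, 4.9, 8.0, 9.5]

def query(pool):
    """Return the instance the model is least certain about."""
    return min(pool, key=lambda x: abs(predict_proba(x) - 0.5))

print(query(unlabeled))  # -> 4.9, the point nearest the decision boundary
```

By asking about 4.9 rather than 0.5 or 9.5, the learner communicates that the region around its decision boundary is what it still finds unclear.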

When we refer to data-driven models, we can distinguish between global vs. local explainability and between model-specific vs. model-agnostic interpretation methods.

  • Local explainability tries to explain why the model made a certain prediction for an instance or a group of instances (e.g., Why did the model make a certain prediction for this instance? Why did the model make specific predictions for this group of instances?)
  • Global explainability tries to understand how the model makes decisions, based on how the algorithm learns components such as weights and other parameters (e.g., How does the trained model make predictions? How do parts of the model affect predictions?)

An example of local explainability is the one provided by LIME, which is one of the methods we are going to explore together.

Steps of the LIME algorithm. 
Picture by Giorgio Visani

Model-specific methods are tied to the model itself. Therefore, we refer to interpretable models. Examples of interpretable models are linear regression and decision trees.

In contrast, model-agnostic methods add a layer to the model that extracts information to help humans understand the outputs of the model.  

As a result, these methods allow developers to use any machine learning model and still obtain a general idea of what aspects of the input influenced the output. Some of the most popular methods to build explainability in data-driven models are Partial Dependence Plots (PDP), SHapley Additive exPlanations (SHAP), Class Activation Mapping (CAM), and Local Interpretable Model-agnostic Explanations (LIME).

Quoting Molnar’s book:

The partial dependence plot (short PDP or PD plot) shows the marginal effect one or two features have on the predicted outcome of a machine learning model (J. H. Friedman 2001). A partial dependence plot can show whether the relationship between the target and a feature is linear, monotonic or more complex. For example, when applied to a linear regression model, partial dependence plots always show a linear relationship.
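The computation behind a PDP is simple enough to sketch by hand: for each value on a grid, force the feature of interest to that value in every row of the data and average the model’s predictions. The model below is a made-up example, deliberately linear in x1 so the partial dependence comes out as a straight line, as the quote above predicts.

```python
# Minimal partial dependence sketch: for each grid value v, replace the
# feature of interest (x1) with v in every data row and average the
# model's predictions. The fitted model here is a made-up example.

def model(x1, x2):
    # Hypothetical fitted model: linear in x1, nonlinear in x2.
    return 2 * x1 + x2 ** 2

data = [(1, 0), (2, 1), (3, 2), (4, 3)]  # rows of (x1, x2)

def partial_dependence(grid):
    """Average prediction over the data, with x1 forced to each grid value."""
    return [sum(model(v, x2) for _, x2 in data) / len(data) for v in grid]

pd = partial_dependence([0, 1, 2])
print(pd)  # -> [3.5, 5.5, 7.5]: a straight line, since the model is linear in x1
```

Plotting these averages against the grid gives the PD plot; a curved or non-monotonic shape would reveal a more complex relationship between the feature and the prediction.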

CAM – Class Activation Mapping

Selvaraju et al. 2017 – Grad-CAM Tutorial, CAM Tutorial (MIT)

A class activation map for a particular category indicates the discriminative image regions used by the CNN to identify that category.
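The core operation of CAM is a weighted sum of the last convolutional feature maps, using the final linear layer’s weights for the class of interest. The sketch below shows only that combination step; the feature maps and class weights are random stand-ins for what a real CNN would produce.

```python
import numpy as np

# Sketch of the CAM combination step: the activation map for a class is a
# weighted sum of the last convolutional feature maps, using that class's
# weights in the final linear layer. Inputs below are made-up stand-ins.

rng = np.random.default_rng(0)
feature_maps = rng.random((3, 4, 4))        # 3 channels of 4x4 spatial activations
class_weights = np.array([0.6, 0.3, 0.1])   # final-layer weights for one class

# Weighted sum over channels -> one 4x4 heatmap of class-discriminative regions
cam = np.einsum('k,kij->ij', class_weights, feature_maps)
cam = np.maximum(cam, 0)                    # keep only positive evidence (as Grad-CAM does)
cam = cam / cam.max()                       # normalize to [0, 1] for visualization

print(cam.shape)  # -> (4, 4)
```

Upsampling this small heatmap to the input image’s resolution and overlaying it is what produces the familiar CAM visualizations.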

The SHAP method explains individual predictions. SHAP is based on game theory and uses Shapley values. The Shapley value of a feature represents the average marginal contribution of that feature’s value across all possible coalitions of features.

A prediction can be explained by assuming that each feature value of the instance is a “player” in a game where the prediction is the payout. The Shapley values tell us how to fairly distribute the “payout” among the features.
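For a small number of features, Shapley values can be computed exactly by averaging each feature’s marginal contribution over every ordering in which the features could join the coalition. The model, instance, and baseline below are toy examples of my own, not from any SHAP implementation.

```python
from itertools import permutations
from math import factorial

features = {'size': 80, 'rooms': 3}   # the instance to explain
baseline = {'size': 0, 'rooms': 0}    # "absent" feature values

def model(x):
    # Hypothetical model with an interaction term between the two features.
    return 2 * x['size'] + 10 * x['rooms'] + 0.1 * x['size'] * x['rooms']

def shapley(feature):
    """Average marginal contribution of `feature` over all feature orderings."""
    names = list(features)
    total = 0.0
    for order in permutations(names):
        present = dict(baseline)
        for name in order:
            before = model(present)
            present[name] = features[name]
            if name == feature:
                total += model(present) - before
    return total / factorial(len(names))

phi = {f: shapley(f) for f in features}
print(phi)  # -> {'size': 172.0, 'rooms': 42.0}
# Fair payout: 172 + 42 == model(features) - model(baseline) == 214
```

The final check illustrates the fairness property: the Shapley values of all features sum exactly to the difference between the prediction and the baseline. Enumerating orderings is exponential in the number of features, which is why SHAP relies on approximations in practice.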

The LIME technique was proposed by Ribeiro and his colleagues in 2016 (here you can find an example notebook that details how to use LIME for explanations of tabular data). However, as with the other methods, this technique can also be applied to explanations of text and image data. The basic idea is to understand why a machine learning model (a deep neural network) predicts that an instance (an image) belongs to a certain class (a labrador in this case). Unlike SHAP, LIME does not guarantee that the prediction is fairly distributed among the features.

LIME – Local Interpretable Model-agnostic Explanations – Ribeiro et al. 2016
Interpretable Machine Learning with LIME – Arteaga
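The core loop of LIME for tabular data can be sketched in a few lines: sample perturbations around the instance, weight them by proximity, and fit a weighted linear model to the black box’s outputs; the linear coefficients are the local explanation. The black-box model, kernel width, and sampling scale below are illustrative choices, not LIME’s defaults.

```python
import numpy as np

# Core idea of LIME, sketched for a 2-feature tabular instance: sample
# perturbations around the instance, weight them by proximity, and fit a
# weighted linear surrogate to the black box's outputs.

rng = np.random.default_rng(0)

def black_box(X):
    # Hypothetical opaque model: nonlinear globally, roughly linear locally.
    return np.sin(X[:, 0]) + X[:, 1] ** 2

instance = np.array([0.0, 1.0])
X = instance + rng.normal(scale=0.3, size=(500, 2))   # local perturbations
y = black_box(X)

dist = np.linalg.norm(X - instance, axis=1)
w = np.exp(-(dist ** 2) / 0.25)                       # proximity kernel

# Weighted least squares on [1, x1, x2]
A = np.column_stack([np.ones(len(X)), X]) * np.sqrt(w)[:, None]
coef, *_ = np.linalg.lstsq(A, y * np.sqrt(w), rcond=None)
print(np.round(coef[1:], 1))  # local slopes, close to [cos(0), 2*1] = [1, 2]
```

The recovered slopes approximate the black box’s local gradient at the instance, which is exactly the kind of “what input aspects suggest this output” statement that local explainability aims for.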

For your project, I’ll ask you to choose three datasets – a tabular dataset (e.g., Titanic Survivors), a text dataset (e.g., Quora Insincere Questions Classification), and an image dataset (e.g., Chest X-Ray Images – Pneumonia). For each dataset, you will explore different methods to generate explanations about your predictions and evaluate those methods. You will then provide an interpretation of the explanations given by each method and compare your results across datasets.


At the beginning of this lecture, we saw three paradigms of machine learning – however, we only explored methods for explaining data-driven models.

  • What are the challenges of making explainable reinforcement learning agents? 
  • What are current methods for making RL agents explainable? 

Another challenging aspect we did not discuss is how to model the observer in order to provide meaningful explanations. There exist plan explainability approaches that aim to build explainability by minimizing the difference between the human’s model of the robot and the actual model of the robot. For those who are curious about this, we can further discuss methods to account for the observer.


Bennetot, A., Donadello, I., Qadi, A.E., Dragoni, M., Frossard, T., Wagner, B., Saranti, A., Tulli, S., Trocan, M., Chatila, R., Holzinger, A., Garcez, A.S., & Rodríguez, N.D. (2021). A Practical Tutorial on Explainable AI Techniques. ArXiv, abs/2111.14260.

Artificial Intelligence – AIMA Exercises, https://aimacode.github.io/aima-exercises/

Russell S. J. and Norvig, P. (1995). Artificial Intelligence: A Modern Approach (old edition)

Ribeiro, M.T., Singh, S., & Guestrin, C. (2016). “Why Should I Trust You?”: Explaining the Predictions of Any Classifier. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.

Sokol, K., & Flach, P.A. (2020). Explainability fact sheets: a framework for systematic assessment of explainable approaches. Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency.

Lipton, Z.C. (2018). The Mythos of Model Interpretability: In machine learning, the concept of interpretability is both important and slippery. Queue 16, 3 (May-June 2018), 31–57.

Halpern, J.Y., & Pearl, J. (2005). Causes and Explanations: A Structural-Model Approach. Part II: Explanations. The British Journal for the Philosophy of Science, 56, 889 – 911.

Madumal, P., Miller, T., Sonenberg, L., & Vetere, F. (2020). Distal Explanations for Explainable Reinforcement Learning Agents. ArXiv, abs/2001.10284.

Madumal, P., Miller, T., Sonenberg, L., & Vetere, F. (2020). Explainable Reinforcement Learning Through a Causal Lens. AAAI.

Arrieta, A., Diaz-Rodriguez, N., Ser, J., Bennetot, A., Tabik, S., Barbado, A., Garcia, S., Gil-Lopez, S., Molina, D., Benjamins, R., Chatila, R., & Herrera, F. (2020). Explainable Artificial Intelligence (XAI): Concepts, Taxonomies, Opportunities and Challenges toward Responsible AI. ArXiv, abs/1910.1004