This repository collects important papers on concept bottleneck models, organized by publication year.
Interactive Concept Bottleneck Models
A closer look at the intervention procedure of concept bottleneck models
Label Free Concept Bottleneck Models
Concept-based Explainable Artificial Intelligence: A Survey
Post-hoc Concept Bottleneck Models
Addressing Leakage in Concept Bottleneck Models
Do Concept Bottleneck Models Learn as Intended?
Description: Investigates which regions of the input space CBMs use to make predictions. Claims that pretrained concepts do not correspond to anything semantically meaningful in the input space, suggesting that CBMs might be using confounding information to predict concept labels.
Promises and Pitfalls of Black-Box Concept Learning Models
Description: Empirically shows that current methods that attempt to address the information leakage problem in concept bottleneck models, such as concept whitening models and sequential CBMs, are largely ineffective.
Editing a Classifier by Rewriting Its Prediction Rules
Is Disentanglement All You Need? Comparing Concept-based & Disentanglement Approaches
Description: Introduces a framework that extracts concepts from feature vectors and then uses them to predict target labels. Three training methods:
- Independent: Trains the concept predictor (input x to concepts c) and the label predictor (ground-truth concepts c to label y) separately.
- Sequential: Trains the concept predictor first, then trains the label predictor on the predicted concepts.
- Joint: Trains both predictors simultaneously using a joint loss over concepts c and target label y.
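The three training methods differ mainly in which concepts the label predictor sees and in whether the two losses are optimized together. The sketch below illustrates this with toy functions; `g`, `f`, the binary cross-entropy losses, and the weight `lam` are illustrative assumptions, not code from the paper.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bce(p, t):
    """Binary cross-entropy for one probability p and a target t in {0, 1}."""
    eps = 1e-12
    return -(t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps))

def concept_loss(c_hat, c):
    """Sum of per-concept binary cross-entropies."""
    return sum(bce(p, t) for p, t in zip(c_hat, c))

# Hypothetical fixed predictors: g maps input x to concept probabilities,
# f maps concepts to a label probability.
def g(x):
    return [sigmoid(x[0] - x[1]), sigmoid(x[1])]

def f(c):
    return sigmoid(2.0 * c[0] - 1.0 * c[1])

def independent_losses(g, f, x, c, y):
    # g is fit on (x, c); f is fit on the ground-truth concepts c.
    return concept_loss(g(x), c), bce(f(c), y)

def sequential_losses(g, f, x, c, y):
    # g is fit first on (x, c); f is then fit on g's predicted concepts.
    c_hat = g(x)
    return concept_loss(c_hat, c), bce(f(c_hat), y)

def joint_loss(g, f, x, c, y, lam=1.0):
    # One objective: label loss plus lam times the concept loss.
    c_hat = g(x)
    return bce(f(c_hat), y) + lam * concept_loss(c_hat, c)

x, c, y = [0.5, -1.0], [1, 0], 1
print(independent_losses(g, f, x, c, y))
print(sequential_losses(g, f, x, c, y))
print(joint_loss(g, f, x, c, y, lam=0.5))
```

In the independent and sequential cases the two returned losses would be minimized in separate optimization steps, whereas the joint loss is minimized as a single objective.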
Limitations:
- Does not investigate whether concept bottleneck models learn spurious input features to make concept predictions.
- The joint framework (the preferred framework in the paper) might learn features directly from the input to predict target labels, giving less weight to the pre-specified concepts and more to uninterpretable attributes.
Now You See Me (CME): Concept-based Model Extraction
Description: Introduces a model extraction framework that is used to analyse concept information in DNN models. Specifically,
- Discovers concepts learnt by a DNN model: Does so by approximating two functions, a function that predicts intermediate concept labels and a function that predicts the target labels from concept predictions.
- Analyses how DNNs use concept information when predicting labels: utilizes latent space analysis methods to inspect which concepts are learned and how these concepts are represented across different DNN layers.
- Identifies the most important concept information: Does so by picking the 32 highest coefficients from a logistic regression model trained to predict target labels from ground-truth concept labels.
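The selection step in the last bullet can be sketched as a simple ranking of fitted coefficients. The function below is a hypothetical helper, not the paper's code: it assumes a single coefficient vector (one weight per concept) is already available, and the `by_magnitude` option is an added assumption for cases where large negative weights should also count as important.

```python
def top_k_concepts(coefficients, concept_names, k=32, by_magnitude=False):
    """Return the k (name, coefficient) pairs with the highest coefficients.

    With by_magnitude=True, concepts are ranked by absolute value instead.
    """
    if len(coefficients) != len(concept_names):
        raise ValueError("need exactly one coefficient per concept")
    key = (lambda pair: abs(pair[1])) if by_magnitude else (lambda pair: pair[1])
    ranked = sorted(zip(concept_names, coefficients), key=key, reverse=True)
    return ranked[:k]

# Toy coefficients standing in for a fitted logistic regression model.
names = ["c0", "c1", "c2", "c3"]
coefs = [0.1, -2.0, 1.5, 0.7]
print(top_k_concepts(coefs, names, k=2))
print(top_k_concepts(coefs, names, k=2, by_magnitude=True))
```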
On Completeness-aware Concept-Based Explanations in Deep Neural Networks
Description: Explores the idea of complete concepts in deep neural networks. Specifically,
- Defines completeness of concepts in deep neural networks
- Introduces a completeness score to evaluate how sufficient concepts are for model predictions
- Introduces a method to discover complete, interpretable concepts
- ConceptSHAP: studies how important individual concepts are to the overall completeness score
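As a rough illustration of the completeness score, the sketch below assumes it is computed as a normalized accuracy ratio: the accuracy recovered from concept scores alone, relative to the original model's accuracy, with both measured above a random-prediction baseline. The function name and arguments are hypothetical, and the exact definition should be checked against the paper.

```python
def completeness_score(concept_acc, model_acc, random_acc):
    """Fraction of the model's above-chance accuracy recovered from concepts.

    concept_acc: accuracy of the best predictor that sees only concept scores
    model_acc:   accuracy of the original model
    random_acc:  accuracy of random prediction (e.g. 1 / number of classes)
    """
    if model_acc == random_acc:
        raise ValueError("model accuracy equals the random baseline")
    return (concept_acc - random_acc) / (model_acc - random_acc)

# Toy numbers: a 10-class problem where concepts recover most of the accuracy.
print(completeness_score(concept_acc=0.85, model_acc=0.90, random_acc=0.10))
```

Under this assumed definition, a score near 1 means the concepts are almost sufficient to reproduce the model's predictions, and a score near 0 means they do little better than chance.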
Debiasing Concept-based Explanations with Causal Analysis
Description: Introduces a causal prior graph that attempts to model unobserved confounding information the model might be using to make its predictions. Uses a two-stage regression technique to remove the detected confounding information.