Concept Bottleneck Models

human-AI collaboration

Vaibhav Balloli


January 6, 2024

A brief introduction to Concept Models

Imagine deploying a neural network. That’s it, that’s the joke.

Now really imagine deploying a neural network, or any other machine learning model, that outputs a prediction for a given input. For example, say it is a dog classifier: given an image of a dog, it outputs the likely breed. Now, what if the model first predicted whether the dog has certain visual features, like is_furry set to true, is_golden set to true, has_white_hair set to false, and only then classified the dog as a golden retriever? So, why is this important? After all, does it matter whether these predictions are latent or shown? The key feature of this exercise is this: say the image was grainy and the network predicted is_furry to be false. You could now correct it and set it to true, because that is easy for us humans to do even without knowing what a golden retriever is. Thanks to your correction, the model will now most likely classify the image as golden_retriever!

This is the promise of Concept Models: the ability for humans to intervene in the prediction process by modifying the presence or absence of concepts in the input, whether images, text, or other modalities. This idea was first formalized in Concept Bottleneck Models, and numerous works have since built on it to make these models truly compatible with human intervention!

Concept Bottleneck Models (CBMs)

CBMs are a class of models that comprise two prediction stages: the first stage, the concept head, predicts the concepts, and the second stage predicts the class from those predicted concepts. The key observation here is that the list of concepts used to predict the final class is not necessarily exhaustive. A naive take is that more concepts means more degrees of freedom for control. These models are called bottleneck models because the concept prediction stage acts as a bottleneck before the actual class prediction. The figure below provides an illustration of CBMs.

Concept Bottleneck Models with the Concept Head and the Classifier
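To make the two stages concrete, here is a minimal sketch of a CBM forward pass in numpy. The dimensions, weights, and the `concept_override` mechanism are all illustrative assumptions, not the paper's implementation; the point is only that the classifier consumes concept predictions, so a human can overwrite them before the second stage runs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 64-dim input features, 5 concepts, 3 classes.
W_concept = rng.normal(size=(64, 5))   # stage 1: concept head weights
W_class = rng.normal(size=(5, 3))      # stage 2: classifier weights

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(x, concept_override=None):
    """Two-stage CBM forward pass.

    concept_override maps concept index -> value, letting a human
    replace predicted concepts (e.g. forcing is_furry to 1.0) before
    the class prediction stage.
    """
    concepts = sigmoid(x @ W_concept)          # stage 1: predict concepts
    if concept_override is not None:
        for idx, value in concept_override.items():
            concepts[idx] = value              # human intervention
    logits = concepts @ W_class                # stage 2: predict class
    return concepts, logits

x = rng.normal(size=64)
concepts, logits = predict(x)
# Correct a mispredicted concept and re-run only the second stage:
corrected_concepts, corrected_logits = predict(x, concept_override={0: 1.0})
```

Because the intervention happens between the two stages, correcting a concept never requires retraining or re-running the (expensive) concept head.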

Preparing your data: what goes into training a traditional CBM

  1. A pre-defined list of concepts. Let’s take the example of the CUB dataset. Some concepts in this dataset include black undertail color, yellow throat color, etc.
  2. Annotations for each of these pre-defined concepts, along with the associated class label.
  3. That’s it!
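A single annotated training example might look like the record below. This is a hypothetical layout (the field names and file path are made up for illustration), but it captures the two ingredients above: per-concept present/absent labels plus the class label.

```python
# Hypothetical annotation record for one CUB-style image.
# Each pre-defined concept is marked present (1) or absent (0).
sample = {
    "image_path": "images/bird_001.jpg",  # illustrative path
    "concepts": {
        "has_undertail_color_black": 1,
        "has_throat_color_yellow": 0,
    },
    "label": "Golden_Winged_Warbler",  # illustrative class name
}

concept_vector = list(sample["concepts"].values())
```

The concept head is trained against `concept_vector` and the classifier against `label`; no extra supervision is needed beyond these two annotation sets.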

Concepts you need to build your own CBM

Yes, the pun is very much intended. Moving on, here are the concepts you need to build your own CBM:

  1. Concept Predictor Model: depending on your modality, this can be a pre-trained encoder, a pre-trained image classifier, etc. Pre-training helps a lot, as evidenced by tons of literature in the deep learning community over the past several years. The key part here is the activation function used to predict the concepts. While sigmoid might seem like the obvious choice, it has been shown to squash the activations considerably, and you end up with not-so-great performance. ReLUs, on the other hand, perform well, but intervening on them needs a special computation. The intuition: with a sigmoid, you would intervene by setting the concept close to zero if it is absent or close to one if it is present, but this does not carry over to ReLUs, simply because their maximum activation can be arbitrarily high. Therefore, the authors suggest computing the 95th and 5th percentiles of the activations for each concept, using them to indicate presence and absence respectively, and intervening with those same values when deploying.
  2. A simple neural network, or really any machine learning model you fancy, that takes these concepts as input and predicts the final label or value, depending on whether the task is classification or regression.
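The percentile-based intervention for ReLU concept heads can be sketched as follows. The activation distribution here is synthetic stand-in data; the recipe itself (95th percentile as the "present" value, 5th as "absent") is the one described above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for ReLU concept activations collected over a training set:
# shape (num_samples, num_concepts).
activations = np.maximum(rng.normal(loc=1.0, scale=2.0, size=(1000, 4)), 0.0)

# Per-concept intervention values: 95th percentile means "present",
# 5th percentile means "absent".
present_vals = np.percentile(activations, 95, axis=0)
absent_vals = np.percentile(activations, 5, axis=0)

def intervene(concepts, corrections):
    """Replace predicted ReLU activations with percentile values.

    corrections maps concept index -> True (present) or False (absent).
    """
    out = concepts.copy()
    for idx, is_present in corrections.items():
        out[idx] = present_vals[idx] if is_present else absent_vals[idx]
    return out

predicted = activations[0]                       # one sample's concept outputs
fixed = intervene(predicted, {1: True, 2: False})
```

Unlike the sigmoid case, where 0 and 1 are natural intervention values, these percentiles must be computed once on the training set and stored alongside the deployed model.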


BibTeX citation:
  author = {Balloli, Vaibhav},
  title = {Concept {Bottleneck} {Models}},
  date = {2024-01-06},
  url = {},
  langid = {en}
For attribution, please cite this work as:
Balloli, Vaibhav. 2024. “Concept Bottleneck Models.” January 6, 2024.