This is the official repository for
Local Temperature Scaling for Probability Calibration
Zhipeng Ding, Xu Han, Peirong Liu, and Marc Niethammer
ICCV 2021 (eprint on arXiv)
If you use LTS or some part of the code, please cite:
@inproceedings{ding2021local,
  title={Local temperature scaling for probability calibration},
  author={Ding, Zhipeng and Han, Xu and Liu, Peirong and Niethammer, Marc},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={6889--6899},
  year={2021}
}
Unlike previous probability calibration methods, LTS is a spatially localized probability calibration approach for semantic segmentation.
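Concretely, the three approaches differ only in the granularity of the temperature used to rescale the logits (a paraphrase of the paper's formulation; the notation below is illustrative):

\hat{p}_{\mathrm{TS}}(x)   = \mathrm{softmax}\big(l(x)/T\big)       % TS: one scalar T shared by all images
\hat{p}_{\mathrm{IBTS}}(x) = \mathrm{softmax}\big(l(x)/T_I\big)     % IBTS: one temperature per image I
\hat{p}_{\mathrm{LTS}}(x)  = \mathrm{softmax}\big(l(x)/T(x)\big)    % LTS: a full map, one temperature per location x

Here l(x) denotes the logit vector at spatial location x of an image.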
In the following figure, Left: predicted probabilities (confidences) from a U-Net. Middle: per-bin average accuracy for a 10-bin, equal-width reliability diagram, showing that the probability ranges that need to be optimized differ across locations. Right: the temperature value map obtained via optimization, revealing that the optimal localized temperature scaling value differs from location to location.
In the following two figures, the top row shows the global reliability diagrams of the different methods for the entire image. The three rows underneath show local reliability diagrams of the different methods for different local patches. Note that temperature scaling (TS) and image-based temperature scaling (IBTS) calibrate probabilities well over the entire image; visually, they are only slightly worse than LTS. For local patches, however, LTS still calibrates probabilities successfully while TS and IBTS cannot. In general, LTS improves local probability calibration.
Using the KKT conditions, we can prove that when the to-be-calibrated segmentation network is overconfident, minimizing the NLL with respect to TS, IBTS, and LTS yields solutions that also maximize the entropy of the calibrated probabilities under the overconfidence condition. A companion theorem in the Appendix validates the effectiveness of TS, IBTS, and LTS under the condition of underconfidence.
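Stated loosely in symbols (a paraphrase of the claim above, not the paper's exact theorem statement; see the paper for the precise conditions): under overconfidence,

T^{*} \in \arg\min_{T}\, \mathrm{NLL}\big(\mathrm{softmax}(l/T),\, y\big)
\;\Longrightarrow\;
T^{*} \in \arg\max_{T}\, H\big(\mathrm{softmax}(l/T)\big),

where H denotes the Shannon entropy and T is a scalar for TS, per-image for IBTS, and per-pixel for LTS.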
The overall architecture for probability calibration via (local) temperature scaling is shown in the following figure. The output logit map of a pre-trained semantic segmentation network (Seg) is locally scaled to produce the calibrated probabilities. OP denotes optimization, or prediction via a deep convolutional network, to obtain the (local) temperature values.
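The scaling step itself is simple. Below is a minimal PyTorch sketch, not the repository's exact code; local_temperature_scale and its argument names are illustrative:

import torch.nn.functional as F

def local_temperature_scale(logits, temperature_map, eps=1e-8):
    """Calibrate a logit map with a per-pixel temperature map.

    logits:          (B, C, H, W) output of a pre-trained segmentation network
    temperature_map: (B, 1, H, W) positive local temperatures (LTS);
                     a (B, 1, 1, 1) tensor recovers IBTS, a scalar recovers TS.
    """
    # Divide each class logit at every location by its local temperature,
    # then take the softmax over the class dimension.
    return F.softmax(logits / (temperature_map + eps), dim=1)

Since the temperature is positive, T(x) > 1 softens (reduces confidence), T(x) < 1 sharpens, and T(x) = 1 leaves the probabilities unchanged, so calibration never changes the predicted label.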
Specifically, in this paper we use a simple tree-like convolutional network (see the figure below), as in (Lee et al.); however, other neural network architectures could also work, as illustrated by (Bai et al.). The following figures give a high-level illustration of the tree-like CNN: the left subfigure is for LTS and the right subfigure is for IBTS. Detailed descriptions can be found in the Appendix.
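As a loose structural illustration only (layer counts, widths, and the Softplus output are assumptions of this sketch, not the paper's specification; consult the Appendix for the actual architecture):

import torch
import torch.nn as nn

class ToyTemperatureNet(nn.Module):
    # Illustrative branching ("tree-like") CNN that maps an image to a
    # positive per-pixel temperature map for LTS. NOT the paper's network.
    def __init__(self, in_channels=3, width=16):
        super().__init__()
        self.branch_a = nn.Sequential(
            nn.Conv2d(in_channels, width, 3, padding=1), nn.ReLU(),
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU())
        self.branch_b = nn.Sequential(
            nn.Conv2d(in_channels, width, 5, padding=2), nn.ReLU())
        self.merge = nn.Conv2d(2 * width, 1, 1)  # fuse branches into one channel
        self.softplus = nn.Softplus()            # keep temperatures strictly positive

    def forward(self, image):
        features = torch.cat([self.branch_a(image), self.branch_b(image)], dim=1)
        return self.softplus(self.merge(features))  # (B, 1, H, W) temperature map

For IBTS, the same idea applies with a global average pooling before the final layer, so that the network outputs a single temperature per image.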
As an example, we use the Tiramisu model for semantic segmentation on the CamVid dataset. Note that other deep segmentation networks and datasets can also be used.
Tiramisu is a fully convolutional DenseNet. The implementation and training details can be found in this GitHub repository. You will need to modify the code accordingly to adapt it to your settings.
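If you train your own segmentation model, one way to prepare the calibration inputs is to run the trained model once over the data and save the uncalibrated logits to disk. This is only a sketch; the file layout and names are assumptions, not what the repository's dataloader expects:

import os
import torch

@torch.no_grad()
def dump_logits(seg_model, loader, out_dir):
    # Save pre-softmax logits and labels, one file per batch, for the
    # calibration stage. The naming scheme here is purely illustrative.
    os.makedirs(out_dir, exist_ok=True)
    seg_model.eval()
    for i, (image, label) in enumerate(loader):
        logits = seg_model(image)  # (B, C, H, W), pre-softmax
        torch.save({"logits": logits.cpu(), "label": label.cpu()},
                   os.path.join(out_dir, f"batch_{i:05d}.pt"))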
After obtaining the logits from the segmentation model and properly setting up the dataloader, the next step is to train the calibration model. To train LTS, simply run
python Tiramisu_calibration.py --gpu 0 --model-name LTS --epochs 200 --batch-size 4 --lr 1e-4 --seed 2021 --save-per-epoch 1
The table below collects probability calibration models that can be used as baselines. You can pull these repositories and modify the code accordingly.
To evaluate the four calibration metrics (ECE, MCE, SCE, and ACE) defined in the paper, simply run
python probability_measure_CamVid.py --gpu 0 --model_name LTS
python probability_measure_Local_CamVid.py --gpu 0 --model_name LTS
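For reference, ECE with equal-width confidence bins can be computed as in the sketch below (a standard formulation of the metric; the repository's scripts may bin or mask pixels differently):

import torch

def expected_calibration_error(probs, labels, n_bins=10):
    # probs:  (N, C) calibrated probabilities over flattened pixels
    # labels: (N,)   ground-truth class indices
    confidence, prediction = probs.max(dim=1)
    correct = prediction.eq(labels).float()
    bin_edges = torch.linspace(0, 1, n_bins + 1)
    ece = torch.zeros(())
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = (confidence > lo) & (confidence <= hi)
        if in_bin.any():
            # |average accuracy - average confidence|, weighted by bin mass
            ece += in_bin.float().mean() * (correct[in_bin].mean()
                                            - confidence[in_bin].mean()).abs()
    return ece.item()

In the standard definitions, MCE takes the maximum per-bin gap instead of the weighted sum, SCE averages the same quantity per class, and ACE uses adaptive (equal-mass) bins.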
For the multi-atlas segmentation experiment validating probability calibration, please refer to VoteNet-Family for details.