The goal of Calibration Baselines is to provide a starting point for post-hoc calibration. Comparing post-hoc calibration methods is challenging because they require a pre-trained model, and the choice of that model strongly affects the final calibration; all methods therefore need to be compared on the same pre-trained model. By implementing current state-of-the-art methods with simple and concise code, this repo eases the burden of researchers starting in the field.
- os
- tqdm
- wget
- tarfile
- torch
- torchvision
- pytorchcv
- numpy
- scipy
- sklearn
Also check the specific requirements for imgclsmob, which are included in its folder.
Currently the following methods are included:
- Temperature Scaling (TS) (a minimal sketch follows this list)
- Vector Scaling (VS)
- Matrix Scaling (MS)
- Matrix Scaling w/ ODIR (MS-ODIR)
- Dirichlet w/ L2 regularization (Dir-L2)
- Dirichlet w/ ODIR (Dir-ODIR)
- Ensemble Temperature Scaling (ETS)
- Accuracy-preserving Isotonic Regression (IRM)
- Accuracy-preserving Isotonic Regression with Temperature Scaling (IRM-TS)
- Isotonic Regression One vs All (IROvA)
- Isotonic Regression One vs All with Temperature Scaling (IROvA-TS)
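As a flavor of what these calibrators do, below is a minimal sketch of temperature scaling, assuming validation logits and labels are already available as tensors. It is a simplified illustration, not the repo's exact implementation.

```python
import torch
import torch.nn.functional as F

def fit_temperature(logits, labels, max_iter=50):
    """Find a single temperature T > 0 minimizing NLL on a held-out split."""
    log_t = torch.zeros(1, requires_grad=True)  # optimize log(T) so T stays positive
    optimizer = torch.optim.LBFGS([log_t], lr=0.1, max_iter=max_iter)

    def closure():
        optimizer.zero_grad()
        loss = F.cross_entropy(logits / log_t.exp(), labels)
        loss.backward()
        return loss

    optimizer.step(closure)
    return log_t.exp().item()

# usage (hypothetical tensors):
# T = fit_temperature(val_logits, val_labels)
# calibrated_probs = torch.softmax(test_logits / T, dim=1)
```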
Create a folder named 'datasets' and place your datasets there. Alternatively, to download all datasets except ImageNet, simply run
python download_datasets.py
Currently the following datasets are supported:
The following datasets are available to evaluate OOD (out-of-distribution) calibration:
To train and evaluate some of the post-hoc calibration methods on CIFAR10 use the command
python run_cifar10.py
The results will be saved in the folder data. Here are some preliminary results for CIFAR10. For each architecture, the model is calibrated 5 times on 5 different splits and the results are averaged. Below we display the top-1 ECE calibration error (a short sketch of this metric follows the table).
Architecture | Vanilla | TS | VS | MS | MS-ODIR | Dir-L2 | Dir-ODIR | ETS | IRM | IROvA |
---|---|---|---|---|---|---|---|---|---|---|
DenseNet-40 (k=12) | 2.14 | 2.15 | 1.65 | 1.62 | 1.76 | 1.79 | 1.79 | 2.64 | 0.85 | 0.88 |
ResNet20 | 1.18 | 1.03 | 1.01 | 1.09 | 1.12 | 0.98 | 1.11 | 1.18 | 0.75 | 0.79 |
ResNet56 | 2.47 | 2.06 | 1.68 | 1.84 | 1.73 | 2.23 | 1.74 | 2.85 | 0.54 | 0.85 |
WRN-16-10 | 0.82 | 0.69 | 0.82 | 0.90 | 0.82 | 0.78 | 0.83 | 1.03 | 0.62 | 0.53 |
WRN-28-10 | 1.36 | 1.39 | 1.11 | 1.07 | 1.09 | 1.36 | 1.10 | 2.14 | 0.52 | 0.63 |
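For reference, here is a sketch of the top-1 expected calibration error (ECE) reported above, using equal-width confidence binning. The 15-bin setting is an assumption, not necessarily the value used for the table.

```python
import numpy as np

def top1_ece(probs, labels, n_bins=15):
    """probs: (N, C) predicted probabilities, labels: (N,) integer labels."""
    conf = probs.max(axis=1)                             # top-1 confidence
    correct = (probs.argmax(axis=1) == labels).astype(float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - conf[mask].mean())
            ece += mask.mean() * gap                     # weight by bin frequency
    return ece
```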
Methods from the following papers were added:
- On Calibration of Modern Neural Networks
- Beyond temperature scaling: Obtaining well-calibrated multi-class probabilities with Dirichlet calibration
- Mix-n-Match: Ensemble and Compositional Methods for Uncertainty Calibration in Deep Learning
To be added:
Given that the focus is on post-hoc calibration methods, we use pre-trained models, which are obtained from imgclsmob (already cloned into this repo).
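A minimal sketch of loading one such pre-trained CIFAR-10 model through pytorchcv (the pip package of imgclsmob) is shown below; the model name string is just an example, and the repo may instead import from its cloned copy of imgclsmob.

```python
from pytorchcv.model_provider import get_model as ptcv_get_model

net = ptcv_get_model("resnet20_cifar10", pretrained=True)
net.eval()  # the frozen model's logits are what the post-hoc calibrators consume
```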
The code for Matrix Scaling, Dirichlet Calibration, and their variants was adapted from the official code repository for Dirichlet Calibration: Dirichlet Calibration Python implementation.
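For orientation, here is a rough sketch of what matrix scaling with ODIR (off-diagonal and intercept regularisation) fits, assuming validation logits and labels are available as tensors. The regularization weights, learning rate, and epoch count are placeholders, not the repo's defaults.

```python
import torch
import torch.nn.functional as F

def fit_matrix_scaling(logits, labels, lam=1e-2, mu=1e-2, epochs=500, lr=1e-2):
    """Fit z' = W z + b on validation logits, penalizing off-diagonal W and b."""
    n_classes = logits.shape[1]
    W = torch.eye(n_classes, requires_grad=True)
    b = torch.zeros(n_classes, requires_grad=True)
    optimizer = torch.optim.Adam([W, b], lr=lr)
    off_diag = 1.0 - torch.eye(n_classes)   # mask selecting off-diagonal entries

    for _ in range(epochs):
        optimizer.zero_grad()
        nll = F.cross_entropy(logits @ W.T + b, labels)
        odir = lam * ((W * off_diag) ** 2).sum() + mu * (b ** 2).sum()
        (nll + odir).backward()
        optimizer.step()
    return W.detach(), b.detach()
```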