This is the code repository for BadLabel, a challenging type of label noise, and for Robust DivideMix, a robust label-noise learning (LNL) algorithm.

Paper: https://arxiv.org/abs/2305.18377 (accepted at IEEE TPAMI 2024, DOI: 10.1109/TPAMI.2024.3355425)
Requirements:

- Python 3.8
- PyTorch 1.8.0
- CUDA
- NumPy
Run the following commands to synthesize BadLabel:

```shell
cd gen_badlabels
./gen_badlabels.sh
```
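As a rough conceptual sketch only (this is our illustration, not the repository's code), BadLabel-style noise flips a chosen fraction of labels toward classes that a reference model scores lowest, which makes the corrupted labels maximally misleading. The function name and interface below are hypothetical:

```python
import numpy as np

def flip_to_least_likely(logits, labels, noise_ratio, seed=0):
    """Conceptual sketch of adversarial label flipping: relabel a fraction
    of samples with the class a reference model finds least likely.

    logits: (n, num_classes) scores from some reference model (assumed given).
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels).copy()
    n = len(labels)
    flip_idx = rng.choice(n, size=int(round(noise_ratio * n)), replace=False)
    # Sort each selected sample's classes by score, ascending: the
    # least-likely class comes first.
    order = np.argsort(logits[flip_idx], axis=1)
    new_labels = order[:, 0]
    # If the least-likely class happens to equal the original label,
    # fall back to the second-least-likely class.
    collide = new_labels == labels[flip_idx]
    new_labels[collide] = order[collide, 1]
    labels[flip_idx] = new_labels
    return labels
```

The actual generation procedure (model training, selection strategy, hyperparameters) is in `gen_badlabels.sh` and the scripts it calls.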
Run the following commands to automatically evaluate BadLabel on multiple LNL algorithms. The shell script also contains the execution command and the hyperparameter settings we used for each LNL algorithm, so you can use it to evaluate any single algorithm separately:

```shell
cd eval_badlabels
./eval_badlabels.sh
```
We share the various label noises we generated under the `eval_badlabels/noise` directory for quick experimental verification.
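Those files cover the standard noise types (symmetric, asymmetric, instance-dependent) alongside BadLabel. As a point of reference, conventional symmetric noise can be synthesized in a few lines; this is a generic sketch, not the repository's script:

```python
import numpy as np

def inject_symmetric_noise(labels, noise_ratio, num_classes, seed=0):
    """Generic symmetric label noise: a random fraction of samples is
    relabelled uniformly over the *other* classes."""
    rng = np.random.default_rng(seed)
    noisy = np.asarray(labels).copy()
    n = len(noisy)
    flip_idx = rng.choice(n, size=int(round(noise_ratio * n)), replace=False)
    for i in flip_idx:
        # Draw a replacement label that differs from the original one.
        wrong = [c for c in range(num_classes) if c != noisy[i]]
        noisy[i] = rng.choice(wrong)
    return noisy
```

Because the replacement label always differs from the original, the fraction of corrupted labels exactly equals `noise_ratio`.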
If you want to quickly evaluate BadLabel on your own algorithm, we also provide MNIST, CIFAR-10, and CIFAR-100 training sets with injected BadLabel noise on Google Drive. You can easily load these datasets using `load_badlabels_dataset.py` under the `eval_badlabels` directory.
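For the exact interface, see `load_badlabels_dataset.py` itself; as a generic illustration (the class name below is ours), a map-style dataset can be paired with a precomputed noisy-label array like this:

```python
import numpy as np

class NoisyLabelDataset:
    """Wrap any map-style dataset, overriding its labels with a
    precomputed noisy-label array (e.g. injected BadLabel)."""

    def __init__(self, base_dataset, noisy_labels):
        noisy_labels = np.asarray(noisy_labels)
        if len(base_dataset) != len(noisy_labels):
            raise ValueError("dataset / label-array length mismatch")
        self.base = base_dataset
        self.noisy_labels = noisy_labels

    def __len__(self):
        return len(self.base)

    def __getitem__(self, idx):
        x, _ = self.base[idx]  # discard the clean label
        return x, int(self.noisy_labels[idx])
```

In practice, `base_dataset` would be something like `torchvision.datasets.CIFAR10`, and `noisy_labels` the array loaded from the Google Drive files.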
Here we share our evaluation results on CIFAR-10, CIFAR-100, and MNIST.
As baselines, we evaluated Standard Training (no defense) [paper] and 11 state-of-the-art LNL methods: Co-teaching [paper, code], T-Revision [paper, code], RoG [paper, code], DivideMix [paper, code], AdaCorr [paper, code], Peer Loss [paper, code], ELR [paper, code], Negative LS [paper, code], PGDF [paper, code], ProMix [paper, code], and SOP [paper, code].
Results on CIFAR-10 (test accuracy, %):

Method | Metric | Sym. 20% | Sym. 40% | Sym. 60% | Sym. 80% | Asym. 20% | Asym. 40% | IDN 20% | IDN 40% | IDN 60% | IDN 80% | BadLabel 20% | BadLabel 40% | BadLabel 60% | BadLabel 80%
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
Standard Training | Best | 85.21 | 79.90 | 69.79 | 43.00 | 88.02 | 85.22 | 85.42 | 78.93 | 68.97 | 55.34 | 76.76±1.08 | 58.79±1.49 | 39.64±1.13 | 17.80±0.91
 | Last | 82.55 | 64.79 | 41.43 | 17.20 | 87.28 | 77.04 | 85.23 | 74.06 | 52.22 | 28.04 | 75.31±0.24 | 55.72±0.17 | 35.66±0.23 | 13.44±0.26
Co-teaching | Best | 89.19 | 84.80 | 58.25 | 21.76 | 90.65 | 63.11 | 85.72 | 73.42 | 45.84 | 33.43 | 80.41±0.78 | 56.81±3.86 | 14.42±1.22 | 10.51±0.71
 | Last | 89.03 | 84.65 | 57.95 | 21.06 | 90.52 | 56.33 | 85.48 | 72.97 | 45.53 | 25.27 | 79.48±0.75 | 55.54±3.74 | 12.99±1.09 | 4.24±2.44
T-Revision | Best | 89.79 | 86.83 | 78.14 | 64.54 | 91.23 | 89.60 | 85.74 | 78.45 | 69.31 | 56.26 | 76.99±1.38 | 57.21±1.64 | 36.01±1.10 | 14.93±0.50
 | Last | 89.59 | 86.57 | 76.85 | 60.54 | 91.09 | 89.40 | 85.43 | 69.18 | 58.15 | 33.15 | 75.71±1.68 | 55.02±1.34 | 33.99±0.29 | 13.16±0.68
RoG | Best | - | - | - | - | - | - | - | - | - | - | - | - | - | -
 | Last | 87.48 | 74.81 | 52.42 | 16.02 | 89.61 | 81.63 | 85.34 | 76.68 | 63.79 | 37.11 | 85.88±0.32 | 64.20±0.91 | 35.89±1.34 | 8.64±0.76
DivideMix | Best | 96.21 | 95.08 | 94.80 | 81.95 | 94.82 | 94.20 | 91.97 | 85.84 | 81.59 | 59.06 | 84.81±0.78 | 58.44±1.45 | 28.38±0.56 | 6.87±0.59
 | Last | 96.04 | 94.74 | 94.56 | 81.58 | 94.46 | 93.50 | 90.77 | 82.94 | 81.19 | 47.81 | 82.13±0.78 | 57.65±1.96 | 16.21±1.24 | 6.12±0.45
AdaCorr | Best | 90.66 | 87.17 | 80.97 | 35.97 | 92.35 | 88.60 | 85.88 | 79.54 | 69.36 | 55.86 | 76.97±0.83 | 57.17±0.71 | 37.14±0.38 | 14.72±0.86
 | Last | 90.46 | 86.78 | 80.66 | 35.67 | 92.17 | 88.34 | 85.70 | 79.05 | 59.13 | 30.48 | 74.71±0.26 | 54.92±0.22 | 34.71±0.22 | 11.94±0.12
Peer Loss | Best | 90.87 | 87.13 | 79.03 | 61.91 | 91.47 | 87.50 | 86.46 | 81.07 | 69.87 | 55.51 | 75.28±1.43 | 55.75±1.39 | 36.17±0.23 | 15.87±0.30
 | Last | 90.65 | 86.85 | 78.83 | 61.43 | 91.11 | 81.24 | 85.72 | 74.43 | 54.57 | 33.76 | 74.00±1.43 | 53.73±1.25 | 34.37±0.68 | 14.71±0.22
ELR | Best | 92.85 | 91.30 | 87.99 | 54.67 | 92.42 | 89.40 | 87.62 | 82.08 | 73.23 | 57.26 | 85.73±0.15 | 62.58±1.33 | 35.24±1.12 | 11.71±0.70
 | Last | 89.37 | 87.78 | 85.69 | 46.71 | 92.31 | 89.11 | 85.31 | 78.05 | 68.12 | 48.99 | 81.88±0.25 | 56.45±0.31 | 30.45±0.30 | 8.67±0.79
Negative LS | Best | 87.42 | 84.40 | 75.22 | 43.62 | 88.34 | 85.03 | 89.82 | 83.66 | 75.76 | 64.21 | 78.77±0.66 | 57.68±0.89 | 36.57±0.88 | 16.46±0.82
 | Last | 87.30 | 84.21 | 75.07 | 43.50 | 65.23 | 47.22 | 81.87 | 82.10 | 70.95 | 45.62 | 73.99±0.90 | 52.45±1.03 | 26.66±0.81 | 3.21±0.44
PGDF | Best | 96.63 | 96.12 | 95.05 | 80.69 | 96.05 | 89.87 | 91.81 | 85.75 | 76.84 | 59.60 | 82.72±0.47 | 61.50±1.87 | 34.46±1.44 | 6.37±0.34
 | Last | 96.40 | 95.95 | 94.75 | 79.76 | 95.74 | 88.45 | 91.30 | 84.31 | 69.54 | 34.81 | 79.95±0.36 | 56.26±1.03 | 30.14±0.85 | 4.56±0.45
ProMix | Best | 97.40 | 96.98 | 90.80 | 61.15 | 97.04 | 96.09 | 94.72 | 91.32 | 76.22 | 54.01 | 94.95±1.43 | 48.36±1.72 | 24.87±1.47 | 9.51±1.51
 | Last | 97.30 | 96.91 | 90.72 | 52.25 | 96.94 | 96.03 | 94.63 | 91.01 | 75.12 | 45.80 | 94.59±1.64 | 44.08±0.49 | 21.33±0.46 | 7.93±1.34
SOP | Best | 96.17 | 95.64 | 94.83 | 89.94 | 95.96 | 93.60 | 90.32 | 83.26 | 71.54 | 57.14 | 84.96±0.35 | 66.25±1.35 | 42.59±1.25 | 12.70±0.89
 | Last | 96.12 | 95.46 | 94.71 | 89.78 | 95.86 | 93.30 | 90.13 | 82.91 | 63.14 | 29.86 | 82.64±0.27 | 61.89±0.25 | 36.51±0.26 | 8.63±0.17
Results on CIFAR-100 (test accuracy, %):

Method | Metric | Sym. 20% | Sym. 40% | Sym. 60% | Sym. 80% | IDN 20% | IDN 40% | IDN 60% | IDN 80% | BadLabel 20% | BadLabel 40% | BadLabel 60% | BadLabel 80%
---|---|---|---|---|---|---|---|---|---|---|---|---|---
Standard Training | Best | 61.41 | 51.21 | 38.82 | 19.89 | 70.06 | 62.48 | 53.21 | 45.77 | 56.75±0.98 | 35.42±0.77 | 17.70±1.02 | 6.03±0.24
 | Last | 61.17 | 46.27 | 27.01 | 9.27 | 69.94 | 62.32 | 52.55 | 40.45 | 56.30±0.13 | 34.90±0.17 | 17.05±0.28 | 4.18±0.16
Co-teaching | Best | 62.80 | 55.02 | 34.66 | 7.72 | 66.16 | 57.55 | 45.38 | 23.83 | 54.30±0.78 | 26.02±2.13 | 3.97±0.11 | 0.99±0.21
 | Last | 62.35 | 54.84 | 33.44 | 6.78 | 66.02 | 57.33 | 45.24 | 23.72 | 53.97±0.71 | 25.74±1.21 | 3.67±0.14 | 0.00±0.00
T-Revision | Best | 65.19 | 60.43 | 43.01 | 4.03 | 68.77 | 62.86 | 54.23 | 45.67 | 57.86±1.02 | 40.60±1.33 | 13.06±1.20 | 1.92±0.56
 | Last | 64.95 | 60.26 | 42.77 | 3.12 | 68.53 | 62.39 | 53.07 | 41.85 | 57.26±1.54 | 38.40±0.96 | 12.65±0.58 | 1.43±0.95
RoG | Best | - | - | - | - | - | - | - | - | - | - | - | -
 | Last | 66.68 | 60.79 | 53.08 | 22.73 | 66.39 | 60.80 | 56.00 | 48.62 | 70.55±0.55 | 58.61±0.65 | 25.74±0.28 | 4.13±0.41
DivideMix | Best | 77.36 | 75.02 | 72.25 | 57.56 | 72.79 | 67.82 | 61.08 | 51.50 | 65.55±0.65 | 42.72±0.44 | 19.17±1.28 | 4.67±0.87
 | Last | 76.87 | 74.66 | 71.91 | 57.08 | 72.50 | 67.37 | 60.55 | 47.86 | 64.96±0.47 | 40.92±0.36 | 13.04±0.85 | 1.10±0.21
AdaCorr | Best | 66.31 | 59.78 | 47.22 | 24.15 | 68.89 | 62.63 | 54.91 | 45.22 | 56.22±0.82 | 35.38±1.27 | 16.87±1.36 | 4.81±0.22
 | Last | 66.03 | 59.48 | 47.04 | 23.90 | 68.72 | 62.45 | 54.68 | 41.95 | 55.69±0.44 | 33.88±0.88 | 14.88±0.52 | 3.76±1.24
Peer Loss | Best | 61.97 | 51.09 | 39.98 | 18.82 | 69.63 | 63.32 | 55.01 | 46.20 | 55.58±1.79 | 37.11±2.01 | 19.53±1.29 | 6.42±0.52
 | Last | 60.64 | 43.64 | 26.23 | 7.65 | 69.38 | 62.70 | 53.90 | 42.14 | 55.00±1.41 | 35.85±1.48 | 18.65±0.22 | 5.74±0.76
ELR | Best | 72.55 | 68.75 | 60.01 | 26.89 | 70.27 | 66.04 | 60.59 | 52.81 | 68.21±0.62 | 43.75±0.21 | 14.39±0.35 | 1.09±0.18
 | Last | 72.13 | 68.60 | 59.78 | 23.95 | 70.13 | 65.87 | 60.41 | 52.57 | 67.97±0.17 | 43.40±0.22 | 13.97±0.38 | 0.98±0.11
Negative LS | Best | 63.65 | 57.17 | 44.18 | 21.31 | 69.20 | 62.67 | 54.49 | 46.96 | 57.76±0.56 | 36.80±0.21 | 17.96±0.31 | 5.88±0.11
 | Last | 63.54 | 56.98 | 43.98 | 21.19 | 63.38 | 55.72 | 42.87 | 24.69 | 56.42±0.71 | 33.38±0.22 | 11.42±0.38 | 1.28±0.14
PGDF | Best | 81.90 | 78.50 | 74.05 | 52.48 | 75.87 | 71.72 | 62.76 | 53.16 | 69.44±0.26 | 46.39±0.39 | 19.05±0.37 | 5.08±0.13
 | Last | 81.37 | 78.21 | 73.64 | 52.11 | 74.90 | 71.32 | 62.06 | 51.68 | 68.18±0.16 | 45.38±0.15 | 16.84±0.24 | 0.72±0.25
ProMix | Best | 79.99 | 80.21 | 71.44 | 44.97 | 76.61 | 71.92 | 66.04 | 51.96 | 69.80±1.58 | 37.73±1.09 | 15.92±1.88 | 4.62±0.95
 | Last | 79.77 | 79.95 | 71.25 | 44.64 | 76.44 | 71.66 | 65.94 | 51.77 | 69.68±0.99 | 37.24±0.84 | 14.88±1.02 | 3.42±0.22
SOP | Best | 77.35 | 75.20 | 72.39 | 63.13 | 72.52 | 63.84 | 56.79 | 50.20 | 65.80±0.68 | 45.61±0.34 | 22.68±0.27 | 2.88±0.11
 | Last | 77.11 | 74.89 | 72.10 | 62.87 | 72.11 | 63.15 | 53.35 | 40.77 | 65.51±0.12 | 45.24±0.26 | 21.55±0.18 | 2.48±0.16
Results on MNIST (test accuracy, %):

Method | Metric | Sym. 20% | Sym. 40% | Sym. 60% | Sym. 80% | IDN 20% | IDN 40% | IDN 60% | IDN 80% | BadLabel 20% | BadLabel 40% | BadLabel 60% | BadLabel 80%
---|---|---|---|---|---|---|---|---|---|---|---|---|---
Standard Training | Best | 98.68 | 97.47 | 97.05 | 77.65 | 93.27 | 77.08 | 53.78 | 34.49 | 87.75 | 74.37 | 45.66 | 23.87
 | Last | 94.29 | 80.32 | 51.78 | 22.29 | 87.72 | 70.86 | 47.70 | 23.55 | 82.53 | 61.31 | 39.01 | 15.93
Co-teaching | Best | 99.19 | 98.96 | 98.73 | 77.30 | 93.91 | 83.84 | 63.26 | 30.07 | 90.04 | 67.44 | 42.88 | 11.59
 | Last | 97.28 | 94.88 | 92.09 | 70.10 | 91.92 | 74.40 | 57.73 | 28.05 | 87.37 | 60.01 | 11.33 | 10.13
T-Revision | Best | 99.24 | 99.06 | 98.56 | 96.24 | 90.90 | 78.82 | 58.58 | 11.49 | 85.34 | 69.27 | 45.48 | 21.83
 | Last | 99.15 | 99.02 | 98.44 | 96.14 | 87.74 | 69.92 | 46.17 | 11.35 | 81.99 | 60.24 | 38.26 | 16.48
RoG | Best | - | - | - | - | - | - | - | - | - | - | - | -
 | Last | 95.87 | 83.08 | 56.65 | 21.80 | 88.92 | 71.80 | 53.72 | 25.80 | 85.62 | 65.98 | 40.58 | 18.12
DivideMix | Best | 99.53 | 99.40 | 98.52 | 88.05 | 95.74 | 82.61 | 54.11 | 28.05 | 85.63 | 64.76 | 44.77 | 21.18
 | Last | 98.79 | 96.23 | 91.90 | 61.79 | 88.90 | 68.17 | 43.70 | 21.17 | 83.34 | 62.04 | 42.39 | 19.70
AdaCorr | Best | 99.01 | 99.01 | 98.34 | 93.70 | 92.22 | 79.46 | 53.14 | 28.04 | 84.68 | 64.86 | 42.76 | 20.92
 | Last | 93.27 | 77.24 | 49.89 | 23.37 | 87.33 | 67.71 | 44.98 | 22.53 | 80.53 | 59.87 | 38.34 | 17.78
Peer Loss | Best | 99.10 | 98.95 | 98.19 | 93.81 | 92.34 | 85.43 | 58.22 | 47.34 | 88.11 | 67.34 | 45.87 | 24.05
 | Last | 92.85 | 76.92 | 50.98 | 21.82 | 87.21 | 65.20 | 44.62 | 21.84 | 80.49 | 59.62 | 38.85 | 18.87
Negative LS | Best | 99.14 | 98.79 | 97.90 | 85.98 | 93.90 | 82.84 | 55.74 | 31.78 | 88.04 | 69.95 | 47.80 | 22.60
 | Last | 99.00 | 98.73 | 97.86 | 85.92 | 83.56 | 77.70 | 49.73 | 23.75 | 10.87 | 25.80 | 27.03 | 10.32
ProMix | Best | 99.75 | 99.77 | 98.07 | 85.50 | 99.14 | 96.12 | 69.88 | 41.21 | 99.66 | 69.35 | 42.80 | 28.95
 | Last | 99.67 | 99.74 | 97.76 | 65.21 | 97.37 | 92.74 | 61.09 | 30.35 | 99.56 | 66.33 | 35.80 | 19.09
SOP | Best | 99.21 | 98.56 | 97.76 | 86.30 | 92.68 | 77.37 | 58.00 | 29.21 | 91.00 | 67.60 | 48.81 | 28.57
 | Last | 98.65 | 94.05 | 65.03 | 24.48 | 91.39 | 75.97 | 53.29 | 26.88 | 84.66 | 61.78 | 37.07 | 13.95
Here we present the learning curves of multiple LNL algorithms on the CIFAR-10 and CIFAR-100 datasets with different types and ratios of label noise.
FIGURE 1: Learning curves of multiple LNL algorithms on CIFAR-10.
FIGURE 2: Learning curves of multiple LNL algorithms on CIFAR-100.
Run the following commands to evaluate Robust DivideMix on different datasets:

```shell
cd robust_LNL_algo
./eval_robust_dividemix.sh
```
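Robust DivideMix builds on DivideMix, whose central step fits a two-component Gaussian mixture to the per-sample training losses and treats the low-mean (low-loss) component as probably-clean; the resulting split drives the semi-supervised training stage. Below is a minimal sketch of that split, assuming scikit-learn is available (the repository's implementation differs in details):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def split_clean_noisy(per_sample_loss, threshold=0.5):
    """DivideMix-style split: fit a two-component GMM to per-sample losses;
    samples likely under the low-mean component are treated as clean."""
    losses = np.asarray(per_sample_loss, dtype=float).reshape(-1, 1)
    # Normalise losses to [0, 1] so the mixture fit is scale-free.
    losses = (losses - losses.min()) / (losses.max() - losses.min() + 1e-8)
    gmm = GaussianMixture(n_components=2, random_state=0).fit(losses)
    # The component with the smaller mean corresponds to low-loss samples.
    clean_component = int(np.argmin(gmm.means_.ravel()))
    prob_clean = gmm.predict_proba(losses)[:, clean_component]
    return prob_clean > threshold
```

Samples flagged clean keep their labels; the rest are treated as unlabeled data for the MixMatch-style training step.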
Here we share our evaluation results for Robust DivideMix and two baseline methods on the CIFAR-10 and CIFAR-100 datasets under multiple types of noise.
CIFAR-10 (test accuracy, %):

Noise type | Metric | Standard Training 20% | Standard Training 40% | Standard Training 60% | Standard Training 80% | DivideMix 20% | DivideMix 40% | DivideMix 60% | DivideMix 80% | Robust DivideMix 20% | Robust DivideMix 40% | Robust DivideMix 60% | Robust DivideMix 80%
---|---|---|---|---|---|---|---|---|---|---|---|---|---
Sym. | Best | 85.21 | 79.90 | 69.79 | 43.00 | 96.21 | 95.08 | 94.80 | 81.95 | 95.45±0.36 | 94.84±0.13 | 94.25±0.11 | 61.59±1.24
 | Last | 82.55 | 64.79 | 41.43 | 17.20 | 96.04 | 94.74 | 94.56 | 81.58 | 95.28±0.38 | 94.71±0.16 | 94.11±0.12 | 60.98±1.21
Asym. | Best | 88.02 | 85.22 | - | - | 94.82 | 94.20 | - | - | 91.77±0.46 | 86.88±0.82 | - | -
 | Last | 87.28 | 77.04 | - | - | 94.46 | 93.50 | - | - | 90.62±0.38 | 84.02±1.65 | - | -
IDN | Best | 85.42 | 78.93 | 68.97 | 55.34 | 91.97 | 85.84 | 81.59 | 59.06 | 90.44±1.09 | 89.71±0.74 | 78.12±0.31 | 60.64±0.46
 | Last | 85.23 | 74.06 | 52.22 | 28.04 | 90.77 | 82.94 | 81.19 | 47.81 | 87.30±1.72 | 89.16±0.69 | 72.33±1.08 | 50.38±0.68
BadLabel | Best | 76.76 | 58.79 | 39.64 | 17.80 | 84.81 | 58.44 | 28.38 | 6.87 | 92.07±1.06 | 86.70±3.83 | 76.47±3.89 | 27.41±3.25
 | Last | 75.31 | 55.72 | 35.66 | 13.44 | 82.13 | 57.65 | 16.21 | 6.12 | 91.76±1.27 | 85.96±4.33 | 73.29±3.81 | 25.20±2.72
Average | Best | 83.85 | 75.71 | 59.47 | 38.71 | 91.95 | 83.39 | 68.26 | 49.17 | 92.43±0.74 | 89.53±1.38 | 82.95±1.43 | 49.88±1.65
 | Last | 82.59 | 67.90 | 43.10 | 19.56 | 90.85 | 82.21 | 63.99 | 45.17 | 91.24±0.93 | 88.46±1.71 | 79.91±1.67 | 45.52±1.54
CIFAR-100 (test accuracy, %):

Noise type | Metric | Standard Training 20% | Standard Training 40% | Standard Training 60% | Standard Training 80% | DivideMix 20% | DivideMix 40% | DivideMix 60% | DivideMix 80% | Robust DivideMix 20% | Robust DivideMix 40% | Robust DivideMix 60% | Robust DivideMix 80%
---|---|---|---|---|---|---|---|---|---|---|---|---|---
Sym. | Best | 61.41 | 51.21 | 38.82 | 19.89 | 77.36 | 75.02 | 72.25 | 57.56 | 77.35±0.28 | 74.40±0.20 | 70.74±0.45 | 48.13±0.80
 | Last | 61.17 | 46.27 | 27.01 | 9.27 | 76.87 | 74.66 | 71.91 | 57.08 | 77.06±0.28 | 74.16±0.23 | 69.93±0.59 | 47.84±0.82
IDN | Best | 70.06 | 62.48 | 53.21 | 45.77 | 72.79 | 67.82 | 61.08 | 51.50 | 73.49±0.28 | 69.47±0.18 | 63.64±0.21 | 52.74±0.73
 | Last | 69.94 | 62.32 | 52.55 | 40.45 | 72.50 | 67.37 | 60.55 | 47.86 | 73.10±0.20 | 68.88±0.13 | 61.03±0.31 | 46.84±0.17
BadLabel | Best | 56.75 | 35.42 | 17.70 | 6.03 | 65.55 | 42.72 | 19.17 | 4.67 | 65.29±0.76 | 46.64±0.48 | 41.80±1.19 | 21.48±0.39
 | Last | 56.30 | 34.90 | 17.05 | 4.18 | 64.96 | 40.92 | 13.04 | 1.10 | 64.49±0.96 | 45.26±0.40 | 35.91±0.67 | 16.91±0.41
Average | Best | 62.74 | 49.70 | 36.58 | 23.90 | 71.90 | 61.85 | 50.83 | 37.91 | 72.04±0.44 | 63.50±0.29 | 58.73±0.62 | 40.78±0.64
 | Last | 62.47 | 47.83 | 32.20 | 17.97 | 71.44 | 60.98 | 48.50 | 35.35 | 71.55±0.48 | 62.77±0.25 | 55.62±0.52 | 37.20±0.47
Below, we present the learning curves of multiple LNL algorithms on CIFAR-10 and CIFAR-100 with different BadLabel noise ratios.
FIGURE 3: Learning curves of multiple LNL algorithms on CIFAR-10 with different BadLabel noise ratios.
FIGURE 4: Learning curves of multiple LNL algorithms on CIFAR-100 with different BadLabel noise ratios.