Open Cities AI Challenge: Segmenting Buildings for Disaster Resilience. The pipeline follows severstal.
2nd place out of 1106 with a Jaccard index of 0.8575 (1st place: 0.8598).
- GPU(s) with 32 GB of memory (e.g. Tesla V100)
- NVIDIA apex
pip install -r requirements.txt
First, download the train and test data from the competition link into the data folder.
Then prepare the train and test datasets. For the train set, use:
python ./src/data/prepare_train.py
Installing all the dependencies (GDAL, etc.) is hard, so it is easier to use the code from the "Getting started" topic by @johnowhitaker on the competition forum. Even better, download the extracted tiles from the Kaggle datasets:
kaggle d download kbrodt/oc-t1-1024-z19-p1
kaggle d download kbrodt/oc-t1-1024-z19-p2
kaggle d download kbrodt/oc-t1-1024-z19-p3
For the test set, simply use:
python ./src/data/prepare_test.py
To train the model run
sh ./train.sh
On 6 Tesla V100 GPUs, training takes around 50 hours. This generates the trained models and a submission file.
If you only want to predict on the test set, first download the model weights from Yandex Disk, unzip them, and run:
sh ./submit.sh
On 6 Tesla V100 GPUs, it takes around 1 hour.
The approach turned out to be quite straightforward. You can get good results by training Unet-like models with heavy encoders using only tier 1 data and a few tricks. The main idea is to take rare tiles into account. One way is to assign a class to each tile and oversample tiles using the inverse probability of that class in the dataset. This significantly speeds up learning. Another trick is to replace binary cross-entropy loss with multiclass cross-entropy (in our case, 2 output channels) and take the argmax instead of searching for an optimal threshold. It is also preferable to train the model under the same conditions as at inference: at inference we have 1024x1024 tiles, so this resolution should be preserved as much as possible during training. To overfit to the test set, you can use pseudo-labeling. After obtaining a strong single model, train 5 more models and ensemble them by simple averaging.
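The oversampling idea can be sketched in a few lines. This is only an illustration, not the code from this repository: the tile classes, the helper name inverse_frequency_weights and the toy data are assumptions made for the example.

```python
import random
from collections import Counter


def inverse_frequency_weights(tile_classes):
    """Return one sampling weight per tile: 1 / (relative frequency of its class)."""
    counts = Counter(tile_classes)
    total = len(tile_classes)
    return [total / counts[c] for c in tile_classes]


# Hypothetical class per tile (for instance an area/scene identifier).
tile_classes = ["common_scene"] * 8 + ["rare_scene"] * 2
weights = inverse_frequency_weights(tile_classes)

# Draw tile indices with replacement; tiles of rare classes are oversampled
# so that every class is seen about equally often.
rng = random.Random(0)
epoch = rng.choices(range(len(tile_classes)), weights=weights, k=10_000)
sampled = Counter(tile_classes[i] for i in epoch)
print(sampled)
```

With these weights each class contributes the same total sampling mass, so the two classes appear roughly equally often even though one has four times as many tiles. In a PyTorch pipeline the same per-tile weights could be passed to torch.utils.data.WeightedRandomSampler.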
- Train on tier 1 with 1024x1024 tiles at zoom level 19
- 5 folds stratified by area_scene
- Balanced sampler by area_scene
- Unet-like with heavy encoders: senet154, se_resnext50_32x4d, densenet161, inceptionv4, efficientnet-b4
- Cross-entropy loss (to avoid searching for a binarization threshold)
- 2 rounds of pseudo-labeling
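To illustrate the 2-channel cross-entropy and the ensembling by simple averaging, here is a minimal pure-Python sketch for a handful of pixels. The function names and the logits are hypothetical; the actual models work on full tensors.

```python
import math


def softmax2(logit_bg, logit_fg):
    """Softmax over two channels (background, building) for one pixel."""
    m = max(logit_bg, logit_fg)  # subtract the max for numerical stability
    e_bg, e_fg = math.exp(logit_bg - m), math.exp(logit_fg - m)
    s = e_bg + e_fg
    return e_bg / s, e_fg / s


def cross_entropy(logits, target):
    """Multiclass cross-entropy for one pixel; target is 0 or 1."""
    return -math.log(softmax2(*logits)[target])


def ensemble_argmax(per_model_logits):
    """Average the per-model probabilities, then take the argmax per pixel."""
    n_models = len(per_model_logits)
    n_pixels = len(per_model_logits[0])
    labels = []
    for p in range(n_pixels):
        probs = [softmax2(*model[p]) for model in per_model_logits]
        mean_bg = sum(pr[0] for pr in probs) / n_models
        mean_fg = sum(pr[1] for pr in probs) / n_models
        labels.append(1 if mean_fg > mean_bg else 0)
    return labels


# Hypothetical logits: 2 models, 3 pixels, (background, building) per pixel.
model_a = [(2.0, -1.0), (0.1, 0.3), (-1.5, 2.5)]
model_b = [(1.0, 0.0), (0.6, -0.2), (-0.5, 1.0)]
mask = ensemble_argmax([model_a, model_b])
print(mask)
```

Because the prediction is the argmax over the two channels, there is no binarization threshold to tune. For the middle pixel the two models disagree, and the averaged probabilities decide the label.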
- Although the train tier 2 dataset has "dirty" labels, one can pretrain on it and fine-tune on tier 1, but this didn't work for me. Another way to use tier 2 is to fix the "dirty" labels. I tried training on tier 2 predictions from models trained on tier 1 with knowledge distillation (KD), but the results were the same as training on tier 1 only. I only managed to train a single efficientnet-b3 model, with a score of 85.02.
- Different zoom levels (18 and 20) greatly increase the data size and hence the training time, so I gave up on them.
- Instead of 1-channel footprints I used 3-channel footprint/boundary/contact masks, but this did not give better results.
- MixUp training
- Do the first rounds of pseudo-labeling with heavier encoders (instead of using effnets)
- Use different zoom levels, if time allows
- Fix/filter/clean/remove "dirty" labels of tier 2 using models trained on tier 1, by calculating the Jaccard index between each "dirty" label and the prediction. Add labels with a high score: a high score means the label is close to the prediction, and the predictions come from models trained on the clean tier 1 dataset. Remove labels with a low score. See some examples and ideas here.
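The label-filtering idea could look roughly like the sketch below. It was not part of the solution; the function names, the toy masks and the 0.8 threshold are assumptions chosen only for illustration.

```python
def jaccard(mask_a, mask_b):
    """Jaccard index (IoU) of two binary masks given as nested lists of 0/1."""
    inter = union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            inter += 1 if (a and b) else 0
            union += 1 if (a or b) else 0
    # Two empty masks agree perfectly.
    return inter / union if union else 1.0


def filter_labels(labels, predictions, keep_threshold=0.8):
    """Return ids of tiles whose "dirty" label agrees with the prediction.

    keep_threshold is a hypothetical value and would need tuning.
    """
    return [
        tile_id
        for tile_id, label in labels.items()
        if jaccard(label, predictions[tile_id]) >= keep_threshold
    ]


# Toy 2x2 masks: tile_1 matches its prediction, tile_2 does not overlap at all.
labels = {"tile_1": [[1, 1], [0, 0]], "tile_2": [[1, 0], [0, 0]]}
predictions = {"tile_1": [[1, 1], [0, 0]], "tile_2": [[0, 0], [1, 1]]}
kept = filter_labels(labels, predictions)
print(kept)
```

Only tiles whose labels score at or above the threshold are kept for training; the rest are dropped as too noisy.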