Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Add datamodule & model class for biomasters regression example
- Add notebooks to show the inference for biomasters & chesapeake bay
Showing 14 changed files with 1,760 additions and 6 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,63 @@ | ||
# lightning.pytorch==2.1.2 | ||
seed_everything: 42 | ||
data: | ||
metadata_path: configs/metadata.yaml | ||
batch_size: 10 | ||
num_workers: 8 | ||
train_chip_dir: data/biomasters/train_cube | ||
train_label_dir: data/biomasters/train_agbm | ||
val_chip_dir: data/biomasters/test_cube | ||
val_label_dir: data/biomasters/test_agbm | ||
model: | ||
ckpt_path: checkpoints/clay-v1-base.ckpt | ||
lr: 1e-3 | ||
wd: 0.05 | ||
b1: 0.9 | ||
b2: 0.95 | ||
feature_maps: | ||
- 2 | ||
- 5 | ||
- 7 | ||
- 9 | ||
- 11 | ||
trainer: | ||
accelerator: auto | ||
strategy: ddp | ||
devices: auto | ||
num_nodes: 1 | ||
precision: bf16-mixed | ||
log_every_n_steps: 5 | ||
max_epochs: 100 | ||
default_root_dir: checkpoints/regression | ||
fast_dev_run: False | ||
num_sanity_val_steps: 0 | ||
# limit_train_batches: 0.25 | ||
# limit_val_batches: 0.25 | ||
accumulate_grad_batches: 4 | ||
logger: | ||
- class_path: lightning.pytorch.loggers.WandbLogger | ||
init_args: | ||
entity: developmentseed | ||
project: clay-regression | ||
log_model: false | ||
callbacks: | ||
- class_path: lightning.pytorch.callbacks.ModelCheckpoint | ||
init_args: | ||
dirpath: checkpoints/regression | ||
auto_insert_metric_name: False | ||
filename: biomasters_epoch-{epoch:02d}_val-score-{val/score:.3f} | ||
monitor: val/score | ||
mode: min | ||
save_last: False | ||
save_top_k: 2 | ||
save_weights_only: True | ||
verbose: True | ||
- class_path: lightning.pytorch.callbacks.LearningRateMonitor | ||
init_args: | ||
logging_interval: step | ||
- class_path: src.callbacks.LayerwiseFinetuning | ||
init_args: | ||
phase: 10 | ||
train_bn: True | ||
plugins: | ||
- class_path: lightning.pytorch.plugins.io.AsyncCheckpointIO |
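The config combines `batch_size: 10` with `accumulate_grad_batches: 4` under `strategy: ddp`, so the number of samples contributing to each optimizer step depends on how many devices `devices: auto` resolves to. A minimal sketch of that arithmetic follows; the helper name `effective_batch_size` is hypothetical and not part of this commit.

```python
def effective_batch_size(batch_size, accumulate_grad_batches, devices=1, num_nodes=1):
    """Samples per optimizer step under DDP with gradient accumulation.

    Hypothetical helper for illustration: each DDP process loads its own
    batch, and gradients are accumulated before every optimizer step.
    """
    return batch_size * accumulate_grad_batches * devices * num_nodes


# Values taken from the config above; the device count is an assumption,
# because `devices: auto` depends on the machine.
print(effective_batch_size(10, 4, devices=1))  # 40
print(effective_batch_size(10, 4, devices=4))  # 160
```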
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,177 @@ | ||
## Download data | ||
The data comes as a multi-file zip archive and can be downloaded from the | ||
[BioMassters](https://huggingface.co/datasets/nascetti-a/BioMassters/) | ||
Hugging Face repository. Grab a coffee, this is about 250GB in size. | ||
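One possible way to fetch the archives is the `huggingface_hub` Python package. The sketch below assumes that package is installed (`pip install huggingface_hub`) and that the whole dataset repository should be mirrored locally. The function name `download_biomassters` is hypothetical, and the target directory is only taken from the extraction commands further down.

```python
REPO_ID = "nascetti-a/BioMassters"  # dataset repository linked above


def download_biomassters(local_dir):
    """Download the full dataset repository into `local_dir`.

    Hypothetical helper: the import is deferred so the module can be
    loaded even where huggingface_hub is not installed.
    """
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        repo_type="dataset",  # the repository is a dataset, not a model
        local_dir=local_dir,
    )


# Example call (downloads roughly 250GB, so it is left commented out):
# download_biomassters("/datadisk/biomasters/raw")
```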
|
||
The next step is to unzip the data. Because it comes as a multi-file | ||
zip archive, it needs to be extracted with a tool that can handle | ||
that format. 7z works quite well in this case. Grab another coffee, this | ||
will take a while. | ||
|
||
```bash | ||
sudo apt install p7zip-full | ||
``` | ||
|
||
### Extract train features | ||
|
||
|
||
```bash | ||
7z e -o/home/tam/Desktop/biomasters/train_features/ /datadisk/biomasters/raw/train_features.zip | ||
``` | ||
|
||
The output should look something like this: | ||
|
||
``` | ||
7-Zip [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21 | ||
p7zip Version 16.02 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,64 bits,16 CPUs Intel(R) Core(TM) i7-10875H CPU @ 2.30GHz (A0652),ASM,AES-NI) | ||
Scanning the drive for archives: | ||
1 file, 10247884383 bytes (9774 MiB) | ||
Extracting archive: /datadisk/biomasters/raw/train_features.zip | ||
-- | ||
Path = /datadisk/biomasters/raw/train_features.zip | ||
Type = zip | ||
Physical Size = 10247884383 | ||
Embedded Stub Size = 4 | ||
64-bit = + | ||
Total Physical Size = 149834321503 | ||
Multivolume = + | ||
Volume Index = 13 | ||
Volumes = 14 | ||
Everything is Ok | ||
Folders: 1 | ||
Files: 189078 | ||
Size: 231859243932 | ||
Compressed: 149834321503 | ||
``` | ||
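The `Files` and `Size` totals at the end of the 7z output can be compared with what actually landed on disk. The following is a minimal sketch using only the standard library; the helper names are hypothetical, and it assumes the output directory contains nothing but the extracted files.

```python
from pathlib import Path


def summarize_directory(directory):
    """Return (file count, total size in bytes) for all files under `directory`."""
    files = [p for p in Path(directory).rglob("*") if p.is_file()]
    return len(files), sum(p.stat().st_size for p in files)


def matches_7z_summary(directory, expected_files, expected_bytes):
    """Compare a directory against the `Files` and `Size` lines printed by 7z."""
    n_files, n_bytes = summarize_directory(directory)
    return n_files == expected_files and n_bytes == expected_bytes


# Example with the numbers from the log above:
# matches_7z_summary("/home/tam/Desktop/biomasters/train_features/", 189078, 231859243932)
```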
|
||
### Extract train AGBM | ||
|
||
```bash | ||
7z e -o/home/tam/Desktop/biomasters/train_agbm/ /datadisk/biomasters/raw/train_agbm.zip | ||
``` | ||
|
||
The output should look something like this: | ||
|
||
``` | ||
7-Zip [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21 | ||
p7zip Version 16.02 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,64 bits,16 CPUs Intel(R) Core(TM) i7-10875H CPU @ 2.30GHz (A0652),ASM,AES-NI) | ||
Scanning the drive for archives: | ||
1 file, 575973495 bytes (550 MiB) | ||
Extracting archive: /datadisk/biomasters/raw/train_agbm.zip | ||
-- | ||
Path = /datadisk/biomasters/raw/train_agbm.zip | ||
Type = zip | ||
Physical Size = 575973495 | ||
Everything is Ok | ||
Folders: 1 | ||
Files: 8689 | ||
Size: 2280706098 | ||
Compressed: 575973495 | ||
``` | ||
|
||
### Extract test features | ||
|
||
```bash | ||
7z e -o/home/tam/Desktop/biomasters/test_features/ /datadisk/biomasters/raw/test_features_splits.zip | ||
``` | ||
|
||
The output should look something like this: | ||
|
||
``` | ||
7-Zip [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21 | ||
p7zip Version 16.02 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,64 bits,16 CPUs Intel(R) Core(TM) i7-10875H CPU @ 2.30GHz (A0652),ASM,AES-NI) | ||
Scanning the drive for archives: | ||
1 file, 6912625480 bytes (6593 MiB) | ||
Extracting archive: /datadisk/biomasters/raw/test_features_splits.zip | ||
-- | ||
Path = /datadisk/biomasters/raw/test_features_splits.zip | ||
Type = zip | ||
Physical Size = 6912625480 | ||
Embedded Stub Size = 4 | ||
64-bit = + | ||
Total Physical Size = 49862298440 | ||
Multivolume = + | ||
Volume Index = 4 | ||
Volumes = 5 | ||
Everything is Ok | ||
Folders: 1 | ||
Files: 63348 | ||
Size: 78334396224 | ||
Compressed: 49862298440 | ||
``` | ||
|
||
### Extract test AGBM | ||
|
||
```bash | ||
7z e -o/home/tam/Desktop/biomasters/test_agbm/ /datadisk/biomasters/raw/test_agbm.tar | ||
``` | ||
|
||
The output should look something like this: | ||
|
||
``` | ||
7-Zip [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21 | ||
p7zip Version 16.02 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,64 bits,16 CPUs Intel(R) Core(TM) i7-10875H CPU @ 2.30GHz (A0652),ASM,AES-NI) | ||
Scanning the drive for archives: | ||
1 file, 729766400 bytes (696 MiB) | ||
Extracting archive: /datadisk/biomasters/raw/test_agbm.tar | ||
-- | ||
Path = /datadisk/biomasters/raw/test_agbm.tar | ||
Type = tar | ||
Physical Size = 729766400 | ||
Headers Size = 1421312 | ||
Code Page = UTF-8 | ||
Everything is Ok | ||
Folders: 1 | ||
Files: 2773 | ||
Size: 727862586 | ||
Compressed: 729766400 | ||
``` | ||
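The four extraction commands above follow the same pattern, so they could be scripted. The sketch below assumes `7z` is on the `PATH` and reuses the archive names from the commands above. The helper names and the `dry_run` switch are hypothetical additions for illustration.

```python
import subprocess

# Output directory name -> archive file name, as used in the commands above.
ARCHIVES = {
    "train_features": "train_features.zip",
    "train_agbm": "train_agbm.zip",
    "test_features": "test_features_splits.zip",
    "test_agbm": "test_agbm.tar",
}


def build_7z_command(archive, out_dir):
    """Build the `7z e -o<out_dir> <archive>` command as an argument list."""
    return ["7z", "e", f"-o{out_dir}", str(archive)]


def extract_all(raw_dir, target_root, dry_run=True):
    """Build the extraction commands and, unless dry_run is set, run them."""
    commands = []
    for name, filename in ARCHIVES.items():
        cmd = build_7z_command(f"{raw_dir}/{filename}", f"{target_root}/{name}/")
        commands.append(cmd)
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises if 7z reports an error
    return commands


# Print the commands without running them.
for cmd in extract_all("/datadisk/biomasters/raw", "/home/tam/Desktop/biomasters"):
    print(" ".join(cmd))
```

Calling `extract_all(..., dry_run=False)` would run the extractions one after another.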
|
||
## Prepare data | ||
|
||
This step averages all available timesteps for each tile. | ||
The Sentinel-2 time steps are incomplete: not all months are | ||
provided for all tiles. In addition, the Clay model does not take time | ||
series as input. Aggregating over time is therefore a simplification, but | ||
it is acceptable for the purpose of this example. | ||
|
||
**In addition, we skip one orbit because it contains mostly nodata.** | ||
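The temporal aggregation described above can be sketched with NumPy. This is only an illustration of the idea, not the code in `preprocess_data.py`: the function name `temporal_mean` is hypothetical, and representing a missing month as `None` is an assumption made for this example.

```python
import numpy as np


def temporal_mean(timesteps):
    """Average (bands, height, width) arrays over time, skipping missing months.

    Hypothetical helper: a missing month is represented as None.
    """
    available = [t for t in timesteps if t is not None]
    if not available:
        raise ValueError("no timesteps available for this tile")
    # Stack to (time, bands, height, width) and collapse the time axis.
    stack = np.stack(available).astype("float32")
    return stack.mean(axis=0)


# Three months for one tile, the second one missing.
months = [np.full((2, 4, 4), 1.0), None, np.full((2, 4, 4), 3.0)]
cube = temporal_mean(months)
print(cube.shape)  # (2, 4, 4)
```

Averaging only the available months means tiles with different numbers of observations still produce cubes of the same shape.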
|
||
|
||
### Prepare training features | ||
|
||
```bash | ||
python finetune/regression/preprocess_data.py \ | ||
--features=/home/tam/Desktop/biomasters/train_features/ \ | ||
--cubes=/home/tam/Desktop/biomasters/train_cubes/ \ | ||
--processes=12 \ | ||
--sample=1 \ | ||
--overwrite | ||
``` | ||
|
||
### Prepare test features | ||
|
||
```bash | ||
python finetune/regression/preprocess_data.py \ | ||
--features=/home/tam/Desktop/biomasters/test_features/ \ | ||
--cubes=/home/tam/Desktop/biomasters/test_cubes/ \ | ||
--processes=12 \ | ||
--sample=1 \ | ||
--overwrite | ||
``` |