
Training & Testing Details

Junyong Lee edited this page Oct 5, 2022 · 1 revision

Training & testing the network

Training

# multi GPU (with DistributedDataParallel) example
CUDA_VISIBLE_DEVICES=0,1,2,3 python -B -m torch.distributed.launch --nproc_per_node=4 --master_port=9000 run.py \
            --is_train \
            --mode MTU2_amp_DVD \
            --config config_MTU2 \
            --trainer trainer_multi_opt \
            --data DVD \
            -LRS CA \
            -b 2 \
            -th 8 \
            -dl \
            -ss \
            -dist

# resuming example (the trainer loads the checkpoint saved at epoch 100, and training resumes from epoch 101)
CUDA_VISIBLE_DEVICES=0,1,2,3 python -B -m torch.distributed.launch --nproc_per_node=4 --master_port=9000 run.py \
            ... \
            -th 8 \
            -r 100 \
            -ss \
            -dist

# single GPU (with DataParallel) example
CUDA_VISIBLE_DEVICES=0 python -B run.py \
            ... \
            -ss
  • Options
    • --is_train: If it is specified, run.py will train the network. Default: False
    • --mode: The name of the model to train. A logging folder named after [mode] is created at [LOG_ROOT]/PG2022_RealTime_VDBLR/[mode]/. Default: PVDNet_DVD
    • --config: The name of the config file, located at ./configs/[config].py. Default: None, and the default should not be changed.
    • --trainer: The name of the trainer file, located at ./models/trainers/[trainer].py. Default: trainer
    • --data: The name of the dataset: DVD | nah. Default: DVD
      • The data structure can be modified in the function set_train_path(..) in ./configs/config.py.
    • --network: The name of the network file (of PVDNet), located at ./models/archs/[network].py. Default: MTU
    • -LRS: The learning rate scheduler for training: CA (cosine annealing schedule) | LD (step decay schedule). Default: LD
    • -b, --batch_size: The batch size. For multi-GPU training (DistributedDataParallel), the total batch size is nproc_per_node * b. Default: 8
    • -th, --thread_num: The number of threads (num_workers) used for the data loader. Default: 8
    • -dl, --delete_log: Whether to delete the logs under [mode] (i.e., [LOG_ROOT]/PG2022_RealTime_VDBLR/[mode]/*). This option takes effect only when --is_train is specified. Default: False
    • -r, --resume: Resume training from the checkpoint of the specified epoch (e.g., -r 100). Note that -dl should not be specified together with this option.
    • -ss, --save_sample: Save sample images for both training and testing. Images will be saved in [LOG_ROOT]/PG2022_RealTime_VDBLR/[mode]/sample/. Default: False
    • -dist: Enables multi-processing with DistributedDataParallel. Default: False
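
The options above can be mirrored in a small argparse sketch, which may help when checking a command line before launching a long run. This is not the parser used by run.py; the option names and defaults are copied from the list above, while build_parser, validate, ddp_total_batch_size, and the None default for -r are hypothetical names and assumptions made for illustration only.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Illustrative parser mirroring the documented training options (not run.py's own)."""
    p = argparse.ArgumentParser(description="training options (sketch)")
    p.add_argument("--is_train", action="store_true", default=False)
    p.add_argument("--mode", type=str, default="PVDNet_DVD")
    p.add_argument("--config", type=str, default=None)
    p.add_argument("--trainer", type=str, default="trainer")
    p.add_argument("--data", type=str, default="DVD")
    p.add_argument("--network", type=str, default="MTU")
    p.add_argument("-LRS", type=str, default="LD", choices=["CA", "LD"])
    p.add_argument("-b", "--batch_size", type=int, default=8)
    p.add_argument("-th", "--thread_num", type=int, default=8)
    p.add_argument("-dl", "--delete_log", action="store_true", default=False)
    # Assumption: no resume epoch unless -r is given (the default is not documented).
    p.add_argument("-r", "--resume", type=int, default=None)
    p.add_argument("-ss", "--save_sample", action="store_true", default=False)
    p.add_argument("-dist", action="store_true", default=False)
    return p


def validate(args: argparse.Namespace) -> argparse.Namespace:
    """Reject the combination the option list warns about: -r together with -dl."""
    if args.resume is not None and args.delete_log:
        raise ValueError("-r/--resume should not be combined with -dl/--delete_log")
    return args


def ddp_total_batch_size(batch_size: int, nproc_per_node: int) -> int:
    """Total batch size under DistributedDataParallel: nproc_per_node * b."""
    return nproc_per_node * batch_size


# The multi-GPU example: -b 2 on 4 processes gives a total batch size of 8.
argv = ("--is_train --mode MTU2_amp_DVD --config config_MTU2 "
        "--trainer trainer_multi_opt --data DVD -LRS CA -b 2 -th 8 -dl -ss -dist").split()
args = validate(build_parser().parse_args(argv))
print(ddp_total_batch_size(args.batch_size, nproc_per_node=4))
```

Passing -r 100 together with -dl to validate raises a ValueError in this sketch; how run.py itself handles that combination is not stated here.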

Testing

CUDA_VISIBLE_DEVICES=0 python run.py --config [config] --mode [mode] --network [network] --trainer [trainer] --data [DATASET] --ckpt_abs_name [checkpoint path] --eval_mode [evaluation mode]
# e.g.,
CUDA_VISIBLE_DEVICES=0 python run.py \
    --config config_MTU10 \
    --mode MTU10_DVD \
    --network MTU \
    --trainer trainer_multi_opt \
    --data DVD \
    --ckpt_abs_name ckpt/MTU10_DVD.pytorch \
    --eval_mode eval \
    --is_quan \
    --is_qual

Note:

  • Specify only [mode] of the trained model. [config] doesn't have to be specified, as it will be automatically loaded.
  • Testing results will be saved in [LOG_ROOT]/PG2022_RealTime_VDBLR/[mode]/result/quanti_quali/[mode]_[epoch]/[data]/.
  • Options
    • --mode: The name of the model to test.
    • --data: The name of the dataset to evaluate: DVD | nah | REDS. Default: DVD
      • The data structure can be modified in the function set_eval_path(..) in ./configs/config.py.
      • random is for testing models on arbitrary video frames, which should be placed under [DATASET_ROOT]/random/[video_name]/*.[jpg|png].
    • --ckpt_name: Loads the checkpoint with the given file name under [LOG_ROOT]/PVDNet_TOG2021/[mode]/checkpoint/train/epoch/ckpt/ (e.g., python run.py --mode PVDNet_DVD --data DVD --ckpt_name PVDNet_DVD_00100.pytorch).
    • --ckpt_abs_name: Loads the checkpoint at the given path (e.g., python run.py --mode PVDNet_DVD --data DVD --ckpt_abs_name ./ckpt/PVDNet_DVD.pytorch).
    • --ckpt_epoch: Loads the checkpoint of the specified epoch (e.g., python run.py --mode PVDNet_DVD --data DVD --ckpt_epoch 100).
    • --ckpt_sc: Loads the checkpoint with the best validation score (e.g., python run.py --mode PVDNet_DVD --data DVD --ckpt_sc).
    • --is_quan: If it is specified, the code will compute and print PSNR and SSIM.
    • --is_qual: If it is specified, the code will save the resulting video frames.
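
The path conventions in the notes above can be sketched as two small helpers, which may be handy for locating results or checkpoints from a script. This is only an illustration, not code from the repository: result_dir and resolve_checkpoint are hypothetical names, the precedence among the checkpoint options is an assumption, and the [mode]_[5-digit epoch].pytorch file name is inferred from the example PVDNet_DVD_00100.pytorch rather than confirmed. --ckpt_sc is left out because picking the best-scoring checkpoint needs the saved validation scores.

```python
from pathlib import Path
from typing import Optional

PROJECT = "PG2022_RealTime_VDBLR"  # project folder name used in the notes above


def result_dir(log_root: str, mode: str, epoch: str, data: str) -> Path:
    """Build [LOG_ROOT]/PG2022_RealTime_VDBLR/[mode]/result/quanti_quali/[mode]_[epoch]/[data]/.

    `epoch` is kept as a string because its exact formatting in the folder name
    is not documented.
    """
    return Path(log_root) / PROJECT / mode / "result" / "quanti_quali" / f"{mode}_{epoch}" / data


def resolve_checkpoint(ckpt_dir: str, mode: str,
                       ckpt_name: Optional[str] = None,
                       ckpt_abs_name: Optional[str] = None,
                       ckpt_epoch: Optional[int] = None) -> Path:
    """Pick a checkpoint file from the documented options.

    `ckpt_dir` stands for the [mode]/checkpoint/train/epoch/ckpt/ folder.
    Assumption: an explicit path wins over a file name, which wins over an epoch.
    """
    if ckpt_abs_name is not None:
        return Path(ckpt_abs_name)
    if ckpt_name is not None:
        return Path(ckpt_dir) / ckpt_name
    if ckpt_epoch is not None:
        # Assumed naming, inferred from the example "PVDNet_DVD_00100.pytorch".
        return Path(ckpt_dir) / f"{mode}_{ckpt_epoch:05d}.pytorch"
    raise ValueError("specify one of ckpt_abs_name, ckpt_name, or ckpt_epoch")


print(result_dir("logs", "MTU10_DVD", "00100", "DVD").as_posix())
print(resolve_checkpoint("logs/ckpt", "PVDNet_DVD", ckpt_epoch=100).as_posix())
```

Under these assumptions, --ckpt_epoch 100 and --ckpt_name PVDNet_DVD_00100.pytorch resolve to the same file.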