DSOD Low mAP #44

Open
AdamCuellar opened this issue Feb 4, 2020 · 1 comment
AdamCuellar commented Feb 4, 2020

Not necessarily an issue, but the mAP I got from training DSOD512 on VOC 07+12 and testing on 07 was quite low, approximately 0.13.

The only thing I really changed was using Adam instead of AdamAccumulate, because the latter throws an error on TF 2.0. I also used softmax.

Also, no metrics other than the loss itself show up during training.

def trainMultiGPU():
    # set up data sets
    gt_util_voc = GTUtility("data/VOC2012train/")
    gt_util_voc7 = GTUtility("data/VOC2007train/")
    gt_util_voc_val = GTUtility("data/VOC2012val/", validation=True)
    gt_util_voc7_val = GTUtility("data/VOC2007val/", validation=True)

    gt_util_train = GTUtility.merge(gt_util_voc, gt_util_voc7)
    gt_util_val = GTUtility.merge(gt_util_voc_val, gt_util_voc7_val)

    experiment = 'dsod300_voc12_7'
    batch_size = 16

    # class_weights = prior_util.compute_class_weights(gt_util_train)
    class_weights = np.array(
        [0.00007169, 1.20864663, 1.23607288, 0.81087541, 1.32018959, 1.65339534, 1.47852761, 0.45099343, 0.84154551,
         0.33765636, 1.41315118, 1.32907548, 0.63492811, 1.15680594, 1.18978997, 0.07548318, 0.91531396, 1.21262288,
         1.15910985, 1.49269817, 1.08304682])

    # DSOD paper
    # batch size 128
    # 320k iterations
    # initial learning rate 0.1

    epochs = 1000
    initial_epoch = 0

    with tf.device("/cpu:0"):
        # set up DSOD 512
        model = DSOD512(num_classes=gt_util_train.num_classes, softmax=True)

    prior_util = PriorUtil(model)
    gen_train = InputGenerator(gt_util_train, prior_util, batch_size, model.image_size, augmentation=True)
    gen_val = InputGenerator(gt_util_val, prior_util, batch_size, model.image_size, augmentation=True)

    # weight decay
    regularizer = keras.regularizers.l2(5e-4)  # None if disabled
    for l in model.layers:
        if l.__class__.__name__.startswith('Conv'):
            l.kernel_regularizer = regularizer

    checkdir = './checkpoints/' + time.strftime('%Y%m%d%H%M') + '_' + experiment
    if not os.path.exists(checkdir):
        os.makedirs(checkdir)

    optim = keras.optimizers.Adam(lr=1e-3)

    # loss = SSDLoss(alpha=1.0, neg_pos_ratio=3.0)
    loss = SSDFocalLoss(lambda_conf=1.0, class_weights=class_weights)

    model = multi_gpu_model(model, gpus=2)
    model.compile(optimizer=optim, loss=loss.compute, metrics=loss.metrics)

    # add some callbacks
    reduce_lr = keras.callbacks.ReduceLROnPlateau(monitor='val_loss', factor=0.1, patience=3, verbose=1)
    early_stopping = keras.callbacks.EarlyStopping(monitor='val_loss', min_delta=0, patience=10, verbose=1)

    history = model.fit(
        gen_train.generate(),
        steps_per_epoch=gen_train.num_batches,
        epochs=epochs,
        verbose=1,
        callbacks=[
            keras.callbacks.ModelCheckpoint(checkdir + '/weights.{epoch:03d}.h5', verbose=1, save_weights_only=True,
                                            save_best_only=True, period=3),
            Logger(checkdir),
            reduce_lr,
            early_stopping
        ],
        validation_data=gen_val.generate(),
        validation_steps=gen_val.num_batches,
        class_weight=None,
        workers=1,
        use_multiprocessing=False,
        initial_epoch=initial_epoch)
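
As a side note: assigning kernel_regularizer to already-built layers, as in the weight-decay loop above, may not take effect, since regularization losses are added when a layer is built. A minimal sketch of one possible workaround, reusing the names above, is to rebuild the model from its serialized config (which now carries the regularizer settings) and copy the weights back:

# Hedged sketch: re-instantiate the model so the newly assigned regularizers are
# actually registered; custom_objects may be required if DSOD512 uses custom layers.
model_json = model.to_json()                      # config includes the new regularizers
weights = model.get_weights()
model = keras.models.model_from_json(model_json)  # fresh layers, regularizers applied in build()
model.set_weights(weights)                        # restore the original weights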
mvoelk (Owner) commented Feb 5, 2020

I had convergence issues with small batch sizes and was forced to use AdamAccumulate. The initial learning rate of 0.1 and the batch size of 128 already seemed suspicious to me.
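
For reference, a minimal sketch of gradient accumulation on TF 2 with plain Adam, as one way to emulate a larger effective batch size when AdamAccumulate is unavailable. This is an assumption-laden stand-in written against the names from the snippet above (model, loss.compute, gen_train), not the repo's AdamAccumulate:

import tensorflow as tf

# Hedged sketch: sum gradients over several micro-batches, then apply one Adam step.
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)
accum_steps = 8  # 8 micro-batches of 16 -> effective batch size 128
accum_grads = [tf.zeros_like(v) for v in model.trainable_variables]

for step, (x, y_true) in enumerate(gen_train.generate()):
    with tf.GradientTape() as tape:
        y_pred = model(x, training=True)
        batch_loss = loss.compute(y_true, y_pred) / accum_steps
    grads = tape.gradient(batch_loss, model.trainable_variables)
    accum_grads = [a + g for a, g in zip(accum_grads, grads)]
    if (step + 1) % accum_steps == 0:
        optimizer.apply_gradients(zip(accum_grads, model.trainable_variables))
        accum_grads = [tf.zeros_like(v) for v in model.trainable_variables]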

The missing metrics are a known issue. They are more or less a hack and do not work with tf.keras, and probably not with multi-GPU either. I did not have time to fix the TF 2 training.
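
In tf.keras, one way to get additional quantities displayed is to pass named (y_true, y_pred) functions to compile. A hedged sketch with hypothetical attribute names (conf_loss, loc_loss); the actual members of SSDFocalLoss may differ:

# Hedged sketch: wrap per-term callables so tf.keras reports them under a readable name.
def named_metric(fn, name):
    def metric(y_true, y_pred):
        return fn(y_true, y_pred)
    metric.__name__ = name  # tf.keras uses __name__ as the display name
    return metric

model.compile(optimizer=optim, loss=loss.compute,
              metrics=[named_metric(loss.conf_loss, 'conf_loss'),
                       named_metric(loss.loc_loss, 'loc_loss')])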

See also #14 and #25.
