I often want to resume training from a checkpoint with `--load-model`. When I do that, I don't want to lose the information in the `log` and `metrics.csv` files. The obvious way to do that is to create a new log directory for the continuation and use `--log-dir` and `--redirect` to put all new files in the new directory. But it doesn't work. Instead, it ignores those options and uses the same log directory as the original training run, deleting and overwriting the existing logs in the process. To prevent that, you first need to copy your existing log directory to a new location. I've lost work several times by forgetting to do that.
How about making it so that `--load-model` does not override `--log-dir` and `--redirect`? It only tells it which model to load, so it shouldn't prevent you from saving logs to a different directory.
I'm not sure why overriding `log_dir` doesn't work properly when `load_model` is set. The arguments are parsed in the order in which they are defined in `train.py`. Since `--load-model` is the first argument there, the loaded model's hparams should always be overwritten by any further arguments specified via config file or CLI.
`namespace` contains the options that were passed in. That line overwrites them with the ones from the checkpoint before they have had a chance to be processed.
I somehow thought that arguments were parsed in the order in which they are defined in the code, but a quick test showed that this is clearly not true. So yes, that line is the problem. We should probably only update the namespace with arguments that were not specified by the user.
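As a rough illustration of "only update the namespace with arguments that were not specified by the user", here is a minimal, standalone sketch. It is not the actual `train.py` code: the `DEFAULTS` dict, the `merge_args` helper, the `load_hparams` callable, the `--lr` option, and the choice of `store_true` for `--redirect` are all assumptions made for the example. The idea is to build the parser with `argument_default=argparse.SUPPRESS`, so the parsed namespace contains only what the user actually typed, and then layer defaults, checkpoint hparams, and user arguments in that order.

```python
import argparse

# Hypothetical defaults; the real option set in train.py is larger.
DEFAULTS = {"load_model": None, "log_dir": "logs", "redirect": False, "lr": 1e-3}


def merge_args(argv, load_hparams):
    """Return a Namespace where user-specified arguments take precedence.

    `load_hparams` is a hypothetical callable that maps a checkpoint path
    to the dict of hyperparameters stored in that checkpoint.
    """
    # With SUPPRESS as the default, options the user did not pass are
    # simply absent from the parsed namespace.
    parser = argparse.ArgumentParser(argument_default=argparse.SUPPRESS)
    parser.add_argument("--load-model")
    parser.add_argument("--log-dir")
    parser.add_argument("--redirect", action="store_true")
    parser.add_argument("--lr", type=float)
    user = vars(parser.parse_args(argv))

    checkpoint = load_hparams(user["load_model"]) if "load_model" in user else {}

    # Precedence, lowest to highest: defaults, checkpoint hparams, user args.
    return argparse.Namespace(**{**DEFAULTS, **checkpoint, **user})


if __name__ == "__main__":
    # Stand-in for reading hparams from a real checkpoint file.
    fake_hparams = {"log_dir": "run1", "lr": 0.01, "redirect": False}
    args = merge_args(
        ["--load-model", "model.ckpt", "--log-dir", "run2"],
        lambda path: fake_hparams,
    )
    print(args.log_dir, args.lr)
```

With this layering, `--log-dir run2` survives the checkpoint load, while `lr` falls back to the checkpoint value because it was not given on the command line.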