TODO: also configure logging for sub-processes(not master) #106

Open
DelinQu opened this issue Oct 24, 2022 · 4 comments


DelinQu commented Oct 24, 2022

Hi victoresque,
Thanks for your great repo! I used the hydra_DDP branch to build my application, but ran into a problem in get_logger. Specifically, util.py loads '.hydra/hydra.yaml' relative to the current directory, but hydra.yaml only exists in the output directory, such as 'outputs/2022-09-25/15-16-17', so Python can't find it. I'm a little puzzled about the path of hydra.yaml. Maybe get_logger should load hydra.yaml from the output directory? Could anyone help? Thanks in advance!

(base) python train.py                     
Traceback (most recent call last):
  File "/mnt/petrelfs/qudelin/PJLAB/RS/VRS-Transformer/train.py", line 19, in <module>
    logger = get_logger("train")
  File "/mnt/petrelfs/qudelin/PJLAB/RS/VRS-Transformer/src/utils/util.py", line 19, in get_logger
    hydra_conf = OmegaConf.load('.hydra/hydra.yaml')
  File "/mnt/petrelfs/qudelin/miniconda3/lib/python3.9/site-packages/omegaconf/omegaconf.py", line 187, in load
    with io.open(os.path.abspath(file_), "r", encoding="utf-8") as f:
FileNotFoundError: [Errno 2] No such file or directory: '/mnt/petrelfs/qudelin/PJLAB/RS/VRS-Transformer/.hydra/hydra.yaml'
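
The traceback shows '.hydra/hydra.yaml' being resolved against the directory where train.py was launched, while the file lives inside the run's output directory. As a rough illustration only, here is a minimal sketch that looks up the newest hydra.yaml, assuming the 'outputs/<date>/<time>' layout mentioned above. `find_latest_hydra_yaml` is a hypothetical helper, not part of the repo or of Hydra.

```python
from pathlib import Path
from typing import Optional


def find_latest_hydra_yaml(outputs_root: str = "outputs") -> Optional[Path]:
    """Return the newest <outputs_root>/<date>/<time>/.hydra/hydra.yaml, or None.

    Assumes zero-padded date and time directory names, so that plain string
    ordering matches chronological ordering.
    """
    root = Path(outputs_root)
    if not root.is_dir():
        return None
    candidates = root.glob("*/*/.hydra/hydra.yaml")
    # Compare by path components: the date directory first, then the time directory.
    return max(candidates, key=lambda p: p.parts, default=None)
```

The returned path could then be passed to OmegaConf.load in place of the hard-coded relative path. This only works around the lookup; it does not address logging in sub-processes.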
@SunQpark
Collaborator

Hi @DelinQu, thank you for raising this issue.

It seems that get_logger is problematic as you pointed out and I'm currently working on this. I'll let you know if there is any progress.

Also, if you are interested in using the hydra_DDP branch of this repo, I would recommend using my fork of it instead. A while ago I made a few commits on that branch that fix a serious bug (it was not using DDP at all), but I forgot to open a PR to apply the fix to this repo. I will open a PR soon, but it could take some time.

Author

DelinQu commented Oct 25, 2022

Thanks for your reply, SunQpark. I will follow your DDP version! 😃

DelinQu closed this as completed Oct 25, 2022
Collaborator

SunQpark commented Oct 25, 2022

Oh thanks, but I'll leave this issue open for now!

SunQpark reopened this Oct 25, 2022
Author

DelinQu commented Oct 27, 2022

Hi SunQpark,
your repo has helped me tremendously, but I ran into another problem when training is stopped early:
[image]

Although it doesn't affect my models much, the error persists. My configuration file is as follows:

n_cpu: 8
n_gpu: 8
batch_size: 4
learning_rate: 0.0001
weight_decay: 0
scheduler_step_size: 50
scheduler_gamma: 0.1
status: train
trainer:
  epochs: 500
  logging_step: 100
  monitor: min loss/valid
  save_topk: 5
  early_stop: 10
  tensorboard: true
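
For readers unfamiliar with these settings, here is a minimal sketch of how `monitor: min loss/valid` and `early_stop: 10` are commonly interpreted: the first word gives the direction, the second the metric name, and training stops after that many epochs without improvement. The `EarlyStopper` class is hypothetical and is an assumption about typical behaviour, not the repo's trainer code, and it does not diagnose the error above.

```python
from typing import Dict


class EarlyStopper:
    """Tracks one monitored metric and reports when it has stopped improving."""

    def __init__(self, monitor: str, patience: int) -> None:
        # "min loss/valid" -> mode "min", metric "loss/valid"
        self.mode, self.metric = monitor.split()
        if self.mode not in ("min", "max"):
            raise ValueError(f"unknown monitor mode: {self.mode}")
        self.patience = patience
        self.best = float("inf") if self.mode == "min" else float("-inf")
        self.bad_epochs = 0

    def should_stop(self, metrics: Dict[str, float]) -> bool:
        """Update with this epoch's metrics; True once patience is exhausted."""
        value = metrics[self.metric]
        if self.mode == "min":
            improved = value < self.best
        else:
            improved = value > self.best
        if improved:
            self.best = value
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        # Assumption: stop as soon as `patience` consecutive epochs fail to improve.
        return self.bad_epochs >= self.patience
```

With the configuration above this would be constructed as `EarlyStopper("min loss/valid", 10)` and queried once per validation epoch.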
