Sync trainer state with evaluators #2733

Open
vfdev-5 opened this issue Oct 6, 2022 · 10 comments

Comments

@vfdev-5
Collaborator

vfdev-5 commented Oct 6, 2022

🚀 Feature

There are use cases where we would like to access the trainer's epoch/iteration and/or other items from trainer.state. Let's propose an API that makes it easy to get the trainer's state from an evaluator.

Context : https://discuss.pytorch.org/t/get-current-epoch-inside-process-function-of-evaluator/162926
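A minimal, framework-free sketch of one possible shape for such an API, assuming the evaluator only needs a snapshot of the trainer's state taken when each evaluation run starts. State, Engine, on_started and sync_state_from are hypothetical stand-ins written for illustration, not ignite's classes or API. In ignite itself, a comparable effect can likely be obtained by attaching a handler to the evaluator's Events.STARTED that copies trainer.state.epoch into evaluator.state.

```python
from typing import Any, Callable, Iterable, List


class State:
    """Toy stand-in for an engine state (not ignite's State)."""

    def __init__(self) -> None:
        self.epoch = 0
        self.iteration = 0


class Engine:
    """Toy engine with a single 'started' event, for illustration only."""

    def __init__(self, process_function: Callable[["Engine", Any], Any]) -> None:
        self.process_function = process_function
        self.state = State()
        self._started_handlers: List[Callable[["Engine"], None]] = []

    def on_started(self, handler: Callable[["Engine"], None]) -> None:
        # Register a handler that fires once at the start of every run().
        self._started_handlers.append(handler)

    def run(self, data: Iterable[Any]) -> State:
        for handler in self._started_handlers:
            handler(self)
        for batch in data:
            self.state.iteration += 1
            self.process_function(self, batch)
        return self.state


def sync_state_from(source: Engine, attrs=("epoch", "iteration")) -> Callable[[Engine], None]:
    """Hypothetical helper: build a handler that copies the given attributes
    of source.state into the target engine's state as trainer_<attr>."""

    def handler(target: Engine) -> None:
        for name in attrs:
            setattr(target.state, f"trainer_{name}", getattr(source.state, name))

    return handler


# Usage: the evaluator's process function reads the trainer's epoch.
seen = []
trainer = Engine(lambda engine, batch: None)
evaluator = Engine(lambda engine, batch: seen.append(engine.state.trainer_epoch))
evaluator.on_started(sync_state_from(trainer))

trainer.state.epoch = 3  # pretend the trainer is in its third epoch
evaluator.run([0, 1])
print(seen)  # [3, 3]
```

Copying at the start of each run keeps the evaluator decoupled from the trainer's internals; the cost is that the values are snapshots rather than live references.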

@louis-she
Contributor

Many handlers/metrics accept a global_step_transform argument to get the step they want.
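To illustrate the pattern referred to here: a step transform is a callable that receives the engine being logged and the event name, and returns the step to report, which lets an evaluator's logs be indexed by the trainer's epoch. The sketch below uses toy State/Engine classes and hypothetical step_from_engine and log_metrics helpers. ignite provides a similar helper, global_step_from_engine, but its exact behavior (reading the counter tied to the triggering event) is not reproduced here.

```python
from typing import Any, Callable, Dict


class State:
    """Toy stand-in for an engine state (not ignite's State)."""

    def __init__(self) -> None:
        self.epoch = 0
        self.iteration = 0


class Engine:
    """Toy stand-in for an engine that only holds a state."""

    def __init__(self) -> None:
        self.state = State()


# A step transform maps (engine being logged, event name) -> step to report.
StepTransform = Callable[[Engine, Any], int]


def step_from_engine(source: Engine, attr: str = "epoch") -> StepTransform:
    """Hypothetical helper: ignore the engine passed in and read the
    step from `source` (e.g. the trainer) instead."""

    def transform(engine: Engine, event_name: Any) -> int:
        return getattr(source.state, attr)

    return transform


def log_metrics(engine: Engine, metrics: Dict[str, float], step_transform: StepTransform) -> str:
    """Toy logger: format metrics at the step chosen by the transform."""
    step = step_transform(engine, "COMPLETED")
    return f"step={step} " + " ".join(f"{k}={v}" for k, v in sorted(metrics.items()))


trainer, evaluator = Engine(), Engine()
trainer.state.epoch = 5
print(log_metrics(evaluator, {"acc": 0.9}, step_from_engine(trainer)))  # step=5 acc=0.9
```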

@jalajk24

Can I work on this? I am pretty new to this.

@vfdev-5
Collaborator Author

vfdev-5 commented Jan 30, 2023

@jalajk24 right now it is still under discussion whether we need to work on something here. Do you have any ideas or suggestions on the topic?

@guptaaryan16
Contributor

I am proposing a new API method for the Engine class that fetches the epoch from a trainer instance, stores it in the calling engine's state, and also returns the current trainer epoch. It could work like this:

def fetch_trainer_epoch(self, trainer: "Engine") -> int:
    # Proposed Engine method: copy the trainer's epoch into this engine's state
    epoch = trainer.state.epoch
    self.state.trainer_epoch = epoch
    return epoch

@vfdev-5 does this make sense?

It could be called as a plain method on the engine, much like optimizer.step().
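A runnable sketch of how the proposed method might behave, using a toy Engine rather than ignite's class (State and Engine below are illustrative stand-ins). It also surfaces a design consequence worth weighing: the method stores a snapshot, so evaluator.state.trainer_epoch goes stale once the trainer advances unless the method is called again, e.g. at the start of every evaluation run.

```python
class State:
    """Toy stand-in for an engine state (not ignite's State)."""

    def __init__(self) -> None:
        self.epoch = 0
        self.trainer_epoch = None  # filled in by fetch_trainer_epoch


class Engine:
    """Toy engine carrying the proposed fetch_trainer_epoch method."""

    def __init__(self) -> None:
        self.state = State()

    def fetch_trainer_epoch(self, trainer: "Engine") -> int:
        # Copy the trainer's current epoch into this engine's state and return it.
        epoch = trainer.state.epoch
        self.state.trainer_epoch = epoch
        return epoch


trainer, evaluator = Engine(), Engine()
trainer.state.epoch = 2
print(evaluator.fetch_trainer_epoch(trainer))  # 2

trainer.state.epoch = 3  # the trainer moves on...
print(evaluator.state.trainer_epoch)  # 2: the stored value is a snapshot
```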

@louis-she
Contributor

The core question of this issue is whether ignite should have a dedicated trainer abstraction. From what I know of ignite, that is not a good idea, at least not in its core.

@guptaaryan16
Contributor

Hey @louis-she, I think the API could help compare the performance of two or more training methods, and it could also help when training ensemble models. I have been working on GANs and adversarial training, and I have noticed that sometimes you need to combine two training methods to get better results, so this may be a helpful addition to the Engine class.

@vfdev-5
Collaborator Author

vfdev-5 commented Feb 18, 2023

@guptaaryan16 can you please give a concrete example of what you are talking about?

@guptaaryan16
Contributor

guptaaryan16 commented Feb 18, 2023

Sure @vfdev-5, I think it would be most useful for hyperparameter tuning and for testing how results vary, to make training easier: for example, reducing the number of epochs and trying different training methods.

For instance, here is something that happened when I was training a model on CIFAR-10 with Gaussian augmentation training (https://arxiv.org/abs/1902.02918) and measuring its Average Certified Radius (ACR) with randomized smoothing. I noticed that if I added PGD adversarial training (https://arxiv.org/pdf/1706.06083.pdf) on top of the Gaussian augmentation training, I could get a very high ACR. However, to find the specific hyperparameters, you need to know the current training epoch and see where the evaluators get the best results. So this API may be helpful, although you can also get the specific epoch without it.

@vfdev-5
Collaborator Author

vfdev-5 commented Feb 18, 2023

@guptaaryan16 thanks for the details, but I was wondering more about code details. Can you provide some code to illustrate your idea? As for HP tuning and multiple experiments, you can check

get the specific hyper parameters you need to get the current training epoch and see where the evaluators are getting best results.

I don't think anything here is impossible with the current API. I imagine that you have a handler that runs validation:

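import torch
from ignite.engine import Events

# trainer, evaluator, model and val_data are assumed to be defined elsewhere
best_acr = 0.0

def run_validation():
    global best_acr  # best_acr is reassigned below, so it must be declared global
    evaluator.run(val_data)
    metrics = evaluator.state.metrics
    if metrics["ACR"] > best_acr:
        best_acr = metrics["ACR"]
        current_epoch = trainer.state.epoch
        # save locally a bundle:
        fp = f"/path/to/output/{current_epoch}_best_acr.pt"
        torch.save({
            "best_acr": best_acr,
            "epoch": current_epoch,
            "model": model.state_dict(),
            # ...
        }, fp)

# Run validation at the end of every training epoch
trainer.add_event_handler(Events.EPOCH_COMPLETED, run_validation)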
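The same bookkeeping can be exercised without torch or ignite. The framework-free sketch below uses a hypothetical make_best_tracker factory that keeps the best score, and the trainer epoch at which it was reached, in a closure, which avoids the global statement altogether. The simulated loop and its per-epoch ACR values are invented for illustration, not real results.

```python
from typing import Callable, Dict


def make_best_tracker(evaluate: Callable[[], float], get_epoch: Callable[[], int]) -> Callable[[], None]:
    """Return a no-argument handler that calls `evaluate` and records the best
    score together with the trainer epoch reported by `get_epoch`."""
    best: Dict[str, float] = {"score": float("-inf"), "epoch": -1}

    def handler() -> None:
        score = evaluate()
        if score > best["score"]:
            best["score"] = score
            best["epoch"] = get_epoch()

    handler.best = best  # expose the record for inspection
    return handler


# Simulated training loop; these ACR values are made up.
fake_acr = iter([0.41, 0.55, 0.52])
trainer_epoch = 0
tracker = make_best_tracker(lambda: next(fake_acr), lambda: trainer_epoch)

for trainer_epoch in range(1, 4):  # stands in for trainer.state.epoch
    tracker()  # stands in for running the evaluator at EPOCH_COMPLETED

print(tracker.best)  # {'score': 0.55, 'epoch': 2}
```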

@guptaaryan16
Contributor

guptaaryan16 commented Feb 18, 2023

Yes @vfdev-5, I don't have the specific code for that, but I imagine it was written along the same lines (that project did not use ignite).
Also, I was wondering whether we could access the epoch directly as trainer.epoch instead of trainer.state.epoch. That would make a bit more sense to me, because I don't think a single trainer can have different states anyway.
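One way to offer trainer.epoch without keeping a second copy of the counter is a read-only property that delegates to the state. This is only a sketch on a toy Engine class, not a description of ignite's Engine; the property names simply mirror the fields mentioned in this thread.

```python
class State:
    """Toy stand-in for an engine state (not ignite's State)."""

    def __init__(self) -> None:
        self.epoch = 0
        self.iteration = 0


class Engine:
    """Toy engine exposing read-only shortcuts to its state."""

    def __init__(self) -> None:
        self.state = State()

    @property
    def epoch(self) -> int:
        # Read through to the state on every access, so there is still a
        # single source of truth even if self.state is replaced.
        return self.state.epoch

    @property
    def iteration(self) -> int:
        return self.state.iteration


trainer = Engine()
trainer.state.epoch = 4
print(trainer.epoch)  # 4

try:
    trainer.epoch = 5  # no setter: writes must still go through trainer.state
except AttributeError:
    print("read-only")
```

The property keeps trainer.state as the only place the value lives; the trade-off is that Engine gains a second spelling for every field that gets such a shortcut.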
