
Trainer.fit is missing the optimizers parameter #2503

Open
antoinebrl opened this issue Sep 1, 2023 · 3 comments

@antoinebrl
Contributor

Hello,
The .fit() method of the Trainer is missing the optimizers parameter, although it appears in its documentation.

It would be nice to keep the optimizers and the schedulers together. Since the schedulers depend on the dataloaders, I would argue the .fit() method should accept the optimizers as input too.
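
For illustration, here is a sketch of the usage I have in mind (not the current API): the `optimizers` argument to `.fit()` is the proposed parameter, and `model` / `train_dataloader` stand in for an existing ComposerModel and DataLoader.

```python
import torch
from composer import Trainer

trainer = Trainer(model=model, max_duration='1ep')

# The scheduler depends on the dataloader length, so it is natural to build the
# optimizer and scheduler together right before training and pass both to .fit().
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=0.1, total_steps=len(train_dataloader),
)

trainer.fit(
    train_dataloader=train_dataloader,
    optimizers=optimizer,   # proposed parameter; documented today but not accepted
    schedulers=scheduler,
)
```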

@antoinebrl antoinebrl added the bug Something isn't working label Sep 1, 2023
@mvpatel2000
Contributor

This is a good suggestion! We'd love to get a community PR for this issue :) Otherwise, we will add it to our roadmap, though unfortunately it might not be done in the immediate future.

@mvpatel2000 mvpatel2000 self-assigned this Sep 1, 2023
@wlrd

wlrd commented Sep 5, 2023

@antoinebrl @mvpatel2000 once we pass the optimizers into the .fit() method, what do we want to do with them? I am happy to take a look and help out on this if I can get some guidance on how to use the optimizers here.

@mvpatel2000
Contributor

@wlrd Sure! In short, you'll need to duplicate the optimizer setup we do in __init__, starting here:

https://github.com/mosaicml/composer/blob/dev/composer/trainer/trainer.py#L988

I believe this just involves updating State to use the new optimizer.
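
As a rough sketch (not the actual patch), assuming we mirror the constructor's use of `ensure_tuple` and `State.optimizers`, the no-parallelism / DDP path inside `fit()` could look like:

```python
from composer.utils import ensure_tuple

# Fragment of Trainer.fit -- only the new optimizer handling is shown.
def fit(self, *, optimizers=None, **kwargs):
    if optimizers is not None:
        # Mirror __init__: repoint State at the newly supplied optimizer(s).
        self.state.optimizers = ensure_tuple(optimizers)
    # ... rest of the existing fit() logic ...
```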

The next hard step is adding support for DeepSpeed and FSDP. You can see the DeepSpeed step here: https://github.com/mosaicml/composer/blob/dev/composer/trainer/trainer.py#L1004

For FSDP, it becomes much more complicated... I think you can probably just recreate the optimizer and repoint it at the model. I'm happy to scope this part out in more detail later.
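
Purely illustrative (this part is not scoped out yet): one option is to rebuild an optimizer of the same class over the wrapped model's parameters and repoint State at it. Note that `old_opt.defaults` only carries default hyperparameters, so any per-param-group overrides would need extra handling.

```python
old_opt = self.state.optimizers[0]
# Recreate the optimizer against the (FSDP-wrapped) model and swap it into State.
new_opt = type(old_opt)(self.state.model.parameters(), **old_opt.defaults)
self.state.optimizers = (new_opt,)
```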

I would recommend a series of 3 PRs:

  1. Add optimizers for the no-parallelism and DDP cases (just update State?) along with a unit test verifying it works (see the test sketch after this list). Raise a ValueError if DeepSpeed or FSDP is enabled, stating it's not supported yet
  2. Add DeepSpeed support
  3. Add FSDP support
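
For PR 1, a hypothetical unit test could look like the following. The `simple_model` / `simple_dataloader` fixtures are illustrative, not existing helpers in the repo, and it assumes that `optimizers` passed to `.fit()` replaces whatever optimizer the Trainer was constructed with.

```python
import torch
from composer import Trainer

def test_fit_accepts_optimizers(simple_model, simple_dataloader):
    trainer = Trainer(model=simple_model, max_duration='1ba')
    new_opt = torch.optim.SGD(simple_model.parameters(), lr=0.01)

    trainer.fit(train_dataloader=simple_dataloader, optimizers=new_opt)

    # State should now reference the optimizer handed to .fit().
    assert trainer.state.optimizers[0] is new_opt
```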

@hanlint hanlint added enhancement New (engineering) enhancements, such as features or API changes. and removed bug Something isn't working labels Jan 24, 2024