Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Aggregator | Executor: torch_model_adapter] #243

Closed
EricDinging opened this issue Oct 31, 2023 · 5 comments
Closed

[Aggregator | Executor: torch_model_adapter] #243

EricDinging opened this issue Oct 31, 2023 · 5 comments
Labels
bug Something isn't working

Comments

@EricDinging
Copy link
Contributor

EricDinging commented Oct 31, 2023

What happened + What you expected to happen

@fanlai0990 @AmberLJC

  1. I validated Revert "Fix argument order & renaming" #236, comparing the result of the current version, and the old version (before the PR) on FEMNIST fed-yogi task.
  • Old
    Screenshot from 2023-10-31 00-17-57
  • Current
    Screenshot from 2023-10-31 00-17-14
  1. However, I find the testing accuracy is pretty strange, for both current version and the old version.
  • Old
    Screenshot from 2023-10-31 00-20-44
  • Current
    Screenshot from 2023-10-31 00-21-04

I have a suspicion that in this line where the executor receives the model update from aggregator and just about to start testing, the yogi optimizer is executed, which is unnecessary. I think the optimizer should only be executed in the aggregator at the end of the round. However, here in executor, the optimizer is executed every time when there is a model_train or model_test event.

self.model_adapter.set_weights(model_weights)

  1. Why in previous version fed-yogi works? I had a suspicion that in some config files, such as

    - gradient_policy: yogi # {"fed-yogi", "fed-prox", "fed-avg"}, "fed-avg" by default

    it's written as "yogi" instead of "fed-yogi". As a result, in optimizers.py, the "real" optimizer is still fed-avg as there is no if statement for "yogi". So the fed-yogi bug is not exposed.

  2. In summary, I think there is still bug in executor set_weight. It works for fed-avg, but not for other optimizers. Let me know what you think.

Versions / Dependencies

#242

Reproduction script

fedscale driver start benchmark/configs/femnist/conf.yml

Issue Severity

None

@EricDinging EricDinging added the bug Something isn't working label Oct 31, 2023
@EricDinging
Copy link
Contributor Author

I tried to change the model from resnet18 to mobilenet_v2.

Screenshot from 2023-11-01 09-58-39

I also changed the server side learning rate from 0.05 to 0.01.

Screenshot from 2023-11-01 11-46-43

As you can see, the test accuracy did not improve across 100 rounds. However, the training loss did decrease from 4 to 1 roughly for mobilenet and from 4.12 to 0.62 for lr=0.01.

@fanlai0990
Copy link
Member

Okay. I think we can use fedavg to train a model and see whether the training loss decreases to 1. This helps us understand whether the training part is correct. If it decreases to 1, then we'll know there must be something wrong on the testing side.

@EricDinging
Copy link
Contributor Author

The FedAvg case I ran previously (lr = 0.05, apart from optimizer, the rest is exactly the same as above):
Screenshot from 2023-11-01 12-28-15
Screenshot from 2023-11-01 12-27-35
let me know what you think

@SISICHEN565
Copy link

Hello, I met the similar problem when I try to use Yogi in the Oort with FEMNIST. Although the training loss can decrease to 1, the test loss is still high and remains around 4, and the test accuracy can not increase and remains below 0.01.

In addition, everything looks normal without using Yogi.

Are there any bugs in the code for Yogi? Do you have any idea how to deal with it?

@EricDinging
Copy link
Contributor Author

EricDinging commented Dec 17, 2023

@SISICHEN565
Thank you for your feedback! It is a known issue and we are fixing it. Check out this pr #245
When you are running FEMNIST task with fed-yogi, you could add this setup in benchmark/configs/femnist/conf.yml. The default config might not work well for FEMNIST. Also try running for 500 rounds.

    - yogi_eta: 0.01
    - yogi_tau: 0.001
    - yogi_beta: 0.01
    - yogi_beta2: 0.99

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants