Hello, I am a graduate student. I recently tried to run my project, which is based on the torch_ACA solver, on multiple GPUs with a DataParallel (DP) wrapper. However, I found that performance drops significantly compared with a single GPU, while naive backpropagation still works fine. Can you give me some instructions or possible reasons?
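For reference, here is a minimal sketch of the kind of setup I am describing; the toy module below is a hypothetical stand-in for my actual model, whose forward pass calls torch_ACA's solver:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a module whose forward pass calls torch_ACA's
# ODE solver; a plain linear layer is used here so the sketch runs anywhere.
class ToyODEModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(10, 10)

    def forward(self, x):
        return self.linear(x)

# Replicate the model across all visible GPUs with the DP wrapper.
model = nn.DataParallel(ToyODEModel()).cuda()
x = torch.randn(32, 10).cuda()
loss = model(x).pow(2).mean()
loss.backward()  # with the real solver, this is where the degradation shows up
```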
I believe it's because there is no proper error-tolerance handling or gradient-reduce operation in the data-parallel case if you have a distributed setup. Sorry, I did not write that part, since I don't have many machines.
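If the missing gradient reduction is indeed the cause, one possible workaround under a `torch.distributed` setup would be to average the gradients across processes manually after the solver's backward pass. A minimal sketch, assuming a process group is already initialized; the helper name here is illustrative and not part of torch_ACA:

```python
import torch
import torch.distributed as dist

def all_reduce_gradients(model: torch.nn.Module) -> None:
    """Average each parameter's gradient across all processes.

    DistributedDataParallel normally does this automatically during
    backward(); a custom adjoint backward that bypasses autograd's usual
    hooks may need to do it explicitly, as sketched here.
    """
    world_size = dist.get_world_size()
    for param in model.parameters():
        if param.grad is not None:
            # Sum the gradient over all ranks, then divide by the number
            # of ranks to recover the average.
            dist.all_reduce(param.grad, op=dist.ReduceOp.SUM)
            param.grad.div_(world_size)
```

Each process would call `all_reduce_gradients(model)` after `backward()` and before `optimizer.step()`, so that every replica steps with the same averaged gradient.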