Some confusion about the method of the paper #27

Open
JorunoJobana opened this issue Jun 29, 2023 · 3 comments
@JorunoJobana

Hi, traditional backpropagation via the chain rule reuses the gradient results computed at the previous step, but the method in the paper does not store gradients after the update. Does that mean later gradient computations involve redundant recomputation, i.e. a time-for-space trade? Is this understanding correct?

@QipengGuo
Collaborator

There is no redundant computation. The chain rule only uses the gradient result from the immediately preceding step, not the gradients from all earlier steps. The previous step's gradient is still kept; it is the historical gradients that are cleared on the fly.
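
A minimal sketch of the fused-update idea (illustrative, not the repository's actual implementation): with a per-parameter post-accumulate-grad hook, each parameter is updated the moment its gradient is ready, and the gradient buffer is freed immediately, so the full set of parameter gradients never coexists in memory. `register_post_accumulate_grad_hook` requires PyTorch >= 2.1; the plain SGD rule and learning rate below are assumptions for demonstration.

```python
import torch
import torch.nn as nn

def attach_fused_sgd(model: nn.Module, lr: float = 1e-3):
    """Fuse the optimizer step into backward: update each parameter as soon
    as its gradient has been accumulated, then free that gradient."""
    def hook(param: torch.Tensor):
        with torch.no_grad():
            param.add_(param.grad, alpha=-lr)  # in-place SGD step
        param.grad = None  # drop the .grad buffer right away

    for p in model.parameters():
        if p.requires_grad:
            # Fires after autograd finishes writing p.grad for this step.
            p.register_post_accumulate_grad_hook(hook)

model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 1))
attach_fused_sgd(model, lr=0.01)
x, y = torch.randn(8, 16), torch.randn(8, 1)
loss = nn.functional.mse_loss(model(x), y)
loss.backward()  # parameters are updated during this call; no optimizer.step()
```

This avoids recomputation because backward for earlier layers consumes the activation gradients that the autograd engine carries along the graph, not the parameter `.grad` buffers that were just cleared.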

@JorunoJobana
Author

JorunoJobana commented Jun 29, 2023 via email

@QipengGuo
Collaborator

The concept here is a bit subtle. The previous step's gradient is still in GPU memory, but it is not actually stored in G3. The G3 in our figure refers to the `.grad` attribute. In practice, PyTorch also stores gradients in the computation graph, in addition to `.grad`. This touches on PyTorch's autograd graph and how gradients are passed along that graph.
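
To make the distinction concrete, here is a small illustrative snippet (not from the paper): the gradients that propagate along the autograd graph are held by the engine and are normally freed as soon as the chain rule consumes them; only leaf tensors get a populated `.grad` attribute, which is what the figure's G3 denotes.

```python
import torch

x = torch.tensor(2.0)
w1 = torch.tensor(3.0, requires_grad=True)
w2 = torch.tensor(4.0, requires_grad=True)

h = w1 * x       # intermediate (non-leaf) tensor on the autograd graph
h.retain_grad()  # opt in to keeping the graph-internal gradient for inspection
y = w2 * h
y.backward()

print(w1.grad)  # tensor(8.) -- leaf .grad, i.e. what the figure calls G3
print(w2.grad)  # tensor(6.)
print(h.grad)   # tensor(4.) -- carried by the graph; without retain_grad()
                # it is consumed by the chain rule and freed, never stored
```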
