Some confusion about the method of the paper #27

Open
JorunoJobana opened this issue Jun 29, 2023 · 3 comments
@JorunoJobana

Hi, traditional backpropagation via the chain rule reuses the gradient results computed at the previous step, but the method in the paper does not store gradients after the update. Does that mean later gradient computations involve redundant recomputation, i.e. a time-for-space trade? Is this understanding correct?

@QipengGuo
Collaborator

There is no redundant computation. The chain rule only uses the gradient result from the immediately preceding step, not the gradients from all earlier steps. The previous step's gradient is still kept; it is the historical gradients that are cleared on the fly.
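
A minimal sketch of the fused-update idea (illustrative, not the repository's actual implementation): with a per-parameter post-accumulate-grad hook, each parameter is updated the moment its gradient is ready, and the gradient buffer is freed immediately, so the full set of parameter gradients never coexists in memory. `register_post_accumulate_grad_hook` requires PyTorch >= 2.1; the plain SGD rule and learning rate below are assumptions for demonstration.

```python
import torch
import torch.nn as nn

def attach_fused_sgd(model: nn.Module, lr: float = 1e-3):
    """Fuse the optimizer step into backward: update each parameter as soon
    as its gradient has been accumulated, then free that gradient."""
    def hook(param: torch.Tensor):
        with torch.no_grad():
            param.add_(param.grad, alpha=-lr)  # in-place SGD step
        param.grad = None  # drop the .grad buffer right away

    for p in model.parameters():
        if p.requires_grad:
            # Fires after autograd finishes writing p.grad for this step.
            p.register_post_accumulate_grad_hook(hook)

model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 1))
attach_fused_sgd(model, lr=0.01)
x, y = torch.randn(8, 16), torch.randn(8, 1)
loss = nn.functional.mse_loss(model(x), y)
loss.backward()  # parameters are updated during this call; no optimizer.step()
```

This avoids recomputation because backward for earlier layers consumes the activation gradients that the autograd engine carries along the graph, not the parameter `.grad` buffers that were just cleared.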

@JorunoJobana
Author

JorunoJobana commented Jun 29, 2023 via email

@QipengGuo
Collaborator

The concept here is a bit subtle. The previous step's gradient is still in GPU memory, but it is not actually stored in G3. The G3 in our figure refers to the `.grad` attribute. In practice, PyTorch also stores gradients in the computation graph, in addition to `.grad`. This touches on PyTorch's autograd graph and how gradients are passed along that graph.
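
To make the distinction concrete, here is a small illustrative snippet (not from the paper): the gradients that propagate along the autograd graph are held by the engine and are normally freed as soon as the chain rule consumes them; only leaf tensors get a populated `.grad` attribute, which is what the figure's G3 denotes.

```python
import torch

x = torch.tensor(2.0)
w1 = torch.tensor(3.0, requires_grad=True)
w2 = torch.tensor(4.0, requires_grad=True)

h = w1 * x       # intermediate (non-leaf) tensor on the autograd graph
h.retain_grad()  # opt in to keeping the graph-internal gradient for inspection
y = w2 * h
y.backward()

print(w1.grad)  # tensor(8.) -- leaf .grad, i.e. what the figure calls G3
print(w2.grad)  # tensor(6.)
print(h.grad)   # tensor(4.) -- carried by the graph; without retain_grad()
                # it is consumed by the chain rule and freed, never stored
```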
