It's known that equivariance is implemented through a higher degree of weight sharing in the convolution, corresponding to the symmetry the network is endowed with. More symmetric networks (higher-order symmetry groups) should therefore exhibit more weight sharing and hence have fewer trainable parameters.

In my own implementation of E(2)-equivariant networks, I'm noticing that GCNNs equivariant to $D_1$ have significantly fewer parameters than an identically constructed $D_{16}$ network, and I'm wondering why. Naively, the $D_{16}$ network should have fewer parameters than the $D_1$ one, because there is significantly more parameter sharing. This concerns me because it could then be argued that the performance bump of a higher-order network such as $D_{16}$ is attributable to more trainable parameters rather than to exploiting more symmetry.

Is there something in the backend of escnn / e2cnn that I'm not aware of? Is counting trainable parameters the usual PyTorch way not appropriate for these GCNNs? Or is my theoretical understanding of group convolution incorrect?
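For concreteness, here is a minimal sketch of the kind of comparison I mean. The field counts, kernel size, and the choice of `flip2dOnR2` / `flipRot2dOnR2` with regular representations are just my assumptions of what an "identically constructed" pair of layers looks like, and the parameter count is the usual PyTorch one:

```python
# Sketch only: compare trainable-parameter counts of two "identically constructed"
# regular-to-regular escnn layers, one equivariant to D_1 and one to D_16.
# The field count (8) and kernel size (5) are arbitrary choices for illustration.
import torch
from escnn import gspaces, nn


def count_trainable_params(module: torch.nn.Module) -> int:
    """Count trainable parameters the usual PyTorch way."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)


def regular_conv(gspace, fields: int = 8, kernel_size: int = 5) -> nn.R2Conv:
    """One regular-to-regular group convolution over the given gspace."""
    ftype = nn.FieldType(gspace, fields * [gspace.regular_repr])
    return nn.R2Conv(ftype, ftype, kernel_size=kernel_size)


# D_1 (reflections only, |G| = 2) vs D_16 (16 rotations + reflections, |G| = 32)
for name, gspace in [("D_1", gspaces.flip2dOnR2()),
                     ("D_16", gspaces.flipRot2dOnR2(N=16))]:
    layer = regular_conv(gspace)
    print(f"{name}: {count_trainable_params(layer)} trainable parameters")
```

With the number of regular fields held fixed as above, the $D_{16}$ layer comes out with far more trainable parameters than the $D_1$ layer, which is exactly the pattern I'm asking about.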