Question about pooling with Fourier representations in R^3 #65
I'm hoping that I can get some feedback on the best way to do pooling operations when using Fourier representations in
On this topic, I'm also hoping I can get some feedback on an idea I had to do pooling in the spatial domain, similar to the way that
Some specific questions I have about this algorithm:
Edit: I accidentally posted this discussion before I was finished writing it, so I apologize if that caused any confusion.
Replies: 1 comment 2 replies
Hi @kalekundert
Thanks for opening this discussion! I think this can be useful for many other users.
First, note that all pointwise average pooling layers support any representation. Only the max-pooling and norm-max-pooling layers have specific requirements.
While standard average pooling can be unstable under continuous rotations (it uses box filters, which are equivariant to 90-degree rotations but are not isotropic in all directions), the antialiased layers use wider and smoother isotropic filters (Gaussian filters sampled on wider grids), so they are more stable.
That being said, discretisation always plays an important role in practice. No downsampling operation will ever be perfectly equivariant to continuous rotations, but the antialiasing operator reduces the artefacts that deteriorate equivariance.
Regarding your proposed pooling layer, I think that design makes sense! If I remember correctly, a similar pooling strategy was also used in some other works, maybe this paper for processing point clouds.
Let me answer your questions now:
I expect this operation will actually reduce the equivariance: the max operation is highly non-linear, so it introduces yet another source of discretisation/aliasing. To reduce this problem, you will need enough samples during the inverse FT. Simple average pooling, instead, is theoretically equivariant even without applying the IFT/FT, so it is only affected by the discretisation of the filter (and not by the discretisation of the IFT/FT). Note that this does not necessarily mean this layer is not useful! Max pooling is probably more expressive, and some small equivariance error might be tolerable. In that case, this layer might provide a better trade-off than average pooling. However, this is a question that is better answered empirically.
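To make the sampling argument concrete, here is a minimal NumPy sketch. It works on the circle group SO(2) rather than SO(3), which is a simplifying assumption, and it is not the escnn implementation. The helper names (`synth`, `analyze`, `rotate`, `fourier_max_pool`, `equivariance_error`) are hypothetical. Each feature in the pooling window is a band-limited function given by its Fourier coefficients; the sketch samples it on `n` angles (inverse FT), takes the pointwise max over the window, and projects back onto the same band limit (FT).

```python
import numpy as np

def synth(coeffs, n):
    """Inverse FT: sample a real band-limited function at n equally spaced angles.

    coeffs[k] is the coefficient of exp(i*k*t) for k = 0..L; negative
    frequencies are implied by conjugate symmetry.
    """
    t = 2 * np.pi * np.arange(n) / n
    k = np.arange(len(coeffs))
    weights = np.where(k == 0, 1.0, 2.0)  # count each k > 0 together with -k
    return (np.exp(1j * np.outer(t, k)) * (weights * coeffs)).sum(axis=1).real

def analyze(samples, L):
    """FT: project n samples back onto frequencies 0..L (requires n > 2L)."""
    return np.fft.fft(samples)[: L + 1] / len(samples)

def rotate(coeffs, theta):
    """Rotate by theta: f(t) -> f(t - theta) multiplies c_k by exp(-i*k*theta)."""
    k = np.arange(coeffs.shape[-1])
    return coeffs * np.exp(-1j * k * theta)

def fourier_max_pool(window, n):
    """Max-pool a window of shape (P, L+1) of Fourier coefficient vectors."""
    L = window.shape[1] - 1
    samples = np.stack([synth(c, n) for c in window])  # (P, n) spatial samples
    return analyze(samples.max(axis=0), L)             # pointwise max, then FT

def equivariance_error(window, theta, n):
    """Compare pooling a rotated window against rotating the pooled output."""
    pooled_then_rotated = rotate(fourier_max_pool(window, n), theta)
    rotated_then_pooled = fourier_max_pool(rotate(window, theta), n)
    return np.linalg.norm(pooled_then_rotated - rotated_then_pooled)

# Window of two features: cos(t) and cos(2t), band-limited to L = 2.
window = np.array([[0, 0.5, 0], [0, 0, 0.5]], dtype=complex)

err_aligned = equivariance_error(window, 2 * np.pi / 8, n=8)  # grid-aligned rotation
err_coarse = equivariance_error(window, 0.3, n=8)             # few samples
err_fine = equivariance_error(window, 0.3, n=256)             # many samples
print(err_aligned, err_coarse, err_fine)
```

Rotations aligned with the sampling grid commute with the max up to rounding, while for a generic angle the error should shrink as `n` grows. That is the "enough samples" point made above.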
That's correct. It will probably be more expensive than a normal average pooling but not more expensive than the non-linearities which are already being used in the network. Assuming one typically uses more non-linear layers than downsampling layers, I don't expect this to be prohibitively expensive.
Clearly! 😄
I hope this answers your questions, and let me know if you want to try to implement this module!
Thanks again for the interesting question :)
Best,
Hi @kalekundert
Thanks for opening this discussion! I think this can be useful for many other users.
First, note that all pointwise average pooling layers support any representation. Only the max-pooling and norm-max-pooling layers have specific requirements.
The reason average pooling (theoretically) works is that it is just a convolution with an isotropic filter.
Indeed, these isotropic filters actually belong to the space of equivariant kernels which are learnable by any RdConv layer.
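A small NumPy check can illustrate what "isotropic" means here. This is only a sketch with hypothetical helpers (`grid`, `box_kernel`, `gaussian_kernel`, `value_at`), not the filters escnn actually builds. It compares a box filter and a sampled Gaussian at two grid points that lie at the same distance from the centre, one on an axis and one off it.

```python
import numpy as np

def grid(half):
    """Integer coordinates of a (2*half + 1)^3 grid centred at the origin."""
    ax = np.arange(-half, half + 1)
    return np.meshgrid(ax, ax, ax, indexing="ij")

def box_kernel(half, box_half):
    """Box (uniform averaging) filter of half-width box_half, zero-padded."""
    x, y, z = grid(half)
    inside = (np.abs(x) <= box_half) & (np.abs(y) <= box_half) & (np.abs(z) <= box_half)
    k = inside.astype(float)
    return k / k.sum()

def gaussian_kernel(half, sigma):
    """Gaussian filter sampled on the same grid, normalised to sum to 1."""
    x, y, z = grid(half)
    k = np.exp(-(x**2 + y**2 + z**2) / (2 * sigma**2))
    return k / k.sum()

def value_at(kernel, half, point):
    """Kernel value at an integer offset from the centre."""
    i, j, l = (c + half for c in point)
    return kernel[i, j, l]

half = 3
box = box_kernel(half, box_half=2)
gauss = gaussian_kernel(half, sigma=1.0)

# Two points at the same Euclidean distance (3) from the centre.
p_axis, p_off_axis = (3, 0, 0), (2, 2, 1)

box_gap = abs(value_at(box, half, p_axis) - value_at(box, half, p_off_axis))
gauss_gap = abs(value_at(gauss, half, p_axis) - value_at(gauss, half, p_off_axis))
print(box_gap, gauss_gap)
```

Both kernels are unchanged by a 90-degree rotation of the grid, but only the Gaussian takes the same value at the two equidistant points; the box filter's value depends on direction as well as radius.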
While the standard average pooling can be unstable to continuous rotations (it uses box-filters which are equivariant to 90deg rotations but are not really isotropic in all directions), the antialiased l…