Question about pooling with Fourier representations in R^3 #65

kalekundert · 2023-07-12T19:35:38Z

I'm hoping that I can get some feedback on the best way to do pooling operations when using Fourier representations in $\mathbb{R}^3$. Every way that I can think of to do this seems pretty flawed:

- Most of the Pointwise* pooling modules explicitly state that "not all representations support this kind of pooling. In general, only representations which support pointwise non-linearities do." This makes sense to me, since you wouldn't expect a pointwise operation to maintain equivariance when applied to a Fourier representation, so I'm assuming that all of these modules are off the table.
- The PointwiseAvgPoolAntialiased3D module doesn't contain the above warning, surprisingly. It's also used in the se3_3Dcnn.py example, which does seem to maintain rotational equivariance, and I think that it might be the algorithm used by [Weiler2018]. That said, this is still a pointwise algorithm, so it shouldn't be equivariant when applied to Fourier representations. And in my own tests, it's not even close to maintaining equivariance. I guess the thing I really don't understand here is how/why the SE(3) example maintains equivariance despite using this module. Is it something to do with the residual blocks? Or is the SE(3) example not actually equivariant, and the error is just small?
- The NormMaxPool module only works in $\mathbb{R}^2$, although this is presumably just an implementation detail and not a theoretical limitation. That said, my understanding is that max pooling breaks translational equivariance unless there's an antialiasing step [Zhang2019], so this wouldn't be an ideal solution even if it worked.
- R3Conv(stride=2) can be thought of as a kind of pooling, but it also breaks translational equivariance.

On this topic, I'm also hoping I can get some feedback on an idea I had to do pooling in the spatial domain, similar to the way that FourierPointwise works:

1. Choose a grid on which to evaluate the inverse Fourier transform of the filters.
2. Perform an inverse Fourier transform for each voxel in the input layer using this grid.
3. Apply any pointwise pooling algorithm to the resulting image.
   - Presumably a pointwise nonlinearity could be applied at the same time.
4. Perform a Fourier transform to produce the output layer.
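A minimal PyTorch sketch of these steps, not based on any escnn internals. For brevity it assumes a hypothetical SO(2) stand-in instead of SO(3): each voxel carries band-limited SO(2) Fourier coefficients (frequencies 0..L), the "grid" is N equally spaced angles, the IFT is the sampling matrix `A`, and the FT is a least-squares projection with its pseudo-inverse. The names `so2_sampling_matrix` and `fourier_pool3d` are illustrative only.

```python
import math
import torch
import torch.nn.functional as F


def so2_sampling_matrix(L: int, N: int) -> torch.Tensor:
    # Hypothetical stand-in for the SO(3) grid: rows are N equally spaced angles,
    # columns the band-limited SO(2) basis 1, cos t, sin t, ..., cos Lt, sin Lt.
    t = torch.arange(N, dtype=torch.float64) * (2 * math.pi / N)
    cols = [torch.ones_like(t)]
    for k in range(1, L + 1):
        cols += [torch.cos(k * t), torch.sin(k * t)]
    return torch.stack(cols, dim=1)  # shape (N, 2L + 1)


def fourier_pool3d(x, A, pool=F.max_pool3d, kernel_size=2, stride=2):
    # x has shape (B, 2L + 1, D, H, W): one vector of Fourier coefficients per voxel.
    # Step 2: inverse FT, i.e. evaluate the signal at every grid point, per voxel.
    samples = torch.einsum("nc,bcdhw->bndhw", A, x)
    # Step 3: pointwise pooling, done independently for every grid point.
    # (A pointwise nonlinearity could also be applied to `samples` here.)
    pooled = pool(samples, kernel_size=kernel_size, stride=stride)
    # Step 4: FT, here a least-squares projection back onto the coefficients.
    return torch.einsum("cn,bndhw->bcdhw", torch.linalg.pinv(A), pooled)


A = so2_sampling_matrix(L=2, N=16)  # Step 1: choose the grid.
x = torch.randn(2, 5, 8, 8, 8, dtype=torch.float64)
y = fourier_pool3d(x, A)
print(y.shape)  # torch.Size([2, 5, 4, 4, 4])
```

Passing `pool=F.avg_pool3d` instead of max pooling gives exactly the same result as average pooling the coefficients directly, because the sampling and projection are linear maps applied identically at every voxel.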

Some specific questions I have about this algorithm:

- Would this approach be expected to do a better job of maintaining equivariance than any of the existing options? It's possible both that I'm overlooking some flaw in this algorithm, and/or not understanding the capabilities of the existing algorithms.
- Is there anything about this approach that seems prohibitively expensive? I'd guess that it'd be about as expensive as FourierPointwise, but I'm not sure.
- If this does seem like a worthwhile algorithm, is there any chance I can get help implementing it? I think I might be able to do it myself, but it would be right at the edge of my abilities.

Edit: I accidentally posted this discussion before I was finished writing it, so I apologize if that caused any confusion.

Answered by Gabri95

Gabri95 (Maintainer) · 2023-07-13T13:45:05Z

Hi @kalekundert

Thanks for opening this discussion! I think this can be useful for many other users.

First, note that all pointwise average pooling layers support any representation. Only max pooling and norm max pooling layers have specific requirements.
The reason why average pooling (theoretically) works is that it is just a convolution with an isotropic filter.
Indeed, these isotropic filters belong to the space of equivariant kernels that are learnable by any RdConv layer.
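A small check of this argument in plain PyTorch (not escnn; all names here are illustrative): average pooling equals a strided convolution with a constant box filter applied to each channel separately, so it commutes with any matrix acting on the channels of a voxel, i.e. with any fiber representation. The box filter is additionally invariant to 90-degree rotations of the voxel grid. A random orthogonal matrix is assumed as a stand-in for the representation matrix.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
C = 4  # number of channels, i.e. the size of the fiber representation
x = torch.randn(1, C, 8, 8, 8, dtype=torch.float64)

# Average pooling as a strided convolution with a constant (box) filter that is
# applied to every channel separately (groups=C), so channels are never mixed.
box = torch.full((C, 1, 2, 2, 2), 1.0 / 8, dtype=torch.float64)
box_err = (F.conv3d(x, box, stride=2, groups=C) - F.avg_pool3d(x, 2, 2)).abs().max().item()


def act_on_fibers(M: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
    # Multiply the channel vector at every voxel by the matrix M.
    return torch.einsum("ij,bjdhw->bidhw", M, t)


# A random orthogonal C x C matrix stands in for rho(g).
Q, _ = torch.linalg.qr(torch.randn(C, C, dtype=torch.float64))
fiber_err = (F.avg_pool3d(act_on_fibers(Q, x), 2, 2)
             - act_on_fibers(Q, F.avg_pool3d(x, 2, 2))).abs().max().item()


def rot90_grid(t: torch.Tensor) -> torch.Tensor:
    # 90-degree rotation of the voxel grid in the plane of the last two axes.
    return torch.rot90(t, 1, dims=(3, 4))


grid_err = (F.avg_pool3d(rot90_grid(x), 2, 2)
            - rot90_grid(F.avg_pool3d(x, 2, 2))).abs().max().item()
print(box_err, fiber_err, grid_err)  # all at floating-point round-off level
```

Since a rotation of a field both rotates the grid and applies the fiber representation, the two checks together cover 90-degree rotations; arbitrary rotations are not tested here.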

While the standard average pooling can be unstable to continuous rotations (it uses box filters, which are equivariant to 90° rotations but are not really isotropic in all directions), the antialiased layers use wider and smoother isotropic filters (Gaussian filters sampled on wider grids), so they are more stable.
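As a rough illustration of "Gaussian filter, then downsample", here is a minimal PyTorch sketch. It is a hypothetical helper, not the escnn implementation of the antialiased layers; the default `sigma=0.6` and the 3-sigma truncation radius are arbitrary assumptions.

```python
import math
import torch
import torch.nn.functional as F


def gaussian_kernel3d(sigma: float, radius: int) -> torch.Tensor:
    # Sample an isotropic Gaussian on a (2*radius + 1)^3 grid and normalise it.
    coords = torch.arange(-radius, radius + 1, dtype=torch.float64)
    zz, yy, xx = torch.meshgrid(coords, coords, coords, indexing="ij")
    k = torch.exp(-(xx**2 + yy**2 + zz**2) / (2 * sigma**2))
    return k / k.sum()


def gaussian_pool3d(x: torch.Tensor, sigma: float = 0.6, stride: int = 2) -> torch.Tensor:
    # Hypothetical helper: blur every channel with the same Gaussian
    # (groups=channels, so channels are never mixed), then subsample.
    channels = x.shape[1]
    radius = max(1, math.ceil(3 * sigma))
    k = gaussian_kernel3d(sigma, radius).to(x.dtype)
    weight = k.expand(channels, 1, *k.shape).contiguous()
    return F.conv3d(x, weight, stride=stride, padding=radius, groups=channels)


x = torch.randn(1, 4, 8, 8, 8, dtype=torch.float64)
y = gaussian_pool3d(x)
print(y.shape)  # torch.Size([1, 4, 4, 4, 4])
```

Like the box filter, the sampled Gaussian is applied identically to every channel, so the fiber argument carries over unchanged; the difference is that the filter is a sampled isotropic function rather than a cube.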

That being said, discretisation always plays an important role in practice. No downsampling operation will ever be perfectly equivariant to continuous rotations, but the antialiasing operator reduces the artefacts that deteriorate equivariance.

Regarding your proposed pooling layer, I think that design makes sense!
It allows one to construct something like a max pooling layer for continuous groups.
However, note that if you just apply average pooling (regardless of antialiasing), your solution is essentially equivalent to applying average pooling directly to the Fourier features, since the Fourier and inverse Fourier transforms (FT/IFT) commute with the averaging operation.
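A short numerical illustration of why this commutation holds, under a simplifying assumption: the IFT is modelled as a fixed sampling matrix `A` (band-limited SO(2) Fourier series sampled at N angles, a hypothetical stand-in for the SO(3) grid) applied at every voxel, and the FT as its pseudo-inverse. Averaging is linear and identical for every grid point, so it passes through `A`; max pooling does not.

```python
import math
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical IFT: sample band-limited SO(2) Fourier series (frequencies 0..L)
# at N equally spaced angles. The FT is the pseudo-inverse of this matrix.
L, N = 2, 16
t = torch.arange(N, dtype=torch.float64) * (2 * math.pi / N)
A = torch.stack([torch.ones_like(t)]
                + [f(k * t) for k in range(1, L + 1) for f in (torch.cos, torch.sin)], dim=1)
A_pinv = torch.linalg.pinv(A)  # A_pinv @ A is the identity up to round-off


def ift(c):
    return torch.einsum("nc,bcdhw->bndhw", A, c)


def ft(s):
    return torch.einsum("cn,bndhw->bcdhw", A_pinv, s)


x = torch.randn(1, 2 * L + 1, 4, 4, 4, dtype=torch.float64)

# Averaging is linear and identical for every grid point, so
# FT(avg(IFT(x))) = A_pinv @ A @ avg(x) = avg(x).
avg_gap = (ft(F.avg_pool3d(ift(x), 2, 2)) - F.avg_pool3d(x, 2, 2)).abs().max().item()

# Max pooling is not linear: pooling the samples differs from pooling the
# coefficients, and the pooled samples are in general no longer band-limited.
max_gap = (ft(F.max_pool3d(ift(x), 2, 2)) - F.max_pool3d(x, 2, 2)).abs().max().item()

print(f"average pooling gap: {avg_gap:.1e}, max pooling gap: {max_gap:.1e}")
```

The projection in `ft` is also where max pooling can introduce the extra aliasing discussed below: whatever part of the pooled signal lies outside the band limit is discarded.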

If I remember correctly, a similar pooling strategy was also used in some other works, maybe this paper for processing point clouds.

Let me answer your questions now:

> Would this approach be expected to do a better job of maintaining equivariance than any of the existing options? It's possible both that I'm overlooking some flaw in this algorithm, and/or not understanding the capabilities of the existing algorithms.

I expect this operation will actually reduce the equivariance: the max operation is a very non-linear operation which introduces yet another source of discretisation/aliasing. To reduce this problem, you will need enough samples during the inverse FT. Simple average pooling, instead, is theoretically equivariant even without applying the IFT/FT so it is only affected by the discretization of the filter (and not the discretization of the IFT/FT).

Note that this does not necessarily mean this layer is not useful! Max pooling is probably more expressive, and some small equivariance error might be tolerable. In that case, this layer might provide a better trade-off than average pooling. However, this is a question that is better answered empirically.

> Is there anything about this approach that seems prohibitively expensive? I'd guess that it'd be about as expensive as FourierPointwise, but I'm not sure.

That's correct. It will probably be more expensive than a normal average pooling but not more expensive than the non-linearities which are already being used in the network. Assuming one typically uses more non-linear layers than downsampling layers, I don't expect this to be prohibitively expensive.

> If this does seem like a worthwhile algorithm, is there any chance I can get help implementing it?

Clearly! 😄
I'd recommend just starting from the code of FourierPointwise and replacing the non-linear activation in its forward pass with a downsampling operation. This is pretty much everything you need to do (except maybe writing some docs and unit tests).

I hope this answers your questions, and let me know if you want to try to implement this module!

Thanks again for the interesting question :)

Best,
Gabriele

2 replies

kalekundert Jul 14, 2023
Author

Thanks for the very helpful reply! I have a couple follow-up questions, and they ended up being kinda long, so I wanted to say in advance that I really appreciate any time you take reading and answering them. Don't feel obligated to reply to everything, either. All of these questions are just for my own understanding and aren't preventing me from using the library.

It's a good point that the average pooling operations are just convolutions with isotropic filters; I wasn't thinking of them like that, but it makes sense. That said, it's still not obvious to me that isotropic filters should necessarily maintain equivariance. I tried to arrive at this result from the ideas presented in [Weiler2018], but ended up getting stuck. I'm hoping you can show me how to proceed.

I understand that any convolutional kernel satisfying the "G-steerability constraint" will maintain equivariance:

$$ \kappa(rx) = \rho_2(r) \kappa(x) \rho_1(r)^{-1} $$

In the case that the input and output fields are both scalar, then $\rho_1$ and $\rho_2$ are both trivial representations, and the above constraint reduces to $\kappa(rx) = \kappa(x)$. Clearly this form of the constraint is satisfied (only) by isotropic kernels.

However, the case where the input and output fields can both be treated as Fourier coefficients is more complicated. In this case, the fibers are transformed by irreducible representations of SO(3), which are the Wigner-D matrices $D$, so the constraint becomes:

$$ \kappa^{jl}(rx) = D^j(r) \kappa^{jl}(x) D^{l}(r)^{-1} $$

Here, the input fibers are in $\mathbb{R}^{2l + 1}$ and the output fibers are in $\mathbb{R}^{2j+1}$. The kernel is a matrix-valued function $\kappa^{jl} \colon \mathbb{R}^3 \to \mathbb{R}^{(2j+1) \times (2l+1)}$. To arrive at a solution to this constraint, we can begin by vectorizing the kernel:

$$ \mathrm{vec}(\kappa^{jl}(rx)) = [D^j \otimes D^l](r) \mathrm{vec}(\kappa^{jl}(x)) $$

$D^j \otimes D^l$ is a tensor (Kronecker) product of irreducible representations, but is not irreducible itself. We can decompose it into a direct sum of irreducible representations by applying a change-of-basis matrix $Q$ derived from the Clebsch-Gordan coefficients:

$$ [D^j \otimes D^l](r) = Q^T \left[ \bigoplus_{J=|j-l|}^{j+l} D^J(r) \right] Q $$
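As a quick dimension check on this decomposition (a standard identity, included here for reference), both sides act on vectors of the same size:

$$ \sum_{J=|j-l|}^{j+l} (2J+1) = (2j+1)(2l+1), \qquad \text{e.g. } j = l = 1: \; 1 + 3 + 5 = 9 = 3 \cdot 3. $$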

If we define $\eta^{jl}(x) \coloneqq Q \; \mathrm{vec}(\kappa^{jl}(x))$, then the steerability constraint becomes:

$$ \eta^{jl}(rx) = \left[ \bigoplus_{J=|j-l|}^{j+l} D^J(r) \right] \eta^{jl}(x) $$

The solution to this equation is recognized as being a direct sum of the spherical harmonics. It follows that any linear combination of the spherical harmonics can be made into an equivariant kernel.

I can imagine that if you wanted to make an isotropic kernel, you would start with just $\eta^{jl} = Y_0^0$. But then you'd have to undo the change of basis and reshape the resulting vector back into a tensor. It's not clear to me that the result of those operations would be a tensor that is isotropic in all of the spatial dimensions and constant in the channel dimension. Is this something that follows from the particular properties of the Clebsch-Gordan coefficients, or is there perhaps an easier way to think about this?

I also tried using BlockBasisExpansion to explicitly calculate the convolutional kernel corresponding to the $Y^0_0$ spherical harmonic, but I got stuck with that, too. Here's the snippet I came up with:

```python
import torch
from escnn.gspaces import rot3dOnR3
from escnn.nn import FieldType, R3Conv

gspace = rot3dOnR3()
group = gspace.fibergroup

irreps = group.bl_irreps(1)
fourier_repr = group.spectral_regular_representation(*irreps)

in_type = FieldType(gspace, [fourier_repr])
out_type = FieldType(gspace, [fourier_repr])

conv = R3Conv(in_type, out_type, 3)
be = conv.basisexpansion

w = torch.zeros(44)
w[0] = 1

print(be.forward(w))  # All zeros.
```

My understanding is that the inputs to be.forward() are the coefficients for the linear combination of spherical harmonics. So I reasoned that $(1, 0, 0, \cdots)$ would select just the first, isotropic harmonic and none of the others. That said, I only expected to need 9 coefficients, because $j = l = 1$, and so $\displaystyle \bigoplus_{J=|j-l|}^{j+l} D^J(r)$ leads to a direct sum of 1 $Y^m_0$ harmonic, 3 $Y^m_1$ harmonics, and 5 $Y^m_2$ harmonics. But 44 coefficients were required, and I couldn't figure out where that number came from. Maybe I'm neglecting the radial component of the kernel?

If it's not too hard, I'd appreciate if you could explain what each position in the w vector represents, and show how to calculate a kernel based only on the $Y^0_0$ harmonic.

Thanks for your comments on the Fourier pooling idea. I'll admit that I don't understand why average pooling commutes with the FT/IFT, but I haven't given it much thought yet and I'm willing to just accept it. (As you can probably tell, I spent all my time trying to wrap my mind around the isotropic kernels thing.) I agree with you about max pooling being more susceptible to discretization effects, but potentially more expressive. Like you say, I'd need to do an empirical comparison to see which approach is best for my application.

The implementation also sounds much easier than I thought it'd be. I thought I'd have to implement the FT/IFT myself, but just replacing the nonlinear activation in FourierPointwise sounds much more manageable. No guarantees that I'll get around to it any time soon, but it's definitely something I'd like to try.

Gabri95 Jul 18, 2023
Maintainer

hey @kalekundert

I'm happy this was helpful!
Let me try to quickly reply to your questions:

Consider the kernel constraint:

$$ \kappa(rx) = \rho_2(r) \kappa(x) \rho_1(r)^{-1} $$

Note that in average pooling we don't mix different channels, so $\rho_1 = \rho_2$.
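An isotropic convolution, then, is just a filter of the form $\kappa(x) = \varphi(\|x\|) I$, where $\varphi(\|x\|)$ is a function only of the radius and $I$ is the identity matrix.
You can verify that this kernel satisfies the constraint for any $\rho_1 = \rho_2 = \rho$, with no need to go deeper into spherical harmonics: since rotations preserve norms, $\kappa(rx) = \varphi(\|rx\|) I = \varphi(\|x\|) I = \rho(r) \, \varphi(\|x\|) I \, \rho(r)^{-1}$.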
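A numerical version of this check in plain PyTorch, as a sketch under stated assumptions: $\rho$ is taken to be the trivial representation plus the standard 3D representation of SO(3) (an arbitrary example choice), and the radial profile $\varphi$ is an arbitrary function. It also shows that a radial kernel which mixes channels with a generic matrix `M` violates the constraint.

```python
import torch

torch.manual_seed(0)


def random_rotation() -> torch.Tensor:
    # QR of a Gaussian matrix gives a random orthogonal matrix; flip a column
    # if needed so that det = +1, i.e. a proper rotation in SO(3).
    q, _ = torch.linalg.qr(torch.randn(3, 3, dtype=torch.float64))
    if torch.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]
    return q


def rho(R: torch.Tensor) -> torch.Tensor:
    # Example representation (an assumption): trivial (1x1) plus the standard 3D one.
    return torch.block_diag(torch.ones(1, 1, dtype=R.dtype), R)


def radial(x: torch.Tensor) -> torch.Tensor:
    # An arbitrary radial profile phi(||x||).
    return 1.0 / (1.0 + x.norm() ** 2)


M = torch.randn(4, 4, dtype=torch.float64)


def kappa_iso(x):
    return radial(x) * torch.eye(4, dtype=x.dtype)  # phi(||x||) I


def kappa_mixing(x):
    return radial(x) * M  # radial, but mixes channels with a generic matrix


def constraint_error(kappa, R, x) -> float:
    # max |kappa(Rx) - rho(R) kappa(x) rho(R)^{-1}|
    lhs = kappa(R @ x)
    rhs = rho(R) @ kappa(x) @ torch.linalg.inv(rho(R))
    return (lhs - rhs).abs().max().item()


R = random_rotation()
x = torch.randn(3, dtype=torch.float64)
print(constraint_error(kappa_iso, R, x))     # round-off level
print(constraint_error(kappa_mixing, R, x))  # clearly nonzero
```

The identity works because it commutes with every $\rho(r)$; a channel-mixing radial kernel would only satisfy the constraint if its matrix commuted with every $\rho(r)$ as well.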

> If it's not too hard, I'd appreciate if you could explain what each position in the w vector represents, and show how to calculate a kernel based only on the $Y^0_0$ harmonic.

You can easily see that using the BasisManager API.
Try this code!

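```python
# in_type and out_type as defined in the snippet above
conv = R3Conv(in_type, out_type, 3)
be = conv.basisexpansion

# Each weight w[i] multiplies one basis element; print its attributes.
for i, attr in enumerate(be.get_basis_info()):
    print(f'weight w[{i}]:')
    print(attr)
```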

> The implementation also sounds much easier than I thought it'd be. I thought I'd have to implement the FT/IFT myself, but just replacing the nonlinear activation in FourierPointwise sounds much more manageable. No guarantees that I'll get around to it any time soon, but it's definitely something I'd like to try.

Happy to hear this! Let me know if you try to implement it and if you need any help!

Best,
Gabriele
