Bug in backward pass SPO+, causing error on batched input #41

JopSchaap opened this issue Sep 23, 2024 · 2 comments
Labels
question (Further information is requested)

Comments

@JopSchaap

Hi PyEpo team,

There seems to be an error when I call backward on a loss produced by the SPOPlus module when the loss was computed on a batched input. Take, for instance, the test case defined below, where I call the SPOPlus module with a batched input. The initial forward pass works correctly, but when I call backward it gives the following error:

ERROR: test_batched_spoplus (__main__.TestLoss)

Traceback (most recent call last):
  File "<REST_OF_PATH>/tests/test_epo_spo_pluss.py", line 36, in test_batched_spoplus
    loss.backward() # Here it errors
  File "<REST_OF_PATH>/venv/lib/python3.10/site-packages/torch/_tensor.py", line 525, in backward
    torch.autograd.backward(
  File "<REST_OF_PATH>/venv/lib/python3.10/site-packages/torch/autograd/__init__.py", line 267, in backward
    _engine_run_backward(
  File "<REST_OF_PATH>/venv/lib/python3.10/site-packages/torch/autograd/graph.py", line 744, in _engine_run_backward
    return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
  File "<REST_OF_PATH>/venv/lib/python3.10/site-packages/torch/autograd/function.py", line 301, in apply
    return user_fn(self, *args)
  File "<REST_OF_PATH>/venv/lib/python3.10/site-packages/pyepo/func/spoplus.py", line 120, in backward
    return grad_output * grad, None, None, None, None
RuntimeError: The size of tensor a (2) must match the size of tensor b (4) at non-singleton dimension 1

import unittest

from pyepo.func.spoplus import SPOPlus
from pyepo.model.grb.knapsack import knapsackModel
import torch

WEIGHTS = [[0,0,0,0], [1, 1, 1, 1]]
CAPACITY = 2

class TestLoss(unittest.TestCase):

    def fix_params(self, net, output):
        # Pin the network to a constant prediction: zero weights, fixed bias.
        with torch.no_grad():
            net.weight.fill_(0.0)
            output = torch.tensor(output, dtype=float)
            net.bias = torch.nn.Parameter(output.flatten())

    def test_batched_spoplus(self):
        model = knapsackModel(WEIGHTS, CAPACITY)
        spo_plus = SPOPlus(model)
        batched_input_data = torch.ones((2,4), dtype=float)
        net = torch.nn.Linear(4, 4, dtype=float)
        optim = torch.optim.SGD(net.parameters(), lr=0.1)

        self.fix_params(net, [4.0, 4.0, 1.0, 1.0])
        true_cost = torch.tensor([[0, 1, 2, 3]]*2, dtype=float)
        pred_cost = net(batched_input_data)
        true_sol = torch.tensor([[0, 0, 1, 1]]*2)
        true_obj = torch.tensor([7.0, 7.0], dtype=float)

        optim.zero_grad()
        loss = spo_plus(pred_cost, true_cost, true_sol, true_obj)

        # The code gets here correctly

        loss.backward() # Here it errors

if __name__ == "__main__":
    unittest.main()

I was able to fix this bug myself; I think it is related to an incorrect shape in the backward pass of SPO+.

To fix this bug, you could change the backward function:

    @staticmethod
    def backward(ctx, grad_output):
        """
        Backward pass for SPO+
        """
        w, wq = ctx.saved_tensors
        optmodel = ctx.optmodel
        if optmodel.modelSense == EPO.MINIMIZE:
            grad = 2 * (w - wq)
        if optmodel.modelSense == EPO.MAXIMIZE:
            grad = 2 * (wq - w)
        return grad_output * grad, None, None, None, None

To the following:

    @staticmethod
    def backward(ctx, grad_output: torch.Tensor):
        """
        Backward pass for SPO+
        """
        w, wq = ctx.saved_tensors
        optmodel = ctx.optmodel
        if optmodel.modelSense == EPO.MINIMIZE:
            grad = 2 * (w - wq)
        if optmodel.modelSense == EPO.MAXIMIZE:
            grad = 2 * (wq - w)
        # this line is new: add a trailing dim so grad_output broadcasts over the costs
        grad_output = grad_output.unsqueeze(-1).expand(grad.shape)
        return grad_output * grad, None, None, None, None
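
As far as I can tell, grad_output arrives with the same shape as the per-sample loss returned by forward. In my test that is (batch_size,), while grad has shape (batch_size x num_items), and a 1-d tensor of length batch_size cannot broadcast against that; hence the size-2 vs size-4 mismatch above. Adding the trailing dimension lets the two broadcast.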

Kind regards,

Jop

@LucasBoTang
Collaborator

Hi Jop,

Thank you so much for your detailed feedback and for providing the potential fix. I’m currently looking into the issue and will continue testing it to ensure everything works correctly. In our tests, such as the one in this notebook, we haven't encountered this error. You might want to take a look at it as well to see if there are any differences in the setup that could help.

However, I’ll make sure to review your case more thoroughly and update accordingly.

Thanks again for your contribution and patience!

Best regards,
Bo

@LucasBoTang added the question (Further information is requested) label on Oct 17, 2024
@JopSchaap
Author

Hi Bo,

Sorry for the late reply, and thank you for the response.

After looking at the notebook you shared and digging further into my own code, I now realize why it is going wrong.
The SPO+ function takes a differently sized input tensor than I assumed.

Namely, I assumed the true_obj tensor should be 1-d with shape (batch_size); this is also what I did in the test above. However, it seems that the SPO+ implementation requires the true_obj tensor to have shape (batch_size x 1).
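
For completeness, a minimal sketch of the change this implies (assuming the (batch_size x 1) shape is the intended convention), applied to the test above:

    # give true_obj shape (batch_size, 1) rather than (batch_size,)
    true_obj = torch.tensor([7.0, 7.0], dtype=float).unsqueeze(-1)

With that shape, grad_output broadcasts against grad and backward should run without the size mismatch.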

If you feel that this is expected behavior, feel free to close this issue and the pull request.

Kind regards,
Jop
