Bug in backward pass SPO+, causing error on batched input #41

JopSchaap opened this issue Sep 23, 2024 · 2 comments
Labels
question (Further information is requested)

Comments

@JopSchaap

Hi PyEpo team,

There seems to be an error when I call backward on a loss produced by the SPOPlus module when the loss was computed on a batched input. Take, for instance, the test case defined below, where I call the SPOPlus module with a batched input. The initial forward pass works correctly, but when I call backward it gives the following error:

ERROR: test_batched_spoplus (__main__.TestLoss)

Traceback (most recent call last):
  File "<REST_OF_PATH>/tests/test_epo_spo_pluss.py", line 36, in test_batched_spoplus
    loss.backward() # Here it errors
  File "<REST_OF_PATH>/venv/lib/python3.10/site-packages/torch/_tensor.py", line 525, in backward
    torch.autograd.backward(
  File "<REST_OF_PATH>/venv/lib/python3.10/site-packages/torch/autograd/__init__.py", line 267, in backward
    _engine_run_backward(
  File "<REST_OF_PATH>/venv/lib/python3.10/site-packages/torch/autograd/graph.py", line 744, in _engine_run_backward
    return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
  File "<REST_OF_PATH>/venv/lib/python3.10/site-packages/torch/autograd/function.py", line 301, in apply
    return user_fn(self, *args)
  File "<REST_OF_PATH>/venv/lib/python3.10/site-packages/pyepo/func/spoplus.py", line 120, in backward
    return grad_output * grad, None, None, None, None
RuntimeError: The size of tensor a (2) must match the size of tensor b (4) at non-singleton dimension 1

import unittest

from pyepo.func.spoplus import SPOPlus
from pyepo.model.grb.knapsack import knapsackModel
import torch

WEIGHTS = [[0,0,0,0], [1, 1, 1, 1]]
CAPACITY = 2

class TestLoss(unittest.TestCase):

    def fix_params(self, net, output):
        # Pin the network to a constant prediction: zero weights, fixed bias.
        with torch.no_grad():
            net.weight.fill_(0.0)
            output = torch.tensor(output, dtype=float)
            net.bias = torch.nn.Parameter(output.flatten())

    def test_batched_spoplus(self):
        model = knapsackModel(WEIGHTS, CAPACITY)
        spo_plus = SPOPlus(model)
        batched_input_data = torch.ones((2,4), dtype=float)
        net = torch.nn.Linear(4, 4, dtype=float)
        optim = torch.optim.SGD(net.parameters(), lr=0.1)

        self.fix_params(net, [4.0, 4.0, 1.0, 1.0])
        true_cost = torch.tensor([[0, 1, 2, 3]]*2, dtype=float)
        pred_cost = net(batched_input_data)
        true_sol = torch.tensor([[0, 0, 1, 1]]*2)
        true_obj = torch.tensor([7.0, 7.0], dtype=float)

        optim.zero_grad()
        loss = spo_plus(pred_cost, true_cost, true_sol, true_obj)

        # The code gets here correctly

        loss.backward() # Here it errors

if __name__ == "__main__":
    unittest.main()

I was able to fix this bug myself; I think it is related to an incorrect shape in the backward pass of SPO+.

To fix this bug, you could change the backward function:

    @staticmethod
    def backward(ctx, grad_output):
        """
        Backward pass for SPO+
        """
        w, wq = ctx.saved_tensors
        optmodel = ctx.optmodel
        if optmodel.modelSense == EPO.MINIMIZE:
            grad = 2 * (w - wq)
        if optmodel.modelSense == EPO.MAXIMIZE:
            grad = 2 * (wq - w)
        return grad_output * grad, None, None, None, None

To the following:

    @staticmethod
    def backward(ctx, grad_output: torch.Tensor):
        """
        Backward pass for SPO+
        """
        w, wq = ctx.saved_tensors
        optmodel = ctx.optmodel
        if optmodel.modelSense == EPO.MINIMIZE:
            grad = 2 * (w - wq)
        if optmodel.modelSense == EPO.MAXIMIZE:
            grad = 2 * (wq - w)
        # this line is new: add a trailing dim so grad_output broadcasts over the costs
        grad_output = grad_output.unsqueeze(-1).expand(grad.shape)
        return grad_output * grad, None, None, None, None
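
As far as I can tell, grad_output arrives with the same shape as the per-sample loss returned by forward. In my test that is (batch_size,), while grad has shape (batch_size x num_items), and a 1-d tensor of length batch_size cannot broadcast against that; hence the size-2 vs size-4 mismatch above. Adding the trailing dimension lets the two broadcast.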

Kind regards,

Jop

@LucasBoTang
Collaborator

Hi Jop,

Thank you so much for your detailed feedback and for providing the potential fix. I’m currently looking into the issue and will continue testing it to ensure everything works correctly. In our tests, such as the one in this notebook, we haven't encountered this error. You might want to take a look at it as well to see if there are any differences in the setup that could help.

However, I’ll make sure to review your case more thoroughly and update accordingly.

Thanks again for your contribution and patience!

Best regards,
Bo

@LucasBoTang added the question (Further information is requested) label on Oct 17, 2024
@JopSchaap
Author

Hi Bo,

Sorry for the late reply, and thank you for the response.

After looking at the notebook you shared and digging further into my own code, I now realize why it is going wrong.
The SPO+ function takes a differently sized input tensor than I assumed.

Namely, I assumed the true_obj tensor should be 1-d with shape (batch_size); this is also what I did in the test above. However, it seems that the SPO+ implementation requires the true_obj tensor to have shape (batch_size x 1).
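
For completeness, a minimal sketch of the change this implies (assuming the (batch_size x 1) shape is the intended convention), applied to the test above:

    # give true_obj shape (batch_size, 1) rather than (batch_size,)
    true_obj = torch.tensor([7.0, 7.0], dtype=float).unsqueeze(-1)

With that shape, grad_output broadcasts against grad and backward should run without the size mismatch.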

If you feel that this is expected behavior, feel free to close this issue and the pull request.

Kind regards,
Jop
