
Some problems #67

Open
TruongKhang opened this issue Feb 18, 2020 · 5 comments

@TruongKhang

Hi, I used your template and I had some problems as follows:

  1. In trainer.py, at line 53, the call met(output, target) should be changed to met(output, target).item(). Otherwise, memory usage keeps growing with every batch. I tested with PyTorch 0.4.1 and ran into this problem.
  2. In my opinion, the computed evaluation metrics are not correct. Should we pass the batch size to the update function, like this?

# at line 53
self.train_metrics.update(met.__name__, met(output, target).item(), n=batch_size)

What do you think about that?
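
Putting the two points together, a minimal sketch of how the metric update around line 53 of trainer.py could look (an illustration only, assuming the template's update(name, value, n=...) tracker interface shown above and taking the batch size from the current batch rather than from the configuration):

for met in self.metric_ftns:
    # .item() returns a plain Python float, so no computation graph is kept
    # alive across batches; n weights the running average by the real batch size
    self.train_metrics.update(met.__name__, met(output, target).item(), n=output.shape[0])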

@SunQpark
Collaborator

  1. That part of the trainer assumes that metric functions return a Python scalar, not a tensor. If some of your metric functions return a tensor, an OOM may happen, since the computation graph keeps extending from the returned tensor. Hence, call .item() inside your metric function (see the sketch after this list).
  2. Using n=1 yields a wrong average metric only when the batch size is not constant; if it is constant, n simply cancels out in the calculation. In the example MNIST code, the average metric computation will be "slightly" wrong because the last batch of each epoch has a different size than the other batches. Your fix is correct as long as batch_size comes from the current data, e.g. batch_size=output.shape[0], and not from the global configuration, e.g. batch_size=data_loader.batch_size.
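
A minimal sketch of a metric function written that way (accuracy here is only an illustrative example, not necessarily the one shipped with the template):

import torch

def accuracy(output, target):
    # return a plain Python float; calling .item() inside the metric keeps
    # the computation graph from being retained between batches
    with torch.no_grad():
        pred = torch.argmax(output, dim=1)
        correct = torch.sum(pred == target).item()
    return correct / len(target)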

@TruongKhang
Author

Yes, I agree.

@sunlightsgy

I think it should be:

outputs, targets = [], []
with torch.no_grad():
    for batch_idx, (data, target) in enumerate(self.valid_data_loader):
        data, target = data.to(self.device), target.to(self.device)

        output = self.model(data)
        pred = torch.argmax(output, dim=1)
        loss = self.criterion(output, target)

        # collect detached outputs and targets across the whole validation set
        outputs.append(output.detach())
        targets.append(target.detach())

        self.writer.set_step((epoch - 1) * len(self.valid_data_loader) + batch_idx, 'valid')
        self.valid_metrics.update('loss', loss.item())
        self.writer.add_image('input', make_grid(data.cpu(), nrow=8, normalize=True))

    # compute each metric once over the full validation set instead of
    # averaging per-batch values
    outputs = torch.cat(outputs, dim=0)
    targets = torch.cat(targets, dim=0)
    for met in self.metric_ftns:
        self.valid_metrics.update(met.__name__, met(outputs, targets))

@TruongKhang
Author

@sunlightsgy , I get your point. Assume we have a dataset of size N, each image of size M x L, and batch size B. If N % B = 0, the results of the two approaches will be the same, right? But that is not true if N % B != 0.

@sunlightsgy

> @sunlightsgy , I get your point. Assume we have a dataset of size N, each image of size M x L, and batch size B. If N % B = 0, the results of the two approaches will be the same, right? But that is not true if N % B != 0.

If the metric is accuracy, what you said is right. Other metrics may still fail whenever len(target) is not fixed across batches.
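
To make the varying-batch-size point concrete, here is a small made-up example for accuracy with N = 10 and B = 4, so the batches have sizes 4, 4 and 2:

correct = [4, 2, 0]   # correct predictions per batch (hypothetical numbers)
sizes   = [4, 4, 2]   # last batch is smaller because 10 % 4 != 0

per_batch_acc = [c / n for c, n in zip(correct, sizes)]                          # [1.0, 0.5, 0.0]
mean_n1       = sum(per_batch_acc) / len(per_batch_acc)                          # 0.5
weighted_mean = sum(a * n for a, n in zip(per_batch_acc, sizes)) / sum(sizes)    # 0.6
full_set_acc  = sum(correct) / sum(sizes)                                        # 0.6

# The n=batch_size running average matches the accuracy over the full set;
# the n=1 average does not.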
