What is the idea behind breaking if classes[cls] >= self.shots in metadata creation? #32

Open
Jonas-Meier opened this issue Jan 27, 2021 · 6 comments

Comments

@Jonas-Meier

In earlier versions of metadata_coco.py, the loop over annotations of an image would always break if we find an annotation for a category we already sampled enough annotations for:

    if classes[cls] >= self.shots:
        break

Now, we only break in this case if we are in phase 1 and continue sampling of annotations for phase 2:

    if classes[cls] >= self.shots:
        if self.phase == 2:
            continue
        else:
            break

What is the general idea behind breaking instead of continuing in this case?
Why does the current version behave differently depending on the phase, and why not always continue the loop so that we can check whether the image has annotations of categories we have not yet sampled enough annotations for?
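
To make the difference concrete, here is a minimal sketch of the two variants applied to the annotations of a single image. It is not the code from metadata_coco.py: the function name `pick_annotations`, its arguments, and the `legacy` flag are made up for illustration, and `classes` is assumed to map a category to the number of annotations sampled so far. Only the check quoted above is modeled.

```python
def pick_annotations(annotation_classes, classes, shots, phase, legacy=False):
    """Return the indices of the annotations sampled from one image.

    Hypothetical helper: `annotation_classes` lists the category of each
    annotation in the image, `classes` counts what was sampled so far.
    """
    picked = []
    for idx, cls in enumerate(annotation_classes):
        if classes[cls] >= shots:
            if not legacy and phase == 2:
                continue  # current version, phase 2: look at the next annotation
            break  # earlier versions, and phase 1 today: give up on this image
        classes[cls] += 1
        picked.append(idx)
    return picked


# One image whose first annotation belongs to an already full category.
image = ["car", "person"]
old = pick_annotations(image, {"car": 3, "person": 1}, shots=3, phase=2, legacy=True)
new = pick_annotations(image, {"car": 3, "person": 1}, shots=3, phase=2)
print(old, new)
```

In this toy case the earlier behavior samples nothing from the image, while the phase 2 behavior of the current version skips the car annotation and still picks up the person annotation.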

@YoungXIAO13
Owner

Hi @Jonas-Meier

When the break is used, we find samples of different categories from many different images, which creates a large variation in object appearance as well as in context. Although this variation might benefit the generalization capacity of the network, it would create an unfair data selection in the few-shot setting (as mentioned here).

That's why @Ze-Yang has proposed a new data selection strategy, which is designed to replace the original data selection strategy used in Meta R-CNN.

@Jonas-Meier
Author

I understand that we want to sample annotations of many different images for maximum variance, but isn't that already ensured by

    if self.phase == 1:
        break

for phase 1? These lines make sure that we sample at most one annotation per image in phase 1.

Why still break on classes[cls] >= self.shots in phase 1?
Assume, for example, that we have the categories car and person and that we are in phase 1, trying to sample 200 annotations for each category. Now we are at a point where we have already sampled 200 annotations for category car but are still missing some annotations for category person, let's say we're at 175 for that category.
If we now iterate over a new image with annotations for both categories, car and person, we would break the iteration if we got a car annotation first, since we already have enough annotations for that class. But we could still sample an annotation for category person from that image. Or is there any reason for not sampling any annotation of that image at all?
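
This scenario can be replayed with a small simulation. It is a simplified sketch, not the repository's sampling code: `sample_phase1`, its `on_full` parameter, and the toy image are hypothetical, and an image is reduced to the list of categories of its annotations.

```python
def sample_phase1(images, classes, shots, on_full):
    """Sample at most one annotation per image, as in phase 1.

    `on_full` is "break" or "continue" and decides what happens when an
    annotation belongs to a category that already has `shots` samples.
    """
    sampled = []  # (image index, annotation index) pairs
    for img_idx, annotation_classes in enumerate(images):
        for ann_idx, cls in enumerate(annotation_classes):
            if classes[cls] >= shots:
                if on_full == "continue":
                    continue
                break
            classes[cls] += 1
            sampled.append((img_idx, ann_idx))
            break  # phase 1: at most one annotation per image
    return sampled


# 200 car annotations are already sampled, person is stuck at 175.
new_image = [["car", "person"]]

with_break = {"car": 200, "person": 175}
sample_phase1(new_image, with_break, shots=200, on_full="break")

with_continue = {"car": 200, "person": 175}
sample_phase1(new_image, with_continue, shots=200, on_full="continue")

print(with_break["person"], with_continue["person"])
```

With break the image contributes nothing and person stays at 175. With continue the same image yields the missing person annotation, and the one-annotation-per-image rule is still respected.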

@YoungXIAO13
Owner

I see. I think the break here is more of a mechanism for preventing out-of-memory issues. If the loop kept adding objects, it would likely require a very large amount of memory without this design, especially for large datasets such as COCO.

Briefly, you can pick as many objects as you want for phase 1 as long as the memory requirement is satisfied.

@Jonas-Meier
Author

Do you mean the RAM used when finally saving the sampled images and masks to a file? How would consistently using continue cause more RAM usage than a break? Shouldn't the memory depend only on the number of images and annotations sampled?
I could imagine that we would save some memory by allowing multiple annotations to be sampled from a single image, since we would use fewer images, which would result in a smaller prn_images.pt file. The prn_mask.pt file, however, should always be the same size (for the same shots parameter).
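
A toy count illustrates this reasoning. The sketch below is a rough model under stated assumptions, not the actual file writer: it supposes that prn_images.pt grows with the number of distinct images used and prn_mask.pt with the number of sampled annotations, and `count_storage` is a hypothetical helper.

```python
def count_storage(images, shots, one_per_image):
    """Return (distinct images used, annotations sampled).

    Each image is given as the list of categories of its annotations.
    Full categories are skipped with `continue` in both settings.
    """
    classes = {}
    used_images = set()
    n_annotations = 0
    for img_idx, annotation_classes in enumerate(images):
        for cls in annotation_classes:
            if classes.get(cls, 0) >= shots:
                continue  # category is full, try the next annotation
            classes[cls] = classes.get(cls, 0) + 1
            used_images.add(img_idx)
            n_annotations += 1
            if one_per_image:
                break
    return len(used_images), n_annotations


# Four toy images, each with one car and one person annotation.
images = [["car", "person"]] * 4
print(count_storage(images, shots=2, one_per_image=True))
print(count_storage(images, shots=2, one_per_image=False))
```

In both settings the number of sampled annotations is capped at shots per category (4 here), while sampling several annotations per image needs only 2 of the 4 images instead of all 4.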

@YoungXIAO13
Owner

Yes, I mean RAM. I forgot that this condition would prevent the continuous adding of objects...

What you said makes sense, sorry for the misleading arguments in my previous reply.

@Jonas-Meier
Author

No problem. Thank you for the interesting discussion and clarification.

In conclusion, I think there is no reason left for using break over continue in either phase.
