Hi,
I'm extremely excited about support for large-scale, fast I/O in PyTorch. I am trying to run the example and downloaded ImageNet for it. As you might be aware, ImageNet is no longer available for download from http://www.image-net.org/download and is now hosted on Kaggle. I downloaded the dataset, but its format seems to have changed from the previous version, and it can no longer be loaded with PyTorch's built-in Dataset class. This leads to errors when creating shards.
Here's the error I get:
The archive ILSVRC2012_devkit_t12.tar.gz is not present in the root directory or is corrupted. You need to download it externally and place it in ./data
The downloaded dataset has the following structure:
.
├── Annotations
│   └── CLS-LOC
│       ├── train
│       └── val
├── Data
│   └── CLS-LOC
│       ├── test
│       ├── train
│       └── val
└── ImageSets
    └── CLS-LOC
        ├── test.txt
        ├── train_cls.txt
        ├── train_loc.txt
        └── val.txt
Can we come up with a workaround that works out of the box with the current distribution of ImageNet? The original PyTorch ImageNet example works with it, since it only needs the image files. I think the error originates from parsing the metadata while making shards, so a workaround should be possible. Happy to help with this.
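To make the mismatch concrete, here is a minimal sketch of a pre-flight check that tells the two layouts apart before any shards are written. The function name `detect_imagenet_layout` and the labels it returns are hypothetical, chosen only for illustration. It assumes the older layout can be recognised by the devkit archive named in the error message, and the Kaggle layout by the `Data/CLS-LOC/train` directory shown in the tree above.

```python
from pathlib import Path

# Archive name taken from the error message quoted above.
DEVKIT_ARCHIVE = "ILSVRC2012_devkit_t12.tar.gz"


def detect_imagenet_layout(root):
    """Guess which ImageNet layout lives under `root`.

    Returns "devkit" if the devkit archive is present, "kaggle" if the
    Data/CLS-LOC/train directory is present, and "unknown" otherwise.
    The labels are illustrative, not part of any library.
    """
    root = Path(root)
    if (root / DEVKIT_ARCHIVE).is_file():
        return "devkit"
    if (root / "Data" / "CLS-LOC" / "train").is_dir():
        return "kaggle"
    return "unknown"
```

A shard-creation script could call this first and pick a loader accordingly, rather than failing partway through on the missing archive.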
Best,
Spandan
Spandan-Madan changed the title from "ImageNet downloaded from kaggle does not contain the right format" to "change in ImageNet format after being hosted on kaggle." on Apr 18, 2021
I found the solution: we can fall back on the ImageFolder dataset class that ships with torchvision. The ImageNet class inherits from it anyway, so the fix is straightforward.
Happy to create a pull request with this fix. Let me know!