This is a set of Scrapy pipelines that store downloaded files and images in nested folder structures instead of a single flat directory.
Given a scraped file named 05b40af07cb3284506acbf395452e0e93bfc94c8.jpg, you can choose one of the following folder structures:
Using the file name
Class: scrapy_folder_tree.ImagesHashTreePipeline
full
└── 0
    └── 5
        └── b
            └── 05b40af07cb3284506acbf395452e0e93bfc94c8.jpg
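The hash layout above can be sketched as a small path function. This is a minimal illustration, assuming the pipeline turns the first three characters of the file name into nested folders under full; hash_tree_path is a hypothetical helper, not part of the package's API.

```python
import posixpath


def hash_tree_path(file_name: str, root: str = "full", depth: int = 3) -> str:
    """Hypothetical helper: nest a file under one folder per leading character."""
    # "05b40af0...jpg" with depth=3 -> folders "0", "5", "b"
    levels = list(file_name[:depth])
    # posixpath keeps "/" separators regardless of the host OS
    return posixpath.join(root, *levels, file_name)


print(hash_tree_path("05b40af07cb3284506acbf395452e0e93bfc94c8.jpg"))
# full/0/5/b/05b40af07cb3284506acbf395452e0e93bfc94c8.jpg
```

Scrapy names downloaded files after the SHA1 hex digest of the request URL, so each character is one of 16 hex digits. Every folder level therefore holds at most 16 subfolders, and files spread roughly evenly across them.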
Using the crawling time
Class: scrapy_folder_tree.ImagesTimeTreePipeline
full
└── 0
    └── 11
        └── 48
            └── 05b40af07cb3284506acbf395452e0e93bfc94c8.jpg
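A comparable sketch for the time layout, assuming the three levels are the hour, minute and second at which the file was crawled, written without zero padding (as the 0 folder suggests). time_tree_path is a hypothetical helper for illustration only.

```python
import posixpath
from datetime import datetime


def time_tree_path(file_name: str, crawled_at: datetime, root: str = "full") -> str:
    """Hypothetical helper: nest a file under hour/minute/second folders."""
    # Assumption: no zero padding, so 00:11:48 becomes 0/11/48
    levels = [str(crawled_at.hour), str(crawled_at.minute), str(crawled_at.second)]
    return posixpath.join(root, *levels, file_name)


name = "05b40af07cb3284506acbf395452e0e93bfc94c8.jpg"
print(time_tree_path(name, datetime(2022, 1, 24, 0, 11, 48)))
# full/0/11/48/05b40af07cb3284506acbf395452e0e93bfc94c8.jpg
```

Under this assumed layout the date is ignored, so files crawled at the same time of day on different days end up in the same folder.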
Using the crawling date
Class: scrapy_folder_tree.ImagesDateTreePipeline
full
└── 2022
    └── 1
        └── 24
            └── 05b40af07cb3284506acbf395452e0e93bfc94c8.jpg
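The date layout follows the same pattern. This minimal sketch assumes year, month and day folders without zero padding (January appears as 1 above); date_tree_path is hypothetical and not part of the package.

```python
import posixpath
from datetime import date


def date_tree_path(file_name: str, crawled_on: date, root: str = "full") -> str:
    """Hypothetical helper: nest a file under year/month/day folders."""
    # Assumption: no zero padding, so January is "1" rather than "01"
    levels = [str(crawled_on.year), str(crawled_on.month), str(crawled_on.day)]
    return posixpath.join(root, *levels, file_name)


name = "05b40af07cb3284506acbf395452e0e93bfc94c8.jpg"
print(date_tree_path(name, date(2022, 1, 24)))
# full/2022/1/24/05b40af07cb3284506acbf395452e0e93bfc94c8.jpg
```

One trade-off of unpadded folder names is that a plain alphabetical listing puts 10 before 2; formatting with f"{crawled_on.month:02d}" would keep listings in calendar order if you adapt the idea.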
Install the package with pip:
pip install scrapy-folder-tree
Then enable one of the pipelines in your project's settings.py:
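ITEM_PIPELINES = {
    'scrapy_folder_tree.FilesHashTreePipeline': 300,
}
# Scrapy's files pipeline needs a storage root; replace with a real directory.
# For the Images* pipelines, set IMAGES_STORE instead.
FILES_STORE = '/path/to/valid/dir'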