carloscosta/hacksena
This is a Scrapy project I wrote for learning. The spider downloads and scrapes the Brazilian Lottery's results. Each lottery's results are published as HTML files inside a .zip file.

This project will teach you how to use Scrapy to process .zip files, how to organize multiple pipelines and how to produce JSON output from Items().

The URLs of each lottery and the spider code itself are defined in hacksena/spiders/hacksena_spider.py

The Spider yields a FiledownloadItem(Item), defined in hacksena/items.py

Then 3 different pipelines are executed in a fixed sequence; check hacksena/settings.py to see the definitions. The sequence is:

1) HacksenaPipeline(object): This pipeline extracts the zip file content (the HTML file) and returns an Item() with file_data = Selector()

2) ResultsPipeline(object): This pipeline takes the HTML extracted previously, uses the Selector() to extract the relevant parts and returns a ResultItem(), ready to be written to a JSON file

3) JsonWriterPipeline(object): This pipeline actually writes the JSON output

I hope you enjoy Scrapy as much as I am enjoying it :)

Carlos.
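The three-stage sequence above can be sketched roughly as follows. This is a minimal illustration and not the code in this repository: it uses only the standard library, so `html.parser` stands in for Scrapy's Selector(), plain dicts stand in for FiledownloadItem and ResultItem, and file_data holds the decoded HTML text instead of a Selector(). The field names `zip_bytes`, `lottery` and `cells`, the helpers `CellCollector`, `run_pipelines` and `make_sample_zip`, the `hacksena.pipelines` module path, the priority numbers and the latin-1 encoding are all assumptions made for the example.

```python
import io
import json
import zipfile
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Stdlib stand-in for Scrapy's Selector: collects the text of every <td>."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_td = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False

    def handle_data(self, data):
        if self._in_td:
            self.cells[-1] += data.strip()


class HacksenaPipeline:
    """Stage 1: pull the first HTML file out of the downloaded .zip bytes."""

    def process_item(self, item, spider):
        with zipfile.ZipFile(io.BytesIO(item["zip_bytes"])) as archive:
            html_name = next(
                name for name in archive.namelist()
                if name.lower().endswith((".htm", ".html"))
            )
            # Assumption: the published HTML is latin-1 encoded.
            item["file_data"] = archive.read(html_name).decode("latin-1")
        return item


class ResultsPipeline:
    """Stage 2: keep only the relevant parts (here: every table cell)."""

    def process_item(self, item, spider):
        parser = CellCollector()
        parser.feed(item["file_data"])
        return {"lottery": item["lottery"], "cells": parser.cells}


class JsonWriterPipeline:
    """Stage 3: write one JSON object per line to the given stream."""

    def __init__(self, stream):
        self.stream = stream

    def process_item(self, item, spider):
        self.stream.write(json.dumps(item) + "\n")
        return item


# In Scrapy the order is set by ITEM_PIPELINES in settings.py: lower numbers
# run first. The module path and the numbers below are assumed, not copied.
ITEM_PIPELINES = {
    "hacksena.pipelines.HacksenaPipeline": 100,
    "hacksena.pipelines.ResultsPipeline": 200,
    "hacksena.pipelines.JsonWriterPipeline": 300,
}


def run_pipelines(item, stream):
    """Hypothetical helper: chain the stages by hand, as Scrapy would."""
    stages = (HacksenaPipeline(), ResultsPipeline(), JsonWriterPipeline(stream))
    for stage in stages:
        item = stage.process_item(item, spider=None)
    return item


def make_sample_zip():
    """Build a tiny in-memory .zip with made-up HTML, for demonstration only."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(
            "results.htm",
            "<table><tr><td>1</td><td>04</td><td>15</td></tr></table>",
        )
    return buffer.getvalue()
```

To try it without a network connection, call `run_pipelines({"lottery": "sample", "zip_bytes": make_sample_zip()}, io.StringIO())`. Each stage receives whatever the previous stage returned, which is also how Scrapy passes items between pipelines.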
About
Learning Scrapy by Doing Spiders