IMDB Scraper

Overview

This is a fork (otto-torino/IMDB-Scraper) of the original IMDB Scraper repository.

This is a Scrapy project that crawls the IMDb website, scrapes movie information, and stores the data in JSON format and/or in an Elasticsearch index.

Configuration

Search query

Search queries are set in the JSON config file config/queries.json. You can build your own queries with IMDb's advanced title search at imdb.com/search/title.
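The schema of config/queries.json is not documented here. The sketch below assumes the file holds a JSON list whose items are either the query string copied from an imdb.com/search/title URL (the part after "?") or a dict of search parameters, and shows how such entries could be turned into start URLs for a spider. load_queries and build_start_urls are hypothetical helpers, not functions of this project.

```python
import json
from urllib.parse import urlencode

# Base URL of IMDb's advanced title search, where the queries come from.
SEARCH_BASE = "https://www.imdb.com/search/title/"


def load_queries(config_path="config/queries.json"):
    """Read the query list from the JSON config file (schema assumed)."""
    with open(config_path, encoding="utf-8") as fh:
        return json.load(fh)


def build_start_urls(queries):
    """Turn each query into a full search URL.

    ASSUMPTION: each item is either a raw query string such as
    "title_type=feature&year=2020" or a dict of search parameters.
    """
    urls = []
    for query in queries:
        if isinstance(query, dict):
            # Encode a parameter dict as a query string.
            query = urlencode(query)
        # Tolerate query strings copied together with their leading "?".
        urls.append(SEARCH_BASE + "?" + query.lstrip("?"))
    return urls
```

Inside a spider, start_urls = build_start_urls(load_queries()) would then yield one search-results page per configured query.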

Elasticsearch

To store the scraped data in Elasticsearch, enable the Elasticsearch pipeline in the ITEM_PIPELINES dict in config/scrapy.py (it is enabled by default) and set the following environment variables:

ES_HOST, ES_PORT, ES_USERNAME, ES_SECRET, ES_INDEX
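As an illustration of how these variables might be consumed, the sketch below collects them into a single config dict and fails early if any is missing. read_es_config is a hypothetical helper; the project's own pipeline may read the variables differently, and treating ES_SECRET as the password and ES_PORT as a numeric port are assumptions.

```python
import os

# The environment variables listed in this README.
ES_ENV_VARS = ("ES_HOST", "ES_PORT", "ES_USERNAME", "ES_SECRET", "ES_INDEX")


def read_es_config(env=None):
    """Collect the Elasticsearch connection settings (hypothetical helper).

    Defaults to os.environ; a plain dict can be passed instead for testing.
    """
    env = os.environ if env is None else env
    missing = [name for name in ES_ENV_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("missing Elasticsearch env vars: " + ", ".join(missing))
    return {
        "host": env["ES_HOST"],
        "port": int(env["ES_PORT"]),  # assumption: a numeric port such as 9200
        "username": env["ES_USERNAME"],
        "password": env["ES_SECRET"],  # assumption: ES_SECRET is the password
        "index": env["ES_INDEX"],
    }
```

Failing at startup when a variable is unset is usually easier to debug than a connection error raised partway through a crawl.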

JSON Output

If you enable the FEED_URI and FEED_FORMAT settings in config/scrapy.py, the data will be stored in a JSON file named movie.json, located at IMDB-Scraper/imdb_scraper/data/movie.json.
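For reference, the two settings could look like the fragment below. The values are assumptions chosen to match the path above when the crawl is started from the imdb_scraper folder, since a relative file path should resolve against the working directory. On Scrapy 2.1 and later, FEED_URI and FEED_FORMAT are deprecated in favour of the FEEDS dict, whose equivalent is shown in the comment.

```python
# config/scrapy.py (fragment): a sketch of the two feed settings named above.
# ASSUMPTION: the crawl is started from the imdb_scraper/ folder, so this
# relative path ends up at imdb_scraper/data/movie.json.
FEED_FORMAT = "json"
FEED_URI = "data/movie.json"

# Equivalent for Scrapy 2.1+, where FEED_URI/FEED_FORMAT are deprecated:
# FEEDS = {"data/movie.json": {"format": "json"}}
```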

Getting started

Clone the repo and navigate into the IMDB-Scraper folder.

$ git clone https://github.com/dojutsu-user/IMDB-Scraper.git
$ cd IMDB-Scraper/

Create and activate a virtual environment.

$ pipenv shell

Install all dependencies.

(IMDB-Scraper) $ pipenv install

Navigate into the imdb_scraper folder.

(IMDB-Scraper) $ cd imdb_scraper/

Start the crawler.

(IMDB-Scraper) $ scrapy crawl movie
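Once a crawl has finished with the JSON feed enabled, the output can be inspected from Python. The sketch below assumes the path data/movie.json relative to the imdb_scraper folder. load_movies and count_by are hypothetical helpers, and the item field names (such as "year" in the usage note) are assumptions, since the item definition is not shown here. Scrapy's "json" feed format writes a single JSON array, so json.load returns a list of dicts.

```python
import json
from collections import Counter


def load_movies(path="data/movie.json"):
    """Load the items written by Scrapy's JSON feed exporter."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def count_by(movies, field):
    """Count movies per value of `field`; items lacking it count under None."""
    return Counter(movie.get(field) for movie in movies)
```

For example, count_by(load_movies(), "year").most_common(5) would list the five most frequent values of a (hypothetical) year field. Scrapy's local file feed storage may append to an existing file instead of overwriting it; in that case a second crawl leaves two JSON arrays in movie.json, which json.load rejects, so deleting the file before re-running avoids the problem.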

Disclaimer

The project and the obtained dataset are intended for educational purposes only. The project is completely open source, and there is no intention to commercialise it in whole or in part. The developer does not own any of the resources used and does not claim to hold the permissions to use the project.
