Spider

This repo contains a scraper written in Python. It currently supports scraping indeed.com only.

Features

It takes two input files: one with the list of jobs and another with the list of locations. The crawl depth can be specified; if it is not, it defaults to 3.
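A minimal sketch of this input handling follows. It assumes each input file holds one entry per line and that every job is paired with every location to form one query. The names `read_entries`, `build_queries`, `parse_args` and the `--depth` flag are hypothetical and are not taken from `indeed_crawler.py`.

```python
import argparse
import itertools
from pathlib import Path

DEFAULT_DEPTH = 3  # default crawl depth stated in this README


def read_entries(path):
    """Return the stripped, non-blank lines of a text file (one entry per line assumed)."""
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def build_queries(jobs_file, locations_file):
    """Pair every job with every location (an assumption): one (job, location) query each."""
    return list(itertools.product(read_entries(jobs_file), read_entries(locations_file)))


def parse_args(argv=None):
    """Hypothetical command line: two positional files plus an optional --depth."""
    parser = argparse.ArgumentParser(description="Sketch of the crawler's inputs")
    parser.add_argument("jobs_file", help="text file with one job title per line")
    parser.add_argument("locations_file", help="text file with one location per line")
    parser.add_argument("--depth", type=int, default=DEFAULT_DEPTH,
                        help="crawl depth (defaults to 3 when omitted)")
    return parser.parse_args(argv)
```

With this sketch, `parse_args(["jobs.txt", "locations.txt"]).depth` falls back to 3, while adding `--depth 5` overrides it.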

It is a multi-threaded scraper: it starts one thread for each query.
Provides a built-in proxy pool API and IP rotator.
Requests are made at random intervals to simulate a real user.
Data is stored in a pandas DataFrame and dumped into a JSON file.
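The features above can be sketched together as follows. This is an illustration, not the repo's implementation: it assumes a hypothetical `fetch(job, location, proxy)` callable that performs the actual request and returns a list of row dicts. The `ProxyRotator` class, the `run_crawl` function and the 1 to 3 second delay range are also assumptions and do not come from the `APIs` module or `indeed_crawler.py`.

```python
import itertools
import random
import threading
import time

import pandas as pd


class ProxyRotator:
    """Round-robin proxy rotator (hypothetical; not the repo's APIs module)."""

    def __init__(self, proxies):
        self._cycle = itertools.cycle(proxies)
        self._lock = threading.Lock()

    def next_proxy(self):
        # Serialise access so concurrent threads never advance the cycle at once.
        with self._lock:
            return next(self._cycle)


def crawl_query(query, fetch, rotator, rows, rows_lock, min_delay, max_delay):
    """Worker run in its own thread: handles exactly one (job, location) query."""
    job, location = query
    # Wait a random interval before the request to mimic a human user.
    time.sleep(random.uniform(min_delay, max_delay))
    # Each request goes out through the next proxy in the pool.
    results = fetch(job, location, rotator.next_proxy())
    with rows_lock:
        rows.extend(results)


def run_crawl(queries, fetch, proxies, out_path, min_delay=1.0, max_delay=3.0):
    """Start one thread per query, collect rows into a DataFrame, dump it as JSON."""
    rotator = ProxyRotator(proxies)
    rows, rows_lock = [], threading.Lock()
    threads = [
        threading.Thread(
            target=crawl_query,
            args=(query, fetch, rotator, rows, rows_lock, min_delay, max_delay),
        )
        for query in queries
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()  # wait until every query has finished
    df = pd.DataFrame(rows)
    # orient="records" writes a JSON array with one object per scraped row.
    df.to_json(out_path, orient="records")
    return df
```

To try it without contacting indeed.com, pass a stub such as `lambda job, location, proxy: [{"title": job, "location": location}]` as `fetch`. One thread per query mirrors the behaviour described above. For large query lists, a `concurrent.futures.ThreadPoolExecutor` would cap the number of simultaneous requests.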

Folders and files

APIs
test_files
.gitignore
README.md
indeed_crawler.py

AyushSenapati/Spider