scrapews

A news scraper written in Python with the requests and lxml packages.

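from scrapews.scrapers import NewYorkTimes

# Create a scraper for The New York Times
ny_scraper = NewYorkTimes()

# Download the page and extract the articles
ny_scraper.scrape()
# Send the scraped data to the server
ny_scraper.send_to_server()

# The scraped articles are stored under the 'articles' key
print(ny_scraper.data.get('articles'))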

Idea

The core idea of the scrapews scraper is to request the HTML of a news site and use XPath expressions to extract the primary information about each article, such as its title, description and URL.
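A minimal sketch of XPath-based extraction with lxml, run against an inline HTML sample instead of a live site. The page layout, the XPath expressions and the extract_articles function are hypothetical illustrations, not the expressions scrapews actually uses for any news site.

```python
from lxml import html

# Hypothetical XPath expressions for an imaginary page layout; a real site
# needs expressions written against its actual markup.
ARTICLE_XPATH = "//article"
FIELD_XPATHS = {
    "title": ".//h2/a/text()",
    "description": ".//p[@class='summary']/text()",
    "url": ".//h2/a/@href",
}


def extract_articles(page_source: str) -> list:
    """Parse an HTML string and return one dict per article node."""
    tree = html.fromstring(page_source)
    articles = []
    for node in tree.xpath(ARTICLE_XPATH):
        article = {}
        for field, xpath in FIELD_XPATHS.items():
            # xpath() always returns a list; keep the first match or None
            matches = node.xpath(xpath)
            article[field] = matches[0].strip() if matches else None
        articles.append(article)
    return articles


SAMPLE_HTML = """
<html><body>
  <article>
    <h2><a href="https://example.com/a">First headline</a></h2>
    <p class="summary">A short description.</p>
  </article>
  <article>
    <h2><a href="https://example.com/b">Second headline</a></h2>
  </article>
</body></html>
"""

if __name__ == "__main__":
    for item in extract_articles(SAMPLE_HTML):
        print(item)
```

Relative expressions (starting with `.//`) are evaluated against each article node, so every field is looked up only inside its own article. A missing field becomes None instead of raising an error.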

Combined with a RESTful API service, the scraper can feed a content aggregator app, for example.
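The following is a minimal sketch of how scraped data could be handed to such a service: the articles are serialized to JSON and wrapped in a POST request. The endpoint URL and the build_request helper are hypothetical, and the sketch uses the standard library's urllib so it runs without the network. The project itself depends on requests, where `requests.post(url, json=data)` would be the equivalent call. This is not necessarily how send_to_server is implemented.

```python
import json
import urllib.request

# Hypothetical endpoint of the RESTful service that stores the articles.
API_URL = "http://localhost:8000/api/articles/"


def build_request(data: dict, url: str = API_URL) -> urllib.request.Request:
    """Serialize scraped data to JSON and wrap it in a POST request."""
    body = json.dumps(data).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    data = {"articles": [{"title": "First headline", "description": None,
                          "url": "https://example.com/a"}]}
    request = build_request(data)
    print(request.get_method(), request.full_url)
    # urllib.request.urlopen(request) would actually send it; it is left
    # out here because API_URL is only a placeholder.
```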

See the base_scraper class for a better understanding of the code.

Installation

First, clone this repo

git clone https://github.com/mateusvictor/scrapews.git

Change into the project directory

cd scrapews/

Create a virtualenv in the project directory

python -m venv venv

Activate the virtualenv (on Windows)

venv\Scripts\activate.bat

On Linux or macOS, activate it with

source venv/bin/activate

Install the project dependencies

pip install -r requirements.txt
