Extract, Transform and Load Articles from News Websites

How to use it:

1️⃣ Download the repository.

2️⃣ Install required libraries:

pip install -r requirements.txt

3️⃣ To start the scraping and the ETL process, run this in the terminal:

python pipeline.py

✅ It's done!

The script will:

  • Extract: Scrape articles from the front pages of these websites:
    1. El Universal
    2. CNN en Español
  • Transform: Remove records with empty values and enrich the data with tokenization, i.e. split the title and the body into individual words for later analysis.
  • Load: Load the data into a local SQLite database.
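
To illustrate the Transform and Load steps described above, here is a minimal sketch in Python. It is not the code in pipeline.py: the function names `transform` and `load`, the dictionary keys `title` and `body`, the regex-based tokenizer, and the `articles` table layout are all assumptions made for the example.

```python
import re
import sqlite3


def transform(articles):
    """Drop records with an empty title or body and add token lists.

    `articles` is assumed to be a list of dicts with "title" and "body" keys.
    """
    cleaned = []
    for article in articles:
        title = (article.get("title") or "").strip()
        body = (article.get("body") or "").strip()
        if not title or not body:
            continue  # skip records with missing values
        cleaned.append({
            "title": title,
            "body": body,
            # Simple tokenization: lowercase, then split on non-word characters.
            "title_tokens": re.findall(r"\w+", title.lower()),
            "body_tokens": re.findall(r"\w+", body.lower()),
        })
    return cleaned


def load(articles, conn):
    """Insert transformed articles into a hypothetical `articles` table.

    Tokens are stored as space-separated text. Returns the number of rows written.
    """
    rows = [
        (a["title"], a["body"], " ".join(a["title_tokens"]), " ".join(a["body_tokens"]))
        for a in articles
    ]
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "CREATE TABLE IF NOT EXISTS articles "
            "(title TEXT, body TEXT, title_tokens TEXT, body_tokens TEXT)"
        )
        conn.executemany("INSERT INTO articles VALUES (?, ?, ?, ?)", rows)
    return len(rows)
```

With this sketch, something like `load(transform(scraped), sqlite3.connect("articles.db"))` would write the cleaned records to a local database file, where `scraped` stands for whatever the Extract step returns and `articles.db` is a placeholder file name.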