How to use it:
1️⃣ Download repository.
2️⃣ Install required libraries:
pip install -r requirements.txt
3️⃣ To start scraping and the ETL process, just type on the terminal:
python pipeline.py
✅ It's done!
The script will:
- Extract: Scrap articles from the front page of the websites:
- Transform: Clean the data from empty values and enrich them with tokenization, i.e. separate the words within the title and the body for a posterior analysis.
- Load: Load the data to a local SQLite database.