News Scraper Tutorial

Overview

Welcome to the News Scraper tutorial repository! This repository is a guide to web scraping with BeautifulSoup and Selenium. In this tutorial, you will learn how to scrape news from websites such as detik[dot]com.

Additionally, we'll focus on scraping news related to Indonesia's presidential candidates for 2024, using keywords such as "anies baswedan", "prabowo subianto", and "ganjar pranowo".

Repository Structure

img/
save_img/
notebook/
- static_web.ipynb: Jupyter notebook for scraping news from traditional static websites.
- dynamic_web.ipynb: Jupyter notebook for scraping news from dynamic web applications.
deployment_script/: Contains scripts and files for deployment using Flask.
requirements.txt

Tutorial Contents

Static Web Scraping Tutorial: Explore the notebook/static_web.ipynb notebook to learn how to scrape news from traditional static websites.
Dynamic Web Scraping Tutorial: The notebook/dynamic_web.ipynb notebook guides you through scraping news from dynamic web applications.
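
As a rough illustration of the static case, the sketch below parses an inline HTML snippet with BeautifulSoup and keeps only the headlines that mention one of the keywords. The markup, the `example.com` URLs, and the `extract_headlines` helper are assumptions made up for this example; they are not taken from detik[dot]com or from the notebook.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a downloaded news listing page.
SAMPLE_HTML = """
<html><body>
  <article><h2><a href="https://example.com/news/1">Anies Baswedan visits Bandung</a></h2></article>
  <article><h2><a href="https://example.com/news/2">Weather update for Jakarta</a></h2></article>
  <article><h2><a href="https://example.com/news/3">Ganjar Pranowo opens a festival</a></h2></article>
</body></html>
"""

KEYWORDS = ["anies baswedan", "prabowo subianto", "ganjar pranowo"]


def extract_headlines(html, keywords):
    """Return (title, url) pairs whose title contains any of the keywords."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for article in soup.find_all("article"):
        link = article.find("a")
        if link is None:
            continue  # skip articles without a headline link
        title = link.get_text(strip=True)
        # Case-insensitive substring match against each keyword.
        if any(keyword in title.lower() for keyword in keywords):
            results.append((title, link.get("href")))
    return results


headlines = extract_headlines(SAMPLE_HTML, KEYWORDS)
for title, url in headlines:
    print(title, url)
```

On a real site the HTML would first have to be downloaded, and the tag names would need to match that site's actual markup.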

Sentiment Analysis Tutorial: Learn lexicon-based sentiment analysis with TextBlob to gauge the sentiment of news articles related to the selected keywords, then build a machine learning model for sentiment analysis from scratch.
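
To make the idea of lexicon-based scoring concrete, here is a minimal pure-Python sketch. It is a stand-in for illustration and is not TextBlob: the six-word `LEXICON`, its scores, the `polarity` and `label` helpers, and the sample sentences are all invented for this example.

```python
import re

# Tiny hypothetical lexicon; real lexicons hold thousands of scored words.
LEXICON = {
    "good": 1.0,
    "strong": 0.5,
    "win": 1.0,
    "bad": -1.0,
    "weak": -0.5,
    "scandal": -1.0,
}


def polarity(text, lexicon=LEXICON):
    """Average the scores of the words found in the lexicon (0.0 if none)."""
    words = re.findall(r"[a-z]+", text.lower())
    scores = [lexicon[word] for word in words if word in lexicon]
    return sum(scores) / len(scores) if scores else 0.0


def label(score):
    """Map a polarity score to a coarse sentiment label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for sentence in ["A strong win for the candidate", "A bad scandal", "Weather update"]:
    print(sentence, "->", label(polarity(sentence)))
```

TextBlob exposes a comparable score, in the range -1.0 to 1.0, through `TextBlob(text).sentiment.polarity`.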

Getting Started

Clone the repository to your local machine
```
git clone https://github.com/Ubeydkhoiri/news-scraper.git
```
Navigate to the repository
```
cd news-scraper
```
Install dependencies
```
pip install -r requirements.txt
```
Run the Flask app
```
python deployment_script/app.py
```
Once the Flask app is running, open http://127.0.0.1:5000 in your web browser. To export all news tagged 'anies baswedan', change the route to http://127.0.0.1:5000/export?keyword='anies baswedan'. To run the scraper and update the data, visit http://127.0.0.1:5000/updatedata.
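
Keywords contain spaces, so it can help to build the export URL programmatically instead of typing it by hand. The sketch below does this with the standard library. The `export_url` helper is hypothetical, and the sketch assumes the export route accepts the plain keyword without the surrounding quotes; if the app expects the quotes, include them in the value passed in.

```python
from urllib.parse import urlencode

BASE_URL = "http://127.0.0.1:5000"  # local address of the running Flask app


def export_url(keyword, base_url=BASE_URL):
    """Build the export URL for a keyword, encoding spaces and special characters."""
    query = urlencode({"keyword": keyword})  # form-encodes the value (space -> "+")
    return f"{base_url}/export?{query}"


for keyword in ["anies baswedan", "prabowo subianto", "ganjar pranowo"]:
    print(export_url(keyword))
```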
Explore the tutorials in the notebook directory and deploy the Flask application using the scripts in deployment_script.

Note: Stay tuned for more tutorials as the project progresses!
