MACHINE LEARNING

Overview

The machine learning part of this application is an article recommendation system. We built a model that classifies news articles into fire, crime, and health-related topics. Readers' reading data is sent to the cloud, where the model processes the titles of the articles they have read. Based on what a user has read, the system then recommends appropriate categories.

Datasets

The dataset comes from CNN Indonesia (training and validation data) and Kompas.com (test data). We collect the article title, author, category, article link, image link, and part of the article content. To retrieve this data, we use two tools:

Beautiful Soup, an HTML parser
Selenium, a web browser automation tool

We extract the HTML tags that hold the data: <h1> and <h2> for the title, <p> for the content, <a> for the article links, and <img> for the image links. The scraped data is then stored in CSV format. For more details, visit datasets.
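As a rough illustration of the tag-to-field mapping described above, here is a minimal sketch. It uses Python's built-in html.parser and csv modules rather than Beautiful Soup or Selenium, only so that it runs without extra dependencies. The sample HTML, the field names, and the helper names (ArticleParser, parse_article, rows_to_csv) are hypothetical and not taken from the project's scraping scripts.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical page fragment; real article pages will be structured differently.
SAMPLE_HTML = """
<article>
  <h1>Fire breaks out in city market</h1>
  <a href="https://example.com/news/1">Read more</a>
  <img src="https://example.com/img/1.jpg">
  <p>Firefighters were called to the scene.</p>
</article>
"""


class ArticleParser(HTMLParser):
    """Collect the title, content, article link, and image link from one page."""

    def __init__(self):
        super().__init__()
        self.row = {"title": "", "content": "", "article_link": "", "image_link": ""}
        self._field = None  # field currently receiving text, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("h1", "h2"):
            self._field = "title"
        elif tag == "p":
            self._field = "content"
        elif tag == "a" and not self.row["article_link"]:
            self.row["article_link"] = attrs.get("href") or ""
        elif tag == "img" and not self.row["image_link"]:
            self.row["image_link"] = attrs.get("src") or ""

    def handle_endtag(self, tag):
        if tag in ("h1", "h2", "p"):
            self._field = None

    def handle_data(self, data):
        if self._field:
            # Append text to the active field, separated by a single space.
            joined = self.row[self._field] + " " + data.strip()
            self.row[self._field] = joined.strip()


def parse_article(html):
    """Return one dictionary of scraped fields for a single page."""
    parser = ArticleParser()
    parser.feed(html)
    return parser.row


def rows_to_csv(rows):
    """Serialize a list of row dictionaries to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


row = parse_article(SAMPLE_HTML)
csv_text = rows_to_csv([row])
```

With Beautiful Soup the same extraction would typically be shorter; the point here is only which tag feeds which CSV column.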

Model Architecture

Embedding layer. Converts words into a numerical representation; each word is represented as a vector in the embedding space.
Bidirectional LSTM layer. LSTM is a type of recurrent neural network designed to mitigate the vanishing gradient problem that affects plain recurrent networks.
Dropout layer. Used to reduce overfitting in the model.
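The layers listed above could be stacked in Keras roughly as follows. This is a sketch under assumptions, not the project's actual model: the vocabulary size, embedding dimension, sequence length, LSTM units, dropout rate, optimizer, and the final softmax Dense layer over three classes are all illustrative choices that the text above does not specify.

```python
import numpy as np
import tensorflow as tf

VOCAB_SIZE = 10000   # assumed vocabulary size
EMBEDDING_DIM = 64   # assumed embedding dimension
MAX_LEN = 20         # assumed padded title length, in tokens
NUM_CLASSES = 3      # assumed: one class each for fire, crime, and health


def build_model():
    """Embedding -> Bidirectional LSTM -> Dropout -> softmax classifier."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(MAX_LEN,)),
        tf.keras.layers.Embedding(VOCAB_SIZE, EMBEDDING_DIM),
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32)),
        tf.keras.layers.Dropout(0.5),
        # Output layer is an assumption; the README does not list it.
        tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    model.compile(
        loss="sparse_categorical_crossentropy",
        optimizer="adam",
        metrics=["accuracy"],
    )
    return model


model = build_model()

# Run a dummy batch of two all-padding sequences to check the output shape.
dummy_batch = np.zeros((2, MAX_LEN), dtype="int32")
probabilities = model(dummy_batch).numpy()
```

Each row of probabilities holds one score per category and sums to 1, which is what a downstream recommendation step would consume.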

The model achieved a loss of 0.1744 and an accuracy of 0.9292 on the training data. On the validation data, it achieved a loss of 0.6373 and an accuracy of 0.7874.

Model Summary

Model Accuracy & Loss

How to replicate our project

01 Data Preprocessing

To run this model, follow these steps:

Download the datasets here
Upload the dataset in your notebook environment
Install the required libraries
Pre-process the data
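The pre-processing step is not spelled out here, so the following is only one plausible sketch: it lowercases titles, strips punctuation, and maps category names to integer labels. The column names (title, category), the sample rows, and the helper names are hypothetical and may not match the actual CSV files.

```python
import csv
import io
import re

# Hypothetical sample standing in for the downloaded CSV file.
SAMPLE_CSV = """title,category
"Kebakaran Pasar, 20 Kios Hangus!",fire
Polisi Tangkap Pelaku Pencurian,crime
Tips Menjaga Kesehatan Jantung,health
"""


def clean_title(title):
    """Lowercase the title and keep only letters, digits, and single spaces."""
    title = title.lower()
    title = re.sub(r"[^a-z0-9\s]", " ", title)
    return re.sub(r"\s+", " ", title).strip()


def load_examples(csv_text):
    """Return cleaned titles, integer labels, and the label-name mapping."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    titles = [clean_title(r["title"]) for r in rows]
    # Sort label names so the mapping is reproducible between runs.
    label_names = sorted({r["category"] for r in rows})
    label_to_id = {name: i for i, name in enumerate(label_names)}
    labels = [label_to_id[r["category"]] for r in rows]
    return titles, labels, label_to_id


titles, labels, label_to_id = load_examples(SAMPLE_CSV)
```

In a notebook, the same functions would be applied to the contents of the uploaded file instead of SAMPLE_CSV.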

02 Modelling

Tokenize and vectorize the text corpus
Build and compile the model with the architecture described above
Evaluate the model
Export the model to .h5 format
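To make the tokenization step concrete, here is a dependency-free sketch of what turning titles into padded integer sequences involves. It is similar in spirit to the tokenizer and padding utilities that Keras provides, but it is not those utilities: the id scheme (0 for padding, 1 for unseen words), the padding at the end of each sequence, and the function names are assumptions made for illustration.

```python
from collections import Counter

OOV_ID = 1  # id reserved for words not seen during fitting; 0 is padding


def fit_vocab(texts):
    """Map each word to an integer id, most frequent words first."""
    counts = Counter(word for text in texts for word in text.split())
    # most_common() keeps first-seen order among words with equal counts.
    return {word: i + 2 for i, (word, _) in enumerate(counts.most_common())}


def texts_to_padded(texts, vocab, max_len):
    """Convert texts to id sequences, truncated and zero-padded to max_len."""
    padded = []
    for text in texts:
        ids = [vocab.get(word, OOV_ID) for word in text.split()][:max_len]
        padded.append(ids + [0] * (max_len - len(ids)))
    return padded


# Hypothetical training titles, already lowercased and cleaned.
train_titles = ["kebakaran pasar", "polisi tangkap pelaku", "kebakaran hutan"]
vocab = fit_vocab(train_titles)

# "gudang" was never seen during fitting, so it maps to OOV_ID.
sequences = texts_to_padded(["kebakaran gudang"], vocab, max_len=4)
```

The vocabulary is fitted on the training titles only, so that validation and test titles are encoded with the same word ids the model saw during training.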

EmergenZ-Team/EmergenZ-ML
