
About: The purpose of this study was to learn more about the problem of misinformation. In this work, we propose a machine-learning-based framework to automate data annotation on the Hindi Fake News Dataset, with a primary focus on data annotation and its automation.



mehakagg1313/Hindi-Fake-News-Fact-Checker


Hindi Fake News Fact Checker

Hindi Fake News Dataset and Analysis

In this work we have tried to extend the automation of data annotation to Hindi. The research concentrates on automatically identifying links that lead to fake news and annotating them accordingly using a machine learning model. The model was trained on a manually annotated dataset of Hindi fake news links, using algorithms such as Gaussian Naive Bayes, k-Nearest Neighbours, Support Vector Machine and LSTM, among others. Manual data annotation is tedious and time-consuming, yet it plays a large role in determining the model's accuracy. The accuracy of the model depends greatly on the quality of the annotated data, which makes careful annotation key to building a successful AI model: even a small share of incorrectly annotated examples can noticeably lower the model's overall accuracy.

[Figure: flowchart]

We propose a machine-learning and deep-learning based framework to automate the process of data annotation. Our main contributions are:

1. Data collection: we first collected data from various fact-checking websites.
2. Pre-processing: we removed punctuation and stopwords from the dataset, then applied stemming and lemmatization.
3. Vectorization: we vectorized the entire dataset using a TF-IDF vectorizer.
4. Modelling: we applied baseline machine learning and deep learning models: Gaussian Naive Bayes, Linear Regression, k-Nearest Neighbours, Support Vector Machines, Random Forest and Long Short-Term Memory (LSTM).
5. Evaluation: each model was tested with 10%, 20%, 30% and 40% of the prepared dataset held out as test data.

The best result was an accuracy of 81.44%, achieved by the Random Forest model on the 10% test split. The best LSTM result, trained for 100 epochs with a batch size of 64 on the 10% test split, was 64.70%.
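The pre-processing step can be illustrated with a minimal, standard-library-only sketch. It is not the project's code: the stopword list and suffix list below are hypothetical placeholders, the suffix stripper is a crude "light stemmer" assumed for illustration, and lemmatization is omitted. One Hindi-specific detail worth showing is that Python's `string.punctuation` is ASCII-only, so the Devanagari danda (।) has to be added explicitly.

```python
import string
from typing import List

# Hypothetical, deliberately tiny Hindi stopword list for illustration;
# a real pipeline would load a complete list.
HINDI_STOPWORDS = {"है", "के", "की", "का", "में", "और", "को", "से", "यह", "पर"}

# Hypothetical inflectional suffixes for a crude "light" stemmer, tried
# longest first so that e.g. "ियों" wins over "ों".
HINDI_SUFFIXES = sorted(["ियों", "ाओं", "ों", "ें", "ी", "े", "ा"], key=len, reverse=True)

# string.punctuation is ASCII-only, so add the Devanagari danda and double danda.
PUNCTUATION = set(string.punctuation) | {"।", "॥"}


def remove_punctuation(text: str) -> str:
    """Replace every punctuation character with a space."""
    return "".join(" " if ch in PUNCTUATION else ch for ch in text)


def light_stem(token: str) -> str:
    """Strip the longest matching suffix, keeping a stem of at least two code points."""
    for suffix in HINDI_SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 2:
            return token[: -len(suffix)]
    return token


def preprocess(text: str) -> List[str]:
    """Split on whitespace, drop stopwords, then stem the remaining tokens."""
    tokens = remove_punctuation(text).split()
    return [light_stem(tok) for tok in tokens if tok not in HINDI_STOPWORDS]


print(preprocess("यह खबर झूठी है।"))  # ['खबर', 'झूठ']
```

Here the stopwords "यह" and "है" are dropped, the danda is treated as punctuation, and the feminine ending "ी" is stripped from "झूठी". A production pipeline would more likely use a full stopword resource and a proper Hindi stemmer or lemmatizer.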
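For the vectorization step, the following sketch shows what a TF-IDF vectorizer computes, again using only the standard library. It assumes raw term counts for TF, the smoothed IDF `idf(t) = ln((1 + n) / (1 + df(t))) + 1` (the formula scikit-learn documents as the default for `TfidfVectorizer`), and L2 normalisation of each row. The function name `tfidf` is hypothetical, and the project may have used different settings.

```python
import math
from collections import Counter
from typing import List, Tuple


def tfidf(docs: List[List[str]]) -> Tuple[List[str], List[List[float]]]:
    """Return the sorted vocabulary and one L2-normalised TF-IDF row per document.

    TF is the raw term count; IDF uses the smoothed form
    idf(t) = ln((1 + n) / (1 + df(t))) + 1, where n is the number of
    documents and df(t) the number of documents containing t.
    """
    n = len(docs)
    vocab = sorted({tok for doc in docs for tok in doc})
    # Count each term at most once per document for document frequency.
    df = Counter(tok for doc in docs for tok in set(doc))
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for doc in docs:
        counts = Counter(doc)
        row = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in row))
        # Leave all-zero rows (e.g. empty documents) unnormalised.
        rows.append([v / norm for v in row] if norm else row)
    return vocab, rows


vocab, rows = tfidf([["खबर", "झूठ"], ["खबर", "सच"]])
```

In this example "खबर" occurs in both documents, so its IDF is ln(3/3) + 1 = 1, while "झूठ" gets ln(3/2) + 1 ≈ 1.41; after normalisation the term unique to a document therefore carries more weight in its row than the shared term.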
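Gaussian Naive Bayes is one of the baselines listed above, and the evaluation holds out 10%, 20%, 30% and 40% of the data. The sketch below combines the two in plain Python to make the procedure concrete. It is an illustration under stated assumptions, not the repository's implementation: the names `fit_gaussian_nb`, `predict` and `evaluate_splits` are hypothetical, and the fixed `eps` added to every variance is a simplification of scikit-learn's `var_smoothing` (in practice one would more likely use scikit-learn's `GaussianNB` and `train_test_split`).

```python
import math
import random
from typing import Dict, List, Sequence


def fit_gaussian_nb(X: List[List[float]], y: List[str], eps: float = 1e-9) -> Dict:
    """Estimate a log prior plus per-feature mean and variance for each class."""
    model = {}
    for label in sorted(set(y)):
        rows = [x for x, lab in zip(X, y) if lab == label]
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # eps keeps zero-variance features (common with sparse TF-IDF) finite.
        variances = [sum((v - m) ** 2 for v in col) / n + eps for col, m in zip(columns, means)]
        model[label] = (math.log(n / len(X)), means, variances)
    return model


def predict(model: Dict, x: List[float]) -> str:
    """Return the class with the highest log posterior under a Gaussian likelihood."""
    def log_posterior(label: str) -> float:
        log_prior, means, variances = model[label]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
            for v, m, var in zip(x, means, variances)
        )
    return max(model, key=log_posterior)


def evaluate_splits(X, y, test_sizes: Sequence[float] = (0.1, 0.2, 0.3, 0.4),
                    seed: int = 0) -> Dict[float, float]:
    """Hold out each fraction of the data as test data and report accuracy."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    results = {}
    for size in test_sizes:
        n_test = max(1, round(len(X) * size))
        test, train = indices[:n_test], indices[n_test:]
        model = fit_gaussian_nb([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(model, X[i]) == y[i] for i in test)
        results[size] = correct / n_test
    return results
```

Because the indices are shuffled once and each test set is taken from the front, the smaller test sets are subsets of the larger ones, which keeps the comparison across split sizes consistent; drawing each split independently would be an equally valid choice.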
