The goal is to build a machine learning model that predicts whether a news article might be fake.
Dataset link: https://www.kaggle.com/c/fake-news/data
train.csv: A full training dataset with the following attributes:
- id: unique id for a news article
- title: the title of a news article
- author: author of the news article
- text: the text of the article; could be incomplete
- label: a label that marks the article as potentially unreliable
  - 1: unreliable
  - 0: reliable

test.csv: A testing dataset with all the same attributes as train.csv, without the label.
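As a rough illustration of the schema above, the two files could be loaded and checked with pandas along these lines. The helper `load_split`, the column-list constants, and the tiny in-memory sample are assumptions made for this sketch; they are not part of the dataset or of the original project code.

```python
import io

import pandas as pd

# Column names taken from the attribute list above.
TRAIN_COLUMNS = ["id", "title", "author", "text", "label"]
TEST_COLUMNS = ["id", "title", "author", "text"]  # same, minus the label


def load_split(source, expected_columns):
    """Read one CSV split and check that the expected columns are present.

    `source` can be a file path such as "train.csv" or any file-like object.
    This helper is hypothetical; it only wraps pandas.read_csv.
    """
    df = pd.read_csv(source)
    missing = [c for c in expected_columns if c not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return df


# Made-up two-row sample standing in for train.csv, so the sketch runs
# without downloading the data. The second row has an empty author field.
sample_csv = io.StringIO(
    "id,title,author,text,label\n"
    "0,Example headline,Jane Doe,Some body text,1\n"
    "1,Another headline,,More body text,0\n"
)
train_df = load_split(sample_csv, TRAIN_COLUMNS)
print(train_df.isna().sum())  # per-column count of missing values
```

With the real files downloaded from the link above, the calls would presumably be `load_split("train.csv", TRAIN_COLUMNS)` and `load_split("test.csv", TEST_COLUMNS)`.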
This is the count plot showing the number of data points in each class.
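A plot of this kind can be produced roughly as follows. This is a sketch that assumes seaborn's `countplot` was used (the original plotting code is not shown); the function names and the five-row demo frame are invented for illustration.

```python
import pandas as pd


def class_counts(df, label_col="label"):
    """Return the number of rows per class, ordered by class value."""
    return df[label_col].value_counts().sort_index()


def plot_class_counts(df, label_col="label"):
    """Draw the count plot; plotting libraries are imported lazily.

    Assumes seaborn and matplotlib are installed.
    """
    import matplotlib.pyplot as plt
    import seaborn as sns

    ax = sns.countplot(data=df, x=label_col)
    ax.set_xlabel("label (0 = reliable, 1 = unreliable)")
    ax.set_ylabel("number of articles")
    plt.show()


# Made-up labels standing in for train.csv's label column.
demo = pd.DataFrame({"label": [0, 1, 1, 0, 1]})
counts = class_counts(demo)
print(counts)
```

Looking at the raw counts alongside the plot makes it easy to see whether the two classes are roughly balanced, which matters when choosing an evaluation metric.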
This is the distribution of the length of the news titles.
This is the distribution of the length of the news text.
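The two length distributions could be computed along these lines. The sketch assumes length is measured in characters and that missing titles or texts count as length 0; the original may have counted words or dropped missing rows instead. The helper names and demo rows are hypothetical.

```python
import pandas as pd


def add_length_columns(df, columns=("title", "text")):
    """Return a copy of df with a `<col>_length` column per text column.

    Length is the number of characters; missing values are treated as
    empty strings (an assumption of this sketch).
    """
    out = df.copy()
    for col in columns:
        out[f"{col}_length"] = out[col].fillna("").astype(str).str.len()
    return out


def plot_length_distribution(df, length_col, bins=50):
    """Histogram of one length column; assumes matplotlib is installed."""
    import matplotlib.pyplot as plt

    ax = df[length_col].plot(kind="hist", bins=bins)
    ax.set_xlabel(f"{length_col} (characters)")
    ax.set_ylabel("number of articles")
    plt.show()


# Made-up rows standing in for train.csv; the second title is missing.
demo = pd.DataFrame(
    {
        "title": ["Breaking news", None],
        "text": ["Short body.", "A somewhat longer body of text."],
    }
)
with_lengths = add_length_columns(demo)
print(with_lengths[["title_length", "text_length"]].describe())
```

Because article text lengths tend to be heavily skewed, a log-scaled x-axis or a clipped range may make the histogram easier to read.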