diff --git a/Fake-News-Classification/Dataset/README.md b/Fake-News-Classification/Dataset/README.md
new file mode 100644
index 000000000..d22ba02d2
--- /dev/null
+++ b/Fake-News-Classification/Dataset/README.md
@@ -0,0 +1,97 @@
+# Fake News Classification using DL
+
+## PROJECT TITLE
+
+Fake News Detection using Deep Learning
+
+## GOAL
+
+To identify whether a given news article is fake or not.
+
+## DATASET
+
+The dataset used in this project is available at: https://www.kaggle.com/competitions/fake-news/data?select=train.csv
+
+## DESCRIPTION
+
+This project aims to identify whether a given news article is fake or not by extracting the meaning and semantics of its text.
+
+## WHAT I HAVE DONE
+
+1. Data collection: Downloaded the dataset from the link given above.
+2. Data preprocessing: Combined the title and text of each article into a single new feature, then tokenized and vectorized it before passing it to model training.
+3. Model selection: The first, self-designed model has an Embedding layer followed by a global average pooling layer, two Dense hidden layers and a Dense output layer. The second model has an Embedding layer followed by a SimpleRNN layer and a Dense output layer.
+4. Comparative analysis: Compared the accuracy scores of both models.
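The layer stacks described in step 3 can be cross-checked against the parameter counts reported in the MODELS SUMMARY section with a little arithmetic. This is a minimal plain-Python sketch that assumes the standard Keras parameter formulas for `Embedding`, `Dense` and `SimpleRNN` layers. The vocabulary size (166055) is not stated in this README; it is inferred by dividing the reported Embedding parameter count by the embedding dimension, so treat it as an assumption.

```python
# Sanity check of the parameter counts in the MODELS SUMMARY section.
# Assumption: VOCAB_SIZE is inferred (30222010 / 182 == 166055); the README does not state it.

VOCAB_SIZE = 166055  # hypothetical value inferred from the summaries


def embedding_params(vocab_size: int, dim: int) -> int:
    """Embedding: one `dim`-sized vector per vocabulary entry."""
    return vocab_size * dim


def dense_params(in_features: int, units: int) -> int:
    """Dense: weight matrix plus one bias per unit."""
    return in_features * units + units


def simple_rnn_params(in_features: int, units: int) -> int:
    """SimpleRNN: input kernel + recurrent kernel + bias."""
    return units * (in_features + units + 1)


def size_mb(n_params: int) -> float:
    """float32 weights (4 bytes each), expressed in units of 1024 ** 2 bytes."""
    return round(n_params * 4 / 1024 ** 2, 2)


# Model-1: Embedding(182) -> GlobalAveragePooling1D -> Dense(96) -> Dense(24) -> Dense(1)
model_1 = [
    embedding_params(VOCAB_SIZE, 182),
    0,  # GlobalAveragePooling1D has no weights
    dense_params(182, 96),
    dense_params(96, 24),
    dense_params(24, 1),
]

# Model-2: Embedding(100) -> SimpleRNN(10) -> Dense(1)
model_2 = [
    embedding_params(VOCAB_SIZE, 100),
    simple_rnn_params(100, 10),
    dense_params(10, 1),
]

for name, layers in (("Model-1", model_1), ("Model-2", model_2)):
    total = sum(layers)
    print(f"{name}: layers={layers} total={total} ({size_mb(total)} MB)")
```

The Embedding layer accounts for more than 99% of the weights in both models, so the difference between the two totals comes almost entirely from the embedding dimension (182 vs 100).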
+
+## MODELS SUMMARY
+
+Model-1: "sequential"
+
+| Layer (type) | Output Shape | Param # |
+|---|---|---|
+| embedding (Embedding) | (None, 12140, 182) | 30222010 |
+| global_average_pooling1d (GlobalAveragePooling1D) | (None, 182) | 0 |
+| dense (Dense) | (None, 96) | 17568 |
+| dense_1 (Dense) | (None, 24) | 2328 |
+| dense_2 (Dense) | (None, 1) | 25 |
+
+- Total params: 30241931 (115.36 MB)
+- Trainable params: 30241931 (115.36 MB)
+- Non-trainable params: 0 (0.00 Byte)
+
+Model-2: "sequential_3"
+
+| Layer (type) | Output Shape | Param # |
+|---|---|---|
+| embedding_3 (Embedding) | (None, 12140, 100) | 16605500 |
+| simple_rnn (SimpleRNN) | (None, 10) | 1110 |
+| dense_5 (Dense) | (None, 1) | 11 |
+
+- Total params: 16606621 (63.35 MB)
+- Trainable params: 16606621 (63.35 MB)
+- Non-trainable params: 0 (0.00 Byte)
+
+## LIBRARIES NEEDED
+
+The following libraries are required to run this project:
+
+- nltk
+- pandas
+- matplotlib
+- tensorflow
+- keras
+- scikit-learn (sklearn)
+
+## EVALUATION METRICS
+
+The evaluation metrics I used to assess the models are:
+
+- Accuracy
+- Loss
+
+The confusion matrices for the models are shown in the Images folder.
+
+## RESULTS
+
+Results on the validation dataset:
+
+- Model-1: accuracy 96.11%, loss 0.1350
+- Model-2: accuracy 85.03%, loss 0.1439
+
+## CONCLUSION
+
+Based on the results, we can draw the following conclusions:
+
+1. Model-1 showed a high validation accuracy of 96.11% and a loss of 0.1350. It worked fairly well, identifying 2874 fake articles out of a total of 3044, and performed better than the second model.
+2. Model-2 reached good training accuracy but lower validation accuracy, which hints at overfitting. A possible reason is that, for fake news, capturing the overall sentiment of an article matters more than capturing the sentiment of individual words.
diff --git a/Fake-News-Classification/Images/Dataset.png b/Fake-News-Classification/Images/Dataset.png
new file mode 100644
index 000000000..1e317542d
Binary files /dev/null and b/Fake-News-Classification/Images/Dataset.png differ
diff --git a/Fake-News-Classification/Images/EDA.png b/Fake-News-Classification/Images/EDA.png
new file mode 100644
index 000000000..17401792b
Binary files /dev/null and b/Fake-News-Classification/Images/EDA.png differ
diff --git a/Fake-News-Classification/Images/EDA1.png b/Fake-News-Classification/Images/EDA1.png
new file mode 100644
index 000000000..ddf9bc5f0
Binary files /dev/null and b/Fake-News-Classification/Images/EDA1.png differ
diff --git a/Fake-News-Classification/Images/metrics.png b/Fake-News-Classification/Images/metrics.png
new file mode 100644
index 000000000..1dc7d3150
Binary files /dev/null and b/Fake-News-Classification/Images/metrics.png differ
diff --git a/Fake-News-Classification/Images/model.png b/Fake-News-Classification/Images/model.png
new file mode 100644
index 000000000..a8cd84df6
Binary files /dev/null and b/Fake-News-Classification/Images/model.png differ
diff --git a/Fake-News-Classification/Images/model2.png b/Fake-News-Classification/Images/model2.png
new file mode 100644
index 000000000..b95d80c07
Binary files /dev/null and b/Fake-News-Classification/Images/model2.png differ
diff --git a/Fake-News-Classification/Images/model2metrics.png b/Fake-News-Classification/Images/model2metrics.png
new file mode 100644
index 000000000..e7631795b
Binary files /dev/null and b/Fake-News-Classification/Images/model2metrics.png differ
diff --git a/Fake-News-Classification/Model/PridictionModel.ipynb b/Fake-News-Classification/Model/PridictionModel.ipynb
new file mode 100644
index 000000000..6b37ebbe1
--- /dev/null
+++ b/Fake-News-Classification/Model/PridictionModel.ipynb
@@ -0,0 +1,2513 @@
+{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, +
"metadata": { + "execution": { + "iopub.execute_input": "2021-05-25T06:50:29.636394Z", + "iopub.status.busy": "2021-05-25T06:50:29.636041Z", + "iopub.status.idle": "2021-05-25T06:50:29.643277Z", + "shell.execute_reply": "2021-05-25T06:50:29.642127Z", + "shell.execute_reply.started": "2021-05-25T06:50:29.636365Z" + } + }, + "outputs": [], + "source": [ + "import pandas as pd\n", + "import sklearn\n", + "import itertools\n", + "import numpy as np\n", + "import seaborn as sb\n", + "import re\n", + "import nltk\n", + "import pickle\n", + "from sklearn.model_selection import train_test_split\n", + "from sklearn.feature_extraction.text import TfidfVectorizer\n", + "from sklearn.metrics import accuracy_score\n", + "from sklearn.metrics import confusion_matrix\n", + "from matplotlib import pyplot as plt\n", + "from sklearn.linear_model import PassiveAggressiveClassifier,LogisticRegression\n", + "from nltk.stem import WordNetLemmatizer\n", + "from nltk.corpus import stopwords" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": { + "execution": { + "iopub.execute_input": "2021-05-25T06:50:29.656569Z", + "iopub.status.busy": "2021-05-25T06:50:29.656203Z", + "iopub.status.idle": "2021-05-25T06:50:32.048864Z", + "shell.execute_reply": "2021-05-25T06:50:32.047882Z", + "shell.execute_reply.started": "2021-05-25T06:50:29.65654Z" + } + }, + "outputs": [], + "source": [ + "train_df = pd.read_csv('train.csv')" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": { + "execution": { + "iopub.execute_input": "2021-05-25T06:50:32.05136Z", + "iopub.status.busy": "2021-05-25T06:50:32.051032Z", + "iopub.status.idle": "2021-05-25T06:50:32.089516Z", + "shell.execute_reply": "2021-05-25T06:50:32.088399Z", + "shell.execute_reply.started": "2021-05-25T06:50:32.051329Z" + } + }, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " id title \\\n", + "0 0 House Dem Aide: We Didn’t Even See Comey’s Let... \n", + "1 1 FLYNN: Hillary Clinton, Big Woman on Campus - ... \n", + "2 2 Why the Truth Might Get You Fired \n", + "3 3 15 Civilians Killed In Single US Airstrike Hav... \n", + "4 4 Iranian woman jailed for fictional unpublished... \n", + "5 5 Jackie Mason: Hollywood Would Love Trump if He... \n", + "6 6 Life: Life Of Luxury: Elton John’s 6 Favorite ... \n", + "7 7 Benoît Hamon Wins French Socialist Party’s Pre... \n", + "8 8 Excerpts From a Draft Script for Donald Trump’... \n", + "9 9 A Back-Channel Plan for Ukraine and Russia, Co... \n", + "10 10 Obama’s Organizing for Action Partners with So... \n", + "11 11 BBC Comedy Sketch \"Real Housewives of ISIS\" Ca... \n", + "12 12 Russian Researchers Discover Secret Nazi Milit... \n", + "13 13 US Officials See No Link Between Trump and Russia \n", + "14 14 Re: Yes, There Are Paid Government Trolls On S... \n", + "\n", + " author \\\n", + "0 Darrell Lucus \n", + "1 Daniel J. Flynn \n", + "2 Consortiumnews.com \n", + "3 Jessica Purkiss \n", + "4 Howard Portnoy \n", + "5 Daniel Nussbaum \n", + "6 NaN \n", + "7 Alissa J. Rubin \n", + "8 NaN \n", + "9 Megan Twohey and Scott Shane \n", + "10 Aaron Klein \n", + "11 Chris Tomlinson \n", + "12 Amando Flavio \n", + "13 Jason Ditz \n", + "14 AnotherAnnie \n", + "\n", + " text label \n", + "0 House Dem Aide: We Didn’t Even See Comey’s Let... 1 \n", + "1 Ever get the feeling your life circles the rou... 0 \n", + "2 Why the Truth Might Get You Fired October 29, ... 1 \n", + "3 Videos 15 Civilians Killed In Single US Airstr... 1 \n", + "4 Print \\nAn Iranian woman has been sentenced to... 1 \n", + "5 In these trying times, Jackie Mason is the Voi... 0 \n", + "6 Ever wonder how Britain’s most iconic pop pian... 1 \n", + "7 PARIS — France chose an idealistic, traditi... 0 \n", + "8 Donald J. Trump is scheduled to make a highly ... 0 \n", + "9 A week before Michael T. 
Flynn resigned as nat... 0 \n", + "10 Organizing for Action, the activist group that... 0 \n", + "11 The BBC produced spoof on the “Real Housewives... 0 \n", + "12 The mystery surrounding The Third Reich and Na... 1 \n", + "13 Clinton Campaign Demands FBI Affirm Trump's Ru... 1 \n", + "14 Yes, There Are Paid Government Trolls On Socia... 1 " + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "train_df.head(15)" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": { + "execution": { + "iopub.execute_input": "2021-05-25T06:50:32.10674Z", + "iopub.status.busy": "2021-05-25T06:50:32.106434Z", + "iopub.status.idle": "2021-05-25T06:50:32.120541Z", + "shell.execute_reply": "2021-05-25T06:50:32.119386Z", + "shell.execute_reply.started": "2021-05-25T06:50:32.106712Z" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "(20800, 5)" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "train_df.shape" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": { + "execution": { + "iopub.execute_input": "2021-05-25T06:50:32.124489Z", + "iopub.status.busy": "2021-05-25T06:50:32.12414Z", + "iopub.status.idle": "2021-05-25T06:50:32.140229Z", + "shell.execute_reply": "2021-05-25T06:50:32.139288Z", + "shell.execute_reply.started": "2021-05-25T06:50:32.124461Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "RangeIndex: 20800 entries, 0 to 20799\n", + "Data columns (total 5 columns):\n", + " # Column Non-Null Count Dtype \n", + "--- ------ -------------- ----- \n", + " 0 id 20800 non-null int64 \n", + " 1 title 20242 non-null object\n", + " 2 author 18843 non-null object\n", + " 3 text 20761 non-null object\n", + " 4 label 20800 non-null int64 \n", + "dtypes: int64(2), object(3)\n", + "memory usage: 812.6+ KB\n" + ] + } + ], + "source": [ + "train_df.info()" + ] + }, + { + 
"cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([10413, 10387], dtype=int64)" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# def create_distribution(dataFile):\n", + "# return sb.countplot(x='label', data=dataFile, palette='hls')\n", + "\n", + "# #by calling below we can see that training, test and valid data seems to be failry evenly distributed between the classes\n", + "# create_distribution(train_df)\n", + "train_df['label'].value_counts().values" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAjoAAAGdCAYAAAAbudkLAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjcuMCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy88F64QAAAACXBIWXMAAA9hAAAPYQGoP6dpAAAosElEQVR4nO3df1SWdZ7/8ReCIiBcisp9x4gj05DR4mRRB8GZoKOiJuKOU7bhsnbW1MaSJTXT41ZWR0gbf0yxmlkr5o/RdndsO/1gxNVD4/oLKSqVbJ0YxeIWt/BGjEDx+v7h1+vsLUqaNyIfn49z7nO8r/t9X/f18ZxLnl7cNwTYtm0LAADAQJ3a+wAAAADaCqEDAACMRegAAABjEToAAMBYhA4AADAWoQMAAIxF6AAAAGMROgAAwFhB7X0A7ens2bP6+uuvFR4eroCAgPY+HAAAcBls29bJkycVHR2tTp1av2ZzQ4fO119/rZiYmPY+DAAA8CNUVVWpT58+rc7c0KETHh4u6dxfVERERDsfDQAAuBx1dXWKiYlxvo635oYOnfPfroqIiCB0AADoYC7nbSe8GRkAABiL0AEAAMYidHBd+vDDDzV69GhFR0crICBAb7/9ts/jtm1r3rx5io6OVkhIiNLS0rR///6L7su2bY0cOfKi+5k/f75SUlIUGhqq7t27t3juN998oxEjRig6OlrBwcGKiYnR448/rrq6Oj+tFLix+eNcnzJlim6++WaFhISod+/eGjNmjD7//HOfmS+++EJjxoxRr169FBERocGDB2vbtm3O44WFhQoICLjoraamps3Wj7ZH6OC6dOrUKd1+++0qKCi46OMLFy7U4sWLVVBQoNLSUrndbg0bNkwnT55sMbt06dJLfh+3qalJDzzwgH77299e9PFOnTppzJgxeuedd/TFF1+osLBQW7Zs0aOPPvrjFwfA4Y9zPTExUatWrVJFRYX+9Kc/ybZtpaenq7m52ZkZNWqUzpw5o61bt6qsrEwDBw5URkaGPB6PJOnBBx9UdXW1z2348OFKTU1VVFRU2/4loG3ZNzCv12tLsr1eb3sfClohyd60aZNz/+zZs7bb7bZffPFFZ9v3339vW5Zlv/rqqz7PLS8vt/v06WNXV1e32M//tWrVKtuyrMs6nt///vd2nz59rnQZAH7A1Zzr/9cnn3xiS7IPHTpk27ZtHz9+3JZkf/jhh85MXV2dLcnesmXLRfdRU1Njd+7c2X7zzTevclVoC1fy9ZsrOuhwKisr5fF4lJ6e7mwLDg5W
amqqduzY4Wz77rvv9NBDD6mgoEBut9svr/3111/rj3/8o1JTU/2yPwCXdrnn+v916tQprVq1SrGxsc7PSevZs6fi4+P15ptv6tSpUzpz5oxWrFghl8ulxMTEi+7nzTffVGhoqO6//37/LwzXFKGDDuf8pWaXy+Wz3eVyOY9J0hNPPKGUlBSNGTPmql/zoYceUmhoqH7yk58oIiJCr7/++lXvE0DrLvdcl6Rly5apW7du6tatm4qKilRcXKwuXbpIOvcR5OLiYn388ccKDw9X165dtWTJEhUVFV30vXmS9K//+q/KyspSSEiI/xeGa4rQQYd14ftubNt2tr3zzjvaunWrli5d6pfXWrJkiT766CO9/fbb+stf/qLp06f7Zb8Aflhr5/p548eP18cff6ySkhLFxcVp3Lhx+v777535qVOnKioqSn/+85+1Z88ejRkzRhkZGaqurm7xejt37tSBAwc0ceLEtlsUrhlCBx3O+W9DXfg/upqaGud/flu3btVf/vIXde/eXUFBQQoKOvezMX/zm98oLS3tR73mrbfeqjFjxmjFihVavnz5Rf+BBOA/l3Oun2dZluLi4nTPPffo3//93/X5559r06ZNks79e/Duu+9qw4YNGjx4sO68804tW7ZMISEhWr16dYvXff311zVw4MBLflsLHQuhgw4nNjZWbrdbxcXFzrampiaVlJQoJSVFkjR79mx9+umnKi8vd27SuSszq1atuqrXt21bktTY2HhV+wHQuss51y/Ftm3nHP3uu+8kqcUvf+zUqZPOnj3rs62+vl5vvfUWV3MMckP/Cghcv+rr63Xo0CHnfmVlpcrLyxUZGam+ffsqNzdXeXl5iouLU1xcnPLy8hQaGqqsrCxJ5/4neLE3IPft21exsbHO/SNHjujbb7/VkSNH1Nzc7ATRz3/+c3Xr1k3vv/++jh07prvvvlvdunXTgQMHNGvWLA0ePFj9+vVr078D4EZwtef6l19+qY0bNyo9PV29e/fWV199pQULFigkJET33XefJCk5OVk9evTQhAkT9MwzzygkJEQrV65UZWWlRo0a5XM8Gzdu1JkzZzR+/Phr95eAttW2HwC7vvHx8uvXtm3bbEktbhMmTLBt+9zHTp999lnb7XbbwcHB9j333GN/9tlnre5TF/l4+YQJEy76Otu2bbNt27a3bt1qJycn25Zl2V27drXj4uLsp556yq6trfX/ooEb0NWe61999ZU9cuRIOyoqyu7cubPdp08fOysry/788899Xqe0tNROT0+3IyMj7fDwcHvQoEH2+++/3+J4kpOT7aysrDZdM67elXz9DrDt/38d/gZUV1cny7Lk9Xr5pZ4AAHQQV/L1m/foAAAAY/EenTbUb/Z77X0IwHXrry+O+uEhALhKhA4AXAX+QwO0rr3/U8O3rgAAgLEIHQAAYCxCBwAAGIvQAQAAxiJ0AACAsQgdAABgLEIHAAAYi9ABAADGuuLQ+fDDDzV69GhFR0crICBAb7/9ts/jtm1r3rx5io6OVkhIiNLS0rR//36fmcbGRk2bNk29evVSWFiYMjMzdfToUZ+Z2tpaZWdny7IsWZal7OxsnThxwmfmyJEjGj16tMLCwtSrVy/l5OSoqanpSpcEAAAMdcWhc+rUKd1+++0qKCi46OMLFy7U4sWLVVBQoNLSUrndbg0bNkwnT550ZnJzc7Vp0yZt2LBB27dvV319vTIyMtTc3OzMZGVlqby8XEVFRSoqKlJ5ebmys7Odx5ubmzVq1CidOnVK27dv14YNG/Qf//EfmjFjxpUuCQAAGOqKfwXEyJEjNXLkyIs+Ztu2li5dqrlz52rs2LGSpNWrV8vlcmn9+vWaMmWKvF6v3njjDa1Zs0ZDhw6VJK1du1YxMTHasmWLhg8froqKChUVFWnXrl1KSkqSJK1cuVLJyck6ePCg+vfvr82bN+vAgQOqqqpSdHS0JGnRokV6+OGHNX/+fH4bOQAA8O97dCorK+XxeJSe
nu5sCw4OVmpqqnbs2CFJKisr0+nTp31moqOjlZCQ4Mzs3LlTlmU5kSNJgwYNkmVZPjMJCQlO5EjS8OHD1djYqLKysoseX2Njo+rq6nxuAADAXH4NHY/HI0lyuVw+210ul/OYx+NRly5d1KNHj1ZnoqKiWuw/KirKZ+bC1+nRo4e6dOnizFwoPz/fec+PZVmKiYn5EasEAAAdRZt86iogIMDnvm3bLbZd6MKZi83/mJn/a86cOfJ6vc6tqqqq1WMCAAAdm19Dx+12S1KLKyo1NTXO1Re3262mpibV1ta2OnPs2LEW+z9+/LjPzIWvU1tbq9OnT7e40nNecHCwIiIifG4AAMBcfg2d2NhYud1uFRcXO9uamppUUlKilJQUSVJiYqI6d+7sM1NdXa19+/Y5M8nJyfJ6vdqzZ48zs3v3bnm9Xp+Zffv2qbq62pnZvHmzgoODlZiY6M9lAQCADuqKP3VVX1+vQ4cOOfcrKytVXl6uyMhI9e3bV7m5ucrLy1NcXJzi4uKUl5en0NBQZWVlSZIsy9LEiRM1Y8YM9ezZU5GRkZo5c6YGDBjgfAorPj5eI0aM0KRJk7RixQpJ0uTJk5WRkaH+/ftLktLT03XbbbcpOztbL730kr799lvNnDlTkyZN4koNAACQ9CNCZ+/evbr33nud+9OnT5ckTZgwQYWFhZo1a5YaGho0depU1dbWKikpSZs3b1Z4eLjznCVLligoKEjjxo1TQ0ODhgwZosLCQgUGBjoz69atU05OjvPprMzMTJ+f3RMYGKj33ntPU6dO1eDBgxUSEqKsrCz97ne/u/K/BQAAYKQA27bt9j6I9lJXVyfLsuT1etvkKlC/2e/5fZ+AKf764qj2PgS/4DwHWtcW5/qVfP3md10BAABjEToAAMBYhA4AADAWoQMAAIxF6AAAAGMROgAAwFiEDgAAMBahAwAAjEXoAAAAYxE6AADAWIQOAAAwFqEDAACMRegAAABjEToAAMBYhA4AADAWoQMAAIxF6AAAAGMROgAAwFiEDgAAMBahAwAAjEXoAAAAYxE6AADAWIQOAAAwFqEDAACMRegAAABjEToAAMBYhA4AADAWoQMAAIxF6AAAAGMROgAAwFiEDgAAMBahAwAAjEXoAAAAYxE6AADAWIQOAAAwFqEDAACMRegAAABjEToAAMBYhA4AADAWoQMAAIxF6AAAAGMROgAAwFiEDgAAMBahAwAAjEXoAAAAYxE6AADAWIQOAAAwFqEDAACMRegAAABjEToAAMBYhA4AADAWoQMAAIxF6AAAAGMROgAAwFiEDgAAMBahAwAAjEXoAAAAY/k9dM6cOaN//ud/VmxsrEJCQvSzn/1Mzz//vM6ePevM2LatefPmKTo6WiEhIUpLS9P+/ft99tPY2Khp06apV69eCgsLU2Zmpo4ePeozU1tbq+zsbFmWJcuylJ2drRMnTvh7SQAAoIPye+gsWLBAr776qgoKClRRUaGFCxfqpZde0iuvvOLMLFy4UIsXL1ZBQYFKS0vldrs1bNgwnTx50pnJzc3Vpk2btGHDBm3fvl319fXKyMhQc3OzM5OVlaXy8nIVFRWpqKhI5eXlys7O9veSAABABxXk7x3u3LlTY8aM0ahRoyRJ/fr10x/+8Aft3btX0rmrOUuXLtXcuXM1duxYSdLq1avlcrm0fv16TZkyRV6vV2+88YbWrFmjoUOHSpLWrl2rmJgYbdmyRcOHD1dFRYWKioq0a9cuJSUlSZJWrlyp5ORkHTx4UP379/f30gAAQAfj9ys6v/zlL/Vf//Vf+uKLLyRJn3zyibZv36777rtPklRZWSmPx6P09HTnOcHBwUpNTdWOHTskSWVlZTp9+rTPTHR0tBISEpyZnTt3yrIsJ3IkadCgQbIsy5m5UGNjo+rq6nxuAADAXH6/ovPUU0/J6/Xq1ltvVWBgoJqbmzV//nw99NBDkiSPxyNJcrlcPs9zuVw6fPiwM9OlSxf16NGjxcz553s8
HkVFRbV4/aioKGfmQvn5+XruueeuboEAAKDD8PsVnY0bN2rt2rVav369PvroI61evVq/+93vtHr1ap+5gIAAn/u2bbfYdqELZy4239p+5syZI6/X69yqqqoud1kAAKAD8vsVnSeffFKzZ8/W3/3d30mSBgwYoMOHDys/P18TJkyQ2+2WdO6KzE033eQ8r6amxrnK43a71dTUpNraWp+rOjU1NUpJSXFmjh071uL1jx8/3uJq0XnBwcEKDg72z0IBAMB1z+9XdL777jt16uS728DAQOfj5bGxsXK73SouLnYeb2pqUklJiRMxiYmJ6ty5s89MdXW19u3b58wkJyfL6/Vqz549zszu3bvl9XqdGQAAcGPz+xWd0aNHa/78+erbt6/+5m/+Rh9//LEWL16sf/zHf5R07ttNubm5ysvLU1xcnOLi4pSXl6fQ0FBlZWVJkizL0sSJEzVjxgz17NlTkZGRmjlzpgYMGOB8Cis+Pl4jRozQpEmTtGLFCknS5MmTlZGRwSeuAACApDYInVdeeUVPP/20pk6dqpqaGkVHR2vKlCl65plnnJlZs2apoaFBU6dOVW1trZKSkrR582aFh4c7M0uWLFFQUJDGjRunhoYGDRkyRIWFhQoMDHRm1q1bp5ycHOfTWZmZmSooKPD3kgAAQAcVYNu23d4H0V7q6upkWZa8Xq8iIiL8vv9+s9/z+z4BU/z1xVHtfQh+wXkOtK4tzvUr+frN77oCAADGInQAAICxCB0AAGAsQgcAABiL0AEAAMYidAAAgLEIHQAAYCxCBwAAGIvQAQAAxiJ0AACAsQgdAABgLEIHAAAYi9ABAADGInQAAICxCB0AAGAsQgcAABiL0AEAAMYidAAAgLEIHQAAYCxCBwAAGIvQAQAAxiJ0AACAsQgdAABgLEIHAAAYi9ABAADGInQAAICxCB0AAGAsQgcAABiL0AEAAMYidAAAgLEIHQAAYCxCBwAAGIvQAQAAxiJ0AACAsQgdAABgLEIHAAAYi9ABAADGInQAAICxCB0AAGAsQgcAABiL0AEAAMYidAAAgLEIHQAAYCxCBwAAGIvQAQAAxiJ0AACAsQgdAABgLEIHAAAYi9ABAADGInQAAICxCB0AAGAsQgcAABiL0AEAAMYidAAAgLEIHQAAYCxCBwAAGKtNQuerr77S3//936tnz54KDQ3VwIEDVVZW5jxu27bmzZun6OhohYSEKC0tTfv37/fZR2Njo6ZNm6ZevXopLCxMmZmZOnr0qM9MbW2tsrOzZVmWLMtSdna2Tpw40RZLAgAAHZDfQ6e2tlaDBw9W586d9cEHH+jAgQNatGiRunfv7swsXLhQixcvVkFBgUpLS+V2uzVs2DCdPHnSmcnNzdWmTZu0YcMGbd++XfX19crIyFBzc7Mzk5WVpfLychUVFamoqEjl5eXKzs7295IAAEAHFeTvHS5YsEAxMTFatWqVs61fv37On23b1tKlSzV37lyNHTtWkrR69Wq5XC6tX79eU6ZMkdfr1RtvvKE1a9Zo6NChkqS1a9cqJiZGW7Zs0fDhw1VRUaGioiLt2rVLSUlJkqSVK1cqOTlZBw8eVP/+/f29NAAA0MH4/YrOO++8o7vuuksPPPCAoqKidMcdd2jlypXO45WVlfJ4PEpPT3e2BQcHKzU1VTt27JAklZWV6fTp0z4z0dHRSkhIcGZ27twpy7KcyJGkQYMGybIsZ+ZCjY2Nqqur87kBAABz+T10vvzySy1fvlxxcXH605/+pEcffVQ5OTl68803JUkej0eS5HK5fJ7ncrmcxzwej7p06aIePXq0OhMVFdXi9aOiopyZC+Xn5zvv57EsSzExMVe3WAAAcF3ze+icPXtWd955p/Ly8nTHHXdoypQpmjRpkpYvX+4zFxAQ4HPftu0W2y504czF5lvbz5w5c+T1ep1bVVXV5S4LAAB0QH4PnZtuukm33Xabz7b4+HgdOXJEkuR2uyWpxVWXmpoa5yqP2+1WU1OT
amtrW505duxYi9c/fvx4i6tF5wUHBysiIsLnBgAAzOX30Bk8eLAOHjzos+2LL77QT3/6U0lSbGys3G63iouLncebmppUUlKilJQUSVJiYqI6d+7sM1NdXa19+/Y5M8nJyfJ6vdqzZ48zs3v3bnm9XmcGAADc2Pz+qasnnnhCKSkpysvL07hx47Rnzx699tpreu211ySd+3ZTbm6u8vLyFBcXp7i4OOXl5Sk0NFRZWVmSJMuyNHHiRM2YMUM9e/ZUZGSkZs6cqQEDBjifwoqPj9eIESM0adIkrVixQpI0efJkZWRk8IkrAAAgqQ1C5+6779amTZs0Z84cPf/884qNjdXSpUs1fvx4Z2bWrFlqaGjQ1KlTVVtbq6SkJG3evFnh4eHOzJIlSxQUFKRx48apoaFBQ4YMUWFhoQIDA52ZdevWKScnx/l0VmZmpgoKCvy9JAAA0EEF2LZtt/dBtJe6ujpZliWv19sm79fpN/s9v+8TMMVfXxzV3ofgF5znQOva4ly/kq/f/K4rAABgLEIHAAAYi9ABAADGInQAAICxCB0AAGAsQgcAABiL0AEAAMYidAAAgLEIHQAAYCxCBwAAGIvQAQAAxiJ0AACAsQgdAABgLEIHAAAYi9ABAADGInQAAICxCB0AAGAsQgcAABiL0AEAAMYidAAAgLEIHQAAYCxCBwAAGIvQAQAAxiJ0AACAsQgdAABgLEIHAAAYi9ABAADGInQAAICxCB0AAGAsQgcAABiL0AEAAMYidAAAgLEIHQAAYCxCBwAAGIvQAQAAxiJ0AACAsQgdAABgLEIHAAAYi9ABAADGInQAAICxCB0AAGAsQgcAABiL0AEAAMYidAAAgLEIHQAAYCxCBwAAGIvQAQAAxiJ0AACAsQgdAABgLEIHAAAYi9ABAADGInQAAICxCB0AAGAsQgcAABiL0AEAAMYidAAAgLEIHQAAYKw2D538/HwFBAQoNzfX2WbbtubNm6fo6GiFhIQoLS1N+/fv93leY2Ojpk2bpl69eiksLEyZmZk6evSoz0xtba2ys7NlWZYsy1J2drZOnDjR1ksCAAAdRJuGTmlpqV577TX94he/8Nm+cOFCLV68WAUFBSotLZXb7dawYcN08uRJZyY3N1ebNm3Shg0btH37dtXX1ysjI0PNzc3OTFZWlsrLy1VUVKSioiKVl5crOzu7LZcEAAA6kDYLnfr6eo0fP14rV65Ujx49nO22bWvp0qWaO3euxo4dq4SEBK1evVrfffed1q9fL0nyer164403tGjRIg0dOlR33HGH1q5dq88++0xbtmyRJFVUVKioqEivv/66kpOTlZycrJUrV+rdd9/VwYMH22pZAACgA2mz0Hnsscc0atQoDR061Gd7ZWWlPB6P0tPTnW3BwcFKTU3Vjh07JEllZWU6ffq0z0x0dLQSEhKcmZ07d8qyLCUlJTkzgwYNkmVZzsyFGhsbVVdX53MDAADmCmqLnW7YsEEfffSRSktLWzzm8XgkSS6Xy2e7y+XS4cOHnZkuXbr4XAk6P3P++R6PR1FRUS32HxUV5cxcKD8/X88999yVLwgAAHRIfr+iU1VVpX/6p3/S2rVr1bVr10vOBQQE+Ny3bbvFtgtdOHOx+db2M2fOHHm9XudWVVXV6usBAICOze+hU1ZWppqaGiUmJiooKEhBQUEqKSnRyy+/rKCgIOdKzoVXXWpqapzH3G63mpqaVFtb2+rMsWPHWrz+8ePHW1wtOi84OFgRERE+NwAAYC6/h86QIUP02Wefqby83LndddddGj9+vMrLy/Wzn/1MbrdbxcXFznOamppUUlKilJQUSVJiYqI6d+7sM1NdXa19+/Y5M8nJyfJ6vdqzZ48zs3v3bnm9XmcGAADc2Pz+Hp3w8HAlJCT4bAsLC1PPnj2d7bm5ucrLy1NcXJzi4uKUl5en0NBQZWVlSZIsy9LEiRM1Y8YM9ezZU5GRkZo5c6YGDBjgvLk5Pj5eI0aM0KRJk7RixQpJ
0uTJk5WRkaH+/fv7e1kAAKADapM3I/+QWbNmqaGhQVOnTlVtba2SkpK0efNmhYeHOzNLlixRUFCQxo0bp4aGBg0ZMkSFhYUKDAx0ZtatW6ecnBzn01mZmZkqKCi45usBAADXpwDbtu32Poj2UldXJ8uy5PV62+T9Ov1mv+f3fQKm+OuLo9r7EPyC8xxoXVuc61fy9ZvfdQUAAIxF6AAAAGMROgAAwFiEDgAAMBahAwAAjEXoAAAAYxE6AADAWIQOAAAwFqEDAACMRegAAABjEToAAMBYhA4AADAWoQMAAIxF6AAAAGMROgAAwFiEDgAAMBahAwAAjEXoAAAAYxE6AADAWIQOAAAwFqEDAACMRegAAABjEToAAMBYhA4AADAWoQMAAIxF6AAAAGMROgAAwFiEDgAAMBahAwAAjEXoAAAAYxE6AADAWIQOAAAwFqEDAACMRegAAABjEToAAMBYhA4AADAWoQMAAIxF6AAAAGMROgAAwFiEDgAAMBahAwAAjEXoAAAAYxE6AADAWIQOAAAwFqEDAACMRegAAABjEToAAMBYhA4AADAWoQMAAIxF6AAAAGMROgAAwFiEDgAAMBahAwAAjEXoAAAAYxE6AADAWIQOAAAwlt9DJz8/X3fffbfCw8MVFRWlv/3bv9XBgwd9Zmzb1rx58xQdHa2QkBClpaVp//79PjONjY2aNm2aevXqpbCwMGVmZuro0aM+M7W1tcrOzpZlWbIsS9nZ2Tpx4oS/lwQAADoov4dOSUmJHnvsMe3atUvFxcU6c+aM0tPTderUKWdm4cKFWrx4sQoKClRaWiq3261hw4bp5MmTzkxubq42bdqkDRs2aPv27aqvr1dGRoaam5udmaysLJWXl6uoqEhFRUUqLy9Xdna2v5cEAAA6qADbtu22fIHjx48rKipKJSUluueee2TbtqKjo5Wbm6unnnpK0rmrNy6XSwsWLNCUKVPk9XrVu3dvrVmzRg8++KAk6euvv1ZMTIzef/99DR8+XBUVFbrtttu0a9cuJSUlSZJ27dql5ORkff755+rfv/8PHltdXZ0sy5LX61VERITf195v9nt+3ydgir++OKq9D8EvOM+B1rXFuX4lX7/b/D06Xq9XkhQZGSlJqqyslMfjUXp6ujMTHBys1NRU7dixQ5JUVlam06dP+8xER0crISHBmdm5c6csy3IiR5IGDRoky7KcGQAAcGMLasud27at6dOn65e//KUSEhIkSR6PR5Lkcrl8Zl0ulw4fPuzMdOnSRT169Ggxc/75Ho9HUVFRLV4zKirKmblQY2OjGhsbnft1dXU/cmUAAKAjaNMrOo8//rg+/fRT/eEPf2jxWEBAgM9927ZbbLvQhTMXm29tP/n5+c4bly3LUkxMzOUsAwAAdFBtFjrTpk3TO++8o23btqlPnz7OdrfbLUktrrrU1NQ4V3ncbreamppUW1vb6syxY8davO7x48dbXC06b86cOfJ6vc6tqqrqxy8QAABc9/weOrZt6/HHH9cf//hHbd26VbGxsT6Px8bGyu12q7i42NnW1NSkkpISpaSkSJISExPVuXNnn5nq6mrt27fPmUlOTpbX69WePXucmd27d8vr9TozFwoODlZERITPDQAAmMvv79F57LHHtH79ev3nf/6nwsPDnSs3lmUpJCREAQEBys3NVV5enuLi4hQXF6e8vDyFhoYqKyvLmZ04caJmzJihnj17KjIyUjNnztSAAQM0dOhQSVJ8fLxGjBihSZMmacWKFZKkyZMnKyMj47I+cQUAAMzn99BZvny5JCktLc1n+6pVq/Twww9LkmbNmqWGhgZNnTpVtbW1SkpK0ubNmxUeHu7ML1myREFBQRo3bpwaGho0ZMgQFRYWKjAw0JlZt26dcnJynE9nZWZmqqCgwN9LAgAAHVSb/xyd6xk/RwdoP/wcHeDGYPzP0QEAAGgvhA4AADAWoQMAAIxF6AAAAGMROgAAwFiEDgAAMBahAwAAjEXoAAAA
YxE6AADAWIQOAAAwFqEDAACMRegAAABjEToAAMBYhA4AADAWoQMAAIxF6AAAAGMROgAAwFiEDgAAMBahAwAAjEXoAAAAYxE6AADAWIQOAAAwFqEDAACMRegAAABjEToAAMBYhA4AADAWoQMAAIxF6AAAAGMROgAAwFiEDgAAMBahAwAAjEXoAAAAYxE6AADAWIQOAAAwFqEDAACMRegAAABjEToAAMBYhA4AADAWoQMAAIxF6AAAAGMROgAAwFiEDgAAMBahAwAAjEXoAAAAYxE6AADAWIQOAAAwFqEDAACMRegAAABjEToAAMBYhA4AADAWoQMAAIxF6AAAAGMROgAAwFiEDgAAMBahAwAAjEXoAAAAYxE6AADAWB0+dJYtW6bY2Fh17dpViYmJ+vOf/9zehwQAAK4THTp0Nm7cqNzcXM2dO1cff/yxfvWrX2nkyJE6cuRIex8aAAC4DnTo0Fm8eLEmTpyoRx55RPHx8Vq6dKliYmK0fPny9j40AABwHQhq7wP4sZqamlRWVqbZs2f7bE9PT9eOHTsu+pzGxkY1NjY6971erySprq6uTY7xbON3bbJfwARtdd5da5znQOva4lw/v0/btn9wtsOGzv/+7/+qublZLpfLZ7vL5ZLH47noc/Lz8/Xcc8+12B4TE9Mmxwjg0qyl7X0EAK6FtjzXT548KcuyWp3psKFzXkBAgM9927ZbbDtvzpw5mj59unP/7Nmz+vbbb9WzZ89LPgdmqKurU0xMjKqqqhQREdHehwOgDXCe3zhs29bJkycVHR39g7MdNnR69eqlwMDAFldvampqWlzlOS84OFjBwcE+27p3795Wh4jrUEREBP8AAobjPL8x/NCVnPM67JuRu3TposTERBUXF/tsLy4uVkpKSjsdFQAAuJ502Cs6kjR9+nRlZ2frrrvuUnJysl577TUdOXJEjz76aHsfGgAAuA506NB58MEH9c033+j5559XdXW1EhIS9P777+unP/1pex8arjPBwcF69tlnW3zrEoA5OM9xMQH25Xw2CwAAoAPqsO/RAQAA+CGEDgAAMBahAwAAjEXoAAA6vMLCQn4uGi6K0EGHEhAQ0Ort4Ycfbu9DBHAVHn744Yue24cOHWrvQ0MH1aE/Xo4bT3V1tfPnjRs36plnntHBgwedbSEhIT7zp0+fVufOna/Z8QG4eiNGjNCqVat8tvXu3budjgYdHVd00KG43W7nZlmWAgICnPvff/+9unfvrrfeektpaWnq2rWr1q5dq3nz5mngwIE++1m6dKn69evns23VqlWKj49X165ddeutt2rZsmXXbmEAHMHBwT7nutvt1u9//3sNGDBAYWFhiomJ0dSpU1VfX3/JfXzyySe69957FR4eroiICCUmJmrv3r3O4zt27NA999yjkJAQxcTEKCcnR6dOnboWy8M1RujAOE899ZRycnJUUVGh4cOHX9ZzVq5cqblz52r+/PmqqKhQXl6enn76aa1evbqNjxbA5ejUqZNefvll7du3T6tXr9bWrVs1a9asS86PHz9effr0UWlpqcrKyjR79mzn6u5nn32m4cOHa+zYsfr000+1ceNGbd++XY8//vi1Wg6uIb51BePk5uZq7NixV/ScF154QYsWLXKeFxsbqwMHDmjFihWaMGFCWxwmgEt499131a1bN+f+yJEj9W//9m/O/djYWL3wwgv67W9/e8krr0eOHNGTTz6pW2+9VZIUFxfnPPbSSy8pKytLubm5zmMvv/yyUlNTtXz5cnXt2rUNVoX2QujAOHfdddcVzR8/flxVVVWaOHGiJk2a5Gw/c+bMZf92XAD+c++992r58uXO/bCwMG3btk15eXk6cOCA6urqdObMGX3//fc6deqUwsLCWuxj+vTpeuSRR7RmzRoNHTpUDzzwgG6++WZJUllZmQ4dOqR169Y587Zt6+zZs6qsrFR8fHzbLxLXDKED41z4j16nTp104W86OX36tPPns2fPSjr37aukpCSf
ucDAwDY6SgCXEhYWpp///OfO/cOHD+u+++7To48+qhdeeEGRkZHavn27Jk6c6HMu/1/z5s1TVlaW3nvvPX3wwQd69tlntWHDBv3617/W2bNnNWXKFOXk5LR4Xt++fdtsXWgfhA6M17t3b3k8Htm2rYCAAElSeXm587jL5dJPfvITffnllxo/fnw7HSWAS9m7d6/OnDmjRYsWqVOnc28tfeutt37webfccotuueUWPfHEE3rooYe0atUq/frXv9add96p/fv3+8QUzEXowHhpaWk6fvy4Fi5cqPvvv19FRUX64IMPFBER4czMmzdPOTk5ioiI0MiRI9XY2Ki9e/eqtrZW06dPb8ejB3DzzTfrzJkzeuWVVzR69Gj993//t1599dVLzjc0NOjJJ5/U/fffr9jYWB09elSlpaX6zW9+I+ncBxYGDRqkxx57TJMmTVJYWJgqKipUXFysV1555VotC9cIn7qC8eLj47Vs2TL9y7/8i26//Xbt2bNHM2fO9Jl55JFH9Prrr6uwsFADBgxQamqqCgsLFRsb205HDeC8gQMHavHixVqwYIESEhK0bt065efnX3I+MDBQ33zzjf7hH/5Bt9xyi8aNG6eRI0fqueeekyT94he/UElJif7nf/5Hv/rVr3THHXfo6aef1k033XStloRrKMC+8M0LAAAAhuCKDgAAMBahAwAAjEXoAAAAYxE6AADAWIQOAAAwFqEDAACMRegAAABjEToAAMBYhA4AADAWoQMAAIxF6AAAAGMROgAAwFj/D+m7vKbLMKEfAAAAAElFTkSuQmCC", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "bars= plt.bar(['True','False'],train_df['label'].value_counts().values)\n", + "for bar in bars:\n", + " yval = bar.get_height()\n", + " plt.text(bar.get_x() + bar.get_width() / 2, yval, round(yval, 2), ha='center', va='bottom')\n", + "\n", + "plt.show()\n", + "#Hence data has nearly equal cases of True and False News." + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": { + "execution": { + "iopub.execute_input": "2021-05-25T06:50:32.306146Z", + "iopub.status.busy": "2021-05-25T06:50:32.305826Z", + "iopub.status.idle": "2021-05-25T06:50:32.335357Z", + "shell.execute_reply": "2021-05-25T06:50:32.33417Z", + "shell.execute_reply.started": "2021-05-25T06:50:32.306118Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\u001b[1mCOLUMN\u001b[0m \u001b[1mNULL VALUES COUNT\u001b[0m\n", + "id 0\n", + "title 558\n", + "author 1957\n", + "text 39\n", + "label 0\n" + ] + } + ], + "source": [ + "def data_qualityCheck():\n", + " print(\"{:{}}\".format(\"\\033[1mCOLUMN\\033[0m\",38),end='')\n", + " print(\"{:{}}\".format(\"\\033[1mNULL VALUES COUNT\\033[0m\",18))\n", + " for x in train_df.columns:\n", + " print(\"{:{}}\".format(x,34),end='')\n", + " print(train_df[x].isnull().sum())\n", + "\n", + " \n", + "data_qualityCheck()" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": { + "execution": { + "iopub.execute_input": "2021-05-25T06:50:32.337061Z", + "iopub.status.busy": "2021-05-25T06:50:32.336735Z", + "iopub.status.idle": "2021-05-25T06:50:32.367948Z", + "shell.execute_reply": "2021-05-25T06:50:32.366933Z", + "shell.execute_reply.started": "2021-05-25T06:50:32.33703Z" + } + }, + "outputs": [], + "source": [ + "train_df=train_df.drop([\"id\", \"author\"], axis=1)" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": { + "execution": { + "iopub.execute_input": 
"2021-05-25T06:50:32.401314Z", + "iopub.status.busy": "2021-05-25T06:50:32.400868Z", + "iopub.status.idle": "2021-05-25T06:50:32.407806Z", + "shell.execute_reply": "2021-05-25T06:50:32.406589Z", + "shell.execute_reply.started": "2021-05-25T06:50:32.401272Z" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "(20800, 3)" + ] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "train_df.shape" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": { + "execution": { + "iopub.execute_input": "2021-05-25T06:50:32.409912Z", + "iopub.status.busy": "2021-05-25T06:50:32.409162Z", + "iopub.status.idle": "2021-05-25T06:50:32.426843Z", + "shell.execute_reply": "2021-05-25T06:50:32.425727Z", + "shell.execute_reply.started": "2021-05-25T06:50:32.409868Z" + } + }, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
" + ], + "text/plain": [ + " title \\\n", + "0 House Dem Aide: We Didn’t Even See Comey’s Let... \n", + "1 FLYNN: Hillary Clinton, Big Woman on Campus - ... \n", + "2 Why the Truth Might Get You Fired \n", + "3 15 Civilians Killed In Single US Airstrike Hav... \n", + "4 Iranian woman jailed for fictional unpublished... \n", + "\n", + " text label \n", + "0 House Dem Aide: We Didn’t Even See Comey’s Let... 1 \n", + "1 Ever get the feeling your life circles the rou... 0 \n", + "2 Why the Truth Might Get You Fired October 29, ... 1 \n", + "3 Videos 15 Civilians Killed In Single US Airstr... 1 \n", + "4 Print \\nAn Iranian woman has been sentenced to... 1 " + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "train_df.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": { + "execution": { + "iopub.execute_input": "2021-05-25T06:50:32.457112Z", + "iopub.status.busy": "2021-05-25T06:50:32.45653Z", + "iopub.status.idle": "2021-05-25T06:50:32.46346Z", + "shell.execute_reply": "2021-05-25T06:50:32.461467Z", + "shell.execute_reply.started": "2021-05-25T06:50:32.457067Z" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "0 1\n", + "1 0\n", + "2 1\n", + "3 1\n", + "4 1\n", + " ..\n", + "20795 0\n", + "20796 0\n", + "20797 0\n", + "20798 1\n", + "20799 1\n", + "Name: label, Length: 20800, dtype: int64" + ] + }, + "execution_count": 15, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "label_train = train_df['label']\n", + "label_train" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": { + "execution": { + "iopub.execute_input": "2021-05-25T06:50:32.46513Z", + "iopub.status.busy": "2021-05-25T06:50:32.46484Z", + "iopub.status.idle": "2021-05-25T06:50:32.479833Z", + "shell.execute_reply": "2021-05-25T06:50:32.478601Z", + "shell.execute_reply.started": "2021-05-25T06:50:32.465102Z" + } + }, + "outputs": [ + { + "data": { + 
"text/plain": [ + "0 1\n", + "1 0\n", + "2 1\n", + "3 1\n", + "4 1\n", + "5 0\n", + "6 1\n", + "7 0\n", + "8 0\n", + "9 0\n", + "Name: label, dtype: int64" + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "label_train.head(10)" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": { + "execution": { + "iopub.execute_input": "2021-05-25T06:50:32.481757Z", + "iopub.status.busy": "2021-05-25T06:50:32.481439Z", + "iopub.status.idle": "2021-05-25T06:50:32.493571Z", + "shell.execute_reply": "2021-05-25T06:50:32.492736Z", + "shell.execute_reply.started": "2021-05-25T06:50:32.481728Z" + } + }, + "outputs": [], + "source": [ + "train_df = train_df.drop(\"label\", axis = 1)" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": { + "execution": { + "iopub.execute_input": "2021-05-25T06:50:32.495566Z", + "iopub.status.busy": "2021-05-25T06:50:32.495116Z", + "iopub.status.idle": "2021-05-25T06:50:32.513957Z", + "shell.execute_reply": "2021-05-25T06:50:32.51265Z", + "shell.execute_reply.started": "2021-05-25T06:50:32.495526Z" + } + }, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
" + ], + "text/plain": [ + " title \\\n", + "0 House Dem Aide: We Didn’t Even See Comey’s Let... \n", + "1 FLYNN: Hillary Clinton, Big Woman on Campus - ... \n", + "2 Why the Truth Might Get You Fired \n", + "3 15 Civilians Killed In Single US Airstrike Hav... \n", + "4 Iranian woman jailed for fictional unpublished... \n", + "5 Jackie Mason: Hollywood Would Love Trump if He... \n", + "6 Life: Life Of Luxury: Elton John’s 6 Favorite ... \n", + "7 Benoît Hamon Wins French Socialist Party’s Pre... \n", + "8 Excerpts From a Draft Script for Donald Trump’... \n", + "9 A Back-Channel Plan for Ukraine and Russia, Co... \n", + "\n", + " text \n", + "0 House Dem Aide: We Didn’t Even See Comey’s Let... \n", + "1 Ever get the feeling your life circles the rou... \n", + "2 Why the Truth Might Get You Fired October 29, ... \n", + "3 Videos 15 Civilians Killed In Single US Airstr... \n", + "4 Print \\nAn Iranian woman has been sentenced to... \n", + "5 In these trying times, Jackie Mason is the Voi... \n", + "6 Ever wonder how Britain’s most iconic pop pian... \n", + "7 PARIS — France chose an idealistic, traditi... \n", + "8 Donald J. Trump is scheduled to make a highly ... \n", + "9 A week before Michael T. Flynn resigned as nat... 
" + ] + }, + "execution_count": 18, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "train_df.head(10)" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [], + "source": [ + "def fill_data(data):\n", + " data[\"title\"] = data[\"title\"].fillna(\"Has No Title\")\n", + " data[\"text\"] = data[\"text\"].fillna(\"Has No text\")\n", + " return data\n", + "\n", + "train_df= fill_data(train_df)" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\u001b[1mCOLUMN\u001b[0m \u001b[1mNULL VALUES COUNT\u001b[0m\n", + "title 0\n", + "text 0\n" + ] + } + ], + "source": [ + "data_qualityCheck()" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
" + ], + "text/plain": [ + " title \\\n", + "0 House Dem Aide: We Didn’t Even See Comey’s Let... \n", + "1 FLYNN: Hillary Clinton, Big Woman on Campus - ... \n", + "2 Why the Truth Might Get You Fired \n", + "3 15 Civilians Killed In Single US Airstrike Hav... \n", + "4 Iranian woman jailed for fictional unpublished... \n", + "5 Jackie Mason: Hollywood Would Love Trump if He... \n", + "6 Life: Life Of Luxury: Elton John’s 6 Favorite ... \n", + "7 Benoît Hamon Wins French Socialist Party’s Pre... \n", + "8 Excerpts From a Draft Script for Donald Trump’... \n", + "9 A Back-Channel Plan for Ukraine and Russia, Co... \n", + "10 Obama’s Organizing for Action Partners with So... \n", + "11 BBC Comedy Sketch \"Real Housewives of ISIS\" Ca... \n", + "12 Russian Researchers Discover Secret Nazi Milit... \n", + "13 US Officials See No Link Between Trump and Russia \n", + "14 Re: Yes, There Are Paid Government Trolls On S... \n", + "15 In Major League Soccer, Argentines Find a Home... \n", + "16 Wells Fargo Chief Abruptly Steps Down - The Ne... \n", + "17 Anonymous Donor Pays $2.5 Million To Release E... \n", + "18 FBI Closes In On Hillary! \n", + "19 Chuck Todd: ’BuzzFeed Did Donald Trump a Polit... \n", + "\n", + " text \n", + "0 House Dem Aide: We Didn’t Even See Comey’s Let... \n", + "1 Ever get the feeling your life circles the rou... \n", + "2 Why the Truth Might Get You Fired October 29, ... \n", + "3 Videos 15 Civilians Killed In Single US Airstr... \n", + "4 Print \\nAn Iranian woman has been sentenced to... \n", + "5 In these trying times, Jackie Mason is the Voi... \n", + "6 Ever wonder how Britain’s most iconic pop pian... \n", + "7 PARIS — France chose an idealistic, traditi... \n", + "8 Donald J. Trump is scheduled to make a highly ... \n", + "9 A week before Michael T. Flynn resigned as nat... \n", + "10 Organizing for Action, the activist group that... \n", + "11 The BBC produced spoof on the “Real Housewives... 
\n", + "12 The mystery surrounding The Third Reich and Na... \n", + "13 Clinton Campaign Demands FBI Affirm Trump's Ru... \n", + "14 Yes, There Are Paid Government Trolls On Socia... \n", + "15 Guillermo Barros Schelotto was not the first A... \n", + "16 The scandal engulfing Wells Fargo toppled its ... \n", + "17 A Caddo Nation tribal leader has just been fre... \n", + "18 FBI Closes In On Hillary! Posted on Home » Hea... \n", + "19 Wednesday after Donald Trump’s press confere... " + ] + }, + "execution_count": 21, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "train_df.head(20)" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": {}, + "outputs": [], + "source": [ + "train_df[\"new_text\"] = train_df[\"title\"] + \" \" + train_df[\"text\"]" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
" + ], + "text/plain": [ + " title \\\n", + "0 House Dem Aide: We Didn’t Even See Comey’s Let... \n", + "1 FLYNN: Hillary Clinton, Big Woman on Campus - ... \n", + "2 Why the Truth Might Get You Fired \n", + "3 15 Civilians Killed In Single US Airstrike Hav... \n", + "4 Iranian woman jailed for fictional unpublished... \n", + "5 Jackie Mason: Hollywood Would Love Trump if He... \n", + "6 Life: Life Of Luxury: Elton John’s 6 Favorite ... \n", + "7 Benoît Hamon Wins French Socialist Party’s Pre... \n", + "8 Excerpts From a Draft Script for Donald Trump’... \n", + "9 A Back-Channel Plan for Ukraine and Russia, Co... \n", + "10 Obama’s Organizing for Action Partners with So... \n", + "11 BBC Comedy Sketch \"Real Housewives of ISIS\" Ca... \n", + "12 Russian Researchers Discover Secret Nazi Milit... \n", + "13 US Officials See No Link Between Trump and Russia \n", + "14 Re: Yes, There Are Paid Government Trolls On S... \n", + "15 In Major League Soccer, Argentines Find a Home... \n", + "16 Wells Fargo Chief Abruptly Steps Down - The Ne... \n", + "17 Anonymous Donor Pays $2.5 Million To Release E... \n", + "18 FBI Closes In On Hillary! \n", + "19 Chuck Todd: ’BuzzFeed Did Donald Trump a Polit... \n", + "\n", + " text \\\n", + "0 House Dem Aide: We Didn’t Even See Comey’s Let... \n", + "1 Ever get the feeling your life circles the rou... \n", + "2 Why the Truth Might Get You Fired October 29, ... \n", + "3 Videos 15 Civilians Killed In Single US Airstr... \n", + "4 Print \\nAn Iranian woman has been sentenced to... \n", + "5 In these trying times, Jackie Mason is the Voi... \n", + "6 Ever wonder how Britain’s most iconic pop pian... \n", + "7 PARIS — France chose an idealistic, traditi... \n", + "8 Donald J. Trump is scheduled to make a highly ... \n", + "9 A week before Michael T. Flynn resigned as nat... \n", + "10 Organizing for Action, the activist group that... \n", + "11 The BBC produced spoof on the “Real Housewives... 
\n", + "12 The mystery surrounding The Third Reich and Na... \n", + "13 Clinton Campaign Demands FBI Affirm Trump's Ru... \n", + "14 Yes, There Are Paid Government Trolls On Socia... \n", + "15 Guillermo Barros Schelotto was not the first A... \n", + "16 The scandal engulfing Wells Fargo toppled its ... \n", + "17 A Caddo Nation tribal leader has just been fre... \n", + "18 FBI Closes In On Hillary! Posted on Home » Hea... \n", + "19 Wednesday after Donald Trump’s press confere... \n", + "\n", + " new_text \n", + "0 House Dem Aide: We Didn’t Even See Comey’s Let... \n", + "1 FLYNN: Hillary Clinton, Big Woman on Campus - ... \n", + "2 Why the Truth Might Get You Fired Why the Trut... \n", + "3 15 Civilians Killed In Single US Airstrike Hav... \n", + "4 Iranian woman jailed for fictional unpublished... \n", + "5 Jackie Mason: Hollywood Would Love Trump if He... \n", + "6 Life: Life Of Luxury: Elton John’s 6 Favorite ... \n", + "7 Benoît Hamon Wins French Socialist Party’s Pre... \n", + "8 Excerpts From a Draft Script for Donald Trump’... \n", + "9 A Back-Channel Plan for Ukraine and Russia, Co... \n", + "10 Obama’s Organizing for Action Partners with So... \n", + "11 BBC Comedy Sketch \"Real Housewives of ISIS\" Ca... \n", + "12 Russian Researchers Discover Secret Nazi Milit... \n", + "13 US Officials See No Link Between Trump and Rus... \n", + "14 Re: Yes, There Are Paid Government Trolls On S... \n", + "15 In Major League Soccer, Argentines Find a Home... \n", + "16 Wells Fargo Chief Abruptly Steps Down - The Ne... \n", + "17 Anonymous Donor Pays $2.5 Million To Release E... \n", + "18 FBI Closes In On Hillary! FBI Closes In On Hil... \n", + "19 Chuck Todd: ’BuzzFeed Did Donald Trump a Polit... 
" + ] + }, + "execution_count": 23, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "train_df.head(20)" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": {}, + "outputs": [], + "source": [ + "train_df=train_df.drop(['title','text'],axis=1)" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
" + ], + "text/plain": [ + " new_text\n", + "0 House Dem Aide: We Didn’t Even See Comey’s Let...\n", + "1 FLYNN: Hillary Clinton, Big Woman on Campus - ...\n", + "2 Why the Truth Might Get You Fired Why the Trut...\n", + "3 15 Civilians Killed In Single US Airstrike Hav...\n", + "4 Iranian woman jailed for fictional unpublished..." + ] + }, + "execution_count": 25, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "train_df.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "metadata": {}, + "outputs": [], + "source": [ + "custom_download_dir = \"C:\\\\Users\\\\ysach/nltk\"\n", + "nltk.data.path.append(custom_download_dir)" + ] + }, + { + "cell_type": "code", + "execution_count": 31, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "[nltk_data] Downloading package stopwords to C:\\Users\\ysach/nltk...\n", + "[nltk_data] Package stopwords is already up-to-date!\n" + ] + }, + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 31, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "nltk.download('stopwords',download_dir=custom_download_dir)" + ] + }, + { + "cell_type": "code", + "execution_count": 32, + "metadata": { + "execution": { + "iopub.execute_input": "2021-05-25T06:50:32.51602Z", + "iopub.status.busy": "2021-05-25T06:50:32.515411Z", + "iopub.status.idle": "2021-05-25T06:50:32.531829Z", + "shell.execute_reply": "2021-05-25T06:50:32.530895Z", + "shell.execute_reply.started": "2021-05-25T06:50:32.515972Z" + } + }, + "outputs": [], + "source": [ + "lemmatizer = WordNetLemmatizer()\n", + "stpwrds = list(stopwords.words('english'))" + ] + }, + { + "cell_type": "code", + "execution_count": 33, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "['i',\n", + " 'me',\n", + " 'my',\n", + " 'myself',\n", + " 'we',\n", + " 'our',\n", + " 'ours',\n", + " 'ourselves',\n", + " 'you',\n", + " \"you're\",\n", + " 
\"you've\",\n", + " \"you'll\",\n", + " \"you'd\",\n", + " 'your',\n", + " 'yours',\n", + " 'yourself',\n", + " 'yourselves',\n", + " 'he',\n", + " 'him',\n", + " 'his',\n", + " 'himself',\n", + " 'she',\n", + " \"she's\",\n", + " 'her',\n", + " 'hers',\n", + " 'herself',\n", + " 'it',\n", + " \"it's\",\n", + " 'its',\n", + " 'itself',\n", + " 'they',\n", + " 'them',\n", + " 'their',\n", + " 'theirs',\n", + " 'themselves',\n", + " 'what',\n", + " 'which',\n", + " 'who',\n", + " 'whom',\n", + " 'this',\n", + " 'that',\n", + " \"that'll\",\n", + " 'these',\n", + " 'those',\n", + " 'am',\n", + " 'is',\n", + " 'are',\n", + " 'was',\n", + " 'were',\n", + " 'be',\n", + " 'been',\n", + " 'being',\n", + " 'have',\n", + " 'has',\n", + " 'had',\n", + " 'having',\n", + " 'do',\n", + " 'does',\n", + " 'did',\n", + " 'doing',\n", + " 'a',\n", + " 'an',\n", + " 'the',\n", + " 'and',\n", + " 'but',\n", + " 'if',\n", + " 'or',\n", + " 'because',\n", + " 'as',\n", + " 'until',\n", + " 'while',\n", + " 'of',\n", + " 'at',\n", + " 'by',\n", + " 'for',\n", + " 'with',\n", + " 'about',\n", + " 'against',\n", + " 'between',\n", + " 'into',\n", + " 'through',\n", + " 'during',\n", + " 'before',\n", + " 'after',\n", + " 'above',\n", + " 'below',\n", + " 'to',\n", + " 'from',\n", + " 'up',\n", + " 'down',\n", + " 'in',\n", + " 'out',\n", + " 'on',\n", + " 'off',\n", + " 'over',\n", + " 'under',\n", + " 'again',\n", + " 'further',\n", + " 'then',\n", + " 'once',\n", + " 'here',\n", + " 'there',\n", + " 'when',\n", + " 'where',\n", + " 'why',\n", + " 'how',\n", + " 'all',\n", + " 'any',\n", + " 'both',\n", + " 'each',\n", + " 'few',\n", + " 'more',\n", + " 'most',\n", + " 'other',\n", + " 'some',\n", + " 'such',\n", + " 'no',\n", + " 'nor',\n", + " 'not',\n", + " 'only',\n", + " 'own',\n", + " 'same',\n", + " 'so',\n", + " 'than',\n", + " 'too',\n", + " 'very',\n", + " 's',\n", + " 't',\n", + " 'can',\n", + " 'will',\n", + " 'just',\n", + " 'don',\n", + " \"don't\",\n", + " 'should',\n", + " 
\"should've\",\n", + " 'now',\n", + " 'd',\n", + " 'll',\n", + " 'm',\n", + " 'o',\n", + " 're',\n", + " 've',\n", + " 'y',\n", + " 'ain',\n", + " 'aren',\n", + " \"aren't\",\n", + " 'couldn',\n", + " \"couldn't\",\n", + " 'didn',\n", + " \"didn't\",\n", + " 'doesn',\n", + " \"doesn't\",\n", + " 'hadn',\n", + " \"hadn't\",\n", + " 'hasn',\n", + " \"hasn't\",\n", + " 'haven',\n", + " \"haven't\",\n", + " 'isn',\n", + " \"isn't\",\n", + " 'ma',\n", + " 'mightn',\n", + " \"mightn't\",\n", + " 'mustn',\n", + " \"mustn't\",\n", + " 'needn',\n", + " \"needn't\",\n", + " 'shan',\n", + " \"shan't\",\n", + " 'shouldn',\n", + " \"shouldn't\",\n", + " 'wasn',\n", + " \"wasn't\",\n", + " 'weren',\n", + " \"weren't\",\n", + " 'won',\n", + " \"won't\",\n", + " 'wouldn',\n", + " \"wouldn't\"]" + ] + }, + "execution_count": 33, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "stpwrds" + ] + }, + { + "cell_type": "code", + "execution_count": 34, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "[nltk_data] Downloading package punkt to C:\\Users\\ysach/nltk...\n", + "[nltk_data] Package punkt is already up-to-date!\n", + "[nltk_data] Downloading package wordnet to C:\\Users\\ysach/nltk...\n", + "[nltk_data] Package wordnet is already up-to-date!\n" + ] + }, + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 34, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "nltk.download('punkt',download_dir=custom_download_dir)\n", + "nltk.download('wordnet',download_dir=custom_download_dir)" + ] + }, + { + "cell_type": "code", + "execution_count": 35, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "[nltk_data] Downloading package omw-1.4 to C:\\Users\\ysach/nltk...\n", + "[nltk_data] Package omw-1.4 is already up-to-date!\n" + ] + }, + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 35, + 
"metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "nltk.download('omw-1.4',download_dir=custom_download_dir)" + ] + }, + { + "cell_type": "code", + "execution_count": 36, + "metadata": { + "execution": { + "iopub.execute_input": "2021-05-25T06:50:32.54905Z", + "iopub.status.busy": "2021-05-25T06:50:32.548517Z", + "iopub.status.idle": "2021-05-25T06:53:51.648153Z", + "shell.execute_reply": "2021-05-25T06:53:51.647283Z", + "shell.execute_reply.started": "2021-05-25T06:50:32.549015Z" + } + }, + "outputs": [], + "source": [ + "for x in range(len(train_df)) :\n", + " corpus = []\n", + " review = train_df['new_text'][x]\n", + " review = re.sub(r'[^a-zA-Z\\s]', '', review)\n", + " review = review.lower()\n", + " review = nltk.word_tokenize(review)\n", + " for y in review :\n", + " if y not in stpwrds :\n", + " corpus.append(lemmatizer.lemmatize(y))\n", + " review = ' '.join(corpus)\n", + " train_df.loc[x, 'new_text'] = review" + ] + }, + { + "cell_type": "code", + "execution_count": 37, + "metadata": { + "execution": { + "iopub.execute_input": "2021-05-25T07:14:51.798724Z", + "iopub.status.busy": "2021-05-25T07:14:51.798361Z", + "iopub.status.idle": "2021-05-25T07:14:51.805617Z", + "shell.execute_reply": "2021-05-25T07:14:51.804946Z", + "shell.execute_reply.started": "2021-05-25T07:14:51.798694Z" + }, + "scrolled": true + }, + "outputs": [ + { + "data": { + "text/plain": [ + "'actor steven seagal live tv erupts hillary obama intense comment actor steven seagal stood america rest hollywood remains silent week rough country first democratic nominee hillary clinton collapsed memorial called million hardworking american deplorable werent enough nfl player throughout country blatantly disrespecting american flag needle say seagal enough think important job secretary state ensuring people dont get killed seagal tweeted cant email protected pneumonia going disastrous american people notohillary continued course seagal quickly became target liberal fire 
comment refused break particularly lost one twitter user tried argued hillary capable presidency capable capable leaving american die capable disregarding law capable disrespecting rape survivor argued went address race relation united state true role president barack obama played social evolution country obama abysmal race relation usa truth need start honest dialog wrote seagal concluded pointing irony attack receiving liberal everywhere best thing worldmaking one statement freedom getting attacked every demo hypocritical tweeted america without democrat white house safer america think seagals comment'" + ] + }, + "execution_count": 37, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "train_df['new_text'][2188]" + ] + }, + { + "cell_type": "code", + "execution_count": 75, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "1" + ] + }, + "execution_count": 75, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "label_train[2188]" + ] + }, + { + "cell_type": "code", + "execution_count": 38, + "metadata": { + "execution": { + "iopub.execute_input": "2021-05-25T07:16:37.152728Z", + "iopub.status.busy": "2021-05-25T07:16:37.152216Z", + "iopub.status.idle": "2021-05-25T07:16:37.163059Z", + "shell.execute_reply": "2021-05-25T07:16:37.161884Z", + "shell.execute_reply.started": "2021-05-25T07:16:37.152696Z" + } + }, + "outputs": [], + "source": [ + "X_train= train_df['new_text']" + ] + }, + { + "cell_type": "code", + "execution_count": 39, + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "data": { + "text/plain": [ + "0 house dem aide didnt even see comeys letter ja...\n", + "1 flynn hillary clinton big woman campus breitba...\n", + "2 truth might get fired truth might get fired oc...\n", + "3 civilian killed single u airstrike identified ...\n", + "4 iranian woman jailed fictional unpublished sto...\n", + " ... 
\n", + "20795 rapper ti trump poster child white supremacy r...\n", + "20796 nfl playoff schedule matchup odds new york tim...\n", + "20797 macys said receive takeover approach hudson ba...\n", + "20798 nato russia hold parallel exercise balkan nato...\n", + "20799 keep f alive david swanson author activist jou...\n", + "Name: new_text, Length: 20800, dtype: object" + ] + }, + "execution_count": 39, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "X_train" + ] + }, + { + "cell_type": "code", + "execution_count": 40, + "metadata": { + "execution": { + "iopub.execute_input": "2021-05-25T07:17:50.592597Z", + "iopub.status.busy": "2021-05-25T07:17:50.592095Z", + "iopub.status.idle": "2021-05-25T07:17:50.598862Z", + "shell.execute_reply": "2021-05-25T07:17:50.597641Z", + "shell.execute_reply.started": "2021-05-25T07:17:50.592566Z" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "(20800,)" + ] + }, + "execution_count": 40, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "X_train.shape" + ] + }, + { + "cell_type": "code", + "execution_count": 41, + "metadata": { + "execution": { + "iopub.execute_input": "2021-05-25T07:18:05.89317Z", + "iopub.status.busy": "2021-05-25T07:18:05.892651Z", + "iopub.status.idle": "2021-05-25T07:18:05.902743Z", + "shell.execute_reply": "2021-05-25T07:18:05.901523Z", + "shell.execute_reply.started": "2021-05-25T07:18:05.893127Z" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "(20800,)" + ] + }, + "execution_count": 41, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "label_train.shape" + ] + }, + { + "cell_type": "code", + "execution_count": 42, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From c:\\Users\\ysach\\anaconda3\\lib\\site-packages\\keras\\src\\losses.py:2976: The name tf.losses.sparse_softmax_cross_entropy is deprecated. 
Please use tf.compat.v1.losses.sparse_softmax_cross_entropy instead.\n", + "\n" + ] + } + ], + "source": [ + "from keras.preprocessing.text import Tokenizer\n", + "from keras.preprocessing.sequence import pad_sequences" + ] + }, + { + "cell_type": "code", + "execution_count": 43, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "The Padding Sequence Shape is --> (20800, 12140)\n" + ] + } + ], + "source": [ + "tokenize = Tokenizer(oov_token=\"\")\n", + "tokenize.fit_on_texts(X_train)\n", + "word_idx = tokenize.word_index\n", + "\n", + "text2seq = tokenize.texts_to_sequences(X_train)\n", + "\n", + "# pad_seq = pad_sequences(text2seq, maxlen=150, padding=\"pre\", truncating=\"pre\")\n", + "\n", + "pad_seq = pad_sequences(text2seq, padding=\"pre\", truncating=\"pre\")\n", + "\n", + "\n", + "print(\"The Padding Sequence Shape is --> \", pad_seq.shape)" + ] + }, + { + "cell_type": "code", + "execution_count": 46, + "metadata": {}, + "outputs": [], + "source": [ + "input_length = max(len(seq) for seq in text2seq)\n", + "\n", + "vocabulary_size = len(word_idx) + 1" + ] + }, + { + "cell_type": "code", + "execution_count": 47, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "The maximum Sequence Length is --> 12140\n", + "The vocabulary size of the dataset is --> 166055\n" + ] + } + ], + "source": [ + "print(\"The maximum Sequence Length is --> \", input_length)\n", + "print(\"The vocabulary size of the dataset is --> \", vocabulary_size)" + ] + }, + { + "cell_type": "code", + "execution_count": 48, + "metadata": {}, + "outputs": [], + "source": [ + "from sklearn.feature_extraction.text import TfidfVectorizer\n", + "from sklearn.feature_extraction.text import CountVectorizer\n", + "vectorizer = CountVectorizer(\n", + " ngram_range=(1,1),\n", + " max_features=250\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 118, + "metadata": { + "execution": { +
"iopub.execute_input": "2021-05-25T07:18:10.901469Z", + "iopub.status.busy": "2021-05-25T07:18:10.901136Z", + "iopub.status.idle": "2021-05-25T07:18:22.003384Z", + "shell.execute_reply": "2021-05-25T07:18:22.002314Z", + "shell.execute_reply.started": "2021-05-25T07:18:10.90144Z" + } + }, + "outputs": [], + "source": [ + "#tfidf_v = TfidfVectorizer()\n", + "#tfidf_X_train = vectorizer.fit_transform(X_train)\n", + "#tfidf_X_test = vectorizer.transform(X_test)" + ] + }, + { + "cell_type": "code", + "execution_count": 119, + "metadata": { + "execution": { + "iopub.execute_input": "2021-05-25T07:18:24.321674Z", + "iopub.status.busy": "2021-05-25T07:18:24.321329Z", + "iopub.status.idle": "2021-05-25T07:18:24.327063Z", + "shell.execute_reply": "2021-05-25T07:18:24.325975Z", + "shell.execute_reply.started": "2021-05-25T07:18:24.321644Z" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "(7168, 250)" + ] + }, + "execution_count": 119, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "#tfidf_X_train.shape" + ] + }, + { + "cell_type": "code", + "execution_count": 49, + "metadata": { + "execution": { + "iopub.execute_input": "2021-05-25T07:18:31.418929Z", + "iopub.status.busy": "2021-05-25T07:18:31.418573Z", + "iopub.status.idle": "2021-05-25T07:18:31.427535Z", + "shell.execute_reply": "2021-05-25T07:18:31.426865Z", + "shell.execute_reply.started": "2021-05-25T07:18:31.418889Z" + } + }, + "outputs": [], + "source": [ + "def plot_confusion_matrix(cm, classes,\n", + " normalize=False,\n", + " title='Confusion matrix',\n", + " cmap=plt.cm.GnBu):\n", + " \n", + " plt.imshow(cm, interpolation='nearest', cmap=cmap)\n", + " plt.title(title)\n", + " plt.colorbar()\n", + " tick_marks = np.arange(len(classes))\n", + " plt.xticks(tick_marks, classes, rotation=45)\n", + " plt.yticks(tick_marks, classes)\n", + "\n", + " if normalize:\n", + " cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]\n", + " print(\"Normalized confusion matrix\")\n", + " 
else:\n", + " print('Confusion matrix, without normalization')\n", + "\n", + " thresh = cm.max() / 2.\n", + " for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):\n", + " plt.text(j, i, cm[i, j],\n", + " horizontalalignment=\"center\",\n", + " color=\"white\" if cm[i, j] > thresh else \"black\")\n", + "\n", + " plt.tight_layout()\n", + " plt.ylabel('True label')\n", + " plt.xlabel('Predicted label')" + ] + }, + { + "cell_type": "code", + "execution_count": 50, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Requirement already satisfied: tensorflow in c:\\users\\ysach\\anaconda3\\lib\\site-packages (2.15.0)\n", + "Requirement already satisfied: tensorflow-intel==2.15.0 in c:\\users\\ysach\\anaconda3\\lib\\site-packages (from tensorflow) (2.15.0)\n", + "Requirement already satisfied: opt-einsum>=2.3.2 in c:\\users\\ysach\\anaconda3\\lib\\site-packages (from tensorflow-intel==2.15.0->tensorflow) (3.3.0)\n", + "Requirement already satisfied: six>=1.12.0 in c:\\users\\ysach\\anaconda3\\lib\\site-packages (from tensorflow-intel==2.15.0->tensorflow) (1.16.0)\n", + "Requirement already satisfied: tensorflow-io-gcs-filesystem>=0.23.1 in c:\\users\\ysach\\anaconda3\\lib\\site-packages (from tensorflow-intel==2.15.0->tensorflow) (0.31.0)\n", + "Requirement already satisfied: setuptools in c:\\users\\ysach\\anaconda3\\lib\\site-packages (from tensorflow-intel==2.15.0->tensorflow) (65.6.3)\n", + "Requirement already satisfied: keras<2.16,>=2.15.0 in c:\\users\\ysach\\anaconda3\\lib\\site-packages (from tensorflow-intel==2.15.0->tensorflow) (2.15.0)\n", + "Requirement already satisfied: wrapt<1.15,>=1.11.0 in c:\\users\\ysach\\anaconda3\\lib\\site-packages (from tensorflow-intel==2.15.0->tensorflow) (1.14.1)\n", + "Requirement already satisfied: gast!=0.5.0,!=0.5.1,!=0.5.2,>=0.2.1 in c:\\users\\ysach\\anaconda3\\lib\\site-packages (from tensorflow-intel==2.15.0->tensorflow) (0.5.4)\n", + "Requirement already 
satisfied: libclang>=13.0.0 in c:\\users\\ysach\\anaconda3\\lib\\site-packages (from tensorflow-intel==2.15.0->tensorflow) (16.0.6)\n", + "Requirement already satisfied: packaging in c:\\users\\ysach\\anaconda3\\lib\\site-packages (from tensorflow-intel==2.15.0->tensorflow) (22.0)\n", + "Requirement already satisfied: grpcio<2.0,>=1.24.3 in c:\\users\\ysach\\anaconda3\\lib\\site-packages (from tensorflow-intel==2.15.0->tensorflow) (1.60.0)\n", + "Requirement already satisfied: astunparse>=1.6.0 in c:\\users\\ysach\\anaconda3\\lib\\site-packages (from tensorflow-intel==2.15.0->tensorflow) (1.6.3)\n", + "Requirement already satisfied: typing-extensions>=3.6.6 in c:\\users\\ysach\\anaconda3\\lib\\site-packages (from tensorflow-intel==2.15.0->tensorflow) (4.4.0)\n", + "Requirement already satisfied: tensorflow-estimator<2.16,>=2.15.0 in c:\\users\\ysach\\anaconda3\\lib\\site-packages (from tensorflow-intel==2.15.0->tensorflow) (2.15.0)\n", + "Requirement already satisfied: google-pasta>=0.1.1 in c:\\users\\ysach\\anaconda3\\lib\\site-packages (from tensorflow-intel==2.15.0->tensorflow) (0.2.0)\n", + "Requirement already satisfied: absl-py>=1.0.0 in c:\\users\\ysach\\anaconda3\\lib\\site-packages (from tensorflow-intel==2.15.0->tensorflow) (2.0.0)\n", + "Requirement already satisfied: protobuf!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<5.0.0dev,>=3.20.3 in c:\\users\\ysach\\anaconda3\\lib\\site-packages (from tensorflow-intel==2.15.0->tensorflow) (4.23.4)\n", + "Requirement already satisfied: flatbuffers>=23.5.26 in c:\\users\\ysach\\anaconda3\\lib\\site-packages (from tensorflow-intel==2.15.0->tensorflow) (23.5.26)\n", + "Requirement already satisfied: ml-dtypes~=0.2.0 in c:\\users\\ysach\\anaconda3\\lib\\site-packages (from tensorflow-intel==2.15.0->tensorflow) (0.2.0)\n", + "Requirement already satisfied: tensorboard<2.16,>=2.15 in c:\\users\\ysach\\anaconda3\\lib\\site-packages (from tensorflow-intel==2.15.0->tensorflow) (2.15.1)\n", + "Requirement 
already satisfied: termcolor>=1.1.0 in c:\\users\\ysach\\anaconda3\\lib\\site-packages (from tensorflow-intel==2.15.0->tensorflow) (2.4.0)\n", + "Requirement already satisfied: numpy<2.0.0,>=1.23.5 in c:\\users\\ysach\\anaconda3\\lib\\site-packages (from tensorflow-intel==2.15.0->tensorflow) (1.23.5)\n", + "Requirement already satisfied: h5py>=2.9.0 in c:\\users\\ysach\\anaconda3\\lib\\site-packages (from tensorflow-intel==2.15.0->tensorflow) (3.7.0)\n", + "Requirement already satisfied: wheel<1.0,>=0.23.0 in c:\\users\\ysach\\anaconda3\\lib\\site-packages (from astunparse>=1.6.0->tensorflow-intel==2.15.0->tensorflow) (0.38.4)\n", + "Requirement already satisfied: google-auth<3,>=1.6.3 in c:\\users\\ysach\\anaconda3\\lib\\site-packages (from tensorboard<2.16,>=2.15->tensorflow-intel==2.15.0->tensorflow) (2.25.2)\n", + "Requirement already satisfied: markdown>=2.6.8 in c:\\users\\ysach\\anaconda3\\lib\\site-packages (from tensorboard<2.16,>=2.15->tensorflow-intel==2.15.0->tensorflow) (3.4.1)\n", + "Requirement already satisfied: google-auth-oauthlib<2,>=0.5 in c:\\users\\ysach\\anaconda3\\lib\\site-packages (from tensorboard<2.16,>=2.15->tensorflow-intel==2.15.0->tensorflow) (1.2.0)\n", + "Requirement already satisfied: tensorboard-data-server<0.8.0,>=0.7.0 in c:\\users\\ysach\\anaconda3\\lib\\site-packages (from tensorboard<2.16,>=2.15->tensorflow-intel==2.15.0->tensorflow) (0.7.2)\n", + "Requirement already satisfied: requests<3,>=2.21.0 in c:\\users\\ysach\\anaconda3\\lib\\site-packages (from tensorboard<2.16,>=2.15->tensorflow-intel==2.15.0->tensorflow) (2.28.1)\n", + "Requirement already satisfied: werkzeug>=1.0.1 in c:\\users\\ysach\\anaconda3\\lib\\site-packages (from tensorboard<2.16,>=2.15->tensorflow-intel==2.15.0->tensorflow) (2.2.2)\n", + "Requirement already satisfied: rsa<5,>=3.1.4 in c:\\users\\ysach\\anaconda3\\lib\\site-packages (from google-auth<3,>=1.6.3->tensorboard<2.16,>=2.15->tensorflow-intel==2.15.0->tensorflow) (4.9)\n", + "Requirement 
already satisfied: pyasn1-modules>=0.2.1 in c:\\users\\ysach\\anaconda3\\lib\\site-packages (from google-auth<3,>=1.6.3->tensorboard<2.16,>=2.15->tensorflow-intel==2.15.0->tensorflow) (0.2.8)\n", + "Requirement already satisfied: cachetools<6.0,>=2.0.0 in c:\\users\\ysach\\anaconda3\\lib\\site-packages (from google-auth<3,>=1.6.3->tensorboard<2.16,>=2.15->tensorflow-intel==2.15.0->tensorflow) (5.3.2)\n", + "Requirement already satisfied: requests-oauthlib>=0.7.0 in c:\\users\\ysach\\anaconda3\\lib\\site-packages (from google-auth-oauthlib<2,>=0.5->tensorboard<2.16,>=2.15->tensorflow-intel==2.15.0->tensorflow) (1.3.1)\n", + "Requirement already satisfied: urllib3<1.27,>=1.21.1 in c:\\users\\ysach\\anaconda3\\lib\\site-packages (from requests<3,>=2.21.0->tensorboard<2.16,>=2.15->tensorflow-intel==2.15.0->tensorflow) (1.26.14)\n", + "Requirement already satisfied: idna<4,>=2.5 in c:\\users\\ysach\\anaconda3\\lib\\site-packages (from requests<3,>=2.21.0->tensorboard<2.16,>=2.15->tensorflow-intel==2.15.0->tensorflow) (3.4)\n", + "Requirement already satisfied: certifi>=2017.4.17 in c:\\users\\ysach\\anaconda3\\lib\\site-packages (from requests<3,>=2.21.0->tensorboard<2.16,>=2.15->tensorflow-intel==2.15.0->tensorflow) (2023.11.17)\n", + "Requirement already satisfied: charset-normalizer<3,>=2 in c:\\users\\ysach\\anaconda3\\lib\\site-packages (from requests<3,>=2.21.0->tensorboard<2.16,>=2.15->tensorflow-intel==2.15.0->tensorflow) (2.0.4)\n", + "Requirement already satisfied: MarkupSafe>=2.1.1 in c:\\users\\ysach\\anaconda3\\lib\\site-packages (from werkzeug>=1.0.1->tensorboard<2.16,>=2.15->tensorflow-intel==2.15.0->tensorflow) (2.1.1)\n", + "Requirement already satisfied: pyasn1<0.5.0,>=0.4.6 in c:\\users\\ysach\\anaconda3\\lib\\site-packages (from pyasn1-modules>=0.2.1->google-auth<3,>=1.6.3->tensorboard<2.16,>=2.15->tensorflow-intel==2.15.0->tensorflow) (0.4.8)\n", + "Requirement already satisfied: oauthlib>=3.0.0 in c:\\users\\ysach\\anaconda3\\lib\\site-packages 
(from requests-oauthlib>=0.7.0->google-auth-oauthlib<2,>=0.5->tensorboard<2.16,>=2.15->tensorflow-intel==2.15.0->tensorflow) (3.2.2)\n" + ] + } + ], + "source": [ + "!pip install tensorflow" + ] + }, + { + "cell_type": "code", + "execution_count": 51, + "metadata": {}, + "outputs": [], + "source": [ + "import tensorflow as tf" + ] + }, + { + "cell_type": "code", + "execution_count": 52, + "metadata": {}, + "outputs": [], + "source": [ + "import keras\n", + "from keras.models import Sequential\n", + "from keras.utils import to_categorical\n", + "from keras import metrics as metrics1\n", + "from keras.layers import LeakyReLU\n", + "from keras.layers import Dense, Embedding, GlobalAveragePooling1D, LSTM, Bidirectional" + ] + }, + { + "cell_type": "code", + "execution_count": 53, + "metadata": {}, + "outputs": [], + "source": [ + "x_train1, x_test, y_train1, y_test = train_test_split(pad_seq, label_train, train_size=0.7)" + ] + }, + { + "cell_type": "code", + "execution_count": 54, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From c:\\Users\\ysach\\anaconda3\\lib\\site-packages\\keras\\src\\backend.py:873: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.\n", + "\n", + "WARNING:tensorflow:From c:\\Users\\ysach\\anaconda3\\lib\\site-packages\\keras\\src\\optimizers\\__init__.py:309: The name tf.train.Optimizer is deprecated. 
Please use tf.compat.v1.train.Optimizer instead.\n", + "\n" + ] + } + ], + "source": [ + "classifier = Sequential()\n", + "classifier.add(Embedding(vocabulary_size, 182, input_length=input_length))\n", + "classifier.add(GlobalAveragePooling1D())\n", + "classifier.add(Dense(96, activation='relu'))\n", + "classifier.add(Dense(24, activation='relu'))\n", + "classifier.add(Dense(1, activation='sigmoid'))\n", + "\n", + "# Compile the model\n", + "classifier.compile(optimizer='adam',\n", + " loss='binary_crossentropy',\n", + " metrics=['accuracy'])" + ] + }, + { + "cell_type": "code", + "execution_count": 55, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Model: \"sequential\"\n", + "_________________________________________________________________\n", + " Layer (type) Output Shape Param # \n", + "=================================================================\n", + " embedding (Embedding) (None, 12140, 182) 30222010 \n", + " \n", + " global_average_pooling1d ( (None, 182) 0 \n", + " GlobalAveragePooling1D) \n", + " \n", + " dense (Dense) (None, 96) 17568 \n", + " \n", + " dense_1 (Dense) (None, 24) 2328 \n", + " \n", + " dense_2 (Dense) (None, 1) 25 \n", + " \n", + "=================================================================\n", + "Total params: 30241931 (115.36 MB)\n", + "Trainable params: 30241931 (115.36 MB)\n", + "Non-trainable params: 0 (0.00 Byte)\n", + "_________________________________________________________________\n" + ] + } + ], + "source": [ + "classifier.summary()" + ] + }, + { + "cell_type": "code", + "execution_count": 56, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Epoch 1/10\n", + "WARNING:tensorflow:From c:\\Users\\ysach\\anaconda3\\lib\\site-packages\\keras\\src\\utils\\tf_utils.py:492: The name tf.ragged.RaggedTensorValue is deprecated. 
Please use tf.compat.v1.ragged.RaggedTensorValue instead.\n", + "\n", + "WARNING:tensorflow:From c:\\Users\\ysach\\anaconda3\\lib\\site-packages\\keras\\src\\engine\\base_layer_utils.py:384: The name tf.executing_eagerly_outside_functions is deprecated. Please use tf.compat.v1.executing_eagerly_outside_functions instead.\n", + "\n", + "455/455 [==============================] - 422s 923ms/step - loss: 0.6866 - accuracy: 0.5386 - val_loss: 0.6534 - val_accuracy: 0.5832\n", + "Epoch 2/10\n", + "455/455 [==============================] - 433s 952ms/step - loss: 0.4281 - accuracy: 0.8095 - val_loss: 0.3156 - val_accuracy: 0.8345\n", + "Epoch 3/10\n", + "455/455 [==============================] - 422s 927ms/step - loss: 0.2246 - accuracy: 0.9132 - val_loss: 0.2006 - val_accuracy: 0.9226\n", + "Epoch 4/10\n", + "455/455 [==============================] - 418s 919ms/step - loss: 0.1441 - accuracy: 0.9494 - val_loss: 0.1607 - val_accuracy: 0.9502\n", + "Epoch 5/10\n", + "455/455 [==============================] - 414s 910ms/step - loss: 0.1020 - accuracy: 0.9671 - val_loss: 0.1505 - val_accuracy: 0.9535\n", + "Epoch 6/10\n", + "455/455 [==============================] - 413s 909ms/step - loss: 0.0765 - accuracy: 0.9750 - val_loss: 0.1286 - val_accuracy: 0.9564\n", + "Epoch 7/10\n", + "455/455 [==============================] - 428s 942ms/step - loss: 0.0586 - accuracy: 0.9812 - val_loss: 0.1270 - val_accuracy: 0.9583\n", + "Epoch 8/10\n", + "455/455 [==============================] - 423s 930ms/step - loss: 0.0476 - accuracy: 0.9840 - val_loss: 0.1698 - val_accuracy: 0.9441\n", + "Epoch 9/10\n", + "455/455 [==============================] - 414s 911ms/step - loss: 0.0311 - accuracy: 0.9912 - val_loss: 0.1222 - val_accuracy: 0.9617\n", + "Epoch 10/10\n", + "455/455 [==============================] - 411s 904ms/step - loss: 0.0303 - accuracy: 0.9908 - val_loss: 0.1265 - val_accuracy: 0.9627\n" + ] + }, + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 56, 
+ "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "classifier.fit(x_train1,y_train1,epochs=10,validation_data=(x_test, y_test))" + ] + }, + { + "cell_type": "code", + "execution_count": 57, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "196/196 [==============================] - 29s 148ms/step\n" + ] + } + ], + "source": [ + "Y_pred = classifier.predict(x_test)\n", + "a=[]\n", + "for x in Y_pred:\n", + " if x>=0.5:\n", + " a.append(1)\n", + " else:\n", + " a.append(0)" + ] + }, + { + "cell_type": "code", + "execution_count": 58, + "metadata": { + "execution": { + "iopub.execute_input": "2021-05-25T07:18:41.422338Z", + "iopub.status.busy": "2021-05-25T07:18:41.421887Z", + "iopub.status.idle": "2021-05-25T07:18:41.673492Z", + "shell.execute_reply": "2021-05-25T07:18:41.672498Z", + "shell.execute_reply.started": "2021-05-25T07:18:41.422308Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Accuracy: 96.27%\n", + "Confusion matrix, without normalization\n" + ] + }, + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "score = accuracy_score(y_test, a)\n", + "print(f'Accuracy: {round(score*100,2)}%')\n", + "cm = confusion_matrix(y_test, a)\n", + "plot_confusion_matrix(cm, classes=['REAL Data', 'FAKE Data'])" + ] + }, + { + "cell_type": "code", + "execution_count": 59, + "metadata": {}, + "outputs": [], + "source": [ + "from keras.layers import SimpleRNN, LSTM" + ] + }, + { + "cell_type": "code", + "execution_count": 69, + "metadata": {}, + "outputs": [], + "source": [ + "model = Sequential()\n", + "model.add(Embedding(vocabulary_size, 100, input_length=input_length))\n", + "model.add(SimpleRNN(units=10, return_sequences=False))\n", + "model.add(Dense(units=1, activation='sigmoid'))" + ] + }, + { + "cell_type": "code", + "execution_count": 70, + "metadata": {}, + "outputs": [], + "source": [ + "model.compile(optimizer='adam',\n", + " loss='binary_crossentropy',\n", + " metrics=['accuracy'])" + ] + }, + { + "cell_type": "code", + "execution_count": 71, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Model: \"sequential_3\"\n", + "_________________________________________________________________\n", + " Layer (type) Output Shape Param # \n", + "=================================================================\n", + " embedding_3 (Embedding) (None, 12140, 100) 16605500 \n", + " \n", + " simple_rnn (SimpleRNN) (None, 10) 1110 \n", + " \n", + " dense_5 (Dense) (None, 1) 11 \n", + " \n", + "=================================================================\n", + "Total params: 16606621 (63.35 MB)\n", + "Trainable params: 16606621 (63.35 MB)\n", + "Non-trainable params: 0 (0.00 Byte)\n", + "_________________________________________________________________\n" + ] + } + ], + "source": [ + "model.summary()" + ] + }, + { + "cell_type": "code", + "execution_count": 72, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Epoch 1/3\n",
+ "455/455 [==============================] - 2882s 6s/step - loss: 0.7130 - accuracy: 0.8058 - val_loss: 0.6941 - val_accuracy: 0.8763\n", + "Epoch 2/3\n", + "455/455 [==============================] - 3135s 7s/step - loss: 0.2564 - accuracy: 0.9559 - val_loss: 0.8114 - val_accuracy: 0.7906\n", + "Epoch 3/3\n", + "455/455 [==============================] - 3928s 9s/step - loss: 0.1439 - accuracy: 0.9792 - val_loss: 0.7665 - val_accuracy: 0.8503\n" + ] + }, + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 72, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "model.fit(x_train1,y_train1,epochs=3,validation_data=(x_test, y_test))" + ] + }, + { + "cell_type": "code", + "execution_count": 73, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "196/196 [==============================] - 65s 328ms/step\n" + ] + } + ], + "source": [ + "Y_pred = model.predict(x_test)\n", + "a=[]\n", + "for x in Y_pred:\n", + " if x>=0.5:\n", + " a.append(1)\n", + " else:\n", + " a.append(0)" + ] + }, + { + "cell_type": "code", + "execution_count": 74, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Accuracy: 85.03%\n", + "Confusion matrix, without normalization\n" + ] + }, + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# For the second model\n", + "score = accuracy_score(y_test, a)\n", + "print(f'Accuracy: {round(score*100,2)}%')\n", + "cm = confusion_matrix(y_test, a)\n", + "plot_confusion_matrix(cm, classes=['REAL Data', 'FAKE Data'])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# The first model performed better. The second model reached high training accuracy but lower validation accuracy, which suggests overfitting. A possible reason is that, for fake news, capturing the overall sentiment of an article matters more than the sentiment of individual words." + ] + }, + { + "cell_type": "code", + "execution_count": 76, + "metadata": {}, + "outputs": [], + "source": [ + "def fake_news_det(news):\n", + " corpus = []\n", + " review = re.sub(r'[^a-zA-Z\\s]', '', news)\n", + " review = review.lower()\n", + " review = nltk.word_tokenize(review)\n", + " for y in review :\n", + " if y not in stpwrds :\n", + " corpus.append(lemmatizer.lemmatize(y))\n", + " input_data = [' '.join(corpus)]\n", + " vectorized_input_data_pre = tokenize.texts_to_sequences(input_data)\n", + " vectorized_input_data = pad_sequences(vectorized_input_data_pre, maxlen=input_length, padding=\"pre\", truncating=\"pre\")\n", + " prediction = classifier.predict(vectorized_input_data)\n", + " if prediction[0][0] >= 0.5:\n", + " print(\"Prediction of the News : Looking Fake⚠ News📰 \")\n", + " else:\n", + " print(\"Prediction of the News : Looking Real News📰 \")" + ] + }, + { + "cell_type": "code", + "execution_count": 77, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "1/1 [==============================] - 0s 86ms/step\n", + "Prediction of the News : Looking Fake⚠ News📰 \n" + ] + } + ], + "source": [ + "fake_news_det(\"actor steven seagal live tv erupts hillary obama intense comment actor steven seagal stood america rest hollywood remains silent week rough country first 
democratic nominee hillary clinton collapsed memorial called million hardworking american deplorable werent enough nfl player throughout country blatantly disrespecting american flag needle say seagal enough think important job secretary state ensuring people dont get killed seagal tweeted cant email protected pneumonia going disastrous american people notohillary continued course seagal quickly became target liberal fire comment refused break particularly lost one twitter user tried argued hillary capable presidency capable capable leaving american die capable disregarding law capable disrespecting rape survivor argued went address race relation united state true role president barack obama played social evolution country obama abysmal race relation usa truth need start honest dialog wrote seagal concluded pointing irony attack receiving liberal everywhere best thing worldmaking one statement freedom getting attacked every demo hypocritical tweeted america without democrat white house safer america think seagals comment\")" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.9" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/Fake-News-Classification/README.md b/Fake-News-Classification/README.md new file mode 100644 index 000000000..d22ba02d2 --- /dev/null +++ b/Fake-News-Classification/README.md @@ -0,0 +1,97 @@ +# Fake News Classification using DL + +## PROJECT TITLE + +Fake News Detection using Deep Learning + +## GOAL + +To identify whether the given news is fake or not. 
+ +## DATASET + +The link for the dataset used in this project: https://www.kaggle.com/competitions/fake-news/data?select=train.csv + + +## DESCRIPTION + +This project aims to identify whether a given news article is fake or not by extracting the meaning and semantics of its text. + +## WHAT I HAVE DONE + +1. Data collection: Downloaded the dataset from the Kaggle link given above. +2. Data preprocessing: Combined the title and text of each article into a new feature, cleaned it (removed non-alphabetic characters, lowercased, removed stopwords, lemmatized), then tokenized it and padded the sequences before model training. +3. Model selection: The first, self-designed model has an Embedding layer followed by a GlobalAveragePooling1D layer, two Dense layers, and an output layer. The second model has an Embedding layer followed by a SimpleRNN layer and a Dense output layer. +4. Comparative analysis: Compared the accuracy scores of both models. + +## MODELS SUMMARY + +Model-1: "sequential" +_________________________________________________________________ + Layer (type) Output Shape Param # +================================================================= + embedding (Embedding) (None, 12140, 182) 30222010 + + global_average_pooling1d ( (None, 182) 0 + GlobalAveragePooling1D) + + dense (Dense) (None, 96) 17568 + + dense_1 (Dense) (None, 24) 2328 + + dense_2 (Dense) (None, 1) 25 + +================================================================= +Total params: 30241931 (115.36 MB) +Trainable params: 30241931 (115.36 MB) +Non-trainable params: 0 (0.00 Byte) + +Model-2: "sequential_3" +_________________________________________________________________ + Layer (type) Output Shape Param # +================================================================= + embedding_3 (Embedding) (None, 12140, 100) 16605500 + + simple_rnn (SimpleRNN) (None, 10) 1110 + + dense_5 (Dense) (None, 1) 11 + +================================================================= +Total params: 16606621 (63.35 MB) +Trainable params: 16606621 (63.35 MB) +Non-trainable params: 0 (0.00 Byte) +
+## LIBRARIES NEEDED + +The following libraries are required to run this project: + +- nltk +- pandas +- matplotlib +- tensorflow +- keras +- scikit-learn + +## EVALUATION METRICS + +The evaluation metrics I used to assess the models: + +- Accuracy +- Loss + +The confusion matrices for both models are saved in the Images folder. + +## RESULTS +Results on the validation set: +For Model-1: +Accuracy: 96.11% +Loss: 0.1350 + +For Model-2: +Accuracy: 85.03% +Loss: 0.7665 + +## CONCLUSION +Based on these results, we can draw the following conclusions: + +1. Model-1 achieved a high validation accuracy of 96.11% with a loss of 0.1350, correctly identifying 2874 fake articles out of 3044, so the first model performed better. Model-2 reached high training accuracy but lower validation accuracy, which suggests overfitting. A possible reason is that, for fake news, capturing the overall sentiment of an article matters more than the sentiment of individual words.
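The notebook's cleaning, thresholding, and evaluation steps can be sketched without Keras or scikit-learn. This is a minimal illustration under stated assumptions, not the notebook's code. It assumes the Kaggle `train.csv` convention that label `1` marks an unreliable (fake) article and `0` a reliable one. It also replaces `nltk.word_tokenize` with a plain whitespace split and omits lemmatization. The names `clean_text`, `threshold_probs`, `confusion_matrix_2x2`, `accuracy` and `CLASS_NAMES` are hypothetical.

```python
import re
from typing import Iterable, List, Sequence

# Assumption: Kaggle fake-news convention, 0 = reliable (REAL), 1 = unreliable (FAKE).
CLASS_NAMES = ["REAL", "FAKE"]


def clean_text(text: str, stopwords: Iterable[str]) -> str:
    """Keep letters and whitespace, lowercase, and drop stopwords."""
    stop = set(stopwords)
    corpus = []  # fresh list per call, so tokens from earlier articles cannot leak in
    text = re.sub(r"[^a-zA-Z\s]", "", text).lower()
    # Assumption: a whitespace split stands in for nltk.word_tokenize,
    # and lemmatization is omitted to keep the sketch dependency-free.
    for token in text.split():
        if token not in stop:
            corpus.append(token)
    return " ".join(corpus)


def threshold_probs(probs: Sequence[float], cutoff: float = 0.5) -> List[int]:
    """Turn sigmoid probabilities into hard 0/1 labels."""
    return [1 if p >= cutoff else 0 for p in probs]


def confusion_matrix_2x2(y_true: Sequence[int], y_pred: Sequence[int]) -> List[List[int]]:
    """Rows are true labels, columns are predicted labels, both ordered [0, 1]."""
    cm = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    print(clean_text("Hillary's 2 emails ARE here!", {"are"}))  # hillarys emails here
    probs = [0.97, 0.12, 0.51, 0.49, 0.88]
    y_true = [1, 0, 0, 0, 1]
    y_pred = threshold_probs(probs)
    print(y_pred)                                # [1, 0, 1, 0, 1]
    print(confusion_matrix_2x2(y_true, y_pred))  # [[2, 1], [0, 2]]
    print(accuracy(y_true, y_pred))              # 0.8
```

`confusion_matrix_2x2` orders rows and columns as [0, 1], the same order scikit-learn's `confusion_matrix` uses for these labels. Under this label convention, any class names passed to a plotting helper should therefore list REAL first and FAKE second.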