Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dinesh Kumar/Final_report_text_Sentiment_project(winter of code) #8

Open
wants to merge 3 commits into
base: master
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
49 changes: 49 additions & 0 deletions Dinesh-Kumar.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,49 @@
<center><a href="https://winterofcode.com/"><img src="https://camo.githubusercontent.com/c73f77959233a8adb69f3dee7bbb3ba5e016f4239c7496c82538cc60c984f56e/68747470733a2f2f77696e7465726f66636f64652e636f6d2f7374617469632f6d656469612f6f72672d6c6f676f2e39333564376634382e706e67" alt="gsoc" height="50"/></a>
<a href="https://www.python.org/"><img src="https://www.python.org/static/community_logos/python-logo.png" height="45"/></a>
<a href="https://fury.gl/latest/community.html"><img src="https://raw.githubusercontent.com/divyake/Cysec-Hacktoberfest/dcc84465cfcff73981f8fcb5c8fe3b1710c007e1/assets/logo.svg" alt="DSC-IEM" height="45"/></a>
</center>
Comment on lines +1 to +4
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove these, the WoC team will take care of appropriate logos and other styling while putting the report website live


# Winter of Code Final Work Product
* **Name:** Dinesh Kumar
* **Organisation:** DSC-IEM
* **Project:** [TextSentimentAnalysis](https://github.com/khanfarhan10/TextSentimentAnalysis)

## Selection a Proper Dataset
* Getting proper data for training models suitable to our requirements is important.
* I have searched a lot of dataset like twitter analysis data and many more but at last i finalised **Amazon Fine Food Review** .
* I have choosed this dataset because it includes rating from **0-5 scores** for every individual review.
* The data span a period of more than 10 years, including all **~500,000 reviews** up to October 2012.
* It contains **huge** dataset due to this i have choosen this dataset for my project.
* images
<a href="https://ibb.co/26fcYVw"><img src="https://i.ibb.co/6ZfDJ92/dataset.jpg" alt="dataset" border="0" height="250" width="1000"></a>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use Markdown


## Data Preprocessing on dataset
* Before we move to train our model we have to do preprocessing so that we can **remove unwanted data**.
* So here in my dataset there are various columns of different values but for my project i have selected only **scores , id and reviews text** column for my project.
* after that there is score given 0-5 for reviews so i divided that into 3 categories **negative(score>3),Positive(score>3),neutral(score==3)**.
* after that i have checked that if any **duplicate values** are there than i have seen that there are duplicates values , I removed all duplicated value.
* Then i removed **html tags**, **special character** and **Tokenize** the reviews into word tokens.
* After that i split the review into words and then check if these words are in the **stop words** if so we will remove them, if not we will join.

## Featurisation , Tf-idf
* Now we have splitted my cleaned dataset into **train and test set** to work on that and build a gentle model.
* Then i featurised my dataset on tf-idf vectorizer and fit it as **tfidf_model.fit(reviews_train,sentiment_train)**.
* Then i transform it on train reviews as **reviews_train_tfidf=tfidf_model.transform(reviews_train)**.
* Then i used **WordCloud** to see **top 10 words** by importing Wordcloud in my model.
* images
<a href="https://ibb.co/G5V30Sj"><img src="https://i.ibb.co/MCR5s0L/top10-words.jpg" alt="top10-words" border="0" height="500" width="800"></a>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use Markdown


## Model Selection
* Model selection is very **key point** to make your project best in term of accuracy and precission.
* after applying **EDA** on dataset i tried **three algorithms** to train my model for better prediction.
* These are 3 algorithms :- **Logistic Regression, Naive Bayes and Decision tree**.
* From these three i have figureout one algorithm which will fit to my model.
* So on the basis of **parameters** and **accuracy** i choosed Naive Bayes for my model to train my model on it.
* images
<a href="https://imgbb.com/"><img src="https://i.ibb.co/30zykYJ/f1-scores.jpg" alt="f1-scores" border="0"></a>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use Markdown


## Model Deployement
* For pre-check my model **frontend and backend** i deployed it on my local server, and it working efficiently and precisely.
* I have added feature of **prediction of sentiment** , **keywords extraction and showing** , **Polarity and Subjectivity** and **Summary**. these are some features which will comes up when we put sone text in my frontend and gives output after processing in backend.
* link of my web app
https://sentimentproject.herokuapp.com/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use markdown linking if you want it to open on clicking link