Home
The Digikala online marketplace recently published open-source data in various categories.
I had always wanted to do an NLP project, so I put together some Python tutorials for newcomers. I really hope you find them useful.
I am still updating the package and will soon share links to the video and article related to this post!
If you like the content, follow me on LinkedIn and endorse my data science skills on my LinkedIn account.
First, run 0 - data Wrangling.ipynb to preprocess the data before moving on to the remaining files and building your models.
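To give a feel for what preprocessing Persian comments can involve, here is a minimal sketch using only the standard library. It is not the content of the notebook: the function names `normalize_comment` and `tokenize` and the specific cleaning steps (unifying Arabic and Persian letter forms, dropping diacritics and punctuation) are assumptions chosen for illustration.

```python
import re

# Arabic code points that often show up in Persian text, mapped to Persian forms.
ARABIC_TO_PERSIAN = str.maketrans({
    "\u064a": "\u06cc",  # Arabic yeh -> Persian yeh
    "\u0643": "\u06a9",  # Arabic kaf -> Persian keheh
})

def normalize_comment(text: str) -> str:
    """Hypothetical helper: clean one raw comment string."""
    # Unify letter variants so the same word is not counted twice.
    text = text.translate(ARABIC_TO_PERSIAN)
    # Drop Arabic diacritics (fathatan .. sukun).
    text = re.sub(r"[\u064b-\u0652]", "", text)
    # Replace punctuation and symbols with spaces, but keep the
    # zero-width non-joiner (U+200C), which is part of many Persian words.
    text = re.sub(r"[^\w\s\u200c]", " ", text)
    # Collapse runs of whitespace.
    return re.sub(r"\s+", " ", text).strip()

def tokenize(text: str) -> list[str]:
    """Hypothetical helper: whitespace tokenization after normalization."""
    return normalize_comment(text).split()

if __name__ == "__main__":
    print(tokenize("\u0643\u064a\u0641\u064a\u062a  \u0639\u0627\u0644\u064a!!"))
```

A dedicated Persian NLP library would likely handle more cases (stemming, stop words, half-space repair); this sketch only shows the general shape of the step.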
Use this conda command to install the packages in your environment:
conda install -c conda-forge --file requirements.txt
I got a mini version of the Digikala customer comments dataset from www.quera.ir, where it was uploaded for an AI competition on 1398/08/16. It can be found here (authentication required).
The full version is available at these links:
Text preprocessing:
https://www.kaggle.com/sudalairajkumar/getting-started-with-text-preprocessing
https://www.kaggle.com/kernels/scriptcontent/19201884/download
TF-IDF:
https://towardsdatascience.com/multi-label-text-classification-with-scikit-learn-30714b7819c5
https://kavita-ganesan.com/tfidftransformer-tfidfvectorizer-usage-differences/#.Xc3OG67ngRY
basic word2vec:
gensim:
keras with gensim:
https://www.depends-on-the-definition.com/guide-to-word-vectors-with-gensim-and-keras/
LSTM:
https://medium.com/free-code-camp/applied-introduction-to-lstms-for-text-generation-380158b29fb3
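As a quick taste of the TF-IDF idea covered in the resources above, here is a minimal from-scratch sketch using the textbook formula tf × log(N / df). The function name `tfidf` and the toy documents are assumptions for illustration. scikit-learn's `TfidfVectorizer` uses a smoothed IDF and L2 normalization by default, so its numbers will differ from these.

```python
import math
from collections import Counter

def tfidf(docs):
    """Hypothetical helper.

    docs: list of token lists.
    Returns one {term: weight} dict per document.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            # Term frequency times inverse document frequency.
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

if __name__ == "__main__":
    toy_docs = [["good", "phone"], ["bad", "phone"]]
    for doc_weights in tfidf(toy_docs):
        print(doc_weights)
```

Terms that appear in every document get a weight of zero under this formula, which is why words shared by all comments contribute nothing to telling them apart.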