Natural-Language-Processing

This repository contains several NLP assignments, including:

1. Multilabel classification using linear models to predict tags for a dataset of StackOverflow questions. This notebook compares the bag-of-words (BOW) and TF-IDF approaches.
2. A model for the Named Entity Recognition (NER) problem using bidirectional RNN models such as GRU with the Keras API. The evaluation shows how well the model performs on predicting each tag. We use a small dataset from Twitter.
3. A comparison of two embeddings for finding similarity between texts. This is an important problem, and the success of a model on large datasets depends greatly on the embedding chosen. One embedding is a pre-trained word2vec (W2V) model from Google trained on Google News (about 100 billion words). The second representation uses StarSpace, which we train on a sample of StackOverflow data. The benefit of using StarSpace here is that it can be customized for the particular task at hand rather than just providing a general-purpose embedding. We use a couple of metrics to compare the performance of these embeddings for finding duplicate questions on StackOverflow.
4. The legendary Seq2Seq model has many applications, such as neural machine translation (NMT), text summarization, and conversational modeling. We use a simple Seq2Seq model to build a simple calculator. This model does not use an attention mechanism.
5. Finally, the last notebook combines the results of notebooks 1-3 into a working chatbot, using Telegram and ChatterBot, that specializes in directing programming-related questions to related threads on StackOverflow. I hosted it on AWS. The chatbot consists of a dialogue manager that distinguishes general questions from technical ones using a classification model (intent classifier). If the question is not ordinary dialogue, a classifier predicts a tag for it and the bot directs the user to the appropriate thread on StackOverflow to see the answer. We can also train our own conversational model with a Seq2Seq model instead of using a pretrained model provided by ChatterBot on Telegram (or any other provider).
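
The routing done by the dialogue manager can be illustrated with a minimal sketch. This is not the code in dialogue_manager.py: the class layout, the `toy_*` helpers, the thread IDs, and the reply strings are all hypothetical, and choosing the thread by cosine similarity between embeddings is an assumption about how the embeddings from the earlier notebook are used. In a real bot, the intent and tag classifiers would be trained models and `embed` would come from word2vec or StarSpace.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class DialogueManager:
    """Hypothetical router: chit-chat goes to a dialogue model,
    technical questions go to the closest thread for the predicted tag."""

    def __init__(
        self,
        classify_intent: Callable[[str], str],   # returns "dialogue" or "technical"
        predict_tag: Callable[[str], str],       # returns a tag such as "python"
        embed: Callable[[str], Vector],          # text -> embedding vector
        threads_by_tag: Dict[str, List[Tuple[int, Vector]]],  # tag -> (thread id, embedding)
        chitchat: Callable[[str], str],          # general-conversation fallback
    ) -> None:
        self.classify_intent = classify_intent
        self.predict_tag = predict_tag
        self.embed = embed
        self.threads_by_tag = threads_by_tag
        self.chitchat = chitchat

    def answer(self, question: str) -> str:
        # Step 1: the intent classifier separates general questions from technical ones.
        if self.classify_intent(question) == "dialogue":
            return self.chitchat(question)

        # Step 2: a second classifier predicts the tag of the technical question.
        tag = self.predict_tag(question)
        candidates = self.threads_by_tag.get(tag, [])
        if not candidates:
            return f"I think this is about {tag}, but I have no threads for it."

        # Step 3 (assumed): pick the thread whose embedding is closest to the question.
        query = self.embed(question)
        best_id, _ = max(candidates, key=lambda item: cosine_similarity(query, item[1]))
        return (
            f"I think this is about {tag}. This thread might help: "
            f"https://stackoverflow.com/questions/{best_id}"
        )


# Toy stand-ins so the sketch runs without any trained model.
KEYWORDS = ("python", "java", "error", "exception")


def toy_intent(question: str) -> str:
    text = question.lower()
    return "technical" if any(k in text for k in KEYWORDS) else "dialogue"


def toy_tag(question: str) -> str:
    return "java" if "java" in question.lower() else "python"


def toy_embed(text: str) -> List[float]:
    # Two-dimensional "embedding": counts of two hand-picked words.
    words = text.lower().split()
    return [float(words.count("list")), float(words.count("string"))]


# Made-up thread IDs with made-up embeddings.
threads = {"python": [(101, [1.0, 0.0]), (202, [0.0, 1.0])]}

bot = DialogueManager(
    toy_intent,
    toy_tag,
    toy_embed,
    threads,
    chitchat=lambda q: "Hello! Ask me a programming question.",
)

print(bot.answer("How are you?"))
print(bot.answer("How do I reverse a list in Python?"))
```

Passing the classifiers and the embedding function in as callables keeps the routing logic independent of how each model is trained, so a Seq2Seq conversational model could replace the `chitchat` callable without touching the rest.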

These are assignments I did for a course here: https://www.coursera.org/learn/language-processing/home/welcome

Repository files:

Best Embeddings for Text Similarity.ipynb
ChatBot-StackOverflow Assistant.ipynb
MultilabelTagClassification.ipynb
Name Entity Recognition.ipynb
README.md
Seq2Seq Model as a Calculator.ipynb
dialogue_manager.py
main_bot.py
utils.py

Yas2020/Natural-Language-Processing
