Kaggle competition: Quora Question Pairs
main.py -- Implements all the models
The datasets are available as train.csv.zip and test.csv.zip at: https://www.kaggle.com/c/quora-question-pairs/data
Requires: numpy, sklearn (scikit-learn), pandas, nltk. The csv and re modules ship with the Python standard library.
python main.py <jaccard|cosine|tfidf|logistic|naivebayes|randomforest|voting>
- Jaccard Similarity
- Cosine Similarity
- Pearson Coefficient
- TF-IDF based Cosine Similarity
- Logistic Regression
- Naive Bayes Model
- Random Forest Model
- Probabilistic Voting Ensemble
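
As a rough illustration of the first two measures in the list, the sketch below scores one question pair with Jaccard similarity and with cosine similarity over raw term counts. The tokenizer, the function names, and the handling of empty questions are assumptions made for this example and may differ from what main.py actually does.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep alphanumeric runs (an assumed tokenization scheme)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def jaccard_similarity(q1, q2):
    """Size of the token-set intersection divided by the size of the union."""
    a, b = set(tokenize(q1)), set(tokenize(q2))
    if not a and not b:
        return 0.0  # convention chosen here for two empty questions
    return len(a & b) / len(a | b)


def cosine_similarity(q1, q2):
    """Cosine of the angle between the two raw term-count vectors."""
    c1, c2 = Counter(tokenize(q1)), Counter(tokenize(q2))
    dot = sum(c1[t] * c2[t] for t in c1.keys() & c2.keys())
    norm1 = math.sqrt(sum(v * v for v in c1.values()))
    norm2 = math.sqrt(sum(v * v for v in c2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0
    return dot / (norm1 * norm2)


q1 = "How do I learn Python?"
q2 = "How can I learn Python?"
print(round(jaccard_similarity(q1, q2), 3))  # 0.667 (4 shared tokens, 6 in the union)
print(round(cosine_similarity(q1, q2), 3))   # 0.8
```

A score like this can be thresholded directly or used as a feature for the learned models further down the list.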
Note: The voting ensemble takes a long time to train.
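
For orientation, here is a minimal sketch of what a probabilistic voting ensemble can look like in scikit-learn, assuming "probabilistic" means soft voting, that is, averaging the class probabilities predicted by the base models. The choice of base models, their settings, and the random toy features are assumptions for illustration only, not the configuration used in main.py.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

# Toy stand-in for per-pair features (e.g. similarity scores); not real data.
rng = np.random.default_rng(0)
X = rng.random((200, 3))
y = (X.sum(axis=1) > 1.5).astype(int)

ensemble = VotingClassifier(
    estimators=[
        ("logistic", LogisticRegression(max_iter=1000)),
        ("naivebayes", GaussianNB()),
        ("randomforest", RandomForestClassifier(n_estimators=50, random_state=0)),
    ],
    voting="soft",  # average the predicted class probabilities
)
ensemble.fit(X, y)  # fits every base model, which is why training is slow

# One row per question pair: [P(not duplicate), P(duplicate)]
proba = ensemble.predict_proba(X[:5])
```

Since each base model is fitted separately, training cost is roughly the sum of the individual models' costs. `VotingClassifier` also accepts an `n_jobs` argument to fit the base models in parallel, which may help on multi-core machines.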
Log-loss of 0.40167 with the Probabilistic Voting Ensemble (still being improved).
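
For reference, binary log-loss is the mean negative log-likelihood of the true labels under the predicted probabilities, so lower is better. The sketch below is a plain-Python version written for illustration; the function name and the clipping constant `eps` are assumptions, and scikit-learn's `sklearn.metrics.log_loss` provides a library implementation.

```python
import math


def binary_log_loss(y_true, y_pred, eps=1e-15):
    """Mean negative log-likelihood of 0/1 labels given P(label == 1)."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)


# Predicting 0.5 for every pair scores ln(2), about 0.693.
print(round(binary_log_loss([1, 0], [0.5, 0.5]), 3))  # 0.693
```

Confident wrong predictions are penalized heavily, which is why averaged probabilities from an ensemble can score better than any single overconfident model.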