Aminoid/quora-question-pairs

Quora Question Pairs

Kaggle competition

Files

main.py -- implements all the models

Datasets

The datasets are available as train.csv.zip and test.csv.zip at: https://www.kaggle.com/c/quora-question-pairs/data
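As a minimal sketch of reading a downloaded archive without unpacking it, the helper below uses only the standard library. The function name `load_pairs` is hypothetical and is not taken from main.py; the column names mentioned afterwards (`question1`, `question2`, `is_duplicate`) are assumed from the Kaggle data page.

```python
import csv
import io
import zipfile


def load_pairs(zip_path, member=None):
    """Read a zipped CSV into a list of dicts, one dict per row."""
    with zipfile.ZipFile(zip_path) as archive:
        # Fall back to the first file in the archive when no member is named.
        name = member or archive.namelist()[0]
        with archive.open(name) as raw:
            # newline="" lets the csv module handle line breaks inside quoted fields.
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            return list(csv.DictReader(text))
```

Calling `load_pairs("train.csv.zip")` would then give rows whose `question1` and `question2` fields hold the two questions, with every value returned as a string.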

Dependencies

numpy, scikit-learn (imported as sklearn), pandas, nltk

csv and re are part of the Python standard library and need no installation.

How to run

python main.py <jaccard|cosine|tfidf|logistic|naivebayes|randomforest|voting>
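The usage line above takes a single positional model name. One way such an argument could be validated is sketched below with `argparse`; this is an illustration only, and `parse_model` is a hypothetical name, not the actual entry point of main.py.

```python
import argparse

# Model names copied from the usage line above.
MODELS = ["jaccard", "cosine", "tfidf", "logistic",
          "naivebayes", "randomforest", "voting"]


def parse_model(argv=None):
    """Return the model name given on the command line."""
    parser = argparse.ArgumentParser(description="Quora Question Pairs models")
    # argparse exits with a usage message if the name is not in MODELS.
    parser.add_argument("model", choices=MODELS, help="model to run")
    return parser.parse_args(argv).model
```

With this sketch, `parse_model(["tfidf"])` returns `"tfidf"`, while an unlisted name makes argparse print the valid choices and exit.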

Individual Classifiers

  • Jaccard Similarity
  • Cosine Similarity
  • Pearson Coefficient
  • TF-IDF based Cosine Similarity
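
The four measures listed above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the code in main.py: the regex tokenizer, the use of term counts over the joint vocabulary of the two questions, and the smoothed IDF formula ln((1 + N) / (1 + df)) + 1 are all choices made for this example. The source does not say how the scores are turned into duplicate predictions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def jaccard(q1, q2):
    """Shared vocabulary size divided by combined vocabulary size."""
    a, b = set(tokenize(q1)), set(tokenize(q2))
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def count_vectors(q1, q2):
    """Term-count vectors for both questions over their joint vocabulary."""
    c1, c2 = Counter(tokenize(q1)), Counter(tokenize(q2))
    vocab = sorted(set(c1) | set(c2))
    return [c1[w] for w in vocab], [c2[w] for w in vocab]


def cosine(u, v):
    """Dot product divided by the product of the vector lengths."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def pearson(u, v):
    """Pearson coefficient, computed as the cosine of mean-centered vectors."""
    if not u:
        return 0.0
    mean_u, mean_v = sum(u) / len(u), sum(v) / len(v)
    return cosine([x - mean_u for x in u], [y - mean_v for y in v])


def tfidf_vectors(q1, q2, corpus):
    """TF-IDF vectors for both questions; IDF is estimated from `corpus`."""
    docs = [set(tokenize(d)) for d in corpus]
    n = len(docs)
    c1, c2 = Counter(tokenize(q1)), Counter(tokenize(q2))
    vocab = sorted(set(c1) | set(c2))
    # Smoothed IDF: words that appear in fewer documents get larger weights.
    idf = {
        w: math.log((1 + n) / (1 + sum(w in d for d in docs))) + 1
        for w in vocab
    }
    return [c1[w] * idf[w] for w in vocab], [c2[w] * idf[w] for w in vocab]
```

For example, "How do I learn Python?" and "How can I learn Python fast?" share 4 of 7 distinct tokens, so `jaccard` gives 4/7, and `cosine(*count_vectors(q1, q2))` gives 4/sqrt(30). The TF-IDF variant is scored the same way: `cosine(*tfidf_vectors(q1, q2, corpus))`.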

Ensemble Classifiers

  • Logistic Regression
  • Naive Bayes Model
  • Random Forest Model
  • Probabilistic Voting Ensemble

Note: The voting ensemble takes a long time to train.
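
The source does not say how main.py combines the models. One common reading of "probabilistic voting" is soft voting, where each model's predicted probability of "duplicate" is averaged per question pair. The sketch below assumes that reading; `soft_vote` and its `weights` parameter are hypothetical names introduced for illustration.

```python
def soft_vote(probabilities, weights=None):
    """Weighted average of per-model duplicate probabilities.

    probabilities: one list per model, each holding P(duplicate)
    for every question pair, in the same order.
    """
    if weights is None:
        # Equal say for every model unless weights are supplied.
        weights = [1.0] * len(probabilities)
    total = sum(weights)
    n_pairs = len(probabilities[0])
    return [
        sum(w * model[i] for w, model in zip(weights, probabilities)) / total
        for i in range(n_pairs)
    ]
```

Under this scheme every base model has to be fitted before any vote can be taken, so the training cost of the ensemble is at least the sum of its members' costs.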

Results

Log loss of 0.40167 with the Probabilistic Voting Ensemble (work in progress).
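
For reference, binary log loss is the mean negative log-likelihood of the true labels under the predicted probabilities, so lower is better. A minimal sketch follows; the clipping constant `eps` is an assumption of this example, added so that predictions of exactly 0 or 1 do not produce an infinite loss.

```python
import math


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-likelihood of binary labels given P(label == 1)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        # Clip to keep log() finite for predictions of exactly 0 or 1.
        p = min(max(p, eps), 1 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)
```

As a point of comparison, predicting 0.5 for every pair gives a log loss of ln 2, roughly 0.693, whatever the labels are.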
