This repository has been archived by the owner on Apr 19, 2022. It is now read-only.

WangHaiYang874/NYU-NLP-final-project-2022-team-20

Personality prediction: predicting MBTI types over a Reddit dataset

This is the final project of team 20 in the undergraduate NLP class at NYU, fall 2021: personality prediction. The project was originally a Kaggle contest; we then read about a model with 73% accuracy that uses type dynamics and cognitive functions (a better reference for type dynamics is still needed). Loosely inspired by the theory of type dynamics, we train a model on what we call pseudo/generalized cognitive functions, which may be psychological nonsense. If the model works well, that would suggest some of the pseudo cognitive functions are psychologically meaningful.
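To make the feature side concrete, here is a minimal pure-Python sketch of the TF-IDF weighting used later in the pipeline. The toy corpus, tokenization, and function name are illustrative, not the team's actual code:

```python
import math
from collections import Counter

def tfidf(docs):
    """TF-IDF weights per token for each tokenized document."""
    n = len(docs)
    # document frequency: how many documents contain each token
    df = Counter(tok for doc in docs for tok in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            # sklearn-style smoothed idf: log((1+n)/(1+df)) + 1
            tok: (count / len(doc)) * (math.log((1 + n) / (1 + df[tok])) + 1)
            for tok, count in tf.items()
        })
    return weights

docs = [["i", "love", "planning", "ahead"],
        ["i", "love", "spontaneous", "trips"]]
w = tfidf(docs)
# "planning" appears in only one document, so it outweighs the shared "love"
assert w[0]["planning"] > w[0]["love"]
```

In practice a library implementation (e.g. scikit-learn's `TfidfVectorizer`) would replace this, but the weighting idea is the same.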

Team members

Students

  • Oishika
  • Haiyang
  • Vincent
  • Arthur

Links

TODOs

Please put your name next to the TODOs that you would like to do or have done; otherwise, the team won't give you credit for that part of the project.

  • the proposal: finished collectively
  • feature extraction/selection/data cleaning
    • emoticons
    • tfidf: (Haiyang and Vincent)
    • topic extraction: (Vincent)
    • development dataset: (Haiyang)
    • feature selection: (Haiyang)
    • dimension reduction: done by Vincent. However, we decided not to reduce the dimensionality.
    • parallel feature extraction (Vincent)
  • first layer models: (Haiyang, Vincent)
  • second layer model: this is the neural network taking input from the previous layer to predict personality. (Haiyang)
    • chunk max pooling
    • activations
    • k-max pooling
    • deeper models when the dataset is large enough?
    • profiling the training process: we discovered that the first-layer models are already very good, so we decided not to use a neural network for the second layer; instead, we will use a simple random forest
    • decision tree
  • building the model (Vincent, Haiyang)
    • feature
    • first layer
    • second layer
  • evaluation (Oishika, Vincent)
  • presentation (Oishika, Vincent, Haiyang)
  • writing the paper (Oishika 70%, others 30%)
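The two-layer design above (first-layer models scoring each MBTI axis, a second layer combining the scores into a type) can be sketched as follows. The input scores and the 0.5 threshold are a toy stand-in for the actual first-layer models and the random forest second layer:

```python
# Each first-layer model scores one MBTI axis in [0, 1]:
# P(E), P(N), P(T), P(J). The second layer turns the four
# scores into a four-letter type; here a plain 0.5 threshold
# stands in for the random forest described above.
AXES = [("I", "E"), ("S", "N"), ("F", "T"), ("P", "J")]

def second_layer(scores):
    """Map four axis scores to a 4-letter MBTI type string."""
    return "".join(pos if s >= 0.5 else neg
                   for (neg, pos), s in zip(AXES, scores))

print(second_layer([0.2, 0.9, 0.6, 0.1]))  # -> INTP
```

A trained second-layer model would learn this mapping from the first-layer outputs rather than hard-coding a threshold, but the data flow is the same.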
