This is the final project of Team 20 in the undergraduate NLP class at NYU, Fall 2021: personality prediction. The project was originally a Kaggle contest; we then read about a model with 73% accuracy that uses type dynamics and cognitive functions (we should find a better reference for type dynamics). Loosely inspired by the theory of type dynamics, we train a model on what we like to call pseudo/generalized cognitive functions, which may well be psychological nonsense. If the model works really well, that would suggest that some of the pseudo cognitive functions are psychologically meaningful.
Students
- Oishika
- Haiyang
- Vincent
- Arthur
- Google Doc README for more information
- Overleaf project proposal
- development set
- train set
- test set
Please put your name next to each todo that you would like to do or have already done; otherwise, the team won't give you credit for this project.
- the proposal: finished collectively
- feature extraction/selection/data cleaning
- emoticons
- tfidf: (Haiyang and Vincent)
- topic extraction: (Vincent)
- development dataset: (Haiyang)
- feature selection: (Haiyang)
- dimension reduction: done by Vincent. However, we decided not to reduce the dimensionality.
- parallel feature extraction (Vincent)
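The TF-IDF item above could be sketched as follows. This is a minimal example assuming sklearn's `TfidfVectorizer` with one concatenated string of posts per user; the sample documents and parameters are placeholders, not the team's actual settings:

```python
# Minimal TF-IDF sketch: one document per user (placeholder texts).
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "I love thinking about abstract ideas :)",
    "Let's plan the weekend trip in detail.",
]

# max_features caps the vocabulary; stop_words drops common English words.
vectorizer = TfidfVectorizer(max_features=5000, stop_words="english")
X = vectorizer.fit_transform(docs)  # sparse matrix of shape (n_users, n_terms)
print(X.shape)
```

The resulting sparse matrix can be fed directly to the first layer models.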
- first layer models: (Haiyang, Vincent)
- second layer model: this is the neural network that takes the outputs of the first layer models as input to predict personality. (Haiyang)
- chunk max pooling
- activations
- k-max pooling
- deeper models when the dataset is large enough?
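The chunk-max and k-max pooling items above can be illustrated outside a network. This is a NumPy sketch of the two operations; the function names and the toy activation matrix are our own, not taken from the codebase:

```python
import numpy as np

def chunk_max_pool(x, n_chunks):
    """Split the time axis into n_chunks pieces and take the max of each."""
    chunks = np.array_split(x, n_chunks, axis=0)
    return np.stack([c.max(axis=0) for c in chunks])

def k_max_pool(x, k):
    """Keep the k largest activations per channel, preserving their order."""
    # indices of the top-k values per column, re-sorted to original order
    idx = np.sort(np.argsort(x, axis=0)[-k:], axis=0)
    return np.take_along_axis(x, idx, axis=0)

# toy activations: 4 time steps, 2 channels
x = np.array([[1., 5.], [3., 2.], [2., 4.], [4., 1.]])
print(chunk_max_pool(x, 2))
print(k_max_pool(x, 2))
```

Both reduce a variable-length sequence of activations to a fixed-size vector suitable for a dense layer.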
- profiling the training process. We discovered that the first layer models are already very strong, so we decided not to use a NN for the second layer. Instead, we will use a simple random forest or decision tree.
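A minimal sketch of the revised second layer, assuming the first layer models emit per-axis scores that a random forest then consumes. All data here is randomly fabricated for illustration; the feature layout and labels are hypothetical:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
# pretend first-layer outputs: one score per MBTI axis for 100 users
first_layer_scores = rng.random((100, 4))
# toy binary labels for one axis (e.g. I/E), derived from the first score
labels = (first_layer_scores[:, 0] > 0.5).astype(int)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(first_layer_scores, labels)
print(clf.score(first_layer_scores, labels))  # training accuracy
```

In practice the real first-layer scores would replace the random placeholders, with one such classifier per MBTI axis.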
- decision tree
- building the model (Vincent, Haiyang)
- feature
- first layer
- second layer
- evaluation (Oishika, Vincent)
- presentation (Oishika, Vincent, Haiyang)
- writing the paper (Oishika 70%, others 30%)
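For the evaluation step, one simple metric is per-letter accuracy over the four MBTI axes; the team's actual metric is not specified here, so this is only a sketch:

```python
def per_axis_accuracy(pred, gold):
    """Fraction of matching letters for each of the four MBTI axes."""
    return [
        sum(p[i] == g[i] for p, g in zip(pred, gold)) / len(gold)
        for i in range(4)
    ]

print(per_axis_accuracy(["INTJ", "ENFP"], ["INTP", "ENFP"]))
# → [1.0, 1.0, 1.0, 0.5]
```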