ArXiv - Predicting Scientific Paper Category

MSc Data & Web Science, Aristotle University of Thessaloniki (AUTH)

Course: Advanced Machine Learning

Project: “Predicting Categories of Scientific Papers with Advanced Machine Learning Techniques”

Team Members:

Georgios Arampatzis
Alexia Fytili
Eleni Tsiolaki

Dataset: ArXiv

Technical Report: PDF

Imbalance:

Oversampling: SMOTE, Borderline SMOTE, RandomOverSampler

Undersampling: Tomek Links, Random Undersampling, NearMiss1, NearMiss2, NearMiss3

Over- & Undersampling: SMOTE combined with Tomek Links

Extra: Generating Synonym Text
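The simplest strategies listed above, random oversampling and random undersampling, can be illustrated in a few lines. This is a minimal standard-library sketch, not the code in `imbalance_methods.py`; the function name `random_resample`, the `target_size` parameter, and the toy data are all assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict


def random_resample(samples, labels, target_size, seed=0):
    """Resample every class to exactly `target_size` items.

    Classes with at least `target_size` items are randomly undersampled
    (drawn without replacement); smaller classes are randomly oversampled
    (originals kept, duplicates drawn with replacement are appended).
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)

    out_samples, out_labels = [], []
    for label, items in by_class.items():
        if len(items) >= target_size:
            chosen = rng.sample(items, target_size)
        else:
            extra = [rng.choice(items) for _ in range(target_size - len(items))]
            chosen = items + extra
        out_samples.extend(chosen)
        out_labels.extend([label] * target_size)
    return out_samples, out_labels


# Toy, made-up data: 6 "cs" abstracts versus 2 "math" abstracts.
texts = [f"cs paper {i}" for i in range(6)] + [f"math paper {i}" for i in range(2)]
cats = ["cs"] * 6 + ["math"] * 2

balanced_texts, balanced_cats = random_resample(texts, cats, target_size=4)
print(Counter(balanced_cats))
```

Methods such as SMOTE or Tomek Links work on feature vectors rather than raw duplicates, so they would be applied after vectorizing the text; the sketch above only shows the class-balancing idea.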

Multi-label Classification:

LabelPowerset
BinaryRelevance
ClassifierChain
MLkNN
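The first two problem-transformation methods in this list differ mainly in how they reshape the targets: BinaryRelevance trains one independent binary classifier per label, while LabelPowerset treats each distinct label combination as a single class. The following standard-library sketch shows only that target transformation; the helper names and the toy category assignments are hypothetical and are not taken from `multi_label_classification_new.py`.

```python
def binary_relevance_targets(label_sets, all_labels):
    """Build one binary target vector per label (one classifier each)."""
    return {
        label: [1 if label in labels else 0 for labels in label_sets]
        for label in all_labels
    }


def label_powerset_targets(label_sets):
    """Map every distinct label combination to a single class id."""
    combo_to_id = {}
    targets = []
    for labels in label_sets:
        combo = tuple(sorted(labels))
        if combo not in combo_to_id:
            combo_to_id[combo] = len(combo_to_id)
        targets.append(combo_to_id[combo])
    return targets, combo_to_id


# Toy, made-up category assignments for four papers.
paper_labels = [{"cs.LG", "stat.ML"}, {"cs.LG"}, {"math.PR"}, {"stat.ML", "cs.LG"}]
all_labels = sorted(set().union(*paper_labels))

br = binary_relevance_targets(paper_labels, all_labels)
lp, combos = label_powerset_targets(paper_labels)
print(br["cs.LG"])  # binary targets for the cs.LG classifier
print(lp)           # one multi-class target per paper
```

The trade-off is visible even on this toy input: BinaryRelevance ignores label co-occurrence, whereas LabelPowerset preserves it at the cost of one class per observed combination.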

Interpretability / Explainability:

ELI5
LIME
Anchors
SHAP
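For linear models over bag-of-words counts (the result files in this repository refer to CountVectorizer with LinearSVC and LogisticRegression), a prediction can be explained by per-token contributions, which is the kind of information a weights table conveys. The sketch below is a hand-rolled illustration using only the standard library; it does not call ELI5, LIME, Anchors, or SHAP, and the function name, weights, and sentence are hypothetical.

```python
from collections import Counter


def explain_linear_prediction(text, weights, bias=0.0, top_k=3):
    """Return the decision score and the top contributing tokens.

    For a linear model over bag-of-words counts, the score is
    bias + sum(weight[token] * count[token]), so each token's
    contribution is simply weight * count.
    """
    counts = Counter(text.lower().split())
    contributions = {
        token: weights[token] * n for token, n in counts.items() if token in weights
    }
    score = bias + sum(contributions.values())
    # Rank tokens by the magnitude of their contribution, largest first.
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return score, ranked[:top_k]


# Hypothetical weights for a "cs.LG vs. rest" classifier (not from the project).
toy_weights = {"neural": 1.5, "network": 1.0, "theorem": -2.0, "proof": -1.2}

score, top = explain_linear_prediction(
    "Neural network proof with neural layers", toy_weights, bias=-0.5
)
print(score)
for token, contribution in top:
    print(f"{token:>8}: {contribution:+.2f}")
```

Positive contributions push the score towards the target category and negative ones away from it; tokens missing from the vocabulary contribute nothing.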

Code Structure:

.
└── arxiv_predicting_paper_category
    ├── Imbalanced
    │   ├── imbalance_generate_synonym_text.py
    │   ├── imbalance_methods.py
    │   └── random_undesampling.py
    ├── Interpretability
    │   ├── interpretability_results
    │   │   ├── Weights_CountVectorizer_25000_LinearSVC_Big_dataset.html
    │   │   ├── Weights_CountVectorizer_25000_LinearSVC_TOMEK_Big_dataset.html
    │   │   ├── Weights_CountVectorizer_25000_LogisticRegression_Big_dataset.html
    │   │   ├── Weights_CountVectorizer_25000_LogisticRegression_TOMEK_Big_dataset.html
    │   │   └── shap_LinearSVC.png
    │   ├── LIME.py
    │   ├── anchors_explanations.py
    │   ├── eli5.py
    │   └── shap_explanations.py
    ├── Multi_label_classification
    │   ├── multi_label_classification_new.py
    │   └── multi_label_plots.py
    ├── data_pics
    │   ├── imbalanced_cv.png
    │   ├── imbalanced_dataset.png
    │   └── imbalanced_roc.png
    ├── dataset
    │   ├── create_synonym_dataset.py
    │   ├── dataset_methods.py
    │   └── exploratory_data_analysis.py
    ├── preprocessing
    │   ├── create_csv_with_categories_as_new_Columns.py
    │   ├── create_multi_class_dataset.py
    │   ├── create_preprocessed_csv.py
    │   ├── create_tf_idf_csv.py
    │   ├── dictionaries.py
    │   ├── filter_dataset_based_on_category.py
    │   └── main_preprocessing.py
    ├── .gitignore
    └── README.md

Repository: alfagama/arxiv_predicting_paper_category
