Fine-tuned Longformer model to detect root cause of depression in long texts.
This project fine-tunes a MentalLongformer model on the depression causal analysis downstream task using the CAMS dataset (paper, GitHub). The resulting model reaches SOTA performance on this task (see the results below for details).
This project was developed as part of my undergraduate thesis research.
- Language: Python
- Libraries: PyTorch, HuggingFace family of libraries
- Hyperparameter Tuning: Optuna
I measured my model with F1-score and accuracy to stay comparable with other research. Other metrics and training details are available; please see the appendix.
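For orientation, here is a minimal scikit-learn sketch of how these two metrics can be computed. The label arrays and the `weighted` averaging mode are illustrative assumptions; pick whichever averaging matches the work you compare against.

```python
# Illustrative only: compute the two headline metrics with scikit-learn.
from sklearn.metrics import accuracy_score, f1_score

y_true = [0, 2, 5, 1]  # placeholder gold labels
y_pred = [0, 2, 4, 1]  # placeholder model predictions

print("Accuracy:", accuracy_score(y_true, y_pred))
print("F1:", f1_score(y_true, y_pred, average="weighted"))  # averaging mode is an assumption
```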
With CAMS Dataset:
Author | Model Used | F1 Score | Accuracy |
---|---|---|---|
Garg et al. (2022) | CNN + LSTM | 0.4633 | 0.4778 |
Saxena et al. (2022) | BiLSTM | 0.4700 | 0.5054 |
Ji et al. (2023) | MentalLongformer | 0.4874 | 0.4920 |
Ji et al. (2023) | MentalXLNet | 0.5008 | 0.5080 |
Mine | MentalLongformer | 0.5524 | 0.6064 |
If I missed any relevant research, please let me know!
This is the easiest way to try my model, as you don't need to set up anything. Just head to my HuggingFace Spaces for this project, type some depressive text, and let my model analyze its probable cause of depression. You can also load one of the long example texts, since MentalLongformer is better suited for long inputs.
If you wish to use my model to run inference on your own dataset or fine-tune it further, you can import my model in a Python script/notebook.
from transformers import LongformerTokenizer, LongformerForSequenceClassification
tokenizer = LongformerTokenizer.from_pretrained("aimh/mental-longformer-base-4096")
model = LongformerForSequenceClassification.from_pretrained("stackofsugar/mentallongformer-cams-finetuned")
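For example, a single prediction could look like the snippet below. The sample sentence is made up, and the 4096-token truncation limit simply matches Longformer's maximum input length.

```python
import torch

text = "I lost my job last month and I feel like nobody cares about me anymore."  # illustrative input
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=4096)

with torch.no_grad():
    logits = model(**inputs).logits

predicted_id = logits.argmax(dim=-1).item()
print(model.config.id2label[predicted_id])  # predicted cause label
```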
If you prefer to use the high-level HuggingFace pipeline to make predictions, you can also do so in a Python script/notebook.
from transformers import pipeline
pipe = pipeline("text-classification", model="stackofsugar/mentallongformer-cams-finetuned", tokenizer="aimh/mental-longformer-base-4096")
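A call then looks roughly like this; the sample text is made up, and `truncation=True` just guards against inputs longer than the model's 4096-token limit.

```python
result = pipe(
    "I lost my job last month and I feel like nobody cares about me anymore.",
    truncation=True,
)
print(result)  # e.g. [{'label': '...', 'score': 0.87}]
```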
If you're not sure yet, you might want to read HuggingFace's Course on NLP.
If you're researching this topic - just like me - you might be interested in replicating my research. You can open the notebooks folder to see the notebooks that I used. Mind you that I'm excluding the preprocessing notebook (which deletes the `inference` column, renames the `cause` column, and removes empty rows), as it involves a private repository. Nevertheless, I prepared the pre-processed dataset in the notebooks/data folder in HuggingFace Datasets format, which has first-party integration with the other HuggingFace libraries.
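Assuming you cloned the repository and kept its default layout, loading that pre-processed dataset should look roughly like this (adjust the path if you move the data):

```python
from datasets import load_from_disk

# Path is relative to the repository root
cams = load_from_disk("notebooks/data")
print(cams)
```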
`1-train.ipynb` contains the training script, while `1b-hyperparam-search.ipynb` contains the hyperparameter search script (a rough sketch of the search setup is shown after the steps below). Below are steps that might help you replicate this research:
- If you haven't already, install Python.
- If you have an NVIDIA GPU and are willing to use it for training, install the CUDA Toolkit. It requires a C++ compiler, so install one that matches your platform and consult the CUDA docs. AMD GPU users might want to take a look at ZLUDA, but I don't know whether my notebooks will work with it.
- I highly recommend setting up a Python virtual environment to keep your global workspace clean. Consult its docs here.
- Install the required packages using
pip install -r /path/to/requirements.txt
- Run the notebook using VS Code, Jupyter, or others.
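For orientation, below is a minimal, hypothetical sketch of an Optuna-backed hyperparameter search with the HuggingFace Trainer. The search ranges, the number of labels, and the `train_ds`/`eval_ds` placeholders are illustrative assumptions; the actual setup lives in `1b-hyperparam-search.ipynb`.

```python
from transformers import (
    LongformerForSequenceClassification,
    LongformerTokenizer,
    Trainer,
    TrainingArguments,
)

tokenizer = LongformerTokenizer.from_pretrained("aimh/mental-longformer-base-4096")

def model_init():
    # Fresh weights for every trial so each run starts from the same pre-trained checkpoint
    return LongformerForSequenceClassification.from_pretrained(
        "aimh/mental-longformer-base-4096",
        num_labels=6,  # CAMS has 6 cause categories (including "no reason")
    )

def hp_space(trial):
    # Illustrative ranges only
    return {
        "learning_rate": trial.suggest_float("learning_rate", 1e-5, 5e-5, log=True),
        "warmup_steps": trial.suggest_int("warmup_steps", 0, 200),
        "weight_decay": trial.suggest_float("weight_decay", 1e-6, 1e-2, log=True),
    }

trainer = Trainer(
    model_init=model_init,
    args=TrainingArguments(output_dir="hp-search", fp16=True),
    train_dataset=train_ds,  # placeholder: tokenized CAMS train split
    eval_dataset=eval_ds,    # placeholder: tokenized CAMS validation split
)

best_run = trainer.hyperparameter_search(
    hp_space=hp_space,
    backend="optuna",
    n_trials=20,
    direction="minimize",  # default objective is eval loss; pass compute_metrics to optimize F1 instead
)
print(best_run.hyperparameters)
```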
You might also want to run the inference server (the one I host on HuggingFace Spaces) locally. The server is powered by Streamlit, which provides a very easy way to build ML/DL showcases like this one. It is lightweight and can return predictions reasonably quickly even on slow CPUs (though I haven't benchmarked it). You can follow the steps below:
- Clone the repo. If you haven't set up Git on your machine, install it first.
# Make sure you have git-lfs installed (https://git-lfs.com)
> git lfs install
> git clone https://huggingface.co/spaces/stackofsugar/depression-causal-analysis
- I highly recommend setting up a Python virtual environment to keep your global workspace clean. Consult its docs here.
- Install the required packages using
pip install -r /path/to/requirements.txt
- Run the server using
streamlit run path/to/app.py
- @stackofsugar (Myself)
See also the list of contributors who have participated in this project.
- CPU: Intel Xeon Silver 4216
- RAM: 128GB DDR4 ECC
- GPU: 16GB RTX A4000
With the setup above, it took me ~4 hours to complete a training run and ~36 hours to run hyperparameter tuning for 20 trials with pruning enabled.
I would like to acknowledge Universitas Sebelas Maret for the computational resources provided for this project.
Metric | Score |
---|---|
F1 Score | 0.5524 |
Accuracy | 0.6064 |
Precision | 0.6020 |
Recall | 0.5385 |
Hyperparameter | Value |
---|---|
Learning Rate | 3.04e-5 |
Warmup Steps | 75 |
Weight Decay | 2.692e-5 |
Train Epochs | 5 |
fp16 | True |
Total number of steps trained: 620
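For reference, here is a minimal sketch of plugging the tuned values above into HuggingFace `TrainingArguments`; every other argument is a default or an assumption, not necessarily my exact configuration.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="mentallongformer-cams-finetuned",  # illustrative output path
    learning_rate=3.04e-5,
    warmup_steps=75,
    weight_decay=2.692e-5,
    num_train_epochs=5,
    fp16=True,
)
```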