This project consists of a two-step framework for analyzing mental health-related posts and predicting depression risk using machine learning and deep learning models. The dataset used is sourced from Zenodo and includes Reddit posts from various mental health subreddits.
- Project Overview
- Steps Overview
- Dataset Download and Setup
- Installation Requirements
- Workflow
- Files and Outputs
- Usage Instructions
This project aims to:
- Extract themes/topics from mental health-related text data using clustering techniques.
- Build a supervised learning model to predict the likelihood of depression in user posts.
Step 1: Topic Extraction
- Goal: Identify key themes in mental health posts, grouped by half-year intervals.
- Techniques:
  - Text Preprocessing (tokenization, stopword removal, lemmatization)
  - Feature Extraction using TF-IDF
  - Clustering using K-Means
  - Extracting top unigrams and bigrams for each cluster
Input: Preprocessed text from mental health-related posts.
Output: A summary of extracted topics and corresponding keywords saved to a file.
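The actual implementation lives in Step1-Topic-extracition.ipynb. As a rough, self-contained sketch of the approach described above (not the notebook's code), TF-IDF vectorization plus K-Means and top-term extraction could look like this; the documents list, cluster count, and feature settings are placeholder assumptions:

```python
# Sketch: TF-IDF features + K-Means clustering, then top unigrams/bigrams per cluster.
# `documents` stands in for the preprocessed posts of one half-year interval.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

documents = [
    "feel anxious about work every day",
    "cannot sleep at night keep worrying",
    "therapy and exercise helped me a lot",
]

vectorizer = TfidfVectorizer(ngram_range=(1, 2), stop_words="english", max_features=5000)
X = vectorizer.fit_transform(documents)

kmeans = KMeans(n_clusters=2, random_state=42, n_init=10)
kmeans.fit(X)

terms = vectorizer.get_feature_names_out()
for cluster_id, centroid in enumerate(kmeans.cluster_centers_):
    top_terms = [terms[i] for i in centroid.argsort()[::-1][:10]]  # highest-weight n-grams
    print(f"Cluster {cluster_id}: {top_terms}")
```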
Step 2: Depression Prediction
- Goal: Predict whether a given text indicates depressive tendencies.
- Techniques:
  - Data Annotation using:
    - Keyword-based matching
    - Sentiment analysis (VADER and Transformer-based sentiment models)
  - Feature Extraction using word embeddings (BERT)
  - Supervised Learning:
    - Logistic Regression
    - LSTM Model with BERT Embeddings
  - Evaluation metrics: Accuracy, F1-Score, Confusion Matrix
Input: Annotated dataset of posts labeled for depression risk.
Output: Trained models, evaluation metrics, and visualizations.
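The notebook contains the actual annotation code. Purely as an illustration of the keyword matching plus VADER sentiment idea described above, a simplified annotator might look like the sketch below; the keyword list and the compound-score threshold are assumptions, and the Transformer-based sentiment model is omitted for brevity:

```python
# Simplified annotation sketch: keyword matching combined with VADER sentiment (via NLTK).
# The keyword set and the -0.5 compound threshold are illustrative assumptions.
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)
sia = SentimentIntensityAnalyzer()

DEPRESSION_KEYWORDS = {"hopeless", "worthless", "depressed", "empty", "numb"}

def annotate(text: str) -> int:
    """Return 1 if the post looks like a depression-risk example, else 0."""
    has_keyword = bool(set(text.lower().split()) & DEPRESSION_KEYWORDS)
    very_negative = sia.polarity_scores(text)["compound"] <= -0.5
    return int(has_keyword or very_negative)

print(annotate("I feel hopeless and empty lately"))  # 1
```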
The dataset used in this project can be downloaded from Zenodo:
Zenodo link: [Reddit Dataset](https://zenodo.org/records/3941387)
- Run the following command in the terminal to download the dataset:
wget https://zenodo.org/api/records/3941387/files-archive -O reddit_dataset.zip
- Unzip the downloaded file into the data/ folder:
unzip reddit_dataset.zip -d data/
- Ensure the extracted files include the required mental_health_support.csv and non_mental_health.csv files.
Before running the project, install the required libraries:
pip install -r requirements.txt
Key Libraries:
- Python 3.8+
- PyTorch
- scikit-learn
- transformers (Hugging Face)
- NLTK
- Matplotlib
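If requirements.txt is missing from your copy of the repository, a minimal version covering the libraries above might look like this (version pins are not specified here and may need adjusting):

```text
torch
scikit-learn
transformers
nltk
matplotlib
notebook
```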
- Preprocessing and Data Preparation:
  - Run data_processing.ipynb to preprocess and prepare the data.
- Step 1 - Topic Extraction:
  - Run Step1-Topic-extracition.ipynb to extract topics and keywords using clustering.
- Step 2 - Depression Prediction:
  - Run Step2-Depression-Predict.ipynb to train a classifier for depression risk prediction.
- data/ - Folder containing the downloaded and unzipped dataset.
- Step1-Topic-extracition.ipynb - Notebook for topic extraction.
- Step2-Depression-Predict.ipynb - Notebook for depression risk prediction.
- data_processing.ipynb - Notebook for preprocessing the dataset.
- lstm_depression_model.pth - Trained LSTM model weights.
- vocab.json - Saved vocabulary file for inference.
- README.md - Project documentation (this file).
- Step 1:
  - merged_clusters_output.csv - File containing topic keywords for each cluster. (There is an example in the repository.)
- Step 2:
  - combined_annotated_data_updated.csv - Annotated dataset with depression labels.
  - classification_report.txt - Performance metrics (precision, recall, F1-score).
  - lstm_depression_model.pth - Saved model for inference.
Run the preprocessing notebook:
jupyter notebook data_processing.ipynb
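The notebook is the authoritative source for the preprocessing logic. As an illustration only, the tokenization, stopword removal, and lemmatization mentioned earlier could be implemented with NLTK roughly like this (the preprocess function and resource choices are assumptions, not taken from data_processing.ipynb):

```python
# Illustrative NLTK preprocessing: lowercase, tokenize, drop stopwords, lemmatize.
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize

for resource in ("punkt", "stopwords", "wordnet"):
    nltk.download(resource, quiet=True)

STOPWORDS = set(stopwords.words("english"))
lemmatizer = WordNetLemmatizer()

def preprocess(text: str) -> str:
    tokens = word_tokenize(text.lower())
    kept = [t for t in tokens if t.isalpha() and t not in STOPWORDS]
    return " ".join(lemmatizer.lemmatize(t) for t in kept)

print(preprocess("I haven't been sleeping well and feel exhausted."))
```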
Run the topic extraction notebook:
jupyter notebook Step1-Topic-extracition.ipynb
Expected Output:
- Topic clusters and corresponding top keywords for each half-year interval.
Run the depression prediction notebook:
jupyter notebook Step2-Depression-Predict.ipynb
Expected Output:
- Trained LSTM model with BERT embeddings.
- Performance metrics and confusion matrix.
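The notebook computes and reports these metrics itself; for reference, the same metrics can be reproduced with scikit-learn from any model's predictions, as in this generic sketch (y_true and y_pred are placeholders):

```python
# Generic evaluation sketch; y_true/y_pred stand in for test labels and model predictions.
from sklearn.metrics import accuracy_score, f1_score, confusion_matrix, classification_report

y_true = [0, 1, 1, 0, 1, 0]
y_pred = [0, 1, 0, 0, 1, 1]

print("Accuracy:", accuracy_score(y_true, y_pred))
print("F1-score:", f1_score(y_true, y_pred))
print("Confusion matrix:\n", confusion_matrix(y_true, y_pred))
print(classification_report(y_true, y_pred, target_names=["Non-Depression", "Depression"]))
```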
To use the trained LSTM model for inference on new text data:
- Load the saved model:
```python
import torch
from transformers import BertTokenizer
from model import BertLSTMClassifier

# Load tokenizer and model
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertLSTMClassifier()
model.load_state_dict(torch.load("lstm_depression_model.pth"))
model.eval()
```
- Predict:
```python
def predict(text):
    tokens = tokenizer(text, return_tensors='pt', truncation=True,
                       padding='max_length', max_length=100)
    input_ids, attention_mask = tokens['input_ids'], tokens['attention_mask']
    with torch.no_grad():
        output = model(input_ids, attention_mask)
    prediction = torch.argmax(output, dim=1)
    return 'Depression' if prediction.item() == 1 else 'Non-Depression'

# Example
print(predict("I feel hopeless and sad all the time."))
```
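Note that the snippet above imports BertLSTMClassifier from a model module that is not documented in this README. If you need to recreate it, a minimal BERT-embeddings-plus-LSTM classifier compatible with the calls above could look like the sketch below; the hidden size, dropout, and pooling choice are assumptions rather than the project's actual architecture, and load_state_dict will only succeed if your definition matches the trained checkpoint:

```python
# Hypothetical sketch of a BERT + LSTM classifier matching the usage above.
# Layer sizes, dropout, and pooling are assumptions; the real architecture must match
# the weights saved in lstm_depression_model.pth for load_state_dict to work.
import torch
import torch.nn as nn
from transformers import BertModel

class BertLSTMClassifier(nn.Module):
    def __init__(self, hidden_dim=128, num_classes=2, dropout=0.3):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        for param in self.bert.parameters():        # use BERT as a frozen embedding extractor
            param.requires_grad = False
        self.lstm = nn.LSTM(self.bert.config.hidden_size, hidden_dim,
                            batch_first=True, bidirectional=True)
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(hidden_dim * 2, num_classes)

    def forward(self, input_ids, attention_mask):
        with torch.no_grad():
            bert_out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        lstm_out, _ = self.lstm(bert_out.last_hidden_state)  # (batch, seq_len, 2 * hidden_dim)
        pooled = self.dropout(lstm_out[:, -1, :])            # representation at the last position
        return self.fc(pooled)                               # logits over the two classes
```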
If you have any issues or questions regarding the project, please feel free to contact the author or raise an issue on GitHub.
End of README