Mail Recipients Prediction using Text and Graph Learning Techniques

Introduction

This project aims at predicting recipients of emails. Given an email body, this code outputs 10 recipients and rank them by decreasing order or relevance. A few methods were tested including KNN, TfIdf, to compute distances between multiple texts. Finally the best approach I found was to combine both the similarity between email bodies and the frequency of emails sent to recipients.

Note that using metadata such as the day/hour an email is written could improve the accuracy of this approach.

Dataset

Sample of the Enron email dataset, consisting in 43613 emails for training, sent from December 1998 to November 2001 by 125 different senders, and 2362 emails for test, sent after November 2001 by the same senders.

Evaluation Metric

The evaluation metric used for this project is Mean Average @10 (MAP@10), which is a classification metric sensitive to the rank of predicted recipients.

Requirements

python 3
pandas
sklearn

Run the model

Just run the following command:

python run.py

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
data		data
models		models
README.md		README.md
evaluate.py		evaluate.py
preprocessing.py		preprocessing.py
run.py		run.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Mail Recipients Prediction using Text and Graph Learning Techniques

Introduction

Dataset

Evaluation Metric

Requirements

Run the model

About

Releases

Packages

Languages

gauthierdmn/enron_recipients_prediction

Folders and files

Latest commit

History

Repository files navigation

Mail Recipients Prediction using Text and Graph Learning Techniques

Introduction

Dataset

Evaluation Metric

Requirements

Run the model

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages