
This repository contains code for normalizing the Enron dataset.


RutujChheda/Enron_Emails_Dataset_Processed


# Enron Dataset Normalization

This repository contains code for normalizing the Enron dataset. The Enron dataset is a collection of emails and other documents that were exchanged by employees of Enron Corporation, a major energy company that collapsed in 2001 due to accounting fraud. The dataset is a valuable resource for researchers who are studying corporate fraud and other financial crimes.

## Files

The repository contains the following files:

- `Enron_Data_normalization.ipynb`: A Jupyter notebook that contains the code for normalizing the Enron dataset.
- `requirements.txt`: A file that lists the dependencies that need to be installed in order to run the code.

## Instructions

To run the code, first install the dependencies listed in `requirements.txt` (for example, with `pip install -r requirements.txt`).

Then, open the Jupyter notebook and run the cells one by one.

## Dataset

The Enron dataset is not included in this repository. You can download the dataset from the following URL: https://www.cs.cmu.edu/~enron/

The code is written for a CSV version of the dataset, which I am sharing via Google Drive because of GitHub's restriction on large file uploads: https://drive.google.com/file/d/1VLY0Xqhkg25FGuTIvUKiAfcGeX1fczQa/view?usp=drive_link
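
As a rough illustration of what normalizing such a CSV can involve, here is a minimal sketch using only the Python standard library. It is not the notebook's actual code. It assumes the CSV has two columns named `file` and `message`, with `message` holding the raw email text; that layout is an assumption and is not confirmed by this repository. The function names `normalize_message` and `normalize_csv` are hypothetical.

```python
import csv
import io
from email import message_from_string
from email.utils import getaddresses, parsedate_to_datetime


def normalize_message(raw):
    """Split one raw email string into a flat dict of normalized fields."""
    msg = message_from_string(raw)
    # getaddresses splits comma-separated recipient lists into (name, address) pairs.
    recipients = [addr.lower() for _, addr in getaddresses(msg.get_all("To", []))]
    date_header = msg.get("Date")
    return {
        "from": (msg.get("From") or "").strip().lower(),
        "to": recipients,
        "subject": (msg.get("Subject") or "").strip(),
        # Keep the date as an ISO 8601 string, or None when the header is missing.
        "date": parsedate_to_datetime(date_header).isoformat() if date_header else None,
        "body": msg.get_payload().strip(),
    }


def normalize_csv(handle):
    """Yield one normalized record per CSV row.

    Assumption: the CSV has columns named `file` and `message`.
    """
    for row in csv.DictReader(handle):
        record = normalize_message(row["message"])
        record["file"] = row["file"]
        yield record


# A small made-up message, used only to demonstrate the functions above.
sample_raw = (
    "Date: Mon, 14 May 2001 16:39:00 -0700\n"
    "From: Alice@Example.com\n"
    "To: bob@example.com, Carol@Example.com\n"
    "Subject: Test\n"
    "\n"
    "Hello there.\n"
)
```

To try it on a real file, open the CSV with `open(path, newline="", encoding="utf-8")` and pass the handle to `normalize_csv`. Very long messages may require raising the limit with `csv.field_size_limit`.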
