This repository contains code for normalizing the Enron dataset. The Enron dataset is a collection of emails and other documents that were exchanged by employees of Enron Corporation, a major energy company that collapsed in 2001 due to accounting fraud. The dataset is a valuable resource for researchers who are studying corporate fraud and other financial crimes.

# Files
The repository contains the following files:

- `Enron_Data_normalization.ipynb`: a Jupyter notebook that contains the code for normalizing the Enron dataset.
- `requirements.txt`: lists the dependencies that need to be installed in order to run the code.

# Instructions
To run the code, first install the dependencies listed in `requirements.txt` (for example, with `pip install -r requirements.txt`). Then open the Jupyter notebook and run the cells in order, one by one.

# Dataset
The Enron dataset is not included in this repository. You can download the dataset from the following URL: https://www.cs.cmu.edu/~enron/
The code is written for a CSV version of the dataset, which I am sharing via Google Drive because of GitHub's file size limits on uploads: https://drive.google.com/file/d/1VLY0Xqhkg25FGuTIvUKiAfcGeX1fczQa/view?usp=drive_link
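The exact normalization steps live in the notebook and are not spelled out here. As a rough illustration only, the sketch below shows one way a CSV of raw emails could be flattened into tidy records using just Python's standard library. It assumes (hypothetically) that the CSV has one row per email, with a `file` column holding the original path and a `message` column holding the full raw email text; the shared file may use different columns, and this is not the notebook's implementation.

```python
import csv
from email import message_from_string
from email.utils import getaddresses, parseaddr, parsedate_to_datetime
from typing import Iterator, Optional, TextIO


def normalize_message(raw: str) -> dict:
    """Flatten one raw email (headers + body) into a simple record."""
    msg = message_from_string(raw)

    # Lowercase addresses so the same person is always spelled one way.
    sender = parseaddr(msg.get("From", ""))[1].lower()
    recipients = [
        addr.lower() for _, addr in getaddresses(msg.get_all("To", [])) if addr
    ]

    # Convert the RFC 2822 Date header to an ISO 8601 string, keeping the UTC offset.
    date_header = msg.get("Date")
    sent_at: Optional[str] = (
        parsedate_to_datetime(date_header).isoformat() if date_header else None
    )

    # Only plain single-part bodies are handled in this sketch.
    body = "" if msg.is_multipart() else msg.get_payload()

    return {
        "message_id": msg.get("Message-ID", "").strip(),
        "sender": sender,
        "recipients": recipients,
        "date": sent_at,
        "subject": msg.get("Subject", "").strip(),
        "body": body.strip(),
    }


def normalize_csv(fileobj: TextIO, message_column: str = "message") -> Iterator[dict]:
    """Yield one normalized record per CSV row.

    Assumption: the CSV has hypothetical 'file' and 'message' columns.
    """
    for row in csv.DictReader(fileobj):
        yield {"file": row.get("file"), **normalize_message(row[message_column])}
```

To try it on the downloaded file, open it with `open(path, newline="", encoding="utf-8")` and pass the handle to `normalize_csv`. Very long messages may exceed the `csv` module's default field size limit, which can be raised with `csv.field_size_limit()`.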