Enron Dataset
 
 
   
  Email logs are a useful source for research in fields such as link analysis, social network analysis and textual analysis. Most experiments in these fields are performed on synthetic data because an adequate real-life benchmark is lacking. The Enron email dataset is a touchstone for such research. It is similar to the data collected for fraud detection and counterterrorism, which makes it a natural test bed for evaluating techniques used in those areas. The dataset was made public by the Federal Energy Regulatory Commission during its investigation, and William Cohen made it available on his webpage. That version still had many integrity issues, including numerous duplicate and corrupt messages. We cleaned it and created a MySQL database for the dataset to facilitate statistical analysis of the data. The MySQL form of the dataset can be downloaded here.
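
As an illustration of the kind of cleaning described above, the following is a minimal sketch of duplicate removal, not the procedure actually used to build the database. It assumes each message is a dictionary with hypothetical keys `sender`, `recipients`, `subject` and `body`, and treats two messages as duplicates when these fields match after light normalization.

```python
import hashlib


def message_key(sender, recipients, subject, body):
    """Fingerprint a message from the fields assumed to identify it."""
    normalized = "\n".join([
        sender.strip().lower(),
        ",".join(sorted(r.strip().lower() for r in recipients)),
        subject.strip(),
        body.strip(),
    ])
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


def drop_duplicates(messages):
    """Keep the first occurrence of each fingerprint, preserving order."""
    seen = set()
    unique = []
    for msg in messages:
        key = message_key(
            msg["sender"], msg["recipients"], msg["subject"], msg["body"]
        )
        if key not in seen:
            seen.add(key)
            unique.append(msg)
    return unique
```

A stricter or looser notion of "duplicate" (for example, ignoring the subject line) would only require changing the fields passed to `message_key`.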

We further analyzed the dataset statistically to assess its suitability for research. We also derived a social network of the Enron employees from the evidence in the database. The Word file here contains a detailed description of the database and a report of its statistical analysis, and also describes the social network.
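
One common way to derive a social network from email logs is to link two people when mail has flowed between them in both directions. The sketch below shows that idea only; it is not necessarily the method used for the network described in the Word file. The input format (a list of `(sender, recipient)` pairs) and the `min_each_way` threshold are assumptions made for illustration.

```python
from collections import Counter


def build_network(emails, min_each_way=1):
    """Return undirected edges between people who emailed each other.

    emails: iterable of (sender, recipient) pairs (hypothetical format).
    min_each_way: assumed threshold on messages required in each direction.
    """
    sent = Counter()
    for sender, recipient in emails:
        if sender != recipient:  # ignore self-addressed mail
            sent[(sender, recipient)] += 1

    edges = set()
    for (a, b), count in sent.items():
        if count >= min_each_way and sent.get((b, a), 0) >= min_each_way:
            edges.add(tuple(sorted((a, b))))  # store each pair once
    return edges
```

Raising `min_each_way` keeps only pairs with sustained two-way contact, which filters out one-off messages and broadcast mail.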

We also gathered information on the status of every former employee in the company's organizational hierarchy. This information is vital for studies of information flow within an organization.


For any questions regarding this data, contact:

Jitesh Shetty: jshetty(at)usc(dot)edu
Jafar Adibi: adibi(at)isi(dot)edu