Enron Dataset
Email logs are a useful source for research in fields such as link analysis,
social network analysis and text analysis. However, most experiments in these
fields are performed on synthetic data due to the lack of an adequate real-life
benchmark. The Enron email dataset serves as a touchstone for such research.
Because it resembles the data collected for fraud detection and counterterrorism,
it is a well-suited test bed for evaluating the techniques used in those areas.
The Federal Energy Regulatory Commission made the dataset public during its
investigation of Enron, and William Cohen made it available on his webpage.
The dataset still had many integrity issues, including numerous duplicate and
corrupt messages. We cleaned it and created a MySQL database to facilitate
statistical analysis of the data. The MySQL version of the dataset can be
downloaded here.
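The cleaning procedure is not described in detail on this page. As an illustration only, the following Python sketch shows one common way to drop duplicate and obviously malformed messages, assuming each email has already been parsed into sender, recipients, date, subject and body fields. The `Message` record, the normalization rules and the `is_well_formed` check are hypothetical choices for this example, not the procedure actually used to build the MySQL database.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    # Hypothetical parsed-message record; the field names are illustrative.
    sender: str
    recipients: tuple
    date: str
    subject: str
    body: str


def fingerprint(msg: Message) -> str:
    """Hash the normalized fields that identify a message."""
    # Lower-case addresses, sort recipients and collapse body whitespace so
    # trivially different copies of the same email map to the same key.
    parts = [
        msg.sender.strip().lower(),
        ",".join(sorted(r.strip().lower() for r in msg.recipients)),
        msg.date.strip(),
        msg.subject.strip(),
        " ".join(msg.body.split()),
    ]
    return hashlib.sha256("\x1f".join(parts).encode("utf-8")).hexdigest()


def is_well_formed(msg: Message) -> bool:
    # Illustrative corruption check (an assumption): require a sender
    # address and a non-empty date.
    return "@" in msg.sender and bool(msg.date.strip())


def deduplicate(messages):
    """Keep the first copy of each message, preserving input order."""
    seen = set()
    unique = []
    for msg in messages:
        key = fingerprint(msg)
        if key not in seen:
            seen.add(key)
            unique.append(msg)
    return unique


def clean(messages):
    """Drop malformed messages, then remove duplicates."""
    return deduplicate(m for m in messages if is_well_formed(m))


msgs = [
    Message("a@enron.com", ("b@enron.com",), "2001-05-14", "Hi", "Lunch?"),
    # Same email with different case and trailing whitespace: a duplicate.
    Message("A@enron.com ", ("b@enron.com",), "2001-05-14", "Hi", "Lunch?  "),
    # No sender address: treated as corrupt.
    Message("", ("b@enron.com",), "2001-05-14", "Hi", "???"),
]
print(len(clean(msgs)))  # 1
```

Hashing a normalized fingerprint keeps memory proportional to the number of distinct messages rather than their total size; a real cleaning pass would likely need stricter rules (for example, Message-ID headers) than this sketch.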
We also analyzed the dataset statistically to assess its suitability for research, and we derived a social network of Enron employees from the evidence database. The Word file here contains a detailed description of the database, a report of its statistical analysis and a description of the social network. In addition, we gathered the status of every former employee within Enron's organizational hierarchy, which is vital for studying information flow within an organization. For any questions regarding this data, contact Jitesh Shetty: jshetty(at)usc(dot)edu
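This page does not say how the employee social network was derived. One plausible approach, sketched below in Python under stated assumptions, is to count the messages exchanged between employee addresses and keep a link only when both people wrote to each other. The input shape (`(sender, recipients)` pairs), the `build_network` function and the `min_messages` threshold are hypothetical and are not taken from the authors' report.

```python
from collections import Counter


def build_network(emails, employees, min_messages=1):
    """Build a weighted, undirected employee network from email traffic.

    emails: iterable of (sender, recipients) pairs.
    employees: set of employee addresses; all other addresses are ignored.
    min_messages: hypothetical threshold; a link is kept only if each
        endpoint sent the other at least this many messages.
    Returns {(a, b): total_messages} with a < b.
    """
    sent = Counter()
    for sender, recipients in emails:
        if sender not in employees:
            continue
        for r in set(recipients):  # count each recipient once per message
            if r in employees and r != sender:
                sent[(sender, r)] += 1

    edges = {}
    # Visit each unordered pair once, in a canonical (sorted) order.
    pairs = {tuple(sorted(p)) for p in sent}
    for a, b in sorted(pairs):
        n_ab, n_ba = sent[(a, b)], sent[(b, a)]  # Counter returns 0 if absent
        if min(n_ab, n_ba) >= min_messages:
            edges[(a, b)] = n_ab + n_ba
    return edges


employees = {"alice@enron.com", "bob@enron.com", "carol@enron.com"}
emails = [
    ("alice@enron.com", ["bob@enron.com", "outsider@example.com"]),
    ("bob@enron.com", ["alice@enron.com"]),
    ("alice@enron.com", ["carol@enron.com"]),  # never answered
]
print(build_network(emails, employees))
# {('alice@enron.com', 'bob@enron.com'): 2}
```

Requiring messages in both directions filters out one-way traffic such as broadcast announcements; setting `min_messages=0` keeps every pair that exchanged at least one message in either direction.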