Digg 2009 data set
The Digg 2009 data set contains data about stories promoted to Digg's front page over a period of a month in 2009. For each story, we collected the list of all Digg users who had voted for the story up to the time of data collection, along with the time stamp of each vote. We also retrieved the voters' friendship links. The semantics of the friendship links are as follows:
user_id --> friend_id
means that user_id is watching the activities of (is a fan of) friend_id.
User ids have been anonymized, but are unique in the data set: a user with a specific id in the friendship links table and a user with the same id in the votes table correspond to the same actual user.
The data is in zipped csv files that are password protected. The password is digg2009_user.
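As a minimal sketch of reading a table straight from its archive, the Python snippet below uses only the standard library. It rests on assumptions not stated here: that the archives use the legacy zip encryption that Python's zipfile module can decrypt (archives encrypted with another scheme would need an external unzip tool), that the csv files are UTF-8 text, and that each archive's first member is the csv file. The function name iter_rows is hypothetical.

```python
import csv
import io
import zipfile

PASSWORD = b"digg2009_user"  # password given in this description


def iter_rows(zip_path, member=None):
    """Yield csv rows from one member of a password-protected zip archive.

    Assumption: if `member` is not given, the first file in the archive
    is the csv table.
    """
    with zipfile.ZipFile(zip_path) as archive:
        name = member or archive.namelist()[0]
        # zipfile expects the password as bytes.
        with archive.open(name, pwd=PASSWORD) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            yield from csv.reader(text)
```

A call such as iter_rows("digg_votes.zip") would then stream the rows one at a time without unpacking the archive to disk; the archive file name in this call is a placeholder, not one confirmed by this description.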
Table digg_votes contains 3,018,197 votes on 3,553 popular stories made by 139,409 distinct users. The first vote on each story is from the story's submitter.
Schema of the digg_votes table
- vote_date: Unix time stamp of the vote
- voter_id: anonymized unique id of the voter
- story_id: anonymized unique id of the story
(left) Distribution of votes (diggs) per story. An outlier with more than 24,000 votes is not shown.
(right) Distribution of the number of votes (diggs) made by users.
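The two distributions above can be recomputed from the vote rows. The Python sketch below assumes the csv columns appear in the order listed in the schema (vote_date, voter_id, story_id) and that the file has no header row; both points should be checked against the actual file. The function name count_votes and the sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter


def count_votes(lines):
    """Return (votes per story, votes per voter) from digg_votes rows.

    Assumes three columns per row: vote_date, voter_id, story_id.
    """
    per_story = Counter()
    per_voter = Counter()
    for vote_date, voter_id, story_id in csv.reader(lines):
        per_story[story_id] += 1
        per_voter[voter_id] += 1
    return per_story, per_voter


# Made-up sample rows in the assumed layout (not real data).
sample_votes = io.StringIO(
    "1246000000,10,1\n"
    "1246000050,11,1\n"
    "1246000100,10,2\n"
)
per_story, per_voter = count_votes(sample_votes)
print(per_story.most_common(1))  # the story with the most votes in the sample
```

Counting the values of per_story (or per_voter) a second time, for example with Counter(per_story.values()), gives the number of stories (or users) at each vote count, which is the quantity a distribution plot would show.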
Table digg_friends contains 1,731,658 friendship links of 71,367 distinct users. Voters who do not appear in the table had not specified any friends at the time the data was collected.
Schema of the digg_friends table
- mutual: indicates whether the link represents a mutual friend relation (1) or not (0)
- friend_date: Unix time stamp of when the friendship link was created
- user_id: anonymized unique id of a user
- friend_id: anonymized unique id of a user
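Because a link user_id --> friend_id means that user_id is a fan of friend_id, the number of fans of a user is the number of rows in which that user appears as friend_id. The Python sketch below counts fans that way. It assumes the csv columns follow the schema order (mutual, friend_date, user_id, friend_id) with no header row. Whether a mutual relation is stored as one row or as two is not stated here, so the sketch counts rows exactly as given and does not add reverse links for rows with mutual = 1. The function name count_fans and the sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter


def count_fans(lines):
    """Return a Counter mapping each user id to its number of fans.

    Assumes four columns per row: mutual, friend_date, user_id, friend_id.
    Each row adds one fan (user_id) to friend_id.
    """
    fans = Counter()
    for mutual, friend_date, user_id, friend_id in csv.reader(lines):
        fans[friend_id] += 1
    return fans


# Made-up sample rows in the assumed layout (not real data).
sample_links = io.StringIO(
    "0,1240000000,1,2\n"
    "1,1240000100,3,2\n"
    "1,1240000200,2,3\n"
)
fans = count_fans(sample_links)
print(fans["2"], fans["3"])  # fan counts for users 2 and 3 in the sample
```

Since user ids are shared between the two tables, the keys of this counter can be matched directly against voter_id values in digg_votes.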
Distribution of the number of fans per user.
Empirical characterization of this data is described in:
Lerman, K., Ghosh, R. and Surachawala, T. (2012) "Social Contagion: An Empirical Study of Information Spread on Digg and Twitter Follower Graphs." http://arxiv.org/abs/1202.3162
This data is made available to the community for research purposes only. If you use the data in a publication,
Copyright 2015 University of Southern California Information Sciences Institute