University of Southern California


Kristina Lerman
  Modeling Social Dynamics  
  Social media sites such as Digg use crowd-sourcing and social ("follow") links to help people find interesting content. Crowd-sourcing relies on the reactions of the first people to see new content to indicate whether others will find it interesting. Social links allow fine-tuning the selection by emphasizing reactions by a person's friends. These techniques are especially useful filters for the flood of online content whose quality is hard, if not impossible, to determine automatically.  
  There are, however, challenges to realizing this potential of social media. Are early user reactions typical of later reactions? How do other factors, such as the web site's user interface, the timing of posts, the number of friends, and deliberate "gaming" of the system affect users' reactions? These questions make it difficult to directly relate early user reactions to a story's appeal to the user community.  
  We addressed these challenges with models of users' behavior on Digg. We were able to develop these models thanks to the availability of large amounts of data on how stories receive votes. Digg, in fact, was one of the first social media sites to provide programmatic access to such data.  
  The figure above shows how the popularity of three stories, as measured by the number of votes (diggs), changed over time since each story's submission. The abrupt increase in the slope corresponds to promotion to the front page. Our goal was to understand such curves: what makes some stories more popular than others? How does popularity grow and why does it saturate? What role do social networks play in the evolution of popularity? Can this behavior be predicted?  
  We used a physics-based framework to model users and stories on Digg. A user who sees a story will digg it with probability related to how interesting the story is to that user. The more interesting the story, the more popular it will become. However, digging a story also depends on how easily users can find it. This factor, which we call visibility, depends on Digg's user interface. Our model tracks how visibility changes over time. A story starts in the Upcoming Stories queue, where visibility decreases rapidly as users submit subsequent stories. After promotion to the Front Page, visibility skyrockets and then decreases as additional stories are promoted. Social links also affect visibility: each new digg makes the story visible to that digger's fans.  
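The dynamics described above can be sketched in code. The following is a minimal toy simulation, not the calibrated model from the papers: parameter values (visit rate, decay rates, promotion time, front-page boost) are illustrative assumptions chosen only to reproduce the qualitative shape of the vote curves, including the abrupt increase in slope at promotion.

```python
import math

def simulate_votes(interestingness, hours, promote_at=5.0,
                   visit_rate=1000.0, decay_upcoming=2.0, decay_front=0.1):
    """Toy sketch of the vote-growth dynamics described above.

    All parameter values are illustrative, not calibrated Digg values.
    Visibility decays quickly in the Upcoming Stories queue, jumps at
    promotion, then decays slowly on the Front Page as newer stories
    are promoted.
    """
    votes, dt, t = 0.0, 0.01, 0.0
    history = []
    while t < hours:
        if t < promote_at:
            # buried rapidly by subsequent submissions
            visibility = math.exp(-decay_upcoming * t)
        else:
            # front-page boost, then slow decay
            visibility = 50.0 * math.exp(-decay_front * (t - promote_at))
        votes += visit_rate * visibility * interestingness * dt
        history.append((t, votes))
        t += dt
    return history
```

Plotting the returned history against time reproduces the characteristic "hockey stick" shape: votes accumulate slowly in the Upcoming queue, then surge after promotion.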
  Though the model is simple, the hard part was calibrating it from the available data, i.e., figuring out how often stories are submitted and promoted, how persistently users explore Upcoming and Front Page stories, how often users visit Digg, how their activity varies during a day, and so on. For individual stories, we determined interestingness by having the model match, as closely as possible, the observed growth of the number of diggs that story received. Hence, one novel application of our modeling framework is estimating how interesting stories are. Our separation of interestingness from visibility provides a different, perhaps truer, measure of a story's quality than its number of diggs.  
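The per-story fitting step can be illustrated with a simple least-squares estimate. This is a hedged sketch, not the calibration procedure from the papers: it assumes the model's predicted votes scale linearly with a single interestingness parameter times a cumulative "exposure" series (a hypothetical stand-in for the model's visibility-weighted view counts), for which the best fit has a closed form.

```python
def estimate_interestingness(observed, exposure):
    """Least-squares fit of a single interestingness parameter p such
    that predicted votes p * exposure[t] best match observed[t].

    `exposure` stands in for the cumulative visibility-weighted views
    the calibrated model assigns the story by each time step; both
    inputs here are illustrative, not real Digg data.
    """
    num = sum(o * e for o, e in zip(observed, exposure))
    den = sum(e * e for e in exposure)
    return num / den
```

In this linear setting the estimate is exact when the data follow the model; with noisy vote counts it is the ordinary least-squares solution.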
  A second application of modeling is prediction. This uses the votes a story receives up to a certain time to estimate its interestingness. We then use the model to predict the subsequent votes the story will receive, both in total and from different groups of users, e.g., to distinguish stories of general interest from those appealing mainly to the submitter's friends. As an example, the figure above shows predictions of story popularity (black line) among three groups of users: the submitter's fans, users who are fans of other diggers but not of the submitter, and users who are not fans of any previous diggers. The x-axis is time, in hours, since submission. The prediction in this example is made at promotion time (vertical dashed line). The model predicts actual votes (dots) fairly well. The model also provides confidence intervals for the predictions, indicated by the shaded areas in the figure, which estimate how well the model predicts. The predictions could be updated as new votes arrive, thereby continually giving both short and long-term forecasts for the story's subsequent votes.  
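The prediction step above can be sketched in the same spirit. The function below is an illustrative simplification, not the published method: it estimates interestingness from early votes and exposure, forecasts the expected additional votes for a hypothetical future exposure, and attaches a rough 95% confidence band by treating vote arrivals as Poisson. All inputs are assumed stand-ins for calibrated model quantities.

```python
import math

def predict_future_votes(early_votes, early_exposure, future_exposure):
    """Estimate per-view interestingness from early votes, then
    forecast later votes with a rough Poisson confidence interval.
    Exposure values are hypothetical model outputs, not Digg data."""
    p = early_votes / early_exposure          # estimated interestingness
    mean = p * future_exposure                # expected additional votes
    half_width = 2.0 * math.sqrt(mean)        # ~95% Poisson interval
    return mean, (max(0.0, mean - half_width), mean + half_width)
```

Re-running this as new votes arrive mirrors the continual updating of short- and long-term forecasts described above.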
  More generally, this modeling approach disentangles the various factors contributing to user behavior in social media. Among other things, it explained why top users submitted a disproportionate number of Front Page stories, a matter of some controversy on Digg. Rather than collusion or manipulation, their success could simply be explained by the higher visibility of their stories due to the larger numbers of fans these users had. However, we found that a story that received many of its initial diggs from the submitter's fans was less likely to go viral than a story that was spreading initially among non-fans. An interesting challenge for this approach in the future is the increasing personalization of the web. This could ultimately lead to each user being a special case and make it harder to generalize from a few early reactions to the broad user community. This would diminish the usefulness of crowd-sourcing unless more sophisticated models can account for numerous factors to create highly-personalized recommendations for each user. Thus our modeling ideas could help in developing next-generation social computing platforms.  

  Publications  
  Hogg, T. and Lerman, K., 2012. Social Dynamics of Digg. EPJ Data Science 1(5).
  Hodas, N. and Lerman, K., 2012. How Limited Visibility and Divided Attention Constrain Social Contagion. In ASE/IEEE International Conference on Social Computing.
  Ver Steeg, G., Ghosh, R. and Lerman, K., 2011. What Stops Social Epidemics? In Proceedings of the 5th International Conference on Weblogs and Social Media.
  Lerman, K. and Hogg, T., 2010. Using a Model of Social Dynamics to Predict Popularity of News. In Proceedings of the World Wide Web Conference, Raleigh, NC.
  Hogg, T. and Lerman, K., 2009. Stochastic Models of User-Contributory Web Sites. In International Conference on Weblogs and Social Media, San Jose, CA.
  Lerman, K., 2007. Social Information Processing in Social News Aggregation. Extended version of the paper submitted to the IEEE Internet Computing special issue on Social Search.
  Lerman, K., 2007. Dynamics of Collaborative Document Rating Systems. In Proceedings of the KDD Workshop on Social Network Analysis.
  Project Staff  
  Kristina Lerman, USC
Tad Hogg, HP Labs

  This research was generously supported by the National Science Foundation under Grant Nos. IIS-0968370 and IIS-0535182.