Digg This: ISI Researcher Forecasts Social Network Behavior

August 5, 2010

Computer scientist Kristina Lerman of the Information Sciences Institute recently took a look at Digg and found that watching the behavior of a relatively small number of superusers foretold the fate of newly posted stories.

Digg is a news aggregation website that posts 25,000 new stories every day. Lerman and Tad Hogg of the Institute for Molecular Manufacturing tried to predict which stories would become popular by analyzing postings in the site's "upcoming" list, a queue where stories are held while waiting to be "promoted" to the site's main pages.

The pair presented their results in a paper entitled "The Social Dynamics of Digg" at the 4th International Conference on Weblogs and Social Media, held recently in Washington, D.C.

Lerman is a Project Leader at the Information Sciences Institute and holds a joint appointment as a Research Assistant Professor in the USC Viterbi School of Engineering's Computer Science Department. To study the behavior of web users, she and Hogg used mathematical operators similar to the ones biologists use to describe the collective behavior of social insects.

What they found was that not all users on Digg are equally influential in promoting a story. "The top 30 users, the so-called 'super users,' were responsible for the vast majority of the stories posted to the front page of Digg," said Lerman.

Such "super users" are linked to hundreds or even thousands of other users, so when they make a recommendation, their linked users can in turn promote the story by voting for it until it ends up on one of the main pages on Digg.

Lerman and Hogg hypothesized that by observing the early reactions to a story shortly after it was posted on the site, they could predict how fast a particular news item would be promoted to the main Digg front page.

"We can then use this 'crowd sourcing' to predict whether the posted news item will go viral," says Lerman.

The researchers also found an interesting quirk in the influence of these "super users" on the Digg site.

A key finding of her work is that the popularity of a particular item posted on a site like Digg depends less on the content of the posting than on the links of the person doing the posting. Lerman decoded the "friendship" links on sites like Digg to figure out who the superusers are.

Then, by following the postings and the Digg network's reaction to the initial appearance of an item, a prediction model can anticipate how popular the item will eventually be on the site.
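
As a rough illustration of the kind of early signal such a model might watch, the Python sketch below counts the votes a story receives shortly after posting and how many of them come from users linked to the submitter. It is not the researchers' model. The names Vote and early_vote_features, the 60-minute window, and the layout of the fans data are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Vote:
    voter: str                 # hypothetical user name
    minutes_after_post: float  # time of the vote, measured from submission


def early_vote_features(submitter, votes, fans, window_minutes=60.0):
    """Summarize the early reaction to one story.

    fans maps a user to the set of users linked to them (assumed layout).
    The window length is an arbitrary choice for illustration.
    """
    # Keep only the votes cast inside the observation window.
    early = [v for v in votes if v.minutes_after_post <= window_minutes]
    submitter_fans = fans.get(submitter, set())
    # Count early votes cast by users linked to the submitter.
    from_fans = sum(1 for v in early if v.voter in submitter_fans)
    total = len(early)
    return {
        "early_votes": total,
        "votes_from_fans": from_fans,
        "fan_fraction": from_fans / total if total else 0.0,
    }


# Made-up data: alice submits a story and four users vote on it.
fans = {"alice": {"bob", "carol"}}
votes = [
    Vote("bob", 5),
    Vote("dave", 12),
    Vote("carol", 30),
    Vote("erin", 95),  # falls outside the 60-minute window
]
features = early_vote_features("alice", votes, fans)
print(features)
```

Numbers like these could serve as inputs to a prediction model. How they relate to a story's eventual popularity is what the researchers' analysis addresses, and the sketch makes no claim about it.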

What use would such a model be to people who want to see, soon after posting on the web, how their message is being received and passed on to other users?

"Marketing could know ahead of time: is my campaign working or not? The political people could be spreading messages and asking: Are my messages working or not?"

"The new social media sites offer a glimpse into the future of the web," she continued, "where rather than passively consuming information, users will actively participate in creating, evaluating, and disseminating information."

One of the goals of the work is to apply what is learned from sites like Digg, and their use of "crowd sourcing," to understanding other social networking patterns.

"Social media sites, such as Digg, show that it is possible to exploit the activities of others to solve hard information processing problems. We expect progress in this field to continue to bring novel solutions to problems in information processing, personalization, search and discovery," says Lerman.

While superusers can help move stories to Digg's front page, they have relatively little influence in getting a news item to go "viral," that is, to become ubiquitous on numerous networks, not just Digg.

"When we looked at the viral stories," says Lerman, "all we can say is that the content (of these viral stories) is unpredictable."

She is trying to change this situation. "We are using these mathematical models to understand how a group of web users reacts to a story, and then use this information to predict whether the story will go viral."