USC Information Sciences Institute
I am co-organizing the AAAI Social Information Processing Symposium, March 26-28, 2008, at Stanford University.
The labels 'social media' and 'Web 2.0' have been attached to a quickly growing number of Web sites, such as blogs, wikis, Flickr, and Del.icio.us, whose content is primarily user-driven. In the process of using social media sites, users create content and add metadata in the form of: (1) tags: content annotations using free-form keywords, as well as more complex annotations; (2) ratings: passive or active evaluation of content; and (3) social networks: where users designate others as friends in order to track their activities. The connections between content, users, and metadata create layers of rich interlinked data that will revolutionize information processing. New applications will include personalized information discovery; applications that exploit the 'wisdom of crowds,' for example, emergent semantics and collaborative information evaluation; deeper analysis of community structure to identify trends and experts; and many others.
Social media facilitate new ways of interacting with information - what I call social information processing. Social information processing allows users to collaborate implicitly by leveraging the opinions and knowledge generated by others. Beyond collaborative problem solving, social information processing may lead to wholly new kinds of knowledge that emerge from the distributed activities of many users.
My recent work has studied the question of how the collective metadata added by different users can be used to solve a range of information processing problems.
Although social metadata lacks formal structure, it captures the collective knowledge of the community. The Harvest project is developing methods to extract collective knowledge from the traces left by many users in order to add a rich semantic layer to the content of the Social Web. We are developing methods to synthesize a common hierarchy for organizing content from shallow hierarchies created by over 30,000 users on the social photo-sharing site Flickr. Such global concept hierarchies, for instance, can help users visualize how their content relates to that of others and allow for more efficient browsing, search and discovery. By linking content to a common concept hierarchy, the methods developed by the Harvest project could also be used to integrate disparate data and align it across domains.
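One way to picture this synthesis step is as vote aggregation over the parent-child tag relations asserted by individual users. The sketch below is a deliberately minimal illustration, not the Harvest project's actual algorithm: the function name, the `min_support` parameter, and the majority-direction rule are all assumptions made for the example.

```python
from collections import Counter

def merge_shallow_hierarchies(user_relations, min_support=2):
    """Aggregate per-user (parent, child) tag relations into one hierarchy.

    user_relations: a list of sets of (parent, child) pairs, one set per user.
    A relation is kept if at least min_support users assert it and more
    users assert it in that direction than in the reverse direction.
    """
    votes = Counter()
    for relations in user_relations:
        for parent, child in relations:
            votes[(parent, child)] += 1

    merged = {}
    for (parent, child), n in votes.items():
        if n >= min_support and n > votes[(child, parent)]:
            merged.setdefault(parent, set()).add(child)
    return merged

# three users' shallow hierarchies over photo tags
users = [
    {("animal", "dog"), ("animal", "cat")},
    {("animal", "dog"), ("dog", "puppy")},
    {("animal", "dog"), ("animal", "cat"), ("dog", "puppy")},
]
print(merge_shallow_hierarchies(users))
# -> {'animal': {'dog', 'cat'}, 'dog': {'puppy'}}
```

The support threshold plays the role of noise filtering: idiosyncratic relations asserted by a single user do not make it into the common hierarchy.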
Heterogeneous networks play a key role in the evolution of communities and the decisions individuals make. On the Social Web these networks link different types of entities, for example, people and the content they create and use. Network analysis algorithms usually project such networks onto simple graphs composed of entities of a single type. In the process, they conflate relations between entities of different types and lose important structural information. We developed a mathematical framework that can be used to compactly represent and analyze heterogeneous networks that combine multiple entity and link types. We generalized Bonacich centrality, which measures connectivity between nodes by the number of paths between them, to heterogeneous networks and used this measure to study network structure. Specifically, we extended the popular modularity-maximization method for community detection to use this centrality metric. We also ranked nodes based on their connectivity to other nodes. One advantage of this centrality metric is that it has a tunable parameter we can use to set the length scale of interactions. Studying how rankings change with this parameter allows us to identify important nodes in the network.
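For a single-type graph, Bonacich centrality counts paths of all lengths between nodes, attenuating a path of length k by a factor beta**(k-1); the tunable parameter beta sets the length scale of interactions. A minimal sketch on a toy homogeneous graph (the graph, the beta values, and the ranking-by-row-sum choice are all assumptions made for illustration, not the paper's heterogeneous formulation):

```python
import numpy as np

def bonacich_centrality(A, beta):
    """Bonacich centrality from adjacency matrix A.

    C = A + beta*A^2 + beta^2*A^3 + ... = A @ inv(I - beta*A),
    which converges when beta < 1 / (largest eigenvalue of A).
    Returns each node's total attenuated connectivity to all others.
    """
    n = A.shape[0]
    C = A @ np.linalg.inv(np.eye(n) - beta * A)
    return C.sum(axis=1)

# toy undirected graph: a triangle (0, 1, 2) with a pendant node 3 attached to 0
A = np.array([[0, 1, 1, 1],
              [1, 0, 1, 0],
              [1, 1, 0, 0],
              [1, 0, 0, 0]], dtype=float)

for beta in (0.05, 0.3):  # small beta: near-local (degree-like); larger beta: long-range paths
    scores = bonacich_centrality(A, beta)
    print(beta, np.argsort(-scores))
```

Sweeping beta and watching how the ranking shifts is the kind of analysis the tunable length scale enables: nodes that matter only locally fall in rank as longer paths are weighted more heavily.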
Social Browsing and Information Filtering
Collaborative Filtering (CF) is a popular AI technology for discovery and recommendation, used by commercial giants like Amazon and Netflix. It identifies users with similar opinions, as expressed through the ratings they give to products. However, its performance leaves something to be desired, leading Netflix to offer a $1M prize for improvements to its CF-based movie recommendation system. Social media sites, on the other hand, allow users to create networks of friends and easily track friends' activities. In the social media sites we studied, Flickr, Delicious, and Digg, users generally take advantage of this feature, creating personal networks of tens to hundreds (if not thousands) of friends. We believe that social filtering --- using social networks for recommendation --- is an effective alternative to CF. Our recent studies of Digg and Flickr show that users do take advantage of social recommendation.
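The contrast with CF can be made concrete: instead of first computing user-user similarity from ratings, social filtering recommends items directly from the explicitly declared friend network. The sketch below is a minimal illustration of that idea; the function name, data layout, and ranking-by-friend-count rule are assumptions for the example, not a description of any site's actual algorithm.

```python
from collections import Counter

def social_filter(friends, likes, user, top_k=3):
    """Recommend items the user's friends liked, ranked by how many
    friends liked each one; items the user already liked are excluded."""
    counts = Counter()
    for friend in friends.get(user, ()):
        for item in likes.get(friend, ()):
            if item not in likes.get(user, ()):
                counts[item] += 1
    return [item for item, _ in counts.most_common(top_k)]

friends = {"alice": {"bob", "carol"}}
likes = {"alice": {"story1"},
         "bob": {"story1", "story2", "story3"},
         "carol": {"story2"}}
print(social_filter(friends, likes, "alice"))
# -> ['story2', 'story3']   (story2 was liked by two friends)
```

Note that no similarity computation is needed: the friend links, which users create voluntarily, stand in for the learned similarity matrix of CF.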
The social news aggregator Digg allows users to submit links to, vote on, and discuss news stories. Each day Digg selects a few of the most interesting stories to feature on its front page. Rather than rely on the opinion of a few editors, Digg aggregates the opinions of thousands of its users to decide which stories to promote to the front page. The social networks on Digg act as a social filtering system, recommending to users the stories their friends liked or found interesting. By tracking the votes received by newly submitted stories over time, we showed that this social recommendation system is very effective for information filtering. Specifically, we showed that (i) users tend to like stories submitted by friends and (ii) users tend to like stories their friends like.
Dynamics of Collective Decision-making
It has been noted that the aggregate decision of a large number of (uninformed) individuals can produce a better result than a decision made by a small group of experts. This effect, dubbed “wisdom of crowds,” is employed by Digg to select the best news stories for its front page, but it can be generalized to users collectively evaluating the quality of information sources.
Designing a collaborative rating system that exploits the emergent behavior of many independent evaluators is very challenging, as it is difficult to predict the global consequences of local decisions. The choice of user interface can have a dramatic impact on user experience and system behavior. Besides running the system, or perhaps simulating it, designers have few options for evaluating the performance of different designs. Mathematical analysis can be used as a tool to explore the design space of collaborative rating systems and find the parameters that optimize a given set of metrics (story timeliness vs. interestingness, etc.) before the system is actually deployed.
We constructed a mathematical model that describes the dynamics of collective voting. The model includes terms that account for the social recommendation effect described above, as well as votes coming from users who see the story on the Newly Submitted or the Front pages. The model allows us to evaluate different methods for guiding the emergent "wisdom of crowds" effect in the desired direction; for example, making the promotion threshold a function of the size of the submitter's social network helps ensure that only highly interesting stories are promoted. We found that solutions of the model correctly predicted the change in the number of votes received by real stories on Digg.
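The structure of such a rate-equation model can be sketched in a few lines. The simulation below is a toy version in the spirit of the description above, not the published model: all rate constants, the fixed promotion threshold, and the Euler time-stepping are assumptions made for illustration.

```python
def simulate_votes(steps=200, dt=1.0,
                   r_upcoming=0.2,   # vote rate from the Newly Submitted page (assumed)
                   r_front=2.0,      # vote rate from the Front page (assumed)
                   r_social=0.05,    # per-voter rate of friends voting (assumed)
                   threshold=20):    # promotion threshold (assumed)
    """Toy rate-equation model of collective voting: before promotion,
    votes arrive from the Newly Submitted page and from friends of earlier
    voters; once the vote count crosses the threshold, the story is
    promoted to the Front page and accrues votes at a higher rate."""
    votes, history = 0.0, []
    promoted = False
    for _ in range(steps):
        if votes >= threshold:
            promoted = True
        page_rate = r_front if promoted else r_upcoming
        dvotes = (page_rate + r_social * votes) * dt  # visibility + social terms
        votes += dotes if False else dvotes_placeholder if False else dvotes
        history.append(votes)
    return history

history = simulate_votes()
```

Fitting the rate constants to observed vote trajectories, and then comparing the model's solution against held-out stories, is the kind of validation described above; the promotion threshold is the design knob the analysis lets one tune before deployment.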
Please see CiteULike for an up-to-date list of relevant publications.
Kristina Lerman (2007), Social Information Processing in Social News Aggregation, Extended version of the paper submitted to IEEE Internet Computing special issue on Social Search. (cs.CY/0703087)
Kristina Lerman (2007), Social Networks and Social Information Filtering on Digg, in Proceedings of the Int. Conf. on Weblogs and Social Media.
Stochastic Model of Social Dynamics: a stochastic model of the social news portal Digg and how it can be used to predict the popularity of news stories. Presented at the ONR workshop at IPAM (UCLA), October 14, 2009.
Social Information Processing: introduction to the AAAI Social Information Processing Symposium, held in March 2008 at Stanford University, which I co-organized.
Analysis of Social Voting Patterns on Digg: presented at the ACM SIGCOMM Workshop on Online Social Networks, August 18, 2008.
Copyright: USC Information Sciences Institute 2009