Harvesting Concept Hierarchies from Social Data


The Social Web is changing the way people create and use information. Unlike traditional Web sites, Flickr, Del.icio.us, Digg, and many others enable users to publish content, organize it, and participate in communities. The information users create while interacting with content and with other users is called social metadata. Tags are one example of social metadata. Tagging was introduced as a means for individual users to organize their own content by assigning freely chosen keywords to it. In addition to flat tags, some Social Web sites now allow users to organize content hierarchically. The social photosharing site Flickr, for example, allows users to group related photos in sets, and related sets in collections, while the social bookmarking site Del.icio.us lets users group related tags in bundles. Although the sites themselves do not impose constraints on how the hierarchies are created and used, individuals generally use them to organize complex content efficiently and to represent intuitive relations between concepts.

Although social metadata lacks formal structure, it captures the collective knowledge of the community. Once extracted from the traces left by many users, such collective knowledge will add a rich semantic layer to the content of the Social Web. The next generation of information discovery, search, data management, visualization and personalization tools will rely on this semantic layer. This project will develop a probabilistic framework to combine diverse types of social metadata in the form of tags and hierarchical relations to construct a global concept hierarchy. In addition, the methods developed by the project will use social relations, in the form of community participation, to discover community-specific vocabulary and concepts, and identify facets of multi-dimensional concepts.

In the future, Social Web sites and data management tools will allow users to express ever richer types of knowledge, including complex predicates and semantic relations. The ability to aggregate individually expressed knowledge into a unified whole will transform the way people use information. Global concept hierarchies, for instance, will help users visualize how their content relates to that of others and allow for more efficient browsing, search and discovery. By linking content to a common concept hierarchy, the methods developed by the project could also be used to integrate disparate data and align it across domains. The proposed work, therefore, addresses one of the important emerging questions in AI research, namely, how to harness the power of collective intelligence.

Below are examples of graphs of concepts related to the main concept (in yellow) extracted from the collection-set relations created by ~30,000 Flickr users.
[Figures: concept graphs for "invertebrate" and "country"]
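
To make the idea of aggregating collection-set relations concrete, here is a minimal sketch in Python. It is not the project's probabilistic framework: it assumes a much simpler voting scheme in which a parent-child relation is kept only if enough distinct users assert it and more users assert it than assert its reverse. The function name aggregate_relations, the min_users threshold, and the toy data are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def aggregate_relations(user_relations, min_users=2):
    """Merge per-user (parent, child) relations into one concept graph.

    user_relations maps a user id to an iterable of (parent, child) pairs,
    e.g. (collection name, set name). This is an assumed input format.
    Returns a dict mapping each parent concept to a sorted list of children.
    """
    # For every directed relation, record which distinct users asserted it.
    supporters = defaultdict(set)
    for user, relations in user_relations.items():
        for parent, child in relations:
            # Crude normalization so "Insect" and "insect" count as one concept.
            parent, child = parent.strip().lower(), child.strip().lower()
            if parent != child:
                supporters[(parent, child)].add(user)

    graph = defaultdict(list)
    for (parent, child), users in supporters.items():
        votes = len(users)
        reverse_votes = len(supporters.get((child, parent), ()))
        # Keep a relation only if enough users agree on it and it
        # outvotes the opposite direction (simple conflict resolution).
        if votes >= min_users and votes > reverse_votes:
            graph[parent].append(child)
    return {parent: sorted(children) for parent, children in graph.items()}


# Made-up toy data: four users, each with a few collection-set relations.
toy = {
    "u1": [("Invertebrate", "insect"), ("insect", "butterfly")],
    "u2": [("invertebrate", "insect"), ("butterfly", "insect")],
    "u3": [("invertebrate", "insect"), ("insect", "butterfly")],
    "u4": [("invertebrate", "spider")],
}
print(aggregate_relations(toy))
# {'invertebrate': ['insect'], 'insect': ['butterfly']}
```

In this toy run, the relation asserted by a single user (invertebrate to spider) is dropped, and the conflicting butterfly/insect relation is resolved in favor of the direction with more supporters. A probabilistic approach could replace the hard threshold with a model of how likely each relation is given all the evidence.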

Papers

Plangprasopchok, A. and Lerman, K., 2009. Constructing Folksonomies from User-Specified Relations on Flickr, in Proc. International World Wide Web Conference (WWW09), Madrid, Spain.

Presentations

Constructing Folksonomies from User-Specified Relations on Flickr, presented at WWW09.

Project Staff

Kristina Lerman, USC
Lise Getoor, UMD
Anon Plangprasopchok, USC

Acknowledgements

This work is sponsored in part by the National Science Foundation under Award IIS-0812677.


Copyright: USC Information Sciences Institute 2008,2009
  Updated: 07/2009