University of Southern California


Kristina Lerman
  Harvesting Concept Hierarchies from Social Data  
  The Social Web has revolutionized the production of knowledge. On sites like Wikipedia, Twitter, YouTube, Flickr, and others, users generate and publish content, annotate it with descriptive keywords, and interact with others. Although the social metadata contained in these descriptive keywords lacks formal structure, it captures the collective "folk" knowledge of the community. In NSF-funded work, we have developed computational methods to extract such "folk" knowledge from the traces of online activity of many users. Our methods merge the personal fragments of knowledge that users of the social photo-sharing site Flickr create to organize their own photos into a common, deeper taxonomy of concepts. Once mined from data created by many users, such knowledge will add a rich semantic layer to user-generated content on the Social Web. It will also help people visualize how their content relates to that of others and allow for more efficient browsing and knowledge discovery.  
  We analyzed social metadata from the photo-sharing site Flickr. This site allows users to upload photos, tag them with descriptive labels, and also organize them within personal directories. Although the site itself does not impose constraints on how these directories are created and used, individuals generally employ them to represent intuitive relations between concepts, for example, creating a folder "people" with sub-folders "family" and "friends".  
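  As an illustration of the kind of data involved, a personal directory can be thought of as a small tree of folder names. The sketch below is only an assumed representation (a nested mapping and a helper named `relations`, both hypothetical and not taken from the project's code); it flattens one directory into (parent, child) pairs using the "people" example from the text.

```python
from typing import Dict, List, Tuple

# Assumed representation: folder name -> mapping of its sub-folders.
Directory = Dict[str, "Directory"]


def relations(directory: Directory) -> List[Tuple[str, str]]:
    """Return every (parent, child) pair found in a nested directory."""
    pairs: List[Tuple[str, str]] = []
    for parent, children in directory.items():
        # Record the direct sub-folders of this folder.
        for child in children:
            pairs.append((parent, child))
        # Then descend into the sub-folders themselves.
        pairs.extend(relations(children))
    return pairs


# The example from the text: a folder "people" with two sub-folders.
example: Directory = {"people": {"family": {}, "friends": {}}}
print(relations(example))
```

  Each pair records one relation a single user expressed between two concepts; these pairs are the small fragments that a merging method would later combine.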
  The figure above shows two personal directories created by one user: one for the places in Africa he traveled to, and the other to organize holiday photos. We developed a computational method to automatically learn taxonomies of concepts, which we call folksonomies, from thousands of such personal directories. Our method extends a powerful distributed inference algorithm called Affinity Propagation to concurrently combine many small structures (user-generated directories) into a larger, more comprehensive structure (a communal folksonomy). We showed that our method allows us to learn accurate and complete folksonomies.  
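  To make the idea of combining many small structures concrete, here is a deliberately simplified sketch. It is not the Affinity Propagation extension described above; it substitutes plain voting, where each user contributes (parent, child) pairs and a relation is kept only if enough users agree. The function name `merge_directories`, the `min_support` threshold, and the sample data are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def merge_directories(user_relations, min_support=2):
    """Merge per-user (parent, child) pairs into one tree by simple voting.

    user_relations: one collection of (parent, child) pairs per user.
    min_support: assumed threshold on how many users must agree.
    """
    votes = Counter()
    for pairs in user_relations:
        # set() so that each user votes at most once per relation.
        votes.update(set(pairs))

    parent_of = {}
    # Consider the most widely supported relations first; ties break alphabetically.
    ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    for (parent, child), count in ranked:
        if count < min_support or child in parent_of:
            continue
        # Walk up from the proposed parent; skip the edge if it would close a cycle.
        node = parent
        while node in parent_of and node != child:
            node = parent_of[node]
        if node != child:
            parent_of[child] = parent

    tree = defaultdict(list)
    for child, parent in parent_of.items():
        tree[parent].append(child)
    return {parent: sorted(children) for parent, children in tree.items()}


# Made-up directories from three hypothetical users.
users = [
    [("africa", "south africa"), ("south africa", "cape town")],
    [("africa", "south africa"), ("south africa", "lions")],
    [("south africa", "cape town"), ("south africa", "lions"),
     ("south africa", "soweto")],
]
print(merge_directories(users))
```

  With the default threshold, the "soweto" relation is dropped because only one hypothetical user expressed it; lowering `min_support` to 1 keeps it. A voting scheme like this ignores the similarity between differently named concepts, which is part of what a real inference method has to handle.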
  The figure below shows one such folksonomy, of places related to Africa, which was automatically learned by our method. This folksonomy is more complete than that specified by any individual user, and it includes knowledge that may not be found in a knowledge base created by experts. We learn, for example, that people see places like "South Africa" both as a place that contains other places, such as "Cape Town" and "Soweto", and as a destination for observing "Rhinos", "Lions" and "Antelopes".  
  We have also developed computational methods to leverage the diversity of expertise among users to learn more comprehensive and accurate folksonomies, and novel approaches to validate learned folksonomies. Our research shows the potential to unleash structured knowledge from the massive amount of user-generated content.  


  Project Staff  
  Kristina Lerman, USC
  Lise Getoor, UMD
  Anon Plangprasopchok, USC
  This research is sponsored in part by the National Science Foundation under Award IIS-0812677