Flickr personal taxonomies

In addition to allowing users to organize content by tagging it with descriptive labels, several social media sites also allow users to organize content hierarchically within personal taxonomies. Delicious, for example, lets users group related tags into bundles. Flickr lets users group related photos into sets and related sets into collections.


The figure below shows some of the collections created by a Flickr user. These collections reflect her interest in the natural world.


The Plant Pests collection is composed of several sets: Plant Parasites, Sap Suckers, Plant Eaters, and Caterpillars. The Mushrooms & Fungi collection is composed of the sets Mushrooms, Fungi, Puffballs & Shelf fungi, and Molds and Rusts.


Individual sets contain images. The figure below shows an image in the Caterpillars set and an image in the Mushrooms set, along with the tags the user assigned to them.


We view a user's collections and sets as defining a personal taxonomy, which we call a sapling. Note that collections can also be grouped within other collections. We represent saplings as shallow trees, and we split composite child names into their component names. Below are the saplings corresponding to the examples above.
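As a rough illustration of the sapling representation, the sketch below builds a depth-1 tree for the Mushrooms & Fungi example and splits composite child names. The splitting rule (break on "&" or the word "and") and the helper names `split_composite` and `make_sapling` are assumptions made for this example; the exact rule used to build the data set is not stated here.

```python
import re

# Assumed splitting rule: break a composite name on "&" or the word "and".
# The exact rule used for the data set is not specified on this page.
_SEPARATORS = re.compile(r"\s*&\s*|\s+and\s+", flags=re.IGNORECASE)


def split_composite(name):
    """Split a composite set name such as 'Molds and Rusts' into its parts."""
    return [part.strip() for part in _SEPARATORS.split(name) if part.strip()]


def make_sapling(collection, set_names):
    """Return a depth-1 tree: the collection as root, split set names as leaves."""
    leaves = []
    for name in set_names:
        for part in split_composite(name):
            if part not in leaves:  # keep the first occurrence, preserve order
                leaves.append(part)
    return {collection: leaves}


sapling = make_sapling(
    "Mushrooms & Fungi",
    ["Mushrooms", "Fungi", "Puffballs & Shelf fungi", "Molds and Rusts"],
)
print(sapling)
```

Only child names are split in this sketch; the root keeps the collection name as the user wrote it.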


Table anonym_folder contains saplings from 7,121 Flickr users who belong to wildlife and nature photography public groups. This data has been anonymized, with user and folder ids replaced with unique integers. The hierarchical relations within a sapling are encoded as parent-->child relations, where a parent is a collection, and a child is a constituent set or another collection.
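
The parent-->child encoding can be turned back into per-user trees with a few lines of code. The following is a minimal sketch, assuming each row has already been reduced to a (user id, parent id, child id) triple; that column layout, the toy ids, and the function names are hypothetical, so consult the table schema for the real columns.

```python
from collections import defaultdict

# Hypothetical rows: (user_id, parent_id, child_id). The real column names and
# order come from the anonym_folder schema and may differ.
rows = [
    (1, 10, 11),
    (1, 10, 12),
    (1, 12, 13),  # a collection nested inside another collection
    (2, 20, 21),
]


def build_saplings(rows):
    """Group parent-->child edges by user into {user: {parent: [children]}}."""
    saplings = defaultdict(lambda: defaultdict(list))
    for user_id, parent_id, child_id in rows:
        saplings[user_id][parent_id].append(child_id)
    return {user: dict(edges) for user, edges in saplings.items()}


def roots(edges):
    """Return the parents that never appear as a child of another node."""
    children = {child for kids in edges.values() for child in kids}
    return sorted(parent for parent in edges if parent not in children)


saplings = build_saplings(rows)
for user, edges in saplings.items():
    print(user, roots(edges), edges)
```

Because a user may have several top-level collections, `roots` can return more than one id per user.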

Schema of the anonym_folder table


Table anonym_tag contains 7,656,031 tags created by users to annotate images within the sets. Only 5,620 users tagged images within sets, and 63,997 sets contained tags. Tags are propagated from sets to their parent collections.
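
The propagation of tags from sets to parent collections can be sketched as follows. This is an illustration on made-up toy data, not the procedure used to build the table: it assumes that each node has at most one parent, that propagation adds tag counts, and that it continues up through every ancestor collection. The names `parent_of`, `set_tags`, and `propagate_tags` are hypothetical.

```python
from collections import Counter

# Hypothetical toy data; the ids and tags are made up for illustration.
parent_of = {11: 10, 12: 10, 10: 1}  # child id -> parent id
set_tags = {
    11: Counter({"caterpillar": 3, "macro": 1}),
    12: Counter({"aphid": 2, "macro": 2}),
}


def propagate_tags(set_tags, parent_of):
    """Add each set's tag counts to every ancestor collection."""
    totals = {node: Counter(tags) for node, tags in set_tags.items()}
    for node, tags in set_tags.items():
        seen = {node}
        parent = parent_of.get(node)
        while parent is not None and parent not in seen:  # guard against cycles
            totals.setdefault(parent, Counter()).update(tags)
            seen.add(parent)
            parent = parent_of.get(parent)
    return totals


totals = propagate_tags(set_tags, parent_of)
print(totals[10])
```
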
Schema of the anonym_tag table

This data set was used in: Plangprasopchok, A.; Lerman, K.; and Getoor, L. 2010. Growing a Tree in the Forest: Constructing Folksonomies by Integrating Structured Metadata. In Proceedings of the International Conference on Knowledge Discovery and Data Mining (KDD), July.

The data is distributed as password-protected zipped CSV files (password: flickr_small).
This data is made available to the community for research purposes only. If you use the data in a publication, please cite the above paper.
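
One possible way to read the password-protected archives without unpacking them to disk is sketched below. The archive file name in the usage comment and the UTF-8 encoding are assumptions. Python's standard `zipfile` module can decrypt only the legacy ZIP encryption scheme, so if the archives use a different scheme such as AES, an external unzip tool would be needed instead.

```python
import csv
import io
import zipfile

PASSWORD = b"flickr_small"  # password stated on this page


def iter_zipped_csv(zip_path, member=None, password=PASSWORD):
    """Yield the rows of one CSV member of a zip archive as lists of strings.

    If member is None, the first file listed in the archive is read.
    """
    with zipfile.ZipFile(zip_path) as archive:
        name = member or archive.namelist()[0]
        with archive.open(name, pwd=password) as raw:
            # UTF-8 is an assumption; adjust if the files use another encoding.
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            yield from csv.reader(text)


# Usage (hypothetical file name):
# for row in iter_zipped_csv("anonym_folder.zip"):
#     print(row)
```

Rows are yielded one at a time, which keeps memory use low for the larger tag table.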

Copyright 2011 University of Southern California Information Sciences Institute