The figure below shows some of the collections created by a Flickr user. These collections reflect her interest in the natural world.
The Plant Pests collection is composed of several sets: Plant Parasites, Sap Suckers, Plant Eaters, Caterpillars. The Mushrooms & Fungi collection is composed of sets: Mushrooms, Fungi, Puffballs & Shelf fungi, Molds and Rusts.
Individual sets contain images. The figure below shows an image from the Caterpillars set and an image from the Mushrooms set, along with the tags the user assigned to them.
We view collections and sets as defining a personal taxonomy created by a user, which we call a sapling. (Note that collections can also be grouped together within other collections.) We represent saplings as shallow trees, and we split composite child names into separate children. Below are the saplings corresponding to the examples above.
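As a rough illustration of the sapling representation, the sketch below builds a shallow tree from a collection name and a few of its set names, splitting composite child names on the way. The function names and the choice of "&" as the only separator are assumptions made for this example; the exact splitting rule used to build the data set is not described here.

```python
import re

def split_composite(name):
    """Split a composite name on '&' (assumed separator), e.g.
    'Puffballs & Shelf fungi' -> ['Puffballs', 'Shelf fungi']."""
    return [part.strip() for part in re.split(r"\s*&\s*", name) if part.strip()]

def build_sapling(collection, set_names):
    """Return a shallow tree as {collection: [children]}.

    Only child names are split; the collection (root) name is kept as is.
    """
    children = []
    for name in set_names:
        for part in split_composite(name):
            if part not in children:  # avoid duplicate children
                children.append(part)
    return {collection: children}

# Illustrative input: a subset of the set names mentioned in the text.
sapling = build_sapling(
    "Mushrooms & Fungi",
    ["Mushrooms", "Fungi", "Puffballs & Shelf fungi"],
)
print(sapling)
```

A dictionary of lists is enough here because saplings are shallow; deeper nesting (collections within collections) would need one entry per parent.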
The table anonym_folder contains saplings from 7,121 Flickr users who belong to public wildlife and nature photography groups. The data has been anonymized, with user and folder ids replaced by unique integers. The hierarchical relations within a sapling are encoded as parent-->child relations, where a parent is a collection and a child is a constituent set or another collection.
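The parent-->child encoding can be turned back into per-user trees with a few lines of code. The sketch below is only an illustration: the column names user_id, parent_id and child_id and the sample rows are hypothetical, so check the actual header of the anonym_folder file before using it.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows; the real column names and ids may differ.
SAMPLE = """user_id,parent_id,child_id
1,10,11
1,10,12
1,12,13
2,20,21
"""

def load_saplings(fileobj):
    """Group parent-->child rows into {user_id: {parent_id: [child_id, ...]}}."""
    saplings = defaultdict(lambda: defaultdict(list))
    for row in csv.DictReader(fileobj):
        user = int(row["user_id"])
        saplings[user][int(row["parent_id"])].append(int(row["child_id"]))
    return saplings

def roots(children_of):
    """Return the parents that never appear as a child (top-level collections)."""
    all_children = {c for kids in children_of.values() for c in kids}
    return sorted(p for p in children_of if p not in all_children)

saplings = load_saplings(io.StringIO(SAMPLE))
for user, tree in saplings.items():
    print(user, roots(tree), dict(tree))
```

In the sample, folder 12 is both a child of 10 and a parent of 13, which corresponds to a collection grouped within another collection.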
This data set was used in: Plangprasopchok, A.; Lerman, K.; and Getoor, L. 2010. Growing a Tree in the Forest: Constructing Folksonomies by Integrating Structured Metadata. In Proceedings of the International Conference on Knowledge Discovery and Data Mining (KDD), July.
The data is distributed as zipped CSV files that are password protected. The password is: flickr_small
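One possible way to read a protected archive directly from Python is sketched below. The helper name read_rows is made up for this example, and the archive path is whatever name the downloaded file has. Python's standard zipfile module can only decrypt the legacy ZIP encryption scheme, so if the archives use a different scheme (for example AES), an external tool such as 7-Zip would be needed instead.

```python
import csv
import io
import zipfile

PASSWORD = b"flickr_small"  # password given above, as bytes

def read_rows(zip_path, member=None, password=PASSWORD):
    """Read one CSV member of a zip archive into a list of rows.

    If `member` is None, the first file in the archive is used.
    The UTF-8 encoding is an assumption about the files.
    """
    with zipfile.ZipFile(zip_path) as zf:
        name = member or zf.namelist()[0]
        with zf.open(name, pwd=password) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            return list(csv.reader(text))
```

Calling read_rows with the path of a downloaded archive returns its rows as lists of strings, with the header (if any) as the first row.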
This data is made available to the community for research purposes only. If you use the data in a publication,
please cite the above paper.