University of Southern California

Kristina Lerman
 Data Sets  
 Digg 2009 
  This anonymized data set consists of the voting records for 3,553 stories promoted to the front page over a period of a month in 2009. The voting record for each story contains the ID of each voter and the timestamp of each vote. In addition, data about the friendship links of voters was collected from Digg.  
 Download Digg 2009 data set  
 Twitter 2010 
  This data set contains information about URLs that were tweeted over a three-week period in the fall of 2010. In addition to tweets, we also collected the followee links of tweeting users, allowing us to reconstruct the follower graph of active (tweeting) users.  
 Download Twitter 2010 data set  
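 The follower-graph reconstruction mentioned above can be sketched as follows. This is only an illustration and does not reflect the actual file layout of the data set: the (user, followee) pair format, the function name build_follower_graph, and the rule that both endpoints must be active users are all assumptions.

```python
from collections import defaultdict


def build_follower_graph(followee_links, active_users):
    """Invert (user, followee) pairs into a followee -> followers map.

    Assumption: a link is kept only when both endpoints are active
    (tweeting) users; the real data set may apply a different rule.
    """
    active = set(active_users)
    followers = defaultdict(set)
    for user, followee in followee_links:
        if user in active and followee in active:
            # "user follows followee" means user is a follower of followee.
            followers[followee].add(user)
    return {u: sorted(f) for u, f in followers.items()}


# Hypothetical toy input, not taken from the data set.
links = [("alice", "bob"), ("carol", "bob"), ("bob", "alice"), ("dave", "erin")]
graph = build_follower_graph(links, ["alice", "bob", "carol", "dave"])
print(graph)
```

 In this toy input, the link from dave to erin is dropped because erin never tweeted.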
 Flickr personal taxonomies 
  This anonymized data set contains personal taxonomies constructed by 7,000+ Flickr users to organize their photos, as well as the tags they associated with the photos. Personal taxonomies are shallow hierarchies (trees) of collections, whose constituents are sets (aka photo albums) and other collections.  
 Download Flickr data set  
 Wrapper maintenance 
  Wrappers facilitate access to Web-based information sources by providing a uniform querying and data extraction capability. When a wrapper stops working due to changes in the layout of web pages, our task is to automatically reinduce the wrapper. The data sets used for experiments in our JAIR 2003 paper contain web pages downloaded from two dozen sources over a period of a year.  
 Data set  
  Social network analysis methods examine the topology of a network in order to identify its structure, for example, who the important nodes are. Centrality, however, depends on both network topology (or social links) and the dynamical processes (or flow) taking place on the network, which determine how ideas, pathogens, or influence flow along social links. Click the link below to see Matlab code for calculating random walk-based centrality (PageRank) and epidemic diffusion-based centrality (given by Bonacich's Alpha-Centrality).  
 More: Matlab code to calculate PageRank and Alpha-Centrality.  
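 To make the contrast between the two measures concrete, here is a minimal Python sketch of both. It is not the Matlab code linked above. It assumes a directed edge list, a standard damped PageRank with uniform teleportation, and the textbook alpha-centrality recurrence x = e + alpha * A^T x with e set to all ones. The function names, default parameters, and toy graph are hypothetical.

```python
def pagerank(edges, nodes, damping=0.85, iters=100):
    """Random walk-based centrality computed by power iteration."""
    n = len(nodes)
    out = {u: [] for u in nodes}
    for u, v in edges:
        out[u].append(v)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        # Rank held by nodes without out-links is spread uniformly.
        dangling = sum(rank[u] for u in nodes if not out[u])
        new = {u: (1.0 - damping) / n + damping * dangling / n for u in nodes}
        for u in nodes:
            for v in out[u]:
                # A walker at u picks one of its out-links at random.
                new[v] += damping * rank[u] / len(out[u])
        rank = new
    return rank


def alpha_centrality(edges, nodes, alpha=0.1, iters=100):
    """Diffusion-based centrality: iterate x = 1 + alpha * A^T x.

    The iteration converges only when alpha is smaller than the inverse
    of the largest eigenvalue of the adjacency matrix.
    """
    score = {u: 1.0 for u in nodes}
    for _ in range(iters):
        new = {u: 1.0 for u in nodes}
        for u, v in edges:
            # Unlike a random walk, u passes its full score along every
            # out-link, attenuated by alpha.
            new[v] += alpha * score[u]
        score = new
    return score


# Hypothetical toy graph.
nodes = ["a", "b", "c"]
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("a", "c")]
pr = pagerank(edges, nodes)
ac = alpha_centrality(edges, nodes)
print(pr)
print(ac)
```

 The key difference is in the inner loops: PageRank divides a node's score among its out-links (a conserved random walk), whereas alpha-centrality copies the attenuated score along every out-link (a non-conserved, epidemic-like spread).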
 Content Map Equation: community detection in heterogeneous networks 
  This code finds communities in networks in which nodes have attributes. The approach, described in this paper, finds the best compression of a random walk on a network while also taking node attributes into account.  
 Download: ContentMapEquation on Github  
 LA-CTR: limited attention collaborative topic regression for social recommendation 
  This is a C implementation of the limited attention collaborative topic regression (LA-CTR) model for recommendations, which is fully described in Kang and Lerman, 2013. The original CTR code has been modified to implement the LA-CTR model. Please cite Kang and Lerman (2013), "LA-CTR: A Limited Attention Collaborative Topic Regression for Social Media," in Proc. of AAAI.  
 Download: LA_CTR