- BSL (Blocking Scheme Learner): efficiently generates candidate matches between data sources
- Carmel: Finite-state transducer package written in C++
- Digg 2009: anonymized data set of votes on stories promoted to Digg's front page
- EIDOS (Efficiently Inducing Definitions for Online Sources): automated semantic modeling of online source information
- A Fast, Accurate, Non-Projective, Semantically-Enriched Parser: Natural language parsing system and enhancements
- MACE (Multi Annotator Competence Estimation): evaluates redundant annotations of categorical data
- Pegasus: Automates scientific workflows; robust scalable tools for entire scientific community
- PowerLoom: Language and environment for constructing intelligent, knowledge-based applications
- RST Tool: discourse annotation tool creates files that reflect discourse structure of text
- SPADE (Sentence-level PArsing for DiscoursE): Sentence-level discourse parser
- Summarization: Basic elements for automated evaluation of text summaries
- Tiburon: tree transducer package for composition, intersection, application, and other functions
- Torc: C++ infrastructure and tool set for reconfigurable computing
- Trellis: interactive environment for semantic annotations to documents and other on-line resources
- Wings: semantic workflow system to assist scientists with the design of computational experiments
Carmel is a finite-state transducer package written by Jonathan Graehl at USC/ISI. Carmel includes code for handling finite-state acceptors and transducers, weighted transitions, empty transitions on input and output, composition, k-most likely input/output strings, and both Bayesian (Gibbs sampling) and EM (forward-backward) training.
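Carmel itself is a C++ command-line tool, so its actual API and file format are not shown here; the core idea of weighted transducer composition and best-path search can be illustrated with a small, self-contained Python sketch (the transducers and their weights below are hypothetical):

```python
from itertools import product

def compose(t1, t2):
    """Compose two weighted transducers (epsilon transitions omitted for
    brevity). A transducer is (transitions, finals); a transition is
    (src, in_sym, out_sym, weight, dst); the start state is 0."""
    trans = [((s1, s2), a, c, w1 * w2, (d1, d2))
             for (s1, a, b, w1, d1), (s2, b2, c, w2, d2) in product(t1[0], t2[0])
             if b == b2]  # t1's output symbol must match t2's input symbol
    finals = {(f1, f2) for f1 in t1[1] for f2 in t2[1]}
    return trans, finals

def best_path(transitions, finals, start, max_len=6):
    """Depth-first search for the highest-weight accepting path."""
    best_w, best_io = 0.0, None
    stack = [(start, 1.0, [])]
    while stack:
        state, w, path = stack.pop()
        if state in finals and w > best_w:
            best_w, best_io = w, path
        if len(path) < max_len:
            for (s, a, b, tw, d) in transitions:
                if s == state:
                    stack.append((d, w * tw, path + [(a, b)]))
    return best_w, best_io

# Hypothetical machines: t1 rewrites "a" to "x" or "y"; t2 rewrites those to "1".
t1 = ([(0, "a", "x", 0.6, 1), (0, "a", "y", 0.4, 1)], {1})
t2 = ([(0, "x", "1", 0.9, 1), (0, "y", "1", 0.5, 1)], {1})
trans, finals = compose(t1, t2)
w, io = best_path(trans, finals, (0, 0))  # -> 0.54, [("a", "1")]
```

The composed machine keeps both derivations of "a" -> "1" (via "x" at weight 0.54 and via "y" at 0.2); the best-path search then plays the role of Carmel's k-most-likely-string query with k=1.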
Tiburon is a tree transducer package written by Jonathan May at USC/ISI. Tiburon is designed to handle weighted regular tree grammars, context-free grammars, and both tree-to-tree and tree-to-string transducers. It can perform composition, intersection, application, determinization, inside/outside training, and pruning, and can return the k most likely trees, Viterbi derivations, and other useful results.
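Tiburon reads grammars and transducers from files in its own rule syntax, which is not reproduced here; the flavor of tree-to-string transduction can be sketched in a few lines of Python with an illustrative reordering rule (the tree, rules, and rule encoding below are hypothetical):

```python
# A tree is (label, child, child, ...); a leaf is a plain string.
def apply_rules(tree, rules):
    """Apply a top-down tree-to-string transducer: each rule maps a node
    label to an output template of child indices and literal strings.
    Unlisted labels default to emitting their children in order."""
    if isinstance(tree, str):
        return [tree]
    label, children = tree[0], tree[1:]
    out = []
    for item in rules.get(label, list(range(len(children)))):
        if isinstance(item, int):
            out.extend(apply_rules(children[item], rules))  # recurse on child
        else:
            out.append(item)  # emit a literal token
    return out

rules = {"S": [1, 0]}  # a reordering rule: S(x0, x1) -> x1 x0
tree = ("S", ("NP", "John"), ("VP", ("V", "runs")))
print(" ".join(apply_rules(tree, rules)))  # -> runs John
```

The single rule swaps the subject and predicate, the kind of local reordering that tree-to-string transducers are commonly used for in machine translation.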
ARX and Phoebus: Information Extraction from Unstructured and Ungrammatical Text on Web
The project presents two implementations for performing information extraction from unstructured, ungrammatical text on the Web, such as classified ads, auction listings, and forum posting titles. The ARX system is an automatic approach to exploiting reference sets for this extraction, while the Phoebus system is a machine-learning approach.
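Neither system's own code is shown here, but the reference-set idea can be sketched with generic string similarity: score each entry of a (hypothetical) reference set against the tokens of a noisy post and keep the best-scoring entry.

```python
import difflib

# Hypothetical reference set of clean entity names.
reference_set = ["Honda Accord", "Honda Civic", "Toyota Camry"]

def best_reference(post, refs):
    """Score each reference entry by the best token-to-token similarity
    against the noisy post, and return the highest-scoring entry."""
    def score(ref):
        return max(
            difflib.SequenceMatcher(None, t.lower(), r.lower()).ratio()
            for t in post.split() for r in ref.split()
        )
    return max(refs, key=score)

print(best_reference("93 Accrd EX low miles", reference_set))  # -> Honda Accord
```

The misspelled token "Accrd" still aligns closely with "Accord", which is what lets a reference set anchor extraction from text too ungrammatical for ordinary parsers.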
BSL: A system for learning blocking schemes
Record linkage is the problem of determining the matches between two data sources. As data sources grow larger, this task becomes difficult and expensive. Blocking aids the process by efficiently generating candidate matches, which can then be examined in detail later to determine whether or not they are true matches. Blocking is thus a preprocessing step that makes record linkage more scalable.
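As an illustration of the idea (a hand-written scheme, not one BSL has learned), a blocking scheme might key records on the first letter of the name plus the zip code, so that only records sharing a key are ever compared:

```python
from collections import defaultdict
from itertools import product

def candidate_pairs(src_a, src_b, scheme):
    """Group records by a blocking key and emit only cross-source pairs
    that share a key, instead of the full |A| x |B| cross product."""
    blocks_a, blocks_b = defaultdict(list), defaultdict(list)
    for rec in src_a:
        blocks_a[scheme(rec)].append(rec)
    for rec in src_b:
        blocks_b[scheme(rec)].append(rec)
    for key in blocks_a.keys() & blocks_b.keys():
        yield from product(blocks_a[key], blocks_b[key])

# Hypothetical records: (name, zip code)
A = [("Smith", "90210"), ("Smyth", "90210"), ("Jones", "10001")]
B = [("Smith", "90210"), ("Jones", "10002")]
scheme = lambda r: (r[0][0], r[1])  # first letter of name + zip code
pairs = list(candidate_pairs(A, B, scheme))  # 2 pairs instead of 6
```

Here the detailed matcher inspects only 2 candidate pairs instead of the 6 in the full cross product; the cost is that "Jones"/"Jones" is missed because the zip codes differ, which is exactly the coverage-versus-cost trade-off a learned blocking scheme tunes.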
EIDOS: Efficiently Inducing Definitions for Online Sources
The Internet is full of information sources providing various types of data, from weather forecasts to travel deals. These sources can be accessed via web forms, Web Services, or RSS feeds. In order to make automated use of these sources, one needs to first model them semantically, but writing semantic descriptions for web sources by hand is both tedious and error-prone. EIDOS automates this step by efficiently inducing definitions for online sources.
Digg 2009: Anonymized voting records from Digg's front page
This anonymized data set consists of the voting records for 3553 stories promoted to the front page over a period of a month in 2009. The voting record for each story contains the id of the voter and the timestamp of the vote. In addition, data about friendship links between voters was collected from Digg.
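Assuming rows in the shape described above — story id, voter id, and a Unix timestamp, with hypothetical field values — a first pass over such data might look like:

```python
from collections import Counter

# Hypothetical rows in the described shape: (story_id, voter_id, unix_time)
votes = [
    (1, "u1", 1245000000),
    (1, "u2", 1245000060),
    (2, "u1", 1245000120),
]
friends = {("u1", "u2")}  # a hypothetical friendship link between voters

# Votes per story, and the time each story received its first vote.
votes_per_story = Counter(s for s, _, _ in votes)
first_vote = {s: min(t for s2, _, t in votes if s2 == s) for s in votes_per_story}
```

Joining the vote stream against the friendship links in this way is what makes the data set useful for studying how votes spread through the social network.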