Large Resources

Ontologies (SENSUS) and Lexicons


Objective

In order to perform the kind of reasoning/inference required for deeper (semantic) understanding of texts, as required for high-quality Machine Translation, Summarization, and Information Retrieval, it is imperative to provide systems with a wide-ranging semantic thesaurus. We call such a 'concept thesaurus' an Ontology.

No adequately large, refined, and consistent ontology exists today. It is our goal to try to build one incrementally, with constant testing and revision according to the needs of the MT and summarization systems, out of existing lexical, text, and other ontology resources.


Approach

The SENSUS ontology:
We have constructed SENSUS, a 70,000-node terminology taxonomy, as a framework into which additional knowledge can be placed. SENSUS is an extension and reorganization of WordNet (built at Princeton University by George Miller, Christiane Fellbaum, and colleagues); at the top level, we have added nodes from the Penman Upper Model, and the major branches of WordNet have been rearranged to fit. In addition, we have added nodes based on work with other ontologies, as described below.

SENSUS can be browsed using the viewer Ontosaurus.

Ontology Alignment:
Since no ontology is useful in isolation, we have developed a number of cross-ontology alignment algorithms, by which one can find corresponding terms in other lexicons or ontologies, in order to transfer knowledge. The use of these algorithms has allowed us to link terms in lexicons of several languages to SENSUS terms, thereby facilitating word transfer for Machine Translation across languages. In this application, the ontology terms serve as interlingua pivot points.
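The pivot idea can be illustrated with a minimal sketch. It assumes each lexicon simply maps a word to a list of ontology concept identifiers; the lexicon entries, the concept names, and the helper functions `invert` and `transfer` are all hypothetical and are not taken from SENSUS or from the alignment algorithms themselves.

```python
from collections import defaultdict

# Hypothetical toy lexicons: word -> ontology concept identifiers.
# The concept names are invented for illustration, not real SENSUS nodes.
ENGLISH = {"bank": ["FINANCIAL-INSTITUTION", "RIVER-BANK"], "dog": ["DOG"]}
SPANISH = {
    "banco": ["FINANCIAL-INSTITUTION", "BENCH"],
    "orilla": ["RIVER-BANK"],
    "perro": ["DOG"],
}


def invert(lexicon):
    """Build a concept -> words index from a word -> concepts lexicon."""
    index = defaultdict(list)
    for word, concepts in lexicon.items():
        for concept in concepts:
            index[concept].append(word)
    return index


def transfer(word, source_lexicon, target_index):
    """For each sense of `word`, list the target-language words that are
    linked to the same ontology concept (the pivot)."""
    return {
        concept: sorted(target_index.get(concept, []))
        for concept in source_lexicon.get(word, [])
    }


spanish_by_concept = invert(SPANISH)
candidates = transfer("bank", ENGLISH, spanish_by_concept)
```

Here `candidates` keeps one entry per sense of "bank", so choosing the right target word reduces to choosing the right concept.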

In addition, we have investigated the use of these cross-ontology alignment algorithms to construct a single large ontology out of several existing ones, in order to try to come up with an ontology standard. This work, in conjunction with members of the Artificial Intelligence Knowledge, has helped us find errors and omissions in various ontologies, and has led to extensions of parts of the top of the CYC ontology, the MIKROKOSMOS ontology of New Mexico State University, and others. This work formed part of the efforts of the ANSI Ad Hoc Committee on Ontology Standards (with participants from IBM Santa Teresa, CYCorp., Stanford University, EDR Tokyo, and various individuals). The initial construction of SENSUS and of the alignment algorithms was performed by Kevin Knight and Steve Luk; later alignment algorithms and related work were performed by Eduard Hovy and Bruce Jakeway. This project is currently unfunded.

Concept families:
Two subprojects are attempting to add information to SENSUS. One of them is extracting families of semantically related words from text retrieved from the Web. A variety of methods has been investigated to improve the purity of extracted words, including Concept Lattice manipulation, singular value decomposition, chi-squared, tf.idf and other statistical counts, sparse matrix diagonalization, and traditional clustering algorithms such as CLINK and SLINK.
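As an illustration of one of the statistical counts listed above, the following is a minimal tf.idf sketch. It assumes whitespace tokenization and a small set of background documents for contrast; the function name `tfidf_signature` and the sample sentences are invented for this example and do not reproduce the project's actual pipeline.

```python
import math
from collections import Counter


def tfidf_signature(topic_docs, background_docs, top_k=5):
    """Rank words in the topic documents by term frequency (within the
    topic) times inverse document frequency (over all documents)."""
    term_freq = Counter(
        token for doc in topic_docs for token in doc.lower().split()
    )
    all_docs = [set(doc.lower().split()) for doc in topic_docs + background_docs]
    n_docs = len(all_docs)
    scores = {}
    for word, freq in term_freq.items():
        doc_freq = sum(1 for doc in all_docs if word in doc)
        scores[word] = freq * math.log(n_docs / doc_freq)
    # Highest score first; ties are broken alphabetically.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]


# Invented sample text, for illustration only.
topic = [
    "the waiter brought the menu",
    "we asked the waiter for the menu",
    "the waiter took our order",
]
background = [
    "the bank raised the rate",
    "the team won the game",
    "we watched the game",
]
signature = tfidf_signature(topic, background, top_k=2)
```

Words that occur in every document, such as "the", receive a score of zero and drop out, which is the kind of purity improvement these counts are meant to provide.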

When successfully collected, the sets of related words are then converted into families of SENSUS concepts and cross-linked within SENSUS. These concept families (which we call concept signatures) are then used in the automated text summarization system SUMMARIST to enable the difficult step of semantic interpretation. Interpretation is one method of fusing together related concepts into a more concise description, for example:

enter + menu + order + waiter + food + pay -> visit restaurant

This work is being performed by Eduard Hovy and Mike Junk.
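The restaurant example can be sketched as a lookup against stored concept signatures. This is a toy illustration only: the Jaccard overlap score, the threshold, the second signature, and the function `interpret` are assumptions made for the example, not a description of how SUMMARIST performs interpretation.

```python
# Hypothetical signature table; only the first entry comes from the
# example in the text, the second is invented for contrast.
SIGNATURES = {
    "visit restaurant": {"enter", "menu", "order", "waiter", "food", "pay"},
    "visit bank": {"enter", "teller", "deposit", "account", "pay"},
}


def interpret(concepts, signatures, threshold=0.5):
    """Return the signature label that best covers the observed concepts,
    or None if no signature overlaps strongly enough."""
    observed = set(concepts)
    best_label, best_score = None, 0.0
    for label, members in signatures.items():
        # Jaccard overlap between observed concepts and the signature.
        score = len(observed & members) / len(observed | members)
        if score > best_score:
            best_label, best_score = label, score
    return best_label if best_score >= threshold else None


fused = interpret(
    ["enter", "menu", "order", "waiter", "food", "pay"], SIGNATURES
)
```

With only "enter" and "pay" observed, neither signature reaches the threshold and no fusion is proposed.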

Concept definitions from dictionaries:
Simply knowing that two concepts are somehow related is often not sufficient. The question is how they are related. Another project is attempting to derive some inter-concept relations by extracting information from the definitions of Webster's 1913 dictionary of English. This project resembles the MindNet project of Microsoft.

The project is based on the assumption that words' definitions will contain a fairly small and highly relevant set of prepositional phrases that can be used to relate the word to other words (and eventually, the concept represented by the word to other concepts represented by the heads of the prepositional phrases). Many of the heads will be words in the word families collected above. By extracting the prepositions that relate the words, and finding their equivalent semantic relations, one can replace the generic relatedness link in SENSUS with a more specific relation.
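A very rough sketch of the extraction idea follows. It uses no parser: it assumes a fixed preposition list and treats the last word before the next preposition or punctuation mark as the head, which is a crude stand-in for real syntactic analysis. The function `prepositional_links` and the sample definition are invented for illustration and are not drawn from Webster's.

```python
import re

# Assumed, incomplete preposition and determiner lists.
PREPOSITIONS = {"of", "for", "in", "with", "from", "by", "on"}
DETERMINERS = {"a", "an", "the"}


def prepositional_links(definition):
    """Return (preposition, head) pairs found in a definition string."""
    links = []
    for clause in re.split(r"[,;.]", definition.lower()):
        tokens = re.findall(r"[a-z]+", clause)
        i = 0
        while i < len(tokens):
            if tokens[i] not in PREPOSITIONS:
                i += 1
                continue
            # Collect words up to the next preposition or end of clause.
            j = i + 1
            while j < len(tokens) and tokens[j] not in PREPOSITIONS:
                j += 1
            phrase = [t for t in tokens[i + 1:j] if t not in DETERMINERS]
            if phrase:
                links.append((tokens[i], phrase[-1]))  # naive head choice
            i = j
    return links


# Invented definition text, for illustration only.
links = prepositional_links("a vessel made of baked clay, used for cooking")
```

Each pair is a candidate link between the defined word and the head; deciding which semantic relation the preposition expresses is a separate step.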

This project, performed by Bruce Jakeway and Eduard Hovy, involves the following steps:

At present, a pilot version of all but the final stage has been completed and tested.

The most difficult step is the third step, for which unique semantic readings must be found for the essential constituents of portions of the definition. For example, for the fragment

made out of paper

several semantic (SENSUS) meanings are possible for each of make, out-of, and paper, but only the senses construct, material, and paper or newspaper, respectively, are compatible when the three are considered together. Similarly, for

sing out of happiness

the semantic relation cause is the only appropriate one for the preposition out-of, given the presence of an emotional word as its head. The project has experimented with the Expectation Maximization technique to estimate the probabilities of the various semantic readings of combinations of various classes of words.
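To make the role of Expectation Maximization concrete, here is a toy sketch, not the project's actual model. It assumes each observation is a head-word class paired with a non-empty set of candidate relations for the preposition, and it estimates P(relation | head class) from these partially ambiguous observations. The class names, relation names, and the function `em_relation_probs` are all hypothetical.

```python
from collections import defaultdict


def em_relation_probs(observations, iterations=20):
    """Estimate P(relation | head_class) from observations of the form
    (head_class, candidate_relations); each candidate set is non-empty."""
    relations = defaultdict(set)
    for head, candidates in observations:
        relations[head].update(candidates)
    # Start from a uniform distribution over the relations seen per class.
    theta = {h: {r: 1.0 / len(rs) for r in rs} for h, rs in relations.items()}
    for _ in range(iterations):
        counts = {h: dict.fromkeys(rs, 0.0) for h, rs in relations.items()}
        totals = defaultdict(float)
        # E-step: share each observation among its candidate relations
        # in proportion to the current probabilities.
        for head, candidates in observations:
            norm = sum(theta[head][r] for r in candidates)
            for r in candidates:
                counts[head][r] += theta[head][r] / norm
            totals[head] += 1.0
        # M-step: re-estimate probabilities from the expected counts.
        theta = {
            h: {r: counts[h][r] / totals[h] for r in counts[h]} for h in counts
        }
    return theta


# Invented observations for the preposition out-of.
OBSERVATIONS = [
    ("EMOTION", {"cause", "material"}),
    ("EMOTION", {"cause"}),
    ("EMOTION", {"cause", "source"}),
    ("SUBSTANCE", {"material"}),
    ("SUBSTANCE", {"material", "cause"}),
]
theta = em_relation_probs(OBSERVATIONS)
```

In this toy data the unambiguous observations pull the ambiguous ones toward cause for emotional heads and toward material for substance heads, which mirrors the two examples in the text.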

This project is associated with the Dictionary Parsing Project, which focuses on the syntactic parsing of Webster's definitions using a variety of high-quality parsing tools. For more information, please contact Ken Litkowski.



Project members


Eduard Hovy -- senior project leader

Kevin Knight -- project leader

Mike Junk -- graduate student


