Tools for Assembling and Managing Scalable Knowledge Bases
Robert MacGregor & Ramesh S. Patil
USC Information Sciences Institute
4676 Admiralty Way
Marina del Rey, CA 90292
Tel: (310) 822-1511 Fax: (310) 823-6714
Email: MacGregor@isi.edu, ramesh@isi.edu
A. Abstract
The HPKB program aims to produce technology that will enable system developers
to rapidly construct large, reusable, and maintainable ontologies and knowledge
bases (KBs). In HPKB, these large-scale KBs will not be built from scratch,
but rather assembled from libraries of existing resources. However, current
tools for constructing and maintaining ontologies suffer from some important
limitations:
- Current ontology tools support browsing and editing, but they lack support
for merging ontologies, pruning them, or extracting focused ontologies from
larger ones. These capabilities are essential for rapid construction of
large ontologies, since they enable reuse of existing resources.
- Ontology tools provide little help in organizing ontologies into hierarchies,
identifying and debugging inconsistencies or maintaining their consistency
and completeness.
- Most tools are limited to small to medium scale ontologies by the inherent
limitations of their underlying representation systems, and thus cannot
support the goals of the HPKB program.
- Some ontology tools are "dead ends" in the sense that the
knowledge in the ontologies cannot be used directly in implementations--it
must first be translated into some implementation language with some loss
of fidelity, making the overall maintenance of systems developed from the
ontology more difficult.
We propose to develop "Knowledge Builder", a system for building and managing
scalable knowledge bases that will provide solutions to each of these problems.
Two key technology components make this possible:
ISI's Ontosaurus, a web-based collaboration system for ontology browsing
and editing, is pioneering scalable KB development. Ontosaurus
provides prototype composing, pruning, merging, and extraction
capabilities for rapid construction and tailoring of new
KBs from existing KBs. Using Loom, Ontosaurus already provides some capabilities
for organizing concepts in a hierarchy, detecting conflicts, and validating
completeness. The Knowledge Builder will expand these capabilities with
a full suite of operations for KB assembly and manipulation. It will provide
a powerful set of semantic tools to aid in organizing, debugging, validating
and maintaining ontologies and knowledge bases.
The other key component is ISI's PowerLoom KR system. A successor to Loom,
PowerLoom is highly expressive, with multiple built-in deductive reasoning
capabilities including a query processor, a description classifier, and
a context mechanism. The proposed effort will extend the PowerLoom system with
view, abstraction, and mapping components needed to support KB construction.
We will make PowerLoom scalable by interfacing it to a secondary storage
system. The combination of a "just in time classifier" and modular
inference architecture will allow PowerLoom to dynamically add additional
workstations to meet the demands on its reasoning services, thus allowing
it to handle a truly large library of ontologies and knowledge bases.
Knowledge Builder will provide a powerful Web-based collaboration environment
for assembly, maintenance and export of large knowledge bases, supported
by a scalable, high-performance KR system whose reasoning services will
be utilized by end-user applications as well as by the Web tools. The ontologies,
knowledge bases and knowledge-based systems developed using Knowledge Builder
will be delivered in Common Lisp, C++, and Java to facilitate integration
with AI and conventional software.
We will support the integration of Knowledge Builder with other HPKB technology
products and demonstrate its application to HPKB challenge problem domains.
B. Innovative Claims
As discussed in the Abstract, no existing system is capable of providing
the full range of capabilities needed to construct, manage, and reason with
very large, high-performance knowledge bases. Our major innovative claim
is the development of a new system, Knowledge Builder, that achieves this
objective. Knowledge Builder will be a Web-based collaboration system for
assembling, validating, maintaining, and exporting large ontologies and
knowledge bases. Underlying Knowledge Builder will be a scalable KR system
able to represent and reason effectively with a library of knowledge bases
too large to fit within the virtual memory of a single workstation. Knowledge
Builder will demonstrate unprecedented interoperability. It will: export
ontologies and knowledge bases in a variety of KR languages such as KIF,
Ontolingua, KRSS, Loom, and GFP; automatically generate and populate object
schema in CLOS, C++, and Java object systems; generate an IDL interface
specification for integration with service-based HPKB integration architectures;
and support HPKB system development in LISP, C++ and Java programming languages.
Implementing the entire range of capabilities envisioned for Knowledge Builder
represents an ambitious undertaking. The proposed effort will leverage two
DARPA-funded systems developed at the University of Southern California's
Information Sciences Institute (ISI): the Ontosaurus Ontology server and
the PowerLoom KR system. These systems will be extended and integrated to
develop the Knowledge Builder system. Technology advances are needed in
several areas to achieve the capabilities planned for Knowledge Builder.
These innovations are described below.
Knowledge Builder will include the first ontology tool able to construct
new KBs by assembling subtrees or subnetworks of other knowledge bases.
The assembly tool will implement a variety of complex knowledge base operations,
including composing, extracting, pruning, and merging of coarse-grained
knowledge structures. Existing ontology tools are designed to add, modify,
or delete a single axiom/frame/rule at a time, or to compose entire knowledge-bases.
Having a coarse-grained editing capability is essential to the task of rapidly
assembling large (high performance) knowledge bases from existing, reusable
knowledge bases developed for related but distinct purposes. At DARPA's
Fall 1996 I*3 workshop, Ontosaurus demonstrated a preliminary coarse-grained
editor merging parts of two different knowledge bases. We plan on collaborating
with Prof. John McCarthy to develop a formal theory for ontology assembly
operations.
As knowledge bases become larger, it becomes essential to provide automated
support for evaluating the quality, correctness, and completeness of a
knowledge base--humans will not be able to cope with the complexity unaided.
While small scale examples of semantic validation tools abound, no comprehensive
system exists for validating a knowledge base. Until recently, this lack
could be attributed in part to the difficulty of developing a portable KRS
interface. The advent of Web technology, combined with standard languages
like the Knowledge Interchange Format (KIF) and translator technology now
makes it practical to make a large investment in semantic validation software.
We are proposing to incorporate a comprehensive set of semantically-motivated
checks into an Ontosaurus-based tool. This tool, coupled with our translation
technology, will enable validation to be applied to a broad range of ontologies
and knowledge bases; those developed using Knowledge Builder as well as
those developed using other KR systems. A prototype of this capability was
demonstrated at the I*3 workshop by importing a subset of C2 Schema encoded
in Ontolingua, classifying it, and returning the augmented version to
the Ontolingua system.
While database management system technology considers scalability--the
ability to manage databases too large to fit in a workstation's virtual
memory--as a fundamental requirement, few KRSs are scalable. The Loom KRS
provides one example--a module developed at SRI enables Loom to manage knowledge
bases stored in a relational database management system (RDBMS). The University
of Maryland's PARKA system is also scalable. However, no existing KRS is
both scalable and has a fully-expressive representation language. We have
designed the knowledge structures and inference procedures in PowerLoom
with scalability in mind. In this effort we propose to implement our design,
making PowerLoom the first fully-expressive and scalable KRS.
Loom and PowerLoom belong to a family of KRSs ("KL-ONE-like" or
"description logic" systems) that implement a specialized reasoner
called a "description classifier". A description classifier is
uniquely suited to the task of organizing conceptual networks into semantic
hierarchies, and for validating the consistency of a knowledge base. Classifier
inferences help to make explicit semantic relationships and derived facts
that exist implicitly in a knowledge base, thereby assisting users in visualizing
the contents and consequences of a complex base of axioms and facts. Classifiers
are also able to detect contradictory definitions, thereby providing a semantic
check on the integrity of a domain model.
Up to now, all classifiers have been designed to operate in a sort of "batch"
mode--for every concept and instance entered into a classifier-based KRS
the classifier computes subsumption relationships derivable between it and
all other concepts. The computational overhead of classification becomes
prohibitively large as knowledge bases increase in size. In practice, this
prevents a classification-based KRS from managing very large knowledge bases;
in other words, it is not scalable. For PowerLoom, we have invented a new
mode of classification wherein any portion of a knowledge base can be loaded
into PowerLoom's working memory, and the PowerLoom classifier will classify
only the main memory-resident knowledge entities. Alternatively, a fully
"lazy" strategy, called "just in time" classification,
can be adopted wherein concepts and instances are not classified until they
are first referenced by a user, GUI or application. In this effort, we will
apply the just-in-time classifier to very large knowledge bases, and develop
preference heuristics that optimize the classifier's ability to apply its
computational resources where they will yield the most value to an application
or user. The flexibility provided by the just-in-time classifier architecture
will enable us to develop a scalable inference architecture wherein multiple
workstations can be harnessed to classify different portions of a knowledge
base, with their results returned to a central server that shares those
results with all users.
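The difference between batch and just-in-time classification can be illustrated with a minimal Python sketch. This is not PowerLoom code: the `LazyClassifier` class, its method names, and the representation of a concept as a set of required features are assumptions made for illustration, and subsumption is reduced to a simple subset test.

```python
class LazyClassifier:
    """Toy just-in-time classifier (hypothetical illustration only).

    A concept is a set of required features; concept A subsumes
    concept B when A's features are a subset of B's features.
    """

    def __init__(self):
        self.definitions = {}   # concept name -> frozenset of features
        self.subsumers = {}     # cache: concept name -> set of subsumer names
        self.comparisons = 0    # number of subsumption tests performed

    def define(self, name, features):
        # Defining a concept does no classification work (unlike batch mode).
        self.definitions[name] = frozenset(features)
        # Cached results may be stale once a new concept exists.
        self.subsumers.clear()

    def classify(self, name):
        # Classify on first reference, then reuse the cached result.
        if name not in self.subsumers:
            target = self.definitions[name]
            found = set()
            for other, feats in self.definitions.items():
                if other == name:
                    continue
                self.comparisons += 1
                if feats <= target:
                    found.add(other)
            self.subsumers[name] = found
        return self.subsumers[name]


kb = LazyClassifier()
kb.define("Aircraft", {"flies"})
kb.define("CombatAircraft", {"flies", "armed"})
kb.define("Tanker", {"flies", "carries-fuel"})
work_before = kb.comparisons               # nothing has been classified yet
parents = kb.classify("CombatAircraft")    # classified only when referenced
```

In this sketch, loading three concepts costs no subsumption tests; work is done only for the concept that is actually referenced, and the result is cached for later requests.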
As programming languages drift in and out of fashion, large software systems
built in those languages can become obsolete, because the expense of porting
them to newer languages can be prohibitive. This phenomenon is currently
afflicting systems built in Common Lisp--large research systems such as
Loom are gradually becoming less useful to certain classes of users as those
users migrate to other languages (e.g., to C or C++). ISI has developed
a unique programming language called STELLA, tailored for programming intelligent
symbol processing applications, that eliminates this problem. Currently,
programs written in STELLA can be translated into efficient C++ and Common
Lisp programs. We plan to develop a STELLA-to-Java translator as part of
this effort. A system programmed in STELLA can therefore be used by the
(still considerable) body of researchers that use Common Lisp, as well as
by (more product-oriented) users who base their software on C, C++, or Java.
PowerLoom is written in STELLA, and hence will run efficiently in three
different languages. We also plan to implement major portions of the Ontosaurus
tools in STELLA, making them portable as well. The use of STELLA-based technology
means that research systems like PowerLoom will be able to transition much
more rapidly and smoothly into commercial or product environments.
C. Technical Rationale, Approach, and Plan
C.1. Introduction
The goal of the High-Performance Knowledge Base (HPKB) program is to produce
and field the technology system developers need to rapidly and inexpensively
construct large comprehensive knowledge-based systems. Unfortunately, acquiring,
formalizing, and representing knowledge is difficult and time-consuming.
It is therefore hard to build large knowledge-based systems if one continues
to start from scratch each time a new system is built. Building the next
generation of knowledge-based systems (qualitatively larger than today's
systems) will be possible only if we abandon the approach of hand-crafting
each knowledge base. Researchers need to develop ontologies and knowledge
bases that can be shared and reused. Developers need to collaborate to build
up a sizable library of reusable knowledge bases. Implementors must be able
to construct domain-specific knowledge bases by composing and specializing
reusable knowledge bases. To further enhance the reusability of a KB, the
developers should also be able to translate a KB to the target application
environment, that is, to a specified programming language, KR system and
problem-solver(s).
The problems in sharing and reuse of knowledge have already been identified
by DARPA and the research community as a key impediment to wider use of
knowledge-based systems technology. This realization led to the establishment
of the DARPA Knowledge Sharing Effort [Neches et al. 1991]. Over the past
several years, we have started to build the infrastructure for knowledge
sharing and reuse. Today there is a significant body of work on pieces of
the problem, such as a portable library of ontologies and knowledge bases
[Gruber 1993], knowledge interchange formats [Genesereth 1991; Genesereth
and Fikes 1992], generic application program interfaces (APIs) (e.g. KQML
[Finin et al. 1994], GFP [Karp et al. 1995]) for knowledge exchange, and
prototype environments for collaborative development and management of ontology
libraries. Additional research in the area of generic tasks [Chandrasekaran
and Johnson 1993] in the United States and Common KADS
[Wielinga and Breuker 1986] in Europe has developed a foundation for characterizing abstract problem-solving
strategies and ontologies for describing the tasks and the role knowledge
plays in problem-solving.
In parallel, the emergence of the World Wide Web, the client-server and
intelligent agent architectures, and platform-independent programming languages
such as Java, now make it possible to exploit the Internet for the development
of distributed collaboration tools. These tools will facilitate team development
of knowledge bases and make libraries of knowledge-based components readily
available to the community. A number of tools (e.g. Ontosaurus
[Swartout et al. 1996] at ISI, and Ontolingua [Gruber 1992] at Stanford) are beginning
to provide such capabilities for knowledge base development.
These emerging systems and technologies provide a solid foundation for a
new paradigm of knowledge base development and maintenance that will meet
the goals of the HPKB program. This effort proposes to develop Knowledge
Builder by integrating and extending Ontosaurus, a web-based environment
for collaborative development of knowledge bases, with PowerLoom, a highly
expressive, scalable knowledge representation and reasoning system designed
to provide the reasoning services needed for ontology assembly, validation,
and maintenance of very large libraries of ontologies and domain theories.
The proposed effort will result in an integrated knowledge base development
environment that will provide:
- Tools and techniques to define, test, verify, validate, and edit large
libraries of ontologies, domain theories, and problem-solving strategies;
- Tools and techniques to extend and specialize those ontologies, domain
theories, and problem-solving strategies to create new foundation knowledge;
- Tools and techniques to compare and compose independently developed
ontologies, domain theories, and problem-solving strategies, especially to compare
independently developed or modified theories to detect conflicting information;
- Tools and techniques to translate ontologies, domain theories, and problem-solving
strategies among different KR systems and programming languages;
- Tools and techniques to import data dictionaries, data schemas, lexicons,
and knowledge bases from legacy expert systems to create initial ontologies
and skeleton theories for elaboration and completion;
- Tools to manage a library containing foundation ontologies (e.g., space,
time, actions and processes, and physical objects), domain theories and KBs
for challenge problem domains, and broad coverage ontologies such as the
SENSUS/WordNet ontology.
- Web-based tools that enable easy browsing, visualization, and understanding
of large libraries of ontologies, and domain theories hyperlinked to multimedia
documentation.
- A highly expressive KR system that can represent knowledge for reuse
by multiple applications and reasoning at different levels of abstraction
and is scalable to effectively represent and reason with large libraries
of reusable knowledge bases.
The proposed system architecture will integrate these tools and techniques
to provide a comprehensive knowledge base development, validation, and maintenance
environment. In our architecture, a well-integrated set of Web-based tools
provide various ontology management services. Underlying these tools is
a powerful KRS that manages the knowledge base and provides a variety of
inferencing services. The tools exploit the KRS's reasoning capabilities
to provide an exceptional degree of assistance to an ontology developer.
A KRS used in this capacity is functioning as an "ontology server".
The notions of "ontology server" and "knowledge representation
system" (KRS) overlap considerably. The difference is mainly one of
emphasis--an ontology server emphasizes the ability to construct, manipulate,
and maintain knowledge bases containing definitional and background knowledge,
while a KRS emphasizes support for reasoning services that can be applied
to a knowledge base by an intelligent application. At the high end, both
kinds of systems demand highly expressive knowledge representation capabilities,
combined with efficient, in-depth reasoning services.
Figure 1. Architecture for the proposed Ontology Server.
Figure 1 shows the proposed structure for the Knowledge Builder system.
The proposed system will significantly enhance the capabilities of the current
version of the Ontosaurus Ontology Server. The present version of Ontosaurus
is built around the Loom KR system. During our implementation of Ontosaurus,
we have discovered that a number of capabilities, such as a view mechanism
and semantic analysis, require data structures and reasoning capabilities
not commonly implemented in KR systems. The proposed implementation will
be integrated with PowerLoom, a much more powerful scalable KR system. The
PowerLoom system being developed under an existing DARPA contract will provide
many of the features we consider important in an ontology server. This effort
will extend those features with a particular emphasis on scaling and performance
issues. PowerLoom will provide a wider array of services for ontology
construction, validation and maintenance, as well as for the construction
of intelligent knowledge-based systems. Other key features of the Ontosaurus-PowerLoom
system will be its scalability (to operate on very large knowledge bases),
its just-in-time classifier that provides scalable reasoning (workstations
can be added incrementally to meet the computational load), and its modular
design (to allow user configuration and easier upgrades).
C.2. Ontosaurus Ontology Server
Ontosaurus, developed under DARPA funding (Knowledge Sharing, and ARPI Jump
Start), is a Web-based ontology and knowledge base development environment
using the Loom KR system. Because Ontosaurus uses Loom, it can provide a
number of ontology manipulation and inference capabilities. For example,
it can identify additional subsumption relations implicit in user-specified
ontologies, it can detect inconsistencies in user definitions, and it can
propagate and validate type and number constraints on hierarchically organized
roles and relations. The Ontosaurus Web server has been available to the
DARPI community since September 1996. At a recent I*3 PI meeting we demonstrated
prototype capabilities for extracting and merging ontologies, ontology translation
between Loom and Ontolingua, and compilation of the C2 Schema into C++ class
definitions. As a result it is currently being evaluated by OMWG at the
Air Force's Armstrong Laboratories for use in the development of the C2
Schema.
The proposed effort will continue the development of the Ontosaurus system
to provide foundation-building technology (section B.1.1) for ontology libraries
and knowledge bases. Towards this end we will extend existing capabilities
and develop new capabilities to support:
- the collaborative development and maintenance of large ontologies, and
KBs.
- a user customizable view mechanism that allows a single ontology to
be viewed from multiple perspectives such as logistics planning and
force application.
- tools for constructing application-specific ontologies by composing,
merging, pruning, extracting, and customizing components from a library
of foundation ontologies and re-usable domain KBs.
- Semantic analysis tools to validate the consistency and completeness
of an ontology, detect incompatibilities between ontologies, and highlight
semantic differences between successive versions of an ontology.
- tools for translating Loom ontologies into and out of other KR languages,
and schemas for programming languages such as C++, CLOS, IDL interface specifications,
and Java objects.
Each of these points will be addressed in detail below:
Collaborative Development Environment: Rapid development of large
multifaceted knowledge bases can only be achieved through collaboration
between teams of experts. Users will need a supportive environment that
enables them to collaborate in building, editing and maintaining an ontology,
including the ability to work individually or in a group session. They need
fine-grained access control and locking to support simultaneous use by multiple
users. They need audit trails to identify authors and sources of information,
and an ability to roll back earlier changes. They also need an ability to
work on independent copies of an ontology, identify differences, and merge
the results into newer versions (see ontology construction below). They
want to view and manipulate ontologies from the perspective of their area
of expertise or responsibility (see view manager below).
Support of distributed collaboration was a key motivation underlying the
use of Web tools and protocols in the development of Ontosaurus. Although
Ontosaurus supports collaborative development, its capabilities are limited
to one group at a time. This capability will be extended to allow multiple
simultaneous sessions and support for users to create sessions, locate sessions
in progress and join or leave sessions. Currently users are provided with
read/write privileges on all information in an ontology. Proper support
of collaboration requires that we implement access control at a much finer
level. For example, a user may be allowed to change only a part of an ontology
(related to his area of expertise) or only certain fields in a definition.
However, the precise nature of the requirements for access control can only
be determined through actual use. In this proposed effort, we will experiment
with access controls at different levels of granularity (ontology, sub-hierarchy,
concept and slot level) and refine them through actual use.
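As an illustration of what access control at several levels of granularity might look like, the following Python sketch resolves a write request against the most specific applicable grant. The grant table, the `can_write` function, and the rule that the most specific grant wins are all assumptions made for this sketch; the actual policy is to be determined through use, as noted above.

```python
def can_write(grants, parents, user, ontology, concept, slot):
    """Return True if the most specific applicable grant gives write access.

    grants:  hypothetical table mapping (user, scope) -> "read" or "write",
             where scope is (ontology,), (ontology, concept),
             or (ontology, concept, slot).
    parents: concept name -> parent concept name (None at the root).
    A grant on a concept also covers the sub-hierarchy below it.
    """
    # Most specific first: slot, then concept, then each ancestor
    # (sub-hierarchy grants), then the whole ontology.
    scopes = [(ontology, concept, slot), (ontology, concept)]
    node = parents.get(concept)
    while node is not None:
        scopes.append((ontology, node))
        node = parents.get(node)
    scopes.append((ontology,))
    for scope in scopes:
        mode = grants.get((user, scope))
        if mode is not None:
            return mode == "write"
    return False  # no grant at any level: deny


parents = {"Aircraft": None, "Fighter": "Aircraft"}
grants = {
    ("ann", ("acp",)): "read",                     # ontology level
    ("ann", ("acp", "Aircraft")): "write",         # sub-hierarchy level
    ("ann", ("acp", "Fighter", "cost")): "read",   # slot level
}
range_ok = can_write(grants, parents, "ann", "acp", "Fighter", "range")
cost_ok = can_write(grants, parents, "ann", "acp", "Fighter", "cost")
```

Here the user may edit most of the Aircraft sub-hierarchy, but the slot-level grant overrides that permission for one field.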
View Manager: The ontologies and knowledge bases designed to support
the entire spectrum of concerns of a given domain, such as the JFACC or
ACP, will contain knowledge about each concept in great detail, only a small
fraction of which is required to solve any one aspect (e.g. logistics planning)
of the problem. Consider for example the JFACC domain. An air campaign planner
will be interested in different types of aircraft, their armaments and mission
capabilities, whereas a logistic planner supporting the same mission will
be interested in the fuel type and capacity for the same aircraft. Describing
each concept in full detail is likely to confuse both users and unnecessarily
clutter the KBSs supporting them. To overcome this problem, we propose to develop
a view mechanism (similar in spirit to database views) that will tailor
the information as well as the presentation based on the needs of a domain
or user. Although databases have supported view mechanisms for some time,
view mechanisms have not been used in the knowledge-based area because knowledge
bases in the past have been designed for a single purpose, and because KR
languages are significantly more expressive than database schema definition
languages. The proposed effort will design and implement a general view
definition facility that will allow layers of abstract views to be defined
to meet the needs of a user or an application. Because KR languages are
significantly more expressive than database schema languages, the KR-language-based
view definition facility will allow creation of substantially more complex
views while providing precise logical semantics to operations performed
over views. In particular, unlike databases, which do not allow updates through
views (because they lack the logical semantics that would eliminate ambiguities),
these views will be updatable. The implementation of a view definition and
mapping facility will rely on the more general schema mapping and translation
facilities incorporated in the PowerLoom system.
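The idea of an updatable view can be sketched in a few lines of Python. This is a deliberately simple stand-in, not the proposed facility: the `ConceptView` class is hypothetical, a view is reduced to a one-to-one renaming of selected slots, and the slot names and values are invented for the example. A KR-language-based view would be defined logically rather than by a slot table.

```python
class ConceptView:
    """Hypothetical view exposing selected slots under view-local names."""

    def __init__(self, base, slot_map):
        self.base = base            # concept name -> {slot: value}
        self.slot_map = slot_map    # view slot name -> base slot name

    def get(self, concept):
        # Project the base record onto the slots visible in this view.
        record = self.base[concept]
        return {v: record[b] for v, b in self.slot_map.items() if b in record}

    def set(self, concept, view_slot, value):
        # Each view slot maps to exactly one base slot, so the update
        # through the view is unambiguous.
        if view_slot not in self.slot_map:
            raise KeyError(f"slot {view_slot!r} is not visible in this view")
        self.base[concept][self.slot_map[view_slot]] = value


# Illustrative base KB; slot names and values are made up for this sketch.
base = {
    "Aircraft-A": {
        "armament": "missile",
        "fuel-type": "jet-fuel",
        "fuel-capacity": 5000,
    }
}
logistics = ConceptView(base, {"fuel": "fuel-type", "capacity": "fuel-capacity"})
planning = ConceptView(base, {"weapons": "armament"})

logistics.set("Aircraft-A", "capacity", 6000)   # update through the view
```

The logistics view and the planning view see different facets of the same concept, and a change made through one view is written to the shared base definition.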
Ontology Construction: This module will provide a toolkit for constructing
new ontologies tailored to a particular domain by composing, pruning, extracting,
and merging existing ontologies.
Composing ontologies: The process of creating a domain knowledge
base begins with the identification of concepts and relations for modeling
basic aspects of the domain. These include the fundamental mathematical
building blocks of representation (sets, sequences, numbers, etc.) and the
formalization of concepts required to model the physical world (such as
animate and inanimate objects, time, and space). In the absence of outside
guidance, this task is often very difficult and generally leads to an ad
hoc organization of knowledge. Given a library of carefully defined ontologies
addressing these fundamental areas, the task of constructing the upper ontology
can be achieved by selecting appropriate modules (micro-theories) from the
library and joining them together. (See Figure 2). Many existing KRSs including
Loom provide the capability for composing ontologies and knowledge bases
in a hierarchical manner. The process of composition, however, can only be
applied to micro-theories that are designed modularly and do not overlap,
or where the overlap between the ontologies is carefully managed. In the
proposed effort we will develop modular interfaces for micro-theories that
will allow different formalizations to be interchanged without affecting
other theories, and develop guidelines for the selection of theories for
composition.
Figure 2. Composing ontologies.
Figure 3. Pruning an ontology.
Pruning: The pruning operation allows a designer to delete concepts
or a sub-hierarchy of concepts (see Figure 3) that are not needed for a
given domain. Some KRSs (including Loom) provide support for recursively
deleting all sub-concepts and other concepts which refer to a given concept.
In our experience with Ontosaurus, we have identified a number of additional
pruning operations that are less drastic than the existing pruning operations.
These include selective deletion of offending clauses from the definition
of referring concepts and generalization of the definitions to eliminate
references to deleted concepts. These additional deletion capabilities will
be implemented within the PowerLoom system and made available to the user
as options in the pruning process.
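The contrast between the drastic and the milder pruning options can be sketched as follows. The knowledge base format (a parent link plus a list of `(role, filler)` clauses per concept) and the function names are assumptions made for this Python illustration; they do not reflect Loom or PowerLoom data structures.

```python
def descendants(kb, root):
    """Return root plus every concept below it in the parent hierarchy."""
    doomed = {root}
    changed = True
    while changed:
        changed = False
        for name, defn in kb.items():
            if name not in doomed and defn["parent"] in doomed:
                doomed.add(name)
                changed = True
    return doomed


def prune_cascade(kb, root):
    """Drastic option: also delete every concept that refers to a deleted one."""
    doomed = descendants(kb, root)
    changed = True
    while changed:
        changed = False
        for name, defn in kb.items():
            if name in doomed:
                continue
            refers = any(filler in doomed for _, filler in defn["clauses"])
            if refers or defn["parent"] in doomed:
                doomed.add(name)
                changed = True
    return {n: d for n, d in kb.items() if n not in doomed}


def prune_generalize(kb, root):
    """Milder option: keep referring concepts but drop the offending clauses."""
    doomed = descendants(kb, root)
    return {
        name: {
            "parent": defn["parent"],
            "clauses": [(r, f) for r, f in defn["clauses"] if f not in doomed],
        }
        for name, defn in kb.items()
        if name not in doomed
    }


kb = {
    "Thing": {"parent": None, "clauses": []},
    "Weapon": {"parent": "Thing", "clauses": []},
    "Missile": {"parent": "Weapon", "clauses": []},
    "Aircraft": {"parent": "Thing", "clauses": [("crew", "Thing")]},
    "Fighter": {"parent": "Aircraft",
                "clauses": [("armed-with", "Missile"), ("crew", "Thing")]},
}
drastic = prune_cascade(kb, "Weapon")
mild = prune_generalize(kb, "Weapon")
```

Pruning `Weapon` with the cascading option also removes `Fighter`, because its definition mentions `Missile`. The generalizing option keeps `Fighter` and removes only the clause that refers to the deleted concept.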
Figure 4. Extracting a domain-specific ontology from a broad coverage ontology.
Ontology Extraction: One of the most time-consuming aspects of modeling
a domain is the enumeration and organization of tens of thousands of domain
concepts. ISI has been exploring a novel approach to assist in this process
using existing broad coverage ontologies such as the SENSUS [Knight and Luk 1994]
and WordNet [Miller 1990] ontologies [Swartout et al. 1996].
Our approach begins by identifying a small number of key domain terms (called
seeds). These terms are then mapped to concepts in the SENSUS ontology (Figure
4a). Using a variety of structural and probabilistic heuristics, we then
identify concepts in the ontology that relate to the seed concepts (Figure
4b). These concepts delimit the domain relevant aspects of the larger ontology,
and are extracted to create an approximate domain-specific ontology (Figure
4c). The designer can then browse this ontology and delete unrelated terms
or import additional terms to arrive at an initial domain model. This process
was used to create the ACP-SENSUS ontology for air campaign planning (ARPI).
Using 50 seed terms provided by experts, an initial ontology of approximately
1200 domain concepts was created in less than a week. A detailed description
of this process is in [Swartout et al. 1996].
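The seed-based extraction steps can be sketched in Python under a strong simplification. The heuristics actually used are not detailed here, so this sketch substitutes a single structural rule as an assumption: keep each seed concept together with all of its ancestors. The `extract` function and the tiny hierarchy are invented for illustration.

```python
def extract(parents, seeds):
    """Keep each seed plus all of its ancestors, preserving the hierarchy.

    parents: concept name -> parent concept name (None at the root).
    This single rule stands in for the structural and probabilistic
    heuristics of the real process, which are not reproduced here.
    """
    keep = set()
    for seed in seeds:
        node = seed
        # Walk upward until the root or an already-kept concept is reached.
        while node is not None and node not in keep:
            keep.add(node)
            node = parents.get(node)
    return {concept: parents.get(concept) for concept in keep}


# Illustrative broad-coverage hierarchy (made up for this sketch).
parents = {
    "entity": None,
    "object": "entity",
    "vehicle": "object",
    "aircraft": "vehicle",
    "ship": "vehicle",
    "animal": "object",
    "bird": "animal",
    "event": "entity",
    "mission": "event",
}
domain = extract(parents, ["aircraft", "mission"])
```

The result is a smaller hierarchy that still connects the seed terms to the root, which a designer could then trim or extend by hand as described above.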
Figure 5. Merging parts of one ontology into another.
Knowledge Base Merging: The merging operation allows a designer to
augment one ontology or knowledge base by extracting and incorporating selected
parts from independently developed ontologies (shown in Figure 5). A prototype
of an interactive merging capability was demonstrated at the I*3 workshop
in November 1996. In this demonstration, Ontosaurus was used to augment
the C2 Schema (JTF-ATD) with more detailed information about combat aircraft
from an independently developed ACP ontology (ARPI). During this demonstration,
we were able to extend the C2 Schema by approximately 200 additional concepts
within 10 minutes, a process which would take hours to days to complete
without this tool.
The extraction and merging operations present a variety of technical challenges.
For example, the two knowledge bases may use different naming conventions,
they may refer to the same underlying concept with different names, or to different
concepts with the same name. Furthermore, because ontologies are richly interconnected,
the process of extracting only selected concepts requires deciding how to
isolate the desired portions from the remainder of the ontology. Finally,
the axioms and definitions written in the context of the originating ontology
must be reinterpreted and made consistent with those in the receiving context.
Some of these problems can be solved pragmatically. For example, the problem
with naming conventions is solved in Ontosaurus using a mapping table
that can translate between common naming conventions. The problem of name
mismatch between ontologies can be solved by mapping concepts in both ontologies
to a common reference ontology (e.g. SENSUS) and using the reference ontology
for name alignment. The problem of different definitions can be resolved
by merging the definitions and allowing the user to resolve conflicts when
they arise. The problem of lifting axioms from one context to another,
however, is considerably harder and requires fundamental advances in the
theory of contexts. We will collaborate with Professor John McCarthy (who
has independently submitted a proposal for the development of the theory
of contexts) on further development of the formal theory of contexts with
the goal of (1) providing formal understanding of the process of combining
ontologies, and (2) pragmatically making the merging of ontologies efficient.
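The name-alignment step described above can be illustrated with a minimal sketch. It assumes each ontology already carries a mapping from its local concept names to identifiers in a shared reference ontology (the role SENSUS plays in the text). The function names, the `ref:` identifiers, and the sample data are hypothetical and are not taken from Ontosaurus.

```python
def align_names(map_a, map_b):
    """Pair local names from two ontologies that map to the same reference concept."""
    by_ref = {}
    for name_b, ref in map_b.items():
        by_ref.setdefault(ref, []).append(name_b)
    aligned = []
    for name_a, ref in map_a.items():
        for name_b in by_ref.get(ref, []):
            aligned.append((name_a, name_b, ref))
    return sorted(aligned)


def homonyms(map_a, map_b):
    """Names used in both ontologies but mapped to different reference concepts."""
    shared = map_a.keys() & map_b.keys()
    return sorted(name for name in shared if map_a[name] != map_b[name])


# Hypothetical mappings: local concept name -> reference-ontology identifier.
c2_schema = {"Fighter": "ref:fighter-aircraft", "Tanker": "ref:refueling-aircraft"}
acp_ontology = {"FighterPlane": "ref:fighter-aircraft", "Tanker": "ref:tank-truck"}

print(align_names(c2_schema, acp_ontology))  # same concept, different names
print(homonyms(c2_schema, acp_ontology))     # same name, different concepts
```

In this sketch the first function covers the case of one concept under two names, and the second covers the case of one name for two concepts; conflicts that remain would still be passed to the user for resolution.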
Semantic Analysis and Maintenance: General KR systems have traditionally
provided powerful reasoning services aimed towards the development of knowledge-based
systems. Many of these reasoning services can also be used to develop intelligent
tools for the development and maintenance of ontologies. Ontology maintenance,
however, requires additional meta-level reasoning capabilities that allow
tools to inspect inferences and their logical dependencies. Through the
use of Ontosaurus we have identified a number of such reasoning services.
These include: (a) identifying undefined concepts and relations, (b) identifying
inconsistencies and conflicts within ontologies, (c) verifying completeness,
and (d) identifying semantic differences between related concepts. These
capabilities are described below.
Undefined terms: The construction of a domain ontology is an incremental
process. While this process is in progress, many terms remain undefined.
It is essential that the ontology development environment identify these
undefined items and allow the user to incrementally complete the ontologies.
Furthermore, the KR system must continue to function in the presence of
undefined concepts. The current implementation of Ontosaurus provides the
capability for identifying and editing undefined concepts. The proposed
implementation will extend this capability by providing an agenda mechanism
similar to that implemented in the EXPECT system [Gil and Melz 1996;
Swartout and Gil 1995] to further enhance the usability of this function.
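As a rough illustration of identifying undefined terms, the sketch below scans a knowledge base for terms that are referenced by some definition but have no definition of their own, and groups them into a simple agenda. The dictionary representation and the function name are assumptions made for the example; they do not describe the Ontosaurus or EXPECT implementations.

```python
def undefined_terms(definitions):
    """Return {undefined term: sorted list of concepts that reference it}.

    `definitions` maps each defined concept to the terms its definition uses.
    """
    agenda = {}
    for concept, referenced in definitions.items():
        for term in referenced:
            if term not in definitions:
                agenda.setdefault(term, []).append(concept)
    return {term: sorted(users) for term, users in sorted(agenda.items())}


# Hypothetical, partially built ontology: "Weapon" and "Radar" are not yet defined.
kb = {
    "Aircraft": [],
    "CombatAircraft": ["Aircraft", "Weapon"],
    "Fighter": ["CombatAircraft", "Radar"],
    "Bomber": ["CombatAircraft", "Weapon"],
}

print(undefined_terms(kb))
```

Each agenda entry records which concepts depend on the missing term, so the user can decide which gaps to fill first while the rest of the knowledge base stays usable.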
Inconsistencies and conflicts: A good ontology construction tool
must provide capabilities for identifying conflicts and inconsistencies.
Furthermore, when such conflicts arise, the tool must assist the user in identifying
the source of the problem and resolving it. Many KR systems detect logical
conflicts, but do not provide enough information to analyze the sources
of conflicts (i.e. sets of conflicting assumptions). In the proposed effort
we will develop reasoning capabilities for identifying sources of conflicts
and develop tools to assist users in detecting and resolving inconsistencies.
Completeness: The completeness of an ontology can be characterized
in two ways. First, within the knowledge specification of the ontology,
we can verify that all referenced terms are defined, all required fields
of an instance are filled, that when a concept is exhaustively partitioned
all its instances belong to exactly one of the partition subclasses,
and that all subsumption relations implicit in the definitions have been
made explicit. These capabilities are not only useful during the initial
construction of an ontology, but are also essential for continued integrity
and maintenance of the knowledge bases. Second, we can determine the completeness
of a KB with respect to the knowledge needed for problem solving, a task commonly
performed by knowledge acquisition tools such as the EXPECT system. In a
separate proposal submitted to the HPKB BAA, our team will work closely
with ISX (or other selected integration contractors) to integrate Knowledge
Builder with problem-solvers and knowledge acquisition tools to support
this functionality.
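One of the checks listed above, that every instance of an exhaustively partitioned concept belongs to exactly one partition subclass, can be sketched as follows. The data layout (a dictionary from instance name to the set of classes it belongs to) and all names are illustrative assumptions only.

```python
def partition_violations(parent, partition, instance_types):
    """Find instances of `parent` that do not fall in exactly one partition subclass.

    Returns {instance: sorted list of the partition subclasses it belongs to}.
    An empty list means the instance is not covered; two or more means overlap.
    """
    violations = {}
    for instance, types in instance_types.items():
        if parent not in types:
            continue  # not an instance of the partitioned concept
        hits = sorted(set(partition) & set(types))
        if len(hits) != 1:
            violations[instance] = hits
    return violations


# Hypothetical instances; "Aircraft" is partitioned into FixedWing and RotaryWing.
instance_types = {
    "f16": {"Aircraft", "FixedWing"},
    "v22": {"Aircraft", "FixedWing", "RotaryWing"},
    "x1": {"Aircraft"},
    "m1": {"Vehicle"},
}

print(partition_violations("Aircraft", ["FixedWing", "RotaryWing"], instance_types))
```

Here `v22` is reported because it falls in two subclasses and `x1` because it falls in none, while `m1` is ignored since it is not an instance of the partitioned concept.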
Semantic differences between related definitions: While composing,
merging or updating ontologies, one is often faced with the need to identify
semantic differences between two concepts (similar concepts in different
ontologies, or the same concept in different versions of a single ontology).
To assist in this process, we will develop reasoning capabilities that can
compare two concepts to identify semantic differences between them and develop
tools for comparing different versions of an ontology. Pairwise comparison
between concepts is needed both for merging and maintenance.
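A pairwise comparison of this kind might look like the sketch below. It assumes a deliberately simple concept representation (a set of parent concepts plus a dictionary of role restrictions) and compares only the stated definitions; a real tool would presumably compare the definitions after classification, so that implied differences are also found. All names are hypothetical.

```python
def concept_diff(a, b):
    """Report differences between two concept definitions.

    Each concept is a dict with "parents" (a set) and "roles" (role -> filler type).
    """
    roles_a, roles_b = a["roles"], b["roles"]
    common = sorted(roles_a.keys() & roles_b.keys())
    return {
        "parents_only_in_a": sorted(a["parents"] - b["parents"]),
        "parents_only_in_b": sorted(b["parents"] - a["parents"]),
        "roles_only_in_a": sorted(roles_a.keys() - roles_b.keys()),
        "roles_only_in_b": sorted(roles_b.keys() - roles_a.keys()),
        "roles_changed": {r: (roles_a[r], roles_b[r])
                          for r in common if roles_a[r] != roles_b[r]},
    }


# Two hypothetical versions of the same concept.
fighter_v1 = {"parents": {"Aircraft"},
              "roles": {"crew": "Person", "armament": "Weapon"}}
fighter_v2 = {"parents": {"Aircraft", "MilitaryAsset"},
              "roles": {"crew": "Pilot", "range": "Distance"}}

print(concept_diff(fighter_v1, fighter_v2))
```

The same function serves both uses named in the text: comparing similar concepts from two ontologies during merging, and comparing two versions of one concept during maintenance.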
Translation Toolkit: Although the ontologies in the ontology library
are represented in a common representation language, different knowledge-based
systems that use these ontologies will no doubt use a variety of knowledge
representation systems and programming languages. To fully benefit from
a common ontology development environment, it is necessary to make the ontologies
developed in Ontosaurus available in different languages and KR formalisms.
The current Ontosaurus system has a small number of translators for producing
KIF, Ontolingua, KRSS, and C++. In a companion proposal (PALINDROME, ISX),
we will develop a common translation toolkit for the rapid construction
of translators based on declarative mappings between PowerLoom and the target
language. We will use this toolkit to create additional translators from
Loom to IDL and Java. We expect our users to be able to assemble additional
translators for translating ontologies into frame-based and logic-based
representation languages.
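The idea of building translators from declarative mappings can be sketched in a few lines. In this illustration each target language is described by a template rather than by translator code, so adding a target means adding one table entry. The template table, the function name, and the reduction of a concept to a (name, parent) pair are all assumptions made for the example and say nothing about the design of the proposed toolkit.

```python
# Hypothetical declarative mappings: one output template per target language.
TEMPLATES = {
    "java": "public class {name} extends {parent} {{}}",
    "idl": "interface {name} : {parent} {{}};",
}


def translate(concepts, target):
    """Render (name, parent) concept pairs using the template for `target`."""
    template = TEMPLATES[target]
    return [template.format(name=name, parent=parent) for name, parent in concepts]


concepts = [("CombatAircraft", "Aircraft"), ("Fighter", "CombatAircraft")]

for line in translate(concepts, "java"):
    print(line)
for line in translate(concepts, "idl"):
    print(line)
```

A realistic mapping would also have to cover roles, constraints, and axioms that have no direct counterpart in the target language, which is where loss of fidelity would need to be reported.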
C.3. PowerLoom
ISI's Loom and PowerLoom systems have both been designed to provide high-end
KRS services. Loom was developed originally as a general-purpose KRS, but
it has always provided certain services highly useful to an ontology server--the
ability to reason fluently with definitions is a hallmark of classifier-based
KR systems. ISI's Ontosaurus tool is designed to exploit the inference capabilities
that Loom makes available. The combination of Ontosaurus and Loom results
in an ontology manager that is quite advanced along certain dimensions.
Loom can immediately validate knowledge as it is entered into an ontology
using Ontosaurus--it flags any classes or instances whose information (definitions
and/or assertions) is inconsistent. Loom also organizes definitions into
a hierarchy, and classifies instances below the most specific definitions
that they satisfy. The inferences generated by the classifier enable users
to view the logical structure of a knowledge base, and to trace implied
as well as explicitly-defined relationships.
A capable ontology server must support a rather long list of features. Below
we list eight features that help to distinguish existing KRSs that can fill
the role of ontology server. Table 1 in Section H (Comparison with Ongoing
Work) compares several systems using this feature set. Ideally, an ontology
server will provide all of these features; unfortunately most of the KRSs
listed (including Loom) fail to support many of them.
- Scalable Knowledge Base. The server can access a knowledge base of
definitions and axioms larger than what can be stored in run-time memory. It
will efficiently support navigation across the entire knowledge space.
- Fully expressive. The server will allow concept definitions and axioms
at least as expressive as the first order predicate calculus.
- Efficient deductive support for ontology validation and organization.
Today, only servers that include a concept classifier provide an adequate
combination of efficiency and depth of inference to support this feature.
- Deliverable in a commercially-oriented language. Today, this means a
C or C++ version of the server.
- Query processor. The server supports a query language, equivalent to
SQL in power, to query all parts of an ontology.
- Concept editing. Users can edit already loaded concepts (some classifier-based
KRSs don't support this).
- Contexts. Knowledge can be partitioned into a hierarchy of contexts,
some of which inherit from others. Reusable ontologies depend on this form
of modularity.
- Scalable Inference. Multiple machines can run in parallel to solve an
inference problem (e.g., to classify a concept hierarchy).
We consider the first three features to be essential. Scalability
is crucial to the successful marriage of ontology-based software with large
scale applications; Expressivity ensures that users will not outgrow
the representation language; Deductive support is key to providing
intelligent assistance to an ontology engineer. No KRS in use today provides
more than one of these three key features. The PowerLoom KRS being implemented
as a successor to Loom provides or lays the groundwork for each of the eight
features. The enhancements to PowerLoom detailed in this proposal round
out the feature set. In particular, the enhanced PowerLoom provides scalability,
enabling it to manage very large knowledge bases, and it provides distributed
inference--any combination of processing resources available on a PowerLoom
server or client can contribute to the inferencing needed by the server.
The proposed effort also completes the functionality of PowerLoom's just-in-time
classifier, which fixes a problem (exhibited by all other classifier-based
KRSs) of delayed system response due to over-aggressive inferencing. Summarizing,
a PowerLoom KRS, enhanced with the capabilities proposed in this effort,
will significantly advance the capabilities that users and applications
can expect from an ontology server, or from a general-purpose KRS, particularly
in terms of the scale of knowledge bases that can be managed and reasoned
with.
In the remainder of this section, we outline our approach to enhancing PowerLoom.
In the process, we will discuss additional problems and benefits that accompany
our approach. As noted in Table 1, the PowerLoom system being developed
under an existing DARPA contract will provide many of the features we consider
important in an ontology server. This effort will round out those features,
with a particular emphasis on scaling and performance issues. The proposed
work breaks down into the following tasks:
- Knowledge Pager. The Knowledge Pager manages the transfer (paging) of
knowledge from a backend storage system (e.g., an RDBMS) into run-time memory.
The addition of the Knowledge Pager to PowerLoom enables it to manage very
large knowledge bases. A research task is to efficiently store and index
complex knowledge structures that don't conform to a tabular or object representation.
- Built-in Schema Mapper. Enhancing the Knowledge Pager with the ability
to map between pairs of schemas enables PowerLoom to access data from data
stores (e.g., legacy data) whose schemas don't conform to its own schemas.
- Configurable Distributed Inference Architecture. By generalizing the
Knowledge Pager to interface PowerLoom to another PowerLoom server, we can
produce multiple client/server configurations. Then, by adding a mechanism
for remote classification and remote query processing, PowerLoom can be
configured to reduce bottleneck computations and to accelerate classification
and queries using parallel processing.
- Preference-based (Just-in-Time) Classifier. While conventional concept
classifiers introduce a computational overhead that increases at least linearly
with the size of a knowledge base, PowerLoom's just-in-time classifier has
a sublinear overhead (proportional to the quantity of data accessed by an
application). We will experiment with preference heuristics that maximize
PowerLoom's ability to focus the classifier's efforts only on knowledge
accessed by a user or application.
- Java-based PowerLoom. PowerLoom is programmed in a language called STELLA
that translates into either C++ or Common Lisp. By developing a STELLA-to-Java
translator, we can release C++, Common Lisp and Java versions of the PowerLoom
KRS.
- Extensible Classification. Incompleteness of classification-based deduction
is unavoidable in a fully expressive KRS. Enhancing PowerLoom with user-extensible
classification rules will enable users to augment the deductive capabilities
of the classifier to satisfy the specific demands of their applications.
Figure 6 illustrates a scalable KRS architecture designed to accommodate
very large knowledge bases and exploit parallel processing power to achieve
scalable inference. A minimal configuration consists of a central server
accessing a knowledge base stored in main memory, connected to a Web-based
client (Ontosaurus). Scalable storage is achieved by interfacing the server
to a backend relational DBMS (RDBMS) or other commercial storage system.
The coupling between the central server and the RDBMS is achieved using
a "wrapper" module referred to as the "Knowledge Pager"
(KPager).
Figure 6. Configurable Distributed Inference Architecture
The Knowledge Pager mediates all accesses between the server and the RDBMS,
pulling knowledge structures into the KRS's local memory on demand. The
Knowledge Pager wrapper comes in three versions. The first interfaces to
the RDBMS. The second version attaches to a client, enabling the client
to "page in" knowledge from a remote server. A third version attaches
to a "satellite" KRS, and enables that KRS to upload knowledge
from a server over a (high bandwidth) local network. These different wrappers
make it possible to assemble many different client/server/satellite configurations.
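The on-demand behavior of the first wrapper version (the one that interfaces to the RDBMS) can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in backend and a small least-recently-used cache as a stand-in for the KRS's local memory. The class, the table layout, and the eviction policy are assumptions for illustration and do not describe the proposed Knowledge Pager design.

```python
import sqlite3
from collections import OrderedDict


class KnowledgePager:
    """Pull concept definitions from a backing store on demand, with a bounded cache."""

    def __init__(self, conn, capacity=2):
        self.conn = conn
        self.capacity = capacity
        self.cache = OrderedDict()  # concept name -> definition, oldest first
        self.page_ins = 0           # number of fetches that went to the backend

    def get(self, name):
        if name in self.cache:
            self.cache.move_to_end(name)  # mark as most recently used
            return self.cache[name]
        row = self.conn.execute(
            "SELECT definition FROM concepts WHERE name = ?", (name,)
        ).fetchone()
        if row is None:
            raise KeyError(name)
        self.page_ins += 1
        self.cache[name] = row[0]
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used entry
        return row[0]


# Hypothetical backend contents.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE concepts (name TEXT PRIMARY KEY, definition TEXT)")
conn.executemany(
    "INSERT INTO concepts VALUES (?, ?)",
    [("Aircraft", "a vehicle that flies"),
     ("Fighter", "an armed, fast aircraft"),
     ("Bomber", "an aircraft that carries bombs")],
)

pager = KnowledgePager(conn, capacity=2)
for name in ["Aircraft", "Aircraft", "Fighter", "Bomber"]:
    pager.get(name)
print(pager.page_ins, list(pager.cache))
```

Because callers see only `get`, the same interface could in principle be backed by a remote server instead of a database, which is the substitution the client and satellite wrapper versions rely on.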
Scalable inference is achieved by (i) moving inference from the central
server onto client-side servers or satellite workstations, and by (ii) eliminating
the need to reason with knowledge not directly accessed by a user or application.
The configurable architecture just described provides the scalable processing
power needed to achieve (i). The other piece of needed technology is a "just-in-time"
(JIT) classifier. Conventional concept classifiers operate efficiently only
in a "batch mode" where they classify all instances and concepts
in a knowledge base in one pass. This approach to classification does not
scale well. A JIT classifier is capable of classifying any subset of existing
concepts and instances, in any order. PowerLoom implements the world's first
and only JIT concept classifier. Having a JIT classifier means that portions
of a knowledge base can be transferred onto a client or satellite processor
and classified independently of the central server. The results of such
classifications (which have a very concise encoding) are returned to the
server to be shared by all clients. Thus, the sharing of processing described
in (i) above becomes feasible. The batch mode approach to classification
used by conventional classifiers requires that all concepts and instances
in a knowledge base be classified. This makes the process of loading very
large knowledge bases increasingly slow, i.e., the batch classification
process does not scale well (for example, Loom hits an upper limit around
50,000 concepts). A JIT classifier makes it possible to classify only those
portions of a knowledge base actually referenced by an application, leaving
the remainder of the KB untouched. The overhead of classification can be
sublinear in the size of the KB, meeting the objective of (ii) above.
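The contrast between batch and on-demand classification can be made concrete with a toy sketch. It assumes concepts are defined by sets of required features, so that one concept subsumes another when its features are a subset of the other's. This is not the PowerLoom classifier, and this toy version still scans every definition for each request; it illustrates only that concepts can be classified individually, in any order, and that untouched concepts incur no classification work.

```python
class LazyClassifier:
    """Compute a concept's subsumers only when that concept is first accessed."""

    def __init__(self, definitions):
        self.definitions = definitions  # concept -> set of required features
        self.subsumers = {}             # cache, filled only for accessed concepts

    def classify(self, concept):
        if concept not in self.subsumers:
            features = self.definitions[concept]
            self.subsumers[concept] = sorted(
                other
                for other, other_features in self.definitions.items()
                if other != concept and other_features <= features
            )
        return self.subsumers[concept]


# Hypothetical feature-based definitions.
definitions = {
    "Aircraft": {"flies"},
    "CombatAircraft": {"flies", "armed"},
    "Fighter": {"flies", "armed", "fast"},
    "Tanker": {"flies", "carries-fuel"},
}

classifier = LazyClassifier(definitions)
print(classifier.classify("Fighter"))  # subsumers of the one concept we asked about
print(sorted(classifier.subsumers))    # concepts classified so far
```

The cached results are small (a list of names per concept), which is in the spirit of the concise encoding that a client or satellite would return to the central server.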
A. Bibliography
[Abbrett and Burstein, 1987] G. Abbrett and Mark Burstein. The KREME knowledge
editing environment. In International Journal of Man-Machine Studies (1987)
27, pp. 103-126.
[Basili et. al. 1996] V. R. Basili, L. C. Briand, and W. L. Melo. How Reuse
Influences Productivity in Object-Oriented Systems. In the Communications
of the ACM, 39:10:104-116, October 1996.
[Bateman et. al. 1989] J. A. Bateman, R.T. Kasper, J.D. Moore, and R.A.
Whitney. A General Organization of Knowledge for Natural Language Processing:
The Penman Upper Model. Unpublished research report, USC/Information Sciences
Institute, Marina del Rey. 1989.
[Chandrasekaran and Johnson 1993] B. Chandrasekaran and T.R. Johnson. Generic
Tasks and Task Structures: History, Critique and New Directions. In J.-M.
David, J.-P. Krivine and R. Simmons (eds.) Second Generation Expert Systems
(Berlin) 1993.
[Finin et al. 1994] T. Finin, D. McKay, R. Fritzson, and R. McEntire, "KQML:
An Information and Knowledge Exchange Protocol", in Kazuhiro Fuchi
and Toshio Yokoi (Ed.), Knowledge Building and Knowledge Sharing,
Ohmsha and IOS Press, 1994.
[Fowler, Cross and Owens 1995] N. Fowler III, S. E. Cross and C. Owens.
The ARPA-Rome Knowledge Based Planning and Scheduling Initiative, IEEE
Expert, 10(1) 4-9. February 1995.
[Gaines 1994] B.R. Gaines. Class library implementation of an open architecture
knowledge support system. International Journal of Human-Computer Studies
41(1/2) 59-107, 1994.
[Gaines and Shaw 1995] B. R. Gaines and M. L. G. Shaw. WebMap Concept Mapping
on the Web. In the Proceedings of the Fourth International World Wide
Web Conference, Boston, Massachusetts. 1995.
[Genesereth 1991] M. R. Genesereth. Knowledge Interchange Format. Principles
of Knowledge Representation and Reasoning: Proceedings of the Second International
Conference, Cambridge, MA, pages 599-600. Morgan Kaufmann, 1991.
[Genesereth and Fikes 1992] M. R. Genesereth, R. E. Fikes (Editors). Knowledge
Interchange Format, Version 3.0 Reference Manual. Computer Science Department,
Stanford University, Technical Report Logic-92-1, June 1992.
[Gil and Melz 1996] Y. Gil and E. Melz. Explicit Representation of Problem-Solving
Strategies to Support Knowledge Acquisition. In Proceedings of the Thirteenth
National Conference on Artificial Intelligence, Portland, Oregon, 1996.
[Gruber 1993] T. R. Gruber. Toward principles for the design of ontologies
used for knowledge sharing. In Formal Ontology in Conceptual Analysis
and Knowledge Representation, N. Guarino and R. Poli, editors, Kluwer
Academic, in preparation. Original paper presented at the International
Workshop on Formal Ontology, March 1993. Available as Stanford Knowledge
Systems Laboratory Report KSL-93-04.
[Gruber and Tenenbaum 1992] T. R. Gruber, J. M. Tenenbaum, and J. C. Weber.
Toward a knowledge medium for collaborative product development. Proceedings
of the Second International Conference on Artificial Intelligence in Design,
Pittsburgh, pages 413-432. Kluwer Academic, 1992.
[Gruber 1992] T. R. Gruber. Ontolingua: A mechanism to support portable
ontologies. Stanford University, Knowledge Systems Laboratory, Technical
Report KSL-91-66, March 1992.
[Guarino and Carrara 1993] N. Guarino, M. Carrara, and P. Giaretta. An Ontology
of Meta-Level Categories. LADSEB-CNR Int. Rep. 6/93, Preliminary version
- November 30, 1993.
[Hatzivassiloglou and Knight 1995] V. Hatzivassiloglou and K. Knight. Unification-Based
Glossing. Proceedings of the 14th IJCAI Conference. Montreal, Quebec. 1995.
[Hovy and Nirenburg 1992] E. H. Hovy and S. Nirenburg. Approximating an
Interlingua in a Principled Way. Proceedings of the DARPA Speech and
Natural Language Workshop. Arden House, NY. 1992.
[Hovy and Knight 1993] E. H. Hovy and K. Knight. Motivation for Shared Ontologies:
An Example from the Pangloss Collaboration. Proceedings of the Workshop
on Knowledge Sharing and Information Interchange, IJCAI-93. Chambéry,
France. 1993.
[Karp et al. 1995] P. D. Karp, K. Myers and T. Gruber, The Generic Frame
Protocol. Proceedings of the 1995 International Joint Conference on Artificial
Intelligence, pp. 768-774, 1995.
[Karp et al. 1994] P. D. Karp, S. M. Paley and I. Greenberg. A Storage System
for Scalable Knowledge Representation. Proceedings of the Third International
Conference on Information and Knowledge Management, 1994.
[Karp and Paley 1995] P. D. Karp and S. M. Paley. Knowledge representation
in the large. Proceedings of the 1995 International Joint Conference
on Artificial Intelligence, pp. 751-758, 1995.
[Knight and Luk 1994] K. Knight and S. Luk. Building a Large Knowledge
Base for Machine Translation. Proceedings of the American Association
of Artificial Intelligence Conference (AAAI-94). Seattle, WA. 1994.
[Knight et. al. 1995] K. Knight, I. Chander, M. Haines, V. Hatzivassiloglou,
E. H. Hovy, M. Iida, S.K. Luk, R.A. Whitney, and K. Yamada. 1995. Filling
Knowledge Gaps in a Broad-Coverage MT System. Proceedings of the 14th IJCAI
Conference. Montreal, Quebec.
[Lenat and Guha 1990] D. B. Lenat and R. V. Guha. Building Large Knowledge-Based
Systems: Representation and Inference in the CYC Project. Addison-Wesley
Publishing Company, Inc., Reading, Massachusetts. 1990.
[MacGregor 1994] R. M. MacGregor. A Description Classifier for the Predicate
Calculus. In Proceedings of the Twelfth National Conference on Artificial
Intelligence, (AAAI-94), 1994.
[MacGregor 1991] R. M. MacGregor. Using a Description Classifier to Enhance
Deductive Inference. In Proceedings of the Seventh IEEE Conference on AI
Applications, 1991.
[MacGregor 1991] R. MacGregor. The Evolving Technology of Classification-Based
Representation Systems. In J. Sowa (ed.), Principles of Semantic Networks:
Explorations in the Representation of Knowledge. Morgan Kaufmann, 1990.
[Mallery 1994] John C. Mallery. A Common LISP Hypermedia Server, in Proceedings
of the First International Conference on The World-Wide Web, Geneva
CERN, May 25, 1994.
[McGuire et. al. 1993] J. G. McGuire, D. R. Kuokka, J. C. Weber, J. M. Tenenbaum,
T. R. Gruber, G. R. Olsen. SHADE: Technology for Knowledge-Based Collaborative
Engineering. Journal of Concurrent Engineering: Applications and Research
(CERA), 1(2), September 1993.
[Michalski 1980] R. S. Michalski. Knowledge Acquisition Through Conceptual
Clustering: A Theoretical Framework and an Algorithm for Partitioning Data
into Conjunctive Concepts. Policy Analysis and Information Systems,
4(3) 219-244, 1980.
[Miller 1990] G. Miller. WordNet: An on-line lexical database. International
Journal of Lexicography, 3(4) (special issue). 1990.
[Neches et. al. 1991] R. Neches, R. Fikes, T. Finin, T. Gruber, R. Patil,
T. Senator, & W. R. Swartout. Enabling technology for knowledge sharing.
AI Magazine, 12(3):16-36, 1991.
[Patel-Schneider et. al. 1993] DRAFT of the specification for Description
Logics, produced by the KRSS working group of the DARPA Knowledge Sharing
Effort. updated July 1993.
[Swartout and Gil 1995] W.R. Swartout and Y. Gil. EXPECT: Explicit Representations
for Flexible Acquisition. In Proceedings of the Ninth Knowledge Acquisition
for Knowledge-Based Systems Workshop, Banff, Alberta, 1995.
[Swartout et. al. 1993] W. R. Swartout, R. Neches and R. Patil. Knowledge
Sharing: Prospects and Challenges. In Proceedings of the International
Conference on Building and Sharing of Very Large-Scale Knowledge Bases `93,
Tokyo, Japan 1993.
[Swartout et. al. 1996] W. R. Swartout, R. Patil, K. Knight, and T. Russ.
Toward Distributed Use of Large-Scale Ontologies. In Proceedings of the
Banff Knowledge Acquisition Workshop, Banff, Canada, Nov. 1996.
[Tate 1996] A. Tate. Towards a Plan Ontology. Journal of the Italian AI
Association (AIIA) January 1996.
[Wielinga and Breuker 1986] B.J. Wielinga and J.A. Breuker, Models of Expertise,
ECAI 1986, 497-509.