Tools for Assembling and Managing Scalable Knowledge Bases


Robert MacGregor & Ramesh S. Patil
USC Information Sciences Institute
4676 Admiralty Way
Marina del Rey, CA 90292
Tel: (310) 822-1511 Fax: (310) 823-6714
Email: MacGregor@isi.edu, ramesh@isi.edu

A. Abstract

The HPKB program aims to produce technology that will enable system developers to rapidly construct large, reusable, and maintainable ontologies and knowledge bases (KBs). In HPKB, these large-scale KBs will not be built from scratch, but rather assembled from libraries of existing resources. However, current tools for constructing and maintaining ontologies suffer from some important limitations. We propose to develop "Knowledge Builder", a system for building and managing scalable knowledge bases that will address these limitations. Two key technology components make this possible. The first, ISI's Ontosaurus, a web-based collaboration system for ontology browsing and editing, is pioneering scalable KB development. Ontosaurus provides prototype capabilities for composing, pruning, merging, and extracting KBs, enabling rapid construction and tailoring of new KBs from existing ones. Using Loom, Ontosaurus already provides some capabilities for organizing concepts in a hierarchy, detecting conflicts, and validating completeness. The Knowledge Builder will expand these capabilities with a full suite of operations for KB assembly and manipulation. It will provide a powerful set of semantic tools to aid in organizing, debugging, validating, and maintaining ontologies and knowledge bases.

The other key component is ISI's PowerLoom KR system. A successor to Loom, PowerLoom is highly expressive, with multiple built-in deductive reasoning capabilities including a query processor, a description classifier, and a context mechanism. The proposed effort will extend the PowerLoom system with the view, abstraction, and mapping components needed to support KB construction. We will make PowerLoom scalable by interfacing it to a secondary storage system. The combination of a "just in time classifier" and a modular inference architecture will allow PowerLoom to dynamically add workstations to meet the demands on its reasoning services, thus allowing it to handle a truly large library of ontologies and knowledge bases.

Knowledge Builder will provide a powerful Web-based collaboration environment for assembly, maintenance and export of large knowledge bases, supported by a scalable, high-performance KR system whose reasoning services will be utilized by end-user applications as well as by the Web tools. The ontologies, knowledge bases and knowledge-based systems developed using Knowledge Builder will be delivered in Common Lisp, C++, and Java to facilitate integration with AI and conventional software.

We will support the integration of Knowledge Builder with other HPKB technology products and demonstrate its application to HPKB challenge problem domains.

B. Innovative Claims

As discussed in the Abstract, no existing system is capable of providing the full range of capabilities needed to construct, manage, and reason with very large, high-performance knowledge bases. Our major innovative claim is the development of a new system, Knowledge Builder, that achieves this objective. Knowledge Builder will be a Web-based collaboration system for assembling, validating, maintaining, and exporting large ontologies and knowledge bases. Underlying Knowledge Builder will be a scalable KR system able to represent and reason effectively with a library of knowledge bases too large to fit within the virtual memory of a single workstation. Knowledge Builder will demonstrate unprecedented interoperability. It will: export ontologies and knowledge bases in a variety of KR languages such as KIF, Ontolingua, KRSS, Loom, and GFP; automatically generate and populate object schemas in the CLOS, C++, and Java object systems; generate an IDL interface specification for integration with service-based HPKB integration architectures; and support HPKB system development in the Lisp, C++, and Java programming languages.

Implementing the entire range of capabilities envisioned for Knowledge Builder represents an ambitious undertaking. The proposed effort will leverage two DARPA-funded systems developed at the University of Southern California's Information Sciences Institute (ISI): the Ontosaurus Ontology server and the PowerLoom KR system. These systems will be extended and integrated to develop the Knowledge Builder system. Technology advances are needed in several areas to achieve the capabilities planned for Knowledge Builder. These innovations are described below.

Knowledge Builder will include the first ontology tool able to construct new KBs by assembling subtrees or subnetworks of other knowledge bases. The assembly tool will implement a variety of complex knowledge base operations, including composing, extracting, pruning, and merging of coarse-grained knowledge structures. Existing ontology tools are designed to add, modify, or delete a single axiom/frame/rule at a time, or to compose entire knowledge bases. A coarse-grained editing capability is essential to the task of rapidly assembling large (high performance) knowledge bases from existing, reusable knowledge bases developed for related but distinct purposes. At DARPA's Fall 1996 I*3 workshop, Ontosaurus demonstrated a preliminary coarse-grained editor merging parts of two different knowledge bases. We plan to collaborate with Prof. John McCarthy to develop a formal theory for ontology assembly operations.

As knowledge bases become larger, it becomes essential to provide automated support for evaluating the quality, correctness, and completeness of a knowledge base--humans will not be able to cope with the complexity unaided. While small-scale examples of semantic validation tools abound, no comprehensive system exists for validating a knowledge base. Until recently, this lack could be attributed in part to the difficulty of developing a portable KRS interface. The advent of Web technology, combined with standard languages like the Knowledge Interchange Format (KIF) and translator technology, now makes it practical to make a large investment in semantic validation software. We are proposing to incorporate a comprehensive set of semantically-motivated checks into an Ontosaurus-based tool. This tool, coupled with our translation technology, will enable validation to be applied to a broad range of ontologies and knowledge bases: those developed using Knowledge Builder as well as those developed using other KR systems. A prototype of this capability was demonstrated at the I*3 workshop by importing a subset of the C2 Schema encoded in Ontolingua, classifying it, and returning the augmented version to the Ontolingua system.

While database management system technology considers scalability--the ability to manage databases too large to fit in a workstation's virtual memory--as a fundamental requirement, few KRSs are scalable. The Loom KRS provides one example--a module developed at SRI enables Loom to manage knowledge bases stored in a relational database management system (RDBMS). The University of Maryland's PARKA system is also scalable. However, no existing KRS is both scalable and has a fully-expressive representation language. We have designed the knowledge structures and inference procedures in PowerLoom with scalability in mind. In this effort we propose to implement our design, making PowerLoom the first fully-expressive and scalable KRS.

Loom and PowerLoom belong to a family of KRSs ("KL-ONE-like" or "description logic" systems) that implement a specialized reasoner called a "description classifier". A description classifier is uniquely suited to the task of organizing conceptual networks into semantic hierarchies, and for validating the consistency of a knowledge base. Classifier inferences help to make explicit semantic relationships and derived facts that exist implicitly in a knowledge base, thereby assisting users in visualizing the contents and consequences of a complex base of axioms and facts. Classifiers are also able to detect contradictory definitions, thereby providing a semantic check on the integrity of a domain model.
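The role of a description classifier can be illustrated with a deliberately small sketch. It assumes a toy encoding in which each concept is just a conjunction of primitive features, far less expressive than Loom or PowerLoom definitions; the names `Concept`, `subsumes`, `most_specific_subsumers`, and `is_contradictory` are hypothetical and are not part of either system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    name: str
    features: frozenset  # conjunction of primitive features (toy encoding, an assumption)


def subsumes(general, specific):
    """`general` subsumes `specific` when every feature it requires is also required by `specific`."""
    return general.features <= specific.features


def most_specific_subsumers(new, concepts):
    """Names of the parents under which a classifier would place `new`."""
    parents = [c for c in concepts
               if c.name != new.name and subsumes(c, new) and not subsumes(new, c)]
    # Drop any parent that strictly subsumes another parent: it is not most specific.
    return sorted(
        p.name for p in parents
        if not any(q is not p and subsumes(p, q) and not subsumes(q, p) for q in parents)
    )


def is_contradictory(concept):
    """A definition is contradictory if it requires a feature and its negation ("not-x")."""
    return any(("not-" + f) in concept.features for f in concept.features)


aircraft = Concept("Aircraft", frozenset({"aircraft"}))
armed = Concept("ArmedAircraft", frozenset({"aircraft", "armed"}))
fighter = Concept("Fighter", frozenset({"aircraft", "armed", "fast"}))
broken = Concept("Broken", frozenset({"armed", "not-armed"}))
```

In this sketch the user never states that Fighter is a kind of ArmedAircraft; the relationship is derived from the definitions, which is the sense in which a classifier makes implicit structure explicit.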

Up to now, all classifiers have been designed to operate in a sort of "batch" mode--for every concept and instance entered into a classifier-based KRS the classifier computes subsumption relationships derivable between it and all other concepts. The computational overhead of classification becomes prohibitively large as knowledge bases increase in size. In practice, this prevents a classification-based KRS from managing very large knowledge bases; in other words, it is not scalable. For PowerLoom, we have invented a new mode of classification wherein any portion of a knowledge base can be loaded into PowerLoom's working memory, and the PowerLoom classifier will classify only the main memory-resident knowledge entities. Alternatively, a fully "lazy" strategy, called "just in time" classification, can be adopted wherein concepts and instances are not classified until they are first referenced by a user, GUI or application. In this effort, we will apply the just-in-time classifier to very large knowledge bases, and develop preference heuristics that optimize the classifier's ability to apply its computational resources where they will yield the most value to an application or user. The flexibility provided by the just-in-time classifier architecture will enable us to develop a scalable inference architecture wherein multiple workstations can be harnessed to classify different portions of a knowledge base, with their results returned to a central server that shares those results with all users.
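The "just in time" strategy can be sketched as lazy evaluation with caching. The following is only an illustration under stated assumptions: concepts are encoded as sets of primitive features, subsumption is reduced to a proper-subset test, and the class `LazyClassifier` and its members are hypothetical names rather than PowerLoom interfaces.

```python
class LazyClassifier:
    """Classifies a concept only when it is first referenced, then caches the result."""

    def __init__(self, definitions):
        # definitions: concept name -> set of required primitive features (toy encoding)
        self.definitions = {name: frozenset(f) for name, f in definitions.items()}
        self._subsumers = {}        # cache: concept name -> sorted list of subsumer names
        self.classified_count = 0   # number of concepts actually classified so far

    def subsumers(self, name):
        if name not in self._subsumers:
            feats = self.definitions[name]
            # A concept subsumes `name` if its features are a proper subset of ours.
            self._subsumers[name] = sorted(
                other for other, f in self.definitions.items() if f < feats
            )
            self.classified_count += 1
        return self._subsumers[name]


kb = LazyClassifier({
    "Aircraft": {"aircraft"},
    "Fighter": {"aircraft", "armed"},
    "Tanker": {"aircraft", "refuels"},
})
```

Loading the knowledge base performs no classification at all; work is done only for the concepts a user or application touches, and a repeated reference is answered from the cache.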

As programming languages drift in and out of fashion, large software systems built in those languages can become obsolete, because the expense of porting them to newer languages can be prohibitive. This phenomenon is currently afflicting systems built in Common Lisp--large research systems such as Loom are gradually becoming less useful to certain classes of users as those users migrate to other languages (e.g., to C or C++). ISI has developed a unique programming language called STELLA, tailored for programming intelligent symbol processing applications, that eliminates this problem. Currently, programs written in STELLA can be translated into efficient C++ and Common Lisp programs. We plan to develop a STELLA-to-Java translator as part of this effort. A system programmed in STELLA can therefore be used by the (still considerable) body of researchers that use Common Lisp, as well as by (more product-oriented) users who base their software on C, C++, or Java. PowerLoom is written in STELLA, and hence will run efficiently in three different languages. We also plan to implement major portions of the Ontosaurus tools in STELLA, making them portable as well. The use of STELLA-based technology means that research systems like PowerLoom will be able to transition much more rapidly and smoothly into commercial or product environments.


C. Technical Rationale, Approach, and Plan

C.1. Introduction

The goal of the High-Performance Knowledge Base (HPKB) program is to produce and field the technology system developers need to rapidly and inexpensively construct large comprehensive knowledge-based systems. Unfortunately, acquiring, formalizing, and representing knowledge is difficult and time-consuming. It is therefore hard to build large knowledge-based systems if one continues to start from scratch each time a new system is built. Building the next generation of knowledge-based systems (qualitatively larger than today's systems) will be possible only if we abandon the approach of hand-crafting each knowledge base. Researchers need to develop ontologies and knowledge bases that can be shared and reused. Developers need to collaborate to build up a sizable library of reusable knowledge bases. Implementors must be able to construct domain-specific knowledge bases by composing and specializing reusable knowledge bases. To further enhance the reusability of a KB, developers should also be able to translate a KB to the target application environment, that is, to a specified programming language, KR system, and problem-solver(s).

The problems in sharing and reuse of knowledge have already been identified by DARPA and the research community as a key impediment to wider use of knowledge-based systems technology. This realization led to the establishment of the DARPA Knowledge Sharing Effort [Neches et al. 1991]. Over the past several years, we have started to build the infrastructure for knowledge sharing and reuse. Today there is a significant body of work on pieces of the problem, such as a portable library of ontologies and knowledge bases [Gruber 1993], knowledge interchange formats [Genesereth 1991; Genesereth and Fikes 1992], generic application program interfaces (APIs) for knowledge exchange (e.g. KQML [Finin et al. 1994], GFP [Karp et al. 1995]), and prototype environments for collaborative development and management of ontology libraries. Additional research in the area of generic tasks [Chandrasekaran and Johnson 1993] in the United States and Common KADS [Wielinga and Breuker 1986] in Europe has developed a foundation for characterizing abstract problem-solving strategies and ontologies for describing the tasks and the role knowledge plays in problem-solving.

In parallel, the emergence of the World Wide Web, client-server and intelligent agent architectures, and platform-independent programming languages such as Java now makes it possible to exploit the Internet for the development of distributed collaboration tools. These tools will facilitate team development of knowledge bases and make libraries of knowledge-based components readily available to the community. A number of tools (e.g. Ontosaurus [Swartout et al. 1996] at ISI, and Ontolingua [Gruber 1992] at Stanford) are beginning to provide such capabilities for knowledge base development.

These emerging systems and technologies provide a solid foundation for a new paradigm of knowledge base development and maintenance that will meet the goals of the HPKB program. This effort proposes to develop Knowledge Builder by integrating and extending Ontosaurus, a web-based environment for collaborative development of knowledge bases, with PowerLoom, a highly expressive, scalable knowledge representation and reasoning system designed to provide the reasoning services needed for ontology assembly, validation, and maintenance of very large libraries of ontologies and domain theories. The proposed effort will result in an integrated knowledge base development environment whose architecture brings these tools and techniques together to support comprehensive knowledge base development, validation, and maintenance. In our architecture, a well-integrated set of Web-based tools provides various ontology management services. Underlying these tools is a powerful KRS that manages the knowledge base and provides a variety of inferencing services. The tools exploit the KRS's reasoning capabilities to provide an exceptional degree of assistance to an ontology developer. A KRS used in this capacity is functioning as an "ontology server".

The notions of "ontology server" and "knowledge representation system" (KRS) overlap considerably. The difference is mainly one of emphasis--an ontology server emphasizes the ability to construct, manipulate, and maintain knowledge bases containing definitional and background knowledge, while a KRS emphasizes support for reasoning services that can be applied to a knowledge base by an intelligent application. At the high end, both kinds of systems demand highly expressive knowledge representation capabilities, combined with efficient, in-depth reasoning services.

Figure 1. Architecture for the proposed Ontology Server.

Figure 1 shows the proposed structure for the Knowledge Builder system. The proposed system will significantly enhance the capabilities of the current version of the Ontosaurus Ontology Server. The present version of Ontosaurus is built around the Loom KR system. During our implementation of Ontosaurus, we discovered that a number of capabilities, such as a view mechanism and semantic analysis, require data structures and reasoning capabilities not commonly implemented in KR systems. The proposed implementation will be integrated with PowerLoom, a much more powerful and scalable KR system. The PowerLoom system, being developed under an existing DARPA contract, will provide many of the features we consider important in an ontology server. This effort will extend those features with a particular emphasis on scaling and performance issues. PowerLoom will provide a wider array of services for ontology construction, validation, and maintenance, as well as for the construction of intelligent knowledge-based systems. Other key features of the Ontosaurus-PowerLoom system will be its scalability (to operate on very large knowledge bases), its just-in-time classifier that provides scalable reasoning (workstations can be added incrementally to meet the computational load), and its modular design (to allow user configuration and easier upgrades).

C.2. Ontosaurus Ontology Server

Ontosaurus, developed under DARPA funding (Knowledge Sharing, and ARPI Jump Start), is a Web-based ontology and knowledge base development environment using the Loom KR system. Because Ontosaurus uses Loom, it can provide a number of ontology manipulation and inference capabilities. For example, it can identify additional subsumption relations implicit in user-specified ontologies, it can detect inconsistencies in user definitions, and it can propagate and validate type and number constraints on hierarchically organized roles and relations. The Ontosaurus Web server has been available to the DARPI community since September 1996. At a recent I*3 PI meeting we demonstrated prototype capabilities for extracting and merging ontologies, ontology translation between Loom and Ontolingua, and compilation of the C2 Schema into C++ class definitions. As a result it is currently being evaluated by OMWG at the Air Force's Armstrong Laboratories for use in the development of the C2 Schema.

The proposed effort will continue the development of the Ontosaurus system to provide foundation-building technology (section B.1.1) for ontology libraries and knowledge bases. Towards this end we will extend existing capabilities and develop new capabilities to support a collaborative development environment, a view manager, ontology construction, semantic analysis and maintenance, and a translation toolkit. Each of these points is addressed in detail below.

Collaborative Development Environment: Rapid development of large multifaceted knowledge bases can only be achieved through collaboration between teams of experts. Users will need a supportive environment that enables them to collaborate in building, editing, and maintaining an ontology, including the ability to work individually or in a group session. They need fine-grained access control and locking to support simultaneous use by multiple users. They need audit trails to identify authors and sources of information, and the ability to roll back earlier changes. They also need the ability to work on independent copies of an ontology, identify differences, and merge the results into newer versions (see ontology construction below). They want to view and manipulate ontologies from the perspective of their area of expertise or responsibility (see view manager below).

Support of distributed collaboration was a key motivation underlying the use of Web tools and protocols in the development of Ontosaurus. Although Ontosaurus supports collaborative development, its capabilities are limited to one group at a time. This capability will be extended to allow multiple simultaneous sessions and to let users create sessions, locate sessions in progress, and join or leave sessions. Currently users are provided with read/write privileges on all information in an ontology. Proper support of collaboration requires that we implement access control at a much finer level. For example, a user may be allowed to change only a part of an ontology (related to his area of expertise) or only certain fields in a definition. However, the precise nature of the requirements for access control can only be determined through actual use. In this proposed effort, we will experiment with access controls at different levels of granularity (ontology, sub-hierarchy, concept, and slot level) and refine them through actual use.

View Manager: Ontologies and knowledge bases designed to support the entire spectrum of concerns of a given domain, such as JFACC or ACP, will contain knowledge about each concept in great detail, only a small fraction of which is required to solve any one aspect (e.g. logistics planning) of the problem. Consider, for example, the JFACC domain. An air campaign planner will be interested in different types of aircraft, their armaments and mission capabilities, whereas a logistics planner supporting the same mission will be interested in the fuel type and capacity of the same aircraft. Describing each concept in its full detail is likely to confuse both users and unnecessarily clutter the KBSs supporting them. To overcome this problem, we propose to develop a view mechanism (similar in spirit to database views) that will tailor the information as well as the presentation based on the needs of a domain or user. Although databases have supported view mechanisms for some time, view mechanisms have not been used in the knowledge-based area because knowledge bases in the past have been designed for a single purpose, and because KR languages are significantly more expressive than database schema definition languages. The proposed effort will design and implement a general view definition facility that will allow layers of abstract views to be defined to meet the needs of a user or an application. Because KR languages are significantly more expressive than database schema languages, the KR-language-based view definition facility will allow creation of substantially more complex views while providing precise logical semantics to operations performed over views. In particular, unlike database views, which generally restrict updates (because they lack the logical semantics that would eliminate ambiguities), these views will be updatable. The implementation of a view definition and mapping facility will rely on the more general schema mapping and translation facilities incorporated in the PowerLoom system.
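To make the idea of an updatable view concrete, the sketch below treats a view as a projection over the slots of a concept in a shared base KB. This is a much weaker notion than the logic-based view definitions proposed here; the class `View`, the concept name `ExampleFighter`, and all slot names and values are invented for illustration.

```python
class View:
    """A projection of concept slots that reads from and writes to a shared base KB."""

    def __init__(self, base, visible_slots):
        self.base = base                        # concept name -> {slot: value}
        self.visible_slots = set(visible_slots)

    def get(self, concept):
        """Return only the slots this view exposes."""
        return {s: v for s, v in self.base[concept].items() if s in self.visible_slots}

    def update(self, concept, slot, value):
        """Write through the view; slots outside the view cannot be changed."""
        if slot not in self.visible_slots:
            raise KeyError(f"slot {slot!r} is not part of this view")
        self.base[concept][slot] = value


# Made-up example data: one aircraft concept seen by two kinds of planner.
kb = {"ExampleFighter": {"armament": "missile-A", "mission": "escort",
                         "fuel_type": "fuel-A", "fuel_capacity": 100}}
campaign_view = View(kb, ["armament", "mission"])
logistics_view = View(kb, ["fuel_type", "fuel_capacity"])
logistics_view.update("ExampleFighter", "fuel_capacity", 120)
```

Both planners work against the same underlying concept, each sees only the slots relevant to the task, and an update made through one view is immediately part of the shared knowledge base.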

Ontology Construction: This module will provide a toolkit for constructing new ontologies tailored to a particular domain by composing, pruning, extracting, and merging existing ontologies.

Composing ontologies: The process of creating a domain knowledge base begins with the identification of concepts and relations for modeling basic aspects of the domain. These include the fundamental mathematical building blocks for representation (sets, sequences, numbers, etc.) and the formalization of concepts required to model the physical world (such as animate and inanimate objects, time, and space). In the absence of outside guidance, this task is often very difficult and generally leads to an ad hoc organization of knowledge. Given a library of carefully defined ontologies addressing these fundamental areas, the task of constructing the upper ontology can be achieved by selecting appropriate modules (micro-theories) from the library and joining them together (see Figure 2). Many existing KRSs, including Loom, provide the capability for composing ontologies and knowledge bases in a hierarchical manner. The process of composition, however, can only be applied to micro-theories that are designed modularly and do not overlap, or where the overlap between the ontologies is carefully managed. In the proposed effort we will develop modular interfaces for micro-theories that will allow different formalizations to be interchanged without affecting other theories, and develop guidelines for the selection of theories for composition.

Figure 2. Composing ontologies.

Figure 3. Pruning an ontology.

Pruning: The pruning operation allows a designer to delete concepts or a sub-hierarchy of concepts (see Figure 3) that are not needed for a given domain. Some KRSs (including Loom) provide support for recursively deleting all sub-concepts and other concepts which refer to a given concept. In our experience with Ontosaurus, we have identified a number of additional pruning operations that are less drastic than the existing pruning operations. These include selective deletion of offending clauses from the definition of referring concepts and generalization of the definitions to eliminate references to deleted concepts. These additional deletion capabilities will be implemented within the PowerLoom system and made available to the user as options in the pruning process.
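The difference between drastic and selective pruning can be sketched as follows. The encoding is an assumption made for illustration: each concept records a single parent and the set of other concepts its definition mentions, and the function `prune` is hypothetical rather than a Loom or PowerLoom operation. Selective pruning is approximated here by dropping the offending references from the referring definition.

```python
def prune(kb, target, drastic=False):
    """Return a copy of `kb` without `target` and its sub-concepts.

    kb maps a concept name to {"parent": name or None, "refs": set of concept
    names mentioned in its definition}. With drastic=True, concepts that refer
    to a deleted concept are deleted as well; otherwise only the offending
    references are removed from their definitions.
    """
    doomed = {target}
    changed = True
    while changed:  # iterate to a fixed point so deletions cascade
        changed = False
        for name, d in kb.items():
            if name in doomed:
                continue
            if d["parent"] in doomed or (drastic and d["refs"] & doomed):
                doomed.add(name)
                changed = True
    return {
        name: {"parent": d["parent"], "refs": d["refs"] - doomed}
        for name, d in kb.items() if name not in doomed
    }


kb = {
    "Vehicle": {"parent": None, "refs": set()},
    "Aircraft": {"parent": "Vehicle", "refs": {"Weapon"}},
    "Weapon": {"parent": None, "refs": set()},
    "Missile": {"parent": "Weapon", "refs": set()},
}
gentle = prune(kb, "Weapon")
harsh = prune(kb, "Weapon", drastic=True)
```

With the selective option, Aircraft survives with a weaker definition that no longer mentions Weapon; with the drastic option it is removed along with the Weapon sub-hierarchy.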

Figure 4. Extracting a domain-specific ontology from a broad coverage ontology

Ontology Extraction: One of the most time-consuming aspects of modeling a domain is the enumeration and organization of tens of thousands of domain concepts. ISI has been exploring a novel approach to assist in this process using existing broad coverage ontologies such as the SENSUS [Knight and Luk 1994] and WordNet [Miller 1990] ontologies [Swartout et al. 1996]. Our approach begins by identifying a small number of key domain terms (called seeds). These terms are then mapped to concepts in the SENSUS ontology (Figure 4a). Using a variety of structural and probabilistic heuristics, we then identify concepts in the ontology that relate to the seed concepts (Figure 4b). These concepts delimit the domain-relevant aspects of the larger ontology, and are extracted to create an approximate domain-specific ontology (Figure 4c). The designer can then browse this ontology and delete unrelated terms or import additional terms to arrive at an initial domain model. This process was used to create the ACP-SENSUS ontology for air campaign planning (ARPI). Using 50 seed terms provided by experts, an initial ontology of approximately 1200 domain concepts was created in less than a week. A detailed description of this process is in [Swartout et al. 1996].
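The seed-based extraction step can be illustrated with one simple structural heuristic: keep each seed together with every ancestor on its path to the root. This is an assumption chosen for brevity and is not claimed to be the set of structural and probabilistic heuristics actually used with SENSUS; the function `extract` and the miniature taxonomy are hypothetical.

```python
def extract(parent_of, seeds):
    """Keep each seed plus every ancestor on its path to the root.

    parent_of maps a concept to its parent (None for the root). Returns the
    same kind of mapping restricted to the retained concepts.
    """
    keep = set()
    for seed in seeds:
        node = seed
        # Stop early when we reach a concept whose ancestors are already kept.
        while node is not None and node not in keep:
            keep.add(node)
            node = parent_of.get(node)
    return {n: parent_of.get(n) for n in keep}


# A made-up fragment of a broad-coverage taxonomy.
broad = {
    "thing": None, "artifact": "thing", "vehicle": "artifact",
    "aircraft": "vehicle", "bomber": "aircraft",
    "animal": "thing", "dog": "animal",
}
domain = extract(broad, ["bomber"])
```

The result is only an approximate domain ontology; as described above, the designer would still browse it, delete unrelated terms, and import missing ones.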

Figure 5. Merging parts of one ontology into another.

Knowledge Base Merging: The merging operation allows a designer to augment one ontology or knowledge base by extracting and incorporating selected parts from independently developed ontologies (shown in Figure 5). A prototype of an interactive merging capability was demonstrated at the I*3 workshop in November 1996. In this demonstration, Ontosaurus was used to augment the C2 Schema (JTF-ATD) with more detailed information about combat aircraft from an independently developed ACP ontology (ARPI). During this demonstration, we were able to extend the C2 Schema by approximately 200 additional concepts within 10 minutes, a process which would take hours to days to complete without this tool.

The extraction and merging operations present a variety of technical challenges. For example, the two knowledge bases may use different naming conventions, and they may refer to the same underlying concept with different names, or to different concepts with the same name. Furthermore, because ontologies are richly interconnected, the process of extracting only selected concepts requires deciding how to isolate the desired portions from the remainder of the ontology. Finally, the axioms and definitions written in the context of the originating ontology must be reinterpreted and made consistent with those in the receiving context. Some of these problems can be solved pragmatically. For example, the problem with naming conventions is solved in Ontosaurus using a mapping table that can translate between common naming conventions. The problem of name mismatch between ontologies can be solved by mapping concepts in both ontologies to a common reference ontology (e.g. SENSUS) and using the reference ontology for name alignment. The problem of different definitions can be resolved by merging the definitions and allowing the user to resolve conflicts when they arise. The problem of lifting axioms from one context to another, however, is considerably harder and requires fundamental advances in the theory of contexts. We will collaborate with Professor John McCarthy (who has independently submitted a proposal for the development of the theory of contexts) on further development of the formal theory of contexts with the goal of (1) providing a formal understanding of the process of combining ontologies, and (2) pragmatically making the merging of ontologies efficient.
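Name alignment through a common reference ontology can be sketched as follows, assuming each ontology has already been mapped, term by term, to identifiers in the reference ontology. The functions `align` and `homonyms`, the term names, and the `ref:` identifiers are all hypothetical.

```python
def align(map_a, map_b):
    """Pair terms from two ontologies that map to the same reference concept.

    map_a and map_b map each local term to a reference-ontology concept id.
    """
    by_ref = {}
    for term, ref in map_b.items():
        by_ref.setdefault(ref, []).append(term)
    return sorted((a, b) for a, ref in map_a.items() for b in by_ref.get(ref, []))


def homonyms(map_a, map_b):
    """Terms spelled the same in both ontologies but mapped to different concepts."""
    return sorted(t for t in set(map_a) & set(map_b) if map_a[t] != map_b[t])


acp = {"Fighter": "ref:fighter-aircraft", "Tank": "ref:armored-vehicle"}
c2 = {"FighterPlane": "ref:fighter-aircraft", "Tank": "ref:storage-tank"}
```

The first function finds the same concept under different names, and the second flags different concepts under the same name, which would then be presented to the user for resolution.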

Semantic Analysis and Maintenance: General KR systems have traditionally provided powerful reasoning services aimed towards the development of knowledge-based systems. Many of these reasoning services can also be used to develop intelligent tools for the development and maintenance of ontologies. Ontology maintenance, however, requires additional meta-level reasoning capabilities that allow tools to inspect inferences and their logical dependencies. Through the use of Ontosaurus we have identified a number of such reasoning services. These include: (a) identifying undefined concepts and relations, (b) identifying inconsistencies and conflicts within ontologies, (c) verifying completeness, and (d) identifying semantic differences between related concepts. These capabilities are described below.

Undefined terms: The construction of a domain ontology is an incremental process. While this process is in progress, many terms remain undefined. It is essential that the ontology development environment identify these undefined items and allow the user to incrementally complete the ontologies. Furthermore, the KR system must continue to function in the presence of undefined concepts. The current implementation of Ontosaurus provides the capability for identifying and editing undefined concepts. The proposed implementation will extend this capability by providing an agenda mechanism similar to that implemented in the EXPECT system [Gil and Melz 1996; Swartout and Gil 1995] to further enhance the usability of this function.
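A simple form of agenda for undefined terms can be sketched as a to-do list ordered by how many definitions depend on each missing term. The ordering rule and the function name `agenda` are assumptions made for illustration and do not describe the EXPECT agenda mechanism.

```python
def agenda(definitions):
    """List undefined terms with the definitions that reference them.

    definitions maps each defined term to the set of terms its definition
    references. The most-referenced undefined terms come first; ties are
    broken alphabetically.
    """
    pending = {}
    for term, refs in definitions.items():
        for ref in refs:
            if ref not in definitions:
                pending.setdefault(ref, []).append(term)
    return sorted(
        ((t, sorted(users)) for t, users in pending.items()),
        key=lambda item: (-len(item[1]), item[0]),
    )


defs = {
    "Aircraft": {"Vehicle", "Engine"},
    "Fighter": {"Aircraft", "Weapon", "Engine"},
    "Vehicle": set(),
}
todo = agenda(defs)
```

The knowledge base remains usable while the agenda is non-empty; the list simply tells the developer which definitions to supply next.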

Inconsistencies and conflicts: A good ontology construction tool must provide capabilities for identifying conflicts and inconsistencies. Furthermore, when such conflicts arise, the tool must assist the user in identifying the source of the problem and resolving it. Many KR systems detect logical conflicts, but do not provide enough information to analyze the sources of conflicts (i.e. sets of conflicting assumptions). In the proposed effort we will develop reasoning capabilities for identifying sources of conflicts and develop tools to assist users in detecting and resolving inconsistencies.

Completeness: The completeness of an ontology can be characterized in two ways. First, within the knowledge specification of the ontology, we can verify that all referenced terms are defined, that all required fields of an instance are filled, that when a concept is exhaustively partitioned all its instances belong to exactly one of the partition subclasses, and that all subsumption relations implicit in the definitions have been made explicit. These capabilities are not only useful during the initial construction of an ontology, but are also essential for continued integrity and maintenance of the knowledge bases. Second, we can determine the completeness of a KB with respect to the knowledge needed for problem solving, a task commonly performed by knowledge acquisition tools such as the EXPECT system. In a separate proposal submitted to the HPKB BAA, our team will work closely with ISX (or other selected integration contractors) to integrate Knowledge Builder with problem-solvers and knowledge acquisition tools to support this functionality.
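Two of the internal completeness checks, exhaustive partitions and required fields, can be sketched directly. The data layout and the function names `partition_violations` and `missing_slots` are assumptions made for this illustration.

```python
def partition_violations(instances, partition):
    """Instances that belong to zero or to more than one partition subclass.

    instances maps an instance name to the set of classes it belongs to;
    partition lists the subclasses that exhaustively partition the parent.
    """
    subclasses = set(partition)
    return sorted(n for n, classes in instances.items()
                  if len(classes & subclasses) != 1)


def missing_slots(instance_slots, required):
    """(instance, slot) pairs where a required slot has no value."""
    return sorted((n, s) for n, slots in instance_slots.items()
                  for s in required if slots.get(s) is None)


# Made-up instances of a concept partitioned into FixedWing and RotaryWing.
members = {"a1": {"FixedWing"}, "a2": {"FixedWing", "RotaryWing"}, "a3": set()}
slots = {"a1": {"fuel_type": "fuel-A"}, "a2": {}}
```

Checks of this kind are cheap to rerun after every edit, which is what makes them useful for maintenance as well as for initial construction.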

Semantic differences between related definitions: While composing, merging or updating ontologies, one is often faced with the need to identify semantic differences between two concepts (similar concepts in different ontologies, or the same concept in different versions of a single ontology). To assist in this process, we will develop reasoning capabilities that can compare two concepts to identify semantic differences between them, and we will develop tools for comparing different versions of an ontology. Pairwise comparison between concepts is needed both for merging and for maintenance.
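
A pairwise comparison of this kind might look like the sketch below, which contrasts two versions of one concept. Note the simplification: it compares the stated structure of the two definitions (parents and role restrictions), whereas the capability proposed above would compare their inferred consequences. The concept layout and the function name diff_concepts are hypothetical.

```python
def diff_concepts(old, new):
    """Report differences in parents and role restrictions between two definitions."""
    report = {
        "parents_removed": sorted(old["parents"] - new["parents"]),
        "parents_added": sorted(new["parents"] - old["parents"]),
        "restrictions_changed": {},
    }
    for role in sorted(set(old["restrictions"]) | set(new["restrictions"])):
        before = old["restrictions"].get(role)
        after = new["restrictions"].get(role)
        if before != after:
            # None on either side means the restriction is absent in that version.
            report["restrictions_changed"][role] = (before, after)
    return report


# Two hypothetical versions of a "Truck" concept.
truck_v1 = {
    "parents": {"Vehicle"},
    "restrictions": {"wheels": ("at-least", 4), "cargo": ("all", "Freight")},
}
truck_v2 = {
    "parents": {"Vehicle", "MilitaryAsset"},
    "restrictions": {"wheels": ("at-least", 6), "armor": ("all", "Plating")},
}

report = diff_concepts(truck_v1, truck_v2)
for key, value in report.items():
    print(key, value)
```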

Translation Toolkit: Although the ontologies in the ontology library are represented in a common representation language, different knowledge-based systems that use these ontologies will no doubt use a variety of knowledge representation systems and programming languages. To fully benefit from a common ontology development environment, it is necessary to make the ontologies developed in Ontosaurus available in different languages and KR formalisms. The current Ontosaurus system has a small number of translators for producing KIF, Ontolingua, KRSS, and C++. In a companion proposal (PALINDROME, ISX), we will develop a common translation toolkit for the rapid construction of translators based on declarative mappings between PowerLoom and the target language. We will use this toolkit to create additional translators from Loom to IDL and Java. We expect our users to be able to assemble additional translators for translating ontologies into frame-based and logic-based representation languages.
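
The idea of building translators from declarative mappings can be illustrated with a table-driven sketch: each target language is described by a small set of templates, and one generic routine applies them. The template table, the concept record layout, and the function name translate are assumptions for this example only, and the output covers just a class header rather than a full translation.

```python
# Declarative mapping: one template set per target language (hypothetical).
TEMPLATES = {
    "java": {
        "with_parent": "public class {name} extends {parent} {{}}",
        "root": "public class {name} {{}}",
    },
    "idl": {
        "with_parent": "interface {name} : {parent} {{}};",
        "root": "interface {name} {{}};",
    },
}


def translate(concept, target):
    """Render a concept record in the target language using the mapping table."""
    rules = TEMPLATES[target]
    key = "with_parent" if concept.get("parent") else "root"
    return rules[key].format(**concept)


concepts = [
    {"name": "Vehicle", "parent": None},
    {"name": "Truck", "parent": "Vehicle"},
]

for target in ("java", "idl"):
    for concept in concepts:
        print(translate(concept, target))
```

Supporting another target language in this style means adding an entry to the table rather than writing a new translator.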

C.3. PowerLoom

ISI's Loom and PowerLoom systems have both been designed to provide high-end KRS services. Loom was developed originally as a general-purpose KRS, but it has always provided certain services highly useful to an ontology server--the ability to reason fluently with definitions is a hallmark of classifier-based KR systems. ISI's Ontosaurus tool is designed to exploit the inference capabilities that Loom makes available. The combination of Ontosaurus and Loom results in an ontology manager that is quite advanced along certain dimensions. Loom can immediately validate knowledge as it is entered into an ontology using Ontosaurus--it flags any classes or instances whose information (definitions and/or assertions) is inconsistent. Loom also organizes definitions into a hierarchy, and classifies instances below the most specific definitions that they satisfy. The inferences generated by the classifier enable users to view the logical structure of a knowledge base, and to trace implied as well as explicitly-defined relationships.
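
The classification behavior described above can be sketched for a very restricted case in which each concept is defined as a conjunction of primitive features, so that one concept subsumes another exactly when its feature set is a subset of the other's. This is an assumption made to keep the example short; it is not the Loom classification algorithm, and the names below are hypothetical.

```python
def most_specific_subsumers(features, definitions):
    """Return the most specific defined concepts whose requirements `features` satisfy."""
    subsumers = [c for c, required in definitions.items() if required <= features]
    # Drop any subsumer that is strictly more general than another subsumer.
    return sorted(
        c for c in subsumers
        if not any(definitions[c] < definitions[d] for d in subsumers)
    )


# Hypothetical definitions: each concept is a conjunction of primitive features.
definitions = {
    "Vehicle": {"mobile"},
    "ArmedVehicle": {"mobile", "armed"},
    "TrackedVehicle": {"mobile", "tracked"},
    "Tank": {"mobile", "armed", "tracked"},
}

# Classify an instance below the most specific definitions it satisfies.
print(most_specific_subsumers({"mobile", "armed", "tracked", "amphibious"}, definitions))

# Place a new definition in the hierarchy by finding its direct parents.
print(most_specific_subsumers({"mobile", "tracked", "wheeled"}, definitions))
```

The same routine serves both purposes mentioned in the text: placing a new definition in the hierarchy and classifying an instance.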

A capable ontology server must support a rather long list of features. Below we list eight features that help to distinguish existing KRSs that can fill the role of ontology server. Table 1 in Section H (Comparison with Ongoing Work) compares several systems using this feature set. Ideally, an ontology server will provide all of these features; unfortunately most of the KRSs listed (including Loom) fail to support many of them.
  1. Scalable Knowledge Base. The server can access a knowledge base of definitions and axioms larger than what can be stored in run-time memory. It will efficiently support navigation across the entire knowledge space.
  2. Fully expressive. The server will allow concept definitions and axioms at least as expressive as the first order predicate calculus.
  3. Efficient deductive support for ontology validation and organization. Today, only servers that include a concept classifier provide an adequate combination of efficiency and depth of inference to support this feature.
  4. Deliverable in a commercially-oriented language. Today, this means a C or C++ version of the server.
  5. Query processor. The server supports a query language, equivalent to SQL in power, to query all parts of an ontology.
  6. Concept editing. Users can edit already loaded concepts (some classifier-based KRSs don't support this).
  7. Contexts. Knowledge can be partitioned into a hierarchy of contexts, some of which inherit from others. Reusable ontologies depend on this form of modularity.
  8. Scalable Inference. Multiple machines can run in parallel to solve an inference problem (e.g., to classify a concept hierarchy).
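
To illustrate feature 7, the sketch below models contexts as nodes that hold their own assertions and inherit those of their parent contexts. The Context class and the example facts are hypothetical and are not the PowerLoom context mechanism.

```python
class Context:
    """A knowledge partition that inherits assertions from its parent contexts."""

    def __init__(self, name, parents=()):
        self.name = name
        self.parents = list(parents)
        self.assertions = set()

    def assert_fact(self, fact):
        """Add a fact that is local to this context."""
        self.assertions.add(fact)

    def visible_facts(self):
        """Return local facts plus everything inherited from ancestor contexts."""
        facts = set(self.assertions)
        for parent in self.parents:
            facts |= parent.visible_facts()
        return facts


# A shared base ontology reused by two application contexts.
base = Context("base-ontology")
base.assert_fact(("subclass", "Truck", "Vehicle"))

logistics = Context("logistics", parents=[base])
logistics.assert_fact(("instance", "truck-1", "Truck"))

planning = Context("planning", parents=[base])

print(sorted(logistics.visible_facts()))
print(sorted(planning.visible_facts()))
```

Facts asserted in the logistics context stay invisible to the planning context, while both see the shared base ontology; this is the modularity that reuse depends on.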

We consider the first three features to be essential. Scalability is crucial to the successful marriage of ontology-based software with large-scale applications; expressivity ensures that users will not outgrow the representation language; and deductive support is key to providing intelligent assistance to an ontology engineer. No KRS in use today provides more than one of these three key features. The PowerLoom KRS being implemented as a successor to Loom provides or lays the groundwork for each of the eight features. The enhancements to PowerLoom detailed in this proposal round out the feature set. In particular, the enhanced PowerLoom provides scalability, enabling it to manage very large knowledge bases, and it provides distributed inference--any combination of processing resources available on a PowerLoom server or client can contribute to the inferencing needed by the server. The proposed effort also completes the functionality of PowerLoom's just-in-time classifier, which fixes a problem (exhibited by all other classifier-based KRSs) of delayed system response due to over-aggressive inferencing. In summary, a PowerLoom KRS enhanced with the capabilities proposed in this effort will significantly advance the capabilities that users and applications can expect from an ontology server or from a general-purpose KRS, particularly in terms of the scale of knowledge bases that can be managed and reasoned with.

In the remainder of this section, we outline our approach to enhancing PowerLoom. In the process, we will discuss additional problems and benefits that accompany our approach. As noted in Table 1, the PowerLoom system being developed under an existing DARPA contract will provide many of the features we consider important in an ontology server. This effort will round out those features, with a particular emphasis on scaling and performance issues. The proposed work breaks down into the following tasks:
Figure 6 illustrates a scalable KRS architecture designed to accommodate very large knowledge bases and exploit parallel processing power to achieve scalable inference. A minimal configuration consists of a central server accessing a knowledge base stored in main memory, connected to a Web-based client (Ontosaurus). Scalable storage is achieved by interfacing the server to a backend relational DBMS (RDBMS) or other commercial storage system. The coupling between the central server and the RDBMS is achieved using a "wrapper" module referred to as the "Knowledge Pager" (KPager).

Figure 6. Configurable Distributed Inference Architecture

The Knowledge Pager mediates all accesses between the server and the RDBMS, pulling knowledge structures into the KRS's local memory on demand. The Knowledge Pager wrapper comes in three versions. The first interfaces to the RDBMS. The second attaches to a client, enabling the client to "page in" knowledge from a remote server. The third attaches to a "satellite" KRS, and enables that KRS to upload knowledge from a server over a (high bandwidth) local network. These different wrappers make it possible to assemble many different client/server/satellite configurations.
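
The on-demand behavior of the first (RDBMS) version of the wrapper can be sketched as a read-through cache. In this example an in-memory SQLite database stands in for the backend RDBMS; the table schema, the KnowledgePager class, and its method names are assumptions made for illustration and do not describe the actual KPager design.

```python
import sqlite3


class KnowledgePager:
    """Pulls concept definitions into local memory the first time they are requested."""

    def __init__(self, connection):
        self.connection = connection
        self.cache = {}        # local memory of the KRS
        self.fetch_count = 0   # number of round trips to the backend

    def get(self, name):
        """Return the definition of `name`, paging it in from the backend if needed."""
        if name not in self.cache:
            row = self.connection.execute(
                "SELECT definition FROM concepts WHERE name = ?", (name,)
            ).fetchone()
            if row is None:
                raise KeyError(name)
            self.fetch_count += 1
            self.cache[name] = row[0]
        return self.cache[name]


# Stand-in backend: a small table of hypothetical concept definitions.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE concepts (name TEXT PRIMARY KEY, definition TEXT)")
db.executemany(
    "INSERT INTO concepts VALUES (?, ?)",
    [("Vehicle", "a mobile artifact"), ("Truck", "a Vehicle with a cargo bed")],
)

pager = KnowledgePager(db)
print(pager.get("Truck"))
print(pager.get("Truck"))   # served from local memory; no second backend fetch
print("backend fetches:", pager.fetch_count)
```

Only concepts that are actually requested occupy local memory, which is what allows the knowledge base to exceed the size of run-time memory.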

Scalable inference is achieved by (i) moving inference from the central server onto client-side servers or satellite workstations, and by (ii) eliminating the need to reason with knowledge not directly accessed by a user or application. The configurable architecture just described provides the scalable processing power needed to achieve (i). The other piece of needed technology is a "just-in-time" (JIT) classifier. Conventional concept classifiers operate efficiently only in a "batch mode" where they classify all instances and concepts in a knowledge base in one pass. This approach to classification does not scale well. A JIT classifier is capable of classifying any subset of existing concepts and instances, in any order. PowerLoom implements the world's first and only JIT concept classifier. Having a JIT classifier means that portions of a knowledge base can be transferred onto a client or satellite processor and classified independently of the central server. The results of such classifications (which have a very concise encoding) are returned to the server to be shared by all clients. Thus, the sharing of processing described in (i) above becomes feasible. The batch mode approach to classification used by conventional classifiers requires that all concepts and instances in a knowledge base be classified. This makes the process of loading very large knowledge bases increasingly slow, i.e., the batch classification process does not scale well (for example, Loom hits an upper limit around 50,000 concepts). A JIT classifier makes it possible to classify only those portions of a knowledge base actually referenced by an application, leaving the remainder of the KB untouched. The overhead of classification can therefore be sublinear in the size of the KB, meeting the objective of (ii) above.
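
The contrast with batch classification can be illustrated by a demand-driven sketch in which a concept is classified only when it is first requested, and the result is cached for sharing. For brevity the sketch assumes that a concept's position depends only on the parents named in its definition and that the definitions are acyclic. The LazyClassifier class and the example KB are hypothetical and are not the PowerLoom algorithm.

```python
class LazyClassifier:
    """Classifies concepts on demand and caches the results."""

    def __init__(self, told_parents):
        self.told_parents = told_parents   # concept -> parents named in its definition
        self.ancestors = {}                # cache: concept -> all inferred ancestors

    def classify(self, name):
        """Return all ancestors of `name`, classifying only what is needed."""
        if name not in self.ancestors:
            result = set()
            for parent in self.told_parents[name]:
                result.add(parent)
                result |= self.classify(parent)
            self.ancestors[name] = frozenset(result)
        return self.ancestors[name]


# Hypothetical KB: an application that asks about Tank never touches the medical concepts.
kb = {
    "Vehicle": [],
    "ArmedVehicle": ["Vehicle"],
    "Tank": ["ArmedVehicle"],
    "Facility": [],
    "Hospital": ["Facility"],
    "FieldHospital": ["Hospital"],
}

classifier = LazyClassifier(kb)
print(sorted(classifier.classify("Tank")))
print("classified so far:", sorted(classifier.ancestors))
```

After the single request, only the three concepts on the path from Tank to the root have been classified; the cached results are the concise encoding that a client or satellite could return to the server.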


A. Bibliography

[Abbrett and Burstein, 1987] G. Abbrett and Mark Burstein. The KREME knowledge editing environment. International Journal of Man-Machine Studies, 27, pp. 103-126, 1987.

[Basili et. al. 1996] V. R. Basili, L. C. Briand, and W. L. Melo. How Reuse Influences Productivity in Object-Oriented Systems. Communications of the ACM, 39(10):104-116, October 1996.

[Bateman et. al. 1989] J. A. Bateman, R.T. Kasper, J.D. Moore, and R.A. Whitney. A General Organization of Knowledge for Natural Language Processing: The Penman Upper Model. Unpublished research report, USC/Information Sciences Institute, Marina del Rey. 1989.

[Chandrasekaran and Johnson 1993] B. Chandrasekaran and T.R. Johnson. Generic Tasks and Task Structures: History, Critique and New Directions. In J.-M. David, J.-P. Krivine and R. Simmons (eds.) Second Generation Expert Systems (Berlin) 1993.

[Finin et al. 1994] T. Finin, D. McKay, R. Fritzson, and R. McEntire. KQML: An Information and Knowledge Exchange Protocol. In Kazuhiro Fuchi and Toshio Yokoi (eds.), Knowledge Building and Knowledge Sharing, Ohmsha and IOS Press, 1994.

[Fowler, Cross and Owens 1995] N. Fowler III, S. E. Cross and C. Owens. The ARPA-Rome Knowledge Based Planning and Scheduling Initiative, IEEE Expert, 10(1) 4-9. February 1995.

[Gaines 1994] B.R. Gaines. Class library implementation of an open architecture knowledge support system. International Journal of Human-Computer Studies 41(1/2) 59-107, 1994.

[Gaines and Shaw 1995] B. R. Gaines and M. L. G. Shaw. WebMap Concept Mapping on the Web. In the Proceedings of the Fourth International World Wide Web Conference, Boston, Massachusetts. 1995.

[Genesereth 1991] M. R. Genesereth. Knowledge Interchange Format. Principles of Knowledge Representation and Reasoning: Proceedings of the Second International Conference, Cambridge, MA, pages 599-600. Morgan Kaufmann, 1991.

[Genesereth and Fikes 1992] M. R. Genesereth, R. E. Fikes (Editors). Knowledge Interchange Format, Version 3.0 Reference Manual. Computer Science Department, Stanford University, Technical Report Logic-92-1, June 1992.

[Gil and Melz 1996] Y. Gil and E. Melz. Explicit Representation of Problem-Solving Strategies to Support Knowledge Acquisition. In Proceedings of the Thirteenth National Conference on Artificial Intelligence, Portland, Oregon, 1996.

[Gruber 1993] T. R. Gruber. Toward principles for the design of ontologies used for knowledge sharing. In Formal Ontology in Conceptual Analysis and Knowledge Representation, N. Guarino and R. Poli, editors, Kluwer Academic, in preparation. Original paper presented at the International Workshop on Formal Ontology, March 1993. Available as Stanford Knowledge Systems Laboratory Report KSL-93-04.

[Gruber and Tenenbaum 1992] T. R. Gruber, J. M. Tenenbaum, and J. C. Weber. Toward a knowledge medium for collaborative product development. Proceedings of the Second International Conference on Artificial Intelligence in Design, Pittsburgh, pages 413-432. Kluwer Academic, 1992.

[Gruber 1992] T. R. Gruber. Ontolingua: A mechanism to support portable ontologies. Stanford University, Knowledge Systems Laboratory, Technical Report KSL-91-66, March 1992.

[Guarino and Carrara 1993] N. Guarino, M. Carrara, and P. Giaretta. An Ontology of Meta-Level Categories. LADSEB-CNR Int. Rep. 6/93, Preliminary version - November 30, 1993.

[Hatzivassiloglou and Knight 1995] V. Hatzivassiloglou and K. Knight. Unification-Based Glossing. Proceedings of the 14th IJCAI Conference. Montreal, Quebec. 1995.

[Hovy and Nirenburg 1992] E. H. Hovy and S. Nirenburg. Approximating an Interlingua in a Principled Way. Proceedings of the DARPA Speech and Natural Language Workshop. Arden House, NY. 1992.

[Hovy and Knight 1993] E. H. Hovy and K. Knight. Motivation for Shared Ontologies: An Example from the Pangloss Collaboration. Proceedings of the Workshop on Knowledge Sharing and Information Interchange, IJCAI-93. Chambéry, France. 1993.

[Karp et al. 1995] P. D. Karp, K. Myers and T. Gruber, The Generic Frame Protocol. Proceedings of the 1995 International Joint Conference on Artificial Intelligence, pp. 768-774, 1995.

[Karp et al. 1994] P. D. Karp, S. M. Paley and I. Greenberg. A Storage System for Scalable Knowledge Representation. Proceedings of the Third International Conference on Information and Knowledge Management, 1994.

[Karp and Paley 1995] P. D. Karp and S. M. Paley. Knowledge representation in the large. Proceedings of the 1995 International Joint Conference on Artificial Intelligence, pp. 751-758, 1995.

[Knight and Luk 1994] K. Knight and S. Luk. Building a Large Knowledge Base for Machine Translation. Proceedings of the American Association of Artificial Intelligence Conference (AAAI-94). Seattle, WA. 1994.

[Knight et. al. 1995] K. Knight, I. Chander, M. Haines, V. Hatzivassiloglou, E. H. Hovy, M. Iida, S.K. Luk, R.A. Whitney, and K. Yamada. 1995. Filling Knowledge Gaps in a Broad-Coverage MT System. Proceedings of the 14th IJCAI Conference. Montreal, Quebec.

[Lenat and Guha 1990] D. B. Lenat and R. V. Guha. Building Large Knowledge-Based Systems: Representation and Inference in the CYC Project. Addison-Wesley Publishing Company, Inc., Reading, Massachusetts. 1990.

[MacGregor 1994] R. M. MacGregor. A Description Classifier for the Predicate Calculus. In Proceedings of the Twelfth National Conference on Artificial Intelligence (AAAI-94), 1994.

[MacGregor 1991] R. M. MacGregor. Using a Description Classifier to Enhance Deductive Inference. In Proceedings of the Seventh IEEE Conference on AI Applications, 1991.

[MacGregor 1991] R. MacGregor. The Evolving Technology of Classification-Based Representation Systems. In J. Sowa (ed.), Principles of Semantic Networks: Explorations in the Representation of Knowledge. Morgan Kaufmann, 1991.

[Mallery 1994] John C. Mallery. A Common LISP Hypermedia Server, in Proceedings of the First International Conference on The World-Wide Web, Geneva CERN, May 25, 1994.

[McGuire et. al. 1993] J. G. McGuire, D. R. Kuokka, J. C. Weber, J. M. Tenenbaum, T. R. Gruber, G. R. Olsen. SHADE: Technology for Knowledge-Based Collaborative Engineering. Journal of Concurrent Engineering: Applications and Research (CERA), 1(2), September 1993.

[Michalski 1980] R. S. Michalski. Knowledge Acquisition Through Conceptual Clustering: A Theoretical Framework and an Algorithm for Partitioning Data into Conjunctive Concepts. Policy Analysis and Information Systems, 4(3) 219-244, 1980.

[Miller 1990] G. Miller. WordNet: An on-line lexical database. International Journal of Lexicography, 3(4) (special issue), 1990.

[Neches et. al. 1991] R. Neches, R. Fikes, T. Finin, T. Gruber, R. Patil, T. Senator, & W. R. Swartout. Enabling technology for knowledge sharing. AI Magazine, 12(3):16-36, 1991.

[Patel-Schnider et. al. 1993] Draft of the specification for Description Logics, produced by the KRSS working group of the DARPA Knowledge Sharing Effort. Updated July 1993.

[Swartout and Gil 1995] W.R. Swartout and Y. Gil. EXPECT: Explicit Representations for Flexible Acquisition. In Proceedings of the Ninth Knowledge Acquisition for Knowledge-Based Systems Workshop, Banff, Alberta, 1995.

[Swartout et. al. 1993] W. R. Swartout, R. Neches and R. Patil. Knowledge Sharing: Prospects and Challenges. In Proceedings of the International Conference on Building and Sharing of Very Large-Scale Knowledge Bases '93, Tokyo, Japan, 1993.

[Swartout et. al. 1996] W. R. Swartout, R. Patil, K. Knight, and T. Russ. Toward Distributed Use of Large-Scale Ontologies. In Proceedings of the Banff Knowledge Acquisition Workshop, Banff, Canada, Nov. 1996. Available in HTML.

[Tate 1996] A. Tate. Towards a Plan Ontology. Journal of the Italian AI Association (AIIA), January 1996.

[Wielinga and Breuker 1986] B.J. Wielinga and J.A. Breuker, Models of Expertise, ECAI 1986, 497-509.