This sub-section describes the end-user interaction with WebScripter (WS) in detail in order to (a) demonstrate that it can handle real-world tasks, (b) describe user interaction in enough detail to support our claim that regular users can use this tool, and (c) describe system behavior and reasoning in enough detail that the purpose of the software components in the Architecture page becomes evident.
The end-user interface uses the familiar paradigms of working with spreadsheets and navigating the Web. The user starts WS and types the names of several universities into the first column. At the point shown below, truly nothing is known about these hand-typed values.

Figure D-1
After the analyst selects “classify” from a menu, WS uses a list of well-known indices to find an existing taxonomy that matches all of the typed phrases[1]. Yahoo:UniversitiesAndColleges and Lycos:Universities both apply (note that commercial search engines do not have to be DAMLized to be useful to our reasoning here). The universities now appear underlined because they are recognized by the system; double-clicking on them would now bring up their Web pages. The system then proactively fetches their DAML-enabled Web pages in the background (or just goes to the local crawled knowledge base) and computes a minimal covering set of declared DAML IS-A types that covers all the universities. In our example, all of the current universities are declared to be instances of the World-Wide Web Consortium’s W3C:University concept. After this step, the user interface looks like this:

Figure D-2
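The covering-set computation described before Figure D-2 can be sketched as follows. This is only an illustration under stated assumptions: the proposal does not say which algorithm WS uses, so the sketch substitutes a standard greedy set-cover approximation, and the function name `covering_types` and the sample type declarations are hypothetical.

```python
def covering_types(declared):
    """Greedy approximation of a small set of IS-A types covering all entities.

    `declared` maps an entity name to the set of types it declares.
    Entities that declare no type at all cannot be covered and are skipped.
    """
    uncovered = {entity for entity, types in declared.items() if types}
    chosen = []
    while uncovered:
        # Count how many still-uncovered entities each type would cover.
        counts = {}
        for entity in uncovered:
            for type_name in declared[entity]:
                counts[type_name] = counts.get(type_name, 0) + 1
        # Take the type that covers the most entities; break ties by name.
        best = min(counts, key=lambda t: (-counts[t], t))
        chosen.append(best)
        uncovered = {e for e in uncovered if best not in declared[e]}
    return chosen


# Hypothetical declarations, for illustration only.
declared = {
    "USC": {"W3C:University", "IRS:NonProfitInstitution"},
    "UCLA": {"W3C:University"},
    "Oxford": {"W3C:University", "UsPostalService:Recipient"},
}
print(covering_types(declared))
```

With these sample declarations a single type, W3C:University, covers every typed entry, which corresponds to the situation described for the example.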
The analyst now selects “find more” from the menu bar. The system will fetch every entity that the two known indices point to (several hundred). It simultaneously performs a different type of analysis: Which are the DAML IS-A types that are declared by more than 10%[2] of the entities (result: U.N.:University, UsPostalService:Recipient, W3C:University, and IRS:NonProfitInstitution)? Of these, which apply to less than 1% of nearby categories of the same index (remaining result: W3C:University and U.N.:University)? (The first is a test of recall, the second is a test of precision.) The latter type is now automatically treated as an alternative valid type, and WebScripter will consult its crawled database and include every entity that declares itself to be of a new alternative type, thereby finding institutions not yet listed by the well-known indices.
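The two-stage test above (recall first, then precision) can be sketched as a pair of frequency filters. This is a minimal sketch, not the system's actual procedure: it assumes that “less than 1% of nearby categories” is measured as the share of entities in those categories that declare the type, and the function name, parameter names, and generated sample data are all hypothetical.

```python
def candidate_types(entity_types, nearby_types, min_recall=0.10, max_leak=0.01):
    """Return the types that are frequent in the category but rare nearby.

    entity_types: list of type sets, one per entity found via the indices.
    nearby_types: list of type sets, one per entity in nearby categories.
    """
    def share(type_name, population):
        # Fraction of the population that declares the given type.
        if not population:
            return 0.0
        return sum(type_name in types for types in population) / len(population)

    all_types = set().union(*entity_types)
    # Recall test: declared by more than 10% of the fetched entities.
    frequent = {t for t in all_types if share(t, entity_types) > min_recall}
    # Precision test: declared by less than 1% of the nearby entities.
    return sorted(t for t in frequent if share(t, nearby_types) < max_leak)


# Synthetic data, for illustration only.
universities = []
for i in range(200):
    types = {"UsPostalService:Recipient"}
    if i < 120:
        types.add("W3C:University")
    if i >= 150:
        types.add("U.N.:University")
    if i % 4 != 0:
        types.add("IRS:NonProfitInstitution")
    universities.append(types)

nearby = []
for i in range(1000):
    types = {"UsPostalService:Recipient"}
    if i % 3 == 0:
        types.add("IRS:NonProfitInstitution")
    if i < 2:
        types.add("W3C:University")
    nearby.append(types)

print(candidate_types(universities, nearby))
```

In this synthetic data all four types pass the recall test, but only the two university types survive the precision test, mirroring the outcome described in the text.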
Note that there are no duplicate universities in this column (such as UCLA and UC Los Angeles). The challenge, of course, is to determine whether two entries that look “different” because they subscribe to different DAML ontologies in fact denote the same entity. We propose that any DAML description of an entity existing on the Web contain its normalized HTTP URL in the standardized attribute home, which can serve as a simple unique id for comparisons across ontologies (first choice for disambiguation in our example). We further propose that DAML concepts contain (possibly composite) keys that point into popular external ontologies for the same reason, for example “companies in this ontology are uniquely identified by their UsTreas:IRS:TaxPayerId” or “universities are identical if they point to the same UsPostalService:UsStreetAddress”.
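The two identity tests proposed above can be sketched as follows. This is a sketch under assumptions: the proposal does not define what “normalized” means, so the normalization rules here (lower-case host, drop a leading `www.`, drop the scheme and trailing slash) are illustrative choices, and the record layout, function names, and sample values are hypothetical.

```python
from urllib.parse import urlsplit


def normalize_home(url):
    """Reduce an HTTP URL to a comparable form (assumed rules, see lead-in)."""
    parts = urlsplit(url.strip())
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def same_entity(a, b):
    """Decide whether two descriptions from different ontologies match.

    First choice: compare the normalized `home` URLs when both are present.
    Fallback: compare any external (possibly composite) key both declare.
    """
    if a.get("home") and b.get("home"):
        return normalize_home(a["home"]) == normalize_home(b["home"])
    keys_a, keys_b = a.get("keys", {}), b.get("keys", {})
    shared = set(keys_a) & set(keys_b)
    return any(keys_a[k] == keys_b[k] for k in shared)


# Hypothetical records, for illustration only.
first = {"name": "UCLA", "home": "http://www.ucla.edu/"}
second = {"name": "UC Los Angeles", "home": "HTTP://UCLA.EDU"}
print(same_entity(first, second))

address = ("1 Example Way", "Springfield")  # placeholder composite key value
third = {"name": "Example U", "keys": {"UsPostalService:UsStreetAddress": address}}
fourth = {"name": "Univ. of Example", "keys": {"UsPostalService:UsStreetAddress": address}}
print(same_entity(third, fourth))
```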

Figure D-3
Let’s pause here for a moment to reflect on how much WebScripter achieves even with relatively little underlying DAML. (1) The user did not have to do anything but type out some university names that came to mind; she didn’t have to understand an ontological query language or the notion of an ontology or even a taxonomy for that matter, yet the result is perfectly ontologically typed. (2) Precious little DAML has to be in place for WebScripter to work: for this particular example, two external DAML ontologies of existing non-DAML university web sites would be sufficient for the inferencing. (3) WebScripter can use existing non-DAML commercial taxonomies for its reasoning. (4) WebScripter can seamlessly integrate different ontologies without the need for pre-merging the ontologies. We believe that if our proposed project accomplished nothing more than making this classification and retrieval robust and intuitive, that alone would lead to widespread adoption of DAML on the Web.
The analyst now demonstrates that she wants to extract the nationality of the universities in the following manner. The user double-clicks on USC, which brings up a Web browser on its DAML-enabled home page. The user then clicks on Maps & Directions, and copies and pastes United States from that page[3].

Figure D-4
In response, the system now fills in all those cells that use the same underlying W3C university definition by inferring the ontological path from university to country and applying it to all other instances of this ontology (we are now at the stage shown in Figure D-4; the four non-bold United States entries were just filled in). In our particular case, the user is best served by now doing the same for the UN-based university entry “Stanford” (not shown) because there are only two ontologies involved[4]. As a result, all missing countries in the second column are now filled in. The user selects a United States cell in the second column and invokes “filter by” from a menu, which removes Oxford and all other foreign-university rows from sight.
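The generalization from one demonstration to a whole column can be sketched as a path search followed by path replay. This is a simplified sketch, not the system's actual reasoning: it assumes the DAML data can be viewed as a graph with one value per property, uses breadth-first search to find the shortest property chain from the demonstrated instance to the pasted value, and the property names (`campus`, `country`), node names, and function names are hypothetical.

```python
from collections import deque


def find_path(graph, start, target_value):
    """Shortest chain of property names leading from `start` to `target_value`."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == target_value:
            return path
        for prop, nxt in sorted(graph.get(node, {}).items()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [prop]))
    return None  # the pasted value is not reachable from the start node


def follow(graph, start, path):
    """Replay a property chain from another instance; None if it does not apply."""
    node = start
    for prop in path:
        node = graph.get(node, {}).get(prop)
        if node is None:
            return None
    return node


# Hypothetical instance data, for illustration only.
graph = {
    "USC": {"campus": "USC campus", "name": "University of Southern California"},
    "USC campus": {"city": "Los Angeles", "country": "United States"},
    "UCLA": {"campus": "UCLA campus"},
    "UCLA campus": {"country": "United States"},
    "Oxford": {"campus": "Oxford campus"},
    "Oxford campus": {"country": "United Kingdom"},
}

path = find_path(graph, "USC", "United States")
print(path)
for university in ["UCLA", "Oxford", "Stanford"]:
    print(university, follow(graph, university, path))
```

An instance described by a different ontology (“Stanford” in this sketch, which has no `campus` property here) yields no value, which is why the text has the user demonstrate the extraction once more for the UN-based entry.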
Performing a number of substantially similar steps, the user can navigate to the universities’ chemistry departments, from the department to the faculty, from the faculty to their research interests, and filter by a particular research area, resulting in the table shown below. (As before, bold entries were provided or demonstrated by the user, but we are no longer underlining recognized cells in the tables below for readability.)

Figure D-5
For the sake of completing the intelligence analysis running example, the analyst would now create a new worksheet by starting out with conferences from the American Chemical Society, filtering them by location outside the United States, and listing their attendees.

Figure D-6
The user obtains the end result in Figure D-7 through a step-by-step progression of relational operations on the tables: copy the contents of the first worksheet to a third, filter out those faculty who do not appear in the second worksheet, join the result with the second worksheet over the person columns, project away some columns that were useful only for intermediate filtering, re-arrange the columns, and sort the rows by frequency of conference attendance. (This long sentence may make it sound more difficult than it is; remember that it describes a series of small steps, not a single huge step.)[5]
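The sequence of relational operations just listed can be sketched on plain lists of rows. This is only an illustration of the individual steps, not the WS implementation; the column names, the function name `build_report`, and the placeholder rows are hypothetical.

```python
from collections import Counter


def build_report(faculty, attendance):
    """Copy, filter, join, project, and sort, one small step at a time."""
    attendees = {row["person"] for row in attendance}
    # Copy the first worksheet and drop faculty absent from the second one.
    kept = [dict(row) for row in faculty if row["person"] in attendees]
    # Join with the second worksheet over the person columns.
    joined = [{**f, **a} for f in kept for a in attendance
              if f["person"] == a["person"]]
    # Count conference attendance per person for the final sort.
    visits = Counter(row["person"] for row in joined)
    # Project away intermediate columns and fix the column order.
    report = [{"person": r["person"], "university": r["university"],
               "conference": r["conference"]} for r in joined]
    # Most frequent attendees first; ties broken by name, then conference.
    report.sort(key=lambda r: (-visits[r["person"]], r["person"], r["conference"]))
    return report


# Placeholder rows, for illustration only.
faculty = [
    {"university": "USC", "person": "Person 1", "interest": "supercritical CO2"},
    {"university": "Stanford", "person": "Person 2", "interest": "supercritical CO2"},
    {"university": "UCLA", "person": "Person 3", "interest": "supercritical CO2"},
]
attendance = [
    {"conference": "Conference A", "location": "abroad", "person": "Person 2"},
    {"conference": "Conference B", "location": "abroad", "person": "Person 2"},
    {"conference": "Conference A", "location": "abroad", "person": "Person 1"},
    {"conference": "Conference B", "location": "abroad", "person": "Person 9"},
]

for row in build_report(faculty, attendance):
    print(row)
```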
The astute reader may have noticed that there seems to be a disconnect in our example, as the contents of column E in Figure D-7 are not literally identical to the contents of column D in Figure D-5 (Environmental Chemistry suddenly reads Stanford Environmental Chemistry). In reality, what happened is that the user simply changed the surface representation of the academic departments column, similar to changing an Excel cell type from Date to Currency, which has no effect on the actual cell content. This is possible because the underlying department has, and always had, a link to its university so that it can offer a variety of surface representations (“USC Chem. Dept.”, “Dept. of Chemistry”, “Department of Chemistry at the University of Southern California”).
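The separation between cell content and surface representation can be sketched as one underlying object with several renderings. This is a minimal sketch under assumptions: the class `Department`, its fields, and the style names are hypothetical and do not reflect how WS stores DAML instances.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Department:
    """Underlying cell content: a department linked to its university."""
    field: str
    university_short: str
    university_full: str

    def render(self, style):
        # Each style is only a different view of the same underlying object.
        if style == "plain":
            return self.field
        if style == "with_university":
            return f"{self.university_short} {self.field}"
        if style == "full":
            return f"Department of {self.field} at the {self.university_full}"
        raise ValueError(f"unknown style: {style}")


dept = Department("Environmental Chemistry", "Stanford", "Stanford University")
print(dept.render("plain"))
print(dept.render("with_university"))
```

Switching the style changes only what the cell displays; the object behind it, including its link to the university, stays the same.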
Thus, the table below contains the end result of the intelligence analyst’s report on U.S. supercritical CO2 experts frequently attending overseas conferences. Now that it is defined, its data can be refreshed at any time, and the report itself can become the source for further Web scripting, as it carries all its DAML within the generated HTML.

Figure D-7
Our proposed system imposes a unique requirement on the structure of DAML, namely that the underlying DAML closely match and be embedded within the human-readable HTML content of the Web page. This is because we want regular Web users to compose DAML queries by interacting with a standard (but proxy-server-instrumented) HTML browser. We believe that this requirement is reasonable, since it also helps keep the human- and machine-readable content in sync. In addition, we are especially concerned with being able to tell if two instances of concepts from different ontologies are “the same”, and propose the two approaches “normalized URL” and “declared (possibly composite) key to external ontology” to address that problem.