USC/ISI DAML Homework 3

M. Frank (with B. Yan, P. Szekely, and R. Neches) frank@isi.edu 2001-02-01

This is a short note on the USC/ISI WebScripter submission.

We wrote a C program that parses files in the ubiquitous BibTeX format for bibliographic references (using the GNU Bison and Flex parsing utilities) and produces DAML statements according to a BibTeX ontology. We chose the BibTeX domain for three reasons: there is plentiful data for it on the Web; we want to use it to track our division's publications in an automated fashion (producing personal, project, and division publication listings automatically); and it is probably of interest to other DAML contractors as well.
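The following is a minimal sketch of the BibTeX-to-DAML idea, not the C/Bison/Flex program itself. It uses regular expressions instead of a generated parser, handles only flat entries (no nested braces, no @string macros), and assumes that class and property names simply reuse the BibTeX type and field names under a hypothetical `bib:` prefix bound to the ontology URL given in this note. The function names are illustrative.

```python
import re
from xml.sax.saxutils import escape, quoteattr

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Assumption: namespace formed by appending '#' to the ontology URL.
BIBTEX_NS = "http://www.isi.edu/webscripter/bibtex.o.daml#"

# "@type{key, field = value, ...}" for one flat entry.
ENTRY_RE = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,(.*)\}\s*$", re.S)
# A field value is {braced}, "quoted", or a bare number.
FIELD_RE = re.compile(r'(\w+)\s*=\s*(?:\{([^{}]*)\}|"([^"]*)"|(\d+))')


def parse_entry(text):
    """Parse one simple BibTeX entry into (type, key, fields)."""
    match = ENTRY_RE.match(text.strip())
    if match is None:
        raise ValueError("not a recognizable BibTeX entry")
    entry_type, key, body = match.groups()
    fields = {}
    for name, braced, quoted, bare in FIELD_RE.findall(body):
        fields[name.lower()] = braced or quoted or bare
    return entry_type.lower(), key, fields


def to_rdf(entry_type, key, fields):
    """Emit one RDF/XML element per entry, one child element per field."""
    class_name = entry_type.capitalize()  # assumed naming convention
    lines = ["<bib:%s rdf:ID=%s>" % (class_name, quoteattr(key))]
    for name, value in sorted(fields.items()):
        lines.append("  <bib:%s>%s</bib:%s>" % (name, escape(value), name))
    lines.append("</bib:%s>" % class_name)
    return "\n".join(lines)


def wrap_document(body):
    """Wrap entry elements in an rdf:RDF root that declares both prefixes."""
    return '<rdf:RDF xmlns:rdf="%s" xmlns:bib="%s">\n%s\n</rdf:RDF>' % (
        RDF_NS, BIBTEX_NS, body)


if __name__ == "__main__":
    sample = '@article{frank2001,\n  author = {M. Frank},\n  year = 2001\n}'
    print(wrap_document(to_rdf(*parse_entry(sample))))
```

A grammar-based parser, as used in the actual program, copes with nested braces and string concatenation, which this regular-expression sketch does not.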

The BibTeX ontology maps one-to-one onto the BibTeX types and fields, as defined in "The LaTeX Companion" (M. Goossens, F. Mittelbach, and A. Samarin) and Dana Jacobsen's BibTeX tutorial at http://www2.ecst.csuchico.edu/~jacobsd/bib/formats/bibtex.html. The ontology lives at http://www.isi.edu/webscripter/bibtex.o.daml; it yields about 350 triples.

An example of a generated BibTeX DAML data file, covering Planning and Scheduling, lives at http://www.isi.edu/webscripter/planning.scheduling.daml. It is 1.5 MB in size and yields 28373 triples.

We have used WebScripter to produce a simple report of the 588 journal articles in that file, sorted in reverse chronological order (http://www.isi.edu/webscripter/planningarticles.gen.html). Producing the report takes nearly eight minutes: about one minute is consumed by RDFAPI, 30 seconds by the default XML->HTML transformation, and the rest (roughly six and a half minutes) by the WebScripter core itself. (We believe we can reduce the core processing time for WebScripter to about a minute by using hashing where we currently run through lists sequentially.)
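To illustrate the difference between running through a list sequentially and using hashing, here is a generic sketch over (subject, predicate, object) triples. It is not WebScripter code; the function names and the choice of (subject, predicate) as the hash key are assumptions made for the example.

```python
from collections import defaultdict


def find_by_scan(triples, subject, predicate):
    """Sequential lookup: examines every triple on every call."""
    return [o for (s, p, o) in triples if s == subject and p == predicate]


def build_index(triples):
    """Single pass over the triples to build a hash index."""
    index = defaultdict(list)
    for s, p, o in triples:
        index[(s, p)].append(o)
    return index


def find_by_hash(index, subject, predicate):
    """Hashed lookup: cost is independent of the total number of triples."""
    return index.get((subject, predicate), [])


if __name__ == "__main__":
    # Made-up triples, for illustration only.
    triples = [
        ("article1", "year", "1999"),
        ("article1", "title", "A title"),
        ("article2", "year", "2000"),
    ]
    index = build_index(triples)
    print(find_by_scan(triples, "article2", "year"))
    print(find_by_hash(index, "article2", "year"))
```

If each of the 588 report rows needed even one full pass over the 28373 triples, that would be 588 x 28373 = 16,683,324 comparisons per report column; with an index, the triples are traversed once and each subsequent lookup is a single hash probe.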

Notes on DAML+OIL

We intend to build a very simple validation step for WebScripter. This is justifiable because the debugging time it saves us (which we now spend figuring out why some data that should appear in WebScripter reports does not) exceeds the implementation time of the simple validity checks below:

We will likely refrain from the following because the benefits for WebScripter do not justify the effort.

Obviously, we'd be ecstatic if someone would contribute more thorough DAML+OIL ontology validation, ideally as an RDFAPI built-in or add-on.

What we are currently up to

We are using WebScripter internally to keep track of open and closed to-do items on a variety of our projects, such as Meta-Ja (http://www.isi.edu/~frank/metaja/metajatodo.gen.html, http://www.isi.edu/~frank/metaja/metajadone.gen.html) and WebScripter itself (http://www.isi.edu/webscripter/wstodo.gen.html, http://www.isi.edu/webscripter/wsdone.gen.html).

We are currently working on the first real-world application for WebScripter: the official personnel Web page of ISI division 2. It will fuse information from n+3 DAML source files, where n is the number of people in our division: one official division roster DAML page [Jeanine DiCamillo], one page listing the location of large photo files for division people [Pedro Szekely], one page for thumbnails produced from the photos [Martin Frank], and optionally one personal DAML page per division member that lists interests, the location of a personal home page, and so on. We expect to demo this application at the PI Meeting two weeks from now.
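The fusion step can be pictured with the sketch below. It is an illustration only: it assumes every source identifies a person by the same key and represents each source as a plain dictionary, whereas the real inputs are DAML files and the mechanism WebScripter uses to align them is not described in this note. All identifiers and values are made up.

```python
def fuse(sources):
    """Merge per-person properties from several sources, keyed by person ID.

    Each source maps a person ID to a dict of property name -> value.
    Values are collected in lists so that no source overwrites another.
    """
    fused = {}
    for source in sources:
        for person_id, properties in source.items():
            record = fused.setdefault(person_id, {})
            for name, value in properties.items():
                record.setdefault(name, []).append(value)
    return fused


if __name__ == "__main__":
    # Hypothetical stand-ins for the roster, photo, thumbnail, and
    # optional personal pages.
    roster = {"person-a": {"name": "Person A"}, "person-b": {"name": "Person B"}}
    photos = {"person-a": {"photo": "a-large.jpg"}}
    thumbnails = {"person-a": {"thumbnail": "a-small.jpg"}}
    personal = {"person-b": {"interest": "planning"}}
    for person_id, record in sorted(fuse([roster, photos, thumbnails, personal]).items()):
        print(person_id, record)
```

People who have no personal page or photo still appear in the result, with only the properties their sources supply; this matches the "optionally one personal DAML page" arrangement described above.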

Where to Find This Document

This document is written in GNU texinfo format. The official location of its most readable format is http://www.isi.edu/webscripter/isidamlhw3.gen.html.

A plain text version suitable for e-mailing is at http://www.isi.edu/webscripter/isidamlhw3.gen.txt.

An Adobe Portable Document Format version suitable for printing is at http://www.isi.edu/webscripter/isidamlhw3.gen.pdf.