What is it?

The Dictionary Parsing Project is a collaborative effort to extract useful information from dictionaries. We are beginning with the publicly available Webster's 1913 dictionary. (A Web interface is available from ARTFL.) We aim to build an ontology of the words found in the dictionary. In addition to linking hypernyms and hyponyms, we plan to link words through other relations found in the dictionary, e.g., a driver relation between car and person.
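
As a rough illustration of the kind of structure such an ontology might use, here is a minimal Python sketch that stores labeled, directed relations between words. The class name `Ontology`, its methods, the car/vehicle hypernym pair, and the direction chosen for the driver relation are all assumptions made for this example; they do not describe the project's actual data format.

```python
from collections import defaultdict


class Ontology:
    """Toy store of labeled, directed relations between words (hypothetical)."""

    # Relations treated as inverses of each other in this sketch.
    INVERSES = {"hypernym": "hyponym", "hyponym": "hypernym"}

    def __init__(self):
        # relation name -> source word -> set of target words
        self._edges = defaultdict(lambda: defaultdict(set))

    def add(self, relation, source, target):
        """Record `source --relation--> target`, plus the inverse link if one is defined."""
        self._edges[relation][source].add(target)
        inverse = self.INVERSES.get(relation)
        if inverse is not None:
            self._edges[inverse][target].add(source)

    def related(self, relation, word):
        """Return the sorted targets of `relation` starting at `word`."""
        return sorted(self._edges.get(relation, {}).get(word, ()))


onto = Ontology()
onto.add("hypernym", "car", "vehicle")  # illustrative pair: a car is a kind of vehicle
onto.add("driver", "car", "person")     # assumed direction: the driver of a car is a person

print(onto.related("hypernym", "car"))
print(onto.related("hyponym", "vehicle"))
print(onto.related("driver", "car"))
```

Keeping the relation name as a plain string means new relation types found in definitions can be added without changing the structure.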

Participation

Help Out!

DPP has a number of tasks and welcomes additional collaborators and advisors. Please contact Ken Litkowski of CL Research if you would like to be included in this important effort. We have a parser with PC and Unix source code available to interested researchers. We have an initial dictionary and welcome offerings of additional lexical resources. We are making intermediate and final lexical resources publicly available.

Information Sciences Institute

So far, the algorithm developed at ISI does the following:

  1. converts the dictionary into an ASCII format
  2. regularizes various parts of the dictionary
  3. identifies phrases
  4. identifies senses of prepositions

Data at various stages of development:

CL Research

CL Research is using the DIMAP software to analyze the results of parsing the dictionary and identify semantic relations. Please write to Ken Litkowski for more information. These parses are available. DIMAP dictionaries and software are also available to interested researchers.

A sample parse.

MICRA

MICRA is maintaining the dictionary. Any comments, additions, or corrections regarding the machine-readable form of this dictionary should be directed to Pat Cassidy.
Help Out! Pat is also adding new words and definitions, and any help would be appreciated. All contributions are welcome: the dictionary isn't overly technical.

Resources

During the course of this project, we have developed the following resources:

Preposition chart (PostScript, source)
This shows relationships between many prepositions in English based on Frank, and Quirk and Greenbaum.

Scripts

The following scripts have proved useful to us:

dict
This script looks up a word in the dictionary and returns its definition.
pgrep
This script searches through the dictionary for a particular string, returning all definitions containing the string and their headwords.
possearch
This script searches through the dictionary for a particular part-of-speech category, returning all definitions for words of that part-of-speech.
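
The scripts themselves are not reproduced here. As a rough sketch of the behavior described above, the following Python functions mimic the three lookups over a small in-memory list. The `Entry` record, the sample entries, the function signatures, and the case-insensitive matching are all assumptions made for illustration; they are not the original scripts or the real dictionary format.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    headword: str
    pos: str         # part-of-speech label, e.g. "n." or "v. t."
    definition: str


# Hypothetical sample data, not taken from the actual dictionary files.
ENTRIES = [
    Entry("car", "n.", "A vehicle moved on wheels."),
    Entry("drive", "v. t.", "To urge on and direct the motions of, as a car."),
    Entry("driver", "n.", "One who drives."),
]


def lookup(entries, word):
    """Like `dict`: return every definition whose headword matches `word`."""
    return [e.definition for e in entries if e.headword.lower() == word.lower()]


def pgrep(entries, text):
    """Like `pgrep`: return (headword, definition) pairs whose definition contains `text`."""
    needle = text.lower()
    return [(e.headword, e.definition) for e in entries
            if needle in e.definition.lower()]


def possearch(entries, pos):
    """Like `possearch`: return (headword, definition) pairs for entries tagged `pos`."""
    return [(e.headword, e.definition) for e in entries if e.pos == pos]


print(lookup(ENTRIES, "car"))
print(pgrep(ENTRIES, "car"))
print(possearch(ENTRIES, "n."))
```

A linear scan like this is only practical for small samples; over the full dictionary an index keyed by headword or part of speech would likely be needed.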