Qtargets used in Webclopedia
USC Information Sciences Institute

Introduction

Webclopedia uses so-called qtargets to narrow the search space of possible answers. For example, given a question like

  • How tall is Mt. Everest?

    Webclopedia parses the question and then, using recursive pattern matching, determines that the answer must be a distance quantity, something we refer to as the qtarget for this question. After parsing various answer candidate sentences, e.g.

  • Jack knows exactly how tall Mt. Everest is.
  • Jack climbed the 29,028-foot Mt. Everest in 1984 and the 7,130-foot Mt. Kosciusko in Australia in 1985.
  • Mt. Everest is 2.8% taller than K2.

    Webclopedia determines that there are no distance quantities in the first and third sentences (even though the words might look "promising"), and two distance quantities ("29,028-foot" and "7,130-foot") in the second sentence. For these three sentences, with thousands of substrings in the range of 1-50 bytes, the search space has now been reduced to a mere two candidates.
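    The narrowing step above can be imitated with a minimal sketch. It assumes a purely surface-level regular expression (the hypothetical `DISTANCE` pattern) as a stand-in for qtarget matching; Webclopedia itself works on parsed constituents, so this only illustrates the effect on the three example sentences, not the actual mechanism.

```python
import re

# Hypothetical surface pattern for distance quantities such as "29,028-foot".
# Assumption: a regex stands in for the parse-based matching the page describes.
DISTANCE = re.compile(r"\b\d[\d,]*(?:\.\d+)?[- ](?:foot|feet|meters?|miles?)\b")


def distance_candidates(sentence: str) -> list[str]:
    """Return every substring of the sentence that looks like a distance quantity."""
    return DISTANCE.findall(sentence)


sentences = [
    "Jack knows exactly how tall Mt. Everest is.",
    "Jack climbed the 29,028-foot Mt. Everest in 1984 and the 7,130-foot "
    "Mt. Kosciusko in Australia in 1985.",
    "Mt. Everest is 2.8% taller than K2.",
]

# Only the second sentence contributes candidates.
candidates = [c for s in sentences for c in distance_candidates(s)]
print(candidates)  # ['29,028-foot', '7,130-foot']
```

    The years and the percentage are skipped because no distance unit follows them, which mirrors why the first and third sentences yield no candidates.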

    Some qtargets are very narrow: for example, I-EN-PROPER-PLANET currently has only nine sub-concepts, the nine planets of our solar system. Other qtargets can be much more vague: S-NP considers all noun phrases, leaving much of the answer-pinpointing work to subsequent modules of the question-answer matcher.

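    There are several different types of qtargets: abstract, semantic, syntactic, role, slot, lexical and predicative qtargets, as well as combinations of these.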


    Detailed listing of qtargets

    Abstract qtargets

    Semantic qtargets

    Semantic qtargets limit the search space to sentence constituents that satisfy a particular semantic class with respect to the Webclopedia ontology. Semantic qtargets include:

    Syntactic qtargets

    Syntactic qtargets are fairly weak; that is, they generally do not restrict the search space much. However, they still require the answer to be a constituent in a parse tree. Webclopedia uses S-NP as the default qtarget.

    Role qtargets

    Roles refer to the role of a constituent within a phrase. For example, in the parse tree
    [1] The tournament was cancelled due to bad weather.  [S-SNT]
        (SUBJ LOG-OBJ) [2] The tournament  [S-NP]
        (PRED) [5] was cancelled  [S-VERB]
        (REASON) [6] due to bad weather  [S-PP]
        (DUMMY) [14] .  [D-PERIOD]
    
    the phrase "due to bad weather" would satisfy the qtarget ROLE REASON. The constraint is independent of the syntactic category, which could also have been a subordinate clause (because the weather was so bad) or a verb phrase (to avoid injuries).
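    A role lookup over such a tree might be sketched as follows. The `Constituent` class and the `find_by_role` helper are hypothetical names introduced only for illustration; the tree literal reproduces the example parse above.

```python
from dataclasses import dataclass, field


@dataclass
class Constituent:
    """Hypothetical parse-tree node; not Webclopedia's actual data structure."""
    text: str
    category: str                    # syntactic category, e.g. "S-NP"
    roles: tuple[str, ...] = ()      # roles within the parent, e.g. ("REASON",)
    children: list["Constituent"] = field(default_factory=list)


def find_by_role(node: Constituent, role: str) -> list[Constituent]:
    """Collect every constituent at or below `node` that carries `role`."""
    hits = [node] if role in node.roles else []
    for child in node.children:
        hits.extend(find_by_role(child, role))
    return hits


tree = Constituent(
    "The tournament was cancelled due to bad weather.", "S-SNT",
    children=[
        Constituent("The tournament", "S-NP", ("SUBJ", "LOG-OBJ")),
        Constituent("was cancelled", "S-VERB", ("PRED",)),
        Constituent("due to bad weather", "S-PP", ("REASON",)),
        Constituent(".", "D-PERIOD", ("DUMMY",)),
    ],
)

reasons = find_by_role(tree, "REASON")
print([(c.text, c.category) for c in reasons])  # [('due to bad weather', 'S-PP')]
```

    The lookup never inspects `category`, which reflects the point that a role constraint does not depend on the syntactic category of the constituent.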

    Slot qtargets

    Slot qtargets can access any slot associated with a phrase. Slots can be filled during parsing or during post-parsing processing, and can then be used for qtarget matching.

    Lexical qtargets

    Lexical qtargets are used when the answer is already available from some external knowledge, and all the system still has to do is look for text supporting that answer:

    LEX and SURF differ in that for LEX, only the lexical head has to match, whereas for SURF the strings have to match exactly. That is, "LEX Washington" matches not only "Washington", but also "WASHINGTON" and "Washington, DC".
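    The difference between the two can be sketched in a few lines. This is an illustration under assumptions: the `Phrase` class is hypothetical, the lexical head is assumed to be supplied by the parser, and head comparison is assumed to ignore case (inferred from the "WASHINGTON" example).

```python
from dataclasses import dataclass


@dataclass
class Phrase:
    """Hypothetical candidate phrase; the head is assumed to come from the parser."""
    surface: str   # the phrase exactly as it appears in the text
    head: str      # lexical head of the phrase


def matches_surf(target: str, phrase: Phrase) -> bool:
    """SURF: the surface string has to match exactly."""
    return phrase.surface == target


def matches_lex(target: str, phrase: Phrase) -> bool:
    """LEX: only the lexical head has to match (case-insensitively, by assumption)."""
    return phrase.head.casefold() == target.casefold()


phrases = [
    Phrase("Washington", "Washington"),
    Phrase("WASHINGTON", "WASHINGTON"),
    Phrase("Washington, DC", "Washington"),
]

lex_hits = [p.surface for p in phrases if matches_lex("Washington", p)]
surf_hits = [p.surface for p in phrases if matches_surf("Washington", p)]
print(lex_hits)   # ['Washington', 'WASHINGTON', 'Washington, DC']
print(surf_hits)  # ['Washington']
```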

    Note: in this specific example, the principal "external knowledge source" is WordNet, which lists "Berlin" and "German capital" as synonyms.

    Predicative qtargets

    Predicative qtargets take the name of a predicate as an argument. A constituent is considered an answer candidate if the predicate holds for that constituent. The predicate airport-code-p, for example, checks whether the surface string of a constituent consists of three capital letters, e.g. LAX.

    Wouldn't it be better to have a semantic class for airport codes, chemical formulas, etc.? The problem is that such categories are difficult to identify with high accuracy in a parse tree. It is not trivial to decide in the general case whether EPA is an airport code or the abbreviation of something else. However, the system benefits greatly if the matcher module that proposes an initial set of answer candidates can narrow the space of candidates from any noun phrase to words consisting of three capital letters. The predicative qtarget is thus a compromise between a strong semantic qtarget that a parser/preprocessor might not be able to support and a weak syntactic qtarget that fails to exploit our knowledge about what certain answers are expected to look like.
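    A predicate of this kind is easy to sketch. The function below is a Python rendering of the check described for airport-code-p (three capital letters); the name `airport_code_p` and the list of noun phrases are illustrative assumptions, not the system's actual code or data.

```python
import re


def airport_code_p(surface: str) -> bool:
    """True if the surface string consists of exactly three capital letters."""
    return re.fullmatch(r"[A-Z]{3}", surface) is not None


# Hypothetical noun-phrase candidates proposed by a weak syntactic qtarget.
noun_phrases = ["LAX", "EPA", "Los Angeles", "the airport", "JFK", "Lax"]

candidates = [np for np in noun_phrases if airport_code_p(np)]
print(candidates)  # ['LAX', 'EPA', 'JFK']
```

    Note that "EPA" survives the filter: the predicate narrows the candidate space without deciding whether a string really is an airport code, which is exactly the compromise discussed above.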

    Combinations of qtargets

    For many questions, the qtarget is a combination of simple qtargets.

  • Question: Where is the Getty Museum?
  • Qtarget: ((I-EN-PROPER-CITY 1.0) (C-AT-LOCATION 0.7) (I-EN-PROPER-PLACE 0.7) (I-EN-PLACE 0.5))

    In the question above, Webclopedia prefers a proper city (score factor 1.0), but also considers other location expressions, proper places and general places.

  • Question: What is the capital of Ontario?
  • Qtarget: ((I-EN-PROPER-CITY 1.0) (EQ I-EN-PROPER-PLACE 0.8))

    In the example above, the system again prefers a proper city. However, because the named entity tagger might identify Toronto only as a general location, the system also allows I-EN-PROPER-PLACE, although with a lower preference, just to be on the safe side.

    Note: The EQ option means that sub-classes of I-EN-PROPER-PLACE can *NOT* be considered. For example, if the system identifies "Canada" as an I-EN-PROPER-COUNTRY, "Canada" will properly be disqualified as a potential answer to our question.
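    The two combined qtargets above can be evaluated with a small sketch. Several things here are assumptions made for illustration: the `PARENT` table is a hypothetical fragment of the ontology, a candidate receives the highest score factor among the entries it satisfies, and a candidate that satisfies no entry scores 0.0. The source does not specify how Webclopedia actually combines the factors.

```python
# Hypothetical ontology fragment (child concept -> parent concept).
PARENT = {
    "I-EN-PROPER-CITY": "I-EN-PROPER-PLACE",
    "I-EN-PROPER-COUNTRY": "I-EN-PROPER-PLACE",
    "I-EN-PROPER-PLACE": "I-EN-PLACE",
}


def is_a(concept, ancestor):
    """True if `concept` equals `ancestor` or is one of its sub-classes."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = PARENT.get(concept)
    return False


def score(qtarget, concept):
    """Best score factor among the qtarget entries that `concept` satisfies."""
    best = 0.0
    for entry in qtarget:
        if entry[0] == "EQ":
            # EQ: the concept itself must match; sub-classes are excluded.
            _, target, factor = entry
            satisfied = concept == target
        else:
            target, factor = entry
            satisfied = is_a(concept, target)
        if satisfied:
            best = max(best, factor)
    return best


GETTY = [("I-EN-PROPER-CITY", 1.0), ("C-AT-LOCATION", 0.7),
         ("I-EN-PROPER-PLACE", 0.7), ("I-EN-PLACE", 0.5)]
ONTARIO = [("I-EN-PROPER-CITY", 1.0), ("EQ", "I-EN-PROPER-PLACE", 0.8)]

print(score(ONTARIO, "I-EN-PROPER-CITY"))     # 1.0  (Toronto tagged as a city)
print(score(ONTARIO, "I-EN-PROPER-PLACE"))    # 0.8  (Toronto tagged only as a place)
print(score(ONTARIO, "I-EN-PROPER-COUNTRY"))  # 0.0  (Canada is excluded by EQ)
print(score(GETTY, "I-EN-PROPER-COUNTRY"))    # 0.7  (no EQ, so sub-classes count)
```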


    Written by Ulf Hermjakob
    Last updated: November 7, 2002