Interlingua for MT at USC/ISI

Semantic Interpretation

GAZELLE works by translating source language text into a less ambiguous conceptual representation, which is subsequently translated into the target language. We call this conceptual representation interlingua, even though it does not capture all of the meaning and subtlety in the source text. Our approach is to start with a simple interlingua design, and then extend it in practically useful ways. We seek at first to capture only the basic "who did what to whom" in the source text.

There are two advantages to working with a conceptual representation. One is that we can abstract away from language-dependent encodings of things like modality, negation, etc. Another is that we can resolve ambiguities with language-independent world knowledge. This knowledge is reusable from one language to another.

The form of an interlingua sentence is a labeled directed graph. This graph can be printed in a number of ways. GAZELLE uses a format similar to PENMAN's SPL, e.g.:

        (w / |desire, want|
           :agent (P / |someone|
                     :name "John")
           :patient (A / |take in food|
                       :agent P))

This might represent "John wants to eat." The concept at the root of each nested level (in this example: |desire, want|, |someone|, and |take in food|) generally corresponds to the linguistic head word of the phrase formed at that level. Thus main verbs are the heads of sentential clauses, head nouns are at the root of noun phrases, and so on. Notationally, the SPL form is equivalent to the following graph:


                 ---------------o---------------
               /                |                \
     instance  |                |  agent         |  patient
               o                |                |
        |desire, want|          |        --------o--------
                                |      /                   \
                                |     |  agent              |  instance
                                |     |                     o
                                 \   /                |take in food|
                                   o
                                 /   \
                      instance  |     |  name
                                o     o
                         |someone|  "John"

Notice that the "/" in SPL-notation means "instance," as in "W is an instance of the concept or class |desire, want|".
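
As a rough illustration of how the SPL-like notation maps onto a graph, the following Python sketch parses the example above into nested dictionaries. The names `tokenize` and `parse` and the dictionary layout are assumptions made for this illustration; they are not part of GAZELLE or PENMAN.

```python
import re

# Tokens: parentheses, "/", |concept| names, "quoted strings", bare symbols.
TOKEN = re.compile(r'\(|\)|/|\|[^|]*\||"[^"]*"|[^\s()/]+')

def tokenize(text):
    """Split SPL-like text into a flat list of tokens."""
    return TOKEN.findall(text)

def parse(tokens, pos=0):
    """Parse one node starting at tokens[pos]; return (node, next_pos)."""
    tok = tokens[pos]
    if tok != "(":
        # A bare token is a variable reference (e.g. P) or an atomic value.
        return tok, pos + 1
    var = tokens[pos + 1]
    assert tokens[pos + 2] == "/", "expected '/' after the variable"
    node = {"var": var, "instance": tokens[pos + 3], "roles": []}
    pos += 4
    while tokens[pos] != ")":
        role = tokens[pos]
        value, pos = parse(tokens, pos + 1)
        node["roles"].append((role, value))
    return node, pos + 1

example = '''
(w / |desire, want|
   :agent (P / |someone|
             :name "John")
   :patient (A / |take in food|
               :agent P))
'''

graph, _ = parse(tokenize(example))
print(graph["instance"])                   # |desire, want|
print([role for role, _ in graph["roles"]])  # [':agent', ':patient']
```

The second occurrence of P is kept as a plain variable reference, which is what makes the structure a graph rather than a tree.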

An interlingua can also be read as a conjunction of assertions about the world, e.g.:

  1. W is an instance of wanting.
  2. P is an instance of a person.
  3. The name of P is "John".
  4. A is an instance of eating.
  5. The agent of A (the "eater") is P.
  6. The agent of W (the "wanter") is also P.
  7. The patient of W (the "wantee") is A.

In logic, we might write:

  EwEpEa: instance (w, WANTING) ^ instance (a, EATING) ^
          instance (p, PERSON) ^ name (p, JOHN) ^
          agent (w, p) ^ agent (a, p) ^ patient (w, a)
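
The reading as a conjunction of assertions can be sketched in a few lines of Python. The tuple layout `(variable, concept, roles)` and the function name `assertions` are assumptions for this illustration only.

```python
# Node layout (an assumption for this sketch):
#   (variable, concept, [(relation, value), ...])
# where a value is a nested node, a variable name, or a constant.
graph = ("w", "WANTING", [
    ("agent", ("p", "PERSON", [("name", "JOHN")])),
    ("patient", ("a", "EATING", [("agent", "p")])),
])

def assertions(node, out=None):
    """Flatten a nested node into (relation, arg1, arg2) triples."""
    if out is None:
        out = []
    var, concept, roles = node
    out.append(("instance", var, concept))
    for relation, value in roles:
        if isinstance(value, tuple):
            # Nested node: relate the two variables, then descend.
            out.append((relation, var, value[0]))
            assertions(value, out)
        else:
            out.append((relation, var, value))
    return out

triples = assertions(graph)
for relation, x, y in triples:
    print(f"{relation}({x}, {y})")
```

Each printed line corresponds to one conjunct of the formula above; the order of the conjuncts carries no meaning.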

A final representation used in the semantic interpreter is the feature structure, e.g.:

  ((instance |desire, want|) 
   (agent ((instance |someone|)
           (name "John"))) 
   (patient ((instance |take in food|)
             (agent ((instance |someone|)
                     (name "John"))))))
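
Note that the feature structure repeats the |someone| node where the SPL form simply reuses the variable P. A minimal Python sketch of that expansion follows; the node layout and the names `collect` and `to_feature_structure` are assumptions for illustration, and the sketch would not terminate on a cyclic graph.

```python
# Node layout (an assumption for this sketch):
#   (variable, concept, [(relation, value), ...])
graph = ("w", "|desire, want|", [
    ("agent", ("P", "|someone|", [("name", '"John"')])),
    ("patient", ("A", "|take in food|", [("agent", "P")])),
])

def collect(node, env):
    """Record which node each variable names."""
    var, _, roles = node
    env[var] = node
    for _, value in roles:
        if isinstance(value, tuple):
            collect(value, env)

def to_feature_structure(node, env=None):
    """Build nested (feature, value) pairs, copying re-used variables."""
    if env is None:
        env = {}
        collect(node, env)
    _, concept, roles = node
    fs = [("instance", concept)]
    for relation, value in roles:
        if isinstance(value, tuple):
            fs.append((relation, to_feature_structure(value, env)))
        elif value in env:
            # A bare variable: substitute a copy of the node it names.
            fs.append((relation, to_feature_structure(env[value], env)))
        else:
            fs.append((relation, value))
    return fs

fs = to_feature_structure(graph)
print(fs)
```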

So much for format. More interesting are (1) what concepts and relations we allow in our interlingua sentences, (2) how we produce candidate interlinguas from natural language, (3) how we can tell which candidate interlinguas are reasonable and which are nonsensical, and (4) how we turn interlinguas back into natural language. We deal with (1) in this section.

Our concepts and relations make up the SENSUS knowledge base. The SENSUS conceptual inventory is midway between a handful-of-primitives scheme and a one-concept-per-word scheme. We have 70,000 primitive concepts and a complex mapping between words and concepts. A concept like |desire, want| might be expressed by a noun in one language, a verb in another, etc. Our 70,000 concepts are hierarchically arranged so that statements concerning one concept may imply statements concerning another.

The concepts are drawn from sources like WordNet. See (Knight & Luk, AAAI-94) for an overview. These sources are based on English word meanings, so that there is no single concept for "to clean rice" (a common verb in Japanese). However, there are plenty of concepts for representing such a meaning, e.g.:

        (c / |remove unwanted substances from|
           :patient (R / |rice|))
Our interlingua is loose in allowing English strings instead of concepts. For example,
        (s / "see"
           :patient (m / "mouse"))
can represent "The mouse is seen." However, concepts are preferred, as this gives us flexibility in ranking interlingua candidates and generating natural language.

There are several tools for browsing the conceptual inventory. One is:

  % /nfs/bach/trans/bin/wn dog -hypen -norvig
This will display concepts referred to by the noun "dog", with their superclasses. -hypev works for verbs. -synsa works for adjectives. Another tool is available via the World Wide Web, at http://mozart:8003/sensus/sensus_frame.html.

Next come relations: We currently use about 40 relations between entities in our interlingua sentences. All of these are produced and consumed by GAZELLE modules. From an ideal point of view, these relations are few in number, perhaps overloaded, and close to the syntax of languages like Japanese and English -- but again, we want to start simple and work up. We can group the relations into subsets:

CASE
:AGENT, :PATIENT, :SOURCE, :VIA, :DIFF-QUANT, :DESTINATION, :SPATIAL-LOCATING, :TEMPORAL-LOCATING, :ACCOMPANIER, :SANS, :ROLE-OF-AGENT, :ROLE-OF-PATIENT
LOGICAL
:POLARITY, :DEFINITENESS, :OP1, :OP2, :OP3, :OP4, :DOMAIN, :RANGE, :QUOTED, :MOOD, :INSTANCE
FEATURE
:GPI, :MOD, :MOD1, :MOD-1, :CONCAT, :NAME, :RESTATEMENT, :QUANT, :PRO, :ANCHOR, :COMPARED-TO, :TEXT-CLASS, :TEXT-ELEMENT
EVENT
:PURPOSE, :REASON, :CONDITION, :CONSEQUENCE, :MANNER, :MEANS, :CLAUSE, :SUBORDINATE
MISC
:TOPIC
INVERSE
:AGENT-OF, :PATIENT-OF, :DOMAIN-OF, :GENERALIZED-POSSESSION, etc.
SYNTACTIC
:SUBJECT, :OBJECT, :DATIVE, :BY-OBJECT, :PRED, :COMPL, :COMPL-THAT-S, :ADJUNCT, :S-ADJUNCT, :V-ADJUNCT, :N-ADJUNCT, :O-ADJUNCT, :AMOD, :CLAUSE, :ANCHOR, :RESTATEMENT, :ARTICLE

Many of these relations have aliases, e.g., you may refer to :AGENT with :SAYER or :SENSER. Currently, :RECIPIENT is an alias of :DESTINATION.
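
One simple way to handle aliases is to map every alias to a canonical relation name before further processing. The sketch below is an assumption about how this could be done, not a description of GAZELLE's code; the alias pairs are the ones given in this document, and the names `ALIASES` and `canonical` are hypothetical.

```python
# Alias -> canonical relation, as listed in this document.
ALIASES = {
    ":SAYER": ":AGENT",
    ":SENSER": ":AGENT",
    ":RECIPIENT": ":DESTINATION",
    ":INCLUSIVE": ":ACCOMPANIER",
    ":INSTRUMENT": ":MEANS",
    ":THEME": ":TOPIC",
    ":ARTICLE": ":DEFINITENESS",
    ":ROLE": ":ROLE-OF-AGENT",
    ":PHENOMENON": ":PATIENT",
    ":SAYING": ":PATIENT",
    ":GOAL": ":PATIENT",
    ":CREATED-ENTITY": ":PATIENT",
    ":NN-MOD": ":MOD",
    ":Q-MOD": ":MOD",
    ":PREFIX": ":MOD",
    ":GENERALIZED-POSSESSION-INVERSE": ":GPI",
}

def canonical(relation):
    """Return the canonical upper-case name for a relation or alias."""
    name = relation.upper()
    return ALIASES.get(name, name)

print(canonical(":sayer"))      # :AGENT
print(canonical(":recipient"))  # :DESTINATION
print(canonical(":patient"))    # :PATIENT
```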

Aliases and other links:
a, against, age, article, can, :CREATED-ENTITY, during, exist, :GENERALIZED-POSSESSION-INVERSE, :GOAL, :INCLUSIVE, :INSTRUMENT, (to be) likely (to), may, might, must, near, need (to), :NN-MOD, not, :PHENOMENON, :PREFIX, :Q-MOD, question, :RECIPIENT, :ROLE, :SAYER, :SAYING, :SENSER, :THEME, that, the, this, wh-question, yn-question
Here is an alphabetical list of relations with sample natural language fragments they can represent:


:ACCOMPANIER (= :INCLUSIVE)

        (g / go
          :agent (b / boy)
          :accompanier (d / dog))
"The boy went with the dog."
(e / eat
   :AGENT (s / person
               :PRO she)
   :PATIENT (p / pasta
               :ACCOMPANIER (m / meatballs))
   :MEANS (f / fork)
   :ACCOMPANIER (g / gentleman
                   :GENERALIZED-POSSESSION (s2 / sunglasses)))
"She ate pasta with meatballs with a fork with the gentleman with sunglasses."


:ADJUNCT

(superset of :S-ADJUNCT, :V-ADJUNCT, :N-ADJUNCT, :O-ADJUNCT, and :AMOD. The role :AMOD is a special case of :N-ADJUNCT, for pre-modifiers only)

 
       (g / go
          :agent (b / boy)
          :s-adjunct (q / quickly))
"Quickly, the boy went."
 
       (g / go
          :agent (b / boy)
          :s-adjunct (q / into
                        :anchor (h / house)))
"Into the house, the boy went."
 
       (g / go
          :agent (b / boy)
          :v-adjunct (q / quickly))
"The boy went quickly."
 
       (g / go
          :agent (b / boy)
          :v-adjunct (q / into
                        :anchor (h / house)))
"The boy went into the house."
 
       (b / boy
          :n-adjunct (g / big))           (or :amod)
"the big boy"
 
       (b / boy
          :n-adjunct (g / in
                        :anchor (c / corner)))
"the boy in the corner"
 
       (r / red
          :o-adjunct (v / very)
          :o-adjunct (b / bright))
"very bright red"


:AGENT (= :SAYER, :SENSER)

 
       (g / go
          :agent (b / boy))
"The boy went."


:AGENT-OF

        (b / boy
          :agent-of (s / sing))
"The boy who sang"
        
(c / citizen
   :agent-of (o / oppose
		:patient (t / tobacco))
   :mood fragment)
"Citizens Against Tobacco"


:ANCHOR

        (l / live
          :agent (b / boy)
          :spatial-locating (n / near
                              :anchor (d / dock)))
"The boy lives near the dock."
        (l / leave
          :agent (b / boy)
          :temporal-locating (d / during
                               :anchor (a / attack)))
"The boy left during the attack."


:BY-OBJECT

        (e / eat
          :voice passive
          :subject (b / bug)
          :by-object (b2 / boy))
"Bugs were eaten by the boy."


:CLAUSE

       (s / stay
          :agent (g / girl)
          :subordinate (n / "even if"
                         :clause (g2 / go
                                   :agent (b / boy))))
"Even if the boy goes, the girl will stay."


:COMPARED-TO

        (l / like
          :agent (b / boy)
          :patient (a / apple)
          :compared-to (o / orange))
"The boy likes apples compared to oranges." Note: this relation is still preliminary and being worked on.


:COMPL ( :COMPL-THAT-S is a subset of :COMPL)

        (s / say
          :subject (b / boy)
          :compl (e / eat                (or :COMPL-THAT-S)
                     :patient (b2 / bug)))
"The boy said that bugs were eaten."
       (s / wait
          :subject (g / girl)
          :compl (n / go
                    :agent (b / boy)))
"The girl waited for the boy to go."


:CONCAT

        (n / nicole
          :concat (s / simpson))
"Nicole Simpson"


:CONDITION

(c / cancel
   :PATIENT (t / trip)
   :CONDITION (w / weather
                 :MOD (b / bad)))    
"The trip will be canceled in case of bad weather."


:CONSEQUENCE

(f / fall
   :AGENT (d / DowJones)
   :DIFF-QUANT (p / point
                  :QUANT 5)
   :CONSEQUENCE (c / close
                   :AGENT d
                   :DESTINATION (po / point
                                    :QUANT 7772)))
"The Dow-Jones fell 5 points to close at 7772 points."


:DEFINITENESS (= :ARTICLE)

Values: indefinite, definite, none, demonstrative_this, demonstrative_that, demonstrative_that2, interrogative

(t / teacher
   :DEFINITENESS indefinite)
"a teacher"
(t / teacher
   :DEFINITENESS definite)
"the teacher"
(t / teacher
   :DEFINITENESS demonstrative_this)
"this teacher"
(t / teacher
   :DEFINITENESS demonstrative_that)
"that teacher"
(t / teacher
   :DEFINITENESS demonstrative_that2)
"ano sensei" (in Japanese)
Note: demonstrative_that2 is yet more removed than demonstrative_that.
(t / teacher
   :DEFINITENESS interrogative)
"which teacher"
(p / person
   :DEFINITENESS interrogative)
"who"
(c / cross
   :AGENT (ch / chicken)
   :PATIENT (r / road)
   :TEMPORAL-LOCATING (t / time<period
                         :DEFINITENESS interrogative)
   :SPATIAL-LOCATING (l / location<spatiality
                        :DEFINITENESS interrogative)
   :PATIENT-OF (p / permit
                  :AGENT (pe / person
                             :DEFINITENESS interrogative))
   :MOOD wh-question)
"When, where, and with whose permission did the chicken cross the road?"


:DESTINATION (= :RECIPIENT)

        (g / go
          :agent (b / boy)
          :destination (j / japan))
"The boy went to Japan."
        (t / tell
          :patient (s / story)
          :destination (b / boy))
"The story was told to the boy."


:DIFF-QUANT

(i / increase
   :AGENT (l / landlord)
   :PATIENT (r / rent)
   :DIFF-QUANT (d / dollar
                  :QUANT 200)
   :DESTINATION (do / dollar
                    :QUANT 1100))
"The landlord increased the rent by $200 to $1100."


:DOMAIN

        (b / blue
          :domain (c / car))
"The car is blue."
        (l / lawyer
          :domain (m / man))
"The man is a lawyer."
        (p / |possible>workable|
          :domain (e / eat
                    :patient (w / worm)))
"Worms can be eaten."


:DOMAIN-OF

        (c / car
          :domain-of (b / blue))
"The blue car"
        (m / man
          :domain-of (l / lawyer))
"The man who is a lawyer"
        (l / lawyer
          :DOMAIN-OF (a / age
                        :RANGE (y / year
                                  :QUANT 34)))
"a 34-year-old lawyer"


:GENERALIZED-POSSESSION

        (b / boy
          :GENERALIZED-POSSESSION (n / nose
                                      :agent-of (bl / bleed)))
"the boy whose nose was bleeding"


:GPI (= :GENERALIZED-POSSESSION-INVERSE)

        (h / handle
          :gpi (d / door))
"The handle of the door"
        (o / officer
          :gpi (n / navy))
"Naval officers"
        (h / hair
          :gpi (s / |someone|
                 :pro i))
"My hair"


:DATIVE

        (g / give
           :subject (j / "John")
           :object (b / book)
           :dative (m / "Mary"))
"John gave the book to Mary."


:INSTANCE

        (d / dog)
"A dog"


:MANNER

        (b / build
          :patient (t / tunnel)
          :manner (b2 / excavate
                    :patient (m / mine)))
"The tunnel was built by excavating the mine"


:MEANS (= :INSTRUMENT)

        (r / reach
          :patient (h / house)
          :means (c / car))
"The house is reached by car."


:MOD (= :NN-MOD, :Q-MOD, :PREFIX)

        (c / car
          :mod (s / shift
                 :mod (s2 / stick)))
"Stick shift car"


:MOD-1

        (m / machine
          :mod-1 (x / xerox)
          :mod (b / big))
"Big xerox machine"

Note: MOD-1 binds more closely to the instance it modifies than MOD.


:MOD1

        (m / machine
          :mod (x / xerox)
          :mod1 (b / big))
"Big xerox machine"

Note: MOD1 binds farther from the instance it modifies than MOD.


:MOOD

        (e / eat
          :patient (b / bug)
          :mood imperative)
"Eat bugs."
(r / |rain down|
   :mood yn-question)
"Does it rain?"

Values of mood can be statement (the default), imperative, yn-question, wh-question, and fragment, which directs the generator to output only words and phrases, like "small", "researcher", "a new computer", "to rain", "to reach an agreement" etc., as one would for example find them in a dictionary.


:NAME

        (s / sing
          :agent (p / |someone|
                     :name "John"))
"John sang."


:OBJECT

        (e / eat
          :subject (b / boy)
          :object (b2 / bug))
"The boy eats bugs."


:OP1
:OP2
:OP3
:OP4 ...

        (a / and
          :op1 (s / sing
                 :agent  (b / boy))
          :op2 (d / dance
                 :agent  (g / girl)))
"The boys sang and the girls danced."

Note: A system with multiple numbered OPn relations has been chosen over a list of "unnumbered" OP rules in order to (1) clearly indicate and preserve order during generation and (2) keep unification operations easy for the mapper.
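
To illustrate point (1), the following Python sketch recovers the operand order from the numeric suffix of each OPn relation, regardless of the order in which the relations happen to be stored. The function name `ordered_operands` and the sample data are hypothetical.

```python
import re

def ordered_operands(roles):
    """Return operand values sorted by the number N in their :opN relation."""
    numbered = []
    for relation, value in roles:
        match = re.fullmatch(r":op(\d+)", relation.lower())
        if match:
            numbered.append((int(match.group(1)), value))
    # Sort on the number only, so ":op10" comes after ":op3".
    numbered.sort(key=lambda pair: pair[0])
    return [value for _, value in numbered]

roles = [
    (":op2", "dance"),
    (":op1", "sing"),
    (":op10", "laugh"),
    (":op3", "clap"),
    (":polarity", "-"),
]
print(ordered_operands(roles))  # ['sing', 'dance', 'clap', 'laugh']
```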


:PATIENT (= :PHENOMENON, :SAYING, :GOAL, :CREATED-ENTITY)

        (e / eat
          :patient (b / bug))
"Bugs were eaten."
        (s / say
          :agent (b / boy)
          :patient (e / eat
                     :patient (b2 / bug)))
"The boy said that bugs were eaten."


:PATIENT-OF

        (b / bug
          :patient-of (e / eat
                        :agent (b2 / boy)))
"Bugs that the boy ate"

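Inverse relations such as :AGENT-OF, :PATIENT-OF, and :DOMAIN-OF state the same fact as their base relation with the two arguments swapped. A minimal Python sketch of that normalization, assuming facts are held as (relation, arg1, arg2) triples, is shown below. The function name `normalize` is hypothetical, and the :GENERALIZED-POSSESSION / :GPI pair is deliberately not handled because it does not follow the "-of" naming pattern.

```python
def normalize(triple):
    """Rewrite an X-of triple as the base relation X with swapped arguments."""
    relation, x, y = triple
    if relation.endswith("-of"):
        return (relation[:-len("-of")], y, x)
    return triple

# "Bugs that the boy ate": the bug is the patient of the eating event.
facts = [
    ("instance", "b", "bug"),
    ("patient-of", "b", "e"),
    ("instance", "e", "eat"),
    ("agent", "e", "b2"),
    ("instance", "b2", "boy"),
]
normalized = [normalize(fact) for fact in facts]
print(normalized[1])  # ('patient', 'e', 'b')
```

After normalization the facts are the same as those of the clause "The boy ate bugs"; what the inverse relation changes is which instance sits at the root, and hence which word becomes the head.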

:POLARITY

        (g / go
          :agent (b / boy)
          :polarity -)
"The boy did not go."
        (n / |necessary<inevitable|
          :domain (g / go
                    :polarity -
                    :agent (b / boy)))
"The boy must not go."
        (n / |necessary<inevitable|
          :polarity -
          :domain (g / go
                    :agent (b / boy)))
"The boy need not go."


:PRED

        (w / seem
          :subject (p / she)
          :pred (b / blue))
"She seems blue."
        (w / |has the quality of being|
          :subject (p / man)
          :pred (b / lawyer))
"The man is a lawyer."


:PRO

        (w / win
          :agent (p / |someone|
                   :pro i))
"I won."


:PURPOSE

        (s / shout
          :agent (b / boy)
          :purpose (g / get
                     :patient (h / help)))
"The boy shouted to get help."


:QUANT

        (t / toy
          :quant 5)
"5 toys"
        (b / beyond
          :anchor (s / school)
          :quant (bl / block
		     :quant 5))
"5 blocks past the school"


:QUOTED

        (s / say
          :agent (b / boy)
          :patient (r / rain
                     :quoted +))
"The boy said, 'It rained.'"


:RANGE

        (b / become
          :domain (b2 / boy)
          :range (f / fish))
"The boy became a fish."


:REASON

        (g / go
          :agent (b / boy)
          :reason (h / hurricane))
"The boy went because of the hurricane."


:RESTATEMENT

        (a / abdicate
          :agent (p / |someone|
                   :name "Nicholas"
                   :restatement (c / czar
                                  :gpi (r / russia))))
"Nicholas, Czar of Russia, abdicated."


:ROLE-OF-AGENT (= :ROLE)

        (s / speak
          :agent (m / man)
          :role-of-agent (p / president))
"The man spoke as president."


:ROLE-OF-PATIENT

        (u / use
          :patient (b / barn)
          :role-of-patient (g / garage))
"The barn was used as a garage."


:SANS

        (c / car
          :sans (w / wheel))
"a car without wheels"


:SOURCE

        (c / come
          :agent (b / boy)
          :source (b2 / beyond
                    :anchor (g / galaxy)))
"The boy came from beyond the galaxy."


:SPATIAL-LOCATING

        (r / rain
	  :polarity -
          :spatial-locating (m / mars))
"It does not rain on Mars."


:SUBJECT

 
       (g / go
          :SUBJECT (b / boy))
"The boy went."


:SUBORDINATE

        (s / stay
          :agent (g / girl)
          :subordinate (n / "even if"
                         :clause (g2 / go
                                   :agent (b / boy))))
"Even if the boy goes, the girl will stay."


:TEMPORAL-LOCATING

        (s / snow
          :temporal-locating (n / november))
"It snowed in November."


:TEXT-CLASS

        (a / announce
          :text-class newspaper-article
          :agent (p / president)
          :patient (r / reform
                     :mod-1 (t / tax)
                     :mod (n / new))
          :temporal-locating (to / tomorrow)
          :patient-of (r2 / report
                        :agent (s / source
                                 :mod (g / government))))
"According to government sources, the president will announce new tax reforms tomorrow."

This attribute appears at the top level and indicates the class of text. Other possible values include business letter and Web page, but also more detailed classes like WSJ article.


:TEXT-ELEMENT

        (b / bite
          :text-element headline
	  :agent (m / man)
          :patient (d / dog))
"Man Bites Dog"

Other values for :TEXT-ELEMENT include body (default), salutation, and address.


:TOPIC (= :THEME)

        (e / eat
          :patient (w / worm)
          :topic (b / boy))
"As for the boy, worms were eaten."


:VIA

(p / place
   :VIA-OF (e / enter
              :AGENT (s / person
                        :PRO you)))
"the place through which you enter"
(f / fly
   :AGENT (a / agent)
   :SOURCE (l1 / location<spatiality
               :NAME "Austin")
   :VIA    (l2 / location<spatiality
               :NAME "Phoenix")
   :DESTINATION (l3 / location<spatiality
                    :NAME "Los Angeles"))
"The agent flew from Austin via Phoenix to Los Angeles."

Collection of examples with things that are possible, obligatory, likely, permitted, etc.

The concepts |possible>workable|, |possible<latent|, |permitted|, |obligatory<necessary|, and |necessary>inevitable| are treated specially by the English generator to sometimes introduce modal verbs.

        
(h / |possible>workable|
   :domain (a / |eat,take in|
              :agent she
              :patient (C / |poulet|)))
"She can eat chicken."

Note: If "can" is used in the sense of "might", please use the concept |possible<latent| as shown in the following example.


        
(h / |possible<latent|
   :domain (a / |eat,take in|
              :agent she
              :patient (C / |poulet|)))
"She might eat chicken."


        
(h / |obligatory<necessary|
   :domain (a / |eat,take in|
              :agent she
              :patient (C / |poulet|)))
"She must eat chicken."


        
(h / |necessary>inevitable|
   :domain (a / |eat,take in|
              :agent she
              :patient (C / |poulet|)))
"She needs to eat chicken."


        
(h / |permitted|
   :domain (a / |eat,take in|
              :agent she
              :patient (C / |poulet|)))
"She may eat chicken."


        
(h / |likely>apt|
   :domain (a / |eat,take in|
              :agent she
              :patient (C / |poulet|)))
"She is likely to eat chicken."


Note: Be careful when mixing modals and negation. Consider the placement of polarity in the following two cases.

        
(h / |possible>workable|
   :polarity -
   :domain (a / |eat,take in|
              :agent she
              :patient (C / |poulet|)))
"She can not eat chicken."

        
(h / |obligatory<necessary|
   :domain (a / |eat,take in|
              :polarity -
              :agent she
              :patient (C / |poulet|)))
"She must not eat chicken."

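The difference between the two cases is one of scope: polarity on the modal concept negates the modality, while polarity inside :domain negates the event. The toy Python function below only spells out these two readings as strings; it is an illustration with hypothetical names and does not reflect how the English generator is implemented.

```python
def scope_reading(modal, modal_negated, event_negated,
                  event="eat(she, chicken)"):
    """Render the logical reading implied by where :polarity - is placed."""
    inner = f"not {event}" if event_negated else event
    outer = f"{modal}({inner})"
    return f"not {outer}" if modal_negated else outer

# :polarity - on |possible>workable| itself ("She can not eat chicken.")
print(scope_reading("possible", modal_negated=True, event_negated=False))
# not possible(eat(she, chicken))

# :polarity - inside :domain of |obligatory<necessary| ("She must not eat chicken.")
print(scope_reading("obligatory", modal_negated=False, event_negated=True))
# obligatory(not eat(she, chicken))
```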

        
(h / |exist,be|
   :polarity -
   :domain (i / interlingua
              :mod (p / perfect)))
"There is no perfect interlingua."

A relation like :AGENT applies to a pair of instances (things in the world), for example x and y in:

        
        (x / |take in food|
           :agent (y / |someone|))
The special relation :INSTANCE relates an instance to a concept (a class of things in the world), for example d and |dog/canid| in:
        (d / |dog/canid|)

There are also relations between concepts. These do not usually appear in interlinguas. One important one is :ISA, as in ISA(|dog/canid|, |carnivore|). This is shorthand for "for all x, if x is an instance of |dog/canid|, then x is also an instance of |carnivore|." Another is :DISJOINT; if C and D are disjoint, then "for all x, x cannot be an instance of both C and D." You can be both a bird and a pet, but you cannot be both a bird and a sphere.

Another type of knowledge takes the form of constraints on relations. If a certain relation holds between x and y, can we draw any inferences about what classes x and y belong to? Such constraints can help us rank candidate interlinguas like:
        (P / |perish|
           :temporal-locating (M / |March|))
and
        (P / |perish|
           :spatial-locating (M / |March|))
The first one is better because you are not usually spatially located within March, or any time period, for that matter. This type of knowledge is language independent and shareable. Instances and relations between them are usually created on the fly as text is processed. Concepts and relations between them are stored in the SENSUS knowledge base.
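
The ranking idea can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the ISA fragment, the concept names |time period|, |location|, and |thing|, the constraint table, and the function names are all hypothetical and are not taken from SENSUS.

```python
# Hypothetical ISA fragment: child concept -> parent concept.
ISA = {
    "|March|": "|time period|",
    "|time period|": "|thing|",
    "|Mars|": "|location|",
    "|location|": "|thing|",
}

def is_a(concept, ancestor):
    """True if concept equals ancestor or reaches it by following ISA links."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = ISA.get(concept)
    return False

# Hypothetical constraints on the class of a relation's filler.
FILLER_CONSTRAINT = {
    ":temporal-locating": "|time period|",
    ":spatial-locating": "|location|",
}

def violations(candidate):
    """Count fillers whose concept does not satisfy the relation's constraint."""
    _, _, roles = candidate
    count = 0
    for relation, (_, filler_concept, _) in roles:
        required = FILLER_CONSTRAINT.get(relation)
        if required is not None and not is_a(filler_concept, required):
            count += 1
    return count

good = ("P", "|perish|", [(":temporal-locating", ("M", "|March|", []))])
bad = ("P", "|perish|", [(":spatial-locating", ("M", "|March|", []))])

ranked = sorted([bad, good], key=violations)
print(ranked[0][2][0][0])  # :temporal-locating
```

Counting violations rather than rejecting candidates outright matches the hedged wording above ("not usually"): an unusual reading is ranked lower but remains available.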

Last updated: September 17, 1997, by Ulf Hermjakob