Learning from natural tutorial instruction











Natural instruction is inherently error prone and incomplete. Human teachers regularly omit important details, either because for a human such details might be obvious, or because they simply forgot about them. We are developing a system that is able to semantically reason about the consistency of instruction, capable of hypothesizing ways to correct errors or fill gaps in instructions, and rank such hypotheses in terms of their likelihood.

TellMe: Allowing Non-Programmers to TEACH Procedures through Natural Tutorial Instruction

We describe the main features of our approach using an example of the interaction experienced by one of our users who is a non-programmer and used our TellMe system to create a procedure.

The following gives an overview of the interaction of TellMe with a user. The utterances are verbatim what the user would type. We use scenarios where the system learns procedures executed by airplane pilots to patrol an area looking for oil pollution from ships. The user is teaching the kinds of reconnaissance tasks that pilots do in the Belgian Navy, which are scenarios that we chose for our evaluation, as we will describe later on. The system starts off having a number of primitive actions for recording the situation with a variety of instruments, including infrared and ultraviolet cameras, SLAR cameras, and digital picture and video cameras. There are also primitive actions to send alerts and reports back to the base, and to generate initial estimates of the volume of the spill.

User: "find oil spill, descend to a position of height 200"

The system shows the user that it assumes that meant to descend after finding the spill. It also shows the user an alternative interpretation where the descent was meant to happen before finding the spill. It asks the user to either accept the assumed interpretation and if not to choose the alternative.

User: "film the spill"

The system indicates it did not understand that

User: "record videos and send them"

The system shows the user that it assumes that meant to iterate over each of the videos and send each in turn, since the send action is defined for sending one document at a time.

The system shows the user that the result of sending the videos is a series of message receipts.

The system shows that it assumes the position to be after the descent, it also shows the alternative interpretation that it is the position before descending.

User: "record GPS reading"

The system shows the user that it assumes that the instruction meant to record the GPS reading over the position after descending. It also shows the user an alternative interpretation which is to record the GPS reading at the position when the spill was first found. It asks the user to either accept the assumed interpretation and if not to choose the alternative.

User: (selects the latter option)
User: "record image"

The system shows three interpretations, one for the action to record IR image, another for record UV image, and another to record SLR image

User: "send thickness image"

The system shows the user that it is not familiar with the term high level alert, but that it assumes it is a kind of alert given the context in which the term is used. The user does not have to interrupt the instruction and define it now.

The system shows the user that the send action requires some evidence as input, and that it assumes that to be the output of the record GPS reading action.

This is an example of how a user might teach a procedure where a plane is to descend closer once a spill is found, then take videos and send them to the headquarters, and to record the GPS readings and send them along as well.

We now discuss the main features of our approach and referring to the screenshot example below to illustrate how the features manifest in the user interface.

Key Feature: Exposing Prior Knowledge

A challenge that users face when teaching a system is to figure out what the system already knows or what capabilities it has. Lessons always build on prior knowledge, using it as building blocks to the instruction.

Our approach is to constrain the user's input with a command line interface that completes the user's utterance based on the objects and actions that are already known to the system [Groth and Gil 2009]. In this way the system exposes known actions and object types, which serve as building blocks to the user's expression of each instruction command.

Here is an example of how the system is exposing what it knows about properties of positions:

Key Feature: User Input as Controlled Natural Language

One important challenge that we need to address is that while natural language is a very natural way to provide tutorial instruction, interpreting unconstrained natural language is far beyond the state of the art.

Our approach is to use a paraphrase-based interpretation system that matches the user's utterance against a set of pre-defined paraphrase patterns, following the approach in [Gil and Ratnakar 2008]. Each paraphrase pattern is associated with a set of primitive commands that the user would have to use in order to have the intended effect that is described with the paraphrase pattern. The paraphrase patterns are exposed to the user through the command line interface described above.

For example, the utterance "descend to a position with altitude 200" is mapped to a paraphrase pattern component-as-verb +output-object +output-property + output-property-value. This paraphrase pattern is tied to a command that finds a component whose name matches the verb and adds it to the procedure. It further determines which of its defined outputs matches the uttered output and asserts the utter value for the uttered property of the object corresponding to that output.

When an utterance cannot be mapped to any paraphrase pattern, the system indicates so to the user and then the user has to reformulate that instruction. This is the case with the utterance "film the spill" in the above example.

Many studies have found that users bring up new terms in any domain following a Zipf's law and there are always new terms that come up [Bugmann et al 01]. When a new term appears in an utterance, TellMe will make assumptions about what it might mean. For example, when the user utters "send thickness image" and the system is not familiar with that term, it will assume that the term refers to an object (as opposed to an action or a property), and that it is a type of image since only images are sent.

The combination of the command line interface and the paraphrase-based interpretation system gives the user the illusion of entering free text while the system actually is controlling what the user can input in ways that are amenable to understanding and interpretation.

Key Feature: Share Learning State to Establish Trust

An important principle in user interface design is establishing user trust. A user needs to understand what the system is doing about the input she provided, and trust that the system is taking appropriate action. In our case, the system should give feedback to the user about what it is learning from the instruction. It must do so unintrusively, more as a nod than a detailed report, so that the user can focus on continuing with the lesson. Users need to know what the system has understood and learned so far as the lesson progresses.

Our approach is that the system always shares its internal learning state. The below screen-shot of TellMe corresponds to the interaction above. Its current procedure hypothesis is shown on the right hand side. At the bottom a dataflow diagram is shown. At the top, a set of constraints is shown, most of which were inferred by the system. In many cases, the user's instruction is ambiguous and the system creates alternative interpretations, each resulting in a different procedure hypotheses. To show that it is considering these hypotheses, it shows them in the History panel, where the user can view them. She is always asked to select one of them.

For example, in the third utterance the user specifies to "record GPS reading". The user did not say from what position to take the reading. In this case, the system generates three interpretations and shows them as options in the history window. The first interpretation is that the image should be taken at the position after the descent. But it is possible that the user meant the position before the descent, and that is presented as a second option. The third interpretation considers taking the reading from yet another position that the user may want to describe later. The user has to select one before continuing, and the top option is selected by default. As we will see next, TellMe uses heuristics to rank these options, and because it considers the first interpretation of the three to be more likely it will rank it first.

Key Feature: Deductive and Heuristic Reasoning

One important challenge is that natural instruction is often incomplete. Therefore, the system has to address those shortcomings if it is to learn the complete procedure. Our approach is to use deductive and heuristic reasoning.

Deductive reasoning is used to make assumptions about the objects and steps in the procedure in order to create constraints on objects that are underspecified. The system is effectively performing deductive reasoning to infer what is not mentioned in the instruction. All the constraints shown in the top right panel were deduced by the system, and most refer to objects that were not mentioned in the instruction. For example, the user does not mention that the input to the procedure is an area to survey to find the spill, but the system deduces that from what the instruction says. Also, the fact that taking a picture results in a new image being created is not mentioned in the instruction, but the system adds that to its knowledge.

Deductive reasoning is also used to interpret new terms that the system has never seen before. Recall that based on their role in the paraphrase patterns the system assigns a syntactic category. Through deduction, the system infers what is the type of that new entity and possibly other constraints. In our example, "thickness image" will be classified as a type of image.

The second kind of reasoning used in TellMe is heuristic. Heuristic reasoning is used to figure out how those issues could be resolved. These heuristics essentially create possible completions or corrections of the procedure hypothesis that the system created from the user instruction. TellMe shows the user options that are ranked heuristically.

Heuristic reasoning makes the instruction more natural in that the system not only has identified what issues to resolve, which would result in questions to the user. The system has gone further in taking the initiative to formulate possible answers to those questions. This makes the instruction more natural because this is something that teachers expect from human students.

Key Feature: Selective QuestionsM

An important principle in designing effective user interfaces is to take into account the cost of requesting user interventions. Because instruction is incomplete, the system may have many possible interpretations and therefore it could ask many questions to the user to determine which is the one that the user intended. Yet, later instruction may address those questions and so the user intervention was unnecessary and would not be considered natural. Although a user in teaching mode can be expected to be more willing to cooperate than in other circumstances, the system should not insist on asking questions constantly just to satisfy its learning goals to disambiguate and to complete the instruction.

Addressing this challenge is difficult, because if the system postpones all its questions then there may be a large space of possible candidate interpretations that would make learning very unmanageable.

We use eager questioning to ensure that a single procedure hypothesis can be shown to the user as a single procedure sketch. The system asks the user to select among procedure sketches when several are possible. We use lazy questioning for other matters. For example when an unknown term is used in the instructions, the system makes assumptions about it and proceeds without interrupting the user with questions. This is the case in the example interaction above when the user refers to a "thickness image" which is an unknown term.

<< Back to IKCAP