Next: Bibliography Up: Recognizing and Interpreting Tables Previous: Interpreting Tables

Planned Improvements

The present system has several limitations that we intend to address in the near future.

In the recognition of tables, we plan to extend the treatment to cover column separators other than spaces. Many military messages, for example, use slashes or other characters and make no attempt to align the items in a column. In newspaper articles, the space between an item in one column and the next is often filled with a row of dots.
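A minimal sketch of what such an extension might look like is given here. It is an illustration only, not the recognition procedure of the present system: the function name `split_fields` and the three separator patterns are assumptions made for this example, and a real message format would need its own rules (a slash inside a date, for instance, would be split incorrectly).

```python
import re

# Hypothetical separator patterns, tried in order (assumptions for this sketch):
SLASH = re.compile(r"\s*/\s*")            # slash-delimited fields, no alignment
DOT_LEADER = re.compile(r"\s*\.{3,}\s*")  # a row of three or more dots
SPACES = re.compile(r"\s{2,}")            # two or more whitespace characters

def split_fields(line):
    """Split one table row into fields using the first separator that applies.

    Empty strings are kept, so that an empty field between two slashes
    still occupies its column position.
    """
    line = line.strip()
    for pattern in (SLASH, DOT_LEADER, SPACES):
        fields = pattern.split(line)
        if len(fields) > 1:
            return fields
    return [line]
```

Trying the patterns in a fixed order is the simplest design; a fuller treatment would presumably choose the separator once per table, from evidence across all of its rows.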

In general, as we extend our table interpretation methods from military messages to business news, we expect to encounter and accommodate a variety of table structures. In addition, we plan to extend the treatment to cover vertical lists of items (one-column tables).

One of the principal shortcomings of the interpretation procedures is the limited use we make of the information in headers. In Table 3 we are able to recognize that the items in the third field are facilities because of the header HOME BASE, and that the items in the fourth field are locations because of the header LOCATION. But we do not now treat the headings as names of relationships, which in this table they are. The system must allow for the possibility that a heading names a relationship, and must then identify the "principal" column of the table to determine which entity the relationship holds with. In Table 3, for example, it is necessary to know that the UNIT field is the principal one, so that the LOCATION relation is between the units and the locations, rather than, say, between the home bases and the locations.
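The role of the principal column can be sketched as follows. This is a hypothetical illustration, not part of the present system: the function `relations_from_table` is an assumed name, the principal column is supplied by hand rather than discovered, and the row values are placeholders, not data from Table 3. Only the header names UNIT, HOME BASE, and LOCATION come from the text.

```python
def relations_from_table(headers, rows, principal):
    """Return (relation, principal_value, other_value) triples.

    Each non-principal header is read as the name of a relation that holds
    between the entity in the principal column and the value in that field.
    """
    p = headers.index(principal)
    triples = []
    for row in rows:
        for i, header in enumerate(headers):
            if i != p:
                triples.append((header, row[p], row[i]))
    return triples

# Placeholder values only; they are not taken from the paper.
headers = ["UNIT", "HOME BASE", "LOCATION"]
rows = [["unit-1", "base-1", "location-1"]]
triples = relations_from_table(headers, rows, principal="UNIT")
```

With UNIT as the principal column, the LOCATION relation links `unit-1` to `location-1`, and no relation is produced between the home base and the location.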

Another use of headers is to provide a third entity, frequently a time, in the relationship, as in the following table:

                         Hourly Compensation
                        For Production Workers
                   (As a percentage of U.S. costs)
                                              Jan. 31
                                1985    1986     1985
                 Germany         75      103     120
                 Japan           50       73      79
                 South Korea     11       12      12

In this table, the problem is first to recognize from the title (a kind of pretabular sentence) that the elements in the matrix are percentages, and then to discover the relation among the country, the year, and the percentage, as they fit into the description of a relation provided by the header.
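As a rough sketch of the target representation, each cell of such a table could be turned into a relation among the row entity, the column label, and the value, with the unit taken from the title. The function names, the unit string, and the column labels are assumptions supplied by hand here; in particular, reading "Jan. 31" as part of the third column's label is only a guess at the printed header. Recognizing these things automatically is exactly the open problem described above.

```python
import re

def parse_rows(lines):
    """Split each row on runs of two or more spaces (assumes aligned columns)."""
    return [re.split(r"\s{2,}", line.strip()) for line in lines]

def cell_relations(column_labels, rows, unit):
    """Return one (entity, column_label, value, unit) tuple per table cell."""
    facts = []
    for row in rows:
        entity, values = row[0], row[1:]
        for label, value in zip(column_labels, values):
            facts.append((entity, label, int(value), unit))
    return facts

lines = [
    "Germany         75      103     120",
    "Japan           50       73      79",
    "South Korea     11       12      12",
]
# Column labels are an assumed reading of the printed header.
labels = ["1985", "1986", "Jan. 31 1985"]
facts = cell_relations(labels, parse_rows(lines), unit="percent of U.S. costs")
```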

A common kind of table is one in which the first field of each row gives the name of a property or relation and the second field gives its value, as in

			 Height        5'10"
			 Weight        175 lbs
			 Eyes          brown

We do not at present handle this case.
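One way this case might be approached, sketched here under the assumption that the property name and its value are separated by a tab or by at least two spaces, is to read each row as a name-value pair describing a single entity. The function name `parse_property_table` is hypothetical, and the sketch does not address the harder problem of recognizing that a table is of this kind.

```python
import re

def parse_property_table(lines):
    """Read each row as 'property name, separator, value' into a dict.

    Splits only at the first tab or run of two or more whitespace
    characters, so a value such as '175 lbs' keeps its single space.
    """
    record = {}
    for line in lines:
        parts = re.split(r"\t+|\s{2,}", line.strip(), maxsplit=1)
        if len(parts) == 2:
            name, value = parts
            record[name] = value.strip()
    return record

record = parse_property_table([
    "Height        5'10\"",
    "Weight        175 lbs",
    "Eyes          brown",
])
```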

Another kind of problem involves the use of free text in fields of a table, as in the following example from the Wall Street Journal:

       A Chronology of the Stock-Trading Scandal
   May 12, 1986 -- SEC charges Dennis Levine of Drexel
Burnham Lambert Inc. with making $12.6 million since mid-1980
from insider trading. SEC also names as defendant Bernhard
Meier, Mr. Levine's broker at Bank Leu International in
Nassau, Bahamas.
   May 13, 1986 -- Mr. Levine is arrested and charged with
obstructing justice for attempting to destroy records. He is
released on a $5 million bond.

Here it is necessary to be able to process each free text item within the context provided by the date field, and to recognize the relations inherent in the structure of the table.
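The first step, separating the date field from the free text attached to it, can be sketched as below. The date pattern and the `--` delimiter are assumptions drawn from this one example, and the names `DATE_ITEM` and `split_chronology` are hypothetical. Interpreting each text item in the context of its date is the part that remains to be done.

```python
import re

# Assumed layout: "<Month> <day>, <year> -- <text>", with continuation lines
# that do not begin with a date.
DATE_ITEM = re.compile(r"^\s*([A-Z][a-z]+\.? \d{1,2}, \d{4}) -- (.*)$")

def split_chronology(lines):
    """Group lines into (date, text) pairs, joining continuation lines."""
    items = []
    for line in lines:
        match = DATE_ITEM.match(line)
        if match:
            items.append([match.group(1), match.group(2).strip()])
        elif items and line.strip():
            # Continuation of the most recent dated item.
            items[-1][1] += " " + line.strip()
        # Lines before the first date (such as a title) are skipped.
    return [tuple(item) for item in items]
```

Each resulting pair supplies the date as context for whatever text processing is then applied to the second element.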

In conclusion, we have articulated a conception of the processing of tables, a pervasive and important phenomenon in real-world text, as an instance of a more general problem in local pragmatics and discourse structure. We have translated this conception into a tractable implementation in a specific domain, and we have used this implementation with significant success in a practical application.


Jerry Hobbs 2004-02-24