An Annotated Corpus of Typical Durations of Events

(Annotations Integrated into TimeBank)

Suppose we read the sentence "George W. Bush met with Vladimir Putin in Moscow." We don't know exactly how long that meeting lasted, but the sentence still gives us some temporal information. We can be confident that the meeting lasted more than ten seconds and less than one year. As we guess narrower and narrower bounds, our chances of being correct go down. Just how accurately can we make duration judgments like this? How much agreement can we expect among people? Will it be possible to extract this kind of information from text automatically?

The goal of this project is to automatically extract such implicit, vague typical durations of events from text. This could be very important for applications that extract the time course of events from news, because whether two events overlap or occur in sequence often depends heavily on their durations. If a war started yesterday, we can be fairly sure it is still going on today; if a hurricane started last year, we can be sure it is over by now.
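The war and hurricane examples rest on a simple comparison: the time elapsed since an event began versus the shortest and longest durations it plausibly has. The Python sketch below illustrates that comparison under a strong simplifying assumption, namely hard lower and upper bounds, whereas the project itself deals with vague, typical durations. The helper `status_now` and the example bounds are hypothetical, chosen only for illustration.

```python
from datetime import timedelta
from enum import Enum


class Status(Enum):
    ONGOING = "still going on"
    OVER = "over"
    UNKNOWN = "cannot tell"


def status_now(elapsed: timedelta, lower: timedelta, upper: timedelta) -> Status:
    """Hypothetical helper: judge whether an event that started `elapsed` ago
    is still going on, given assumed lower/upper bounds on its duration."""
    if elapsed < lower:
        # Not even the shortest plausible duration has passed yet.
        return Status.ONGOING
    if elapsed > upper:
        # Even the longest plausible duration has already passed.
        return Status.OVER
    # Elapsed time falls inside the plausible range; the bounds alone cannot decide.
    return Status.UNKNOWN


# Illustrative bounds only (assumptions, not corpus annotations).
war = (timedelta(weeks=1), timedelta(days=10 * 365))
hurricane = (timedelta(days=1), timedelta(weeks=2))

print(status_now(timedelta(days=1), *war))          # -> Status.ONGOING
print(status_now(timedelta(days=365), *hurricane))  # -> Status.OVER
```

When the elapsed time falls between the bounds, durations alone do not settle the question, and finer-grained estimates or other temporal cues are needed.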

We have developed annotation guidelines to minimize discrepant judgments, and we have annotated all 48 non-Wall-Street-Journal (non-WSJ) news articles (2132 events), as well as 10 WSJ articles (156 events), from the TimeBank corpus. We have also developed a method for measuring inter-annotator agreement when the judgments are intervals on a scale, and we have shown that machine learning techniques applied to this data yield coarse-grained event duration information, considerably outperforming a baseline and approaching human performance.
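The agreement method is only named here, not described. As a rough illustration of one way to compare two interval judgments on a scale, the Python sketch below converts each annotator's bounds to seconds, moves to a natural-log scale (so judgments are compared by ratios rather than absolute differences), models each judgment as a normal distribution, and scores agreement as the overlapping area of the two distributions using `statistics.NormalDist.overlap` from the standard library (Python 3.8+). The unit table (30-day months, 365-day years), the choice to place the bounds one standard deviation from the mean, and the function names are assumptions for illustration, not necessarily the project's exact formulation.

```python
import math
from statistics import NormalDist

# Seconds per unit (assumed values; the unit names are illustrative).
SECONDS = {
    "second": 1,
    "minute": 60,
    "hour": 3600,
    "day": 86400,
    "week": 7 * 86400,
    "month": 30 * 86400,
    "year": 365 * 86400,
}


def to_seconds(amount: float, unit: str) -> float:
    return amount * SECONDS[unit]


def judgment_to_dist(lower_s: float, upper_s: float) -> NormalDist:
    """Model a [lower, upper] judgment as a normal distribution over ln(seconds).
    Assumption: mean = log midpoint, standard deviation = half the log width,
    so each bound sits one standard deviation from the mean."""
    if not 0 < lower_s < upper_s:
        raise ValueError("need 0 < lower < upper")
    lo, hi = math.log(lower_s), math.log(upper_s)
    return NormalDist(mu=(lo + hi) / 2, sigma=(hi - lo) / 2)


def interval_agreement(a: tuple, b: tuple) -> float:
    """Agreement between two (lower, upper) judgments in seconds:
    the overlapping area of their distributions, between 0 and 1."""
    return judgment_to_dist(*a).overlap(judgment_to_dist(*b))


a = (to_seconds(30, "minute"), to_seconds(2, "hour"))
b = (to_seconds(1, "hour"), to_seconds(3, "hour"))
c = (to_seconds(1, "day"), to_seconds(1, "week"))
print(interval_agreement(a, b))  # similar judgments: substantial overlap
print(interval_agreement(a, c))  # minutes-to-hours vs. days-to-a-week: near zero
```

Identical judgments score 1.0, and the score falls smoothly as two intervals drift apart on the log scale, which a strict "same interval or not" comparison cannot express.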

Documents

Notes

Current Annotated Data

    ABC19980108.1830.0711.tmldur.xml
    ABC19980114.1830.0611.tmldur.xml
    ABC19980120.1830.0957.tmldur.xml
    ABC19980304.1830.1636.tmldur.xml
    AP900816-0139.tmldur.xml
    APW19980213.1310.tmldur.xml
    APW19980213.1320.tmldur.xml
    APW19980213.1380.tmldur.xml
    APW19980219.0476.tmldur.xml
    APW19980227.0468.tmldur.xml
    APW19980227.0476.tmldur.xml
    APW19980227.0489.tmldur.xml
    APW19980227.0494.tmldur.xml
    APW19980301.0720.tmldur.xml
    APW19980306.1001.tmldur.xml
    APW19980308.0201.tmldur.xml
    APW19980322.0749.tmldur.xml
    APW19980418.0210.tmldur.xml
    APW19980501.0480.tmldur.xml
    APW19980526.1320.tmldur.xml
    APW19980626.0364.tmldur.xml
    CNN19980126.1600.1104.tmldur.xml
    CNN19980213.2130.0155.tmldur.xml
    CNN19980222.1130.0084.tmldur.xml
    CNN19980223.1130.0960.tmldur.xml
    CNN19980227.2130.0067.tmldur.xml
    ea980120.1830.0071.tmldur.xml
    ea980120.1830.0456.tmldur.xml
    ed980111.1130.0089.tmldur.xml
    NYT19980206.0460.tmldur.xml
    NYT19980206.0466.tmldur.xml
    NYT19980212.0019.tmldur.xml
    NYT19980402.0453.tmldur.xml
    NYT19980424.0421.tmldur.xml
    PRI19980115.2000.0186.tmldur.xml
    PRI19980121.2000.2591.tmldur.xml
    PRI19980205.2000.1890.tmldur.xml
    PRI19980205.2000.1998.tmldur.xml
    PRI19980213.2000.0313.tmldur.xml
    PRI19980216.2000.0170.tmldur.xml
    PRI19980303.2000.2550.tmldur.xml
    PRI19980306.2000.1675.tmldur.xml
    SJMN91-06338157.tmldur.xml
    VOA19980303.1600.0917.tmldur.xml
    VOA19980303.1600.2745.tmldur.xml
    VOA19980305.1800.2603.tmldur.xml
    VOA19980331.1700.1533.tmldur.xml
    VOA19980501.1800.0355.tmldur.xml
    wsj_0006.tmldur.xml
    wsj_0026.tmldur.xml
    wsj_1025.tmldur.xml
    wsj_1031.tmldur.xml
    wsj_1035.tmldur.xml
    wsj_1038.tmldur.xml
    wsj_1039.tmldur.xml
    wsj_1040.tmldur.xml
    wsj_1042.tmldur.xml
    wsj_1073.tmldur.xml