Name | The RST Corpus
Number of documents/texts in the corpus | 385 (347 training; 38 test)
Number of words in the corpus | 176,383
Avg. # words/text | 458
Avg. # elementary discourse units/text | 57
High-level description of the corpus | README
Annotation Manual | tagging-ref-manual.pdf
Number of documents that were double tagged | 53 (13.8%)
Discourse units | Clauses and smaller.
Number of discourse units | 21,789
Avg. # words/discourse unit | 8.1
Corpus samples | Click here
Related utilities | Click here
How can I obtain the corpus? | From LDC
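
The averages and the double-tagging percentage in the table above follow from the reported totals. A minimal sketch in Python that recomputes them; the variable names are illustrative and are not part of the corpus distribution:

```python
# Totals as reported in the table above.
num_texts = 385
num_train, num_test = 347, 38
num_words = 176_383
num_units = 21_789          # elementary discourse units
num_double_tagged = 53

# The training/test split should account for every document.
assert num_train + num_test == num_texts

# Derived figures, computed from the totals rather than copied.
words_per_text = num_words / num_texts
units_per_text = num_units / num_texts
words_per_unit = num_words / num_units
double_tagged_pct = 100 * num_double_tagged / num_texts

print(f"Avg. words/text:            {words_per_text:.0f}")      # 458
print(f"Avg. discourse units/text:  {units_per_text:.0f}")      # 57
print(f"Avg. words/discourse unit:  {words_per_unit:.1f}")      # 8.1
print(f"Double-tagged documents:    {double_tagged_pct:.1f}%")  # 13.8%
```

The recomputed values agree with the rounded figures in the table, so the reported statistics are internally consistent.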