Annotated Corpora

The RST Discourse Treebank

Name The RST Corpus
Number of documents/texts in the corpus 385 (347 training; 38 test)
Number of words in the corpus 176,383
Avg. # words/text 458
Avg. # elementary discourse units/text 57
High-level description of the corpus README
Annotation Manual tagging-ref-manual.pdf
Number of documents that were double-tagged 53 (13.8%)
Discourse units Clauses and smaller.
Number of discourse units 21,789
Avg. # words/discourse unit 8.1
Corpus samples Click here
Related utilities Click here
How can I obtain the corpus? From LDC
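
The averages and the percentage listed above follow directly from the reported totals. As a minimal sketch of that arithmetic (the variable and function names are illustrative only and are not part of the corpus distribution or its utilities), the derived figures can be recomputed like this:

```python
# Totals as reported in the table above.
NUM_DOCUMENTS = 385
NUM_TRAINING = 347
NUM_TEST = 38
NUM_WORDS = 176_383
NUM_DISCOURSE_UNITS = 21_789
NUM_DOUBLE_TAGGED = 53


def corpus_averages():
    """Recompute the derived statistics from the reported totals.

    The function name and dictionary keys are hypothetical labels
    chosen for this example.
    """
    return {
        # Average number of words per text, rounded to a whole number.
        "words_per_text": round(NUM_WORDS / NUM_DOCUMENTS),
        # Average number of elementary discourse units per text.
        "units_per_text": round(NUM_DISCOURSE_UNITS / NUM_DOCUMENTS),
        # Average number of words per discourse unit, one decimal place.
        "words_per_unit": round(NUM_WORDS / NUM_DISCOURSE_UNITS, 1),
        # Share of documents that were double-tagged, in percent.
        "double_tagged_pct": round(100 * NUM_DOUBLE_TAGGED / NUM_DOCUMENTS, 1),
    }


if __name__ == "__main__":
    # The training and test splits should add up to the full corpus.
    assert NUM_TRAINING + NUM_TEST == NUM_DOCUMENTS
    for name, value in corpus_averages().items():
        print(f"{name}: {value}")
```

Under these totals, the recomputed values agree with the table: 458 words per text, 57 discourse units per text, 8.1 words per discourse unit, and 13.8% of documents double-tagged.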