Discourse structures: trees or graphs?

Daniel Marcu, ISI/USC



Very recently, Mark Liberman alerted me to a discourse-related entry that he had added to his “Language Log”. The entry discusses and compares two discourse annotation efforts:

·        The discourse annotation work of Marcu et al. (1999; 1999), which led to the development of an RST annotated corpus (Carlson et al., 2003), which is now distributed by the LDC.

·        The discourse annotation work of Wolf and Gibson (2003), which was also submitted for publication to the LDC.


The fundamental distinction between the corpora created by Carlson et al. (2003) and Wolf and Gibson (2003) pertains to the constraints imposed on the discourse annotation process. The corpus of Carlson et al. (2003) adopts an RST-like annotation schema that constrains the set of allowable discourse structures to tree-like representations. The annotators who created the RST-Discourse TreeBank were forced to build tree-like representations that subsumed all the discourse units in a text. In contrast, Wolf and Gibson (2003) take a less-constrained approach: their protocol encourages annotators to make explicit all coherence relations that hold between any two discourse units in a text. When they apply this annotation protocol to a large collection of texts, they observe that the resulting discourse structures look more like graphs than like trees. Because the links in the resulting graphs cross often, their results strongly suggest that trees are an inadequate representation for discourse structures. On the basis of their corpus analysis, Wolf and Gibson estimate that in order to obtain tree-like representations from the graph-like representations in their corpus, one would have to delete approximately 12% of the coherence relations identified by the annotators. Obviously, this is undesirable because, in the process, one loses important information.


Prima facie, the approach proposed by Wolf and Gibson is well motivated from an empirical linguistic perspective. On what grounds, after all, should someone declare that coherence judgments yield hierarchical structures? Instead of taking this claim for granted, it is better to devise experiments in which humans annotate all discourse relations in a text using a protocol that imposes no structural constraints on the representations one builds. Once this is done, one can estimate empirically the degree to which trees (or graphs) are adequate representations of discourse structures. The devil, though, is in the details: the outcome of such an experiment depends heavily on the nature of the inferences annotators are allowed to make.


Allowable vs. non-allowable inferences

When we annotate coherence relations, we read a text, derive inferences, and make explicit some of these inferences in a consistent annotation standard. The end result of our annotation process depends directly on the constraints we impose on the inferences we allow ourselves to make and on the representation language we adopt for making the allowable inferences explicit. Previous work by Mann and Thompson (1988), Hobbs (1990), Marcu (2000), and Carlson et al. (2003), for example, imposes strong structural constraints on both the representations one is allowed to create and, more subtly, the set of allowable inferences. The recent work of Wolf and Gibson (2003), by contrast, puts no (or fewer) constraints on either the representations one is allowed to create or the set of allowable inferences.


Consider, for example, the following text fragment, which is used by Wolf and Gibson (2003, ex. 12, page 12) to justify the inadequacy of using trees for representing discourse:

0.      There is a train at Platform A.

1.      Its destination is Rome.

2.      There is another train at Platform B.

3.      Its destination is Zurich.

Wolf and Gibson argue that the following relations hold between the units:

1 → 0 elab

3 → 2 elab

0 ↔ 2 sim

1 ↔ 3 contr
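The four relations listed above can be written down as arcs over the unit positions, which makes the "crossing links" observation easy to check mechanically. The following is a minimal sketch, not Wolf and Gibson's procedure; the tuple layout and the names RELATIONS, span, crosses, and crossing_pairs are illustrative assumptions.

```python
from itertools import combinations

# Relations from the train example, encoded as (unit_a, unit_b, label).
# This encoding is an assumption made for illustration only.
RELATIONS = [
    (1, 0, "elab"),
    (3, 2, "elab"),
    (0, 2, "sim"),
    (1, 3, "contr"),
]


def span(relation):
    """Return the (left, right) unit positions an arc covers, ignoring direction."""
    a, b, _ = relation
    return (min(a, b), max(a, b))


def crosses(r1, r2):
    """Two arcs cross when they interleave: one starts strictly inside
    the other and ends strictly outside it."""
    (a, b), (c, d) = span(r1), span(r2)
    return a < c < b < d or c < a < d < b


def crossing_pairs(relations):
    """List every pair of relations whose arcs cross."""
    return [(r1, r2) for r1, r2 in combinations(relations, 2) if crosses(r1, r2)]


for r1, r2 in crossing_pairs(RELATIONS):
    print(r1, "crosses", r2)
```

On this encoding, the only crossing pair is the similarity arc between units 0 and 2 and the contrast arc between units 1 and 3.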

Note, though, that not all these relations are created equal. The elab and sim relations can be inferred in isolation, in contexts that contain only the units that they relate. Each of the texts below makes sense by itself.

Text (0;1):

  0. There is a train at Platform A.
  1. Its destination is Rome.

Text (2;3):

  2. There is another train at Platform B.
  3. Its destination is Zurich.

Text (0;2):

  0. There is a train at Platform A.
  2. There is another train at Platform B.

However, the contrast relation between units 1 and 3 cannot be interpreted in isolation. Text (1;3) does not make sense by itself.

* Text (1;3):

     1.   Its destination is Rome.

     3.   Its destination is Zurich.

A human is capable of inferring a contrast relation between 1 and 3 only in the context of the whole text. Different contexts may yield different inferences. One can easily imagine, for example, contexts in which there could be a temporal relation between these two units (in a story about one train that goes from place to place and stops in Rome and Zurich on the way). Given that the inference one makes in establishing the coherence relation of contrast between 1 and 3 is global, i.e., it needs context in order to be derived, I find it hard to justify the need to include it in a discourse representation of the sample text used by Wolf and Gibson. When we annotate and make explicit inferences of this sort, we step onto a slippery slope: nothing guides or constrains the types of inferences one is permitted to make. If we make explicit a contrast relation between units 1 and 3, why not also make explicit a contrast relation between units 0 and 3? After all, from 0 and 1, we can infer that the train at Platform A has Rome as its destination, which contrasts with the train that has Zurich as its destination. Drawing a link between units 0 and 3 is as justifiable as drawing a link between units 1 and 3.
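The effect of excluding the context-dependent relation can be illustrated with a small sketch. It treats the annotation as an undirected graph over the four units, which is a simplification of RST's span-based trees, and it uses a hypothetical needs_context flag whose values simply encode the argument above (only the contrast between 1 and 3 requires the whole text). The name is_tree is likewise illustrative.

```python
# Each relation: (unit_a, unit_b, label, needs_context).
# needs_context is a hypothetical annotation field, set here by hand
# to reflect the argument in the text, not taken from any corpus.
RELATIONS = [
    (1, 0, "elab", False),
    (3, 2, "elab", False),
    (0, 2, "sim", False),
    (1, 3, "contr", True),
]


def is_tree(num_units, edges):
    """Check that undirected edges connect all units without forming a cycle."""
    if len(edges) != num_units - 1:
        return False
    parent = list(range(num_units))  # union-find forest, one set per unit

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:  # this edge would close a cycle
            return False
        parent[root_a] = root_b
    return True


all_edges = [(a, b) for a, b, _, _ in RELATIONS]
local_edges = [(a, b) for a, b, _, needs_context in RELATIONS if not needs_context]

print(is_tree(4, all_edges))    # False: four links over four units
print(is_tree(4, local_edges))  # True: the three locally inferable links
```

Under these assumptions, the three relations that can be inferred in isolation already connect all four units without any cycle.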


Consider another example from Wolf and Gibson (2003, ex. 13, page 15) (source wsj_0306; LDC93T3A), with the discourse structure shown below.

  0. Farm prices in October edged up 0.7% from September
  1. as raw milk prices continued to rise,
  2. the Agriculture Department said.
  3. Milk sold to the nation’s dairy plants and dealers averaged $14.50 for each hundred pounds,
  4. up 50 percent from September and up $1.50 from October 1988,
  5. the department said.


Among the discourse relations that Wolf and Gibson annotate here, there is a similarity relation between units 2 and 5 “since both segments 2 and 5 state the same source”. I am not at all convinced that it is a good idea to annotate the relation between 2 and 5 as a coherence relation simply because the text refers in both instances to the Agriculture Department. It appears to me that it is more adequate to interpret this as a simple instance of a cohesive relation, i.e., a co-reference link. If we don’t do so, we potentially end up with ludicrous consequences. Just imagine a longer text about a company with many attributions spread all across the text (“its CEO said”, “the CTO reported that”, “the COO denied that”, “the company said”, “the board of directors said”, etc.) and many repeated instances of the same entity, “the company”, stating different things. Do we want similarity and elaboration relations between all the pairs of discourse segments that mention the company, its directors, and its management? Unlikely. These referents clearly play an essential role in making the text cohesive and in increasing our ability to understand it. But labeling such cohesive links as coherence relations does not seem productive.


If you are unconvinced by this argument, consider the following analogy between discourse and syntactic analysis, which illustrates the consequences of imposing no constraints on the set of allowable inferences. Take, for example, the sentence:


“After walking hand in hand along the river, Mary and John sat on the bank playing with their bare feet in the cold water.”


Any linguist can create a dependency, phrase-structure, or feature-based analysis for the sentence above, using their preferred linguistic theory/formalism. I doubt, though, that any of these representations will contain a direct link between the words river, bank, and water. Yet, from an inferential perspective, one can clearly argue that a dependency or phrase-structure analysis of the sentence above is insufficient for capturing the semantics of the sentence. To really understand this sentence, one needs to figure out the meaning/sense of the word bank. But that can be done only when one establishes a link between bank and river and between bank and water. Naturally, such links will create at the syntactic level all the problems Wolf and Gibson observe at the text level: crossing links, words with multiple heads, etc. Imposing no constraints on the allowable links/inferences one should annotate at the syntactic level leads one to the same “problems” Wolf and Gibson discuss at the discourse/text level.


What do Wolf and Gibson’s experiments mean?

Let us reconsider for a moment the empirical results reported by Wolf and Gibson (2003), who found that in order to obtain tree-like representations from the graph-like representations in their corpus, one would have to delete approximately 12% of the coherence relations identified by the annotators. At first sight, this number appears to deal a lethal blow to the claim that trees are an adequate representation of discourse structures. In light of this critique, though, I find that this number in fact supports that claim, because many of the coherence links in Wolf and Gibson’s representations should be deleted: some mirror cohesive links; others correspond to global, text-level inferential processes. I believe that the highly connected graphs annotated by Wolf and Gibson are a direct consequence of a loose, under-specified annotation protocol vis-à-vis the allowable inferences one is permitted to make when producing discourse relation annotations. If, under such an under-constrained annotation protocol, one has to delete only 12% of the links in a graph in order to obtain tree-like representations, that can be interpreted as good news by those who believe that trees may be an adequate representation for discourse structures.


Having said this, I am quick to point out that I could not agree more with Wolf and Gibson’s claim that discourse trees are a very poor representation for many text phenomena. In fact, Marcu et al. (1999) also argued that, in some instances, trees are not enough for making explicit all coherence relations in a text. Clearly, to capture the richness of all textual phenomena, we should be as inclusive as possible and conceive of text theories that effectively explain both coherence and cohesive phenomena, the flow of information, and the mechanisms that enable us to go from syntax, to sentence semantics, and then on to text semantics. However, I don’t think we should rush to throw the baby out with the bathwater and discard discourse trees as a useful representation simply because we annotate some cohesive links as coherence relations or exuberantly make explicit an unreasonable number of inferences in our annotations.


I believe the work of Wolf and Gibson is extremely important because it forces us to think deeply about these issues and reconsider the range of inferences we would like to operate with. The work of Wolf and Gibson also challenges our understanding of the relation between coherence and cohesion. Without work such as that of Wolf and Gibson, it is impossible to make progress in this area and increase our level of understanding of discourse-related phenomena.

Allowable vs. non-allowable inferences revisited

It is often easier to criticize than to provide solutions, and this discussion is no exception. An area in which this critique fails miserably, for example, is in providing a workable definition for the set of allowable inferences that one should be permitted to make when annotating coherence relations. In the past, we proposed (Marcu, 2000) a compositionality criterion that mildly restricts the range of allowable inferences one can make in the context of annotating RST-like structures. According to this compositionality principle, a discourse relation can hold between two large spans only if that relation also holds between the most important units in the spans. The important units of a span are defined recursively as the important units of its nuclei children. Carlson and Marcu (2001, Section 4.2) also discuss annotation strategies and additional structural criteria for restricting the range of allowable inferences.
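The recursive definition of important units lends itself to a short sketch. The code below is an illustration under stated assumptions, not the formulation in Marcu (2000): the Node class, the function names, and the caller-supplied holds_between_units judgment are all hypothetical, and "holds between the most important units" is read here as "holds for at least one pair of important units".

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Set


@dataclass
class Node:
    """A discourse span: a leaf unit, or an internal node with children."""
    unit: Optional[int] = None                      # set for leaves only
    children: List["Node"] = field(default_factory=list)
    nucleus: bool = True                            # status relative to the parent


def important_units(node: Node) -> Set[int]:
    """Recursively collect the units promoted through nucleus children."""
    if node.unit is not None:
        return {node.unit}
    promoted: Set[int] = set()
    for child in node.children:
        if child.nucleus:
            promoted |= important_units(child)
    return promoted


def relation_allowed(left: Node, right: Node,
                     holds_between_units: Callable[[int, int], bool]) -> bool:
    """Allow a span-level relation only if it holds for some pair of
    important units (assumed reading of the compositionality criterion)."""
    return any(
        holds_between_units(a, b)
        for a in important_units(left)
        for b in important_units(right)
    )


# Train example: units 1 and 3 elaborate on (are satellites of) units 0 and 2.
span_a = Node(children=[Node(unit=0), Node(unit=1, nucleus=False)])
span_b = Node(children=[Node(unit=2), Node(unit=3, nucleus=False)])

print(important_units(span_a))  # {0}
print(relation_allowed(span_a, span_b, lambda a, b: (a, b) == (0, 2)))  # True
```

In this encoding, a relation judged to hold between units 0 and 2 licenses a relation between the two larger spans, whereas a relation judged to hold only between the satellites 1 and 3 would not.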


The compositionality principle discussed above and the structural constraints proposed by Carlson and Marcu (2001) enable one to restrict the set of allowable inferences in the context of deriving discourse trees. Unfortunately, these principles are not sufficient when deriving tree structures, and they are inadequate when deriving graph structures. Additional constraints will need to come into play when annotators are allowed to make explicit coherence relations between any two discourse segments. Without such constraints, one can easily get into a position where every segment is linked to every other segment. This problem is exacerbated by the human tendency to interpret texts as coherent even when they are not. The contributors to Costermans and Fayol’s (1997) collection of discourse studies discuss several psycholinguistic experiments that prove this point. When asked whether text fragments such as

George Bush is the president of the United States. Salmon is Michael Jackson’s favorite fish.

are coherent, people have a strong tendency to invent explanations/scenarios as to why these sentences are juxtaposed and label the text as coherent. An unconstrained annotation protocol should clearly try to calibrate for this. An unconstrained annotation protocol should also provide an adequate means for annotating multinuclear relations, such as those corresponding to lists (“First”, “Second”, “Third”, etc.).


Jean Costermans and Michel Fayol. Processing Interclausal Relationships: Studies in the Production and Comprehension of Text. Lawrence Erlbaum Associates, 1997.

Discourse Web Page (http://www.isi.edu/~marcu/discourse).

Jerry Hobbs. Literature and Cognition. CSLI Lecture Notes Number 21, 1990.



For the interested reader, I append below the RST-like annotations of the problematic texts discussed by Wolf and Gibson (2003), in a graphical representation that mirrors that used by Wolf and Gibson.