BE: Basic Elements for Automated Evaluation of Summaries

Implementation for download: BEwT-E: Basic Elements with Transformations for Evaluation

Creator(s): Eduard Hovy, Chin-Yew Lin, Stephen Tratz, Liang Zhou (all from University of Southern California Information Sciences Institute), Junichi Fukumoto (Ritsumeikan University; visiting USC/ISI)

Contact:  hovy@isi.edu

The Basic Elements (BE) method was developed to automatically evaluate text summaries.  This page describes the general BE model.  A current implementation of the model, BEwT-E (Basic Elements with Transformations for Evaluation), can be downloaded from this page (see below).

Background

It has long been a goal of the summarization community to find automatic methods of summary evaluation that produce reliable and stable scores.  Generally, summaries are evaluated along two dimensions: content and style (readability).

Basic Elements address only the problem of assessing the content of a summary.

All automated content assessment methods today work by comparing the input summary to one or more reference summaries (ideally, produced by humans).  But experience has shown that measuring summary content at the sentence level is not precise enough: sentences generally contain several pieces of information, only some of which may be important enough to include in a summary.

There have been two kinds of response to this problem: the word-sized and the chunk-sized.  In ROUGE (Lin at USC/ISI) and similar systems, the approach was to measure the overlap of each word (or small ngram) with the reference summaries.  The problem here is that multi-word units (such as "United States of America") are not treated as single items, thereby skewing the scoring, and that relatively unimportant words (such as "from") count the same as relatively more important ones.  Simple efforts to circumvent these problems remain unsatisfactory and crude.  Nonetheless, this approach can be automated and can produce evaluation rankings that correlate reasonably with human rankings, as demonstrated in the ROUGE publications.
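
To make the word-overlap idea concrete, the following is a minimal Python sketch of ROUGE-N-style recall: clipped n-gram counts are summed over all reference summaries and divided by the total number of reference n-grams. It is not the ROUGE package itself. It assumes plain lowercasing and whitespace tokenization, with none of ROUGE's stemming or stopword options, and the function names are illustrative.

```python
from collections import Counter


def ngrams(tokens: list[str], n: int) -> Counter:
    """Count the contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate: str, references: list[str], n: int = 1) -> float:
    """Clipped n-gram overlap summed over all references, divided by the
    total number of n-grams in the references (assumed whitespace tokenization)."""
    cand = ngrams(candidate.lower().split(), n)
    matched = total = 0
    for ref in references:
        ref_counts = ngrams(ref.lower().split(), n)
        # A reference n-gram is credited at most as often as it occurs in the candidate.
        matched += sum(min(count, cand[gram]) for gram, count in ref_counts.items())
        total += sum(ref_counts.values())
    return matched / total if total else 0.0


print(rouge_n_recall("the cat sat on the mat",
                     ["the cat was on the mat", "a cat sat there"]))  # 0.7
```

In this example the word "the" supplies two of the seven matched unigrams, which illustrates the weakness described above: function words count exactly as much as content words.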

The other response was to extract longer chunks, namely the strings of contiguous words that express valuable material, from one or more of the reference summaries, and to treat these chunks as pieces of ideal content.  Each chunk, regardless of length, is treated as a semantic unit, that is, a unit that expresses one core notion.  Each unit is assigned an importance rating depending on how many reference summaries contain it.  In recent research, Van Halteren and Teufel in Europe and Nenkova and Passonneau at Columbia University in New York have independently investigated this type of approach.  Since an element that is included in many reference summaries is obviously more important than one that is included in only a few, this method provides a natural way of scoring each element.  The latter two researchers create a 'pyramid' of elements, with the most-frequently-included ones at the top, the next-most one layer down, etc.  Evaluating a new summary then becomes a process of comparing its contents to the elements in the pyramid and adding the appropriate score for each one matched.   A higher score means the new summary overlapped with more of the reference summary contents and is hence assumed to be a better summary.  Preliminary studies show this approach to correlate well with human intuition.  The trouble is that creating these chunks is difficult to automate, since they can be of arbitrary size and must incorporate quite different ways of saying the same thing (reference summaries typically say the same thing, or parts of the same thing, in different ways).
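
A minimal sketch of pyramid-style scoring follows. In the Pyramid Method the content units and their matches are identified by human assessors; here they are stood in for by hand-assigned string labels, which is an assumption of the sketch. Each unit is weighted by the number of reference summaries containing it, and a candidate's score is the sum of the weights of the units it expresses. The function names are hypothetical.

```python
from collections import Counter


def build_pyramid(reference_units: list[set[str]]) -> Counter:
    """Weight each content unit by the number of reference summaries containing it."""
    weights: Counter = Counter()
    for units in reference_units:
        weights.update(units)  # units is a set, so each summary adds at most 1 per unit
    return weights


def pyramid_score(candidate_units: set[str], weights: Counter) -> int:
    """Sum the weights of the pyramid units the candidate expresses (unknown units add 0)."""
    return sum(weights[u] for u in candidate_units)


def tiers(weights: Counter) -> dict[int, list[str]]:
    """Group units into layers, most widely shared units first (top of the pyramid)."""
    layers: dict[int, list[str]] = {}
    for unit, w in weights.items():
        layers.setdefault(w, []).append(unit)
    return dict(sorted(layers.items(), reverse=True))


# Hand-labelled units for three reference summaries (illustrative data).
refs = [
    {"quake hit city", "many died"},
    {"quake hit city", "aid arrived"},
    {"quake hit city", "many died"},
]
w = build_pyramid(refs)
print(pyramid_score({"quake hit city", "aid arrived", "roads closed"}, w))  # 4
print(tiers(w))  # {3: ['quake hit city'], 2: ['many died'], 1: ['aid arrived']}
```

The unit "roads closed" appears in no reference summary and so contributes nothing, while the unit shared by all three references contributes the most.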

Basic Elements

Basic Elements (BEs) were designed to address both problems by using variable-sized, syntactically coherent units.  We start with small units, because starting small allows one to automate the process of unit identification and, to some degree, facilitates the matching of different equivalent expressions.  Grouping smaller units into larger ones can be done automatically, and we believe that this grouping can eventually be extended to build the larger-sized chunks used in the Pyramid Method.

In this approach, we break down each reference sentence into minimal semantic units, which we call Basic Elements.  After some experimentation, we define BEs as follows:

A Basic Element is one of the following: 1) the head of a major syntactic constituent (noun phrase or verb phrase), which in the current implementation is a noun (or noun sequence) or a verb; or 2) a relation (including prepositions) between a head-BE and a single dependent.

As described below, one can produce BEs in several ways.  Most of them involve a syntactic parser to produce a parse tree and a set of 'cutting rules' to extract just the valid BEs from the tree.
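
The parse-and-cut procedure can be sketched as follows. The sketch assumes the sentence has already been parsed into dependency edges with collapsed prepositions (so "to Haiti" becomes a `prep_to` relation) and that noun sequences such as "United Nations" arrive as single tokens. It applies two hypothetical cutting rules: keep nouns and verbs as single-item BEs, and keep every relation whose head is a noun or verb unless its label is on a small drop list. The `Token`/`Dep` types, the rule tables, and the (head, dependent, relation) triple order are illustrative assumptions, not the rules used by BEwT-E.

```python
from typing import NamedTuple


class Token(NamedTuple):
    word: str
    pos: str  # coarse tag, e.g. "NOUN", "VERB", "DET"; proper nouns are tagged NOUN here


class Dep(NamedTuple):
    head: int       # index of the governing token
    dependent: int  # index of the dependent token
    relation: str   # dependency label; prepositions collapsed, e.g. "prep_to"


HEAD_POS = {"NOUN", "VERB"}               # hypothetical rule: these yield single-item BEs
DROP_RELATIONS = {"det", "punct", "aux"}  # hypothetical rule: relations never kept


def extract_bes(tokens: list[Token], deps: list[Dep]) -> list[tuple[str, ...]]:
    """Cut a parsed sentence into BEs: single heads (nouns and verbs) plus
    (head, dependent, relation) triples whose head is itself a head-BE."""
    heads = [(t.word.lower(),) for t in tokens if t.pos in HEAD_POS]
    relations = [
        (tokens[d.head].word.lower(), tokens[d.dependent].word.lower(), d.relation)
        for d in deps
        if d.relation not in DROP_RELATIONS and tokens[d.head].pos in HEAD_POS
    ]
    return heads + relations


# "The UN sent aid to Haiti", hand-parsed for illustration.
tokens = [Token("The", "DET"), Token("UN", "NOUN"), Token("sent", "VERB"),
          Token("aid", "NOUN"), Token("to", "ADP"), Token("Haiti", "NOUN")]
deps = [Dep(1, 0, "det"), Dep(2, 1, "nsubj"), Dep(2, 3, "dobj"), Dep(2, 5, "prep_to")]
print(extract_bes(tokens, deps))
```

The determiner "The" disappears entirely, while the preposition survives only as part of the `prep_to` relation label.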

With units of minimal length, one can much more easily decide whether any two units match (express the same meaning) or not.  For instance, "United Nations", "UN", and "UNO" can be matched at this level, and any larger unit encompassing this one can accept any of the three variants.  And since units are matched at the lowest level, the danger of double-counting segments that are contained in longer ones can also be avoided.

To match non-identical units that carry the same meaning, we apply rules to transform each unit into a number of different variants.  The software downloadable here, BEwT-E (BEs with Transformations for Evaluation; pronounced "beauty"), is a package that automatically creates BEs for a text, applies transformation rules to expand BE units into numerous variants, and performs matching of these units against a list of units produced by BEwT-E from another text.
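
Transformation-based matching can be sketched as follows: each BE is expanded into the set of variants reachable by one step of rewriting its words, and two BEs match when their variant sets intersect. The two rule tables (`ABBREVIATIONS`, `LEMMAS`) are hypothetical placeholders for BEwT-E's actual transformations, which this sketch does not reproduce. BEs are assumed to be tuples: either a single head word or a (head, dependent, relation) triple.

```python
from itertools import product

# Hypothetical transformation tables standing in for real lexical resources.
ABBREVIATIONS = {"un": ["united nations"], "uno": ["united nations"]}
LEMMAS = {"sent": ["send"], "sends": ["send"]}


def word_variants(word: str) -> set[str]:
    """The word itself plus every form the tables rewrite it to."""
    variants = {word}
    for table in (ABBREVIATIONS, LEMMAS):
        variants.update(table.get(word, []))
    return variants


def expand(be: tuple[str, ...]) -> set[tuple[str, ...]]:
    """All variants of a BE. In a (head, dependent, relation) triple only the
    two words are transformed; the relation label is kept as is."""
    if len(be) == 3:
        head, dep, rel = be
        return {(h, d, rel) for h, d in product(word_variants(head), word_variants(dep))}
    return set(product(*(word_variants(w) for w in be)))


def be_match(a: tuple[str, ...], b: tuple[str, ...]) -> bool:
    """Two BEs match if some variant of one equals some variant of the other."""
    return bool(expand(a) & expand(b))


print(be_match(("sent", "un", "nsubj"), ("send", "uno", "nsubj")))  # True
print(be_match(("aid",), ("haiti",)))                               # False
```

Because both sides are expanded, "UN" and "UNO" match through their shared expansion "united nations" even though neither is rewritten into the other.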

In order to implement Basic Elements as a method of evaluating summary content, four core questions must be addressed:

1.      What is a Basic Element, and how large is it?  The answer depends strongly on how BE units can be created automatically.

2.      What score should each BE unit have?

3.      When do two BE units match?  What kinds of matches should be implemented, and how?

4.      How should an overall summary score be derived from the individual matched BE units' scores?

Different answers to each of these questions provide a different summary evaluation method.  The Pyramid Method, for example, takes as BEs maximal-length semantic units shared by the reference summaries; gives each unit a score equal to the number of reference summaries containing it; allows two units to match when they express all or most of the same semantic content, as judged by the (human) assessors; and derives the overall score by simply summing the scores of each unit of the candidate summary.  In contrast, ROUGE uses as BEs various ngrams (for example, unigrams); scores each unigram by a function that depends on the number of reference summaries containing that unigram; allows unigrams to match under various parameterizable conditions (for example, exact match only, or root form match); and derives the overall summary score by some weighted combination function of unigram matches.
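
The four questions can be read as four pluggable components. The sketch below (all names hypothetical) wires them into one `evaluate` function and instantiates it with ROUGE-like choices: whitespace unigrams as units, a unit score equal to the number of reference summaries containing the unit, and a total obtained by dividing the summed scores of matched units by the summed scores of all reference units. Only the matching criterion differs between the two configurations. For simplicity, each distinct reference unit is scored once, which is a simplification and not how ROUGE counts.

```python
from dataclasses import dataclass
from typing import Callable, Hashable

Unit = Hashable  # hypothetical: any hashable value can serve as a unit


@dataclass
class EvalConfig:
    """One point in the design space: an answer to each of the four questions."""
    units: Callable[[str], list]               # 1. how units are created from text
    unit_score: Callable[[Unit, list], float]  # 2. score of a reference unit
    matches: Callable[[Unit, Unit], bool]      # 3. when two units match
    combine: Callable[[list, float], float]    # 4. matched scores -> summary score


def evaluate(cfg: EvalConfig, candidate: str, references: list[str]) -> float:
    ref_units = [cfg.units(r) for r in references]
    # Each distinct reference unit is scored once (a simplification).
    pool = {u: cfg.unit_score(u, ref_units) for units in ref_units for u in units}
    cand = cfg.units(candidate)
    matched = [s for u, s in pool.items() if any(cfg.matches(u, c) for c in cand)]
    return cfg.combine(matched, sum(pool.values()))


# ROUGE-like choices (an illustration, not the ROUGE package).
def unigrams(text: str) -> list[str]:
    return text.split()


def ref_count(unit, ref_units) -> float:
    """Number of reference summaries that contain the unit."""
    return sum(unit in units for units in ref_units)


def normalized_sum(matched: list, total: float) -> float:
    return sum(matched) / total if total else 0.0


exact_cfg = EvalConfig(unigrams, ref_count, lambda a, b: a == b, normalized_sum)
caseless_cfg = EvalConfig(unigrams, ref_count, lambda a, b: a.lower() == b.lower(), normalized_sum)

refs = ["The quake hit Haiti", "A quake struck Haiti"]
cand = "the quake hit haiti"
print(evaluate(exact_cfg, cand, refs))     # 0.375
print(evaluate(caseless_cfg, cand, refs))  # 0.75
```

Changing only the answer to question 3 doubles the score here, which is why each of the four choices has to be studied rather than fixed by convention.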

There are many possible ways to implement each of these four points in software; exploring the whole space to find the most stable and best-performing evaluation configuration is not a trivial task.  The current implementation of BEwT-E uses variable-length, syntactically coherent units, gives each one the same score, matches units that are derivable from one another through the transformations, and weights and adds the match scores in various ways.
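
The text leaves the exact weighting open. As one possible combination, assuming every BE scores 1, the sketch below pairs candidate BEs with reference BEs one-to-one (so a reference BE cannot be credited twice) and reports recall, precision, and F1. The greedy (rather than optimal) pairing, the default exact-match predicate, and the function names are assumptions; a transformation-aware predicate can be passed in as `matches`.

```python
from typing import Callable, Hashable, Sequence


def count_matches(candidate: Sequence[Hashable], reference: Sequence[Hashable],
                  matches: Callable[[Hashable, Hashable], bool]) -> int:
    """Greedily pair each candidate BE with the first still-unused matching
    reference BE, so no reference BE is credited more than once."""
    unused = list(reference)
    count = 0
    for c in candidate:
        for i, r in enumerate(unused):
            if matches(c, r):
                del unused[i]  # safe: we stop iterating right after the deletion
                count += 1
                break
    return count


def equal_weight_scores(candidate, reference, matches=lambda a, b: a == b):
    """With every BE worth 1 point, return (recall, precision, F1)."""
    m = count_matches(candidate, reference, matches)
    recall = m / len(reference) if reference else 0.0
    precision = m / len(candidate) if candidate else 0.0
    f1 = 2 * precision * recall / (precision + recall) if m else 0.0
    return recall, precision, f1


reference = [("sent",), ("aid",), ("sent", "aid", "dobj"), ("haiti",)]
candidate = [("aid",), ("aid",), ("sent", "aid", "dobj")]
# recall = 2/4, precision = 2/3: the repeated ("aid",) finds no second partner.
print(equal_weight_scores(candidate, reference))
```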

BEs and ROUGE

ROUGE is a software package for automated summary evaluation that matches an input summary against reference summaries using a variety of fixed-length word ngrams.  ROUGE was built at USC/ISI by Lin and Hovy.  Note that ROUGE itself is also an instance of the BE framework, in which the BEs are unigrams (or ngrams of various types, depending on the parameter choice), the scoring function awards simple unit points, and the simplest matching criterion is lexical identity.

BE and BEwT-E Software Packages

We created and distributed the Basic Element (BE) Package in 2005.  This package was a framework into which one could insert, and in which one could vary, modules that perform each of these four functions.  The BE Package provided several parameterized modules as well as APIs for people wishing to build and test their own.  Used as distributed, the BE Package implemented several variants of the ideas of Van Halteren, Teufel, Nenkova, and Passonneau.  We performed a series of experiments to obtain reasonably good modules and parameter settings, but welcome additional studies and improvements.

We have since created a new package, BEwT-E (Basic Elements with Transformations for Evaluation), which adds transformations.  A current implementation of BEwT-E can be downloaded after filling in the form below and accepting the licensing terms.

Please direct all inquiries to hovy@isi.edu.

BE References

Tratz, S. and E.H. Hovy. 2008. Summarization Evaluation Using Transformed Basic Elements. Proceedings of the Text Analysis Conference (TAC-08). NIST, Gaithersburg, MD.

Zhou, L., N. Kwon, and E.H. Hovy. 2007. A Semi-Automated Evaluation Scheme: Automated Nuggetization for Manual Annotation. Proceedings of the Human Language Technology / North American Chapter of the Association for Computational Linguistics conference (HLT-NAACL 2007). Rochester, NY.

Zhou, L. and E.H. Hovy. 2007. A Semi-Automatic Evaluation Scheme. Proceedings of the DARPA GALE PI workshop. San Francisco, CA.

Hovy, E.H., C.-Y. Lin, L. Zhou, and J. Fukumoto. 2006.  Automated Summarization Evaluation with Basic Elements. Unpublished ms.

 

LICENSE AGREEMENT

This License Agreement (the "Agreement") is entered, effective this date, by and between University of Southern California, and the individual executing this Agreement below as "Licensee" (hereinafter, the "Licensee").

 

WHEREAS, USC has developed the BE Package and related documentation (the "Software"); and

 

WHEREAS, Licensee desires, and USC is willing to grant to Licensee, a license to use the Software in accordance with this Agreement;

 

NOW, in consideration of the foregoing, the mutual covenants hereinafter set forth, and for other good and valuable consideration, the receipt and sufficiency of which is hereby acknowledged, the parties agree as follows:

 

1. USC hereby grants Licensee a royalty-free, non-exclusive, non-transferable right to use the Software as follows solely for a Non-Commercial Purpose:

 

(a) Licensee may prepare derivative works (the "Derivative Works") which are based on or incorporate all or part of the Software, including, without limitation, works (the "Adaptations") which

 

(i) are translations of all or part of the Software into different programming languages, or

 

(ii) are revisions, improvements or corrections to all or part of the Software, provided that, Licensee shall treat all Derivative Works as Software under this Agreement; and

 

(b) Licensee may make only such copies of the Software as are necessary for Licensee's development of the Derivative Works.

 

2. All copies of the Software and Derivative Works prepared in accordance with paragraph 1 shall retain the copyright notice appearing in the Software. If the Software includes computer programs in object code form, Licensee shall not de-compile, reverse engineer or disassemble such programs.

 

3. As used in this Agreement, "Non-Commercial Purpose" means use of the Software and Derivative Works solely for education or research. "Non-Commercial Purpose" excludes, without limitation, any use of the Software or Derivative Works for, as part of, or in any way in connection with a product (including software) or service which is sold, offered for sale, licensed, leased, loaned or rented.

 

4. Licensee hereby grants USC a non-exclusive, royalty-free, fully paid-up, worldwide, perpetual license to:

 

(a) Reproduce, prepare derivative works based on and distribute all or part of the Adaptations; and

 

 

 

(b) Make, have made, use, offer to sell, sell, license or import any products (including software) or services under any intellectual property rights owned or licensed by Licensee which relate to

 

(i) all or part of the Adaptations (including as executed by a CPU), or

 

(ii) methods or concepts embodied in, or implemented through the execution by a CPU of, the Adaptations. If the Software includes documentation that identifies a contact person at USC, Licensee shall provide such person with feedback concerning Licensee's Adaptations and, if requested by USC, provide such person with source code copies of Licensee's Adaptations.

 

5. This Agreement is personal between USC and Licensee. No ownership interest in the Software (or the copy of which is provided by USC pursuant to paragraph 1) is transferred to Licensee. Licensee's interest in the Derivative Works is limited solely to Licensee's additions and the Derivative Works are subject in their entirety to USC's intellectual property rights. USC may assign or transfer to any company or person, or grant to any company or person a license or sublicense under, all or part of its interest in any rights to the Software, this Agreement, or any license granted to USC hereunder. Licensee may not assign, transfer or sublicense Licensee's rights hereunder without the written consent of USC.

6. USC may terminate this Agreement at any time by sending written notice of termination to Licensee at the address specified below. Termination shall be effective as provided in the notice. Unless the notice shall provide otherwise, upon termination, Licensee shall destroy all copies of the Software and Derivative Works. Licensee's obligations under this Agreement, including any rights granted to USC pursuant to paragraph 5, shall survive and continue after termination.

7. Licensee shall not, directly or indirectly, export the Software or any Derivative Work to any country to which such export is prohibited by law.

8. Licensee agrees to comply with all export laws and restrictions and regulations of the United States or foreign agencies or authorities, and not to export or re-export the Software or any direct product thereof in violation of any such restrictions, laws or regulations, or without all necessary approvals. Neither the Software nor the underlying information or technology may be downloaded or otherwise exported or re-exported

(i) into Cuba, Iran, Iraq, Libya, North Korea, Sudan, Syria, Serbia, Taliban-controlled portions of Afghanistan or any other country subject to U.S. trade sanctions covering the Software, to individuals or entities controlled by such countries, or to nationals or residents of such countries other than nationals who are citizens or lawfully admitted permanent residents of the United States and not currently domiciled in countries subject to such sanctions; or

(ii) to anyone on the U.S. Treasury Department's list of Specially Designated Nationals and Blocked Persons or the U.S. Commerce Department's Table of Denial Orders. By downloading or using the Software, Licensee agrees to the foregoing and represents and warrants that it complies with these conditions.

9. USC has no obligation to support or maintain the Software and grants Licensee this right to use the Software "AS IS". LICENSEE ASSUMES TOTAL RESPONSIBILITY AND RISK FOR LICENSEE'S USE OF THE SOFTWARE. USC DOES NOT MAKE, AND EXPRESSLY DISCLAIMS, ANY EXPRESS OR IMPLIED WARRANTIES, REPRESENTATIONS OR ENDORSEMENTS OF ANY KIND WHATSOEVER, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, AND THE WARRANTIES OF TITLE OR NON-INFRINGEMENT. IN NO EVENT SHALL USC BE LIABLE FOR

(a) ANY INCIDENTAL, CONSEQUENTIAL, OR INDIRECT DAMAGES (INCLUDING, WITHOUT LIMITATION, DAMAGES FOR LOSS OF PROFITS, BUSINESS INTERRUPTION, LOSS OF PROGRAMS OR INFORMATION, AND THE LIKE) ARISING OUT OF THE USE OF OR INABILITY TO USE THE SOFTWARE, EVEN IF USC OR ANY OF ITS AUTHORIZED REPRESENTATIVES HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES,

(b) ANY CLAIM ATTRIBUTABLE TO ERRORS, OMISSIONS, OR OTHER INACCURACIES IN THE SOFTWARE, OR

(c) ANY CLAIM BY ANY THIRD PARTY.

10. This Agreement shall be governed by and construed in accordance with the laws of the State of California, USA, applicable to agreements made and to be performed wholly therein without regard to its conflicts of law rules. Any cause of action or claim Licensee may have with respect to the Software must be brought within one (1) year after the claim or cause of action arises or such claim or cause of action is barred. USC's failure to insist upon or enforce strict performance of any provision of this Agreement is not a waiver of any provision or right.

11. If a dispute arises out of, or relates to, this Agreement or the subject matter of this Agreement, either party may submit the dispute to a sole mediator selected by the parties or, at any time prior to selection of a sole mediator, to mediation by the American Arbitration Association ("AAA"). If not thus resolved, it shall be referred to a sole arbitrator selected by the parties or to the AAA for arbitration. The arbitration shall be governed by the United States Arbitration Act, shall be conducted in the County of Los Angeles, California, USA, and judgment on the award may be entered by any court having jurisdiction. The arbitrator shall not limit, expand or modify the terms of the Agreement nor award damages in excess of compensatory damages, and each party waives any claim to excess damages. A request by a party to a court for interim protection shall not affect either party's obligation hereunder to mediate and arbitrate. Each party shall bear its own expenses and an equal share of all cost and fees of the mediation and/or arbitration. Any arbitrator selected shall be competent in the legal and technical aspects of the subject matter of this Agreement. The content and result of mediation and/or arbitration shall be held in confidence by all participants.

12.  Limitation of Liability:  To the maximum extent permitted by law, in no event will either party be responsible for any incidental damages, consequential damages, exemplary damages of any kind, lost goodwill, lost profits, lost business and/or any indirect economic damages whatsoever regardless of whether such damages arise from claims based upon contract, negligence, tort (including strict liability or other legal theory), a breach of any warranty or term of this Agreement, and regardless of whether a party was advised or had reason to know of the possibility of incurring such damages in advance.

 

I certify that I am not a national or a resident of Cuba, Iran, Iraq, Libya, North Korea, Sudan, Syria, Serbia, Taliban-controlled portions of Afghanistan or any other country subject to U.S. trade sanctions, nor, to the best of my knowledge, have I been designated a Specially Designated National, Blocked Person, or otherwise been denied export-related privileges by the United States Government.

 

LICENSEE First Name

  

LICENSEE Last Name

  

Affiliation

  

Email

  

Street Address

  

City

  

State

  

Zip Code

  

Country