MuST Multilingual Summarization and Translation
MuST (Multilingual Summarization and Translation) retrieves documents, either from the web or from a fixed document collection, summarizes them, and, if you request it, translates the retrieved texts, the summaries, or both into English. MuST is being built as a prototype under DARPA funding in the Natural Language Group of the Information Sciences Institute of the University of Southern California in Los Angeles. Other research performed in the Group is embodied in the systems mentioned below. The team building MuST consists of Dr. Chin-Yew Lin (research scientist and principal builder) and Dr. Eduard Hovy (PI and research scientist). Please email comments to cyl@isi.edu and hovy@isi.edu. MuST includes several subsystems:
Running MuST
To run MuST, please point your browser (Microsoft Explorer or Netscape) to http://www.isi.edu/~cyl/must/must_beta.htm. Perform the following steps:
Response should be fairly quick. If you wait for more than 45 seconds, something is wrong; please try again. Please remember that these are research systems! Your comments are very welcome, and we hope that this can be of use to someone!

For information, please contact: Dr. Eduard Hovy or Dr. Chin-Yew Lin

Advanced Search Operators
For the Indonesian news retrieval, MuST supports the following operators besides AND and OR (default):
W/n proximity

Searching for Words that are Near Each Other
Note: You can enter query operators in upper- or lower-case. They are capitalized in the following examples only for purposes of clarity.

The Proximity Operator
You can use the proximity operator to search for word pairs in which the pair's second term occurs within a specified number of words after the first.
Note: The proximity operator does not work across field boundaries; you cannot use it to search for a word pair in which the words occupy separate fields within a record.
Syntax: word1 W/n word2
where n is the number of words within which word2 must occur after word1. The variable n can be any integer greater than 0 (the actual limit is between 1,000,000,000 and 2,000,000,000). Stopwords and punctuation do not count as words in the range specified by n.
The proximity operator is unidirectional from left to right. It retrieves only those records in which word2 occurs within n words after word1. Occurrences of word1 within n words after word2 are not considered hits.
Example: amphibian W/5 DNA
This query will retrieve records in which DNA occurs within five words after amphibian.

The Adjacency Operator
The adjacency operator, ADJ, is equivalent to a proximity operator with a defined range of one word (i.e., W/1).
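The left-to-right proximity rule, and ADJ as the special case W/1, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the search engine's actual implementation: the stopword list, the tokenization, the case-insensitive matching of search terms, and the function names are all hypothetical.

```python
import re

# Hypothetical stopword list; the list actually used by the search engine
# is not given in this document.
STOPWORDS = {"a", "an", "and", "in", "of", "the", "to"}


def countable_words(text):
    """Return the lower-cased words that count toward the range n.

    Punctuation is dropped by the tokenizer and stopwords are filtered out,
    since neither counts as a word in the range.
    """
    words = re.findall(r"[A-Za-z0-9]+", text.lower())
    return [w for w in words if w not in STOPWORDS]


def matches_proximity(text, word1, word2, n):
    """True if word2 occurs within n counted words AFTER word1 (word1 W/n word2)."""
    words = countable_words(text)
    first = [i for i, w in enumerate(words) if w == word1.lower()]
    second = [i for i, w in enumerate(words) if w == word2.lower()]
    # Unidirectional: word2 must come later, at most n counted words away.
    return any(0 < j - i <= n for i in first for j in second)


def matches_adjacency(text, word1, word2):
    """word1 ADJ word2, modeled as word1 W/1 word2."""
    return matches_proximity(text, word1, word2, 1)


if __name__ == "__main__":
    # DNA follows amphibian within five counted words: a hit.
    print(matches_proximity("The amphibian genome contains DNA", "amphibian", "DNA", 5))
    # Same words in the opposite order: not a hit, the operator is one-way.
    print(matches_proximity("DNA of the amphibian", "amphibian", "DNA", 5))
```

Because stopwords are removed before positions are compared, a text such as "amphibian and the DNA" would satisfy amphibian W/1 DNA under this sketch.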
Certain punctuation marks (hyphen, apostrophe, comma, and period) function as adjacency operators when they appear in the middle of a character string; they do not function as such if immediately preceded or followed by a character not recognized by PLWeb Turbo (e.g., a space).
Note: The adjacency operator does not work across field boundaries; you cannot use it to search for a word pair in which the words occupy separate fields within a record.
Syntax: word1 ADJ word2
The adjacency operator is unidirectional from left to right. It retrieves only records in which word2 immediately follows word1. Occurrences of word1 that follow word2 are not considered hits.
Example: great ADJ white

The Near Operator
The near operator duplicates the functions of the proximity and adjacency operators, with one exception: it is bidirectional. You can use it to search for word pairs in which the second term occurs within a specified number of words before or after the first.
If you specify a word range with it, the near operator functions as a bidirectional proximity operator. If no word range is specified, it serves as a bidirectional adjacency operator.
Note: The near operator does not work across field boundaries; you cannot use it to search for a word pair in which the words occupy separate fields within a record.
Syntax: word1 NEAR/n word2
where n is the number of words within which word1 must occur before or after word2. The variable n can be any integer greater than 0 (the actual limit is between 1,000,000,000 and 2,000,000,000). Stopwords and punctuation do not count as words in the range specified by n.
Examples:
Whitewater NEAR/5 indictment
tax NEAR increase
The first query will retrieve records in which Whitewater occurs within five words of indictment. The second query will retrieve records in which tax occurs immediately before or after increase.
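To make the three query forms easier to compare, the following Python sketch normalizes a two-word query into a maximum distance and a direction flag. It is an illustration only: the ProximityQuery type, the parse_query function, and the regular expression are hypothetical, and rejecting a range on ADJ or a missing range on W is an assumption, since the text above does not say how such queries are treated.

```python
import re
from typing import NamedTuple, Optional


class ProximityQuery(NamedTuple):
    """Normalized form of a two-word proximity query (hypothetical type)."""
    word1: str
    word2: str
    max_distance: int
    bidirectional: bool


# Operators are matched case-insensitively, since queries may use
# upper- or lower-case operators.
_QUERY = re.compile(r"^\s*(\S+)\s+(W|NEAR|ADJ)(?:/(\d+))?\s+(\S+)\s*$", re.IGNORECASE)


def parse_query(query: str) -> Optional[ProximityQuery]:
    """Parse 'word1 W/n word2', 'word1 ADJ word2', or 'word1 NEAR[/n] word2'.

    Returns None for anything this sketch does not recognize.
    """
    match = _QUERY.match(query)
    if match is None:
        return None
    word1, operator, n, word2 = match.groups()
    operator = operator.upper()
    if operator == "W" and n is None:
        return None  # assumption: W always needs an explicit range
    if operator == "ADJ":
        if n is not None:
            return None  # assumption: ADJ takes no range
        n = "1"  # ADJ is equivalent to W/1
    if operator == "NEAR" and n is None:
        n = "1"  # NEAR without a range acts as bidirectional adjacency
    distance = int(n)
    if distance < 1:
        return None  # n must be an integer greater than 0
    return ProximityQuery(word1, word2, distance, operator == "NEAR")


if __name__ == "__main__":
    for q in ("amphibian W/5 DNA", "great adj white", "Whitewater NEAR/5 indictment"):
        print(q, "->", parse_query(q))
```

In this normalized form the only difference between W/n and NEAR/n is the bidirectional flag, which mirrors the description of the near operator as a two-way version of the other two.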