P.O.
Box 218
Yorktown
Heights, NY 10598
USA
http://www.ou.edu/special/owp/goodies/writegood.html
2. Prepositions are not
words to end sentences with.
3. Avoid cliches like the
plague. (They're old hat.)
4. Employ the vernacular.
5. Eschew ampersands & abbreviations, etc.
6. Parenthetical remarks (however
relevant) are
unnecessary.
7. It is wrong to ever
split an infinitive.
8. Contractions aren't
necessary.
9. Foreign words and
phrases are not apropos.
10. One should never
generalize.
11. Eliminate quotations.
As Ralph Waldo Emerson once said:
"I hate quotations.
Tell me what you know."
12. Comparisons are as
bad as cliches.
13. Don't be redundant;
don't more use words than
necessary; it's highly
superfluous.
14. Profanity sucks.
15. Be more or less
specific.
16. Understatement is always
best.
17. Exaggeration is a
billion times worse than
understatement.
18. One-word sentences?
Eliminate.
19. Analogies in writing
are like feathers on a snake.
20. The passive voice is
to be avoided.
21. Go around the barn at
high noon to avoid colloquialisms.
22. Even if a mixed
metaphor sings, it should be derailed.
23. Who needs rhetorical questions?
• MT systems are not good enough
• Statistical MT systems tend to use more simplistic language models that do
  not allow for several layers of abstraction. This can result in less
  adequate coverage of linguistic rules and linguistic generalizations.
• Knowledge-based MT systems depend on large amounts of hand-coded data
  (lexical data and syntactic rules). It is very time-consuming to gain
  enough linguistic coverage.
• MT input is not good enough
• Bad markup
• Incorrect punctuation
• Incorrect spelling
• Incorrect grammar
• Ambiguous constructions
• Bad style
• What aspects can the MT user control?
• MT input
• Lexical coverage
• Ways to change input in order to increase MTranslatability and thus improve
  the MT output.
• Is it possible to predict the output quality for given input automatically?
• Avoid bitmaps when possible; these are usually not translated by MT
  systems.
• Do not abuse tags to accomplish a purely physical effect (e.g. a header tag
  just to achieve a bigger font), and avoid tags that accomplish formatting
  on their own (e.g. <br>).
• Use markup to accomplish the desired layout for tables etc., rather than
  “manual” indentation.
• Specify the LANG attribute for HTML documents. Mark any parts that are in a
  different language from that of the main document.
• Write hypertext links and bold-faced (italicized etc.) text such that they
  can be translated as a single entity. This way the markup will look better
  for the translation. Mark strings that should not be translated.
• Use ISO 8859 (or Unicode) characters throughout. Otherwise, use entities
  for characters that are not part of the ASCII character set. For instance,
  in the SGML/HTML source code, the entity for ü [u-umlaut] should be:
  &uuml;.
• Make sure that words that are used as labels or names are properly
  identified. E.g. The red button vs The “RED” button. You can use defined
  tags such as <q>RED</q>.
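The advice above on entities can be illustrated with a small Python sketch. It assumes that the text is available as a Unicode string and that named HTML entities are preferred where they exist; the function name to_entities is made up for this illustration and is not part of any MT tool.

```python
from html.entities import codepoint2name


def to_entities(text: str) -> str:
    """Replace every non-ASCII character with an HTML entity.

    Named entities (e.g. &uuml;) are used where the HTML 4 entity table
    has one; all other characters become numeric character references.
    """
    out = []
    for ch in text:
        code = ord(ch)
        if code < 128:
            out.append(ch)                           # plain ASCII stays as-is
        elif code in codepoint2name:
            out.append(f"&{codepoint2name[code]};")  # named entity
        else:
            out.append(f"&#{code};")                 # numeric reference
    return "".join(out)


print(to_entities("Rücklaufleitung"))
```

Characters without a named entity fall back to a numeric reference, so the resulting source stays within ASCII.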
• Punctuation that indicates a new segment is especially important.
• Remember correct use of hyphens.
Do not write: If the user provided file is not found, an error message is
issued.
Do write: If the user-provided file is not found, an error message is issued.
Do not write: He bit-off more than he can chew.
Do write: He bit off more than he can chew.
• Commas do make a difference.
Do not write: Since Jay always jogs a mile doesn't seem that far to him.
Do write: Since Jay always jogs, a mile doesn't seem that far to him.
• If a word is misspelled, it will -- at best -- produce a non-translation.
  At worst it will mess up source analysis and produce a wrong grammatical
  structure.
• Special terminology
• You may use certain words in a nonstandard sense, but make sure you update
  your dictionary.
• Multi words
• Many noun strings cannot be translated compositionally and have to be
  treated as a unit. But beware: not all MT systems can handle coordination
  of premodifiers in multi words. E.g. Forward and backward compatible; side
  and back exits.
Do not write: File information and data type is of utmost importance.
Do write: File information and data type are of utmost importance.
Do not write: Woven of combed cotton, you will love our sweater's soft feel.
Do write: Woven of combed cotton, this sweater will delight you with its soft
feel.
Do write: Our sweater is woven of combed cotton, and you'll love its soft
feel.
• Use articles whenever possible
Do write: Meeting the requirements.
• In coordinated phrases:
Repeat articles
Do not write: The system reads the file or result field definition.
Do write: The system reads the file or the result field
definition.
Repeat any modal/auxiliary verb
Do not write: The application can use the window to establish a
dialog with the user and format text responses.
Do write: The application can use the window in order to
establish a dialog with the user and can format text responses.
Repeat “to” before infinitives
Do not write: The application can use the window to establish a
dialog with the user and format text responses.
Do write: The application can use the window in order to
establish a dialog with the user and to format text responses.
Do not write: The coordinates that are displayed correspond to the
top of the slider in the vertical slide bar, and the top edge of the slider in
the horizontal slide bar.
Do write: The coordinates that are displayed correspond to the
top of the slider in the vertical slide bar, and to the top edge of the slider
in the horizontal slide bar.
Use “either”-“or” instead of “or” alone
Do not write: The system immediately terminates the program if a
hard error or exception occurs.
Do write: The system immediately terminates the program if
either a hard error or an exception occurs.
Use “both”-“and” instead of “and” alone
Do not write: The system immediately terminates the program if it
detects a hard error and exception.
Do write: The system immediately terminates the program if it
detects both a hard error and an exception.
Do not write: The cotton shirts are made from comes from Arizona.
Do write: The cotton that shirts are made from comes from Arizona.
Do not write: In experiment 6 we were interested in the reading
subjects spontaneously achieve for such a headline.
Do write: In experiment 6 we were interested in the reading
that subjects spontaneously achieve for such a headline.
Do not write: After a process creates a resource, any process it
starts inherits the resource identifiers.
Do write: After a process creates a resource, any process that
it starts inherits the resource identifiers.
Do not write: The amount of adjacent space available in storage does not
restrict the size of a library, or of any other object.
Do write: The amount of adjacent space that is available in storage does not
restrict the size of a library, or of any other object.
Do not write: Programs currently running in the system are
indicated by icons in the lower part of the screen.
Do write: Programs that are currently running in the system
are indicated by icons in the lower part of the screen.
Do write: Icons in the lower part of the screen indicate
programs that are currently running in the system.
Do not write: The horse raced past the barn fell.
Do write: The horse that was raced past the barn fell.
Do not write: Make sure the power is turned off.
Do write: Make sure that the power is turned off.
Do not write: Use this function to copy project data to a new or existing project.
Do write: Use this function in order to copy project data to a
new or existing project.
• Rewrite -ing verbs that post-modify a noun as a relative clause or add a
  suitable preposition, depending on what you mean
Do not write: You can develop an application using the TCP/IP sockets.
Do write: You can develop an application by using the TCP/IP sockets.
• Rewrite -ing verbs pre-modifying a noun to include an article
Do not write: DATAMAX continues processing statements after repairing the
data set.
Do write: DATAMAX continues the processing statements after it repairs the
data set.
If that is what you meant...
• Rewrite -ing verbs that are complements of other verbs
Do not write: The motor starts using a gas-powered pull start or pushbutton
Do write: You use a gas-powered pull start or pushbutton ignition via a
rechargeable battery in order to start the motor.
• Rewrite -ing verbs that can take an infinitive complement as “to” +
  infinitive
Do not write: Receiving notices.
Do write: To receive notices.
• Make sure that the implicit subject of an -ing verb that occurs in a
  subordinate clause starting with a subordinate conjunction (“after”,
  “when”, “while” etc.) is the same as the subject of the superordinate
  clause
Do not write: After inserting the diskette, the system will read the file.
Do write: After you insert the diskette, the system will read the file.
Beware.
Kohl(1999) claims that it is not necessary to worry about the following cases:
a.
ing-verbs that are
preceded by a preposition. A slight variation of his example is For more information about
printing files, see Chapter 3.
However, in the context of MT, this is ambiguous between the reading
where files is the object of print,
and the reading where printing pre-modifies files.
b.
ing-verbs that are
the subject of a clause of a sentence .His example is Specifying the system
password gives you full administrative access.
He goes on to say:“When it’s the first word of a simple sentence, an -ING can only be a gerund.” This is not generally true. The
reason this example is not ambiguous is that there is a determiner (the) between the ing-verb and the following noun.
Humans
often disambiguate by applying real-world knowledge, but even then there may be
problems as evidenced by the notorious example Visiting relatives can be a
nuisance.
Or how about this real, but truly ambiguous sentence: At XYZ Inc. we don't waste
any time improving service for our customers!
• In many languages the pronoun has to agree in number and gender with its
  antecedent. Most MT systems do not support pronoun resolution, which is a
  rather difficult task.
The police refused the anarchists a permit because they advocated violence.
La police a refusé un permis aux anarchistes parce qu'elle craint des actes
de violence.
La police a refusé un permis aux anarchistes parce qu'ils prônent la
violence.
English verb particles represent a challenge to MT
systems because of the ambiguity of particles and prepositions. If there is
a choice between two synonymous verbs, one with a particle and one without, do
choose the latter.
Do not write: She ran up a bill.
Do write: She accumulated a bill.
Do not write: Transfer file.
Do write: Transfer the file.
Do write: The transfer file.
Do write: At all levels of security,
the system-supplied defaults in the user profile can be changed. Authority can
be specifically given to the users or taken away from the users.
Do not write: He got my goat.
Do write: He annoyed me.
Do not write: Communication between programs, between jobs, between users,
between users and programs and between users and the system occurs through
messages.
Do write: Communication occurs through messages. This is true for
communication between programs, between jobs, between users, as well as for
communication between users and programs, and between users and the system.
Do not write: Is she suing the hospital? -- She is the doctor.
Do write: Is she suing
the hospital? -- She is suing the doctor.
Do write: The amount of adjacent
space that is available in storage does not restrict the size of a library, or
of any other object.
After you have set up your workstation, you can:
a. Log on to the network
b. Work locally
Do write: After you have set up your workstation, you can log on to the
network or work locally.
Do write: After you have set up your workstation, you can do the following:
a. You can log on to the network
b. You can work locally
• Exact repetitions make it more fruitful to use translation memory
• The objective of spell checkers is to point out misspelled words and, where
  possible, suggest the correct spelling.
Do not write: There very happy.
Do write: They’re very happy.
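The “There very happy” example slips past a purely dictionary-based spell checker, because every word in it is a correctly spelled word. The following Python sketch shows this under a simplifying assumption: the checker does nothing more than look each token up in a word list. The tiny KNOWN_WORDS set and the function name are invented for the illustration.

```python
import re

# Toy word list (an assumption); a real spell checker loads a full dictionary.
KNOWN_WORDS = {"there", "they're", "their", "very", "happy"}


def misspelled(sentence):
    """Return the tokens of the sentence that are not in the word list."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    return [t for t in tokens if t not in KNOWN_WORDS]


# Wrong word, but every token is a valid dictionary entry: nothing is flagged.
print(misspelled("There very happy."))
# A genuine non-word is caught.
print(misspelled("They're verry happy."))
```

Catching the first sentence requires looking at the grammatical context, which is the job of a grammar checker rather than a spell checker.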
I have a spelling chequer
It came with my pea sea
It plainly marques four my revue
Miss steaks eye cannot sea
I weight four it two say
Weather eye am wrong or wright
It shows me strait away
It nose bee fore two late
And eye can put the error rite
Its rarely, rarely grate
I'm shore your pleased to no
It's letter perfect in it's weigh
My chequer tolled me sew.
March, 1990 -- p. 209, author name n.a. --
As an extra addled service, I am going to put this column in the Spilling
Checker, where I tryst it will sale through with flying colons.In this modern
ear, it is simply inexplicable to ask readers to expose themselves to
misspelled swords when they have bitter things to do. And with all the other
timesaving features on my new work processor, it is in realty very easy to
pit together a colon like this one and get it tight. For instants, if there is
a work that is wrong, I just put the curse on it, press Delete and its Well sometimes it deletes
to the end of the lion or worst yet the whole rage. Four bigger problems, there is the Cat and Paste option. If there is some test that is somewhere
were you wish it where somewhere else you jest put the curse at both ends and
wash it dissapear. Where you want
it to reappear simply bring four quarts of water to a rotting boil and throw in
112 pounds of dazed chicken.
Sometimes it brings in the Cat that was Pasted yesterday. But usually it comes out as you planned,
or better. And if it doesn't,
there are lots of other easy to lose options...
• The objective of grammar checkers is to point out ungrammatical
  constructions.
• Grammatical input to MT stands a better chance of getting a good
  translation; however, it is not sufficient to guarantee a correct
  translation.
• Grammar checking is a very difficult process because the program basically
  has to try to make (grammatical) sense of (grammatical) nonsense.
  Consequently, the precision of grammar checkers is notoriously low.
• Grammar checkers show a tendency to lump together different kinds of
  problems. Some of these problems are more relevant for MTranslatability
  than others; consequently, some checks fall into more than one usefulness
  category, depending on which aspect you are looking at.
• Capitalization of first word in a sentence
• Hyphenated and compound words
• Words in split infinitives (> 1)
• Passive sentences
• Commonly confused words (its/it’s, their/there/they’re)
• Punctuation
• Relative clauses (who, which, that)
• Sentence structure (e.g. bad participial modification: Having run the
  marathon, it was time to rest.)
• Subject-verb agreement
• Successive nouns (> 3)
• Successive prepositional phrases (> 3)
• Verb and noun phrases
• Cliches (these tend to be idiomatic)
• Colloquialisms
• Jargon
• Unclear phrasing (various cases of ambiguous scope)
• Double negation
• Sentence length (> 60 words) (this maximum is very high, but it’s better
  than nothing)
• Wordiness (to the extent it reduces sentence length)
• Verb contractions (’s, which is ambiguous between is, has, and possessive;
  ’d, which is ambiguous between had and would)
• Possessives and plurals (houses vs. house’s)
• Misused words (includes various grammatical mistakes for adjectives and
  adverbs; wrong case)
• Gender-specific words
• Sentences beginning with And, But, Hopefully, and Plus
• Use of first person
• Numbers (use of digits instead of spelled-out numbers)
• Verb contractions (’m, n’t, ’re, ’ll, ’ve; these help parsing)
• Sentence structure (e.g. repetition of conjunctions: She ate a hot dog and
  a coke and an ice cream cone.)
• Wordiness (to the extent it prevents disambiguation)
• Verb agreement with there/here
• Capitalization errors
• Compounding errors (missing or superfluous hyphen.)
• Doubled words (the the)
• Open vs closed spelling (spelling errors that result from incorrect use of
  spaces; never the less instead of nevertheless.)
• Clause errors (punctuation; incomplete sentences)
• Double negations
• Formatting errors
• format of numbers (placement of periods and commas; endings of ordinal
  numbers; spelling of fractions and other numbers)
• dates (use of cardinal and ordinal numbers)
• times (use of abbreviations and punctuation marks)
• currency and other symbols
• addresses
• Inappropriate prepositions (adhere to instead of adhere by; center on
  instead of center around.)
• Mass/count noun agreement with adjectives (less vs fewer)
• Misused words (confused words: sit vs. set)
• Nonstandard modification (adjectives instead of adverbs; hyphenation).
• Noun phrase consistency errors (errors of number agreement between
  determiners and nouns).
• Pronoun errors (errors in case and ordering; which instead of that in
  restrictive clauses.)
• Punctuation errors
• Subject-verb agreement errors
• Non-standard English (seeing as how instead of since)
• Verb group consistency errors (errors in the use of the present, the past,
  and the past participle, as well as errors in the choice of auxiliary
  verbs.)
• Word order errors (incorrect ordering of certain words that modify nouns;
  my both instead of both my).
• Commonly confused words (commonly confused words of different parts of
  speech that have similar though not identical pronunciations; advice vs
  advise) and homonyms.
• Clichés
• Verb contractions (’s, which is ambiguous between is, has, and possessive;
  ’d, which is ambiguous between had and would)
• Informal expressions
• Jargon
• Passive voice usage
• Overused phrases (blissful ignorance instead of ignorance), stock phrases
  (fillers like in fact), and wordy expressions (vague or wordy expressions;
  in all probability instead of probably).
• Redundant expressions (sufficient enough instead of sufficient or enough).
• Weak modifiers (overused or colloquial modifiers; funny, pretty well, or
  nice).
• Many consecutive prepositional phrases (limit is user-definable)
• Many consecutive nouns (limit is user-definable)
• Split infinitives (limit is user-definable)
• Misspelled foreign expressions
• Nonstandard terms
• Archaic expressions
• ‘A’ vs ‘An’
• Gender-specific expressions
• Sexist expressions
• Vague, wordy, or informal quantifiers
• Unnecessary prepositions. This check seems incorrect, judging from the help
  text, which is as follows:
  These rules flag expressions that include an unnecessary preposition and
  suggest deleting it to make the expression more concise. Example: in the
  sentence 'I sat down on the lawn,' the preposition 'down' is superfluous
  since it is implied by the word 'sat.'
  In our view, the sentence without the particle has a different meaning.
• Clause errors (repetition of conjunctions: We chopped up fruit, and we
  diced the potatoes, and we made a pie crust)
• Verb contractions (’m, n’t, ’re, ’ll, ’ve)
• Pretentious words (unnecessarily complex words; eventuate instead of take
  place).
• Identical sentence openers
• Abbreviation
• Confused adjective or adverb
• Archaic
• ‘A’ vs ‘An’
• Capitalization
• Cliche (idiomatic)
• Colloquial (idiomatic)
• Commonly confused words and similar words (from vs form)
• Wrong comparative or superlative
• Conditional Clause (incorrect verb forms)
• Conjunctions (neither-nor; between X and Y; parallelism)
• Consecutive elements (number of nouns or prepositions in a row;
  user-definable)
• Date and time format
• Double negation
• Doubled word or negation
• End-of-sentence preposition
• End-of-sentence punctuation
• Foreign expressions
• Formalisms
• Dangling modifiers (subjectless -ing-verb)
• disinterested vs. uninterested
• Wrong use of hopefully (the value of this is questionable)
• Latin singulars and plurals (singular of strata is stratum)
• who vs. whom
• Hyphenation
• Idiomatic usage
• Incomplete sentence, including stand-alone subordinate clauses
• Other incorrect verb forms, including infinitive used incorrectly instead
  of -ing-verb and tense shifts
• Jargon
• Long sentence
• Mid-sentence adverb (position before auxiliary verb)
• Noun phrases (missing article before singular, countable noun; number
  disagreement; scrambled word order)
• Object of verb (missing or superfluous objects; number disagreement with
  complement of linking verb; missing preposition for prepositional
  complement)
• Overstated
• Passive voice
• Pronoun errors (errors in case and number agreement; which vs who)
• Punctuation (missing commas; comma splice; apostrophe; colon; semicolon;
  question mark; quotation marks, unbalanced (), {}, [], “”)
• Questionable usage
• Redundancy
• Spelling
• Split infinitive
• Subject-verb agreement
• Trademarks (xerox vs photocopy)
• Wordy
• Conjunctions (plus vs also as sentence starter)
• Gender-specific
• Number style
• Offensive
• One-sentence paragraphs
• Sentence variety
• Second-person address (you vs one). “One” is at least as ambiguous as
  “you”.
• Ellipsis spaces (between the dots). Better not to use ellipsis at all.
• wrong punctuation
• wrong case
• incorrect word separation
• lack of subject-predicate agreement, etc.
• Sentence is too long, contains too many information units
• Avoid complex attributes (Darüberhinaus wird ein externer, kabelloser, über
  eine Infrarotverbindung am DIS angeschlossener Drucker angeboten.)
• No more than 14 words before the verb (Die beiden vom rechten Radhauskanal
  kommenden Kraftstoff-Stahlleitungen an den Schlauchanschlüssen zum
  Kraftstoff-Filter bzw. zur fahrzeugbodenseitigen Rücklaufleitung abziehen.)
• Avoid ambiguous structures (Anlageflächen von Schaumresten reinigen.)
• Rephrase groups of prepositional phrases (Undichtheit am
  Kraftstoff-Entlüftungswellrohr von rechter Tankkammer zu Tankeinfüllstutzen
  infolge Knickbeschädigung anläßlich der Tankmontage.)
• The subject should come before the verb in the main clause (Das Gras frißt
  die Kuh.)
• Separate main clauses (Kaltstartprobleme, DDD-Kontrollampe leuchtet, Motor
  läuft im Notprogramm.)
• Do not insert too many elements between the parts of the verb (Dieser
  stellt sich beim Beschleunigen aus ca. 1500 U/min. insbesondere im zweiten
  Gang unter hoher Last bzw. Vollast als inhomogenes Beschleunigungsverhalten
  dar.)
• Use a conditional conjunction for conditional clauses (Wird Korrosion
  festgestellt, sind die betroffenen Bauteile auszutauschen.)
• Write complete sentences (Wärmetauscher undicht?)
References:
Schmidt-Wigger 1998; Reuther 1998.
http://www.iai.uni-sb.de/en/multien.html
Contact person: Ursula Reuther ursel@iai.uni-sb.de
Once this grammar checker is
finished, it should be useful for translatability check. It is expressly
restricted to certain grammatical errors, which is necessary but not sufficient
for improved translatability.
• A controlled language (CL) is a form of language with special restrictions
  on grammar, style, and vocabulary usage
• The objective of a CL is to improve consistency, readability,
  translatability, and retrievability
• KANT Controlled English from Carnegie Mellon University was designed with
  MT in mind. This controlled language aims at balancing the control of the
  vocabulary with the control of the grammar. In this way, the writer is not
  forced to write very convoluted sentences in order to stay within the
  controlled vocabulary.
• Limit the meaning per word/part-of-speech to a single meaning.
• Encode synonyms in the lexicon in order to flag deviations from the single,
  approved term.
• State all ambiguous terms separately in the lexicon in order to support
  interactive disambiguation.
• The use of determiners is encouraged, whereas the use of pronouns and
  conjunctions is limited.
• The sense and use of modal verbs is clearly specified.
• The use of -ing verbs and -ed verbs is restricted.
• Abbreviations
• Orthography
• Avoid verbs with particles; use single-word verbs instead
• Do not coordinate verb phrases
• Repeat the preposition in coordinated prepositional phrases
• Write relative pronouns explicitly
• Avoid ellipsis
• All these checks enhance MTranslatability, which is not surprising since
  they were designed for the express purpose of improving MTranslatability.
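The vocabulary rule above (encode synonyms in the lexicon so that deviations from the single approved term can be flagged) can be sketched in a few lines of Python. This is only an illustration of the idea, not the KANT implementation; the APPROVED table and its entries are invented.

```python
import re

# Hypothetical mini-lexicon: each non-approved synonym maps to the one
# approved term. The entries are invented for this illustration.
APPROVED = {
    "begin": "start",
    "commence": "start",
    "terminate": "stop",
    "halt": "stop",
}


def flag_synonyms(sentence):
    """Return (word as written, approved term) for every flagged synonym."""
    findings = []
    for match in re.finditer(r"[A-Za-z]+", sentence):
        word = match.group()
        approved = APPROVED.get(word.lower())
        if approved is not None:
            findings.append((word, approved))
    return findings


print(flag_synonyms("Commence the test, then halt the engine."))
```

A real controlled-language lexicon would also need part-of-speech information, since the approved term can differ between, for example, the noun and the verb reading of a word.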
• The KANT technology is part of the ClearCheck checker used by Caterpillar
  for their controlled language system.
• The MAXit AECMA Simplified English checker offers the following checks:
• Abbreviation
• Adjective that does not modify a noun
• Adverb that does not modify a verb
• Subject-verb agreement and subject-pronoun agreement
• Contraction or possessive
• Awkward sentence
• Capitalization
• Change verb to noun
• Change noun to verb
• Missing, superfluous or misplaced comma
• Superfluous word
• Gerund
• Missing or superfluous hyphen
• Missing subject or object
• Negation
• Word not in Simplified English dictionary
• Parallelism
• Passive voice
• Verb with particle
• Non-allowed prefix or suffix
• Wrong position of preposition
• Wrong punctuation
• Rephrasing required
• Long sentence (> 21 words)
• Spelling error
• Missing article
• Wrong use of terminology
• “That” vs “which” vs “who”
• Translation problem
• Complex verb tense
• Wrong word
• Noun cluster (> 2 nouns in a row)
• Wrong verb
• Date format
• Wrong word for Simplified English
• Vague measurement
• Label
• Number style
• Safety warnings required
• Gender-specific pronoun
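Of the checks listed above, the long-sentence check is the simplest to picture. The Python sketch below uses the 21-word limit named in the list as its default. The sentence splitter is a deliberately naive assumption (split after ., ! or ? followed by whitespace) and has nothing to do with how any of the products mentioned actually segment text.

```python
import re


def long_sentences(text, max_words=21):
    """Return (word count, sentence) for each sentence over the limit."""
    # Naive segmentation: break after sentence-final punctuation.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    flagged = []
    for sentence in sentences:
        n_words = len(sentence.split())
        if n_words > max_words:
            flagged.append((n_words, sentence))
    return flagged


sample = (
    "Stop the engine. "
    "Before you remove the cover of the unit, make sure that the power is "
    "turned off and that the cable that connects the unit to the wall "
    "outlet is disconnected."
)
for n_words, sentence in long_sentences(sample):
    print(n_words, "words:", sentence)
```

Calling long_sentences(text, max_words=60) applies the much more permissive limit quoted earlier for a general-purpose grammar checker.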
• The Boeing Simplified English Checker is the most complete and accurate
  checker of Simplified English requirements. In addition to checking for SE
  compliance, the Boeing SE Checker also catches mistakes like lack of
  subject-verb agreement, repeated words, misspelled words, and punctuation
  problems.
• The Boeing Plain English Checker checks for compliance with the U.S.
  Government’s Plain Language requirements. (http://www.plainlanguage.gov)
http://www.boeing.com/assocproducts/sechecker/se.html
• IBM’s EEA tool is an authoring tool that points out ambiguity and
  complexity, thereby helping writers produce documents that are more
  MTranslatable. EEA also does some standard grammar checking. EEA is used
  by information developers in IBM. Some checks that are not directly aimed
  at improving MTranslatability are included in order to accommodate
  corporate writing guidelines.
• Ambiguous nonfinite verb phrase
• Ambiguous conjunction
• Ambiguous scope in coordination
• Passive voice and ambiguous double passives
• Long sentence
• Long noun string
• Nonparsed sentence
• Unknown or misspelled words
• Punctuation (missing commas, hyphens, periods, question marks; comma
  splice; slash to mean "and/or"; plural with (s))
• Wrong comparative or superlative form
• Lack of subject-verb agreement
• Nonparallel coordinated phrase
• Double negative
• Noun phrase with many prepositions
• Potentially wrong subject for verb phrase
• Potentially wrong modification
• Pronoun problems: Pronoun case and lack of agreement for reflexives
• Dangling preposition
• Noncapitalization of first word in a sentence
• Duplicated word
• Verb contractions (’s, which is ambiguous between is, has, and possessive;
  ’d, which is ambiguous between had and would)
• Missing "that"
• Word not in controlled vocabulary
• Incomplete sentence
• Latin abbreviation
• First occurrence of abbreviation
• Wrong indefinite article "a" or "an"
• Verb contractions (’m, n’t, ’re, ’ll, ’ve)
• Restricted word; prohibited word
• EEA’s Clarity Index summarizes the problems that are encountered in a given
  document as a single number that indicates the clarity (or
  MTranslatability) for the whole document. The problems are weighted
  according to severity (impact), context, and document size.
• EEA also includes ETerms, which collects multinouns and unknown words.
  These are candidates for terminology to be added to the user lexicons.
<su>
<np sem=time0>time</np>
<v sem=fly1>flies</v>
<adp>like <np>an arrow</np></adp>
</su>
where “XML elements such as <np>...</np> encode parse tree bracketing, and
the property sem disambiguates polysemy of words.” The word senses here
(time0 and fly1)
are based on WordNet senses. The plan is that a growing population of GDA users
will develop their own ontologies for all languages. The way such an XML
tagger improves MTranslatability -- assuming all MT engines are modified to
recognize the tags -- is obvious: Some of the hardest problems for the MT
parser will be solved. Disambiguation on both the syntactic and the semantic
levels will be resolved and proper nouns will be identified.
(2) A post-processing routine converts the output of
the Japanese KNP parser (Kurohashi and Nagao 1994) into LAL format.
The annotations produced in (1) and (2) are used as
input to the annotation editors for English and Japanese.
(3) IBM's English to German, French, Spanish,
Italian, and Japanese translation engines can utilize LAL-annotated input. This
means that ambiguities can be resolved by using the annotation editor to
pre-edit the source text before translation into several languages.
The annotation editor allows
the user to edit the LAL annotation of a text. This editor is interfaced to the
LAL-generating grammar, which provides annotation for each segment. A
human editor can then use the annotation editor's graphical user interface to
check over the automatically-produced annotation and change it as
necessary. The user can do this without having to see the tags by working
on the graphical representation of the tree; the changes are then reflected in
the internal LAL annotation.
She saw a man with a telescope.
She <lal:w id="w1" lex="see" pos="v" sense="see1">saw</lal:w> a man
<lal:w mod="w1">with</lal:w> a telescope.
This example
shows that the seeing action is done with the telescope because
"with" modifies the entity having id "w1", i.e.
"see".
IBM
<lal:acronym expan="International Business Machines">IBM</lal:acronym>
In this
example "IBM" is marked as an acronym with expansion
"International Business Machines".
<lal:s>The <lal:w id="w1">cat</lal:w> chased
<lal:w id="w2">a mouse</lal:w>.</lal:s>
<lal:s>After <lal:w ref="w1">it</lal:w> caught the
<lal:w ref="w2">mouse</lal:w>, <lal:w ref="w1">it</lal:w> ate
<lal:w ref="w2">it.</lal:w></lal:s>
Here the id attribute is used in the context of pronoun resolution. "cat" is
assigned id="w1", and "a mouse" has id="w2". The human editor can mark the
ref value for the pronouns appropriately.
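To show how a downstream program could make use of the id/ref annotation in the cat-and-mouse example, here is a small Python sketch that pairs each referring word with its antecedent. It makes several assumptions: the fragment is wrapped in an invented <doc> root element, the lal prefix is bound to a placeholder namespace URI (the text does not give the real one), and the final period is placed outside the last element for simplicity. It is not part of the LAL tooling.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URI (assumption); the text does not state the real one.
LAL_NS = "urn:example:lal"

DOC = (
    f'<doc xmlns:lal="{LAL_NS}">'
    '<lal:s>The <lal:w id="w1">cat</lal:w> chased '
    '<lal:w id="w2">a mouse</lal:w>.</lal:s>'
    '<lal:s>After <lal:w ref="w1">it</lal:w> caught the '
    '<lal:w ref="w2">mouse</lal:w>, <lal:w ref="w1">it</lal:w> ate '
    '<lal:w ref="w2">it</lal:w>.</lal:s>'
    '</doc>'
)


def resolve_references(xml_text):
    """Return (referring text, antecedent text) pairs in document order."""
    root = ET.fromstring(xml_text)
    words = root.findall(f".//{{{LAL_NS}}}w")
    # First pass: remember the text of every element that carries an id.
    antecedents = {w.get("id"): w.text for w in words if w.get("id")}
    # Second pass: look up the antecedent of every element that carries a ref.
    return [(w.text, antecedents[w.get("ref")]) for w in words if w.get("ref")]


for referring, antecedent in resolve_references(DOC):
    print(f"{referring!r} refers to {antecedent!r}")
```

An MT engine that reads such annotation could use the antecedent to choose the gender and number of the target-language pronoun.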
Developed by IBM Research Division
Contact person: Hideo Watanabe, hiwat@jp.ibm.com
• Is often provided with standard grammar checkers (Microsoft Word2000, Lotus
  Word Pro 97, WordPerfect)
• Is designed for human readability, not MTranslatability
• Based on sentence length and word length
• Shorter words and shorter segments are considered easier to read.
• But shorter words are often more ambiguous.
• And very short segments (4 words or less) are very ambiguous in English due
  to the great ambiguity of part of speech in English.
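As an illustration of a measure that is based only on sentence length and word length, the Python sketch below computes the Flesch Reading Ease score, 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words). The text above does not say which formula the listed products use, so the choice of this formula is an assumption, and the syllable counter is a crude heuristic.

```python
import re


def count_syllables(word):
    """Crude heuristic: one syllable per run of vowels, at least one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_reading_ease(text):
    """Flesch Reading Ease; higher scores mean 'easier to read'.

    The text must contain at least one word.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))


print(round(flesch_reading_ease("The cat sat."), 2))
print(round(flesch_reading_ease(
    "Internationalization considerations complicate documentation."), 2))
```

The score depends on nothing but counts, which is why a measure of this kind cannot detect the ambiguity problems of short words and short segments described above.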
• Gross statistical properties of the document as a whole
This Translatability Index (TI) is based on gross statistical properties of a
document rather than on parsing the sentences. This was suggested by the fact
that there appeared to be a rough correlation between the quality of raw MT
output and certain gross properties of the text, such as length of the
sentences, degree of syntactic complexity, discourse characteristics, etc.
Although the TI score is derived on the basis of gross sentence properties,
sentence-specific information cannot be provided with any degree of
reliability because there’s no full-scale parsing.
• Scoring procedure
The program starts off with a score of 7 and then penalizes the sentences for
negative properties. The decision as to the minimum score that a document
must reach in order to be acceptable for gisting or post-editing purposes is
subjective. There is no absolute, objective threshold.
• Statistical data and results
“Negative” sentence properties are: too long or too short; words not found in
the MT dictionary; short parentheses; coordination; homographs;
interrogatives; unmatched parentheses; embedded clauses; part of speech
ambiguities; certain ambiguous words (such as -ing verbs, as, with, etc.),
and so forth.
• Operational use and benefits
Before translation, the user can have the document scored by the TI program.
It will return with a score and a recommendation such as This document is not
suitable for MT or This document is conditionally suitable for MT. The TI
would also suggest why a particular document is not or only conditionally
suitable. It would tell the user, for instance,
The sentences on the whole are too long
Sentence # x is far too long
The document contains many words and compounds that are not in the
dictionary. Run your document through the New-Word-Search utility and update
your dictionary
The document contains many difficult words such as ...
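The scoring procedure described above (start at 7, subtract penalties for negative sentence properties) might look roughly like the following Python sketch. Only the starting score of 7 and the kinds of properties come from the text. The penalty weights, the length limits, the averaging over sentences, and the surface tests used to detect each property are all invented for the illustration.

```python
import re

START_SCORE = 7.0  # starting score named in the text

# Invented weights: the text does not say how much each property costs.
PENALTIES = {
    "too_long": 1.0,
    "too_short": 0.5,
    "unmatched_paren": 0.5,
    "interrogative": 0.25,
    "coordination": 0.25,
}


def sentence_properties(sentence, min_words=5, max_words=25):
    """Detect a few 'negative' properties using surface tests only."""
    words = sentence.split()
    props = []
    if len(words) > max_words:
        props.append("too_long")
    if len(words) < min_words:
        props.append("too_short")
    if sentence.count("(") != sentence.count(")"):
        props.append("unmatched_paren")
    if sentence.rstrip().endswith("?"):
        props.append("interrogative")
    if re.search(r"\b(and|or)\b", sentence, re.IGNORECASE):
        props.append("coordination")
    return props


def translatability_index(sentences):
    """Start at 7 and subtract the average penalty per sentence."""
    if not sentences:
        return START_SCORE
    total = sum(PENALTIES[p] for s in sentences for p in sentence_properties(s))
    return max(0.0, START_SCORE - total / len(sentences))


print(translatability_index(["The system reads the file from the disk."]))
print(translatability_index(["Transfer file?"]))
```

Because no parsing is involved, a score of this kind describes the document as a whole; as the text notes, it cannot reliably say what is wrong with an individual sentence.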
• IBM’s Translation Confidence Index automatically provides an index of the
  MT system’s own confidence in its translation, for a given segment. In
  other words, the TCI returns a translation quality value for each segment.
  This value can be used to mark segments that need special attention during
  post-editing. The confidence value is calculated during the various stages
  of the MTranslation process. It is based on such factors as parse scores,
  text characteristics (ambiguity, difficult constructions), lexical
  coverage, and success of structural generation (transformations). These
  factors can be set on or off in the TCI’s language-pair-specific user
  profile. Whereas the TCI was designed to give an overall picture of the
  expected quality of the MT output by taking all aspects of the MTranslation
  process into account, the parts that deal with source analysis give a
  picture of the general MTranslatability. Turning off all factors that are
  not specific to the source language in the user profile in effect gives an
  MTranslatability score, independent of the target language. With all
  aspects taken into consideration, the TCI score will give the
  translatability for a particular language pair for a specific MT system.
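The idea of switching factors on and off in a user profile can be pictured with the Python sketch below. It is not the TCI algorithm: the 0-to-1 scale, the plain average, the example scores, and the decision which factors count as source-side are all assumptions made for the illustration; only the factor names come from the text.

```python
from dataclasses import dataclass


@dataclass
class Factor:
    name: str
    score: float       # 0.0 (bad) to 1.0 (good); the scale is an assumption
    source_side: bool  # True if the factor depends on source analysis only


def confidence(factors, enabled):
    """Average the scores of the factors that the profile has switched on."""
    active = [f.score for f in factors if f.name in enabled]
    if not active:
        raise ValueError("no factors enabled in the profile")
    return sum(active) / len(active)


# Invented scores for one segment.
segment = [
    Factor("parse_score", 1.0, source_side=True),
    Factor("text_characteristics", 0.5, source_side=True),
    Factor("lexical_coverage", 0.5, source_side=False),
    Factor("structural_generation", 0.0, source_side=False),
]

all_factors = {f.name for f in segment}
source_only = {f.name for f in segment if f.source_side}

print(confidence(segment, all_factors))   # language-pair-specific picture
print(confidence(segment, source_only))   # source-only, MTranslatability-like
```

With the invented numbers, the source-only profile rates the segment higher than the full profile, which mirrors the distinction the text draws between general MTranslatability and the quality expected for one particular language pair.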
• Be careful when you create your documents:
• Avoid ambiguity
• Avoid bad style
• Avoid incorrect grammar
• Avoid incorrect spelling
• Avoid incorrect punctuation
• Avoid bad markup
• If you expect your documents to be translated by an MT system, make sure
  that the MT dictionary is updated to cover adequate parts of speech and
  subject area senses for all your terminology.
• And remember: What makes life easier for the human reader is not always
  useful in the context of MT!