CD-ROMs Library (see Liz Hall to check them out!)
Warning: this page is somewhat obsolete; the corresponding TWiki is more up-to-date -- Ulf
Text:
North American News Text Corpus
North American News Text Corpus (Supplement) and AP Worldstream English
Continuous Speech Recognition Corpus (4 discs, including lots of English text)
Portuguese Newswire Text Corpus
Spanish Language News Corpus
Spanish Newswire Text Corpus
Al-Hayyat Newspaper (Arabic text) "July 2000"
Al-Hayyat Newspaper (Arabic text) "Electronic Archive 1997"
Chosun (Korean newspaper) 1994
Mainichi Newspaper 1995 (Japanese)
something else in Japanese??
European Languages News Corpus
Hansard Corpus (Parallel text in English and French)
Hong Kong Hansards (Parallel text in English and Chinese)
UN Corpus (Parallel text in English, Spanish, French)
Dictionaries:
Lisan Al-Arab 1995 (Arabic/English)
Abgad Hawaz Dictionary (Arabic/English)
Programs:
Language Assistant (Spanish/Portuguese/English MT)
PowerTranslator (L&H), translates Spanish, German, French, Italian, Portuguese, English, Japanese
Al-Mawrid (English/Arabic)
Sakhr Dictionary (Arabic/English)
Ethnologue, Languages of the World
Treebanks:
LDC Penn Treebank (English) Release 2
LDC Penn Treebank (English) Release 3
Conferences:
DARPA TIDES Kickoff Meeting, "Video Moments" (2 discs).
No hilarious bloopers except maybe when a scientist says "optimum" when he
really means "optimal".
Intelligent Collaboration and Visualization Information Management
1999 International Conference on Acoustics, Speech, and Signal Processing
1999 Automatic Speech Recognition and Understanding Workshop