CD-ROM Library (see Liz Hall to check them out!)
Warning: this page is somewhat obsolete; the corresponding TWiki is more up to date.   -- Ulf

Text:
  • North American News Text Corpus
  • North American News Text Corpus (Supplement) and AP Worldstream English
  • Continuous Speech Recognition Corpus (4 discs, including lots of English text)
  • Portuguese Newswire Text Corpus
  • Spanish Language News Corpus
  • Spanish Newswire Text Corpus
  • Al-Hayyat Newspaper (Arabic text) "July 2000"
  • Al-Hayyat Newspaper (Arabic text) "Electronic Archive 1997"
  • Chosun (Korean newspaper) 1994
  • Mainichi Newspaper 1995 (Japanese)
  • something else in Japanese??
  • European Languages News Corpus
  • Hansard Corpus (Parallel text in English and French)
  • Hong Kong Hansards (Parallel text in English and Chinese)
  • UN Corpus (Parallel text in English, Spanish, French)

Dictionaries:
  • Lisan Al-Arab 1995 (Arabic/English)
  • Abgad Hawaz Dictionary (Arabic/English)

Programs:
  • Language Assistant (Spanish/Portuguese/English MT)
  • PowerTranslator (L&H), translates Sp, Ge, Fr, It, Pt, En, Jp
  • Al-Mawrid (English/Arabic)
  • Sakhr Dictionary (Arabic/English)
  • Ethnologue, Languages of the World

Treebanks:
  • LDC Penn Treebank (English) Release 2
  • LDC Penn Treebank (English) Release 3

Conferences:
  • DARPA TIDES Kickoff Meeting, "Video Moments" (2 discs). No hilarious bloopers, except maybe when a scientist says "optimum" when he really means "optimal".
  • Intelligent Collaboration and Visualization Information Management
  • 1999 International Conference on Acoustics, Speech, and Signal Processing
  • 1999 Automatic Speech Recognition and Understanding Workshop