Language-Specific Text Resources at ISI
Warning: this page is somewhat obsolete; the corresponding TWiki page is more up to date.   -- Ulf

Chinese resources - Korean resources

Resource type | Resource | Current location | Contact person
bilingual dictionaries | Japanese/English; several dictionaries merged; 301,331 entries | /nfs/mozart4/trans/kb/j.gloss.9.10.96 | Kevin
bilingual dictionaries | Tetun/English; Dr. Geoffrey Hull; 32,573 entries | /nfs/mozart1/EGYPT/tetun/dictionary/hull* (raw); /nfs/mozart1/EGYPT/tetun/corpus-downcased/dict.{e,t} | Kevin
bilingual dictionaries | German/English (?) | | Philipp
bilingual dictionaries | Arabic/English (Alpnet and Sakhr) | /nfs/arabic/dict/sakhr | Yaser
bilingual dictionaries | Korean/English; 149,127 entries | /nfs/mendels5/ml/korean/resource/word_lookup/kor-eng.merged.dic | Ulf
treebanks | English (Penn Treebank) | /nfs/mahler1/IR/TreeBank | Ulf
treebanks | English (modified Penn Treebank) | /nfs/mendels5/ml/english/corpora | Ulf
treebanks | Japanese (original Kyoto treebank) | /nfs/isd/koehn/oldstuff/kyoto.tgz | Philipp
treebanks | Japanese (ISI-Kyoto treebank) | /nfs/mendels5/ml/japanese/corpora | Ulf
treebanks | Korean (ISI treebank) | /nfs/mendels5/ml/korean/corpora | Ulf
treebanks | German (Negra treebank) | /nfs/mendels5/ml/german/corpora/negra | Ulf
treebanks | Chinese (Penn Treebank) | /nfs/webcl1/webcl/corpus/chinese/chinese_treebank_2 | Chin-Yew
bilingual text corpora, aligned | English/English (compressed sentences) | | Daniel
bilingual text corpora, aligned | English/English (articles and human abstracts) | | Daniel
bilingual text corpora, aligned | French/English (old Hansard) | /nfs/montev/EGYPT/french/corpora/ldc/ | Philipp
bilingual text corpora, aligned | Chinese/English (Hong Kong laws) | /nfs/montev/EGYPT/chinese/corpora/ | Yaser
bilingual text corpora, aligned | Czech/English (Reader's Digest) | | Yaser
bilingual text corpora, aligned | Tetun/English (United Nations web sources) | /nfs/montev/EGYPT/tetun/corpora/ | Philipp
bilingual text corpora, aligned | Arabic/English (Koran) | /nfs/montev/EGYPT/arabic/corpora/ | Yaser
bilingual text corpora, aligned | Japanese/English (computer manuals) | | Kenji
bilingual text corpora, unaligned | Turkish/English (?) | /nfs/montev/EGYPT/turkish/corpora/ | Yaser
bilingual text corpora, unaligned | French/English (new Hansard) | /nfs/montev/EGYPT/french/corpora/new.hansards/ | Yaser
multilingual text corpora, aligned | none yet | |
multilingual text corpora, unaligned | English/German/Spanish/... (European economic news) | | Philipp
monolingual text corpora | English (Wall Street Journal) | /nfs/text2/corpora/wsj-19*/* |
monolingual text corpora | Japanese (TIPSTER) | |
monolingual text corpora | Japanese (Mainichi Shinbun) | | Kenji
monolingual text corpora | German (?) | | Philipp
monolingual text corpora | Arabic (Al-Hayat) | /nfs/arabic/text/hayat/ | Yaser
monolingual text corpora | Spanish (?) | |
monolingual text corpora | Chinese (People's Daily) | |
monolingual text corpora | Korean (Chosun newspaper) | /nfs/mendels5/ml/korean/corpora/sentence-corpus-chosun94 (complete corpus on CD-ROM) | Ulf
named-entity-tagged data | MUC (6 & 7) and MET (1 & 2) | /nfs/montev/MUC/ | Yaser
ontologies | Sensus | /nfs/mozart4/trans/kb | Ulf
ontologies | WordNet | /nfs/mahler1/IR/WordNet/wordnet-1.6/dict | Chin-Yew
ontologies | Contex ontologies | /nfs/mendels5/ml/english/onto; /nfs/mendels5/ml/japanese/jonto; /nfs/mendels5/ml/korean/konto; /nfs/mendels5/ml/chinese/conto | Ulf