Current Projects

Our GitHub organization hosts the latest list of tools.

  • AMR-to-English generator

    Converts Abstract Meaning Representations (AMR) into English sentences. Built by Nima Pourdamghani.
  • Bolinas

    Hyperedge replacement transducer package for graphs, built by Jacob Andreas, Daniel Bauer, David Chiang, Karl Moritz Hermann, Bevan Jones, and Kevin Knight.
  • Carmel

    Finite-state transducer package for strings, built by Jonathan Graehl. The latest version is on GitHub.
  • English-to-AMR parser

    Converts English sentences into Abstract Meaning Representations (AMRs). Built by Michael Pust, Ulf Hermjakob, Kevin Knight, Daniel Marcu, and Jonathan May (download size: 719 MB).
  • EUREKA

    CPU-based neural LSTM sequence-to-sequence modeling toolkit, built by Ashish Vaswani.
  • Monogiza

    Extracts a word-for-word translation table from non-parallel corpora. Built by Qing Dou.
  • MTData

    A tool capable of retrieving thousands of parallel datasets for machine translation research. Built by Thamme Gowda.
  • NLCodec and NLDb

    A scalable tool for mapping words, characters, and BPE subwords to integer sequences, and a storage layer for efficiently storing and retrieving large-scale datasets. Built by Thamme Gowda.
  • NPLM

    Neural probabilistic language model toolkit, built by Ashish Vaswani, with contributions from David Chiang and Victoria Fossum.
  • Reader Translator Generator (RTG)

    A feature-rich neural machine translation toolkit based on PyTorch, with a focus on reproducible experiments. Built by Thamme Gowda.
  • ReWrite Decoder

    Greedy Decoder for IBM SMT Models. Built by Daniel Marcu and Ulrich Germann.
  • SPADE

    Sentence-level Discourse Parser. Built by Radu Soricut.
  • Tiburon

    Finite-state transducer package for trees, built by Jonathan May.
  • uroman

    Converts text in any script to the Latin alphabet. Our online interface is also available. Built by Ulf Hermjakob.
  • utoken

    Universal tokenizer, i.e., a word segmenter for a wide variety of scripts and languages. Built by Ulf Hermjakob.
  • Zoph_RNN

    GPU-based neural LSTM sequence-to-sequence modeling toolkit, built by Barret Zoph.
  • Many-English NMT

    A multilingual NMT model that can translate from 500 source languages to English. Built by Thamme Gowda.
  • Poetry generator

    Creates a poem on any topic. Built by Marjan Ghazvininejad, Xing Shi, Yejin Choi, and Kevin Knight.
  • Poetry password demo and assigner

    Shows poems created from randomly generated 60-bit passwords. Built by Marjan Ghazvininejad.
  • Portmanteau generator

    Creates a new word (neologism) from two existing words. Built by Aliya Deri.
  • Smatch

    Evaluates output of semantic parsing. Built by Shu Cai.
  • Spolin Bot

    Chat with our improvisation bot!
  • HowToSpeak

    Allows users to speak a language they don't understand, by phonetic rendering. Built by Xing Shi.
  • AMR Editor

    Allows human annotators to type in the meanings of English sentences, using the Abstract Meaning Representation framework. Built by Ulf Hermjakob. AMR Editor Overview video.
  • RST Annotation Tool

    Enables annotators to build Rhetorical Structure Representations for texts. Built by Benjamin Liberman.
  • Shannon Game

    Collects character-level text predictions from people, in order to estimate the entropy of translation. Built by Marjan Ghazvininejad.
  • AMR parsing

    This 2016 SemEval challenge asks participants to write software to convert English into Abstract Meaning Representations. Run by Jonathan May.
  • Bilingual compression challenge

    If we exploit the high redundancy of human translated texts, what is the best compression rate we can achieve for bilingual texts? Run by Barret Zoph, Kevin Knight, and Marjan Ghazvininejad.