Research projects get messy fast. One approach doesn’t work, another unexpectedly does. Training models takes a long time, so you don’t want to lose them. Results that seemed unimportant at first turn out to be important when the direction of the project changes. Many researchers organize their results with clever filenames or in spreadsheets, but the process is manual, errors creep in, and more often than not you can’t figure out exactly how you produced that final model that performs so well.
AI2 Tango is our solution to this problem. Tango breaks down experiments into discrete, repeatable, and most importantly cacheable steps. In this talk I’ll show you how to use the cache and capture 75% of the value of Tango. But we’ll also go deeper and explore the larger Tango ecosystem. Tango has a number of pre-built steps for training and evaluating models that cut down on development time. Other components take care of running experiments with as much parallelism as possible, keeping your cluster busy. I will also talk about how the concept of the “Tango Workspace” means you can run your experiments on any computer with an internet connection, whether it’s your laptop or a machine in your cluster. If we have time, we’ll talk about how this enables multiple researchers and engineers to collaborate on a single project.
I am Dirk Groeneveld, principal engineer at AI2. Out of college I worked at Microsoft Search, first on SharePoint and later on Bing. Then I went to Amazon, working on something called “Keyphrase Extraction”, attempting to automatically extract product features from product descriptions. Somewhere in between I did two startups, neither of which had much to do with machine learning, and finally I started at AI2, where I have been for the last 8 years. I’ve been on various teams there, working as a pure engineer, as a pure researcher, and now I sit somewhere in the middle, as a technical lead for the AllenNLP Platform team. My team builds general tools for the researchers at AI2, but also gets hands-on, collaborating with the researchers throughout their projects.
This talk will be recorded and made available within three business days on the USC ISI YouTube channel.
Host: Joel Mathew / POC: Alma Nava