NLG Seminars

A weekly meeting of the Natural Language Group. Seminars usually take place on Thursdays from 11:00am until 12:00pm.

Subscribe Add to Calendar

For scheduling a talk scheduling, contact the current organizer Meryem M'hamdi, and Jon May by emailing to nlg-seminar-host(at)isi.edu.

Visitor Information

Virtual visit: Meeting hosts only admit guests that they know to the Zoom meeting. Hence, you're highly encouraged to use your USC account to sign into Zoom. If you're an outside visitor, please inform us at (nlg-seminar-host(at)isi.edu.) beforehand so we'll be aware of your attendance and let you in.

In-person visit: Outside visitors should go to the tenth floor lobby where they will be met and escorted to the appropriate location 5 minutes before the talk. For further directions on how to get to ISI, please click here.

Click on the titles to view the abstracts/time/location/online meeting link:

Date	Speaker	Title
20 Oct 2022	Sewon Min	Understanding and Improving Learning through Inference with Large Language Models Time: 11am - 12:30pm Online Zoom: https://usc.zoom.us/j/94452797669 Location: The audience has the option of watching the speaker in-person in conference rooms #689.
27 Oct 2022	Eric Wallace	TBD Time: 11am - 12:30pm Online Zoom: https://usc.zoom.us/j/98280518938 Location: The audience has the option of watching the speaker in-person in conference rooms #689.
3 Nov 2022	Jonas Pfeiffer	TBD Time: 11am - 12:30pm Online Zoom: https://usc.zoom.us/j/98280518938 Location: The audience has the option of watching the speaker in-person in conference rooms #689.
12 Jan 2023	Arjun Subramonian	Knowledge Grounding in Human-centric AI for effective, explainable, and equitable outcomes Time: 11am - 12:30pm Online Zoom: Location: The audience has the option of watching the speaker in-person in conference rooms #689.

Click on the titles to view the abstracts/time/location/online meeting link:

Date	Speaker	Title
18 Aug 2022	Jaqueline He, Taiwei Shi, Jonne Sälevä	Interns' Presentations: (1) On Exploiting Context Usage in Document-Level Neural Machine Translation (2) Improving Moderation of Online Discussions via Nonviolent Communication (3) Heritage-aware language model adaptation for diasporic languages Time: 11am - 12:30pm Location: The audience has the option of watching the speaker in-person in conference rooms #1135-37. Online Zoom: https://usc.zoom.us/j/98141927068 Recording:https://www.youtube.com/watch?v=kfBXkRhVOps Recording:No Talk details: https://www.isi.edu/events/3010/x
11 Aug 2022	Mitch Mithun (USC ISI)	Quantum Natural Language Processing Time: 11am - 12pm Location: The audience has the option of watching the speaker in-person in conference rooms #1135-37. Online Zoom: https://usc.zoom.us/j/94780987161 Recording:https://www.youtube.com/watch?v=Q4hICz6bbJ8 Abstract: Quantum Natural Language Processing (QNLP) is a very nascent field which deals with using quantum computers to solve natural language processing problems. Quantum advantage (speed-up) for QNLP tasks has already been established in literature and has been attributed to the fact that quantum models for natural language processing canonically incorporate linguistic meanings with rich linguistic structure, most notably grammar. The fact that it takes a quantum-like model to combine meaning and structure, establishes QNLP as quantum-native, on par with simulation of quantum systems. Moreover, the now leading Noisy Intermediate-Scale Quantum (NISQ) paradigm for encoding classical data on quantum hardware, variational quantum circuits, makes NISQ exceptionally QNLP-friendly: linguistic structure can be encoded as a free lunch, in contrast to the apparently exponentially expensive classical encoding of grammar. In this talk, Mitch will first introduce some quantum mechanics enroute to talking about quantum computation and QNLP. Specifically, he will talk about quantum algorithms for incorporating compositionality, providing some basic conceptual and mathematical foundations needed to understand QNLP, and do so in computer scientist friendly terms. He will end with talking about the state of the art in QNLP Mitch is a post-doctoral researcher at ISI working with Marjorie Freedman in the networking and cyber security division. Mitch, recently graduated from his PhD from University of Arizona before joining ISI as a postdoc. Before that, he worked in the software industry for 10+ years. He got his masters and undergrad degrees from Birla Institute of Technology and Science (BITS), Pilani, India. His research interests include natural language processing, cyber security and quantum computation.
04 Aug 2022	Taiwei Shi (Intern @ ISI); Georgia Institute of Technology	Improving Moderation of Online Discussions via Nonviolent Communication Time: 11am - 11:30am Location: The audience has the option of watching the speaker in-person in conference rooms #1135-37. Online Zoom: https://usc.zoom.us/j/97598508852 Recording: Abstract: The growing number of comments makes online discussions problematic to moderate by human moderators only. A crucial limitation of current automated moderation is that the generations are repetitive, generic, and judgmental, which is not effective in terms of changing someone’s mind and behaviors. We seek to build dialogue models that can intervene in an adversarial conversation involving participants that have abandoned reasoned discussion and descended into personal attacks. While also a difficult problem among humans, we would like to explore the effectiveness of Nonviolent Communication (NVC), an approach to restoring breakdowns in communication. In this talk, we will discuss the strategies of incorporating one aspect of NVC called observation without evaluation (O-vs-E) into dialogue models. First, we obtain a sufficiently large set of O-vs-E dialogue data to train an O-vs-E classifier. We then expand this to a sufficiently large set to fine-tune a dialogue model. We also explore text style transfer to rewrite moderation datasets, so the model could actively intervene in toxic conversations while being less judgmental at the same time. Finally, we will discuss the strategies for evaluating the dialogue model and conclude with future directions. Taiwei Shi is a current summer intern for the Natural Language Group at USC ISI under Professors Jonathan May and Xuezhe Ma. He is also an undergraduate student at the Georgia Institute of Technology, majoring in Computer Science and Mathematics. He has previously worked at Georgia Tech’s SALT lab under Professor Diyi Yang. He is working towards a career where he can pursue his interests and make an impact in natural language processing, especially in the fields of computational social science and philosophy.
04 Aug 2022	Jonne Sälevä (Intern @ ISI; Brandeis University)	Linguistic heritage-aware language model adaptation for diasporic languages Time: 11:30am - 12:00pm Location: The audience has the option of watching the speaker in-person in conference rooms #1135-37. Online Zoom: https://usc.zoom.us/j/97598508852 Recording: Abstract: Multilingual language models (MLLMs) have proven their effectiveness as cross-lingual representation learners that perform well on several downstream tasks and a variety of languages, including many lower-resourced and zero-shot ones. Although effective, MLLMs remain somewhat opaque and the nature of their cross-linguistic transfer is difficult to understand. While it seems plausible that higher- and lower-resourced languages should share information within the model, what is less clear is how such transfer is mediated by linguistic relatedness. In this talk, we investigate this problem through the lens of diasporic languages which can be (crudely) understood as a combination of a "co-cultural language" and a "co-territorial language". Specifically, we ask whether augmenting MLLM adaptation using these ancestral languages, or some mixture of them, can improve MLLM performance on a lower-resourced diasporic language, both in terms of perplexity as well as extrinsically on a named entity recognition task. We outline preliminary results on Yiddish, a Germanic language spoken by Ashkenazi Jews, and discuss the effectiveness of using German and Hebrew as ancestral languages. Finally, we contrast regular ancestral pretraining with recent lexicon-based adaptation approaches by Wang et al (2022) and conclude with directions for future work. Jonne Sälevä is a summer intern in the Natural Language Group at USC ISI, working on language modeling for lower-resourced diasporic languages under Prof. Jonathan May. Jonne is also a Ph.D. student in Computer Science at Brandeis University, where he is working on NLP for morphologically rich and lower-resourced languages as part of the Broadening Linguistic Technologies Lab led by Prof. Constantine Lignos. Prior to his doctoral studies, Jonne received his M.S. in Computer Science from Brandeis University and A.B. in Statistics from Harvard College in 2017.
28 Jul 2022	Jacqueline He (Intern @ ISI; Princeton University)	On Exploiting Context Usage in Document-Level Neural Machine Translation Time: 11am - 11:30am Location: The audience has the option of watching the speaker in-person in conference rooms #1135-37. Online Zoom: https://usc.zoom.us/j/93464840222 Recording:This talk will not be recorded. Abstract: A crucial limitation of current sentence-level machine translation systems is their inability to account for context. By processing each sentence in isolation, existing neural machine translation (NMT) systems are prone to missing important document-level cues and demonstrate a poor understanding of inter-sentential discourse properties, resulting in a noticeable quality difference between human-translated and machine-translated text. In this talk, we will discuss ongoing efforts to construct NMT models that can effectively harness context. We primarily focus on the popular IWSLT'17 English-to-French translation task, and compare against a strong concatenation-based Transformer (Vaswani et al., 2017) baseline. First, we corroborate existing findings (Fernandes et al. 2021) that increasing context can improve translation performance, though with diminishing returns. We hypothesize that the Transformer’s self-attention mechanism may be insufficient for handling long-range dependencies across sentences, both inside and outside of the context window. We then explore replacing the Transformer with a novel neural architecture whose attention layer is based on an exponential moving average to exploit both local and global contexts. Finally, we will discuss a chunk-based strategy towards encoding and decoding text, and conclude with future directions. Jacqueline He is a current summer intern for the Natural Language Group at USC/ISI under Professors Jonathan May and Xuezhe Ma. She recently graduated from Princeton University with a bachelor’s degree in Computer Science. Her current research interest orients around contextual-aware neural machine translation, and she has previously worked on interpretability and ethics in NLP.
30 June 2022	Katy Felkner (USC ISI)	Anti-Queer Bias in Large Language Models Time: 3 pm - 4pm Location: The audience has the option of watching the speaker in-person in conference rooms #1135-37. Online Zoom: https://usc.zoom.us/j/96713436677 Recording: youtube.com/watch?v=AMlxce9-Zf8 Abstract: Happy Pride! To close out Pride Month at ISI, this talk will discuss fairness and bias in LLMs as it relates to the LGBTQ+ community. We will explore current methods for detecting and mitigating bias in LLMs, as well as the (lack of) current research focusing specifically on homophobic and transphobic biases. The talk will present recent exploratory work on whether and to what extent biases against queer and trans people are encoded in large language models (LLMs) such as BERT. It will discuss a new method for reducing these biases in downstream tasks: finetuning the models on data written by and/or about queer people. It will also discuss a new benchmark dataset, WinoQueer, modeled after other bias-detection benchmarks but addressing homophobic and transphobic biases. This work was accepted to the Queer in AI workshop at NAACL 2022. Katy Felkner is a rising 3rd year PhD student at USC Information Sciences Institute. Her primary research focus is extremely low-resource machine translation. She is also interest in fairness and bias in large language models. Prior to USC, she received dual bachelor’s degrees in Computer Science and Letters (general humanities) from the University of Oklahoma. Her research is supported by an NSF Graduate Research Fellowship. Katy is passionate about making computer science more welcoming for women and queer students.
23 June 2022	Kyle Gorman (Graduate Center, City University of New York and Google Inc.)	Weighted Finite-State Transducers: The Later Years Time: 11 am - 12 pm Location:The audience has the option of watching the speaker in-person in conference rooms #1135-37. Online Zoom: https://usc.zoom.us/j/92286721746 Recording: https://www.youtube.com/watch?v=BpEqB3Vj4mM Abstract: While the “deep learning tsunami” defines the state of the art in speech and language processing, finite-state transducer grammars developed by linguists and engineers are still widely used in highly-multilingual settings, particularly for “front-end” speech applications. In this talk, I will first briefly review the current state of the OpenFst and OpenGrm finite-state transducer libraries. I will then discuss several recent innovations in the finite-state world. These include algorithms for inducing text normalization and grapheme-to-phoneme grammars from parallel data, heuristic optimization of arbitrary weighted transducers, and an algorithm for efficiently computing the single shortest string of a wider variety of non-deterministic weighted acceptors. Kyle Gorman is an assistant professor of linguistics at the Graduate Center, City University of New York, and director of the master’s program in computational linguistics; he is also a software engineer in the speech and language algorithms group at Google. With Richard Sproat, he is the coauthor of Finite-State Text Processing (Morgan & Claypool, 2021) and the creator of Pynini, a finite-state text processing library for Python. He has also published on statistical methods for comparing computational models, text normalization, grapheme-to-phoneme conversion, and morphological analysis, as well as many topics in linguistic theory.
13 June 2022	Naomi Saphra (NYU)	Sources of Variance in Pretraining and Fine Tuning LLMs Time: 2:00pm-3:00pm Location: The audience has the option of watching the speaker in-person in conference rooms #1135-37. Online Zoom: https://usc.zoom.us/j/98917272824 Recording: https://www.youtube.com/watch?v=Lni4PIlbJjI Abstract: You have engaged in the very modern practice of transfer learning. You pretrained a model on a self-supervised objective, then you finetuned it on a downstream task, and you find excellent performance on the test set. “Aha”, you say. “I found a good pretraining procedure.” Did you? You try finetuning again. The results are terrible! “Aha”, you say. “I found a bad finetuning procedure.” Did you? The random seeds for both pretraining and finetuning stages have a substantial influence on outcome. However, it is computationally expensive to pretrain new models, so measuring the robustness of a procedure across different seeds can be prohibitive. This talk will address, first, the influence that a pretraining seed has on both in-domain and OOD performance. Then we will address the role of the finetuning seed. Much variation in OOD generalization can be ascribed to where the finetuning seeds direct SGD trajectories. In particular, we discuss how to predict generalization behavior in a finetuned model, based on topographic properties of its region of the loss surface. By understanding the degree of influence that random seeds have on performance, we can fairly evaluate a robust training procedure, rather than a single set of parameters. By understanding the mechanism of that influence, we can go further by developing improved training methods. Naomi Saphra's interests relate to NLP learning dynamics: how models learn to encode linguistic structure, and how we can encode useful inductive biases into the training process. Having earned a PhD from University of Edinburgh, they are now a postdoc at NYU. In their spare time, they play roller derby under the name Gaussian Retribution, do standup comedy, and shepherd programmers who can't type into the world of code dictation.
05 May 2022	Lara J. Martin (University of Pennsylvania)	Dungeons and Discourse: Using Computational Storytelling to Look at Natural Language Use Time: 11:00am-12:00pm Zoom: https://usc.zoom.us/j/93114389765 Recording: https://www.youtube.com/watch?v=hldKVQj863o Abstract: Although we are currently riding a technological wave of personal assistants, many of these agents still struggle to communicate appropriately. In particular, these systems lack coherence, the ability to adapt to novel situations, creativity, emotional understanding, and collaboration. My work focuses on creating open-world storytelling systems and developing agents that leverage speech understanding to communicate with humans more effectively. In this talk, I look at how tabletop roleplaying games such as Dungeons & Dragons can be used as motivation for how to improve conversational systems and understand how people communicate. Lara J. Martin is a 2020 Computing Innovation Fellow (CIFellow) postdoctoral researcher at the University of Pennsylvania working with Dr. Chris Callison-Burch. In 2020, she earned her PhD in Human-Centered Computing from the Georgia Institute of Technology, where she worked with Dr. Mark Riedl. She also has a Masters of Language Technologies from Carnegie Mellon University and a BS in Computer Science & Linguistics from Rutgers University—New Brunswick. Dr. Martin’s work resides in the field of Human-Centered Artificial Intelligence with a focus on natural language applications. They have worked in the areas of automated story generation, speech processing, and affective computing, publishing in top-tier conferences such as AAAI and IJCAI. They have also been featured in Wired and BBC Science Focus magazine .
20 Jan 2022	Faeze Brahman (UCSC)	Modeling Key Narrative Elements for Automatic Story Generation Time: 11:00am-12:00pm Zoom: https://usc.zoom.us/j/93970085438 Recording: https://www.youtube.com/watch?v=aMagIdq1t1E Abstract: Narratives are central to how humans reason, make sense of their experiences and communicate. In essence, narratives are a rich source of day-to-day knowledge and preserve many social and moral norms. Instilling human-like communication, commonsense knowledge, and reasoning capabilities in machines by generating coherent and consistent stories has been a long-standing challenge for AI systems. Despite human-level fluency, the generated stories by recent PLMs tend to be off-topic, not engaging enough, or contain unfaithful information. Towards generating stories with global cohesion, I aim to add controllability as well as modeling and incorporating key narrative elements that contribute to a good story, such as plot, characters, emotions, etc. I will discuss human-in-the-loop story generation using a content-inducing approach to build the "plot" incrementally. Next, I will present a new dataset and tasks for modeling and understanding "characters" in narratives as another key element. I also discuss modeling the "emotional development" of characters in neural storytelling. I will conclude with a discussion on future challenges and directions. Faeze Brahman is a Ph.D. candidate at the University of California, Santa Cruz in Computer Science. Previously, she interned at Microsoft Research, working on controllable grounded text generation; at AI2, working on unsupervised rationale generation for non-monotonic reasoning; and at Xerox PARC on RFP response assistant system. She is broadly interested in natural language understanding and generation with the long-term goal of instilling human-like communication, commonsense knowledge, and reasoning capabilities in machines. Her current research interests include (controllable) text generation, (social) commonsense reasoning, and unsupervised learning.
13 Jan 2022	Kenneth Heafield (University of Edinburgh)	Translating faster than a keystroke and dumpster diving for training data Time: 11:00am-12:00pm Zoom: https://usc.zoom.us/j/93223825200 Recording: https://www.youtube.com/watch?v=pOrJyTDERZU Abstract: Machine translation has a deserved reputation for computational cost. But by burning even more GPU time upfront, we can make inference fast enough to translate thousands of words per second on a desktop or a sentence in under 10 ms on one CPU core. I will talk about optimizations from chopping off transformer heads to writing assembly that make this possible. Software is available at translatelocally.com and coming soon as a Firefox extension. Fast translation was also useful for the ParaCrawl project, where we went dumpster diving on the web for translations and found a few COMET/BLEU points. Kenneth Heafield is a Reader (that's Associate Professor in en-US) at the University of Edinburgh working on fast and often good machine translation. He coordinates the Bergamot project adding local translation to Firefox, ran the ParaCrawl project, and was friendly competition with ISI in MATERIAL. He wrote kenlm to do large language models before they were cool.
02 Dec 2021	Manling Li (UIUC)	Event Extraction and Reasoning in Multimedia News Data Time: 11:00am-12:00pm Recording: https://www.youtube.com/watch?v=MLITKOKIHY0 Abstract: Event understanding is an essential ability for humans to acquire information. With the rise of multimedia, automated event understanding and narration require machines to not only obtain the local structures of events from multimedia data (i.e., who, what, where, and when), but also performs global understanding and inference (i.e., what is likely to happen, and why). However, current event understanding is text-only, local, and lacks reasoning. Real events that are multimedia, inter-connected, and probabilistic. This talk will present Multimedia Event Extraction to extract events and their arguments from multimedia data, and use event knowledge to enhance multimedia pretraining models. Based on the extracted knowledge, I will introduce how to induce event schemas (knowledge of complex event patterns) by learning a temporal graph model. After that, I will talk about how to use event knowledge to support real applications, such as timeline summarization. Manling Li is a fourth-year Ph.D. student at the Computer Science Department of University of Illinois Urbana-Champaign. Manling has won the Best Demo Paper Award at ACL'20, the Best Demo Paper Award at NAACL'21, C.L. Dave and Jane W.S. Liu Award, and has been selected as Mavis Future Faculty Fellow. She is a recipient of the Microsoft Research PhD Fellowship. She has more than 30 publications on knowledge extraction and reasoning from multimedia data. Additional information is available at https://limanling.github.io.
18 Nov 2021	Svitlana Volkova (Pacific Northwest National Lab)	How AI-Driven Augmented Intelligence Transforms Cognitive Security and Nonproliferation Time: 11:00am-12:00pm Recording: https://www.youtube.com/watch?v=INmAXBXucnM Abstract: In this talk I will present several examples of how AI models drive augmented intelligence solutions to transform national security mission spaces focusing on cognitive security and nonproliferation. I will start with cognitive security and discuss deep learning and natural language processing models to detect, characterize, and defend against influence operations, misinformation and disinformation campaigns. Specifically, models capable of detecting information micro-narratives, understanding audiences, characterizing the dynamics of the information environment, and discovering causes and effects to explain why some narratives spread and some do not. I will demo our WatchOwl analytics ( https://watchowl.pnnl.gov/) developed to assist decision makers with real-time situational awareness, track policy compliance and characterize the information environment during COVID-19 infodemic. Next, I will present a suite of AI-powered analytics for nonproliferation developed to detect, anticipate, and reason about proliferation expertise and capability evaluation globally by learning from massive-scale unstructured dynamic real-world data. I will showcase our augmented intelligence tools for expertise search and describe how to go beyond descriptive analytics towards predictive and prescriptive intelligence. Predictive models leverage graph neural networks to anticipate future collaboration patterns, authorship behavior, and capability evolution from dynamic heterogenous graphs. Prescriptive analysis uses ensemble models for causal discovery and inference to enable counterfactual reasoning about expertise and capability development. Our AI-driven augmented intelligence aims not only to provide deeper understanding of how publicly available data could be used to detect, monitor, forecast, and potentially prevent proliferation but also discover real-world examples of patterns and behavior to facilitate the investigation of potentially illicit proliferation activity. Dr. Svitlana Volkova is a Chief Scientist in Decision Intelligence and Analytics in the National Security Directorate of PNNL, where she is leading the lab’s internal Mega-AI investment focusing on developing and deploying massive-scale foundation AI models for science and security mission areas. Since joining PNNL in October 2015, Dr. Volkova was responsible for over $10M in direct sales and has served as Principal Investigator or Project Manager on more than ten internally and externally funded projects, including two DARPA and two NNSA projects focusing on advancing various aspect of Artificial Intelligence (AI) such as natural language processing, machine learning, deep learning, AI test and evaluation, and causal discovery and inference. Svitlana has authored more than 70 peer-previewed conference and journal publications. She serves as senior PC member and area chair for top-tier AI conferences and journals including AAAI, WWW, NeurIPS, ACL, EMNLP, NAACL, ICWSM, Nature Scientific Reports, PNAS and Science Advances. In 2016, she received the prestigious National Security Directorate Author of the Year award for her outstanding number of top-tier publications in AI. In 2019, Dr. Volkova received the Ronald L. Brodzinski Early Career Exceptional Achievement Award for her leadership and scientific contribution to the fields of computational linguistics and computational social science. She received her PhD in Computer Science from Johns Hopkins University where she was affiliated with the Center for Language and Speech Processing and the Human Language Technology Center of Excellence.
14 Oct 2021	Vitaly Feldman (Apple AI Research)	Chasing the Long Tail: What Neural Networks Memorize and Why Time: 11:00am-12:00pm Online Meeting Recording: https://www.youtube.com/watch?v=_R8JFXvjnPc Abstract: Deep learning algorithms that achieve state-of-the-art results on image and text recognition tasks tend to fit the entire training dataset (nearly) perfectly including mislabeled examples and outliers. This propensity to memorize seemingly useless data and the resulting large generalization gap have puzzled many practitioners and is not explained by existing theories of machine learning. We provide a simple conceptual explanation and a theoretical model demonstrating that memorization of outliers and mislabeled examples is necessary for achieving close-to-optimal generalization error when learning from long-tailed data distributions. Image and text data are known to follow such distributions and therefore our results establish a formal link between these empirical phenomena. We then demonstrate the utility of memorization and support our explanation empirically. These results rely on a new technique for efficiently estimating memorization and influence of training data points. Vitaly Feldman is a research scientist at Apple AI Research working on foundations of machine learning and privacy-preserving data analysis. His recent research interests include tools for analysis of generalization, distributed privacy-preserving learning, privacy-preserving optimization, and adaptive data analysis. Vitaly holds a Ph.D. from Harvard (2006, advised by Leslie Valiant) and was previously a research scientist at Google Research (Brain team) and IBM Research - Almaden. His work on understanding of memorization in learning was recognized by the 2021 Caspar Bowden Award for Outstanding Research in Privacy Enhancing Technologies and his research on foundations of adaptive data analysis was featured in CACM Research Highlights and Science. His works were also recognized by COLT Best Student Paper Award in 2005 and 2013 (student co-authored) and by the IBM Research Best Paper Award in 2014, 2015 and 2016. He served as a program co-chair for COLT 2016 and ALT 2021 conferences and as a co-organizer of the Simons Institute Program on Data Privacy in 2019.
7 Oct 2021	Pei Zhou (USC/ISI)	Robust and Implicit Commonsense Inference for Smooth Communication Time: 11:00am-12:00pm Online Meeting Recording: https://www.youtube.com/watch?v=Gx1wKxqRy1c Abstract: Smooth and effective communication requires the ability to make implicit commonsense inferences that are robust to paraphrases. In this talk, I will mainly introduce my work on examining whether pre-trained language models (PTLMs) can perform robust commonsense inferences and whether response generation (RG) models understand why a response sounds coherent. I will briefly present my other work on learning common sense in dialogue response generation. In the pursuit of advancing fluid human-AI communication, we first propose a new challenge, RICA: Robust Inference using Commonsense Axioms, that evaluates robust commonsense inference despite textual perturbations. RICA consists of a set of natural language statements in the "premise-conclusion" format that require reasoning using latent (implicit) commonsense relationships. We formulate these abstract commonsense relations between entities in first-order logic and refer to them as commonsense axioms. We also introduce CEDAR: Common Sense in Dialogue Response Generation. CEDAR is a probing framework that aims to understand why RG models respond as they do by probing RG model’s understanding of commonsense reasoning that elicits proper responses. We formalize the problem by framing commonsense as a latent variable in the RG task and using explanations for responses as textual form of commonsense. Pei Zhou is a third year Ph.D. student in Computer Science at the University of Southern California (USC) and Information Sciences Institute (ISI) co-advised by Professors Xiang Ren and Jay Pujara. Pei graduated with a Bachelor of Science degree in Mathematics of Computation from UCLA in 2019, where he worked closely with Profs. Kai-Wei Chang and Yizhou Sun. In summers of 2021 and 2020, Pei interned as an applied scientist at Amazon Alexa AI, dialogue modeling team. Pei's current research focus lies in commonsense reasoning in dialogue response generation. He is also broadly interested in knowledge grounding in language, robustness, and fairness in NLP.
26 Aug 2021	Shanxiu He (ISI Intern)	From Constrained Event Sequences Generation to Text Generation Time: 11:00am-12:00pm Online Meeting Recording: https://www.youtube.com/watch?v=0ZG9d51emsI Abstract: Understanding events is a critical component of natural language understanding (NLU). A key challenge lies in the fact that events can be described in different granularities. A coarse-grained event (e.g., publishing a paper) can often be disseminated into a fine-grained process of events (e.g., writing the paper, passing the peer review, and presenting at the conference). In this work, we tackle the problem of goal-oriented event process generation, where a task goal event, a process that completes this goal is automatically generated. We tackle this task with a constrained generation approach, inferring unobserved event chains based on existing sequences. To leverage prior knowledge to facilitate commonsense reasoning, we employ pre-trained LMs to generate event sequences and to retrieve original stories. Shanxiu He is an undergraduate at UCLA and a member of UCLANLP lab. Prior to the internship, her research interest focuses on pre-trained Vision-and-Language models such as VisualBERT and ClipBert and their applications to various structural learning tasks. During this internship, she researches on event-centric knowledge representation and specifically event sequences generations.
19 Aug 2021	Shira Wein (ISI Intern) Leo Zeyu Liu (ISI Intern)	Leveraging Abstract Meaning Representations to Amplify the Semantic Information Captured in Transformer Models / Improving Multilingual Encoders with Contrastive Objective and Luna Time: 11:00am-12:00pm Abstracts: Leveraging Abstract Meaning Representations to Amplify the Semantic Information Captured in Transformer Models Though state-of-the-art language models perform well on a variety of natural language processing tasks, these models are not exposed to explicit semantic information. We propose that language models’ ability to capture semantic information can be improved through the inclusion of explicit semantic information in the form of meaning representations, thus improving performance on select downstream tasks. We discuss potential ways to incorporate meaning representations and present our preliminary results. Shira Wein is an intern at ISI and a third-year Ph.D. student at Georgetown University, working on semantic representations and multilingual/cross-lingual applications. Her previous work centers around L2 corpora, Abstract Meaning Representations, and information extraction from design documents, which she published on while interning at the Jet Propulsion Lab. Prior to starting her Ph.D., Shira was an undergraduate at Lafayette College, where she received a B.S. in Computer Science and B.A. in Spanish. Improving Multilingual Encoders with Contrastive Objective and Luna Transformers has been successfully adapted to multilingual pretraining. With only token-level losses like masked language model, transformer encoder could produce good token and sentence representations. We propose to explicitly impose sentence-level objectives using contrastive learning to further improve multilingual encoder. Furthermore, we also propose to merge this modification with what a new transformer architecture, Luna, could offer --- disentanglement between token and sentence representations. We will also discuss ways to evaluate the models and present our experimental progress. Leo Zeyu Liu is a Master student in Computer Science at the University of Washington, advised by Noah A. Smith and Shane Steinert-Threlkeld. His research aims at interpretability, pretraining, and intersection between NLP and Linguistics. He completed his bachelor in Computer Science at the University of Washington.
12 Aug 2021	Sabrina J. Mielke (JHU)	Fair Comparisons and Fundamental Ideas for Open-Vocabulary Generative Language and Translation Models Time: 11:00am-12:00pm Online Meeting Recording: https://www.youtube.com/watch?v=zIP8XMCtHuM Abstract: How can we fairly compare the performance of generative language and translation models on multiple languages? We will see how to use probabilistic and information theory-based measures, first to evaluate (monolingual) open-vocabulary language models by total bits and then, considering the case of Translationese, pondering the meaning of “information” and how to use it to compare machine translation models. In both cases, we get a little glimpse at what linguistic and non-linguistic factors might make languages easier or harder for models. The last part of the talk will (if time permits) propose some somewhat opinionated guidelines for open-vocabulary language modeling, and show work-in-progress in taxonomizing tokenization methods and the literature around open-vocabulary modeling. Sabrina is a PhD student at the Johns Hopkins University and a part-time research intern at HuggingFace, currently researching open-vocabulary language modeling for unit discovery in a variety of typologically varying languages. While her pre-PhD work focused on formal language theory applied to parsing and translation, during her PhD she published on morphology, fair language model comparison, stochastic romanization (at Google AI), and metacognition and calibration for chatbots (at Facebook AI Research), co-organized workshops and shared tasks around morphology and typology, and is currently involved in the BigScience summer of large language models workshop.
29 Jul 2021	Nishant Subramani (AI2)	Fantastic Continuous-valued Sentence Representations and How to Find Them Time: 11:00am-12:00pm Online Meeting Recording: https://youtu.be/pCKBSPDenpc Abstract: Recently, large pretrained language models have seen tremendous success in few-shot and zero-shot learning settings when given appropriate prompting. For these models to excel in this few-shot paradigm, the provided prompts must condition the language model to produce the desired output sequence. We investigate a more general property of this idea: does there exist a continuous, fixed-length vector that can prompt a pretrained and frozen language model to generate any output sentence of interest? To answer this, we develop a language model agnostic sentence representation discovery algorithm, which learns a continuous-valued, fixed-length vector for a sequence by adding the vector at various locations in the language model and optimizing it to maximize the likelihood of the sequence. Experiments reveal that for nearly all English sentences (> 98%) from different genres and corpora, we can find a vector that recovers our sentence of interest exactly without fine-tuning the underlying language model. In addition, we find that our representations can be learned in real-time and are robust to initialization; convergence requires less than 20 iterations on average using stochastic gradient descent with Adam. This talk will mostly cover: https://arxiv.org/abs/2008.09049 Nishant is a predoctoral young investigator on the AllenNLP team at AI2 working with Matt Peters and part of the Masakhane community. He is broadly interested in sentence representation, ML for social good, and out-of-distribution generalization research. Previously, Nishant has spent time in industry working on document analysis, OCR, fake speech detection, and audio-driven facial animation. He also has worked with Kyunghyun Cho, Sam Bowman, and Doug Downey during his time at NYU and Northwestern on NLP and causality research.
28 Jul 2021	Manuel Ciosici (ISI)	Should PTLMs go to school? Time: 11:00am-12:00pm Online Meeting Recording: https://youtu.be/ZtM7b0ggfvs Abstract: I will present a couple of research vignettes of the working knowledge inside large pre-trained language models. I will use the vignettes to argue for a new task to measure modern language models’ knowledge and ability to learn from textbooks. Unlike machines, humans do not need to read, for example, all of Wikipedia, to learn. For humans, reading a textbook or a manual is often enough to provide working knowledge on the book’s topic. We propose LEFT, a new task to measure a machine’s capacity to learn from the same textbooks that college graduates use to learn about society and history. The task reveals surprising results for current state-of-the-art language models like T5 and GPTNeo. Manuel Ciosici is a postdoc at ISI Boston (Waltham), working with Ralph Weischedel and Marjorie Friedman on understanding the knowledge inside large language models and putting it to use, for example, in filling in sequences of events. He is also interested in Natural Language Processing for languages other than English and has recently released a large corpus of Danish to support training large language models. Before joining ISI, Manuel received his Ph.D. from Aarhus University in Denmark and was a postdoc at the IT University in Copenhagen.
22 Jul 2021	Kevin Yang (Berkeley)	Predictor-Guided Controlled Generation Time: 11:10am-12:00pm Online Meeting Recording: https://youtu.be/3aT3dNLyzec Abstract: I will present two works on controlled generation, with a shared theme of using predictors to guide a generator. Future Discriminators for Generation (FUDGE) is a flexible and modular method for controlled text generation, which learns an attribute predictor operating on a partial sequence, and uses this predictor's outputs to adjust a base generator's original probabilities with no need for re-training or fine-tuning. Switching domains, I will also present Improving Molecular Design by Stochastic Iterative Target Augmentation, a self-training approach for using a strong attribute predictor to guide the training of a generator in a semi-supervised manner. Overall, we find that these predictor-guided approaches to controlled generation substantially outperform prior methods in several text generation tasks, as well as in molecular design and program synthesis. Kevin is a rising third-year PhD student at UC Berkeley advised by Dan Klein within Berkeley NLP and BAIR. He is broadly interested in AI in the context of language and game-playing, particularly in designing more modular and/or language-controllable agents. He is also interested in neural architectures for structured domains such as chemistry. Previously, he worked with Regina Barzilay during his undergrad and M.Eng. at MIT, on natural language processing and chemistry applications of deep learning, especially graph convolutional networks.
1 Jul 2021	Kalpesh Krishna (UMass Amherst)	Advances in Text Generation and the Perils of its Automatic Evaluation Time: 9:00am-10:00am Online Meeting Recording: https://youtu.be/bv95xMBZO_U Abstract: Recent advances in large-scale language modeling have significantly improved the capability of natural language generation (NLG) systems, opening up several new applications. Unfortunately, evaluating NLG systems remains challenging, making it hard to measure meaningful progress. In this talk I will present our recent efforts in building & evaluating NLG systems for 1) unsupervised sentence-level style transfer; 2) paragraph-length abstractive question answering with the ELI5 dataset. We build NLG systems (using large language models with paraphrase generation & retrieval respectively) that significantly outperform prior state-of-the-art using "standard" automatic metrics. Unfortunately, we discover several issues with the current evaluation setups, including trivial baselines (like input copying) which can game these standard metrics, even outperforming real systems. Along the way I will discuss our efforts towards rectifying these issues, and conclude with a brief mention of other projects working towards more robust NLG evaluation. (Links to the papers this talk will primarily discuss --- https://arxiv.org/abs/2010.05700, https://arxiv.org/abs/2103.06332) Kalpesh is a third year PhD student at UMass Amherst, advised by Prof. Mohit Iyyer. He is primarily interested in natural language generation and the security of NLP systems. Before coming to UMass, he completed a bachelors' degree at IIT Bombay, advised by Prof. Preethi Jyothi. He has also spent time interning at Google, TTI-Chicago and Mozilla. His research is supported by a Google PhD Fellowship, which was awarded in 2021.
17 Jun 2021	Xiang Lisa Li (Stanford)	Prefix-Tuning: Optimizing Continuous Prompts for Generation Time: 11:00am-12:00pm Online Meeting Recording: https://youtu.be/TwE2m6Z991s Abstract: Fine-tuning is the de facto way of leveraging large pretrained language models for downstream tasks. However, fine-tuning modifies all the language model parameters and therefore necessitates storing a full copy for each task. In this paper, we propose prefix-tuning, a lightweight alternative to fine-tuning for natural language generation tasks, which keeps language model parameters frozen and instead optimizes a sequence of continuous task-specific vectors, which we call the prefix. Prefix-tuning draws inspiration from prompting for language models, allowing subsequent tokens to attend to this prefix as if it were “virtual tokens”. We apply prefix-tuning to GPT-2 for table-to-text generation and to BART for summarization. We show that by modifying only 0.1% of the parameters, prefix-tuning obtains comparable performance in the full data setting, outperforms fine-tuning in low-data settings, and extrapolates better to examples with topics that are unseen during training. Xiang Lisa Li is a first-year PhD student in computer science at Stanford University, advised by Percy Liang and Tatsunori Hashimoto. She works on controllable text generation/decoding and efficient adaptation of pre-trained language models. Lisa is supported by a Stanford Graduate Fellowship and is the recipient of an EMNLP Best Paper award.
6 May 2021	Zhou Yu (Columbia University)	Interactively Teaching Machines with Natural Languages Time: 11:00am-12:00pm Online Meeting Recording: https://www.youtube.com/watch?v=rNyOspG27Xs Abstract: Humans routinely learn new concepts using natural language communications, even in scenarios with limited or no labeled examples. Interactions are another key aspect of human learning as well. Learning to ask good questions is a key step towards effective learning. Can machines do the same? In this talk, we will talk about how can a machine learn to ask good natural language questions and plan dynamically what questions to ask next to learn tasks effectively in low-resource settings. Zhou Yu joined the CS department at Columbia University in Jan 2021 as an Assistant Professor. Before that, she was an Assistant Professor at UC Davis. She obtained her Ph.D. from Carnegie Mellon University in 2017. Zhou has built various dialog systems that have a real impact, such as a job interview training system, a depression screening system, and a second language learning system. Her research interest includes dialog systems, language understanding and generation, vision and language, human-computer interaction, and social robots. Zhou received an ACL 2019 best paper nomination, featured in Forbes 2018 30 under 30 in Science, and won the 2018 Amazon Alexa Prize.
15 Apr 2021	R. Thomas McCoy (JHU)	Universal Linguistic Inductive Biases via Meta-Learning Time: 11:00am-12:00pm Online Meeting Recording: https://www.youtube.com/watch?v=teu6Tr7e7Ns Abstract: Despite their impressive scores on NLP leaderboards, current neural models fall short of humans in two major ways: They require massive amounts of training data, and they generalize poorly to novel types of examples. To address these problems, we propose an approach for giving targeted linguistic inductive biases to a model, where inductive biases are factors that affect how a learner generalizes. Our approach imparts inductive biases using meta-learning, a procedure through which the model discovers how to acquire new languages more quickly via exposure to many possible languages. By controlling the properties of the languages used during meta-learning, we can control the inductive biases that meta-learning imparts. Using a case study from phonology, we show how this approach enables faster learning and more robust generalization. Tom McCoy is a PhD student in the Johns Hopkins Cognitive Science department, advised by Tal Linzen and Paul Smolensky. He studies the linguistic abilities of neural networks, focusing on inductive biases (the topic of this talk) as well as compositional structure: How can neural networks use their continuous vector representations to encode phrases and sentences?
18 Mar 2021	Daniel Khashabi (AI2)	Leave No Question Behind!: Broadening the Scope of Machine Comprehension Time: 11:00am-12:00pm Online Meeting Recording: https://youtu.be/O-ttj6CCb44 Abstract: Despite remarkable progress in building Question Answering (QA) models, the scope of progress remains limited to niche dataset-specific domains. How can we expand the scope of the problems that our models can address? In this talk, I discuss two instances of QA system design that cover a broader range of problems. In the first part, I introduce UnifiedQA, a single model that generalizes to multiple different QA formats (multiple-choice QA, extractive QA, abstractive QA, yes-no QA). Then I will introduce ModularQA, a single system that addresses multiple multi-hop reasoning datasets by leveraging existing single-hop modules (systems). For each system, I present empirical evidence on their better generalization and stronger robustness across datasets and domains. Daniel Khashabi is a “Young Investigator” at Allen Institute for AI, Seattle. His interests lie at the intersection of artificial intelligence and natural language processing. He earned his Ph.D. from the University of Pennsylvania and his undergraduate degree from Amirkabir University of Technology (Tehran Polytechnic).
4 Mar 2021	Jason Weston (FAIR, NYU)	LIGHT: Training agents that can act and speak with other models and humans in a rich text adventure game world Time: 11:00am-12:00pm Online Meeting Recording: live broadcast only Abstract: LIGHT is a rich fantasy text adventure game environment featuring dialogue and actions between agents in the world, which consist of both models and humans. I will summarize work on building this research platform, including crowdsourcing and machine learning to build the rich world environment, training agents to speak and act within it, and deploying the game for lifelong learning of agents by interacting with humans. See https://parl.ai/projects/light/ (and the talk!) for more. Jason Weston is a research scientist at Facebook, NY and a Visiting Research Professor at NYU. He earned his PhD in machine learning at Royal Holloway, University of London and at AT&T Research in Red Bank, NJ (advisors: Alex Gammerman, Volodya Vovk and Vladimir Vapnik) in 2000. From 2000 to 2001, he was a researcher at Biowulf technologies. From 2002 to 2003 he was a research scientist at the Max Planck Institute for Biological Cybernetics, Tuebingen, Germany. From 2003 to 2009 he was a research staff member at NEC Labs America, Princeton. From 2009 to 2014 he was a research scientist at Google, NY. His interests lie in statistical machine learning, with a focus on reasoning, memory, perception, interaction and communication. Jason has published over 100 papers, including best paper awards at ICML and ECML, and a Test of Time Award for his work "A Unified Architecture for Natural Language Processing: Deep Neural Networks with Multitask Learning", ICML 2008 (with Ronan Collobert). He was part of the YouTube team that won a National Academy of Television Arts & Sciences Emmy Award for Technology and Engineering for Personalized Recommendation Engines for Video Discovery. He was listed as the 16th most influential machine learning scholar at AMiner and one of the top 50 authors in Computer Science in Science.
25 Feb 2021	Robin Jia (FAIR --> USC CS)	Insights from Re-evaluating NLP Systems Time: 11:00am-12:00pm Online Meeting Recording: live broadcast only Abstract: Although large pre-trained models have achieved exceptional results on standard NLP benchmarks, it is clear that they are still far from actually understanding natural language. This gap highlights the need to develop and embrace more challenging settings for evaluation. In this talk, I will present work that re-evaluates seemingly high-performing NLP systems and derives insights on how these systems can be further improved. First, we will evaluate models under extreme label imbalance, a phenomenon that creates unavoidable train-test mismatch. Here, collecting training data adaptively leads to dramatic improvements over static data collection. Second, we will grapple with adversarial perturbations--label-preserving transformations that can trigger surprising model errors. We will develop training methods to make models certifiably robust to combinatorially large families of perturbations. Finally, we will assess the utility of automatic evaluation metrics for comparing NLG systems. We will show that metrics can be surprisingly competitive with evaluation schemes that rely on human annotators, and highlight reduction of statistical bias against particular NLG systems as an important future direction. Robin Jia will be an Assistant Professor in Computer Science at the University of Southern California starting in Fall 2021. Currently, he is a visiting researcher at Facebook AI Research, working with Luke Zettlemoyer and Douwe Kiela. He received his Ph.D. in Computer Science from Stanford University, where he was advised by Percy Liang. He is interested broadly in natural language processing and machine learning, with a particular focus on building NLP systems that are robust to distribution shift. Robin’s work has received an Outstanding Paper Award at EMNLP 2017 and a Best Short Paper award at ACL 2018.
4 Feb 2021	Jesse Thomason (Amazon Alexa AI --> USC CS)	From Human Language to Agent Action Time: 11:00am-12:00pm Online Meeting Recording: https://youtu.be/vSLk1T48WTo Abstract: There is a usability gap between manipulation-capable robots and helpful in-home digital agents. Dialog-enabled smart assistants have recently seen widespread adoption, but these cannot move or manipulate objects. By contrast, manipulation-capable and mobile robots are still largely deployed in industrial settings and do not interact with human users. Language-enabled robots can bridge this gap---natural language interfaces help robots and non-experts collaborate to achieve their goals. Navigation in unexplored environments to high-level targets like "Go to the room with a plant" can be facilitated by enabling agents to ask questions and react to human clarifications on-the-fly. Further, high-level instructions like "Put a plate of toast on the table" require inferring many steps, from finding a knife to operating a toaster. Low-level instructions can serve to clarify these individual steps. Through two new datasets and accompanying models, we study human-human dialog for cooperative navigation, and high- and low-level language instructions for cooking, cleaning, and tidying in interactive home environments. These datasets are a first step towards collaborative, dialog-enabled robots helpful in human spaces. Jesse is starting as an Assistant Professor at the University of Southern California in fall 2021, and is currently hanging out at Amazon Alexa AI for a year. Recently, he was a postdoctoral researcher working with Luke Zettlemoyer at the University of Washington. His research focuses on language grounding and natural language processing applications for robotics (RoboNLP). Key to this work is using dialog with humans to facilitate both robot task execution and learning to enable lifelong improvement of robots’ language understanding capabilities. He has encouraged work in RoboNLP through workshop organization at NLP, robotics, and vision conference venues.
28 Jan 2021	Sandeep Soni (Georgia Tech)	Computational Models of Language Change from Diachronic Text Time: 11:00am-12:00pm Online Meeting Recording: https://youtu.be/4OKrh5uWEr4 Abstract: Natural languages undergo change over time. Modeling language change can help uncover latent social factors that modulate change; in particular, to answer key social science questions such as who talks to whom, who leads, and who follows. In this talk, I'll present our work over the years that uses timestamped (also called diachronic) text to link social influence or leadership with language change, combining methods from computational linguistics, machine learning, and network science. First, I show that network influence exerted through strong ties leads to higher adoption of non-standard terms on Twitter's communication network. Next, I propose a method to identify documents at the forefront of semantic change and further show that such documents are more influential in terms of the citations they get across two domains -- a collection of legal documents and a set of scientific abstracts. Finally, I introduce a method to induce a semantic leadership network between 19th century abolitionist newspapers that helps emphasize quantitatively the important role played by Black and women editors in the abolitionist movement. The combination of these studies demonstrate the variety of domains in which the study of language change is relevant and how computational modeling can help determine the latent influence and leadership relations between sources of interest. Sandeep Soni is a PhD candidate at the Georgia Institute of Technology. His research interests are in computational social science and digital humanities with an emphasis on using text as data and computational linguistic methods. His PhD thesis is focussed on developing methods to use language change as a way to systematically infer latent influence relationships from text data. He is currently on the job market looking for postdoc or permanent research positions.
21 Jan 2021	Christopher Chu (DiDi)	Historical Applications of NLP Time: 11:00am-12:00pm Online Meeting Recording: https://youtu.be/gLKMthceNIQ Abstract: NLP has vastly improved in the last ten years. These advances can help us better understand history by deciphering old languages and text whose meaning we couldn't understand before. In this talk, I'll be presenting a couple applications of how we can use these techniques. First, we use a known-plaintext attack to decrypt a dictionary code used c.1800 for secret messages between US Army General James Wilkinson and agents of the Spanish Crown. Then, I'll present a method for deciphering Chinese writing, with potential applications to other logographic languages. Christopher Chu is a research engineer working on NLP at DiDi AI Labs. We're located about 500 feet away in the other Marina Tower. He has a BASc in robotics engineering, but decided that people are more interesting to talk to than robots. At DiDi Labs, we primarily work on dialog and translation, but branch out into fun projects like these.
14 Jan 2021	Yezhou Yang (ASU)	Visual Recognition beyond Appearances, and its Robotic Applications Time: 11:00am-12:00pm Online Meeting Recording: https://youtu.be/lfb5GP-HNRE Abstract: The goal of Computer Vision, as coined by Marr, is to develop algorithms to answer What are Where at When from visual appearance. The speaker, among others, recognizes the importance of studying underlying entities and relations beyond visual appearance, following an Active Perception paradigm. This talk will present the speaker's efforts over the last decade, ranging from 1) reasoning beyond appearance for visual question answering, image understanding and video captioning tasks, through 2) temporal knowledge distillation with incremental knowledge transfer, till 3) their roles in a Robotic visual learning framework via a Robotic Indoor Object Search task. The talk will also feature the Active Perception Group (APG)’s ongoing projects (NSF RI, NRI and CPS, DARPA KAIROS, and Arizona IAM) addressing emerging challenges of the nation in autonomous driving, AI security and healthcare domains, at the ASU School of Computing, Informatics, and Decision Systems Engineering (CIDSE). Yezhou Yang is an Assistant Professor at School of Computing, Informatics, and Decision Systems Engineering, Arizona State University. He is directing the ASU Active Perception Group. His primary interests lie in Cognitive Robotics, Computer Vision, and Robot Vision, especially exploring visual primitives in human action understanding from visual input, grounding them by natural language as well as high-level reasoning over the primitives for intelligent robots. Before joining ASU, Dr. Yang was a Postdoctoral Research Associate at the Computer Vision Lab and the Perception and Robotics Lab, with the University of Maryland Institute for Advanced Computer Studies. He is a recipient of Qualcomm Innovation Fellowship 2011, the NSF CAREER award 2018 and the Amazon AWS Machine Learning Research Award 2019. He receives his Ph.D. from University of Maryland at College Park, and B.E. from Zhejiang University, China.
10 Dec 2020	Thomas Wolf (HuggingFace)	An Introduction to Transfer Learning in NLP and HuggingFace Time: 11:00am-12:00pm Online Meeting Recording: https://youtu.be/sKfRlPD8DhU Abstract: In this talk I'll start by introducing the recent breakthroughs in NLP that resulted from the combination of Transfer Learning schemes and Transformer architectures. The second part of the talk will be dedicated to an introduction of the open-source tools released by HuggingFace, in particular our Transformers, Tokenizers and Datasets libraries and our models. Thomas Wolf is co-founder and Chief Science Officer of HuggingFace. His team is on a mission to catalyze and democratize NLP research. Prior to HuggingFace, Thomas gained a Ph.D. in physics, and later a law degree. He worked as a physics researcher and a European Patent Attorney.
3 Dec 2020	Elena Voita (University of Edinburgh, University of Amsterdam)	Information-Theoretic Probing with Minimum Description Length Time: 10:00am-11:00pm Online Meeting Recording: https://youtu.be/CakeVH_svdo Abstract: How can you know whether a model has learned to encode a linguistic property? The most popular approach to measure how well pretrained representations encode a linguistic property is to use the accuracy of a probing classifier (probe). However, such probes often fail to adequately reflect differences in representations, and they can show different results depending on probe hyperparameters. As an alternative to standard probing, we propose information-theoretic probing which measures minimum description length (MDL) of labels given representations. In addition to probe quality, the description length evaluates “the amount of effort” needed to achieve this quality. We show that (i) MDL can be easily evaluated on top of standard probe-training pipelines, and (ii) compared to standard probes, the results of MDL probing are more informative, stable, and sensible. Elena (Lena) Voita is a Ph.D. student at the University of Edinburgh and University of Amsterdam supervised by Ivan Titov and Rico Sennrich, and is currently a Facebook PhD Fellow. Her research focuses on document-level neural machine translation, as well as on understanding what and how neural models learn. Previously, she was a research scientist at Yandex Research and worked closely with the Yandex Translate team. She also teaches NLP at the Yandex School of Data Analysis; the extended public version of (a part of) this course is available at "NLP Course For You".
12 Nov 2020	John Hewitt (Stanford)	The Unreasonable Syntactic Expressivity of RNNs Time: 11:00am-12:00pm Online Meeting Recording: https://youtu.be/ZtuRyNKCu60 Abstract: In 2015, Andrej Karpathy posted a now famous blog post on The Unreasonable Effectiveness of Recurrent Neural Networks. To summarize this sense of wonder, Karpathy emphasized: 'We’ll train RNNs to generate text character by character and ponder the question “how is that even possible?”' RNNs empirically generate natural language with high syntactic fidelity, but their success is not well-understood theoretically. In this talk, I'll provide theoretical insight into this success, proving in a finite-precision setting that RNNs can efficiently generate bounded hierarchical languages that reflect the scaffolding of natural language syntax. I'll introduce Dyck-(k,m), the language of well-nested brackets (of k types) and m-bounded nesting depth, reflecting the bounded memory needs and long-distance dependencies of natural language syntax. The best previously known results use O(k^(m/2)) memory (hidden units) to generate these languages. I'll prove that an RNN with O(m log k) hidden units suffices, an exponential reduction in memory, by an explicit construction. Finally, I'll show that no algorithm, even with unbounded computation, can suffice with o(m log k) hidden units. John is a 3rd year PhD student in computer science at Stanford University, advised by Chris Manning and Percy Liang. He works on understanding and improving how unsupervised neural networks learn and process human languages. He is supported by a National Science Foundation Graduate Research Fellowship, and is the recipient of an EMNLP Runner Up Best Paper award.
15 Oct 2020	Wei Xu (Georgia Tech)	Natural Language Understanding for Noisy Text Time: 11:00am-12:00pm Online Meeting Recording: https://youtu.be/pr1HGaE5dAE Abstract: In this talk, I will present some of our recent work that focuses on understanding the meaning of user-generated texts and extracting useful information. First, I will discuss the design of neural pairwise ranking models, and their applications to semantic analysis of hashtags. Our best ranking model, that incorporates multi-task learning and Gaussian feature vectorization, can segment hashtags into meaningful word sequences (e.g., #dtlaartsdistrict → “DTLA Arts District”) with an over 95% accuracy. Second, I will highlight the importance of training customized BERT models for noisy text and zero-shot transfer learning. I will provide two case studies: (1) BERTOverflow model we trained on in-domain data that significantly outperforms off-the-shelf BERT on the new StackOverflow NER corpus; (2) GigaBERT, a bilingual BERT we developed specifically for English and Arabic, which performs better than Google’s multilingual BERT and Facebook’s XLM-RoBERTa for cross-lingual information extraction. I will conclude with our new work on annotating data and training automatic models to extract COVID-19 related events from Twitter. Wei Xu is an assistant professor in the School of Interactive Computing at the Georgia Institute of Technology. Before joining Georgia Tech, she was an assistant professor at The Ohio State University since 2016. Xu’s research interests are in natural language processing, machine learning, and social media. Her recent work focuses on language generation, semantics, information extraction, and reading assistive technology. She has received the NSF CRII Award, Best Paper Award at COLING, CrowdFlower AI for Everyone Award, and Criteo Faculty Research Award. She recently served as a senior area chair for ACL 2020 and an area chair, workshop chair, and publicity chair for EMNLP and NAACL conferences. She has been co-organizing the Workshop on Noisy User-generated Text annually.
27 Aug 2020	Ugur Yavuz (ISI intern)	Translation of Asylum Testimonials from Low-Resource Languages Time: 11:00am-12:00pm Online Meeting Recording: https://youtu.be/xL9kMb2lTa4 Abstract: Many asylum seekers along the southern border of the United States speak low-resource languages that are not available on commercial translation services. As a result, the translation of their testimonials poses a real challenge to the legal system, as well as non-governmental organizations. We will discuss potential techniques that would facilitate this task, such as transfer learning, domain adaptation, and corpus expansion, and also explain our work in compiling bilingual corpora for Mixtec and Kanjobal languages. Ugur Yavuz is a summer intern working at the Natural Language Group with Dr. Jon May. He is a rising senior at Dartmouth College studying computer science and mathematics.
20 Aug 2020	Weiqiu You (ISI intern)	Qualitative Analysis of Unsupervised Neural Machine Translation Time: 11:00am-12:00pm Online Meeting Recording: https://youtu.be/TawjeAeLbuA Slides: here Abstract: Supervised neural machine translation (SNMT) models built on all the available parallel data result in higher BLEU on test sets than unsupervised neural machine translation (UNMT) models leveraging all available monolingual data. Recently, UNMT models such as XLM and MASS are reducing the gap with SNMT in terms of BLEU. Prior work has shown that linguistic and domain dissimilarity often hinder UNMT’s performance. The question we're asking in this investigation is, when SNMT and UNMT do have comparable BLEU, do they exhibit qualitative differences, and can these differences be detected by other metrics? Weiqiu You is a rising first year PhD student at University of Pennsylvania who just graduated with MS from UMass Amherst. Her interests include machine translation and natural language processing in general.
13 Aug 2020	Naitian Zhou and Omar Shaikh (ISI interns)	Improving Dialogue Agents with a Social Dimension / Understanding Monolingual Pre-Training for Bilingual Models Time: 11:00am-12:00pm Online Meeting Recording: https://youtu.be/QS3NVSbunt8 Abstracts: Improving Dialogue Agents with a Social Dimension: The dialogue problem is challenging because a proper response must be conditioned on many different factors: knowledge about the language, knowledge about the world, knowledge about self, and knowledge about the speaker, to name a few. Prior research has focused on language modeling and "persona" modeling, encoding facts about the dialogue agent. In this project, I try to focus on an alternative dimension to dialogue: how do we condition the way we converse, based on our understanding of ourselves and our social relationship with our dialogue partner?Naitian Zhou is a rising junior at the University of Michigan studying computer science and data science. His interests include computational social science and natural language processing. Understanding Monolingual Pre-Training for Bilingual Models: Monolingual embeddings (from models like BERT) are known to help on a variety of downstream tasks in a straightforward way. Usually, these embeddings are plug-and-play — initializing models with BERT embeddings or using them as input representations result in increased model performance. However, supervised NMT tasks don’t appear to benefit equally from traditional pretraining methods. We explore what makes NMT (bilingual) and BERT/LM (monolingual) representations different on several probing tasks, and why certain training methods succeed in extracting performance from BERT embeddings on MT tasks.Omar is a Summer 2020 intern with Dr. Jon May. He’s also a rising junior at Georgia Institute of Technology.
30 Jul 2020	Emily M. Bender (UW)	A Typology of Ethical Risks in Language Technology with an Eye Towards Where Transparent Documentation Can Help Time: 11:00am-12:00pm Online Meeting Recording: https://youtu.be/WIChyzXVNLY Abstract: People are impacted by language technology in various ways: as direct users of the technology (by choice or otherwise), and indirectly, including as the subject of queries, as the subject of stereotypes, and as contributors to corpora. In these roles, risks are borne differentially by different speaker populations, depending on how well the technology works for their language varieties. This talk explores strategies for mitigating these risks based on transparent documentation of training data. Emily M. Bender is a Professor of Linguistics at the University of Washington and the Faculty Director of the Professional Masters in Computational Linguistics (CLMS) program. Her research interests include the interaction of linguistics and NLP, computational semantics, multilingual NLP, and the societal impact of language technology.
16 Jul 2020	Mohit Iyyer (UMass Amherst)	Towards interactive story generation Time: 11:00am-12:00pm Online Meeting Recording: https://www.youtube.com/watch?v=YM-ia3oYjnE Abstract: Story generation is difficult to computationally formalize and evaluate, and there are many important questions to ask when tackling the problem. What should we consider as the base unit of a story (e.g., a sentence? a paragraph? a chapter?) What kind of data should we use to train these models (novels? short stories? overly simplistic mechanically-turked paragraphs?) Is any model architecture currently capable of producing long-form narratives that have some semblance of coherent discourse structure, such as plot arcs and character development? When evaluating the outputs of our models, can we do better than just asking people to rate the text based on vaguely defined properties such as "enjoyability"? In this talk, I'll discuss my lab's ongoing work on story generation by introducing a new dataset and evaluation method that we hope will spur progress in this area, and also describing fine-tuning strategies for large-scale Transformers that produce more coherent and stylistically-consistent stories. A major bottleneck of these models is their memory and speed inefficiency; as such, I'll conclude by discussing heavily-simplified Transformer language models that make training less expensive without sacrificing output quality. Mohit Iyyer is an assistant professor in computer science at the University of Massachusetts Amherst. His research focuses broadly on designing machine learning models for discourse-level language generation (e.g., for story generation and machine translation), and his group also works on tasks involving creative language understanding (e.g., modeling fictional narratives and characters). He is the recipient of best paper awards at NAACL (2016, 2018) and a best demo award at NeurIPS 2015. He received his PhD in computer science from the University of Maryland, College Park in 2017, advised by Jordan Boyd-Graber and Hal Daumé III, and spent the following year as a researcher at the Allen Institute for Artificial Intelligence.
21 May 2020	Liang Huang (Oregon State University/Baidu)	Fighting COVID-19 using Linear-Time Algorithms from Computational Linguistics Time: 11:00am-12:00pm Online Meeting: live broadcast only Abstract: To defeat the current COVID-19 pandemic, which has already claimed 250,000+ deaths as of early May, a messenger RNA (mRNA) vaccine has emerged as a promising approach thanks to its rapid and scalable production and non-infectious and non-integrating properties. However, designing an mRNA sequence to achieve high stability and protein yield remains a challenging problem due to the exponentially large search space (e.g., there are 10^632 possible mRNA sequence candidates for the spike protein of SARS-CoV-2).We describe two on-going efforts at solving this problem, both using linear-time algorithms from my group inspired by my earlier work in parsing. On one hand, the Eterna OpenVaccine project from Stanford Medical School takes a crowd-sourcing approach to let game players all over the world design stable sequences. To evaluate sequence stability (in terms of free energy), they use LinearFold from my group (2019) since it’s the only linear-time RNA folding algorithm available (which makes it the only one fast enough for COVID-scale genomes). On the other hand, we take a computational approach to directly search for the optimal sequence in this exponentially large space via dynamic programming. It turns out this problem can be reduced to a classical problem in formal language theory and computational linguistics (intersection between CFG and DFA), which can be solved in O(n^3) time, just like lattice parsing for speech. In the end, we can design the optimal mRNA vaccine candidate for SARS-CoV-2 spike protein in 1 hour with exact search, or just 11 minutes with a beam of 1000 at the cost of only ~0.6% loss in energy. Liang Huang is currently an Assistant Professor of EECS at Oregon State University and Distinguished Scientist (part-time) at Baidu Research USA. Before that he was Assistant Professor for three years at the City University of New York (CUNY) and a part-time Research Scientist with IBM's Watson Group. He graduated in 2008 from Penn and has worked as a Research Scientist at Google and a Research Assistant Professor at USC/ISI. Most of his work develops fast algorithms and provable theory to speedup large-scale natural language processing, structured machine learning, and computational structural biology. He has received a Best Paper Award at ACL 2008 (sole author), a Best Paper Honorable Mention at EMNLP 2016, several best paper nominations (ACL 2007, EMNLP 2008, and ACL 2010), two Google Faculty Research Awards (2010 and 2013), a Yahoo! Faculty Research Award (2015), and a University Teaching Prize at Penn (2005). He was a keynote speaker at ACL 2019. His recent interest is to apply computational linguistics to computational biology, where he works on RNA folding & design using his earlier work on incremental parsing.
27 Feb 2020	Ellie Pavlick (Brown University)	What do (and should) language models know about language? Time: 11:00am-12:00pm Location: live-streamed from the MdR 6th floor conference room: #689 (This talk will be given from the ISI Boston office.) Online Meeting Recording: https://bluejeans.com/s/l_z4X/ Abstract: Natural language processing has become indisputably good over the past few years. We can perform retrieval and question answering with purported super-human accuracy, and can generate full documents of text that seem good enough to pass the Turing test. In light of these successes, it is tempting to attribute the empirical performance to a deeper "understanding" of language that the models have acquired. Measuring natural language "understanding", however, is itself an unsolved research problem. In this talk, I will discuss several studies which attempt to illuminate what it is that state-of-the-art models of language are capturing. I will argue that current SOTA models have made significant progress in modeling linguistic form, but have completely failed to capture linguistic meaning. I will discuss recent work which investigates the effect of dataset skew on representation learning, as well as work investigating inconsistencies in human's own representations of "meaning". Ellie Pavlick is an Assistant Professor of Computer Science at Brown University. She received her PhD from University of Pennsylvania in 2017, where her focus was on paraphrasing and lexical semantics. Ellie's current research is on cognitively-inspired approaches to language acquisition, focusing on grounded language learning and on the emergence of structure (or lack thereof) in neural language models. Ellie leads the language understanding and representation (LUNAR) lab, which collaborates with Brown's Robotics and Visual Computing labs and with the Department of Cognitive, Linguistic, and Psychological Sciences.
05 Feb 2020	Gabriel Kahn (USC Annenberg)	Why journalism is broken and how data can help fix it Time: 11:00am-12:00pm Location: 6th floor conference room: #689 Online Meeting Recording: https://bluejeans.com/s/FVVU4/ Abstract: Pizza gate, Russian trolls, deep fakes. We live in an information swamp and it sucks. At its core, the crisis in journalism is about a shifting economic model that has made it difficult for legitimate news organizations to survive. The consequences are dire. But harnessing data in the right ways can provide vital information to communities and can help news organizations do more with less. The future of a healthy news environment requires collaboration between news, data and computer science. Gabriel Kahn outlines the current problems and some potential solutions. Gabriel Kahn has worked as a newspaper correspondent and editor for three decades, including 10 years at The Wall Street Journal, where he served as Los Angeles bureau chief, deputy Hong Kong bureau chief and deputy Southern Europe bureau chief, based in Rome. He has reported from more than a dozen countries on three continents. He joined USC Annenberg in the fall of 2010, where he jointly runs the Media, Economics and Entrepreneurship program. The goal of M{2e} is to bolster students’ understanding of economics and encourage innovation and experimentation with new ideas in communication and journalism. In addition to his teaching and reporting work, Kahn studies the economic models of the news industry and consults with startups and established news companies on strategy. In 2018, he launched Crosstown, which has pioneered a new approach to local news through data.
30 Jan 2020	Sarah Wiegreffe (Georgia Tech)	BlackBox NLP: What are we looking for, and where do we stand? Time: 11:00am-12:00pm Location: 10th floor conference room: #1014 Online Meeting Recording: https://bluejeans.com/s/NqZd0 Abstract: The widespread adoption of deep learning in NLP has led to a new state-of-the-art on many tasks. Neural nets are complex systems that are hard to interpret, leaving researchers with little ability to say why their model is doing so well. As a consequence, interpretability and explainability hold a new relevance. In this talk, I will present case studies in the subfield of interpretability for NLP, as well as the research goals of the subtopics that fall under this umbrella. I will present a case-study of the necessary conditions for attention modules to be used for explaining classification model predictions, as well as a clinical application of attention mechanisms in physician decision support. I will conclude by discussing future directions, including in natural language explanations for reinforcement learning systems. Sarah Wiegreffe is a Computer Science PhD student in the School of Interactive Computing at Georgia Tech. Her research lies at the intersection of machine learning and NLP, with a particular interest in interpretability, explainability, and model robustness. In the past, she has worked in clinical applications of NLP and ML. During her PhD, she has held research internships at Google AI and Sutter Health. She obtained her B.S. in Data Science from the College of Charleston. In her free time, Sarah enjoys rock climbing, traveling, and rock music.
16 Jan 2020	Samee Ibraheem (UC Berkeley)	Leveraging Context for Natural Language Processing Time: 11:00am-12:00pm Location: 6th floor conference room: #689 Online Meeting Recording: https://bluejeans.com/s/GDWdF/ Abstract: Neural networks have allowed for a host of advances in natural language processing, from text classification to machine translation. However, the effects of contextual information, such as speaker gender or race, on NLP tasks is still an active area of research. In this talk, we first explore how such context can affect an NLP system’s accuracy. Next, we investigate methods for incorporating additional context into a machine translation system. Finally, we investigate methods for collecting additional contextual information when the signal is sparse. Samee Ibraheem is a PhD student in Computer Science at UC Berkeley working with John DeNero on incorporating context for NLP applications. He received a Bachelor’s in Neurobiology from Harvard University and is currently supported by an NSF Fellowship.
12 Dec 2019	Soravit (Beer) Changpinyo (Google AI)	Tightly Connecting Vision and Language Time: 11:00am-12:00pm Location: 6th floor conference room: #689 Online Meeting Recording: https://bluejeans.com/s/unxRW/ Abstract: Remarkable progress has been made at the intersection of vision and language. While showing great promise, current vision and language models do not function well in the wild. In this talk, I will present our recent efforts aiming to bridge this gap for the tasks of image captioning and visual question answering. I will first describe several practical limitations of current benchmarks as a yardstick for grounded language understanding and visual reasoning. Then, I will describe our simple approach to transfer learning, where we leverage large-scale ultrafine-grained data as a means to address the long tail of language. Finally, given these results, I will outline future directions and survey a variety of on-going work along the line of making vision and language research useful. Soravit (Beer) Changpinyo is a Software Engineer at Google AI. His research interests are in machine learning with applications to computer vision and natural language processing. Prior to joining Google, he was a PhD candidate and an Annenberg Fellow at the University of Southern California, advised by Fei Sha.
21 Nov 2019	Hoifung Poon (MSR/UW)	Machine Reading for Precision Medicine Time: 11:00am-12:00pm Location: 10th floor conference room: #1014 Online Meeting Recording: here Abstract: The advent of big data promises to revolutionize medicine by making it more personalized and effective, but big data also presents a grand challenge of information overload. For example, tumor sequencing has become routine in cancer treatment, yet interpreting the genomic data requires painstakingly curating knowledge from a vast biomedical literature, which grows by thousands of papers every day. Electronic medical records contain valuable information to speed up clinical trial recruitment and drug development, but curating such real-world evidence from clinical notes can take hours for a single patient. NLP can play a key role in interpreting big data for precision medicine. In particular, machine reading can help unlock knowledge from text by substantially improving curation efficiency. However, standard supervised methods require labeled examples, which are expensive and time-consuming to produce at scale. In this talk, I'll present Project Hanover, where we overcome the annotation bottleneck by combining deep learning with probabilistic logic, and by exploiting self supervision from readily available resources such as ontologies and databases. This enables us to extract knowledge from millions of publications, reason efficiently with the resulting knowledge graph by learning neural embeddings of biomedical entities and relations, and apply the extracted knowledge and learned embeddings to supporting precision oncology. Hoifung Poon is the Director of Precision Health NLP at Microsoft Research and an affiliated professor at the University of Washington Medical School. He leads Project Hanover, with the overarching goal of advancing machine reading for precision health, by combining probabilistic logic with deep learning. He has given tutorials on this topic at top conferences such as the Association for Computational Linguistics (ACL) and the Association for the Advancement of Artificial Intelligence (AAAI). His research spans a wide range of problems in machine learning and natural language processing (NLP), and his prior work has been recognized with Best Paper Awards from premier venues such as the North American Chapter of the Association for Computational Linguistics (NAACL), Empirical Methods in Natural Language Processing (EMNLP), and Uncertainty in AI (UAI). He received his PhD in Computer Science and Engineering from University of Washington, specializing in machine learning and NLP.
24 Oct 2019	Wenjuan Han (ShanghaiTech University/UCLA)	Neural Unsupervised Dependency Parsing Time: 11:00am-12:00pm Location: 6th floor conference room: #689 Online Meeting Recording: https://bluejeans.com/s/6zkTG Abstract: Dependency parsing, as an essential task in Natural Language Processing, is a key step in analyzing and understanding texts. Most of the previous work on unsupervised dependency parsing is based on generative models. In order to effectively induce a grammar, various knowledge priors and inductive biases are manually encoded in the learning process. However, these knowledge priors and inductive biases are mostly local features that can only be defined by experts. Another disadvantage of generative models comes from the context-freeness, which limits the information available to dependencies in a sentence. We proposed several approaches to unsupervised dependency parsing that automatically capture useful information: correlations between tokens, context information and multilingual similarity. I am now a visiting student in UCLA and expected to graduate in January 2020. I will get the PHD Degree at ShanghaiTech University, where I was advised by Kewei Tu. I did my bachelors at the Nanjing University of Posts and Telecommunications. My current research focuses on the study of probabilistic/neural models and follows two researching paths: (1) grammar-based representation, inference, and unsupervised learning; and (2) the application of unsupervised learning approaches with hidden variables in a variety of artificial intelligence areas including grammar induction, POS induction and perceptual grouping.
17 Oct 2019	Peng Qi (Stanford)	Answering Complex Questions in the Wild Time: 11:00am-12:00pm Location: 6th floor conference room: #689 Online Meeting Recording: https://bluejeans.com/s/BmBxP Abstract: Open-domain question answering (open-domain QA) systems greatly improve our access to the knowledge in large text corpora, but most previous work on this topic lacks the ability to perform multi-hop reasoning, limiting how textual knowledge can actually be used. For instance, to answer "What's the Aquaman actor's next movie?", one needs to reason about the entity "Jason Momoa" instead of just comparing the question to a local context, making the task more challenging.In this talk, I will present our recent work on enabling text-based multi-hop reasoning in open-domain question answering. First, I will talk about how we collected one of the first datasets on multi-hop QA, making it possible to train and evaluate systems to perform explainable complex reasoning among millions of Wikipedia articles. Then, I will present a QA system we developed on this dataset. Iterating between finding supporting facts and reading the retrieved context, our model outperforms all previously published approaches, many of which based on powerful pretrained neural networks like BERT. As our model generates natural language queries at each step of its retrieval, it is also readily explainable to humans, and allows for intervention when it veers off course. I will conclude by comparing our model to other recent developments on this dataset, and discussing future directions on this problem. Peng Qi is a PhD student in Computer Science at Stanford University. His research interests revolve around building natural language processing systems that better bridge between humans and the large amount of (textual) information we are engulfed in. Specifically, he is interested in building knowledge representations, (open-domain) question answering, explainable models, and multi-lingual NLP systems. He is also interested in linguistics, and builds tools for linguistic structure analysis applicable to many languages.
03 Oct 2019	Alex Spangher (USC/ISI)	NLP in Computational Journalism: notes from the field at the New York Times Time: 11:00am-12:00pm Location: 6th floor conference room: #689 Slides: here Abstract: "Computational journalism" is an emerging field seeking to enhance traditional journalistic processes -- story finding, production, distribution, funding, evaluation and security -- using computational techniques. Such advances comes at a critical time: journalists' ability to play a watchdog role in society is severely endangered by industry contraction and budget shortfalls.Many exciting developments in computational journalism require research in NLP. In this talk, I'll discuss some prior work at the New York Times, including generative localized news articles, human-in-the-loop chat-bots, personalization, and coverage-pattern modeling. I'll also discuss long-term challenges we identified in a broad survey article done at Stanford University this summer, as well as my current research directions here at USC. Alex Spangher was a data scientist at the New York Times, where he worked with journalists and newsroom stakeholders on data science to improve journalism coverage and revenue. He interned at Microsoft Research and spent a year as a PhD student at Carnegie Mellon University before transferring to the University of Southern California to work with Emilio Ferrara and Nanyun Peng. He has an M.S. in Journalism and an M.S. in Data Science from Columbia University, and received his B.S. from Columbia as well, in neuroscience and computer science. He enjoys playing classical piano and double bass.
26 Sep 2019	Nicola De Cao (University of Amsterdam)	Question Answering by Reasoning Across Documents with Graph Convolutional Networks Time: 11:00am-12:00pm Location: 6th floor conference room: #689 Online Meeting Recording: https://bluejeans.com/s/sgwNF Slides: here Abstract: Most research in reading comprehension has focused on answering questions based on individual documents or even single paragraphs. We introduce a neural model which integrates and reasons relying on information spread within documents and across multiple documents. We frame it as an inference problem on a graph. Mentions of entities are nodes of this graph while edges encode relations between different mentions (e.g., within- and cross-document co-reference). Graph convolutional networks (GCNs) are applied to these graphs and trained to perform multi-step reasoning. Our Entity-GCN method is scalable and compact, and it achieves state-of-the-art results on a multi-document question answering dataset, WikiHop (Welbl et al., 2018). Nicola is a first-year Ph.D. candidate at the Institute for Logic, Language and Computation (ILLC) at the University of Amsterdam. He is appointed at the School of Informatics at the University of Edinburgh supervised by Prof. Ivan Titov, and he is part of the EdinburghNLP group. Nicola’s work focuses on unstructured Machine Reading Comprehension also know as Question Answering.
19 Sep 2019	Seraphina Goldfarb-Tarrant (USC/ISI)	Practical Workshop on AllenNLP Time: 11:00am-12:00pm Location: 6th floor conference room: #689 Online Meeting Recording: https://bluejeans.com/s/OUQy4 Slides: here Abstract: This is a practical talk that highlights some of the areas where AllenNLP (the NLP research library) excels, and gives a look at new features being released. It will focus on the ways that use of the library can enable reproducibility, interpretability, and visualizations. Seraphina Goldfarb-Tarrant is a Research Programmer at ISI, doing work in NLG. She finished her Master's at the University of Washington, and is beginning her PhD at the University of Edinburgh.
05 Sep 2019	Denis Emelin (University of Edinburgh) and Prince Wang	More than the sum of their parts: Translating idioms without destroying their meaning Time: 11:00am-12:00pm Location: 6th floor conference room: #689 Online Meeting Recording: https://bluejeans.com/s/8Lu7w/ Abstract: Translating idioms is hard. As low-frequency linguisticevents with a non-compositional meaning, idiomatic expressions are atodds with contemporary neural machine translation methods.Accordingly, the literal translation of idiomatic phrases which failsto preserve their semantic content represents an often observedfailure case in NMT models. To facilitate future work on idiomtranslation, the current project sets out to compile a large-coverage,multilingual corpus of parallel sentences containing idiomaticexpressions, augmented with their respective monolingual definitions.With this resource in hand, we next aim to propose models which caneffectively exploit idiom definitions to avoid literal translationerrors. As part of the evaluation of the constructed corpus, wedemonstrate that idioms continue to pose a veritable challenge forstate-of-the-art NMT models. Denis is a second-year PhD candidate at the University ofEdinburgh, advised by Dr. Rico Sennrich. His background is in machinetranslation, natural language understanding, and linguistics.
15 Aug 2019	Xusen Yin (USC/ISI)	Comprehensible context-driven text game playing Time: 11:00am-12:00pm Location: 6th floor conference room: #689 Online Meeting Recording: https://bluejeans.com/s/MNCuL/ Abstract: In order to train a computer agent to play a text-based computer game, we must represent each hidden state of the game. A Long Short-Term Memory (LSTM) model running over observed texts is a common choice for state construction. However, a normal Deep Q-learning Network (DQN) for such an agent requires millions of steps of training or more to converge. As such, an LSTM-based DQN can take tens of days to finish the training process. Though we can use a Convolutional Neural Network (CNN) as a text-encoder to construct states much faster than the LSTM, doing so without an understanding of the syntactic context of the words being analyzed can slow convergence. In this paper, we use a fast CNN to encode position- and syntax-oriented structures extracted from observed texts as states. We additionally augment the reward signal in a universal and practical manner. Together, we show that our improvements can not only speed up the process by one order of magnitude but also learn a superior agent. Xusen Yin is a 3rd-year Ph.D. student at USC/ISI, advised by Dr. Jonathan May.
08 Aug 2019	Ekaterina Shutova (University of Amsterdam)	Modelling the interplay of metaphor and emotion, and a peek at the underlying cognitive mechanisms Time: 11:00am-12:00pm Location: 10th floor conference room: CR# 1014 Multipurpose Room Online Meeting Recording: https://bluejeans.com/s/Mws0S/ Abstract: Besides making our thoughts more vivid and filling our communication with richer imagery, metaphor plays a fundamental structural role in our cognition, helping us to organise and project knowledge. For example, when we say "a well-oiled political machine", we view the concept of political system in terms of a mechanism and transfer inferences from the domain of mechanisms onto our reasoning about political processes. Much previous research on metaphor in linguistics and psychology suggests that metaphorical phrases tend to be more emotionally evocative than their literal counterparts. In this talk, I will present our recent work investigating the relationship between metaphor and emotion within a computational framework, by proposing the first joint model of these phenomena. We experiment with several multitask learning architectures for this purpose and demonstrate that metaphor identification and emotion prediction mutually benefit from joint learning, advancing the state of the art in both of these tasks.In the second half of the talk, I will discuss how general-purpose semantic representations can be used to better understand metaphor processing in the human brain. In a series of experiments, we evaluate a range of semantic models (word embeddings, compositional models, visual and multimodal models) in their ability to decode brain activity associated with reading of literal and metaphoric sentences. Our results point to interesting differences in the processing of metaphorical and literal language. Bio: Ekaterina Shutova is an Assistant Professor at the Institute for Logic, Language and Computation at the University of Amsterdam. Her research is in the area of natural language processing with a specific focus on computational semantics, figurative language processing, multilingual NLP and cognitively-driven semantics. Previously, she worked at the University of Cambridge Computer Laboratory and the International Computer Science Institute and the Institute for Cognitive and Brain Sciences at the University of California, Berkeley. She received her PhD in Computer Science from the University of Cambridge in 2011.
11 Jul 2019	Sandy LaTourrette (Northwestern University)	Learning from language: Intersections of infant and machine learning Time: 11:00am-12:00pm Location: 6th floor conference room: #689 Online Meeting: https://bluejeans.com/499369716 Abstract: Young infants and machine learning algorithms face many of the same fundamental challenges when learning language. Learners often must identify referents in complex scenes, determine the relevance of different object features, and extend labels from previously viewed referents to new ones. In this talk, I examine several ways that infants solve these problems. In some cases, our work reveals word-learning mechanisms that are specific to the infant learner, such as labels' influence on object representations. However, other word-learning mechanisms, like infants' capacity for semi-supervised learning, show striking similarities in the ways that infants and machines overcome the challenges of language learning. Both similarities and differences offer intriguing opportunities for mutually informative, interdisciplinary exchanges. Bio: Sandy LaTourrette is a 5th-year Ph.D. student in Cognitive Psychology at Northwestern University, advised by Dr. Sandra Waxman. He is a NSF Graduate Fellow, and his work focuses on the interactions of language learning and cognition across human development.
21 Jun 2019	Malihe Alikhani (Rutgers University)	Multimodal Communication: A Discourse Approach Time: 3:00-4:00pm Location: 6th floor conference room: #689 Online Meeting Recording: https://bluejeans.com/s/ypWYX/ Abstract: The integration of textual and visual information is fundamental to the way people communicate. My hypothesis is that despite the differences of the visual and linguistic communication, the two have similar intentional, inferential and contextual properties, which can be modeled with similar representations and algorithms. I present three successful case studies where natural language techniques provide a useful foundation for supporting user engagement with visual communication. Finally, I propose using these findings for designing interactive systems that can communicate with people using a broad range of appropriate modalities. Bio: Malihe Alikhani is a 4th year Ph.D. student in the department of computer science at Rutgers University, advised by Prof. Matthew Stone. She is pursuing a certificate in cognitive science through the Rutgers Center for Cognitive Science and holds a BA and MA in mathematics. Her research aims at teaching machines to understand and generate multimodal communication. She is the recipient of the fellowship award for excellence in computation and data sciences from Rutgers Discovery Informatics Institute in 2018.
30 May 2019	Manuel Rafael Ciosici (Aarhus University)	Quantifying the morphosyntactic content of Brown Clusters Time: 11:00am-12:00pm Location: 6th floor conference room: #689 Online Meeting Recording: https://bluejeans.com/s/diiPj/ Abstract: Brown and Exchange word clusters have long been successfully used as word representations in Natural Language Processing (NLP) systems. Their success has been attributed to their seeming ability to represent both semantic and syntactic information. Using corpora representing several language families, we test the hypothesis that Brown and Exchange word clusters are highly effective at encoding morphosyntactic information. Our experiments show that word clusters are highly capable of distinguishing Parts of Speech. We show that increases in Average Mutual Information, the clustering algorithms' optimization goal, are highly correlated with improvements in encoding of morphosyntactic information. Our results provide empirical evidence that downstream NLP systems addressing tasks dependent on morphosyntactic information can benefit from word cluster features. Manuel is a soon-to-graduate Ph.D. student at Aarhus University, Denmark. His research is focused on understanding the kinds of information encoded in word representations and how that information can be used in downstream NLP applications.
16 May 2019	Martha Palmer (University of Colorado Boulder )	The Blocks World Redux Time: 11:00am-12:00pm Location: 6th floor conference room: #689 Online Meeting Recording: https://bluejeans.com/s/ny0H5 Abstract: This talk will discuss some of the challenges arising from the Blocks World scenario in the DARPA Communicating with Computers program. The actions are very simple and concrete, such as "Add a block to the tower." However, even in this restricted world, getting the appropriate contextual interpretation of a sentence can be challenging, especially with respect to spatial relations. The talk will review the progress we have made so far on collecting useful data and attempting to achieve the goal of contextual interpretation. To do this we bring to bear many resources, ranging from AMR parsing to Jerry Hobbs's axiomatization of object and action definitions, to our recent merger of James Pustejovsky's Generative Lexicon (GL), and VerbNet (VN), i.e., GL-VN. A main focus of the talk will be the ways in which we are expanding AMR annotation to encompass spatial relations and also the recovery of implicit arguments. Both expansions play into the task of maintaining a discourse structure. The talk will conclude with both short term and long term goals for our collaborations on CwC, with respect to both AMR and GL-VN. Bio: Martha Palmer is the Helen & Hubert Croft Endowed Professor of Engineering in the Computer Science Department, and an Arts & Sciences Professor of Distinction in the Linguistics Department, at the University of Colorado, with a split appointment. She is also an Institute of Cognitive Science Faculty Fellow, a co-Director of CLEAR and an Association of Computational Linguistics (ACL) Fellow. She won an Outstanding Graduate Advisor 2014 Award, a Boulder Faculty Assembly 2010 Research Award and was the Director of the 2011 Linguistics Institute in Boulder, CO. Her research is focused on capturing elements of the meanings of words that can comprise automatic representations of complex sentences and documents in English, Chinese, Arabic, Hindi, and Urdu, funded by DARPA and NSF. A more recent focus is the application of these methods to biomedical journal articles and clinical notes, funded by NIH, and the geo- and bio-sciences, funded by NSF. She co-edits LiLT, Linguistic Issues in Language Technology, and has been a co-editor of the Journal of Natural Language Engineering and on the CLJ Editorial Board. She is a past President of ACL, past Chair of SIGLEX, was the Founding Chair of SIGHAN, and has well over 250 peer-reviewed publications.
15 May 2019	Dan Roth (University of Illinois at Urbana-Champaign) Co-op with USC/ISI AI Seminar	Natural Language Understanding with Incidental Supervision Time: 11:00am-12:00pm Location: CR#1014 Online Meeting Recording: https://bluejeans.com/s/1v5oL/ Abstract: The fundamental issue underlying natural language understanding is that of semantics - there is a need to move toward understanding natural language at an appropriate level of abstraction, beyond the word level, in order to support knowledge extraction, natural language understanding, and communication.Machine Learning and Inference methods have become ubiquitous in our attempt to induce semantic representations of natural language and support decisions that depend on it. However, learning models that support high level tasks is difficult, partly since most they are very sparse and generating supervision signals for it does not scale. Consequently, making natural language understanding decisions, which typically depend on multiple, interdependent, models, becomes even more challenging. I will describe some of our research on developing machine learning and inference methods in pursue of understanding natural language text. My focus will be on identifying and using incidental supervision signals in pursuing a range of semantics tasks, and I will point to some of the key challenges as well some possible directions for studying this problem from a principled perspective. Dan Roth is the Eduardo D. Glandt Distinguished Professor at the Department of Computer and Information Science, University of Pennsylvania, and a Fellow of the AAAS, the ACM, AAAI, and the ACL.In 2017 Roth was awarded the John McCarthy Award, the highest award the AI community gives to mid-career AI researchers. Roth was recognized "for major conceptual and theoretical advances in the modeling of natural language understanding, machine learning, and reasoning."Roth has published broadly in machine learning, natural language processing, knowledge representation and reasoning, and learning theory, and has developed advanced machine learning based tools for natural language applications that are being used widely. Until February 2017 Roth was the Editor-in-Chief of the Journal of Artificial Intelligence Research (JAIR).Prof. Roth received his B.A Summa cum laude in Mathematics from the Technion, Israel, and his Ph.D. in Computer Science from Harvard University in 1995.
18 Apr 2019	Kenton Murray (University of Notre Dame)	Learning Neural Network Hyperparameters for Machine Translation Time: 11:00am-12:00pm Location: 6th floor conference room: #689 Online Meeting Recording: https://bluejeans.com/s/6_8UO Abstract: In recent years, Neural Networks have reached state-of-the-art performance in a variety of NLP tasks, including Machine Translation. However, these methods are very sensitive to selecting optimal hyperparameters. Frequently this is done by large scale experimentation -- often through grid or random searches. However, this is computationally expensive and time consuming. In this talk, I will present a few methods for learning hyperparameters during the training process. Thus, instead of training multiple networks with different hyperparameters, we only need to train one network without large grid search experiments. Our methods yield comparable, and often better, results, but at a faster experimentation rate. Bio: Kenton Murray is a 5th year PhD Candidate at the University of Notre Dame working with David Chiang on methods for improving Neural Machine Translation for Low-Resource and Morphologically Rich Language Pairs. Prior to ND, he was a Research Associate at the Qatar Computing Research Institute focusing on Arabic Machine Translation. He holds a Master's in Language Technologies from Carnegie Mellon University and a Bachelor's in Computer Science from Princeton University.
07 Mar 2019	Rebecca Hwa (University of Pittsburgh)	Separating the Sheep from the Goats: On Recognizing the Literal and Figurative Usages of Idioms Time: 11:00am-12:00pm Location: 6th floor conference room: #689 Online Meeting Recording: There is no recording of this talk. Abstract: Typically, we think of idioms as colorful expressions whose literal interpretations don't match their underlying meaning. However, many idiomatic expressions can be used either figuratively or literally, depending on their contexts. In this talk, we survey both supervised and unsupervised methods for training a classifier to automatically distinguish usages of idiomatic expressions. We will conclude with a discussion about some potential applications. Bio: Rebecca Hwa is an Associate Professor in the Department of Computer Science at the University of Pittsburgh. Her recent research focuses on understanding persuasion from a computational linguistics perspective. Some of her recent projects include: modeling student behaviors in revising argumentative essays, identifying symbolisms in visual rhetorics, and understanding idiomatic expressions. Dr Hwa is a recipient of the NSF CAREER Award. Her work has also been supported by NIH and DARPA.
09 Nov 2018	Waleed Ammar (AI2)	Taming the scientific literature: progress and challenges Time: 3:00 pm - 4:00 pm Location: 6th floor conference room: #689 Online Meeting Recording: https://bluejeans.com/s/vEMME/ Abstract: The magnitude and growth of the scientific literature can be overwhelming even for experienced researchers. Three years ago, the Allen Institute for Artificial Intelligence launched semanticscholar.org to understand and address the information needs of researchers. In this talk, I start by highlighting some of the lessons we learned from our 2M monthly actively users, and some of the key differences between academic and industrial research. Then, I describe three complementary directions for analyzing the scientific literature at scale. In the first direction, we extract meaningful structures such as entities, relationships and figures. In the second direction, we establish connections between different artifacts in the literature to facilitate navigation and enable complex querying capabilities. In the third direction, we try to address controversial questions in the literature by quantifying observable attributes at a large scale. I conclude with a short list of under-explored research opportunities with high potential in this domain. Bio: Waleed Ammar is a senior research scientist at the Allen Institute for Artificial Intelligence where he leads the research efforts in the semantic scholar project. He is interested in developing NLP models with practical applications, especially in the scientific and medical domains and other data-constrained scenarios. Before pursuing his PhD at Carnegie Mellon University, Waleed an engineer at the machine translation group at MSR, a web developer at eSpace technologies, and a teaching assistant at Alexandria University. Waleed co-hosts the NLP highlights podcast with Matt Gardner.
01 Nov 2018	Robin Jia (Stanford)	Exposing Brittleness in Reading Comprehension Systems Time: 11:00am-12:00pm Location: 6th floor conference room #689 Abstract: Reading comprehension systems that answer questions over a context passage can often achieve high test accuracy, but they are frustratingly brittle: they often rely heavily on superficial cues, and therefore struggle on out-of-domain inputs. In this talk, I will describe our work on understanding and challenging these systems. First, I will show how to craft adversarial reading comprehension examples by adding irrelevant distracting text to the context passage. Next, I will present the newest version of the SQuAD dataset, SQuAD 2.0, which tests whether models can distinguish answerable questions from similar but unanswerable ones. Finally, I will share some observations from our recent attempts to use reading comprehension systems as a natural language interface for building other NLP systems. Bio: Robin Jia is a fifth-year PhD student advised by Percy Liang at Stanford University. He is an NSF Graduate Fellow, and has received Outstanding Paper and Best Short Paper Awards from EMNLP and ACL, respectively.
25 Oct 2018	Scott Yih (AI2)	Conversational Question Answering Time: 11:00am-12:00pm Location: 6th Floor Conference Room - 689 Online Meeting Recording: https://bluejeans.com/s/jIoDx/ Abstract: Humans seek information in a conversational manner, by asking follow-up questions for additional information based on what they have already learned. In this talk, I will first introduce the task of sequential question answering [1], which aims to fulfill user's information need by answering a series of simple, but interdependent questions regarding a given table. Treating this task as a semantic parsing problem, we developed a policy shaping mechanism that incorporates prior knowledge and an update equation that generalizes three different families of learning algorithms [2]. After that, I will then talk briefly about QuAC, a new dataset for Question Answering in Context. QuAC targets the scenario where the information source is unstructured text [3] and thus can be viewed as a conversational machine comprehension task. New, unpublished model ideas will also be discussed. [1] Mohit Iyyer, Wen-tau Yih and Ming-Wei Chang. Search-based Neural Structured Learning for Sequential Question Answering. ACL-2017. [2] Dipendra Misra, Ming-Wei Chang, Xiaodong He and Wen-tau Yih. Policy Shaping and Generalized Update Equations for Semantic Parsing from Denotations. EMNLP-2018.[3] Eunsol Choi, He He, Mohit Iyyer, Mark Yatskar, Wen-tau Yih, Yejin Choi, Percy Liang and Luke Zettlemoyer. QuAC: Question Answering in Context. EMNLP-2018. Bio: Scott Wen-tau Yih is a Principal Research Scientist at Allen Institute for Artificial Intelligence (AI2). His research interests include natural language processing, machine learning and information retrieval. Yih received his Ph.D. in computer science at the University of Illinois at Urbana-Champaign. His work on joint inference using integer linear programming (ILP) has been widely adopted in the NLP community for numerous structured prediction problems. Prior to joining AI2, Yih has spent 12 years at Microsoft Research, working on a variety of projects including email spam filtering, keyword extraction and search & ad relevance. His recent work focuses on continuous representations and neural network models, with applications in knowledge base embedding, semantic parsing and question answering. Yih received the best paper award from CoNLL-2011, an outstanding paper award from ACL-2015 and has served as area co-chairs (HLT-NAACL-12, ACL-14, EMNLP-16,17,18), program co-chairs (CEAS-09, CoNLL-14) and action/associated editors (TACL, JAIR) in recent years. He is also a co-presenter for several tutorials on topics including Semantic Role Labeling (NAACL-HLT-06, AAAI-07), Deep Learning for NLP (SLT-14, NAACL-HLT-15, IJCAI-16), NLP for Precision Medicine (ACL-17, AAAI-18).
12 Oct 2018	Ndapa Nakashole (UC San Diego)	Mapping Functions for Multilingual Word Embeddings Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Online Meeting Recording: https://bluejeans.com/s/KMXdx/ Abstract: Inducing multilingual word embeddings bylearning a linear map between embeddingspaces of different languages achieves remarkableaccuracy on related languages. However,accuracy drops substantially when translatingbetween distant languages. Giventhat languages exhibit differences in vocabulary,grammar, written form, or syntax, onewould expect that embedding spaces of differentlanguages have different structures especiallyfor distant languages. I will present our work on understandingthe behavior of linear maps learned by word translation methods.Additionally, I will present some initial solutions to the shortcomings of such linear maps. Bio :Ndapa Nakashole is an Assistant Professor in the Department of Computer Science and Engineering at the University of California, San Diego. Prior to UCSD, she was a Postdoctoral Fellow in the Machine Learning Department at Carnegie Mellon University. She obtained her PhD from Saarland University, Germany, for research carried out at the Max Planck Institute for Informatics in Saarbrücken. She completed undergraduate studies in Computer Science at the University of Cape Town, South Africa.
04 Oct 2018	Siva Reddy (Stanford University)	CoQA: A Conversational Question Answering Challenge Time: 11:00 am - 12:00 pm Location: 6th Floor Conference Room - 689 Online Meeting Recording: https://bluejeans.com/s/iHu_F/ Abstract: Humans gather information by engaging in conversations involving a series of interconnected questions and answers. For machines to assist in information gathering, it is therefore essential to enable them to answer conversational questions. In this talk, I will present our work on CoQA, a novel dataset for building Conversational Question Answering systems. CoQA contains 127k questions with answers, obtained from 8k conversations about text passages from seven diverse domains. The questions are conversational, and the answers are free-form text with their corresponding evidence highlighted in the passage. We analyze CoQA in depth and show that conversational questions have challenging phenomena not present in existing reading comprehension datasets, e.g., coreference and pragmatic reasoning. We evaluate strong conversational and reading comprehension models on CoQA. The best system obtains an F1 score of 65.1%, which is 23.7 points behind human performance (88.8%), indicating there is ample room for improvement. We launch CoQA as a challenge to the community at https://stanfordnlp.github.io/coqa/ Bio : Siva Reddy is a postdoc in Computer Science at Stanford University working with Prof. Christopher Manning. His research focuses on enabling natural communication between humans and machines. Prior to the postdoc, he was a Google PhD Fellow at the University of Edinburgh under the supervision of Prof. Mirella Lapata and Prof. Mark Steedman.
14 Sep 2018	Daniel Fried (UC Berkeley)	Pragmatic Models for Generating and Following Grounded Instructions Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Online Meeting Recording: https://bluejeans.com/s/AScm4 Abstract: To generate language, we model what to say, why not also model how listeners will react? We show how pragmatic inference can be used to both generate and interpret natural language instructions for complex, sequential tasks. Our pragmatics-enabled models reason about how listeners will react upon hearing instructions, and reason counterfactually about why speakers produced the instructions they did. We find that this inference procedure improves state-of-the-art listener models (at correctly interpreting human instructions) and speaker models (at generating instructions correctly interpreted by humans) in diverse settings, including navigating through real-world indoor environments. Bio: Daniel Fried is a PhD student at UC Berkeley, working with Dan Klein on grounded semantics and structured prediction in natural language processing. Previously, he received a BS from the University of Arizona and an MPhil from the University of Cambridge. His work has been supported by a Churchill Scholarship, NDSEG Fellowship, Huawei / Berkeley AI Fellowship, and Tencent Fellowship.
07 Sep 2018	Vivek Srikumar (The University of Utah)	Natural Language Processing in the Wild: Opportunities & Challenges Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Online Meeting Recording: https://bluejeans.com/s/lxaUI/ Abstract: Natural language processing (NLP) sees potential applicability in abroad array of user-facing applications. To realize this potential,however, we need to address several challenges related torepresentations, data availability and scalability.In this talk, I will discuss these concerns and how we may overcomethem. First, as a motivating example of NLP's broad reach, I willpresent our recent work on using language technology to improvemental health treatment. Then, I will focus on some of thechallenges that need to be addressed, with a specific focus onscalability. The motivating question is: How can we systematicallyspeed up the entire NLP pipeline without sacrificing accuracy? Astwo concrete answers to this question, I will describe our recentresults that show techniques for rethinking feature extraction andinference to make trained classifiers significantly faster. Bio: Vivek Srikumar is an assistant professor in the School of Computingat the University of Utah. He obtained his Ph.D. from the Universityof Illinois at Urbana-Champaign in 2013 and was a post-doctoralscholar at Stanford University. His research lies in the areas ofnatural learning processing and machine learning and has primarilybeen driven by questions arising from the need to learn structuredrepresentations of text using little or indirect supervision and toscale NLP to large problems. His work has been published in variousAI, NLP and machine learning venues and received the best paperaward at EMNLP 2014. His work has been supported by grants andawards from NSF, BSF, Google and Intel.
24 Aug 2018	Mozhdeh Gheini, Xinyu Wang (ISI intern)	T1. Constraints for Transfer Learning for Neural Machine Translation T2. Say Yes-and: Building a Specialized Corpus for Digital Improvised Comedy Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Can we detect the parts responsible for a generic behavior in a neural model to transfer it to another? In this talk, we first see why this might be a good idea, especially for low-resource machine translation. Then we focus on our approach to isolating a behavior. In our case, we specifically focus on coverage during machine translation. We present our results across different languages that show how neural models try to ensure coverage. Bio: Mozhdeh Gheini is a last-semester Computer Science master's student at USC Viterbi School of Engineering. At ISI NLP Group, she works on improving neural low-resource machine translation under the supervision of Jonathan May. She will be applying for Ph.D. programs this Fall.Abstract: In improvised comedy, saying "yes, and.. " is a rule-of-thumb that suggests that one person should accept the other person's offer (yes), and then add related information on top of that (and). Collecting a "yes, and.." corpus is not only helpful for building an improv agent, but can also be used for building conversational skill training tool, improving a dialogue system, etc. I will discuss the methods we have used for building such a dataset, data we have got so far and future considerations. Bio: Xinyu is a 2018 summer intern working with Dr. Jonathan May and Dr. Nanyun Peng on computerized improvised comedy. She will be joinging the Language Technologies Institute at Carnegie Mellon University in 2018 fall.
17 Aug 2018	Ronald Cardenas (ISI intern)	Decipherment for Universal Language Tools: a case study for Unsupervised Part of Speech Induction Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Unsupervised Part of Speech induction can be viewed as a two-steps task.The first step infers a sequence of states, while the second step maps this sequence to anactual Part-of-Speech sequence at training or testing time.Hence, this last step requires reference tagged data, a luxury low-resource target languages might not have.In this talk, we present an alternative approach to the second step, modeling it as a decipherment problemin which the ciphered text is the sequence of states and the original text we want to recover is the POS sequence.This approach requires no reference data in the target language and allows to leverage POS sequences inmuch richer languages.Our experiments show that our approach benefits the most from simple strategies for inferring state sequences, such as Brown clustering.This allow our method to obtain reasonable performance in low-resource and limited-time scenarios. Bio: Ronald Cardenas is a Master's student in the Language and Communication Technologies programme at Charles University in Prague. His research interests span morphological analysis and parsing of low-resource languages.At ISI, he works with Jonatan May on developing universal language tools.
10 Aug 2018	James Mullenbach (ISI intern)	Reasoning about objects, their components, and their descriptors Time: 3:00 pm - 4:00 pm Location: 6th Floor Conference Room [689] Abstract: How do adjectives project from a noun to its parts and other aspects? If a motorcycle is red, are its wheels red? Is a sharp knife’s handle sharp? Questions like this are common sense for humans, using our understanding of the world, but difficult for computers. I will describe our process for curating and annotating a large dataset consisting of related object pairs and adjectives, and a set of experiments that aim to discover the extent to which modern approaches can learn these relationships from purely textual sources. Bio: James is a Master’s Student in Computer Science at the Georgia Institute of Technology, where he works on machine learning for healthcare using written electronic health record notes. At ISI, he is working with Jonathan May and Nanyun Peng on building a dataset and models for textual commonsense reasoning. He aims to work on NLP and ML in industry for a year or so before applying for PhD programs.
27 Jul 2018	Matt Gardner (AI2)	A Tale of Two Question Answering Systems Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: The path to natural language understanding goes through increasingly challenging question answering tasks. I will present research that significantly improves performance on two such tasks: answering complex questions over tables, and open-domain factoid question answering. For answering complex questions, I will present a type-constrained encoder-decoder neural semantic parser that learns to map natural language questions to programs. For open-domain factoid QA, I will show that training paragraph-level QA systems to give calibrated confidence scores across paragraphs is crucial when the correct answer-containing paragraph is unknown. I will conclude with some thoughts about how to combine these two disparate QA paradigms, towards the goal of answering complex questions over open-domain text. Bio: Matt Gardner is a research scientist at the Allen Institute for Artificial Intelligence (AI2), where he has been exploring various kinds of question answering systems. He is the lead designer and maintainer of the AllenNLP toolkit, a platform for doing NLP research on top of pytorch. Matt is also the co-host of the NLP Highlights podcast, where, with Waleed Ammar, he gets to interview the authors of interesting NLP papers about their work. Prior to joining AI2, Matt earned a PhD from Carnegie Mellon University, working with Tom Mitchell on the Never Ending Language Learning project.
20 Jul 2018	Wei-Lun (Harry) Chao (USC --> OSU)	Visual Question Answering: the Good, the Bad, and the Ugly Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Visual question answering (Visual QA) requires comprehending and reasoning with both visual and language information, a characteristic ability that AI should strive to achieve. Merely in the past three years, over a dozen datasets have been released, together with many learning-based models that have been narrowing the gap between the humans’ performance and the machines’. On one popular dataset VQA, the state-of-the-art model achieves 71.4% accuracy, just 17% shy of that by humans.While seemingly remarkable, it needs a deeper investigation on what knowledge the machine actually learns—does it understand the multi-modal information? Or it relies on and over-fits to the incidental dataset statistics. Moreover, current experimental setups mainly focus on training and testing within the same dataset. It is unclear how the learned model can be applied to the real environment where both the visual and language data might have mismatch.In this talk, I will present our recent studies to answer these questions. We show that the dataset design has a significant impact on what a model learns. Specifically, the resulting model can ignore the visual information, the question, or both while still doing well on the task. We thus propose automatic procedures to remedy such design deficiencies. We then show that the mismatch in language hinders transferring a learned model across datasets. To this end, we develop a domain adaptation algorithm for Visual QA to facilitate knowledge transfer. Finally, I will present a probabilistic framework of Visual QA algorithms to effectively leverage the answer semantics, drastically increasing the transferability. I will conclude the talk with future directions to advance Visual QA. Bio: Wei-Lun (Harry) Chao is a Computer Science PhD candidate at University of Southern California, working with Fei Sha. His research interests are in machine learning and its applications to computer vision, artificial intelligence, and health care. His recent work has focused on transfer learning toward vision and language understanding in the wild. His earlier research includes work on probabilistic inference, structured prediction for video summarization, and face understanding. He will be joining The Ohio State University as an assistant professor in 2019 Fall, following a one-year postdoc at Cornell University.
22 Jun 2018	Rui Yan (Peking University)	Recent Advances and Challenges on Human-Computer Conversational Systems Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Automatic human-computer conversational systems have attracted great attention from both industry and academia. Intelligent products such as XiaoIce (by Microsoft) have been released, while tons of Artificial Intelligence companies have been established. We see that the technology behind the conversational systems is accumulating and now open to the public gradually. With the investigation of researchers, conversational systems are more than scientific fictions: they become real. I would review the recent development of human-computer conversational systems, especially the significant changes brought by deep learning techniques. In the meanwhile, I would share some work conducted by our group. Bio: Dr. Rui Yan is an assistant professor at Peking University, an adjunct professor in Central China Normal University and Central University of Finance and Economics, and he was a Senior Researcher at Baidu Inc. He has investigated several open-domain conversational systems and dialog systems in vertical domains. Till now he has published more than 50 highly competitive peer-reviewed papers. He serves as a (senior) program committee member of several top-tier venues (such as KDD, SIGIR, ACL, WWW, IJCAI, AAAI, CIKM, EMNLP).
11 May 2018	Yulia Tsvetkov (CMU)	Towards Flexible but Controllable Language Generation Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: To enable naturalistic, context-aware language generation, the underlying models must be flexible but controllable. They must be flexible enough to account for the rich linguistic diversity of data that the model generates and conditions on. On the other hand, generation must be controlled, to lexicalize the same meaning differently, depending upon the social and the situational context. I'll present model-based approaches to multilingual language modeling and open-vocabulary machine translation, aiming at making language generation more flexible by relaxing the (unreasonable but prevalent in the literature) assumption that a model's vocabulary is constrained to a particular set of most frequent words in a particular language. Then, I'll present an approach to controllable text generation that modulates social variables in generated text. I’ll conclude with an overview of ongoing research projects. Bio: Yulia Tsvetkov is an assistant professor in the Language Technologies Institute at Carnegie Mellon University. Her research interests lie at or near the intersection of natural language processing, machine learning, linguistics, and social science. Her current research projects focus on multilinguality (e.g., open-vocabulary machine translation, polyglot models, entrainment in code-switching), controllable text generation, automated negotiation, and NLP for social good (e.g., identification of microaggressions and dehumanization in online interactions, identification of misinformation and agenda-setting in news, predicting scientific misconduct). Prior to joining LTI, Yulia was a postdoc in the department of Computer Science at Stanford University; she received her PhD from Carnegie Mellon University.
04 May 2018	Marjan Ghazvininejad (ISI)	Neural Creative Language Generation (PhD Defense Practice Talk) Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Abstract: Natural language generation (NLG) is a well-studied and still very challenging field in natural language processing. One of the less studied NLG tasks is the generation of creative texts such as jokes, puns, or poems. Multiple reasons contribute to the difficulty of research in this area. First, no immediate application exists for creative language generation. This has made the research on creative NLG extremely diverse, having different goals, assumptions, and constraints. Second, no quantitative measure exists for creative NLG tasks. Consequently, it is often difficult to tune the parameters of creative generation models and drive improvements to these systems. Lack of a quantitative metric and the absence of a well-defined immediate application makes comparing different methods and finding the state of the art an almost impossible task in this area. Finally, rule-based systems for creative language generation are not yet combined with deep learning methods. Rule-based systems are powerful in capturing human knowledge, but it is often too time-consuming to present all the required knowledge in rules. On the other hand, deep learning models can automatically extract knowledge from the data, but they often miss out some essential knowledge that can be easily captured in rule-based systems.In this work, we address these challenges for poetry generation, which is one of the main areas of creative language generation. We introduce password poems as a new application for poetry generation. These passwords are highly secure, and we show that they are easier to recall and preferable compared to passwords created by other methods that guarantee the same level of security. Furthermore, we combine finite-state machinery with deep learning models in a system for generating poems for any given topic. We introduce a quantitative metric for evaluating the generated poems and build the first interactive poetry generation system that enables users to revise system generated poems by adjusting style configuration settings like alliteration, concreteness and the sentiment of the poem. The system interface also allows users to rate the quality of the poem. We collect users’ rating for poems with various style settings and use them to automatically tune the system style parameters. In order to improve the coherence of generated poems, we introduce a method to borrow ideas from existing human literature and build a poetry translation system. We study how poetry translation is different from translation of non-creative texts by measuring the language variation added during the translation process. We show that humans translate poems much more freely compared to general texts. Based on this observation, we build a machine translation system specifically for translating poetry which uses language variation in the translation process to generate rhythmic and rhyming translations. Bio: Marjan Ghazvininejad is a Ph.D. student at ISI working with Professor Kevin Knight.
27 Apr 2018	Jay Pujara (ISI)	Extracting and Aligning Quantitative Data with Text Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Quantitative data, such as time series and numerical attribute data, often play a crucial role in understanding the world and validating factual statements. Unfortunately, quantitative datasets are often expressed in diverse formats that exhibit significant variation, posing difficulties to machine reading approaches. Furthermore, the scant context that accompanies these data often makes it difficult to relate the quantitative data with broader ideas. Finally, the vast amount of quantitative data make it difficult for humans to find, understand, or access. In this talk, I highlight my recent work, which focuses on developing general approaches to extracting quantitative data from structured sources, creating high-level descriptions of these sources, and aligning quantitative data with textual and ontological labels. Bio: Jay Pujara is a research scientist at the University of Southern California's Information Sciences Institute whose principal areas of research are machine learning, artificial intelligence, and data science. He completed a postdoc at UC Santa Cruz, earned his PhD at the University of Maryland, College Park and received his MS and BS at Carnegie Mellon University. Prior to his PhD, Jay spent six years at Yahoo! working on mail spam detection, and he has also worked at Google, LinkedIn and Oracle. Jay is the author of over thirty peer-reviewed publications and has received three best paper awards for his work. He is a recognized authority on knowledge graphs, and has organized the Automatic Knowledge Base Construction (AKBC) and Statistical Relational AI (StaRAI) workshops, presented tutorials on knowledge graph construction at AAAI and WSDM, and had his work featured in AI Magazine. For more information, visit https://www.jaypujara.org
20 Apr 2018	Mark Yatskar (AI2)	Language as a Scaffold for Visual Recognition Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: In this talk we propose to use natural language as a guide for what people can perceive about the world from images and what ultimately machines should aim to see as well. We discuss two recent structured prediction efforts in this vein: scene graph parsing in Visual Genome, a framework derived from captions, and visual semantic role labeling in imSitu, a formalism built on FrameNet and WordNet. In scene graph parsing, we examine the problem of modeling higher order repeating structure (motifs) and present new state-of-the-art baselines and methods. We then look at the problem semantic sparsity in visual semantic role labeling: infrequent combinations of output semantics are frequent. We present new compositional and data-augmentation methods for dealing with this challenge, significantly improving on prior work. Bio: Mark Yatskar is a post-doc at the Allen Institute for Artificial Intelligence and recipient of their Young Investigator Award. His primary research is in the intersection of language and vision, natural language generation, and ethical computing. He received his Ph.D. from the University of Washington with Luke Zettlemoyer and Ali Farhadi and in 2016 received the EMNLP best paper award and his work has been featured in Wired and the New York Times.
13 Apr 2018	Yuanhang Su (USC)	Finding memory in time Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: For a large number of natural language processing (NLP) problems, we are concerned with finding semantic patterns from input sequences. In recurrent neural network (RNN) based approach, such pattern is “encoded” in a vector called hidden state. Since Elman’s “Finding structure in time” published in 1990, it has long been believed that the “magic power” of RNN’s memory, which is enclosed inside the hidden state, can handle very long sequences. Yet besides some experimental observations, there is no formal definition of RNN’s memory, let alone a rigid mathematical analysis of how RNN’s memory forms.This talk will focus on understanding memory from two viewpoints. The first viewpoint is that memory is a function that maps certain elements in the input sequences to the current output. Such definition, for the first time in literature, allows us to do detailed analysis of the memory of simple RNN (SRN), long short-term memory (ELSTM), and gated recurrent unit (GRU). It also opens the door for further improving the existing RNN basic models. The end results are the proposal of a new basic RNN model called extended LSTM (ELSTM) with outstanding performance for complex language tasks, and a new macro RNN model called dependent bidirectional RNN (DBRNN) with smaller cross entropy than bidirectional RNN (BRNN) and encoder-decoder (enc-dec) models.The second viewpoint is that memory is a compact representation of sparse sequential data. From this perspective, the process of generating hidden state of RNN is simply dimension reduction. Thus, method like principal component analysis (PCA) which does not require labels for training becomes attractive. However, there are two known problems in implementing PCA for NLP problems: the first is computational complexity; the second is vectorization of sentence data for PCA. To deal with this problem, an efficient dimension reduction algorithm called tree -structured multi-linear PCA is proposed. Bio: Yuanhang Su received the dual B.S. degree in Electrical Engineering & Automation and Electronic & Electrical Engineering from University of Strathclyde, Glasgow, U.K. and Shanghai University of Electric Power, Shanghai, China, respectively in 2009, and the M.S. degree in Electrical Engineering from the University of Southern California, Los Angeles, CA, in 2010. From 2011 to 2015, he worked as image/video/camera software and algorithm engineer for a Los Angeles startup named Exaimage, Shanghai Aerospace Electronics Technology Institute in China and Huawei Technology in China consecutively. He joined MCL lab in 2016 spring, and is currently pursing his Ph.D. in computer vision, natural language processing and machine learning.
30 Mar 2018	Mohit Iyyer (AI2, UMass Amherst)	Generating Adversarial Examples with Syntactically Controlled Paraphrase Networks Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Many datasets for natural language processing problems lack linguistic variation, which hurts generalization of models trained on them. Recent research has shown that it is possible to break many learned models by evaluating them on adversarial examples, which are generated by manually introducing lexical, pragmatic, and syntactic variation to existing held-out examples from the data. Automating this process is challenging, as input semantics must be preserved in the face of potentially large sentence modifications. In this talk, I will focus specifically on syntactic variation in discussing our recent work on syntactically controlled paraphrase networks (SCPN) for adversarial example generation.Given a sentence and a target syntactic form (e.g., a constituency parse), an SCPN is trained to produce a paraphrase of the sentence with the desired syntax. We show it is possible to create training data for this task by first doing backtranslation at a very large scale, and then using a parser to label the syntactic transformations that naturally occur during this process. Such data allows us to train a neural encoder-decoder model with extra inputs to specify the target syntax. A combination of automated and human evaluations show that SCPNs generate paraphrases that almost always follow their target specifications without decreasing paraphrase quality when compared to baseline (uncontrolled) paraphrase systems. Furthermore, they are more capable of generating syntactically adversarial examples that both (1) "fool" pretrained models and (2) improve the robustness of these models to syntactic variation when used for data augmentation. Bio: Mohit Iyyer will be joining UMass Amherst as an assistant professor in Fall 2018. Currently, he is a Young Investigator at the Allen Institute of Artificial Intelligence; prior to that, he received a Ph.D. from the Department of Computer Science at the University of Maryland, College Park, advised by Jordan Boyd-Graber and Hal Daumé III. His research interests lie at the intersection of natural language processing and machine learning. More specifically, he focuses on designing deep neural networks for both traditional NLP tasks (e.g., question answering, language generation) and new problems that involve creative language (e.g., understanding narratives in novels). He has interned at MetaMind and Microsoft Research, and his research has won a best paper award at NAACL 2016 and a best demonstration award at NIPS 2015.
23 Feb 2018	Miriam Posner, Dave Shepard, and Andrew Wallace (UCLA)	Digital Humanities: Lots of Text-Based Corpora, Lots of Questions Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Digital humanities is a field that uses digital tools to explore humanities questions. That work can take many different forms, from maps to data visualization to video-based projects. In this talk, we’ll discuss humanities approaches to large-scale text analysis, with a focus on corpora that may be of interest to computer scientists. We’ll also talk about the distinctive ways that humanists approach text analysis, and some of the “live” questions in the field that might interest NLP researchers. Bio: Miriam Posner is an assistant professor at the UCLA School of Information. She’s also a digital humanist with interests in labor, race, feminism, and the history and philosophy of data. As a digital humanist, she is particularly interested in the visualization of large bodies of data from cultural heritage institutions, and the application of digital methods to the analysis of images and video. She is at work on two projects: the first on what “data” might mean for humanistic research; and the second on how multinational corporations are making use of data in their supply chains. Bio: David Shepard (UCLA) is Lead Academic Developer at UCLA’s Center for Digital Humanities. After receiving his PhD in English from UCLA in 2012, he coauthored the book HyperCities: Thick Mapping in the Digital Humanities and has worked on social media and text mining. His work focuses on large-scale analysis of social media in disasters. Bio: Andrew Wallace is a software developer in the UCLA digital library. He received his PhD in Cognitive Science from Brown University in 2011.
12 Feb 2018	Nima Pourdamghani(USC/ISI)	Non-traditional resources and improved tools for low-resource machine translation Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Thanks to massive training data, and powerful machine translation techniques, machine translation quality has reached acceptable levels for a handful of languages. However, for hundreds of other languages, translation quality decreases quickly as the size of the available training data becomes smaller. For languages with a few millions or less tokens of translation data (called low-resource languages in this dissertation) traditional machine translation technologies fail to produce understandable translations into English. In this work, I explore various non-traditional sources for improving low-resource machine translation. Bio: Nima Pourdamghani is a phd student at USC/ISI working with professor Kevin Knight. Nima's interests are natural language processing, and applications of machine learning in general. His phd thesis is on building tools to improve machine translation for hundreds of low-resource languages.
09 Feb 2018	Hongning Wang (University of Virginia)	Contextual Bandits in a Collaborative Environment Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Contextual bandit algorithms provide principled online learning solutions to find optimal trade-offsbetween exploration and exploitation with companion side-information. They have been extensivelyused in various important practical scenarios, such as display advertising and content recommendation.A common practice estimates the unknown bandit parameters pertaining to each user independently.This unfortunately ignores dependency among users and thus leads to suboptimal solutions, especiallyfor the applications that have strong social components.In this talk, I will introduce our newly developed collaborative contextual bandit algorithm, in which theadjacency graph of users is leveraged to share context and payoffs among neighboring users duringonline updating. We rigorously prove an improved upper regret bound of the proposed collaborativebandit algorithm comparing to conventional independent bandit algorithms. More importantly, we alsoprove that user dependency relation is only needed to be time-invariant, such that a sublinear upperregret bound is still achievable in such an algorithm. This enables online user dependency estimation.Extensive experiments on both synthetic and three large-scale real-world datasets verified theimprovement of our proposed algorithm against several state-of-the-art contextual bandit algorithms. Inaddition, I will also cover our recent progress in online matrix factorization, optimizing user long-term engagement, and bandit learning in a non-stationary environment. Bio: Dr. Hongning Wang is now an Assistant Professor in the Department of Computer Science at theUniversity of Virginia. He received his Ph.D. degree in computer science at the University of Illinois atChampaign-Urbana in 2014. His research generally lies in the intersection among machine learning, datamining and information retrieval, with a special focus on computational user intent modeling. His workhas generated over 40 research papers in top venues in data mining and information retrieval areas. Heis a recipient of 2016 National Science Foundation CAREER Award and 2014 Yahoo Academic CareerEnhancement Award.
08 Feb 2018	Manuel Ciosici(Aarhus University, Denmark)	Abbreviation Disambiguation and NLP Deployment in Industrial Settings Time: 11:00 am - 12:00 pm Location: Conference Room [689] Abstract: This talk will cover two topics. The first part will be a brief overview of Manuel's recent project in abbreviation disambiguation. Following, Manuel will give a brief overview of how various NLP methods are used in an industrial setting in a danish company that provides text analytics services for publishers such as Springer-Nature. Bio: Manuel is a 3rd year PhD student at Aarhus University in Denmark. His PhD is focused on applying Data Mining and Machine Learning on large collections of unstructured text documents with the goal of extracting and representing knowledge embedded in the documents.
19 Jan 2018	Ashish Vaswani, Jakob Uszkoreit, and Niki Parmar (Google Brain)	Attention Is All You Need Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data. Bio: Ashish Vaswani is Research Scientist at Google Brain, where he works with fun people on non-sequential generative models that seem to translate well and generate reasonable images of cars and faces. He's also interested in non-autoregressive models for generating structured outputs. Before Brain, he spent many wonderful years at ISI, first as a PhD student, working on fast training of neural language models and MDL inspired training of latent-variable models with David Chiang and Liang Huang, and later as a scientist. He misses his colleagues in LA but he prefers the weather in San Francisco. Bio: Jakob Uszkoreit is currently a member of the Google Brain research team. There, he works on neural networks generating text, images and other modalities in tasks such as machine translation or image super-resolution. Before joining Brain, Jakob led teams in Google Research and Search, developing neural network models of language that learn from weak supervision at very large scale and designing the semantic parser of the Google Assistant. Prior to that, he worked on Google Translate in its early years. Jakob received his MSc in Computer Science and Mathematics from the Technical University of Berlin in 2008. Bio: Niki Parmar is currently a Research Engineer in Google Brain, where she works on generative modeling for tasks across different modalities like Machine Translation, conditional Image generation and super-resolution. Previous to Brain, she worked within Google Research to experiment and evaluate models for Query Similarity and Question Answering used within Search. Niki recieved her Masters in Computer Science from USC before joining Google.
08 Dec 2017	Nasrin Mostafazadeh (BenevolentAI lab)	[Canceled] Language Comprehension & Language Generation in Eventful Contexts Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Building AI systems that can process user input, understand it, and generate an engaging and contextually-relevant output in response, has been one of the longest-running goals in AI. Humans use a variety of modalities, such as language and visual cues, to communicate. A major trigger to our meaningful communications are "events" and how they cause/enable future events. In this talk, I will present my research about language comprehension and language generation around events, with a major focus on commonsense reasoning, world knowledge, and context modeling. I will focus on multiple context modalities such as narrative, conversational, and visual. Finally, I will highlight my recent work on language comprehension in the biomedical domain for finding cures for major diseases. Bio: Nasrin Mostafazadeh is a senior research scientist at BenevolentAI labs. She recently got her PhD at the University of Rochester working with James Allen in conversational interaction and dialogue research group. During her PhD, she spent about a year at Microsoft and a summer at Google doing research on various NLP problems. Nasrin’s research focuses on language comprehension, mainly studying events to predict what happens next. She has developed models for tackling various research tasks for pushing AI toward deeper language understanding with applications ranging from story generation to vision & language. Recently, she has been working on language comprehension in the biomedical domain, with the goal of finding cures for major diseases such as cancer by leveraging millions of unstructured data.
20 Nov 2017	Margaret Mitchell (Google)	Algorithmic Bias in Artificial Intelligence: The Seen and Unseen Factors Influencing Machine Perception of Images and Language Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: The success of machine learning has surged, with similar algorithmic approaches effectively solving a variety of human-defined tasks. Tasks testing how well machines can perceive images and communicate about them have exposed strong effects of different types of bias, such as selection bias and dataset bias. In this talk, I will unpack some of these biases, and how they affect machine perception today. Bio: Margaret Mitchell is a Senior Research Scientist in Google's Research & Machine Intelligence group, working on artificial intelligence. Her research generally involves vision-language and grounded language generation, focusing on how to evolve artificial intelligence towards positive goals. This includes research on helping computers to communicate based on what they can process, as well as projects to create assistive and clinical technology from the state of the art in AI.
17 Nov 2017	Jonathan Gordon (USC/ISI)	Learning and Reading Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: In recent years, a dramatic increase in the availability of digital text has created challenges and opportunities for learning for both humans and machines. My talk will describe research on learning commonsense knowledge from text -- despite our Gricean imperative to write down only what other people wouldn't know -- and using this for reasoning about language and the world. It will also address helping people to learn scientific knowledge by using implicit structure in a proliferation of articles, books, online courses, and other educational resources. Bio: Jonathan Gordon is a postdoctoral researcher at the USC Information Sciences Institute, where he works with Jerry Hobbs and colleagues on the problems of learning and organizing knowledge from text. He completed a bachelor's degree in computer science at Vassar College and a Ph.D. in artificial intelligence at the University of Rochester, supervised by Lenhart Schubert.
10 Nov 2017	Anssi Yli-Jyrä	On Real-Time Graph Transducers Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: NLP research has been fluctuating between two extreme models of computation:finite computers and universal computers. Often a practical solution combinesboth of these two extremes because formally powerful models are simulated byphysical machines that approximate them. This is especially true for recurrentneural networks whose activation vector is the key to deeper understanding oftheir emergent finite-state behavior. However, we currently have only a veryloose characterization for the finite-state property in neural networks.In order to construct a hypothesis for a possible bottom-up organization of thestate-space of activation vectors of RNNs, I compare neural networks withbounded Turing machines and finite-state machines, and quote recent results onfinite state models for semantic graphs. These models enjoy the nice closureproperties of weighted finite-state machines.In the end of the talk, I sketch my vision for neural networks that performfinite-state graph transductions in real time. Such transductions would have avast variety of applications in machine translation and semantic informationretrieval involving big data. Anssi Yli-Jyrä has the titles of Adjunct Professor (Docent) in LanguageTechnology at the University of Helsinki and Life Member of Clare Hall College at the University of Cambridge. He is currently a PI and a ResearchFellow of the Academy of Finland in a project concerning universality offinite-state syntax. He has published a handbook on Hebrew and Greek morphemealignments in the Finnish Bible translation together with a group of DigitalHumanists, and then served the Finnish Electronic Library at CSC - IT Centre ofScience where he built an internet harvester and a search engine for the FinnishWWW. In 2005, he earned his PhD from the University of Helsinki and then worked as a coordinator for the Language Bank of Finland at CSC. There he contributed to pushing his employer to what is now known as the CLARIN EuropeanResearch Infrastructure Consortium. He became the first President of SIGFSM in 2009, after fostering and organizing FSMNLP conferences for several years. In 2012-2013, he served as a Subject Head of Language Technology in his home university before visiting the Speech Group at the Department of Engineering, Cambridge University. He has supervised theses and contributed to thetheoretical basis of Helsinki Finite-State Transducer (HFST) library. In hisown research, Yli-Jyrä constantly pursues unexplored areas, applyingfinite-state transducers to graphical language processing tasks such asautosegmental phonology, constraint interaction, and dependency syntax and neural semantics. He is a qualified teacher and interested in the occurrence offlow in agile programming and simultaneous translation.
03 Nov 2017	Kai-Wei Chang (UCLA)	Structured Predictions: Practical Advancements and Applications in Natural Language Processing Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Many machine learning problems involve making jointpredictions over a set of mutually dependent output variables. Thedependencies between output variables can be represented by astructure, such as a sequence, a tree, a clustering of nodes, or agraph. Structured prediction models have been proposed for problems ofthis type. In this talk, I will describe a collectionof results that improve several aspects of these approaches. Ourresults lead to efficient and effective algorithms for learning structuredprediction models, which, in turn, support weak supervision signals and improve training and evaluation speed.I will also discuss potential risks and challenges when using structured prediction models Bio: Kai-Wei Chang is an assistant professor in the Department ofComputer Science at the University of California, Los Angeles. Hehas published broadly in machine learning and natural language processing. Hisresearch has mainly focused on designing machine learning methods forhandling large and complex data. He has been involved in developingseveral machine learning libraries, including LIBLINEAR, VowpalWabbit, and Illinois-SL. He was an assistant professor at the Universityof Virginia in 2016-2017. He obtained his Ph.D. from the University ofIllinois at Urbana-Champaign in 2015 and was a post-doctoral researcher at Microsoft Research in 2016.Kai-Wei was awarded the EMNLP Best Long Paper Award (2017), KDDBest Paper Award (2010), and the Yahoo! Key Scientific Challenges Award(2011). Additional information is available at http://kwchang.net.
13 Oct 2017	Yangfeng Ji (University of Washington)	Context is Everything: From language modeling to language generation Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Contextual information is critical for language processing and generation. Particularly for large texts consisting of multiple sentences or paragraphs, how to capture the contextual information beyond sentence boundaries is important for building better language processing systems. This talk will discuss our recent effort on incorporating contextual information to language modeling and generation. It presents three models with each of them corresponds a specific linguistic phenomenon of context shared in written texts: (i) local context from preceding sentences; (ii) semantic and pragmatic relations between adjacent sentences; and (iii) evolving of entities (e.g., characters in novels) through coreference links in texts. The starting point of our model design is sentence-level recurrent neural network language models (RNNLMs). To capture these aspects of contextual information, we extend RNNLMs by either adding extra connections among existing network components, or adding dedicated components particularly to encode specific linguistic information. Evaluation results show that these models outperforms strong baselines and prior work language modeling tasks. Their ability of capturing contextual information is also verified by the quantitative evaluation on each corresponding task, such as identifying the relation between sentences, and resolving coreference ambiguity. Qualitative analysis is also included to demonstrate the ability of these models for text generation. Bio: Yangfeng Ji is a postdoc researcher at University of Washington working with Noah Smith. His research interests lie in the interaction of natural language processing and machine learning. He is interested in designing machine learning models and algorithms for language processing, and also fascinated by how linguistic knowledge helps build better learning models. He completed his Ph.D. in Computer Science at Georgia Institute of Technology in 2016, advised by Jacob Eisenstein. He was one of the area co-chairs on Discourse and Pragmatics in ACL 2017.
08 Sep 2017	Leon Cheung, Nelson Liu (ISI Intern)	1)Improving Low Resource Neural Machine Translation 2)Language-Independent Translation of Out-of-Vocabulary Words Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: 1)Statistical models have outperformed neural models in machinetranslation, until recently, with the introduction of the sequence tosequence neural model. However, this model's performance suffers greatlywhen starved of bilingual parallel data. This talk will discuss severalstrategies that try to overcome this low-resource challenge, includingmodifications to the sequence to sequence model, transfer learning, dataaugmentation, and the use of monolingual data.2)Neural machine translation is effective for language pairs with large datasets, but falls short to traditional methods (e.g. phrase or syntax-based machine translation) in the low-resource setting. However, these classic approaches struggle to translate out-of-vocabulary tokens, a limitation that is amplified when there is little training data. In this work, we augment a syntax-based machine translation system with a module that provides translations of out-of-vocabulary tokens. We present several language-independent strategies for translation of unknown tokens, and benchmark their accuracy on an intrinsic out-of-vocabulary translation task across a typologically diverse dataset of sixteen languages. Lastly, we explore the effects of using the module to add rules to a syntax-based machine translation system on overall translation quality. Bio: Leon Cheung is a second year undergraduate from UC San Diego. Thissummer he has been working with Jon May and Kevin Knight to improveneural machine translation for low resource languages. Nelson Liu is an undergraduate at the University of Washington, where he works with Professor Noah Smith. His research interests lie at the intersection of machine learning and natural language processing. Previously, he worked at the Allen Institute for Artificial Intelligence on machine comprehension---he is currently a summer intern at ISI working with Professors Kevin Knight and Jonathan May.
31 Aug 2017	Yining Chen, Sasha Mayn (ISI Intern)	THURSDAY TALK: 1)Recurrent Neural Networks as Weighted Language Recognizers 2)Gloss-to-English: Improving Low Resource Language Translation Using Alignment Tables Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: 1)We investigate properties of a simple recurrent neural network (RNN) as a formal device for recognizing weighted languages. We focus on the single-layer, ReLU-activation, rational-weight RNN with softmax, a standard form of RNN used in language processing applications. We prove that many questions one may ask about such RNNs are undecidable, including consistency, equivalence, minimization, and finding the highest-weighted string. For consistent RNNs, finding the highest-weighted string is decidable, although the solution can be exponentially long in the length of the input RNN encoded in binary. Limiting to solutions of polynomial length, we prove that finding the highest-weighted string for a consistent RNN is NP-complete and APX-hard.2) Neural Machine Translation has gained popularity in recent years and has been able to achieve impressive results. The only caveat is that millions of parallel sentences are needed in order to train the system properly, and in a low-resource scenario that amount of data simply may not be available. This talk will discuss strategies for addressing the data scarcity problem, particularly using alignment tables to make use of parallel data from higher-resource language pairs and creating synthetic in-domain data. Bio: Yining Chen is an third year undergraduate student at Dartmouth College. She is a summer intern at ISI working with Professor Kevin Knight and Professor Jonathan May.Sasha Mayn is a summer intern at ISI’s Natural Language Group. She is particularly interested in machine translation and language generation. Last summer Sasha interned at the PanLex Project in Berkeley, where she was responsible for pre-processing digital dictionaries and entering them into PanLex's multilingual database. This summer she has been working on improving neural machine translation strategies for low-resource languages under the supervision of Jon May and Kevin Knight.
18 Aug 2017	Marjan Ghazvininejad (USC/ISI)	Neural Creative Language Generation Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Natural language generation (NLG) is a well studied and still very challenging field in natural language processing. One of the less studied NLG tasks is the generation of creative texts such as jokes, puns, or poems. Multiple reasons contribute to the difficulty of research in this area. First, no immediate application exists for creative language generation. This has made the research on creative NLG extremely diverse, having different goals, assumptions, and constraints. Second, no quantitative measure exists for creative NLG tasks. Consequently, it is often difficult to tune the parameters of creative generation models and drive improvements to these systems. Finally, rule based systems for creative language generation are not yet combined with deep learning methods.In this work, we address these challenges for poetry generation which is one of the main areas of creative language generation. We introduce password poems as a novel application for poetry generation. Furthermore, we combine finite-state machinery with deep learning models in a system for generating poems for any given topic. We introduce a quantitative metric for evaluating the generated poems and build the first interactive poetry generation system that enables users to revise system generated poems by adjusting style configuration settings like alliteration, concreteness and the sentiment of the poem.In order to improve the poetry generation system, we decide to borrow ideas from human literature and develop a poetry translation system. We propose to study human poetry translation and measure the language variation in this process. we will study how human poetry translation is different from human translation in general and whether a translator translates poetry more freely. Then we will use our findings to develop a machine translation system specifically for translating poetry and proposing metrics for evaluating the quality of poetry translation. Bio: Marjan Ghazvininejad is a PhD student at ISI working with Professor Kevin Knight.
11 Aug 2017	Nima Pourdamghani (USC/ISI)	Improving machine translation from low resource languages Time: 3:00 pm - 4:00 pm Location: 6th Floor Conference Room [689] Abstract: Statistical machine translation (MT) often needs a large corpus of parallel translated sentences in order to achieve good performance. This limits the use of current MT technologies to a few resource-rich languages. Assume an incident happens in an area with a low-resource language. For a quick response, we need to build an MT system with available data, as finding or translating new parallel data is expensive and time consuming. For many languages this means that we only have a small amount of often out-of-domain parallel data (e.g. a Bible or Ubuntu manual). This talk is about ways to improve machine translation in low resource scenarios. I'll talk about use of monolingual data and parallel data from related languages to improve machine translation from the low resource language into English. Bio: Nima Pourdamghani is a fourth year Ph.D. student at ISI. He works with Professor Kevin Knight on machine translation from low resource languages.
21 Jul 2017	Xing Shi (USC/ISI)	Neural Sequence Models: Interpretation and Augmentation Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Recurrent neural networks (RNN) have been successfully applied to various Natural Language Processing tasks, including language modeling, machine translation, text generation, etc. However, several obstacles still stand in the way: First, due to the RNN's distributional nature, few interpretations of its internal mechanism are obtained, and it remains a black box. Second, because of the large vocabulary sets involved, the text generation is very time-consuming. Third, there is no flexible way to constrain the generation of the sequence model with external knowledge. Last, huge training data must be collected to guarantee the performance of these neural models, whereas annotated data such as parallel data used in machine translation are expensive to obtain. This work aims to address the four challenges mentioned above.To further understand the internal mechanism of the RNN, I choose neural machine translation (NMT) systems as a testbed. I first investigate how NMT outputs target strings of appropriate lengths, locating a collection of hidden units that learns to explicitly implement this functionality. Then I investigate whether NMT systems learn source language syntax as a by-product of training on string pairs. I find that both local and global syntactic information about source sentences is captured by the encoder. Different types of syntax are stored in different layers, with different concentration degrees.To speed up text generation, I proposed two novel GPU-based algorithms: 1) Utilize the source/target words alignment information to shrink the target side run-time vocabulary; 2) Apply locality sensitive hashing to find nearest word embeddings. Both methods lead to a 2-3x speedup on four translation tasks without hurting machine translation accuracy as measured by BLEU. Furthermore, I integrate a finite state acceptor into the neural sequence model during generation, providing a flexible way to constrain the output, and I successfully apply this to poem generation, in order to control the pentameter and rhyme.Based on above success, I propose to work on the following: 1) Go one further step towards interpretation: find unit/feature mappings, learn the unit temporal behavior, and understand different hyper-parameter settings. 2) Improve NMT performance on low-resource language pairs by fusing an external language model, feeding explicit target-side syntax and utilizing better word embeddings. Bio: Xing Shi is a PhD student at ISI working with Prof. Kevin Knight.
14 Jul 2017	Sorcha Gilroy (University of Edinburgh)	Parsing Graphs with Regular Graph Grammars Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Recently, several datasets have become available which represent natural language phenomena as graphs. Hyperedge Replacement Languages (HRL) have been the focus of much attention as a formalism to represent the graphs in these datasets. Chiang et al. (2013) prove that HRL graphs can be parsed in polynomial time with respect to the size of the input graph. We believe that HRL may be more expressive than is necessary to represent semantic graphs and we propose looking at Regular Graph Languages (RGL; Courcelle, 1991), which is a subfamily of HRL, as a possible alternative. We provide a top-down parsing algorithm for RGL that runs in time linear in the size of the input graph. Bio: Sorcha is a 2nd year PhD student at the University of Edinburgh and is a student in the Center for Doctoral Training in Data Science. Her PhD is focused on formal languages of graphs for NLP and her supervisors are Adam Lopez and Sebastian Maneth. She completed her undergraduate degree in mathematical sciences at University College Cork and her masters degree in data science at the University of Edinburgh. She is at ISI as an intern in the NLP group. Live here: http://webcastermshd.isi.edu/Mediasite/Play/c523b7ef95b443e8b29cfac3092e00081d
07 Jul 2017	Amir Hossein Yazdavar (Wright state University)	Semi-Supervised Approach to Monitoring Clinical Depressive Symptoms in Social Media Time: 3:00 pm - 4:00 pm Location: 6th Floor Large Conference Room [689] Abstract: With the rise of social media, millions of people express their moods, feelings and daily struggles with mental health issues routinely on social media platforms like Twitter. Un- like traditional observational cohort studies conducted through questionnaires and self-reported surveys, we explore the reliable detection of clinical depression from tweets obtained unobtrusively. Based on the analysis of tweets crawled from users with self-reported depressive symptoms in their Twitter profiles, we demonstrate the potential of detecting clinical depression symptoms which emulate the PHQ-9 questionnaire clinicians use today. Our study uses a semi-supervised statistical model to evaluate how the duration of these symptoms and their expressionon Twitter (in terms of word usage patterns and topical preferences) align with the medical findings reported via the PHQ-9. Our proactive and automatic screening tool is able to identify clinical depressive symptoms with an accuracy of 68% and precision of 72%. Bio: Amir is a 2nd year Ph.D. Researcher at Kno.e.sis Center Wright State University, OH under the guidance of Prof. Amit P. Sheth, the founder and executive director of Kno.e.sis Center. He is broadly interested in machine learning (incl. deep learning) and semantic web (incl. creation and use of knowledge graphs) and their applications to NLP/NLU and social media analytics. He has a particular interest in the extraction of subjective information with applications to search, social and biomedical/health applications. At Kno.e.sis Center – He is working on several real world projects mainly focused on studying human behavior on the web via Natural Language Understanding, Social Media Analytics utilizing Machine learning (Deep learning) and Knowledge Graph techniques. In particular, his focus is to enhance statistical models via domain semantics and guidance from offline behavioral knowledge to understand user’s behavior from unstructured and large-scale Social data.
16 Jun 2017	Mayank Kejriwal (ISI)	From Noisy Information Extraction to Rich Information Retrieval in Unusual Domains Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Abstract: Information Extraction (IE) or the algorithmic extraction of named entities, relations and attributes of interest from text-rich data is an important natural language processing task. In this talk, I will discuss the relationship of IE to fine-grained Information Retrieval (IR), especially when the domain of interest is unusual i.e. computationally under-studied, socially consequential and difficult to analyze. In particular, such domains exhibit a significant long-tail effect, and their language models are obfuscated. Using real-world examples and results obtained in recent DARPA MEMEX evaluations, I will discuss how our search system uses semantic strategies to usefully facilitate complex information needs of investigative users in the human trafficking domain, even when IE outputs are extremely noisy. I briefly report recent results obtained from a user study conducted by DARPA, and the lessons learned thereof for both IE and IR research. Bio: Mayank Kejriwal is a computer scientist in the Information integration group at ISI. He received his Ph.D. from the University of Texas at Austin under Daniel P. Miranker. His dissertation involved domain-independent linking and resolving of structured Web entities at scale, and was published as a book in the Studies in the Semantic Web series. At ISI, he is involved in the DARPA MEMEX, LORELEI and D3M projects. His current research sits at the intersection of knowledge graph construction, search, inference and analytics, especially over Web corpora in unusual social domains.
09 Jun 2017	Benjamin Girault (USC)	Introduction to Graph Signal Processing: Tools for Harmonic Analysis on Irregular Structures. Time: 3:00 pm - 4:00 pm Location: 6th Floor Conference Room [689] Abstract: During the past few years, graph signal processing has been extending the fieldof signal processing on Euclidean spaces to irregular spaces represented bygraphs. We have seen successes ranging from the Fourier transform, towavelets, vertex-frequency (time-frequency) decomposition, sampling theory,uncertainty principle, or convolutive filtering. This presentation introducesthe field, the type of signals involved, and how harmonic analysis isperformed. Bio: Benjamin Girault received his License (B.Sc.) and his Master (M.Sc.) in Francefrom École Normale Supérieure de Cachan, France, in 2009 and 2012 respectivelyin the field of theoretical computer science. He then received his PhD incomputer science from École Normale Supérieure de Lyon, France, in December2015. His dissertation entitled "Signal Processing on Graphs - Contributionsto an Emerging Field" focuses on extending the classical definition ofstationary temporal signals to stationary graph signal. Currently, he is apostdoctoral scholar with Professors Antonio Ortega and Shri Narayanan at theUniversity of Southern California continuing his work on graph signalprocessing with a focus on applying these tools to understanding humanbehavior.
26 May 2017	Yannis Konstas (UW)	Building Adaptable and Scalable Natural Language Generation Systems Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Traditionally, computers communicate with humans by converting computer-readable input to human-interpretable output, for example via graphical user interfaces. My research focuses on building programs that automatically generate textual output from computer-readable input. The majority of existing Natural Language Generation (NLG) systems use hard-wired rules or templates in order to capture the input for every different application and rely on small manually annotated corpora. In this talk, I will present a framework for building NLG systems using Neural Network architectures. The approach makes no domain-specific modifications to the input and benefits from training on very large unannotated corpora. It achieves state-of-the-art performance on a number of tasks, including generating text from meaning representations and source code. Such a system can have direct applications to intelligent conversation agents, source code assistant tools, and semantic-based Machine Translation. Bio: Ioannis Konstas is a postdoctoral researcher at the University of Washington, Seattle, collaborating with Prof. Luke Zettlemoyer since 2015. His main research interest focuses on the area of Natural Language Generation (NLG) with an emphasis on data-driven deep learning methods.He has received BSc in Computer Science from AUEB (Greece) in 2007, and MSc in Artificial Intelligence from the University of Edinburgh (2008). He continued his study at the University of Edinburgh and received his Ph.D. degree in 2014. He has previously worked as a Research Assistant at the University of Glasgow (2008), and as a postdoctoral researcher at the University of Edinburgh (2014).
05 May 2017	Sayan Ghosh (USC/ICT)	Representation Learning for Human Affect Recognition (PhD Proposal Practice Talk) Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Recent advances in end-to-end representation learning have made impressive strides in achieving state-of-the-art results in perception problems on speech, image and natural language. However, the area of affect understanding has mostly relied on off-the-shelf features to solve problems in emotion recognition, multi-modal fusion and generative modeling of affective speech and language. The potential impact of representation learning approaches to this area remains ripe for exploration. My thesis proposal is an important step in this direction. Firstly, I present an overview of my work on AU (Action Unit) detection, speech emotion recognition and glottal inverse filtering through speech modeling. Secondly, I introduce Affect-LM, a novel neural language model for affective text generation which exploits prior knowledge through a dictionary of emotionally colored words (such as the LIWC tool). Finally, I state some upcoming problems in representation learning for affect from speech and multi-modal language modeling which I plan to work on for the remainder of my degree. Sayan is a fourth-year PhD student at the University of Southern California, working at the Behavior Analytics and Machine Learning Group at the ICT(Institute for Creative Technologies) with Prof. Stefan Scherer. He is working on research towards building learning systems for better sensing of human behavior and emotion, and integrating deep learning techniques with human affect. His areas of interest include, but are not limited to deep learning, machine perception, affective computing, speech/signal processing, and generative modeling.
28 Apr 2017	Andreas Stuhlmüller (Stanford)	Modeling Dialog using Probabilistic Programs Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: How can we effectively explore the space of automated dialog systems? In this talk, I introduce WebPPL, a probabilistic programming language that provides a wide range of inference and optimization algorithms out of the box. This language makes it easy to express and combine probabilistic models, including regression and categorization models, highly structured cognitive models, models of agents that make sequential plans, and deep neural nets. I show that this also includes recent sequence-to-sequence architectures for dialog. I then use this framework to implement dialog automation using workspaces, a variation on these architectures that is aimed at dialogs that require sufficiently deep reasoning between utterances that it is difficult to learn how to automate them from transcripts alone. Bio: Andreas Stuhlmüller is a post-doctoral researcher at Stanford, working in Prof. Noah Goodman's Computation & Cognition lab, and founder of Ought Inc. Previously, he received his Ph.D. in Brain and Cognitive Sciences from MIT, where he was part of Prof. Josh Tenenbaum's Computational Cognitive Science group. He has worked on the design and implementation of probabilistic programming languages, on their application to cognitive modeling, and recently on dialog systems. He is broadly interested in leveraging machine learning to help people think.
21 Apr 2017	Kallirroi Georgila (USC/ICT)	Reinforcement learning of negotiation dialogue policies Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: The dialogue policy of a dialogue system decides on what dialogue move (also called “action”) the system should make given the dialogue context (also called “dialogue state”). Building hand-crafted dialogue policies is a hard task, and there is no guarantee that the resulting policies will be optimal. This issue has motivated the dialogue community to use statistical methods for automatically learning dialogue policies, the most popular of which is reinforcement learning (RL). However, to date, RL has mainly been used to learn dialogue policies in slot-filling applications (e.g., restaurant recommendation, flight reservation, etc.) largely ignoring other more complex genres of dialogue such as negotiation. This talk presents challenges in reinforcement learning of negotiation dialogue policies. The first part of the talk focuses on applying RL to a two-party multi-issue negotiation domain. Here the main challenges are the very large state and action space, and learning negotiation dialogue policies that can perform well for a variety of negotiation settings, including against interlocutors whose behavior has not been observed before. Good negotiators try to adapt their behaviors based on their interlocutors’ behaviors. However, current approaches to using RL for dialogue management assume that the user’s behavior does not change over time. In the second part of the talk, I will present an experiment that deals with this problem in a resource allocation negotiation scenario.Kallirroi Georgila is a Research Assistant Professor at the Institute for Creative Technologies (ICT) at the University of Southern California (USC) and at USC’s Computer Science Department. Before joining USC/ICT in 2009 she was a Research Scientist at the Educational Testing Service (ETS) and before that a Research Fellow at the School of Informatics at the University of Edinburgh. Her research interests include all aspects of spoken dialogue processing with a focus on reinforcement learning of dialogue policies, expressive conversational speech synthesis, and speech recognition. She has served on the organizing, senior, and program committees of many conferences and workshops. Her research work is funded by the National Science Foundation and the Army Research Office.
14 Apr 2017	Kevin Knight (USC/ISI)	Why is it harder to build a tic-tac-toe playing robot than a tic-tac-toe playing program? Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: I wanted to understand why it's so hard to build working robots, so I programmed one to play tic-tac-toe. Now I understand a lot better! I thought I'd relate my experience right now, just in case I later become more knowledgeable and impossible to understand. Kevin Knight is a Research Director at the Information Sciences Institute (ISI) of the University of Southern California (USC), and a Professor in the USC Computer Science Department. He received a PhD in computer science from Carnegie Mellon University and a bachelor's degree from Harvard University. Dr. Knight’s research interests include statistical machine translation, natural language generation, automata theory, and decipherment of historical manuscripts.
07 Apr 2017	Reihane Boghrati (USC)	ConversAtion level Syntax SImilarity Metric (CASSIM) Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: The syntax and semantics of human language can illuminate many individual psychological differences and important dimensions of social interaction. Thus, analysis of language provides important insights into the underlying psychological properties of individuals and groups. Accordingly, psychological and psycholinguistic research has begun incorporating sophisticated representations of semantic content to better understand the connection between word choice and psychological processes. While the majority of language analysis work in psychology has focused on semantics, psychological information is encoded not just in what people say, but how they say it. We introduce ConversAtion level Syntax SImilarity Metric (CASSIM), a novel method for calculating conversation-level syntax similarity. CASSIM estimates the syntax similarity between conversations by automatically generating syntactical representations of the sentences in conversations, estimating the structural differences between them, and calculating an optimized estimate of the conversation-level syntax similarity. Also, we conduct a series of analyses with CASSIM to investigate syntax accommodation in social media discourse. Further, building off of CASSIM, we propose ConversAtion level Syntax SImilarity Metric-Group Representations (CASSIM-GR). This extension builds generalized representations of syntactic structures of documents, thus allowing researchers to distinguish between people and groups based on syntactic differences. Bio: Reihane is a forth year Ph.D student at USC, working with Morteza Dehghani in Computational Social Science Laboratory. She is interested in introducing new methods and computational models to psychology, and more broadly to social sciences. Her work spans the boundary between natural language processing and psychology, as does her intellectual curiosity.
31 Mar 2017	Danqi Chen (Stanford)	Towards the Machine Comprehension of Text Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Enabling a computer to understand a document so that it can answer comprehension questions is a central, yet unsolved goal of NLP. The task of reading comprehension (i.e., question answering over unstructured text) has received vast attention recently, and some progress has been made thanks to the creation of large-scale datasets and development of attention-based neural networks.In this talk, I’ll first present how we advance this line of research. I’ll show how simple models can achieve (nearly) state-of-the-art performance on recent benchmarks, including the CNN/Daily Mail datasets and the Stanford Question Answering Dataset. I’ll focus on explaining the logical structure behind these neural architectures and discussing advantage as well as limits of current approaches.Lastly I’ll talk about our recent work on scaling up machine comprehension systems, which attempt to answer open-domain questions at the full Wikipedia scale. We demonstrate the promise of our system, as well as set up new benchmarks by evaluating on multiple existing QA datasets. Bio: Danqi Chen is a Ph.D. candidate in Computer Science at Stanford University, advised by Prof. Christopher Manning. Her main research interests lie in deep learning for natural language processing and understanding, and she is particularly interested in the intersection between text understanding and knowledge reasoning. She has been working on machine comprehension, question answering, knowledge base population and dependency parsing. She is a recipient of Facebook fellowship and Microsoft Research Women’s Fellowship and an outstanding paper award in ACL'16. Prior to Stanford, she received her B.S. from Tsinghua University in 2012.
27 Mar 2017	Stephen Kobourov (Arizona)	Analyzing the Language of Food on Social Media Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: We investigate the predictive power behind the language offood on social media. We collect a corpus of over three millionfood-related posts from Twitter and demonstrate that many latentpopulation characteristics can be directly predicted from this data:overweight rate, diabetes rate, political leaning, and homegeographical location of authors. For all tasks, our language-basedmodels significantly outperform the majority- class baselines.Performance is further improved with more complex natural languageprocessing, such as topic modeling. We analyze which textual featureshave most predictive power for these datasets, providing insight intothe connections between the language of food, geographic locale, andcommunity characteristics. Lastly, we design and implement an onlinesystem for real-time query and visualization of the dataset.Visualization tools, such as geo-referenced heatmaps,semantics-preserving wordclouds and temporal histograms, allow us todiscover more complex, global patterns mirrored in the language offood.Stephen Kobourov is a Professor of Computer Science at the Universityof Arizona. He completed BS degrees in Mathematics and ComputerScience at Dartmouth College in 1995, and a PhD in Computer Science atJohns Hopkins University in 2000. He has worked as a ResearchScientist at AT&T Research Labs, a Hulmboldt Fellow at the Universityof Tübingen in Germany, and a Distinguished Fulbright Chair at CharlesUniversity in Prague.
24 Mar 2017	Sameer Singh (UCI)	Intuitive Interactions with Black-box Machine Learning Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Machine learning is at the forefront of many recent advances in natural language processing, enabled in part by the sophisticated models and algorithms that have been recently introduced. However, as a consequence of this complexity, machine learning essentially acts as a black-box as far as users are concerned. It is incredibly difficult to understand, predict, or "fix" the behavior of NLP models that have been deployed. In this talk, I propose interpretable representations that allow users and machine learning models to interact with each other: enabling machine learning models to provided explanations as to why a specific prediction was made and enabling users to inject domain knowledge into machine learning. The first part of the talk introduces an approach to estimate local, interpretable explanations for black-box classifiers and describes an approach to summarize the behavior of the classifier by selecting which explanations to show to the user. I will also briefly describe work on "closing the loop", i.e. allowing users to provide feedback on the explanations to improve the model, for the task of relation extraction, an important subtask of natural language processing. In particular, we introduce approaches to both explain the relation extractor using logical statements and to inject symbolic domain knowledge into relational embeddings to improve the predictions. I present experiments to demonstrate that an interactive interface is effective in providing users an understanding of, and an ability to improve, complex black-box machine learning systems. Bio: Sameer Singh is an Assistant Professor of Computer Science at the University of California, Irvine. He is working on large-scale and interactive machine learning applied to information extraction and natural language processing. Till recently, Sameer was a Postdoctoral Research Associate at the University of Washington. He received his PhD from the University of Massachusetts, Amherst in 2014, during which he also interned at Microsoft Research, Google Research, and Yahoo! Labs on massive-scale machine learning. He was selected as a DARPA Riser, was awarded the Adobe Research Data Science Award, won the grand prize in the Yelp dataset challenge, has been awarded the Yahoo! Key Scientific Challenges fellowship, and was a finalist for the Facebook PhD fellowship. Sameer has published more than 30 peer-reviewed papers at top-tier machine learning and natural language processing conferences and workshops.
17 Mar 2017	Kuan Liu (USC/ISI)	Heterogeneous Attribute Embedding and Sequence Modeling for Recommendation with Implicit Feedback Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Incorporating implicit feedback into a recommender system is a challenging problem due to sparse and noisy observations. I will present our approaches that exploit heterogeneous attributes and sequence properties within the observations. We build a neural network framework to embed heterogeneous attributes in an end-to-end fashion, and apply the framework to three sequence-based models. Our methods achieve significant improvements on four large-scale datasets compared to state-of-the-art baseline models (30% to 90% relative increase in NDCG). Experimental results show that attribute embedding and sequence modeling both lead to improvements and, further, that our novel output attribute layer plays a crucial role. I will conclude with our exploratory studies that investigate why sequence modeling works well in recommendation systems and advocate its use for large scale recommendation tasks. Bio: Kuan Liu is a fifth year Ph.D. student at ISI/USC working with Prof. Prem Natarajan. Before that, He received a bachelor degree from Tsinghua University with a major in Computer Science. His research interests include machine learning, large scale optimization, deep learning, and applications to recommender systems, network analysis.
10 Mar 2017	He He (Stanford)	Learning agents that interact with humans Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: The future of virtual assistants, self-driving cars, and smart homes require intelligent agents that work intimately with users. Instead of passively following orders given by users, an interactive agent must actively collaborate with people through communication, coordination, and user-adaptation. In this talk, I will present our recent work towards building agents that interact with humans. First, we propose a symmetric collaborative dialogue setting in which two agents, each with some private knowledge, must communicate in natural language to achieve a common goal. We present a human-human dialogue dataset that poses new challenges to existing models, and propose a neural model with dynamic knowledge graph embedding. Second, we study the user-adaptation problem in quizbowl - a competitive, incremental question-answering game. We show that explicitly modeling of different human behavior leads to more effective policies that exploits sub-optimal players. I will conclude by discussing opportunities and open questions in learning interactive agents. He He is a post-doc at Stanford University, working with Percy Liang. Prior to Stanford, she earned her Ph.D. in Computer Science at the University of Maryland, College Park, advised by Hal Daumé III and Jordan Boyd-Graber. Her interests are at the interface of machine learning and natural language processing. She develops algorithms that acquire information dynamically and do inference incrementally, with an emphasis on problems in natural language processing. She has worked on dependency parsing, simultaneous machine translation, question answering, and more recently dialogue systems.
07 Mar 2017	Alessandro Achille (UCLA)	Information Dropout: Learning Optimal Representations Through Noisy Computation Time: 11:00 am - 12:00 pm Location: 6th Floor Conference Room [689] Abstract: The cross-entropy loss commonly used in deep learning is closely related to the defining properties of optimal representations, but does not enforce some of the key properties. We show that this can be solved by adding a regularization term, which is in turn related to injecting multiplicative noise in the activations of a Deep Neural Network, a special case of which is the common practice of dropout. We show that our regularized loss function can be efficiently minimized using Information Dropout, a generalization of dropout rooted in information theoretic principles that automatically adapts to the data and can better exploit architectures of limited capacity. When the task is the reconstruction of the input, we show that our loss function yields a Variational Autoencoder as a special case, thus providing a link between representation learning, information theory and variational inference. Finally, we prove that we can promote the creation of disentangled representations simply by enforcing a factorized prior, a fact that has been observed empirically in recent work. Our experiments validate the theoretical intuitions behind our method, and we find that information dropout achieves a comparable or better generalization performance than binary dropout, especially on smaller models, since it can automatically adapt the noise to the structure of the network, as well as to the test sample.arXiv: https://arxiv.org/abs/1611.01353 Bio: Alessandro Achille is a PhD student in Computer Science at UCLA, working with Prof. Stefano Soatto. He focuses on variational inference, representation learning, and their applications to deep learning and computer vision. Before coming to UCLA, he obtained a Master's degree in Pure Math at the Scuola Normale Superiore in Pisa, where he studied model theory and algebraic topology with Prof. Alessandro Berarducci.
03 Mar 2017	Lili Mou (Peking University)	Coupling distributed and symbolic execution for natural language queries Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: In this talk, Lili will introduce his work "Coupling distributed and symbolic execution for natural language queries," which was done during his internship at Huawei Technologies (Hong Kong), supervised by Dr. Zhengdong Lu. The study proposes a unified perspective of neural and symbolic execution for semantic parsing, and shows how we can make use of both neural and symbolic worlds. Lili Mou received his BS degree in computer science from Peking University in 2012. He is now a Ph.D. student, supervised by Profs. Zhi Jin, Ge Li, and Lu Zhang. His recent research interests include deep learning applied to natural language processing as well as programming language processing. He has publications at top conferences like AAAI, ACL, CIKM, COLING, EMNLP, IJCAI, and INTERSPEECH.
23 Feb 2017	Nanyun Peng (Johns Hopkins)	Representation Learning with Joint Models for Information Extraction Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: There is abundant knowledge out there carried in the form of natural language texts, such as social media posts, scientific research literature, medical records, etc., which grows at an astonishing rate. Yet this knowledge is mostly inaccessible to computers and overwhelming for human experts to absorb. Information extraction (IE) processes raw texts to produce machine understandable structured information, thus dramatically increasing the accessibility of knowledge through search engines, interactive AI agents, and medical research tools. However, traditional IE systems assume abundant human annotations for training high quality machine learning models, which is impractical when trying to deploy IE systems to a broad range of domains, settings and languages. In this talk, I will present how to leverage the distributional statistics of characters and words, the annotations for other tasks and other domains, and the linguistics and problem structures, to combat the problem of inadequate supervision, and conduct information extraction with scarce human annotations. Nanyun Peng is a PhD candidate in the Department of Computer Science at Johns Hopkins University, affiliated with the Center for Language and Speech Processing and advised by Dr. Mark Dredze. She is broadly interested in Natural Language Processing, Machine Learning, and Information Extraction. Her research focuses on using deep learning for information extraction with scarce human annotations. Nanyun is the recipient of the Johns Hopkins University 2016 Fred Jelinek Fellowship. She has completed two research internships at IBM T.J. Watson Research Center, and Microsoft Research Redmond. She holds a master's degree in Computer Science and BAs in Computational Linguistics and Economics, all from Peking University.
10 Feb 2017	Yonatan Bisk (USC/ISI)	The Limits of Unsupervised Syntax and the Importance of Grounding in Language Acquisition Time: 3:00 pm - 4:00 pm Location: 6th Floor Conference Room [689] Abstract: The future of self-driving cars, personal robots, smart homes, and intelligent assistants hinges on our ability to communicate with computers. The failures and miscommunications of Siri-style systems are untenable and become more problematic as machines become more pervasive and are given more control over our lives. Despite the creation of massive proprietary datasets to train dialogue systems, these systems still fail at the most basic tasks. Further, their reliance on big data is problematic. First, successes in English cannot be replicated in most of the 6,000+ languages of the world. Second, while big data has been a boon for supervised training methods, many of the most interesting tasks will never have enough labeled data to actually achieve our goals. It is, therefore, important that we build systems which can learn from naturally occurring data and grounded, situated interactions.In this talk, I will discuss work from my thesis on the unsupervised acquisition of syntax which harnesses unlabeled text in over a dozen languages. This exploration leads us to novel insights into the limits of semantics-free language learning. Having isolated these stumbling blocks, I’ll then present my recent work on language grounding where we attempt to learn the meaning of several linguistic constructions via interaction with the world. Yonatan Bisk’s research focuses on Natural Language Processing from naturally occurring data (unsupervised and weakly supervised data). He is a postdoc researcher with Daniel Marcu at USC’s Information Sciences Institute. Previously, he received his PhD from the University of Illinois at Urbana-Champaign under Julia Hockenmaier and his BS from the University of Texas at Austin.
03 Feb 2017	Melissa Roemmele (UCS/ICT)	Recurrent Neural Networks for Narrative Prediction Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Narrative prediction involves predicting ‘what happens next’ in a story. This task has a long history in AI research but is now getting more recognition in the NLP community. In this talk I’ll describe three different evaluation schemes for narrative prediction, one of which (the Story Cloze Test) is the shared task for this year’s LSDSem workshop at EACL. I’ll present my ongoing efforts to develop Recurrent Neural Network-based models that succeed on these evaluation frameworks, and discuss the particular challenges posed by each of them. Bio: I’m a PhD candidate at USC’s Institute for Creative Technologies advised by Andrew Gordon in the Narrative Group. My thesis research explores machine learning approaches to automatically generating text-based stories. I’m interested in using this research to stimulate people’s creativity in writing. More broadly, I’m excited by any opportunity to use automated analysis of text data to give people new insights and ideas.
20 Jan 2017	Jonathan May (USC/ISI)	How I Learned to Stop Worrying and Love Evaluations (and Keep Worrying) Time: 3:00 pm - 4:00 pm Location: 6th Floor Large Conference Room [689] Abstract: Bake-offs, shared tasks, evaluations: these are names for short,high-stress periods in many CS researchers' lives where theiralgorithms and models are exposed to unseen data, often withreputations and funding on the line. Evaluations are sometimesperceived to be the bane of much of our work lives. Wegrouse about metrics, procedures, glitches, and all thetime "wasted" chasing scores, rather than doing RealScience (TM). In this talk I will argue that despite valid criticismsof the approach, coordinated evaluation is a net benefit to NLPresearch and has led to accomplishments that might not have otherwisearisen. This argument will frame a more in-depth discussion of severalpieces of recent evaluation-grounded work: rapid generation oftranslation and information extraction for low-resource surpriselanguages (DARPA LORELEI) and organization of SemEval sharedtasks in semantic parsing and generation. Jonathan May is a Research Assistant Professor at the University ofSouthern California's Information Sciences Institute(USC/ISI). Previously, he was a research scientist at SDL Research(formerly Language Weaver) and a scientist at Raytheon BBNTechnologies. He received a Ph.D. in Computer Science from theUniversity of Southern California in 2010 and a BSE and MSE inComputer Science Engineering and Computer and Information Science,respectively, from the University of Pennsylvania in 2001. Jon'sresearch interests include automata theory, natural languageprocessing, machine translation, and machine learning.
10 Jan 2017	David Chiang (Notre Dame)	Speech-to-Translation Alignment for Documentation of Endangered Languages Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: I will give an overview of this project, focusing on the pieces that my student, Antonios Anastasopoulos, and I have been most involved in. Our work is based on the premise that spoken language resources are more readily annotated with translations than with transcriptions. A first step towards making such data interpretable would be to automatically align spoken words with their translations. I'll present a neural attentional model (Duong et al., NAACL 2016) and a latent-variable generative model (Anastasopoulos and Chiang, EMNLP 2016) for this task. David Chiang (PhD, University of Pennsylvania, 2004) is an associate professor in the Department of Computer Science and Engineering at the University of Notre Dame. His research is on computational models for learning human languages, particularly how to translate from one language to another. His work on applying formal grammars and machine learning to translation has been recognized with two best paper awards (at ACL 2005 and NAACL HLT 2009). He has received research grants from DARPA, CIA, NSF, and Google, has served on the executive board of NAACL and the editorial board of Computational Linguistics and JAIR, and is currently on the editorial board of Transactions of the ACL.
06 Jan 2017	Kenton Murray (Notre Dame)	Learning Neural Network Structures for Natural Language Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: In recent years, deep learning has had a huge impact on natural language processing surpassing the performance of many other statistical and machine learning methods. One of the many promises of deep learning is that features are learned implicitly and that there is no need to manually engineer features for good performance. However, neural network performance is highly dependent on network architecture and selection of hyper-parameters. In many ways, architecture engineering has supplanted feature engineering in NLP tasks. In this talk, I will focus on two ways neural network structures can be learned while concurrently training models. First, I'll present a regularization scheme for learning the number of neurons in a neural language model during training (Murray and Chiang 2015) and show how it can be used in a Machine Translation task. Then, I'll move onto a Visual Question Answering task where denotations are selected by executing a probabilistic program that models non-determinism with neural networks (Murray and Krishnamurthy 2016).Kenton Murray is a PhD student in the Natural Language Processing Lab at the University of Notre Dame's Computer Science and Engineering Department working with David Chiang. His research is on neural methods for human languages, particularly machine translation and question answering. Prior to Notre Dame, he was a Research Associate at the Qatar Computing Research Institute (QCRI) and received a Master's in Language Technologies from Carnegie Mellon University and a Bachelor's in Computer Science from Princeton University.
09 Dec 2016	Radu Soricut (Google)	Multimodal Machine Comprehension: Tasks and Approaches Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: The ability of computer models to achieve genuine understanding of information as presented to humans (text, images, etc) is a long-standing goal of Artificial Intelligence. Along the way towards this goal, the research community has proposed solving tasks such as machine reading comprehension and computer image understanding. In this talk, we introduce two new tasks that can help us move closer to the goal. First, we present a multi-choice reading comprehension task, for which the goal is to understand a text passage and choose the correct summarizing sentence from among several options. Second, we present a multi-modal understanding task, posed as a combined vision-language comprehension challenge: identifying the most suitable text describing a visual scene, given several similar options. We present several baseline and competitive learning approaches based on neural network architectures, illustrating the utility of the proposed tasks in advancing both image and language comprehension. We also present human evaluation results, which inform a performance upper-bound on these tasks, and quantify the remaining gap between computer systems and human performance (spoiler alert: we are not there yet). Radu Soricut is a Staff Research Scientist in the Research and Machine Intelligence group at Google. Radu has a PhD in Computer Science from University of Southern California, and has been with Google since 2012. His main areas of interest are natural language understanding, multilingual processing, natural language generation (from multimodal inputs), and general machine learning techniques for solving these problems. Radu has published extensively in these areas in top-tier peer-reviewed conferences and journals, and has won the Best Paper Award at the North American Association for Computational Linguistics Conference (NAACL) in 2015. Radu's current project looks at bridging natural language understanding and generation using neural techniques, in the context of Google's focus on making natural language an effective way of interacting with the world and the technology around us.
02 Dec 2016	Yejin Choi (UW)	Procedural Language and Knowledge Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Various types of how-to-knowledge are encoded in natural language instructions: from setting up a tent, to preparing a dish for dinner, and to executing biology lab experiments. These types of instructions are based on procedural language, which poses unique challenges. For example, verbal arguments are commonly elided when they can be inferred from context, e.g., ``bake for 30 minutes'', not specifying bake what and where. Entities frequently merge and split, e.g., ``vinegar’’ and ``oil’’ merging into ``dressing’’, creating challenges to reference resolution. And disambiguation often requires world knowledge, e.g., the implicit location argument of ``stir frying'' is on ``stove''. In this talk, I will present our recent approaches to interpreting and composing cooking recipes that aim to address these challenges.In the first part of the talk, I will present an unsupervised approach to interpreting recipes as action graphs, which define what actions should be performed on which objects and in what order. Our work demonstrates that it is possible to recover action graphs without having access to gold labels, virtual environments or simulations. The key insight is to rely on the redundancy across different variations of similar instructions that provides the learning bias to infer various types of background knowledge, such as the typical sequence of actions applied to an ingredient, or how a combination of ingredients (e.g., ``flour'', ``milk'', ``eggs'') becomes a new entity (e.g, ``wet mixture'').In the second part of the talk, I will present an approach to composing new recipes given a target dish name and a set of ingredients. The key challenge is to maintain global coherence while generating a goal-oriented text. We propose a Neural Checklist Model that attains global coherence by storing and updating a checklist of the agenda (e.g., an ingredient list) with paired attention mechanisms for tracking what has been already mentioned and what needs to be yet introduced. This model also achieves strong performance on dialogue system response generation. I will conclude the talk by discussing the challenges in modeling procedural language and acquiring the necessary background knowledge, pointing to avenues for future research. Bio: Yejin Choi is an assistant professor at the Computer Science & Engineering Department of University of Washington. Her recent research focuses on language grounding, integrating language and vision, and modeling nonliteral meaning in text. She was among the IEEE’s AI Top 10 to Watch in 2015 and a co-recipient of the Marr Prize at ICCV 2013. Her work on detecting deceptive reviews, predicting the literary success, and learning to interpret connotation has been featured by numerous media outlets including NBC News for New York, NPR Radio, New York Times, and Bloomberg Business Week. She received her Ph.D. in Computer Science at Cornell University.
18 Nov 2016	Ramesh R Manuvinakurike (USC/ICT)	Incremental spoken dialogue system for reference resolution in images Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: In this talk, I will be speaking about our ongoing effort in thedevelopment of Eve, state-of-the-art incremental reference resolutionin images based spoken dialogue agent. Incrementality is central todeveloping a naturally conversing spoken dialogue systems.Incrementality makes the conversations more natural and efficientcompared to non-incremental alternatives. The performance of the Evewas found to be comparable to human performance and she convenientlyoutperforms alternative non-incremental architectures. However,building such a system is not trivial. It needs high-performancearchitectures and dialogue components (ASR, dialogue policies,language understanding etc.). I will also speak about future plans forenhancing Eve's capability. I also take a slight deviation and explorea different word level natural language understanding model forreference resolution in images in a dialogue setting. Bio: Ramesh Manuvinakurike is a Ph.D. student at USC Institute forCreative Technologies working with Prof. David DeVault and Prof.Kallirroi Georgila. He is interested in developing conversationalsystems and has developed various such systems. His work with hiscolleagues on agent Eve won 'Best paper' award at Sigdial 2015.
28 Oct 2016	Yu Su (UCSB)	Learning from Zero: Recent Advances in Bootstrapping Semantic Parsers using Crowdsourcing Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Semantic parsing, which parses natural language into formal languages, has been applied to a wide range of structured data like relation databases, knowledge bases, and web tables. To learn a semantic parser for a new domain, the first challenge is always how to collect training data. While data collection using crowdsourcing has become a common practice in NLP, it's a particularly challenging and interesting problem when it comes to semantic parsing, and is still in its early stages. Given a domain and a formal language, how can we generate meaningful logical forms in a configurable way? How to design the annotation task so that crowdsourcing workers, who do not understand formal languages, can handle with ease? How can we exploit the compositional nature of formal languages to optimize the crowdsourcing process? In this talk I will introduce some recent advances in this direction, and present some preliminary answers to the above questions. The covered works mainly concern knowledge bases, but we will also cover some ongoing work concerning web APIs. Yu Su is a fifth year PhD candidate in the Computer Science Department at UCSB, advised by Professor Xifeng Yan. Before that, He received a bachelor degree from Tsinghua University in 2012, with a major in Computer Science. He is interested in the interplay between language and formal meaning representations, including problems like semantic parsing, continuous knowledge representation, and natural language generation. He also enjoys applying deep learning on these problems.
21 Oct 2016	Marjan Ghazvininejad and Yonatan Bisk (USC/ISI)	EMNLP practice talk: 1) Generating Topical Poetry & 2) Unsupervised Neural Hidden Markov Models Time: 3:00 pm - 4:00 pm Location: 6th Floor Large Conference Room [689] Abstract: 1) In this talk I describe Hafez, a program that generates any number of distinct poems on a user-supplied topic. Poems obey rhythmic and rhyme constraints. I describe the poetry-generation algorithm, give experimental data concerning its parameters, and show its generality with respect to language and poetic form.2) In this work, we present the first results for neuralizing an Unsupervised Hidden Markov Model. We evaluate our approach on tag induction. Our approach outperforms existing generative models and is competitive with the state-of-the-art though with a simpler model easily extended to include additional context. Marjan Ghazvininejad is a PhD student at ISI working with Prof. Kevin Knight.Yonatan Bisk is a Postdoc at ISI working with Prof. Daniel Marcu.
14 Oct 2016	Xing Shi (USC)	EMNLP practice talk: Understanding Neural Machine Translation: length control and syntactic structure Time: 3:00 pm - 4:00 pm Location: 6th Floor Large Conference Room [689] Abstract: Neural Machine Translation is powerful but we know little about the black box. We conduct the following two investigations to gain a better understanding: First, we investigate how neural, encoder-decoder translation systems output target strings of appropriate lengths, finding that a collection of hidden units learns to explicitly implement this functionality. Second, we investigate whether a neural, encoderdecoder translation system learns syntactic information on the source side as a by-product of training. We propose two methods to detect whether the encoder has learned local and global source syntax. A fine-grained analysis of the syntactic structure learned by the encoder reveals which kinds of syntax are learned and which are missing. Bio: Xing Shi is a PhD student at ISI working with Prof. Kevin Knight.
26 Sep 2016	Andrea Gagliano (UC Berkeley)	Poetry at the Metaphorical Intersection Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Abstract: This talk will discuss a technique to create figurative relationships using Mikolov et al.’s word vectors. Drawing on existing work on figurative language, we start with a pair of words and use the intersection of word vector similarity sets to blend the distinct semantic spaces of the two words. We conduct preliminary quantitative and qualitative observations to compare the use of this novel intersection method with the standard word vector addition method for the purpose of supporting the generation of figurative language. To showcase this technique, we use it to write computer generated sonnets.BioAndrea Gagliano is a masters student at UC Berkeley's School of Information and the Berkeley Center for New Media. Her research explores the use of computation for creativity - both tools to support creative practices and generation of creative works. Recently, she has been focusing in the field of natural language processing by working on poetry and metaphor generation. Previously, Andrea received her BS in Mathematics and BA in Business Administration from the University of Washington in 2013. During her studies, she spent time with the Creative Writing department studying poetry.
19 Sep 2016	Burr Settles (Duolingo)	Duolingo: Improving Language Learning and Assessment with Data Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Duolingo is a language education platform with more than 150 million students worldwide. Our flagship learning app is the #1 way to learn a language online, and is the most-downloaded education app for both Android and iOS devices. It is also completely free. In this talk, I will describe the Duolingo system and several empirical projects, which mix machine learning with computational linguistics and psychometrics to improve learning, engagement, and even language proficiency assessment through our products. Burr Settles is a scientist, engineer, and head of research at Duolingo: the most widely used education application in the world, teaching 20 languages to more than 150 million users worldwide. He is also the principal developer of the Duolingo English Test: a computer-adaptive proficiency exam that aims to disrupt and democratize the global certification marketplace through highly accessible mobile technology. Before joining Duolingo, he earned a PhD in computer sciences at University of Wisconsin-Madison, and then worked as a postdoctoral research scientist at Carnegie Mellon University, where his work has spanned machine learning, natural language processing, and computational social science. His 2012 book Active Learning is now the standard text on learning algorithms that are adaptive, curious, or exploratory (if you will). Burr gets around by bike and (among other things) plays guitar in the pop band delicious pastries.
16 Sep 2016	Zachary Chase Lipton (UCSD)	Efficient Exploration for Dialog Policy Learning with BBQ Networks & Replay Buffer Spiking Time: 1:30 pm - 2:30 pm Location: 6th Floor Large Conference Room [689] Abstract: When rewards are sparse and efficient exploration essential, deep Q-learning with ϵ-greedy exploration tends to fail. This poses problems for otherwise promising domains such as task-oriented dialog systems, where the primary reward signal, indicating successful completion, typically occurs only at the end of each episode but depends on the entire sequence of utterances. A poor agent encounters such successful dialogs rarely, and a random agent may never stumble upon a successful outcome in reasonable time. We present two techniques that significantly improve the efficiency of exploration for deep Q-learning agents in dialog systems. First, we demonstrate that exploration by Thompson sampling, using Monte Carlo samples from a Bayes-by-Backprop neural network, yields marked improvement over standard DQNs with Boltzmann or ϵ-greedy exploration. Second, we show that spiking the replay buffer with a small number of successes, as are easy to harvest for dialog tasks, can make Q-learning feasible when it might otherwise fail catastrophically. Bio: I am a graduate student in the Artificial Intelligence Group at the University of California, San Diego on leave for two quarters at Microsoft Research Redmond. I work on machine learning, focusing on deep learning methods and applications. In particular, I work on modeling sequential data with recurrent neural networks and sequential decision-making processes with deep reinforcement learning. I'm especially interested in research impacting medicine and natural language processing. Recently, in Learning to Diagnose with LSTM RNNs, we trained LSTM RNNs to accurately predict patient diagnoses using only lightly processed time series of sensor readings in the pediatric ICU. Before coming to UCSD, I completed a Bachelor of Arts with a joint major in Mathematics and Economics at Columbia University. Then, I worked in New York City as a jazz musician. I have interned with Amazon's Core Machine Learning team and Microsoft Research's Deep Learning Team.
09 Sep 2016	Nada Aldarrab (USC)	How we Cracked the “Borg” Cipher + First Steps Towards Deciphering from Images Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: European libraries are filled with undeciphered historical manuscripts from the 16th-18th centuries. These documents are enciphered with classical methods, which puts their contents out of the reach of historians who are interested in the history of that era.In this talk, we show how we automatically cracked a 400-page book from the 17th century. We also describe a system aimed at deciphering from camera-phone images. We show initial results for different ciphers. Bio: Nada is a graduate student at USC, working on her thesis under the supervision of Prof. Kevin Knight. She is currently working on the decipherment of historical documents (joint project with Uppsala University, Sweden). Her research interests include natural language processing, machine learning, decipherment and machine translation.1
26 Aug 2016	Ke Tran (ISI Intern)	Unsupervised learning linguistic structures with deep neural networks Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Abstract: We present a general framework for unsupervised learning that combines probalistic graphical models with the power of deep nets. We employ a neuralized expectation miminization algorithm for learning. We apply this framework for unsupervised sequential tagging and show some interesting results. Bio: Ke is a PhD candidate at University of Amsterdam. He is interning at ISI, working with Yonatan Bisk, Ashish Vaswani, Kevin Knight, and Daniel Marcu. His research focuses on deep learning and machine translation.
19 Aug 2016	Xiang Li (ISI Intern)	Event extraction from AMR representations Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Abstract: How to use NLP techniques to help medical researchers is crucial now. And making use of millions of medical passages is a good starting point. By doing this, we can extract useful information from these papers and help medical researchers a lot.I’ll introduce a simple method to extract relations between proteins using AMR. By using this rule-base system, we can get AMR representation to simplified AMR(SMR) which only contains protein relation information. Bio: Xiang Li(Lorraine) is a 2016 summer intern under the supervision of Prof Kevin Knight and Prof Daniel Marcu. She is also going to be a PhD student at the University of Massachusetts Amherst in Andrew McCallum’s research group in this coming Fall. She got her B.S at the East China Normal University, Shanghai, China and got her M.S at the University of Chicago. Her research interest mainly focused on natural language processing and machine learning.
03 Aug 2016	Angeliki Lazaridou (University of Trento)	Can machines understand and generate stories? Time: 11:00 am - 11:59 am Location: 11th Floor Large Conference Room [1135] Abstract: Abstract: Computational creativity is an emerging field of AI, with linguistic creativity being an interesting test-bed for developing and evaluating machines with reasoning capabilities. A concrete example is story generation and understanding, a task which unlike the vast majority of traditional NLP that treats sentences in isolation, requires deep understanding of the general context and discourse of stories.In this talk, I will present some preliminary steps towards this goal and show how sequence-to-sequence models can be applied to this task. Overall, our results on story understanding are on par with current state-of-the-art (that nevertheless have no generative capabilities), while at the same time producing sometimes rather amusing story endings. Bio: Angeliki is a final year PhD student at the Center for Mind/Brain Sciences of the University of Trento. She received her MSc from the Saarland University, where she worked with Ivan Titov and Caroline Sporleder on Bayesian models for sentiment and discourse. She is currently working at the intersection between language and vision under the supervision of Marco Baroni. Webcast: http://webcastermshd.isi.edu/Mediasite/Play/6f51b67c1a304a0c83297dd2f9b453921d
29 Jul 2016	Sebastian Mielke (ISI Intern)	Let's not be clever: simple pre- and post-processing tricks in machine translation Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Abstract: Today's machine translation system are highly complex and extending them often means leaving highly sophisticated solutions and established algorithms behind. Therefore it is attractive to try to extend the process outside of the translation system: in pre- and post-processing steps.I will show a pre-processing step for helping to translate tweets and a post-processing step that helps "guess" the translations of unknown and thus untranslated words in arbitrary sentences using dictionaries and other resources. Bio: Sebastian is currently pursuing a CS masters degree in Dresden, Germany with Prof. Heiko Vogler, taking a break from studying to work on low-resource machine translation with Prof. Kevin Knight and Prof. Daniel Marcu as an ISI intern in 2016.
22 Jul 2016	Stephen Rawls / Huaigu Cao (ISI)	Title: LSTM's for OCR Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Abstract: We present ongoing research into OCR for both machine print and handwriting recognition. We utilize a neural network along with LSTM's to perform OCR directly from pixel intensity. We are exploring a few novel improvements, including using a CNN for feature extraction prior to the LSTM, and combining reinforcement learning into our training to directly optimize word error rate in our test-time decoding procedure, which utilizes a (non-differentiable) language-model based decoding of the LSTM output. Finally, we present the design of the OCR system we used to win a pilot project with the US Census for recognizing handwritten first and last names. Bio: Stephen Rawls is a research programmer and a PhD student at USC/ISI advised by Dr. Prem Natarajan. He works in the Computer Vision group at ISI on face recognition and OCR, among other projects. Huaigu Cao is a computer scientist at USC ISI. His interest of research includes image processing and pattern recognition.
15 Jul 2016	Xiang Li (ISI Intern)	Title: Commonsense Knowledge Base Completion Time: 3:00 pm - 4:00 pm Location: 6th Floor Large Conference Room [689] Abstract: Abstract: We enrich a curated resource of commonsense knowledge by formulating the problem as one of knowledge base completion (KBC). Most work in KBC focuses on knowledge bases like Freebase that relate entities drawn from a fixed set. However, the tuples in ConceptNet (Speer and Havasi, 2012) define relations between an unbounded set of phrases. We develop neural network models for scoring tuples on arbitrary phrases and evaluate them by their ability to distinguish true held-out tuples from false ones. We find strong performance from a bilinear model using a simple additive architecture to model phrases. We manually evaluate our trained model’s ability to assign quality scores to novel tuples, finding that it can propose tuples at the same quality level as medium- confidence tuples from ConceptNet. Bio: Xiang Li is a 2016 summer intern under the supervision of Prof Kevin Knight and Prof Daniel Marcu. She is also going to be a PhD student at the University of Massachusetts Amherst in Andrew McCallum’s research group in this coming Fall. She got her B.S at the East China Normal University, Shanghai, China and got her M.S at the University of Chicago. Her research interest mainly focused on natural language processing and machine learning. This work is done when she was in Chicago working with Prof Kevin Gimpel at TTIC(Toyota Technological Institute at Chicago)
08 Jul 2016	Aliya Deri (USC/ISI)	Title: Grapheme-to-Phoneme Models for (Almost) Any Language Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Abstract: Grapheme-to-phoneme (g2p) models are rarely available in low-resource languages, as the creation of training and evaluation data is expensive and time-consuming. We use Wiktionary to obtain more than 650k word-pronunciation pairs in more than 500 languages. We then develop phoneme and language distance metrics based on phonological and linguistic knowledge; applying those, we adapt g2p models for high-resource languages to create models for related low-resource languages. We provide results for models for 229 adapted languages. Bio: Aliya Deri is a PhD candidate in Computer Science at USC, advised by Professor Kevin Knight.
23 Jun 2016	Yue Zhang (Singapore University of Technology and Design)	Title: Neural network models for structured prediction Time: 3:00 pm - 4:00 pm Location: 6th Floor Large Conference Room [689] Abstract: Abstract: Transition-based methods leverage non-local features for structured tasks. When combined with beam search and global structure learning, they give high accuracies for a number of NLP tasks. We investigate the effectiveness of neural network models for transition-based parsing and Chinese word segmentation. Results show that automatic features induced by neural models give higher accuracies than carefully designed manual features. The beam search and perceptron learning framework of Zhang and Clark (2011) can be used with neural network models. However, large margin training does not always work. When the number of labels are many, a maximum likelihood training objective with contrastive estimation learning gives better accuracies. Bio: Yue Zhang is currently an assistant professor at Singapore University of Technology and Design. Before joining SUTD in July 2012, he worked as a postdoctoral research associate in University of Cambridge, UK. Yue Zhang received his DPhil and MSc degrees from University of Oxford, UK, and his BEng degree from Tsinghua University, China. His research interests include natural language processing, machine learning and artificial Intelligence. He has been working on statistical parsing, parsing, text synthesis, machine translation, sentiment analysis and stock market analysis intensively. Yue Zhang serves as the reviewer for top journals such as Computational Linguistics, Transaction of Association of Computational Linguistics and Journal of Artificial Intelligence Research. He is also PC member for conferences such as ACL, COLING, EMNLP, NAACL, EACL, AAAI and IJCAI. Recently, he was the area chairs of COLING 2014, NAACL 2015 and EMNLP 2015.
10 Jun 2016	Yoav Goldberg (Bar Ilan University)	Title: Doing stuff with LSTMs Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Abstract: While deep learning methods in NLP are arguably overhyped, recurrent neural networks (RNNs), and in particular LSTM networks, emerge as very capable learners for sequential data. Thus, my group started using them everywhere. After briefly explaining what they are and why they are cool, I will describe some recent work in which we use LSTMs as a building block: learning a shared representation in a multi-task setting; learning feature representations for syntactic parsing; and learning to detect hypernyms in a large corpus. Most work achieve state of the art results. I will also describe a work which reviewers seem to hate but I really like in which we try to shed some light on what's being captured by LSTM-based sentence representations. Bio: Yoav Goldberg is a senior lecturer in Computer Science at Bar Ilan University, Israel, working on natural language processing. Prior to that he was a research scientist at Google. Before deep learning took over he used to work on syntactic parsing and structured prediction. He still does, but now he uses some new shiny tools which he is trying to understand and refine. Live here: http://webcasterms1.isi.edu/mediasite/Viewer/?peid=3d82a6274df44b89a94f376c0c9630f71d
03 Jun 2016	Ke Tran (University of Amsterdam)	Title: Memorization and Exploration in Recurrent Neural Language Models Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Abstract: In this talk, I will focus on two important aspects in language modeling: memorization and exploration.First, I will present Recurrent Memory Network, a recurrent language model augmented with an external memory block. I will show that by explicitly addressing the memory, RMN not only amplifies the power of recurrent neural network but also facilitate our understanding of its internal functioning and allows us to discover underlying patterns in data. Furthermore, our experiments demonstrate that using external memory allows RMN capturing sentence coherence better than previous models on sentence completion task.In context of language generation (e.g. using conditional recurrent language models), memorization might hurt the performance of the whole system especially when recurrent models start hallucinating. In the second part, I will present preliminary findings in training neural machine translation (NMT) to avoid this pitfall. Particularly, we allow NMT to explore during training using REINFORCE/deep Q-network/minimum risk training. Bio: Ke is a third year PhD candidate at University of Amsterdam, advised by Christof Monz and Arianna Bisazza. Before that, he received Msc degree from University of Groningen and Charles University in Prague. He is interested in neural machine translation.
20 May 2016	Yonatan Bisk (ISI)	Title: Natural Language Communication with Computers Time: 3:00 pm - 4:00 pm Location: 6th Floor Large Conference Room [689] Abstract: Abstract: We propose a framework for devising testable algorithms for bridging the communication gap between humans and robots. We begin with a setting in which humans give instructions to robots using unrestricted language commands, with instruction sequences aimed at building complex goal configurations in a blocks world. I will present details of our data-collection effort, and preliminary results on action understanding. Time permitting, I will present new baseline results for flipping the semantic parsing paradigm to address the problem of language generation, where a human performs commands produced by a machine to demonstrate basic two-way communication. Bio: Yonatan Bisk received his PhD from UIUC in 2015 under Professor Julia Hockenmaier and is now a Postdoc with Daniel Marcu at ISI.
13 May 2016	Angeliki Lazaridou (University of Trento)	Towards Multi-Agent Communication-Based Language Learning Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Abstract: One of the most ambitious goals of AI is to develop intelligent conversational agents able to communicate with humans and assist them in their tasks. Thus, communication and interaction should be at the core of the learning process of these agents; failure to integrate communication as their main building block raises concerns regarding their usability.In this talk, I will propose an interactive multimodal framework for language learning. Instead of being passively exposed to large amounts of natural text, our learners (implemented as feed-forward neural networks) engage in cooperative referential games starting from a tabula rasa setup, and thus develop their own language from the need to communicate in order to succeed at the game. Preliminary experiments provide promising results, but also suggest that it is important to ensure that agents trained in this way do not develop an ad-hoc communication code only effective for the game they are playing. Bio: Angeliki is a final year PhD student at the Center for Mind/Brain Sciences of the University of Trento. She received her MSc from the Saarland University, where she worked with Ivan Titov and Caroline Sporleder on Bayesian models for sentiment and discourse. She is currently working at the intersection between language and vision under the supervision of Marco Baroni.
06 May 2016	Gully Burns (ISI)	Title: The TechKnAcq Project: Building Pedagogically Tuned Reading Lists from Technical Corpora Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Abstract: This work is geared towards developing pedagogically-tuned information retrieval systems to help learners select the most informative documents as a reading list for a given query over a given technical corpus. This work will enable learners to understand complex subjects more quickly. I will discuss our overall methodology, our efforts to study dependency between topics within a technical corpus and improvements to evaluating topic quality. I will describe ongoing efforts to study a document's pedagogical value to the end user and future directions for this enterprise. Bio: Gully Burns' focus is to develop pragmatic knowledge engineering systems for scientists in collaboration with experts from the field of AI. He was originally trained as a physicist at Imperial College in London before switching to do a Ph.D. in neuroscience at Oxford. He came to work at USC in 1997, developing the 'NeuroScholar' project in Larry Swanson's lab before joining the Information Sciences Institute in 2006. He is as Research Lead at ISI.
29 Apr 2016	Zhengping Che (USC)	Deep learning solutions to computational phenotyping in health care Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Abstract: Exponential growth in electronic health care data has resulted in new opportunities and urgent needs to discover meaningful data-driven representations and patterns of diseases. Recent rise of this research field with more available data and new applications also has introduced several challenges. In this talk, we will present our deep learning solutions to address some of the challenges. First, health care data is inherently heterogeneous, with a variety of missing values and from multiple data sources. We propose variations of Gated Recurrent Unit (GRU) to explore and utilize the informative missingness in health care data, and hierarchical multimodal deep models to utilize the relations between different data sources. Second, model interpretability is not only important but necessary for care providers and clinical experts. We introduce a simple yet effective knowledge distillation approach called interpretable mimic learning to learn interpretable gradient boosting tree models while mimicking the performance of deep learning models. Bio: Zhengping Che is a third year PhD candidate in the Computer Science Department at the University of Southern California, advised by Professor Yan Liu. Before that, he received his bachelor degree in Computer Science from Pilot CS Class (Yao Class) at Tsinghua University, China. His primary research interest lies in the area of deep learning and its applications in health care domain, especially on multivariate time series data.
15 Apr 2016	Morteza Dehghani (USC)	Decoding Neuro-Semantic Representation of Stories across Languages Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Abstract: Understanding how conceptual knowledge is represented and organized in the human brain is one of the core problems of cognitive science, and many studies have aimed at exploring and understanding the similarities of neuro-semantic representations of concepts. A general approach that has been particularly fruitful in this domain is the investigation of the relationship between various corpus statistics of words and neural activity during exposure to those words. In this work, we examine the neuro-semantic representations of stories across three different languages. We demonstrate that using new advances in vector-based representation of text and paragraphs, fMRI signals can be reliably mapped to story representations. We also show that such representations can capture common neuro-semantic representation of stories across different languages. Finally, performing search-light analysis using over a billion regressions, we show that activation patterns in the default mode network of the brain are the most reliable features for decoding stories. Bio: Morteza is an Assistant Professor of psychology, computer science and the Brain and Creativity Institute at University of Southern California. His research spans the boundary between psychology and artificial intelligence, as does his education. His work investigates properties of cognition by using documents of the social discourse, such as narratives, social media, transcriptions of speeches and news articles, in conjunction to behavioral studies.
08 Apr 2016	Hao Wu (USC/ISI)	Learning Distributed Representations from Network Data and Human Navigation Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Abstract: The increasing growth of network data such as linked documents on the Web and social networks, has imposed great challenges on automatic data analysis. We study the problem of learning representations of network data, which is of critical for applications including data classification, ranking and link prediction. We present neural network embedding algorithms to learn distributed representations of network data that capture the deep context of each data point, and human cognition in navigation data. To improve the scalability of our algorithms, we use efficient optimization and sampling methods. Bio: Hao Wu is a PhD student at USC/ISI, advised by Kristina Lerman.
01 Apr 2016	Julian McAuley (UCSD)	Harnessing reviews to build richer models of opinions Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Abstract: Online reviews are often our first port of call when considering products and purchases online. Yet navigating huge volumes of reviews (many of which we might disagree with) is laborious, especially when we are interested in some niche aspect of a product. This suggests a need to build models that are capable of capturing the complex and idiosyncratic semantics of reviews, in order to build richer and more personalized recommender systems. In this talk I'll discuss three such directions: First, how can reviews be harnessed to better understand the dimensions (or facets) of people's opinions? Second, how can reviews be used to answer targeted questions, that may be subjective or require personalized responses? And third, how can reviews themselves be synthesized, so as to predict what a reviewer would say, even for products they haven't seen yet? Bio: Dr. McAuley has been an Assistant Professer in the Computer Science Department at the University of California, San Diego since 2014. Previously he was a postdoctoral scholar at Stanford University after receiving his PhD from the Australian National University in 2011. His research is concerned with developing predictive models of human behavior using large volumes of online activity data.
25 Mar 2016	Jonathan Kummerfeld (Berkeley)	Capturing More Linguistic Structure with Graph-Structured Parsing Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Abstract: The correct interpretation of any sentence is obscured by a vast array of alternatives. Previous work on disambiguating meaning has focused on representations of syntax using tree structures. Simplifying syntax in this way often means leaving out long-distance relations between words, providing less information to downstream tasks such as dialog and question answering. We propose a new algorithm that is able to efficiently search over graph structures, fully capturing argument structures as a directed acyclic graph. Our dynamic program uniquely decomposes structures, and is sound and complete with respect to the class of one-endpoint crossing graphs. Bio: Jonathan is a Ph.D. candidate at UC Berkeley working on natural language processing with Dan Klein. His research focuses on new algorithms for interpreting text and analyzing system behavior. In particular, he has built search-based error analysis tools for syntactic parsing and coreference resolution, and a graph-based syntactic parser.
11 Mar 2016	Sahil Garg (USC/ISI)	Extracting Biomolecular Interactions Using Semantic Parsing of Biomedical Text Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Abstract: We advance the state of the art in biomolecular interaction extraction with three contributions: (i) We show that deep, Abstract Meaning Representations (AMR) significantly improve the accuracy of a biomolecular interaction extraction system when compared to a baseline that relies solely on surface- and syntax-based features; (ii) In contrast with previous approaches that infer relations on a sentence-by-sentence basis, we expand our framework to enable consistent predictions over sets of sentences (documents); (iii) We further modify and expand a graph kernel learning framework to enable concurrent exploitation of automatically induced AMR (semantic) and dependency structure (syntactic) representations. Our experiments show that our approach yields interaction extraction systems that are more robust in environments where there is a significant mismatch between training and test conditions. Bio: Sahil Garg is a PhD student, advised by Prof. Aram Galstyan, in computer science department of Viterbi school of engineering at University of Southern California. He is interested in problem oriented research. In the past, he developed machine learning, information theoretic algorithms for real world problems such as sensing environmental dynamics using mobile robotic sensors. In this talk, he is going to discuss his recent work on extracting bio-molecular interactions from bio-medical text using semantic parsing, especially in relevance to Cancer disease.
04 Mar 2016	David Jurgens (Stanford)	Linguistic Annotation Using Video Games with a Purpose Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Abstract: Building systems that understand human language often requires access to large amounts of text annotated with all the features and nuances of human communication. However, building these annotated corpora is often prohibitive due to the time, cost, and expertise required to annotate. While crowdsourcing the work can help, untrained workers still incur costs and the workers may not be as motivated to answer correctly. In this talk, I will describe how to solve this annotation bottleneck using video games in which traditional annotation tasks are transformed into core video game mechanics and embedded in the kinds of games you might play on your mobile phone. Our video games are not only fun to play but are capable of annotating a wide variety of linguistic phenomena at costs lower that crowdsourcing and have quality equal to that of experts. Using four games, I will demonstrate how their creation process can be distilled into reusable design patterns to create new games for different types of tasks in linguistics and beyond. Bio: David Jurgens is postdoctoral scholar in the department of Computer Science at Stanford University. He received his PhD in Computer Science from UCLA in 2014 and has been a visiting researcher at HRL Laboratories, research scientist at Sapienza University of Rome and postdoctoral scholar at McGill University. His research focuses on two areas: natural language processing, where he works on new methods for understanding the meaning of text, and computational social science where he investigates population dynamics through peoples' language and demographics. He is currently a co-chair of the International Workshops on Semantic Evaluation (SemEval) and of the workshop on Natural Language Processing and Computational Social Science. His research has been featured in Forbes, MIT Technology Review, Business Insider, and Schneier on Security.
26 Feb 2016	Angel Chang (Stanford)	Interactive scene design using natural language Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Abstract: Designing 3D scenes is currently a creative task that requires significant expertise and effort in using complex 3D design interfaces. This design process starts in contrast to the easiness with which people can use language to describe real and imaginary environments. We present an interactive text to 3D scene generation system that allows a user to design 3D scenes using natural language. A user provides input text from which we extract explicit constraints on the objects that should appear in the scene. Given these explicit constraints, the system then uses a spatial knowledge base learned from an existing database of 3D scenes and 3D object models to infer an arrangement of the objects forming a natural scene matching the input description. Using textual commands the user can then iteratively refine the created scene by adding, removing, replacing, and manipulating objects. Bio: Angel Chang recently received her PhD after working in the Stanford NLP group where she was advised by Chris Manning. Her research focuses on the intersection of natural language understanding, computer graphics, and AI. She is currently a visiting expert at Tableau Research. More details at http://stanford.edu/~angelx/ Webcast link: http://webcasterms1.isi.edu/mediasite/Viewer/?peid=735bfbb4ba1a4b749fe591958f837ccb1d
19 Feb 2016	Ehsan Ebrahimzadeh (UCLA)	Chasing vaccination in social media: Narrative discovery from an unstructured corpus of text Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: The 2014-2015 measles outbreak in California was a serious public health crisis. Health officials attributed the outbreak to the increasing number of children whose parents had secured exemptions from vaccination for various vaccine-preventable diseases (VPDs). We believe that exemption seeking is part of a broader culture of distrust driven in large part by stories circulating in social media. An under- standing of the dynamics of this broader culture is necessary if we are to develop health policies that do not simply address outcomes but rather the cultural basis for decisions leading to those outcomes. We reveal the dynamics of exemption seeking and the greater culture of distrust endemic to these sites by developing a generative statistical-mechanical model where stories are represented as net- works with actants such as parents, medical professionals, and religious institutions as nodes, and their various relationships as edges. We estimate the latent but unknown stories circulating on these sites by modeling the posts as a sampling of the hidden story graph. Working with a data set of ≈2 million posts crawled from parent- ing sites over a ≈5 year period, we uncover a strong, persistent story signal in which parents, driven by a distrust of government and medical institutions, devise strategies to secure exemptions for their children from required vaccinations. In these stories, it is the vaccines and not the VPDs that pose a threat to the children. Our method of analyzing social media conversations and the exchange of stories at scale can provide an alert mechanism to health officials, help lay the groundwork for devising community-specific messaging interventions, and inform policy making. Bio: Ehsan Ebrahimzadeh is a PhD candidate in the Electrical Engineering Department of UCLA, where he is simultaneously working towards my his degree in Applied Mathematics. Broadly speaking, he is interested in Statistics, Applied probability, and Data Analytics. Before joining UCLA in 2013, he received his MASc degree in Electrical Engineering from University of Waterloo, and BSc degrees in Mathematics and Electrical Engineering from Isfahan University of Technology.
12 Feb 2016	Thang Luong (Stanford)	Recent Advances in Neural Machine Translation Time: 3:00 pm - 4:00 pm Location: 6th Floor Large Conference Room [689] Abstract: Neural Machine Translation (NMT) is a simple new architecture for getting machines to learn to translate. At its core, NMT is a single big recurrent neural network that is trained end-to-end with several advantages such as simplicity and generalization. Despite being relatively new, NMT has already been showing promising results in various translation tasks. In this talk, I will give an overview of NMT and highlight my recent work on (a) how to address the rare word problem in NMT, (b) how to improve the attention (alignment) mechanism, and (c) how to leverage data from other modalities to improve translation. Bio: Thang Luong is currently a 5th-year PhD student in the Stanford NLP group under Prof. Chris Manning. In the past, he has published papers on various different NLP-related areas such as digital library, machine translation, speech recognition, parsing, psycholinguistics, and word embedding learning. Recently, his main interest shifts towards the area of deep learning using sequence to sequence models to tackle various NLP problems, especially neural machine translation. He has built state-of-the-art (academically) neural machine translation systems both at Google and at Stanford.
05 Feb 2016	Linhong Zhu (ISI)	Deciphering Dark Web through k-partite Graph Summarization Time: 3:00 pm - 4:00 pm Location: 6th Floor Large Conference Room [689] Abstract: Facts and their relations extracted from web are commonly modeled as graphs with different types of vertices. In this work, we focus on the problem of revealing latent entities from a $k$-partite graph, by co-clustering $k$ types of different vertices. We propose a CoSum approach, which creates a summary graph, where each super node (a cluster of original vertices) represents a hidden entity and the weighted edges encode important relations among extracted entities. The resulted summary graph also allows for investigation and interpretation of hidden entities. Evaluation verifies that CoSum outperforms several baselines in terms of entity coherence, query supporting and recovering hidden victims in the applied human trafficking domain. Bio: Linhong Zhu is currently a computer scientist at Information Sciences Institute, University of Southern California, where she also received training as a Postdoctoral Research Associate. Before that, she worked as a Scientist-I in data analytics department at Institute for Infocomm Research, Singapore. She obtained her Ph.D. degree in computer engineering from Nanyang Technological University, Singapore in 2011. Her research interests are large-scale graph analytics with applications to social network analysis, social media analysis, and predictive modeling. She has been awarded with University of Southern California Postdoctoral travel and training award in 2014 and her paper has been selected as two of the best papers in SIGMOD 2010.
29 Jan 2016	Reid Swanson (USC/ICT)	Leveraging the Social Web to Enable Open-Domain Interactive Storytelling Time: 3:00 pm - 4:00 pm Location: 6th Floor Large Conference Room [689] Abstract: Storytelling is an integral part of human interaction and critical to nearly all forms of entertainment. Since the introduction of TALE-SPIN over thirty years ago, automating the process of storytelling has been an active area of research. However, despite the incredible advances in other areas of computer science, such as 3D graphics and computational physics, that have enabled dazzling immersive interactive environments, there has been little progress in delivering automated stories that have the richness and complexity we expect in this genre of discourse.In this talk I will primarily discuss work done during my thesis that leverages the vast amounts of knowledge hidden implicitly in the social web in order to enable a text-based open-domain interactive storytelling system. In this system the human and computer take turns writing sentences of an emerging fictional story on any topic the author chooses. The system uses an architecture inspired by case-based reasoning with a knowledge base of over a million personal stories about the daily lives and experiences of ordinary people. At each turn the system selects a sentence from the corpus that tries to maximize the semantic and discourse coherence given the text of the story so far.I will also describe how crowd-sourcing communities were used to collect thousands of collaborative stories with the system and tens of thousands of ratings from hundreds of participants on several subjective evaluation criteria. The best models show significant improvements over the baseline and are judged to be indistinguishable from entirely human written weblog stories from a held out part of the collection.I will conclude with some more recent and ongoing research that examines additional methods of evaluation and new models of narrative generation based on Recurrent Neural Networks. Bio: Reid Swanson received his PhD in Computer Science from the University of Southern California in 2010 where he focused on a large-scale text-based interactive storytelling system. His primary research interest is in large-scale open-domain interpretation and generation of interactive narratives.After graduating he spent a year at the Walt Disney Imagineering Research & Development lab in Glendale, CA. At Disney he worked with an interdisciplinary team of industry engineers, academics, artists and performers to develop technologies for bringing persistent interactive storytelling to select groups of guests at their theme parks and resorts.From 2011 until 2015, Reid worked as a postdoc at UC Santa Cruz where he participated in a range of different projects. As part of the SIREN project, with Arnav Jhala, he investigated games for teaching conflict resolution management. On the SSIM project, with Michael Mateas, he helped research and develop virtual training environments targeting the military and law enforcement agencies to help prevent conflict escalation in unknown social environments. With Marilyn Walker, he also investigated automated methods for analyzing and mining prototypical arguments on internet debate forums about controversial topics such as gun control, gay marriage and evolution.In August of 2015 he rejoined the Institute of Technologies as a Research Scientist where he is researching the role of narrative structure in the persuasiveness of an intended message embedded in the story across different cultures.
22 Jan 2016	Jiwei Li (Stanford)	Extracting User Information from Online Social Media Time: 3:00 pm - 4:00 pm Location: 6th Floor Large Conference Room [689] Abstract: The overwhelming popularity of online social media creates an unprecedented opportunity to display aspects of oneself. Inferring information about these users has the potential to benefit manydownstream applications such as recommendation engines and targeted advertising. In this talk I will show how to extract important personal information such as major life events and personal attributes (e.g., gender, education, job) from social evidence such as the text produced by users and their friends and from properties of their social network. I will describe algorithms making use of a variety of frameworks, including distant supervision, and a deep learning architecture that learns user representations by integrating many heterogeneous social signals. Bio: Jiwei Li is a PH.D. student in the computer science department at Stanford University, working with Prof. Dan Jurafsky. His research interests include discourse, language generation, and social networks, with a focus on deep learning methods. Jiwei receives his B.S. from Peking University in 2012. He was rewarded the Facebook Fellowship in 2015. Webcast link: http://webcasterms1.isi.edu/mediasite/Viewer/?peid=6b5348f2f8dc4a4dbb595eca444410d51d
15 Jan 2016	Gabor Angeli (Stanford)	Learning Open Domain Knowledge From Text Time: 3:00 pm - 4:00 pm Location: 6th Floor Large Conference Room [689] Abstract: The increasing availability of large text corpora holds the promise of acquiring an unprecedented amount of knowledge from this text. However, current techniques are either specialized to particular domains or do not scale to large corpora. This dissertation develops a new technique for learning open-domain knowledge from unstructured web-scale text corpora. A first application aims to capture common sense facts: given a candidate statement about the world and a large corpus of known facts, is the statement likely to be true? We appeal to a probabilistic relaxation of natural logic -- a logic which uses the syntax of natural language as its logical formalism -- to define a search problem from the query statement to its appropriate support in the knowledge base over valid (or approximately valid) logical inference steps. We show a 4x improvement at retrieval recall compared to lemmatized lookup, maintaining above 90% precision. This approach is extended to handle longer, more complex premises by segmenting these utterance into a set of atomic statements entailed through natural logic. We evaluate this system in isolation by using it as the main component in an Open Information Extraction system, and show that it achieves a 3% absolute improvement in F1 compared to prior work on a competitive knowledge base population task. A remaining challenge is elegantly handling cases where we could not find a supporting premise for our query. To address this, we create an analogue of an evaluation function in gameplaying search: a shallow lexical classifier is folded into the search program to serve as a heuristic function to assess how likely we would have been to find a premise. Results on answering 4th grade science questions show that this method improves over both the classifier in isolation and a strong IR baseline, and achieves the best published results on the task. Bio: Gabor is a new graduate from Chris Manning's natural language processing lab. He graduated with a BS in electrical engineering/computer science from UC Berkeley in 2010, and defended his Ph.D. in the fall of 2015. His research focuses on natural language understanding, ranging from relation extraction and knowledge base population, textual entailment, common-sense reasoning, and question answering. He has led the Stanford knowledge base population project for the past three years, with Stanford ranking 5th, 1st, and 1st (tied) among teams participating in the TAC-KBP competition over those three years. In addition to publications at ACL, EMNLP and NAACL, he co-authored an EMNLP best dataset paper on collecting a large dataset for textual entailment. Outside of academia, he was the NLP architect for Baarzo in 2014 (acquired by Google), and is currently a fellow at XSeed Capital. In his free time, Gabor enjoys hiking, board games, and binge-watching Netflix shows.
04 Dec 2015	Eli Pincus (USC / ICT)	What Can We Learn From An Agent that Plays Word-Guessing Games? Time: 3:00 pm - 4:00 pm Location: 6th Floor Large Conference Room [689] Abstract: In this talk I will discuss an agent that can play a simple word-guessing game with a user. The fast-paced, multi-modal, and interactive nature of the dialogue that takes place in word-guessing games are challenging for today’s dialogue systems to emulate. The agent serves as a research testbed to explore issues of fast-paced incremental interaction and user satisfaction in such a setting. I will trace how the agent's design was motivated by a human-human corpus as well as discuss two empirical studies involving the agent. The first study was designed to learn an algorithm to automatically select effective clues (clues likely to elicit a correct guess from a human). The second study was an evaluation of several synthetic voices and 1 human voice which showed how participant's subjective perceptions and objective task performances fluctuated based on the voice used and the duration of the participant's exposure to the voice. Bio: Eli Pincus is a 3rd year USC PhD student and a graduate research assistant in the Natural Dialogue Group at USC Institute for Creative Technologies. He is advised by Professor David Traum. Eli's main research is in human-computer dialogue. Since joining USC he has been working on improving virtual human dialogue. He won the best computer science department TA award in spring 2015. He was a research intern in the NLP and AI group at Nuance Communications in summer 2015.
20 Nov 2015	Jia Xu (Chinese Academy of Sciences	Better Bootstraps, Better Translation. Time: 3:00 pm - 4:00 pm Location: 6th Floor Large Conference Room [689] Abstract: Bagging [Breiman, 96] and its variants is one of the most popular methods in aggregating classifiers and regressors. Its original analysis assumes that the bootstraps are built from an unlimited, independent source of samples. In the real world this analysis fails because there is a limited number of training samples.We analyze the effect of intersections between bootstraps to train different base predictors, which shows that the real-world bagging behaves very differently than its ideal analog [Breiman, 96]. Most importantly, we provide an alternative subsampling method called design-bagging based on a new construction of combinatorial designs. We prove that this is universally better than bagging. Our analytical results are backed up by experiments on general classification and regression settings, and significantly improved all machine translation systems we used in the NIST-15 C-E competition. Bio: Jia Xu is an associate professor at ICT/CAS, after being an assistant professor in Tsinghua University and a senior researcher at DFKI lecturing at Saarland University in Germany. She worked at IBM Watson and MSR Redmond during her Ph.D. advised by Hermann Ney at RWTH-Aachen University. Her current research interests are in Machine Learning with a focus towards highly competitive machine translation systems, where she led and participated in teams winning first place in WMT-11, TC-Star -05-07 and NIST-08. In NIST-15 she led one more team that won 4th place, which is the 1st among academic institutions.
13 Nov 2015	Satish Kumar Thittamaranahalli (USC)	Notes on the Constraint Composite Graph Time: 3:00 pm - 4:00 pm Location: 6th Floor Large Conference Room [689] Abstract: In this talk, I will present the idea of the constraint composite graph (CCG) associated with any combinatorial problem modeled as a weighted constraint satisfaction problem (WCSP). The CCG constitutes the first mathematical framework for simultaneously exploiting the numerical structure of the weighted constraints as well as the graphical structure of the variable-interactions in a WCSP. I will discuss a number of important applications of the CCG including its role in: (a) identification of tractable classes of WCSPs; (b) kernelization techniques for combinatorial problems; and (c) understanding the scope of incremental computation for hard combinatorial problems. Bio: Dr. Satish Kumar Thittamaranahalli (T. K. Satish Kumar) is a Research Scientist at the University of Southern California. He has published extensively on numerous topics in Artificial Intelligence spanning such diverse areas as Constraint Reasoning, Planning and Scheduling, Probabilistic Reasoning, Combinatorial Optimization, Approximation and Randomization, Heuristic Search, Model-Based Reasoning, Knowledge Representation and Spatio-Temporal Reasoning. He has served on the Program Committees of many international conferences in Artificial Intelligence and is a co-winner of the Best Student Paper Award from the 2005 International Conference on Automated Planning and Scheduling. Dr. Kumar received his PhD in Computer Science from Stanford University in March 2005. In the past, he has also been a Visiting Student at the NASA Ames Research Center, a Postdoctoral Research Scholar at the University of California, Berkeley, a Research Scientist at the Institute for Human and Machine Cognition, a Visiting Assistant Professor at the University of West Florida, and a Senior Research and Development Scientist at Mission Critical Technologies.
06 Nov 2015	Fabrizio Morbini (USC / ICT)	Text generation from abductive interpretations Time: 3:00 pm - 4:00 pm Location: 6th Floor Large Conference Room [689] Abstract: Abduction is an inference method often used to formalize what the process of interpretation is. In this talk i'll describe a system that generates a textual description of an abductive proof and its evaluation when applied to the interpretations generated for a set of 100 movies from the Heider-Simmel Interactive Theater project. The goal of the system is to generate text that explains the system's interpretation fluently without having to read or understand a proof graph and first order logic.
23 Oct 2015	Farshad Kooti (USC / ISI)	Fine-grained Temporal Patterns of Online Content Consumption Time: 3:00 pm - 4:00 pm Location: 6th Floor Large Conference Room [689] Abstract: Online activity is characterized by diurnal and weekly patterns, reflecting human circadian rhythms, sleep cycles, and social patterns of work and leisure. Using data from online social networking site Facebook, we uncover temporal patterns that take place at far shorter time scales. Specifically, we demonstrate fine-grained, within-session behavioral changes, where a session is defined as a period of time a user engages with Facebook before choosing to take a break. We show that over the course of a session, users spend less time consuming some types of content, such as textual posts, and preferentially consume more photos and videos. Moreover, users who spend more time engaging with Facebook have different patterns of session activity than the less-engaged users, a distinction that is already visible at the start of the session. We study activity patterns with respect to users’ demographic characteristics, such as age and gender, and show that age has a strong impact on within-session behavioral changes. Finally, we show that the temporal patterns we uncover help us more accurately predict the length of sessions on Facebook. Bio: I am a third-year Computer Science PhD student at the University of Southern California (USC), Information Sciences Institute (ISI) working under the supervision of Kristina Lerman. My main research interest is the study of large and complex datasets, especially data from online social networks, which includes the measurement and analysis of users' behavior in OSNs. I'm currently a Data Science intern at Facebook in Menlo Park.Before joining USC, I got my master's from Max Planck Institute for Software Systems (MPI-SWS), Germany. I worked with Krishna Gummadi as my advisor and also with Meeyoung Cha (KAIST) and Winter Mason (Facebook) during my master's. Before MPI, I got my bachelor's in Computer Engineering (Software) from University of Tehran, Iran.
09 Oct 2015	Liron Cohen (USC)	Using Highways for Bounded-Suboptimal Multi-Agent Path Finding Time: 3:00 pm - 4:00 pm Location: 6th Floor Large Conference Room [689] Abstract: Multi-agent path-finding (MAPF) is important for applications such as the kind of warehousing done by Kiva systems. Solving the problem optimally is NP-hard, yet finding low-cost solutions is important. Bounded-suboptimal MAPF algorithms, such as enhanced conflict-based search (ECBS), often do not perform well in warehousing domains with many agents. We therefore develop bounded-suboptimal MAPF algorithms, called CBS+HWY and ECBS+HWY, that exploit the problem structure of a given MAPF instance by finding paths for the agents that include edges from user-provided highways, which encourages a global behavior of the agents that avoids collisions. On the theoretical side, we develop a simple approach that uses highways for MAPF and provides suboptimality guarantees. On the experimental side, we demonstrate that ECBS+HWY can decrease the runtimes and solution costs of ECBS in Kiva-like domains with many agents if the highways capture the problem structures well. Bio: Liron received a B.S. in Computer Engineering in 2007 and an M.S. in Computer Science in 2012, both from the Hebrew University of Jerusalem. Liron is interested in combinatorial problems related to constraint-based reasoning and symbolic planning. Specifically, he is looking at novel algorithmic techniques for exploiting structure in such combinatorial problems.
02 Oct 2015	David Kale (USC / ISI)	Automated Deep Multi-Phenotyping with Noisy Labels Time: 3:00 pm - 4:00 pm Location: 6th Floor Large Conference Room [689] Abstract: The increasing volume of electronic health records (EHR) data has spurred significant interest in the development of algorithmic phenotyping, used to identify patient cohorts in massive databases. Data-driven phenotyping, which formulates phenotyping as a statistical learning problem, offers superior scalability and generalization. Building upon previous work at Stanford, we propose a deep multi-phenotyping model: we train a single multi-task neural network to recognize multiple phenotypes, trained on noisy labels generated via an automatic process. We present preliminary results on classifying over 30 different phenotypes on a data set of over one million patients from the Stanford clinical system. This is joint work with Nigam Shah at Stanford University Center for Biomedical Informatics Research.BIO: Dave Kale is a fourth year PhD student in Computer Science and an Alfred E. Mann Innovation in Engineering Fellow at the University of Southern California. He is advised by Greg Ver Steeg. Before joining USC and ISI, he worked in the Whittier VPICU at Children's Hospital LA and co-founded the Meaningful Use of Complex Medical Data (MUCMD) Symposium. Dave holds a BS and MS from Stanford University.
11 Sep 2015	Guido Zarrella (MITRE)	Neuromorphic Language Understanding Time: 3:00 pm - 4:00 pm Location: 6th Floor Large Conference Room [689] Abstract: Recurrent neural networks are effective tools for processing natural language. RNNs can be effectively trained to perform sequence processing tasks such as translation, classification, language modeling, and paraphrase detection. However despite major gains in fields related to these power hungry artificial neural networks, it remains difficult to construct functional models of cognition inspired by biological nervous systems. In this talk I'll describe how RNNs can be trained to excel at language understanding tasks and then adapted to run on ultra-low power neuromorphic hardware which simulates the spiking of individual neurons. The result is an interactive embedded system that uses recurrent neural networks to process language while consuming an estimated .000048 watts (48 microwatts). Bio: Guido Zarrella is a Principal Artificial Intelligence Engineer at the MITRE Corporation in Denver, Colorado. He leads a R&D effort pursuing advances in deep learning for language understanding. He is a former President of the Association for Computational Linguistics, having served in this role on December 5th, 2011.
04 Sep 2015	Barret Zoph (USC/ISI)	How Much Information Does a Human Translator Add to the Original? Time: 3:00 pm - 4:00 pm Location: 6th Floor Large Conference Room [689] Abstract: We ask how much information a human translator adds to an original text, and weprovide a bound. We address this question in the context of bilingual text com-pression: given a source text, how many bits of additional information are required to specify the target text produced by a human translator? We develop new compression algorithms and establish a benchmark task.
28 Aug 2015	Sudha Rao (Maryland / ISI Intern)	Distant supervision for relation extraction using AMR Time: 3:00 pm - 4:00 pm Location: 6th Floor Large Conference Room [689] Abstract: In this talk I will present the work I did with Prof Daniel Marcu and Prof Kevin Knight at ISI over the summer. In this work, we show how we can improve relation extraction for biomedical text using distant supervision from existing knowledge sources like BioPax. We label the data using heuristics from AMR which obviates the need for expensive manual annotation and allows us to make use of large amounts of data for training. I will also talk about some ongoing work on training a simpler model that exploits linguistic information stored in the path via the least common ancestor in an AMR. Bio: I am a PhD student from University of Maryland, College Park working under Prof. Hal Daume III and Prof. Philip Resnik. My recent project on "Dialogue focus tracking for zero pronoun resolution" appeared at NAACL 2015. At ISI, I am working with Prof. Daniel Marcu and Prof. Kevin Knight on application of Abstract Meaning Representation (AMR) to biology literature. Specifically we will be developing techniques for constructing text level AMRs from sentence level AMRs and then assess its impact on reading-against-a-model molecular biology tasks. In my spare time, I enjoy singing, dancing and watching movies.
25 Aug 2015	Wenduan Xu (Cambridge / ISI Intern)	Using HyTER networks for short-answer scoring Time: 3:00 pm - 4:00 pm Location: 6th Floor Large Conference Room [689] Abstract: This talk summarizes my work so far on investigating the usefulness ofHyTER networks for short-answer scoring. I will first introduce thetask and the approach we take in this project. And finally I will showsome initial results we have. Bio: Wenduan Xu is a graduate student in Cambridge advised by Stephen Clark, working on CCG parsing.
14 Aug 2015	Qing Dou (USC / ISI)	Beyond Parallel Data - A Decipherment Approach for Better Quality Machine Translation Time: 3:00 pm - 4:00 pm Location: 6th Floor Large Conference Room [689] Abstract: Thanks to the availability of parallel data and advances in machinelearning techniques, we have seen tremendous improvement in the fieldof machine translation over the past 20 years. However, due to lack ofparallel data, the quality of machine translation is still far fromsatisfying for many language pairs and domains. In general, it iseasier to obtain non-parallel data, and much work has tried to learntranslations from non-parallel data. Nonetheless, improvements tomachine translation have been limited. In this work, I follow adecipherment approach to learn translations from non parallel data andachieve significant gains in machine translation.I apply slice sampling to Bayesian decipherment. Compared with thestate- of-the-art algorithm, the new approach is highly scalable andaccurate, making it possible to decipher billions of tokens withhundreds of thousands of word types at high accuracy for the firsttime. When it comes to deciphering foreign languages, I introducedependency relations to address the problems of word reordering,insertion, and deletion. Experiments show that dependency relationshelp improve Spanish/English deciphering accuracy by over 5-fold.Moreover, this accuracy is further doubled when word embeddings areused to incorporate more contextual information.Moreover, I decipher large amounts of monolingual data to improve thestate- of-the-art machine translation systems in the scenario ofdomain adaptation and low density languages. Through experiments, Ishow that decipherment find high quality translations forout-of-vocabulary words in the task of domain adaptation, and helpimprove word alignment when the amount of parallel data is limited. Iobserve up to 3.8 point and 1.9 point BlEU gain in Spanish/French andMalagasy/English machine translation experiments respectively. Bio: Qing is a PhD candidate at USC. His research interests focus onapplication of machine learning techniques to help computer betterunderstand human languages. He is working with Kevin Knight on variousproblems related to Machine Translation and Decipherment. Prior tothat, he has worked on computational phonology, including stressprediction and transliteration. He is interested in continuing hisresearch in industrial settings to solve exciting large scale problems.
07 Aug 2015	Marius Pasca (Google)	Understanding the World's Compositional Concepts Time: 3:00 pm - 4:00 pm Location: 11th Floor Conference Room [1135] Abstract: Compositional topics ("Swiss passport", "German grammar") ofinterest to Web users may be available as entries within structuredknowledge resources. But such topics are not necessarily connected to,let alone represented in relation to, entries of the constituenttopics ("Switzerland" and "Passport", or "German language" and"Grammar") from which their approximate meaning could be aggregated.Web documents - more precisely, encyclopedic articles - and Web searchqueries are shown to be useful in complementary tasks relevant tounderstanding compositional topics. The tasks are the decomposition ofpotentially compositional topics into zero, one or more constituenttopics; and the interpretation of the role ("issued by", "of") playedby constituents ("Swiss", "German") within ambiguous compositionalphrases that might refer to compositional topics. Bio: Marius Pasca is a research scientist at Google in Mountain View,California. Current research interests include factual informationextraction from unstructured text within documents and queries, andits applications to Web search.
24 Jul 2015	Sudha Rao (Maryland / ISI Intern)	Dialogue focus tracking for zero pronoun resolution Time: 3:00 pm - 4:00 pm Location: 6th Floor Large Conference Room [689] Abstract: We take a novel approach to zero pronoun resolution in Chinese: our model explicitly tracks the flow of focus in a discourse. Our approach, which generalizes to deictic references, is not reliant on the presence of overt noun phrase antecedents to resolve to, and allows us to address the large percentage of “non-anaphoric” pronouns filtered out in other approaches. We furthermore train our model using readily available parallel Chinese/English corpora, allowing for training without hand-annotated data. Our results demonstrate improvements on two test sets, as well as the usefulness of linguistically motivated features. Bio: I am a PhD student from University of Maryland, College Park working under Prof. Hal Daume III and Prof. Philip Resnik. My recent project on "Dialogue focus tracking for zero pronoun resolution" appeared at NAACL 2015. At ISI, I am working with Prof. Daniel Marcu and Prof. Kevin Knight on application of Abstract Meaning Representation (AMR) to biology literature. Specifically we will be developing techniques for constructing text level AMRs from sentence level AMRs and then assess its impact on reading-against-a-model molecular biology tasks. In my spare time, I enjoy singing, dancing and watching movies.
17 Jul 2015	Wenduan Xu (Cambridge / ISI Intern)	Shift-Reduce CCG Parsing with a Dependency Model Time: 3:00 pm - 4:00 pm Location: 6th Floor Large Conference Room [689] Abstract: CCG is able to derive typed dependency structures, providing a usefulapproximation to the underlying predicate-argument relations of “whodid what to whom” and dependency structures form an integral part ofCCG. In this talk, I will first cover some essential background onCCG, its dependency structures and CCG parsing; I will then discuss arecent dependency model we developed for shift-reduce CCG parsing. Achallenge arises in this model from the fact that the oracle needs tokeep track of exponentially many gold-standard derivations, which areall hidden. And we solve this by integrating a packed parse forestwith the beam-search decoder and introduce a novel technique forquerying an exponentially-sized oracle on-the-fly during beam-searchdecoding. Bio: Wenduan Xu is a graduate student in Cambridge advised by Stephen Clark,working on CCG parsing.
10 Jul 2015	Deniz Yuret (Koç University / ISI Visitor)	Parsing with word vectors Time: 3:00 pm - 4:00 pm Location: 6th Floor Large Conference Room [689] Abstract: We investigate the use of distributed word representations instead of word forms and parts of speech in syntactic parsing. Distributed representations are dense, low-dimensional, and real valued vector representations (embeddings) for words. Instead of ad-hoc feature conjunctions, we use kernels and neural networks for non-linearity, greatly simplifying feature engineering. We show that dense representations offer both computational and learning advantages compared to sparse one-hot vector representations. We introduce context vectors, distributed representations for word contexts, and show that they can replace or complement parts of speech in parsing models. We show that distributed representations give accuracies comparable to the state-of-the-art word form and part-of-speech based feature sets. Bio: Deniz Yuret is an associate professor of Computer Engineering at Koç University in Istanbul working at the Artificial Intelligence Laboratory since 2002. Previously he was at the MIT AI Lab (1988-1999) and later co-founded Inquira, Inc., a company commercializing question answering technology (2000-2002). He has worked on supervised and unsupervised approaches to syntax, morphology, lexical semantics and lexical categories. His most recent work is on creation and applications of continuous word embeddings.
26 Jun 2015	Vivi Nastase (FBK / ISI Visitor)	Metonymy resolution with multi-faceted knowledge from Wikipedia Time: 3:00 pm - 4:00 pm Location: 10th Floor Classroom [1016] Abstract: Metonymic words stand-in for concepts closely related to the words' literal interpretation. Resolving metonymies would then require identifying potentially metonymic words, finding closely related concepts, and determining which one fits the local(grammatically-related) and global context best. Each of these tasks can be resolved best by using different types of resources: a network of concepts for finding related concepts; a grammatically analyzed corpus (and, ideally, an ontology) for computing selectional preferences for the local context; a large corpus for computing co-occurrence probabilities, to factor in the global context. Within NLP we do have all these types of resources, but because of their different requirements -- e.g. relational models of meaning rely on differentiating word senses, while distributional representations do not (cannot) make such distinctions -- they are separate from one another. By using Wikipedia and exploiting its various structured/semi-structured sources of information, we can build a resource that combines the three types of meaning representations mentioned above. I will discuss the task of metonymy resolution and show how the combination of representations extracted from Wikipedia makes possible an unsupervised approach to this task. Bio: Vivi Nastase is a researcher at the Fondazione Bruno Kessler in Trento, working mainly on lexicalsemantics, semantic relations, knowledge acquisition and language evolution. She holds a Ph.D.from the University of Ottawa, Canada, and has previously worked at the Heidelberg Institute of Theoretical Studies (HITS) and the University of Heidelberg.
23 Jun 2015	Sravana Reddy (Dartmouth / ISI Visitor)	Automated tools for analyzing sociophonetic variation Time: 3:00 pm - 4:00 pm Location: 6th Floor Large Conference Room [689] Abstract: The phenomenal amount of text on social media has recently spawned endeavors on computational methods to study language variation and change. However, we also have access to an unprecedented quantity of speech -- from Youtube video blogs to podcasts to recordings of radio and television shows, spanning several different accents and dialects. This data is a boon to sociophoneticians, who have traditionally relied on small-scale interviews to study systematic variation in speech. At the same time, it presents a challenge: the usual manual speech analysis methods do not scale.I will present ongoing work on an application that allows sociophoneticians to identify dialect features from potentially noisy speech data without the need for manual transcription.
12 Jun 2015	Yan Liu (USC/MELADY)	Group Anomaly Detection in Social Media Analysis Time: 3:00 pm - 4:00 pm Location: 6th Floor Large Conference Room [689] Abstract: Traditional anomaly detection on social media mostly focuses on individual point anomalies while anomalous phenomena usually occur in groups. Therefore it is valuable to study the collective behavior of individuals and detect group anomalies. Existing group anomaly detection approaches rely on the assumption that the groups are known, which can hardly be true in real world social media applications. In this paper, we take a generative approach by proposing a hierarchical Bayes model: Group Latent Anomaly Detection (GLAD) model. GLAD takes both pair-wise and point-wise data as input, automatically infers the groups and detects group anomalies simultaneously. To account for the dynamic properties of the social media data, we further generalize GLAD to its dynamic extension d-GLAD. We conduct extensive experiments to evaluate our models on both synthetic and real world datasets. The empirical results demonstrate that our approach is effective and robust in discovering latent groups and detecting group anomalies. Bio: Yan Liu is an assistant professor in Computer Science Department at University of Southern California from 2010. Before that, she was a Research Staff Member at IBM Research. She received her M.Sc and Ph.D. degree from Carnegie Mellon University in 2004 and 2007. Her research interest includes developing scalable machine learning and data mining algorithms with applications to social media analysis, computational biology, climate modeling and healthcare analytics. She has received several awards, including NSF CAREER Award, Okawa Foundation Research Award, ACM Dissertation Award Honorable Mention, Best Paper Award in SIAM Data Mining Conference, Yahoo! Faculty Award and the winner of several data mining competitions, such as KDD Cup and INFORMS data mining competition.
29 May 2015	Aliya Deri (USC/ISI)	How to Make a Frenemy: Multitape FSTs for Portmanteau Generation Time: 3:00 pm - 4:00 pm Location: 6th Floor Large Conference Room [689] Abstract: A portmanteau is a type of compound word that fuses the sounds and meanings of two component words; for example, “frenemy” (friend + enemy) or “smog” (smoke + fog). We develop a system, including a novel multitape FST, that takes an input of two words and outputs possible portmanteaux. Our system is trained on a list of known portmanteaux and their component words, and achieves 45% exact matches in cross-validated experiments. Bio: Aliya Deri is a PhD candidate at USC/ISI.
22 May 2015	Marjan Ghazvininejad (USC/ISI)	How to Memorize a Random 60-Bit String Time: 3:00 pm - 4:00 pm Location: 6th Floor Large Conference Room [689] Abstract: User-generated passwords tend to be memorable, but not secure. A random, computer-generated 60-bit string is much more secure.However, users cannot memorize random 60-bit strings. In this paper, we investigate methods for converting arbitrary bit strings into English word sequences (both prose and poetry), and we study their memorability and other properties. Bio: Marjan Ghazvininejad is a second year PhD student in Computer Science at University of Southern California (USC). She is working with Professor Kevin Knight at the Information Sciences Institute (ISI). She is interested in natural language processing, especially the application of machine learning techniques in this area.
15 May 2015	Dehua Cheng (USC/Melady)	Exploring LDA: Parallel Inference and Model Selection Time: 3:00 pm - 4:00 pm Location: 6th Floor Large Conference Room [689] Abstract: Latent Dirichlet allocation (LDA) and its Bayesian nonparametric generalization hierarchical Dirichlet processes (HDP) have been proven successful in modeling large, complex, real-world domains. However, inference on LDA/HDP is challenging and it has received notable attention from the researchers. In this talk, we present two algorithmic advances for LDA/HDP inference by examining their mathematical properties. We will first present an effective parallel Gibbs sampling algorithm for LDA/HDP by exploring the equivalency between the Dirichlet-multinomial hierarchy and the Gamma-Poisson hierarchy. Secondly, we will show how to provably select the number of topics for LDA by studying the spectral space of its second order moments (bi-gram statistics). Bio: Dehua Cheng is a third year Ph.D. student in the CS department at USC, advised by Professor Yan Liu. Prior to that, he received his B.S. degree in Mathematics and Physics from Tsinghua University, China. His research interests include randomized numerical algorithm in machine learning and parallel inference for probabilistic graphical model.
24 Apr 2015	David Kauchak (Pomona )	Learning To Simplify Text One Sentence at a Time Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Information can now be found on almost any topic ranging from news to do-it-yourself guides to health-related articles. Unfortunately for readers, the complexity and readability of these texts can vary widely. Even if the concepts of an article are accessible, the language and structure of the text can prohibit a person from understanding these concepts.Text simplification techniques are aimed at reducing the reading and grammatical complexity of text while retaining the meaning and are one approach to increasing information accessibility. Motivated by both corpus analyses and human experiments, I will introduce a number of recent text simplification techniques ranging from semi-automated approaches, that require a human in the loop, to automated approaches, including word-level, phrase-level and syntax-level models. Bio: David Kauchak is currently an assistant professor in the Computer Science Department at Pomona College. Previously, he was at Middlebury College and has worked at Google, ISI, PARC and Adchemy. He received his Ph.D. in Computer Science from University of California, San Diego.
17 Apr 2015	Longhua Qian (Soochow / ISI)	Exploiting Bilingual Corpora for Relation Extraction Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Large-scale labeled corpora are always critical for Natural Language Processing tasks using statistical machine learning methods, but at the great expense of human labor of annotation. While we have various labeled corpora in different languages at hand, such as English and Chinese, either resource-rich or resource-poor, can these corpora be taken full advantage of for NLP tasks in different languages to help each other? The difficult lies in the fact that parallel corpora with aligned NLP entities are hard to acquire. In this talk, I shall first discuss how to generate pseudo-parallel corpora for relation extraction via machine translation and entity alignment techniques, and then I will proceed to apply this corpora to statistical ML-based relation extraction in terms of the degree of supervision: (1) supervised learning; (2) bilingual co-training; (3) bilingual active learning. This talk is chiefly based on the ACL-2014 paper “Bilingual Active Learning for Relation Classification via Pseudo Parallel Corpora”. Bio: Longhua Qian is a visiting researcher from the School of Computer Science and Technology, Soochow University, China. He joined the Natural Language Group and will work with Professor Kevin Knight and his team members for one year. He will participate in on-going projects on Abstract Meaning Representation (AMR) and Machine Reading etc. His mainly focuses on information extraction, relation extraction, and entity linking etc. He is also interested in extracting information from clinical medical records and building social networks from free text.
10 Apr 2015	Atefeh Farzindar (NLP Technologies)	TRANSLI, NLP-based social media analytics and monitoring Time: 3:00 pm - 4:00 pm Location: 6th Floor Large Conference Room [689] Abstract: NLP Technologies have developed a technology for automated analysis of social media data. TRANSLI Social Media Analytics and monitoring, is an online visual analytics system designed to provide social intelligence from news and other events from Twitter. During this seminar, Dr. Atefeh Farzindar will give a presentation on TRANSLI-SM where the system features an intuitive user interface and is designed to browse and visualise the results of the semantic analysis of social discussion on specific events from Twitter. The user can obtain the information not only limited to the main event of interest but also to the intelligence for the sub events.NLP Technologies Inc. is a Canadian company founded in 2005 and that expanded to California in 2014. The company specialises in natural language processing, NLP-based search engines, translation technologies and services, social media analytics, and automatic summarization.http://www.nlptechnologies.ca/ Bio: Dr. Atefeh Farzindar, CEO NLP Technologies Inc.and Adjunct professor at University of MontrealDr. Atefeh Farzindar is the co-founder and CEO of NLP Technologies. She received her PhD in Computer Science from the University of Montreal and her Doctorate in automatic summarization of legal documents from Paris-Sorbonne University in 2005. She has been an Adjunct professor at the Department of Computer Science at the University of Montreal since 2010, and she was Honorary Research Fellowship at the Research Group in Computational Linguistics at the University of Wolverhampton, UK (2010-2012).Dr. Farzindar has been Action Editor in the international journal of Computational Intelligence since 2011. She co-edited two special issues on social media analysis for the International Journal of Computational Intelligence (CI) and Journal TAL, an international journal on natural language processing.She co-authored an upcoming book on Natural Language Processing for Social Media [Morgan & Claypool Publishers, 2014], and authored a book chapter in Social Network Integration in Document Summarization, Innovative Document Summarization Techniques: Revolutionizing Knowledge Understanding, IGI Global publisher January 2014. In 2013, Dr. Farzindar won Femmessor-Montréal’s contest, Succeeding with a balanced lifestyle, in the Innovative Technology and Information and Communications Technology category because of her involvement in the arts. Her paintings have recently been published in a book titled One Thousand and One Nights, in which the palette of vivid colours and her unique contemporary style revolved around on the place of women in modern society (Vernissage & Artist Book Launch April, Montréal, Galerie 203 https://www.youtube.com/watch?v=TLCghx1mvzY)
03 Apr 2015	Don Metzler (Google)	Keeping Topic Models Fresh: Technical and Practical Challenges Time: 3:00 pm - 4:00 pm Location: 6th Floor Large Conference Room [689] Abstract: Topic models are statistical models that can be used to infer the most likely topics that some piece of text is about. Such models are useful for applications that rely on semantic representations of text, such as query classification, document understanding, and measuring semantic similarity. These models are widely used within Google. In this talk, I will first describe the details of one of these models -- one that learns over a million topics covering just about every language. I will then describe a number of technical and practical challenges involved in keeping such a model fresh and up-to-date within real-world applications. Bio: Donald Metzler is a Staff Software Engineer at Google Inc. Prior to that, he was a Research Assistant Professor at the University of Southern California (USC) and a Senior Research Scientist at Yahoo!. He has served as the Program Chair of the WSDM, ICTIR, and OAIR conferences and sat on the editorial boards of the major journals. He has published over 40 research papers, has been awarded 4 patents, and co-authored the textbook Search Engines: Information Retrieval in Practice.
20 Mar 2015	Tomer Levinboim (Notre Dame)	Multitask Word Alignment with Random-Walk Regularizers Time: 3:00 pm - 4:00 pm Location: 6th Floor Large Conference Room [689] Abstract: Suppose we translate a word from English to French and back. Should we get the original English word? That is, is translation invertible?Alternatively, suppose we translate an English word e to Spanish and then from Spanish to French, obtaining a word f.Should e-f be a valid entry in an English-French dictionary? That is, is translation transitive?Intuitively, if translation is done carefully, we expect to answer both these questions with "Yes, with high probability".In this talk, I will discuss how to formulate our intuition about invertibility/transitivity with random-walks, using translation probability distributions.I will then present two random-walk based regularization techniques that we recently used in a multitask word alignment setting:(1) Model Invertibility Regularization (MIR) - a concave regularizer for bi-directional models which can be applied even without parallel data.(2) Triangulation based Dirichlet prior - a method that capitalizes on parallel data with a pivot language, to construct and learn better translation priors.This talk is based on joint work with Prof. David Chiang (ND) and Dr. Ashish Vaswani (ISI). Bio: Tomer Levinboim is a PhD student at the University of Notre Dame, working with Prof. David Chiang on developing machine learning techniques for improving machine translation and NLP of low resource languages.He is generously hosted by Kevin Knight at USC/ISI.
06 Mar 2015	Neda Jahanshad (USC/ISI)	Multi-site genetic analysis of the brain’s white matter: ENIGMA-DTI Time: 3:00 pm - 4:00 pm Location: 6th Floor Large Conference Room [689] Abstract: The functioning regions of the brain are connected through a complex network of fibers, described by the brain’s white matter. Non-invasive imaging using MRI-based diffusion imaging can help capture important characteristics of the connections by describing the strength and directionality profile of water diffusion along white matter fibers. Variability in these connections have been noted in many neurological, degenerative, and psychiatric disorders where ultimately information transfer from on brain region to the other may be weakened or completely compromised. To discover genetic risk factors for altered connectivity and common genetic variants which put the brain at subtle risk for weakened connections, we find power in sample size and pool multiple datasets from around the world to determine common effects in all populations. However, there is no standard method for acquiring diffusion images and standardizing measures across datasets is an ongoing challenge. The Enhancing Neuro Imaging Genetics through Meta Analysis group on Diffusion Tensor Imaging has established a set of basic protocols to overcome a portion of these challenges, which I will describe, along with works-in-progress to tackle additional obstacles to reveal critical details of the brains network. Bio: Neda Jahanshad is an assistant professor of Neurology at USC in the Imaging Genetics Center at ISI. She received her PhD in Biomedical Engineering at UCLA in 2012 where she worked on optimizing diffusion imaging protocols to map structural brain connections in large populations. She has since extended the work to explore methods of pooling such imaging data from across the world and determine genetic and environmental contributions to the connectivity of the brain and determine how these effects vary across the lifespan. She is coordinating one of the largest studies of the brain's white matter through the ENIGMA Consortium http://enigma.ini.usc.edu.
20 Feb 2015	Jonathan May (USC/ISI)	Semantic Parsing as Machine Translation Time: 3:00 pm - 4:00 pm Location: 6th Floor Large Conference Room [689] Abstract: We cast the generation of semantic graphs from natural language text as a machine translation problem, where the source language is English and the target language is a labeled graph representing a semantic interpretation, known as an Abstract Meaning Representation (AMR). Via a series of data transformations we create a training set that is amenable to a string-to-tree syntax mt decoder. Previous work in SBMT and AMR parsing is combined to yield a trainable system that achieves state-of-the-art parsing results. Bio: Jonathan May is a computer scientist at USC-ISI, where he also received a PhD in 2010. His current focus areas are in machine translation, machine learning, and natural language understanding. Jonathan co-developed and patented a highly portable method for optimizing thousands of features in machine translation systems that has since been incorporated into all leading open source MT systems. He has previously worked in automata theory and information extraction and at SDL Language Weaver and BBN Technologies.
13 Feb 2015	Dogan Can (USC/SAIL)	Efficient Computation of Substring Posteriors from Lattices using Weighted Factor Automata Time: 3:00 pm - 4:00 pm Location: 6th Floor Large Conference Room [689] Abstract: Efficient computation of substring posteriors from lattices has applications in the estimation of document frequencies in spoken corpora and lattice-based minimum Bayes-risk decoding in statistical machine translation. In this talk, we present a new algorithm for exact substring posterior computation that leverages the following observations to speed up computation: i) the set of substrings for which the posteriors will be computed typically comprises all n-grams in the lattice up to a certain length, ii) posterior probability is equivalent to expected count for substrings that do not repeat on any path of the input lattice, iii) there are efficient algorithms for computing expected counts from lattices. We present experimental results comparing our algorithm with the best known algorithm in literature as well as a baseline algorithm based on finite state automata operations. Bio: Dogan Can is a fifth year Ph.D. student at USC SAIL (Signal Analysis and Interpretation Lab). He works with Professor Shrikanth Narayanan on a range of topics including lattice indexing for spoken information retrieval, concurrent/online speech processing architectures and statistical modeling of psychotherapy sessions. His research interests include weighted finite state automata, automatic speech recognition, information retrieval, dialogue modeling and behavioral informatics.
30 Jan 2015	Derrek Hibar (USC/INI)	Neuroimaging Genetics in the ENIGMA Consortium Time: 3:00 pm - 4:00 pm Location: 6th Floor Large Conference Room [689] Abstract: The highly complex structure of the human brain is strongly shaped by genetic influences. Subcortical brain regions act jointly with cortical areas to coordinate movement, memory, motivation, reinforcement and learning. To investigate how common genetic variants affect the structure of these brain regions, we conducted genome-wide association studies (GWAS) of the volumes of seven subcortical regions and intracranial volume, derived from magnetic resonance images (MRIs) of 30,717 individuals. By identifying genetic influences on brain structure, we can begin to map the genetic architecture underlying variability in human brain development and function, a process that will help elucidate the dysfunctions that lie at the core of neuropsychiatric disorders. Bio: Derrek Hibar is an assistant professor in the Department of Neurology in the Keck School of Medicine of USC where he studies common genetic influences on brain structure and susceptibility to psychiatric disorders. He is currently coordinating one of the largest studies of brain structure to date as part of the ENIGMA Consortium (http://enigma.ini.usc.edu).
23 Jan 2015	Devin Griffiths (USC/Dornsife)	Understanding Analogies: Theory and Method Time: 3:00 pm - 4:00 pm Location: 6th Floor Large Conference Room [689] Abstract: Analogies allow us to make connections between different domains of knowledge and to apply what we already know to new situations. For this reason, they're important to developing new theories and new understandings of the social and natural world, and have often been seen as an important task for machine learning. In my talk, I'll explore how different theories of how analogy works shape the different approaches that research teams take when modeling analogical thinking. Specifically, I'll contrast what I term "formal" or "top-down" theories of analogy with a "serial" or "bottom-up" approach. Finally, I'll describe a syntactic and semantic method for searching out analogies within corpora. I'm convinced that understanding analogies better, and being able to find locate new analogies in historical documents, can help us understand where new ideas come from. Bio: Devin Griffiths is an assistant professor in the English Department at USC, where he studies nineteenth-century British literature and scientific history. His current book project, titled "Between the Darwins," explores how analogies were used in the nineteenth century to create new theories of evolution and social progress. His areas of research include science and literature, poetics, book history, and the digital humanities. Webcast Link: http://webcasterms1.isi.edu/mediasite/Viewer/?peid=56439af4a5cb41f49a2c5faef5683cd11d
16 Jan 2015	Jonathan Gordon (USC/ISI)	Towards the Interpretation of Metaphoric Language Time: 3:00 pm - 4:00 pm Location: 6th Floor Large Conference Room [689] Abstract: Understanding what people mean when they use metaphoric language is a central problem in natural language understanding. Metaphors give a partial understanding of one kind of experience in terms of another, highlighting similarities and hiding differences. In this talk, I give an overview of the problems posed by metaphoric language. I then describe ongoing crosslinguistic work on the knowledge-based interpretation of metaphors by abductive inference. This work moves us toward a better understanding not only of what people are saying with metaphors but also how the metaphors used by groups of people (e.g., the supporters and opponents of gun control) expose their different world views. Bio: Jonathan Gordon is a postdoctoral researcher at the USC Information Sciences Institute, where he is advised by Jerry Hobbs. His 2014 doctoral dissertation, 'Inferential Commonsense Knowledge from Text', was supervised by Lenhart Schubert at the University of Rochester. Jonathan's research interests include natural language understanding, semantics, and knowledge extraction.
05 Dec 2014	Kingson Man (USC/BCI)	Multisensory integration in a neural framework for concepts Time: 3:00 pm - 4:00 pm Location: 6th Floor Large Conference Room [689] Abstract: How are concepts represented in the brain? When we hear the ringing of a bell, or watch a bell swinging back and forth, is there a shared "BELL" pattern of neural activity in our brains? Philosophers have debated the nature of concepts for centuries, but recent technical advances have allowed neuroscientists to make contributions to this topic. The combination of functional neuroimaging and machine learning has allowed us to examine distributed patterns of activity in the human brain to decode what they represent about the world, and to what level of abstraction. I describe our recent findings that revealed a hierarchical organization of multisensory information integration, leading to representations that generalize across different sensory modalities. I will also discuss our work on the social function of concepts, which enables the communication of similar thoughts and associations between individuals. Bio: I am a research associate at the Brain and Creativity Institute of the University of Southern California. I earned my Ph.D. at USC, mentored by Antonio Damasio. I am interested in the general problem of consciousness, and in particular how different sensations are bound together by the brain into a unified experience of the world.
20 Nov 2014	Robert Munro (Idibon)	Technologies for every language: how machine learning can reach everyone in the world Time: 3:00 pm - 4:00 pm Location: 6th Floor Large Conference Room [689] Abstract: Speakers of more than 5,000 languages have access to internet and communication technologies. The majority of phones, tablets and computers now ship with language-enabled capabilities like speech-recognition and intelligent auto-correction, and people increasingly interact with data-intensive cloud-based language technologies like search-engines and spam-filters. For both personal and large-scale technologies, the service quality drops or disappears entirely outside of a handful of languages. Speakers of low-resource languages correlate with lower access to healthcare, education and higher vulnerability to disasters. Serving the broadest possible range of languages is crucial to ensuring equitable participation in the global information economy.I will present examples of how natural language processing and distributed human computing are improving the lives of speakers of all the world's languages, in areas including education, disaster-response, health and access to employment. When applying natural language processing to the full diversity of the world's communications, we need to go beyond simple keyword analysis and implement complex technologies that require human-in-the-loop processing to ensure usable accuracy. I will share results that show how for-profit technologies are improving people's lives by providing sustainable economic growth opportunities when they support more languages, aligning business objectives with global diversity. Bio: Robert Munro is the CEO of Idibon, a company with the objective of providing language technologies for all the world's languages. In past work he has served as Chief Information Officer for the largest solar energy company in Sierra Leone; was the Chief Technology Officer for the largest use of big data technologies to track disease outbreaks globally; worked for the UN High Commission for Refugees in Liberia; lead the crowdsourced response to the 2010 earthquake Haiti; and has helped information processing in disaster response and election monitoring in more than a dozen countries. In current work, Idibon helps everyone from Fortune 500s to disaster response organizations process language data at scale. Outside of work, he has learned about the world's diversity by cycling more than 20,000 kilometers across 20 countries. Robert has a PhD from Stanford University.
14 Nov 2014	Gully Burns (USC/ISI)	Machine Reading of the Biomedical Literature: It's All About Data Time: 3:00 pm - 4:00 pm Location: 6th Floor Large Conference Room [689] Abstract: Like most scientific disciplines, cancer biology involves performing experiments and interpreting them. At present, most modeling efforts center on trying to bring together collections of interpretations as 'pathway diagrams' but do not attempt to capture the semantics of supporting experimental data. Here, I will describe a new strategic approach for machine reading of scientific articles based on a generic representation of experimental data with explicit examples within the field of cancer biology. I will also discuss this effort in the context of the Abstract Meaning Representation (AMR) and present an informal generative story for your consideration and feedback. Bio: Gully Burns develops pragmatic biomedical knowledge engineering systems for scientists that provide directly useful functionality in their everyday use and is based on innovative, cutting edge computer science. He was originally trained as a physicist at Imperial College in London before switching to do a Ph.D. in neuroscience at Oxford. He came to work at USC in 1997, developing the 'NeuroScholar' project in Larry Swanson's lab before joining the Information Sciences Institute in 2006. He is now works as project leader in ISI's Information Integration Group.
07 Nov 2014	Nima Pourdamghani (USC/ISI)	Aligning English Strings with Abstract Meaning Representation Graphs Time: 3:00 pm - 4:00 pm Location: 6th Floor Large Conference Room [689] Abstract: We align pairs of English sentences and corresponding Abstract Meaning Representations (AMR), at the token level. Such alignments will be useful for downstream extraction of semantic interpretation and generation rules. Our method involves linearizing AMR structures and performing symmetrized EM training. We obtain 86.5% and 83.1% alignment F score on development and test sets. Bio: Nima Pourdamghani is a second year Ph.D. student at ISI. He works with Professor Kevin Knight on Abstract Meaning Representation and its application to machine translation.
31 Oct 2014	Nikolaos Malandrakis (USC/SAIL)	Generating Psycholinguistic Norms Time: 3:00 pm - 4:00 pm Location: 6th Floor Large Conference Room [689] Abstract: Abstract numerical representations of word and term content are very popular in NLP applications of behavioral analysis, like sentiment analysis, where the low dimensional representation allows for the use of complicated machine learning techniques, despite the lack of annotated in-domain data. In this presentation we will discuss our experiments on automatically expanding manually annotated lexica of linguistic norms, starting from word emotion norms and generalizing to include higher order terms, norms beyond emotion (like concreteness and age of acquisition) as well as languages other than English. We will present our attempts at domain adaptation of these norms, as well as the composition of norms for larger lexical units via their constituents by utilizing distributional semantic representations. As examples of actual applications we will present a highly ranked system of sentiment analysis submitted to SemEval 2014 and a multi-modal depression diagnosis system for German submitted to AVEC 2014. Bio: Nikolaos Malandrakis is a third year PhD student at the USC Computer Science Department and a research assistant at the Signal Analysis and Interpretation Laboratory (SAIL). He is originally from Chania, Greece, where he completed a BSc and MSc in Computer Engineering at the Technical University of Crete.
17 Oct 2014	Qing Dou (USC/ISI)	Beyond Parallel Data: Joint Word Alignment and Decipherment Improves Machine Translation [EMNLP Practice Talk] Time: 3:00 pm - 4:00 pm Location: 6th Floor Large Conference Room [689] Abstract: Inspired by previous work, where decipherment is used to improve machine translation, we propose a new idea to combine word alignment and decipherment into a single learning process. We use EM to estimate the model parameters, not only to maximize the probability of parallel corpus, but also the monolingual corpus. We apply our approach to im- prove Malagasy-English machine transla- tion, where only a small amount of paral- lel data is available. In our experiments, we observe gains of 0.9 to 2.1 Bleu over a strong baseline. Qing Dou is a fifth year Ph.D. student at ISI. He works with Professor Kevin Knight on various decipherment problems and its application to different Natural Language Processing tasks.
10 Oct 2014	Boris Gutman (USC/ISI)	Interplay between Continuous and Discrete Aspects of Brain Image Analysis Time: 3:00 pm - 4:00 pm Location: 6th Floor Large Conference Room [689] Abstract: Brain MRI offers tremendous opportunity to learn about cortical anatomy, function and connectivity. In this talk I will go over several standard techniques for image understanding used in brain imaging. These include image registration, segmentation, tractography and graph-based connectivity analyses. Among these algorithms, we routinely encounter both continuous and discrete types of analysis. Non-linear image registration, typically formalized as a diffeomorphism on the image domain, is an example of the former: we may ask for instance how much volume change the brain is experiencing locally over time, clearly a continuous measure. In another example, we may trace continuous curves in space that best fit a Diffusion Tensor MR image to approximate fibers in the brain’s white matter. One the other hand, connectivity between distinct units within the nervous system is an example of discrete analysis: for instance, the brain’s functionally distinct regions are thought of as nodes in a graph, whose edges are defined by the connecting fiber models.After a brief description of the standard methods at hand, I will suggest an approach for combining the two types of analysis. By assuming the continuous paradigm for connectivity, we can push our connectome model from being a discrete graph to being a linear operator. Using some well-known results from operator theory, we can decompose the operator into its resident “eigen-networks,” and apply continuous methods directly. As an example, we can spatially register connectivity matrices with spatially distributed nodes. Finally, I will show two simple examples of continuous analogues for standard graph theory measures, and their potential application for an Alzheimer ’s disease study. Bio: Boris Gutman received his B.S. in Applied Mathematics and PhD in Biomedical Engineering from UCLA before joining USC’s Imaging Genetics Center (IGC). He is currently a post-doctoral scholar at the IGC, under the supervision of Professor Paul M. Thompson.
03 Oct 2014	Kevin Knight (USC/ISI)	Getting Good at Research Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: If you do good research, you'll find that many doors open. I'll offer some suggest for how to make that happen. This should be an interactive session. Bio: Kevin Knight is the director of the ISI Natural Language group, a professor of Computer Science at USC, and an ISI Fellow.
26 Sep 2014	Bill MacCartney (Google/Stanford)	Semantic Parsing at Google Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: With the shift from desktop to mobile, and the rise of voice-driven UIs, a growing proportion of the Google query stream is not well-served by conventional keyword-based information retrieval. More and more queries use natural language ("when does walgreens close"), seek answers not found on any web page ("how do i get to work from here"), or demand action rather than information ("text my wife i'm 10 minutes late"). Satisfying such queries requires semantic parsing, that is, mapping the query into a structured, machine-readable representation of meaning. In this talk, I will give an overview of the techniques Google has developed to address the problem of semantic parsing, and discuss some of the challenges that remain. I'll also highlight differences between academia and industry in how the problem is conceived. Bio: Bill MacCartney is a Senior Research Scientist at Google, working primarily on semantic parsing. He is also a Consulting Assistant Professor of Computer Science at Stanford. For more info: http://nlp.stanford.edu/~wcmac/
19 Sep 2014	Markus Dreyer (SDL)	An open-source toolkit for the representation, manipulation and optimization of weighted hypergraphs Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Weighted hypergraphs arise naturally in parsing, syntax-based machinetranslation and other tree-based NLP models, as well as in weightedlogic programming.We present an open-source toolkit for the representation andmanipulation of weighted hypergraphs. It provides hypergraph datastructures and algorithms, such as the shortest path andinside-outside algorithms, composition, projection, and more. Inaddition, it provides functionality to optimize hypergraph featureweights from training data. We model finite-state machines as aspecial case. We give a tutorial on hypergraphs and the hypergraphtoolkit and explain how you can use these tools in your research.This is joint work with Jonathan Graehl. Bio: Markus Dreyer is a Senior Research Scientist at SDL LanguageWeaver. His research focuses on algorithms and machine learningtechniques for large-scale machine translation and NLP. He receivedhis PhD in Computer Science from Johns Hopkins University, advised byJason Eisner. For more information, see http://goo.gl/d6mHUi.
11 Sep 2014	Eunsol Choi (University of Washington) and Matic Horvat (Cambridge)	Towards automatic extraction of experimental data from scientific papers [Intern final talk] Time: 3:30 pm - 4:30 pm Location: 11th Floor Large Conference Room [1135] Abstract: Many areas of science have experienced rapid growth in the amount of scientific literature published. For example, there are approximately 400 new papers published each year in the area of Machine Translation. As such amount of new data is virtually impossible to processes by a single researcher, a new tool is needed that would help researchers explore existing and discover new MT literature. To address this problem we built an approach for automatic extraction of experimental data from scientific papers that populates a database enabling structured queries.Bios:Eunsol Choi is a PhD student at the University of Washington, advised by Prof. Luke Zettlemoyer. Prior to UW, she studied mathematics and computer science at Cornell University. Matic Horvat is a PhD student at University of Cambridge researching integration of semantics and Statistical Machine Translation. He is originally from Ljubljana, Slovenia, where he completed a BSc in Computer Science in 2012. He continued with a masters in Advanced Computer Science at University of Cambridge, graduating in 2013.
05 Sep 2014	Claire Bonial (University of Colorado, Boulder)	Take a look at this! Form, Function and Productivity of English Light Verb Constructions Time: 3:00 pm - 4:00 pm Location: 6th Floor Large Conference Room [689] Abstract: English light verb constructions (LVCs), such as have a drink, make an offer, take a bath, do an investigation, and give a groan, represent a powerfully expressive type of Multi-Word Expression (MWE) in English; however, the precise definition, semantic function and productivity of English LVCs remain unclear, hampering efforts to treat LVCs appropriately in Natural Language Processing (NLP) resources. This research focuses on exploring these three issues. A definition for LVCs that combines syntactic and semantic criteria is developed, initially based on existing research on delimiting and defining LVCs, and iteratively refined during the development of an LVC annotation schema for the PropBank project (Palmer et al., 2005). Existing theories on the linguistic function of LVCs both cross-linguistically and in English are discussed, and a corpus study of LVCs provides evidence that the primary function of LVCs in English is to enable speakers to describe events in a manner that can take advantage of rich nominal modification; for example, The inspector general did a rather controversial investigation... Finally, a dominant hypothesis concerning the productivity of certain verbal constructions is investigated in relation to LVCs, using large-scale Mechanical Turk surveys. This firstly probes the question of why “families” of semantically similar LVCs occur (e.g. make a statement/speech/declaration/proposal), but other arguably similar LVCs are odd to speakers (e.g. ?make a yell/hint). Secondly, this provides the groundwork for better detection of very low frequency LVCs that arise from a speaker’s ability to shift and extend verbal meanings within novel constructions. The contributions of these findings on both NLP and linguistic theory are presented. Bio: Claire Bonial is in the final weeks of her academic adventure at the University of Colorado, Boulder, and will soon be graduating with a joint PhD in Linguistics and Cognitive Science. Claire has had a rich academic and professional experience at CU, including collaborative research with Martha Palmer on a variety of Natural Language Processing (NLP) resources, such as PropBank and VerbNet. Although Claire has served in many roles on these projects, her primary contribution has been theoretical research on English Light Verb Constructions (LVCs, e.g. take a walk, make a mistake). Specifically, she has worked to improve the coverage and treatment of this type of multi-word expression. Two years ago, Claire began working with Kevin Knight and ISI on the Abstract Meaning Representation (AMR) project, which has since afforded her several lovely trips to Marina del Rey, as well as the opportunity to apply her linguistic expertise in the development of this exciting and challenging project. Webcast Link:http://webcasterms1.isi.edu/mediasite/Viewer/?peid=c389f52cfb16424facb6386ff180de771d
29 Aug 2014	Allen Schmaltz (Harvard) and Julian Schamper (RWTH Aachen)	Toward Semantic Parsing [Intern final talk] Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Semantic parsing has potential applications in a number of areas, including machine translation and machine reading, among many others. In this talk we will present our initial work on the parsing task for the semantic representation language known as Abstract Meaning Representation (AMR). The task is to take an English sentence and transform it into its semantic representation.We will present a series of approaches and associated results, providing guidelines for future work in this area. We will show approaches using heuristics, tree transducers, and probabilistic context free grammars. We will also present approaches for AMR rule extraction for the applicable formalisms. In doing so, we will also highlight challenges relative to syntactic parsing.Additionally, we will provide a map for the future directions in AMR parsing that we plan to pursue in the fall.Bios:Allen Schmaltz is a Ph.D. student in Computer Science in the School of Engineering and Applied Sciences at Harvard University (2013-present; S.M. 2014), working with Stuart Shieber. He is interested in formal, statistical, and human-augmented machine learning approaches for computational linguistics. Before starting his Ph.D. in Computer Science, he completed the better part of an additional Ph.D. in the (quantitative) social sciences at Harvard University (2010-2013), received a M.A. from Stanford University (2010), and received a B.A. from Northwestern University (2006). Earlier in his academic career he also studied at Cornell University and in Yokohama, Japan, among other places. Julian Schamper studies computer science at RWTH Aachen University. He did his bachelor thesis in the field of deciphering foreign language and works as a student research assistant at Prof. Hermann Ney's Human Language Technology and Pattern Recognition Group.
22 Aug 2014	Allen Schmaltz (Harvard)	Determinantal Point Processes for Human-Augmented Machine Translation [Intern talk] Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: This talk will introduce languageFractal, an online system for human-augmented machine translation (MT) that aims to incorporate monolingual speakers into the translation pipeline in a cost-effective manner. The essential principle is to take a middle ground between pure MT and a fully crowdsourced approach by augmenting MT results with human corrections in an iterative cycle. To efficiently emit phrases and sentences to users and to effectively explore the space of possible translation options, we propose the use of determinantal point processes (DPPs), which can be used to model subset selection problems in which diversity of the subset is a desirable characteristic.I will provide a brief tutorial on DPPs (including L-ensembles and the structured variant), and I will present an overview of our formulation of DPPs for dynamic programming problems in the context of the human-augmented machine translation pipeline. I will also introduce the languageFractal pilot and pipeline, the full trials of which will run through the 2014-2015 academic year at Harvard University. Bio: Allen Schmaltz is a Ph.D. student in Computer Science in the School of Engineering and Applied Sciences at Harvard University (2013-present; S.M. 2014), working with Stuart Shieber. He is interested in formal, statistical, and human-augmented machine learning approaches for computational linguistics. Before starting his Ph.D. in Computer Science, he completed the better part of an additional Ph.D. in the (quantitative) social sciences at Harvard University (2010-2013), received a M.A. from Stanford University (2010), and received a B.A. from Northwestern University (2006). Earlier in his academic career he also studied at Cornell University and in Yokohama, Japan, among other places.
08 Aug 2014	Tim Schlippe (Karlsruhe Institute of Technology)	Rapid Generation of Pronunciation Dictionaries for New Domains and Languages Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Automatic speech recognition systems exist only for a small fraction of the more than 7,100 languages in the world since the development of such systems is usually expensive and time-consuming. Therefore, porting speech technology rapidly to new languages with little effort and cost is an important part of research and development.Pronunciation dictionaries are a central component for both automatic speech recognition and speech synthesis. They provide the mapping from the orthographic form of a word to its pronunciation, typically expressed as a sequence of phonemes.I will present innovative strategies and methods for the rapid generation of pronunciation dictionaries for new application domains and languages. Depending on various conditions, solutions are developed and proposed – starting from the simple scenario in which the target language can be found in written form on the Internet and we have a simple mapping between speech and written language – up to the difficult scenario in which no written form for the target language exists. We embedded many of the tools implemented in this work in the Rapid Language Adaptation Toolkit. Its web interface is publicly accessible and allows people to build first speech recognition systems with little technical background. Bio: Since 2008 Tim Schlippe is a research assistent and PhD student at Karlsruhe Institute of Technology (KIT), Institute for Anthropomatics, in Germany.At KIT he is involved in teaching and several projects. He has published multiple publications in the field of multilingual speech recognition.For his master's thesis he was as a visiting researcher at Carnegie Mellon University, doing research in the field of statistical machine translation.Tim Schlippe will finish his PhD in November 2014. His current research interests are:Multilingual speech recognition with a focus on rapid adaptation of speech recognition systems to new domains and languages, pronunciation modeling, and language modeling.
31 Jul 2014	Ali Borji (USC)	Computational Modeling of Bottom-up and Top-down Visual Attention Time: 11:00 am - 12:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Over the last two decades, the inter-disciplinary fields of visual attention and saliency have attracteda lot of interest in cognitive sciences, computer vision, robotics, and machine learning. The high complexity of natural environments requires the primate visual system to combine, in a highly dynamic and adaptive manner, sensory signals that originate from the environment (bottom-up) with behavioral goals and priorities dictated by the task at hand (top-down). I will talk about my recent research in two directions: 1) Bottom-up attention: I will give a snapshot of biological findings on visual attention (e.g., how gaze direction of people in a scene influences eye movements of an external observer), theoretical background on saliency concepts, our model benchmark and saliency models, and 2) Top-down attention: I will describe our neuromorphic algorithms to predict, in a task-independent manner, which elements in a video scene might more strongly attract the gaze of a human. Multi-modal data including bottom-up saliency, "gist" or global context, physical actions and object properties (using example recorded eye movements and videos of humans engaged in various 3D video games, including flight combat, driving, first-person shooting, running a hot-dog stand that serves hungry customers) are utilized to associate particular scenes with particular locations of interest, given the task (e.g., when the task is to drive, if the scene depicts a road turning left, the system learns to look at that left turn). Finally, I will present some successful engineering and clinical applications of our models. Bio: Ali Borji received the BS and MS degrees in computer engineering from the Petroleum University of Technology, Tehran, Iran, 2001 and Shiraz University, Shiraz, Iran, 2004, respectively.He received the PhD degree in computational neurosciences from the Institute for Studiesin Fundamental Sciences (IPM) in Tehran, 2009. He then spent a year at University of Bonn as a postdoc. He has been a postdoctoral scholar at iLab, University of Southern California, Los Angeles since March 2010. His research interests include computer vision, machine learning, and neurosciences with particular emphasis on visual attention, visual search, active learning, scene and object recognition, and biologically plausible vision models.
25 Jul 2014	Daniel Lamprecht (TU Graz)	Navigation Dynamics in Networks Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Research on networks has already revealed much about the structure of real-world networks. Network dynamics such as navigation or exploration, however, are something less well-researched. Yet, we constantly design and use networked systems meant for navigation and exploration. In this talk, I will present a short overview of what we know about navigability, followed by the our work on exploring dynamics occurring on recommendation networks - networks formed implicitly by recommender systems. Navigability can serve as an evaluation criterion for recommender systems and reveal to what extent a system supports navigation and exploration. Based on analysis of topology and dynamical processes, we find that current systems do not support navigation very well, and propose techniques to overcome this. Bio: Daniel Lamprecht is a PhD student at Graz University of Technology and is interning at ISI this summer. His research explores network science, web science and recommender systems and especially focuses on network navigability. This summer, he's working with Kristina Lerman on navigation dynamics and click biases in Wikigames. In the past, he has also studied navigation dynamics in information networks with the aid of biomedical ontologies.
18 Jul 2014	Jonathan May (USC/ISI)	An Arabizi-English Social Media Statistical Machine Translation System Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: We present a machine translation engine that can translate romanized Arabic, often known as Arabizi, into English. With such a system we can, for the first time, translate the massive amounts of Arabizi that are generated every day in the social media sphere but until now have been uninterpretable by automated means. We accomplish our task by leveraging a machine translation system trained on non-Arabizi social media data and a weighted finite-state transducer-based Arabizi-to-Arabic conversion module, equipped with an Arabic character-based n-gram language model. The resulting system allowshigh capacity on-the-fly translation from Arabizi to English. We demonstrate via several experiments that our performance is quite close to the theoretical maximum attained by perfect deromanization of Arabizi input. This constitutes the first presentation of an end-to-end social media Arabizi-to-English translation system. bio:Jonathan May is a computer scientist at USC-ISI, where he also received a PhD in 2010. His current focus areas are in machine translation, machine learning, and natural language understanding. Jonathan co-developed and patented a highly portable method for optimizing thousands of features in machine translation systems that has since been incorporated into all leading open source MT systems. He has previously worked in automata theory and information extraction and at SDL Language Weaver and BBN Technologies.
11 Jul 2014	Yang Feng (USC/ISI)	Factored Markov Translation with Robust Modeling Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Phrase-based translation models usually memorize local translation literally and make independent assumption between phrases which makes it neither generalize well on unseen data nor model sentence-level effects between phrases. We present a new method to model correlations between phrases as a Markov model and meanwhile employ a robust smoothing strategy to provide better generalization. This method defines a recursive estimation process and backs off in parallel paths to infer richer structures. Our evaluation shows an 1.1–3.2% BLEU improvement over competitive baselines for Chinese-English and Arabic-English translation. Bio: Yang Feng is a postdoctoral scholar in Kevin Knight's NLP group in USC/ISI. She got her Ph.D. degree in 2011 from Institute of Computing Technology, Chinese Academy of Sciences. Her interests are machine translation and machine learning, focusing on Bayesian inference and Gaussian process. Now her main work is to improve ISI syntax-based system.
02 Jul 2014	Matic Horvat (Cambridge)	A Graph-Based Approach to String Regeneration [Intern talk] Time: 2:30 pm - 3:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: I'll talk about a graph based approach to the string regeneration problem, published at 2014 EACL Student Research Workshop. I will conclude my talk by briefly talking about my PhD research direction of including semantics (MRS) into a state-of-the-art SMT system.String regeneration is the problem of generating a fluent sentence from an unordered list of words. The purpose of investigating and developing approaches to solving the string regeneration problem is grammaticality and fluency improvement of machine generated text. I investigated a graph-based approach to the string regeneration problem that finds a permutation of words with the highest probability under an n-gram language model. Bio: I am a PhD student at University of Cambridge researching integration of semantics and Statistical Machine Translation. I am originally from Ljubljana, Slovenia, where I completed a BSc in Computer Science in 2012. I continued with a masters in Advanced Computer Science at University of Cambridge, graduating in 2013.
30 Jun 2014	Eunsol Choi (University of Washington)	Open Domain Semantic Parser for QA / Information Extraction [Intern talk] Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Abstract: We consider the challenge of learning semantic parsers that scale to large, open-domain problems, such as question answering or knowledge base completion with Freebase. In such settings, the sentences cover a wide variety of topics and include many phrases whose meaning is difficult to represent in a fixed target ontology. For example, even simple phrases such as `daughter' and `number of people living in' cannot be directly represented in Freebase, whose ontology instead encodes facts about gender, parenthood, and population. Here, we introduce a semantic parsing approach that learns to resolve such ontological mismatches. The parser uses a probabilistic CCG to build linguistically motivated logical-form meaning representations, and includes an ontology matching model that adapts the output logical forms for each target ontology. Bio: Eunsol Choi is a Ph.D student at the University of Washington, advised by Prof. Luke Zettlemoyer. Prior to UW, she studied mathematics and computer science at Cornell University.
16 Jun 2014	Dirk Hovy (University of Copenhagen)	Two ways to deal with annotation bias Time: 3:30 pm - 4:30 pm Location: 11th Floor Large Conference Room [1135] Abstract: Abstract: In NLP, we rely on annotated data to train models. This implicitly assumes that the annotations represent the truth. However, this basic assumption can be violated in two ways: either because the annotators exhibit a certain bias (consciously or subconsciously), or because there simply is not one single truth. In this talk, I will present approaches to deal with both problems.In the case of biased annotators, we can collect multiple annotations and use an unsupervised item-response model to infer the underlying truth and the reliability of the individual annotators. We present a software package, MACE (Multi-Annotator Competence Estimation) with considerable improvements over standard baselines both in terms of predicted label accuracy and estimates of trustworthiness, even under adversarial conditions. Additionally, we can trade precision for recall, achieving even higher performance by focusing on the instances our model is most confident in.In the second case, where not a single truth exists, we can collect information about easily confused categories and incorporate this knowledge into the training process. We use small samples of doubly annotated POS data for Twitter to estimate annotation reliability and show how those metrics of likely inter-annotator agreement can be implemented in the loss functions of structured perceptron. We find that these cost-sensitive algorithms perform better across annotation projects and, more surprisingly, even on data annotated according to the same guidelines. Finally, we show that these models perform better on the downstream task of chunking. Bio: Dirk Hovy is a postdoc in the Center for Language Technology at the University of Copenhagen, working with Anders Søgaard on improving analysis of low-resource languages. Their recent paper on POS tagging with inter-annotator agreement won the best paper award at EACL 2014.Dirk received his PhD from the University of Southern California (USC), where he was working at the Information Sciences Institute (ISI) on unsupervised relation extraction. He has a background in socio-linguistics and worked on unsupervised and semi-supervised models for relation extraction, temporal links, and WSD, as well as annotator assessment. He is interested in the "human" aspects of NLP, i.e., the individual bias people have when producing or annotating language, and how it affects NLP applications.His other interests include cooking, cross-fit, and medieval art and literature.
11 Jun 2014	Julian Schamper (RWTH Aachen)	Solving Homophonic Sustitution Ciphers [Intern talk] Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Performing machine translation with monolingual data instead of parallel data is an interesting problem. Because of the lack of parallel data for many language pairs, solving the problem would arise interesting new use cases.On the road towards this we look at similar but easier problems. In the past improvements on simple substitution ciphers (1:1) were made - Even word substitutions ciphers with large vocabularies were solved for example by a beamsearch approach. This talk concentrates on the more complicated cipher class of homophonic substitution ciphers (1:m) like the famous Z408 of the Zodiac killer or the second page of the Beale cipher. We preset a method based on beamsearch. Covered aspects are an improved heuristic, the order the beamsearch should explore the search space, pruning, and the impact of the cipher lengths and cipher alphabet size on the deciphering accuracy. Bio: Julian Schamper studies computer science at RWTH Aachen University. He did its bachelor thesis in the field of deciphering foreign language and works as a student research assistant at Prof. Hermann Ney's Human Language Technology and Pattern Recognition Group.
06 Jun 2014	Elnaz Nouri (USC/ICT)	Cultural Negotiating Agents Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Abstract:People from different cultures and backgrounds tend to make different decisions faced with the same set of choices. Cultural background influences people's decisions in social interactions. Computational agents that are intended to simulate human behavior or engage in interpersonal interactions such as negotiation with humans need decision making models that are sensitive to culture. In this talk, we show how agents can learn to behave like people from specific cultures in the context of a negotiation game. Bio: Elnaz Nouri is a PhD student in the Natural Language group at USC's Institute for creative Technologies (ICT).
23 May 2014	Xing Shi (USC/ISI)	How to Speak a Language Without Knowing It Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: We develop a system that lets people overcome language barriers by letting them speak a language they do not know. Our system accepts text entered by a user, translates the text, then converts the translation into a phonetic spelling in the user’s own orthography. We trained the system on phonetic spellings in travel phrasebooks. Xing Shi is a PhD student at USC, advised by Professor Kevin Knight.
16 May 2014	Hans Chalupsky (USC/ISI)	Story-Level Inference to Improve Machine Reading Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Extracting well-defined entities and relations that hold between them fromunstructured text is an important prerequisite for a variety of tasks such asknowledge base population, question answering, data analytics, visualization,etc. The difficulty of this problem is evidenced by the annual TAC-KBPevaluations organized by NIST, where the best-performing systems in theslot-filling task still only achieve an f-value in the high 30's. These higherror rates on individual relations get further compounded once relationshave to be joined to answer a question.State-of-the art statistical information extraction techniques focus primarilyon the phrase and sentence level to extract entities and relations betweenthem, and are generally ignorant of the greater context around them. Wepresent a new approach which aggregates locally extracted information into alarger story context and uses abductive reasoning to generate the beststory-level interpretation. We demonstrate that this approach cansignificantly improve relation extraction and question answering performanceon complex questions. We will also describe ongoing work to apply this typeof inference to the TAC Knowledge Base Population task in order to improverelation extraction and coreference resolution. Bio: Hans Chalupsky is a project leader at the Information Sciences Institute ofthe University of Southern California, where he leads the Loom KnowledgeRepresentation and Reasoning Group. He holds a Master's degree in computerscience from the Vienna University of Technology, Austria and a Ph.D. incomputer science from the State University of New York at Buffalo.Dr. Chalupsky has over 25 years of experience in the design, development andapplication of knowledge representation and reasoning systems such asPowerLoom, and he is the principal architect of the KOJAK Link DiscoverySystem. His research interests include knowledge representation and reasoningsystems, natural language processing, knowledge and link discovery, anomalydetection and semantic interoperability.
14 May 2014	Qing Dou (USC/ISI)	Beyond Parallel Data [Qualification practice talk] Time: 11:00 am - 12:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Thanks to the availability of parallel data and advances in machine learning techniques, we have seen tremendous improvement in the field of machine translation over the past 20 years. However, due to lack of parallel data, the quality of machine translation is still far from satisfying for many language pairs and domains. In general, it is easier to obtain non-parallel data, and much work has tried to learn translations from non-parallel data. Nonetheless, improvements to machine translation have been limited. In this work, I follow a decipherment approach to learn translations from non parallel data and achieve significant gains in machine translation.I apply slice sampling to Bayesian decipherment. Compared with the state-of-the-art algorithm, the new approach is highly scalable and accurate, making it possible to decipher billions of tokens with hundreds of thousands of word types at high accuracy for the first time. Furthermore, I introduce dependency relations to address the problems of word reordering, insertion, and deletion when deciphering foreign languages, and show that dependency relations help improve deciphering accuracy by over 5-fold.I decipher large amounts of monolingual data to learn translations for out-of-vocabulary words and observe significant gains of up to 3.8 BLEU points in domain-adaptation. Moreover, I show that a translation lexicon learned from large amounts of non-parallel data with decipherment can improve a phrase-based machine translation system trained with limited parallel data. In experiments, I observe BLEU gains of 1.2 to 1.8 across three different test sets.Given the above success, I propose to work on advancing machine translation of real world low density languages, and to explore using non-parallel data to improve word alignment and discovery of phrase translations. Bio: Qing Dou is a fourth year PhD student at USC/ISI, advised by Professor Kevin Knight.
09 May 2014	Aram Galstyan (USC/ISI)	Deciphering Social Interactions from Text Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Studies of social systems have traditionally focused on analyzing various structural properties of networks induced by social communication, while ignoring the content of communication. Despite recent advances, language-based analysis of social processes is still a challenging problem due to the lack of sound mathematical frameworks and adequate computational methods for extracting and analyzing useful social signals from unstructured text.Here I will describe our recent work on content-based analysis of social interactions, which involves two main steps: (a) Embedding communication content in an abstract content space, so that a sequence of textual exchanges is represented as trajectories in this space; and (b) Applying tools from information theory and dynamical systems to discover and characterize directional correlations among those trajectories. I will briefly describe the main elements of the technical approach, and demonstrate the usefulness of the proposed framework on two case studies: content-based characterization of social influence, and stylistic coordination in dialogues. Bio: Aram Galstyan is a Project Leader at the USC Information Sciences Institute and a Research Assistant Professor at the USC Computer Science Department. His current research focuses on characterizing and predicting behavior of dynamic networks using information–theoretic concepts. His other research interests include developing statistical–physics based approaches for understanding fundamental limits of various inference algorithms and characterizing the performance of those algorithms with respect to stability and robustness.
25 Apr 2014	Hui Zhang (USC/ISI)	[ACL2014 practice talk] Kneser-Ney Smoothing on Expected Counts Time: 10:30 am - 11:30 am Location: 11th Floor Large Conference Room [1135] Abstract: Widely used in speech and language processing, Kneser-Ney (KN) smoothing has consistently been shown to be one of the best-performing smoothing methods. However, KN smoothing assumes integer counts, limiting its potential uses—for example, inside Expectation-Maximization. In this paper, we propose a generalization of KN smoothing that operates on fractional counts, or, more precisely, on distributions over counts. We rederive all the steps of KN smoothing to operate on count distributions instead of integral counts, and apply it to two tasks where KN smoothing was not applicable before: one in language model adaptation, and the other in word alignment. In both cases, our method improves performance significantly. Hui Zhang is a fourth year PhD student working with Professor David Chiang at the USC Information Sciences Institute. His main research interests are in statistical machine translation and machine learning. He has focused on domain adaptation and smoothing techniques.
25 Apr 2014	Linhong Zhu (USC/ISI)	Partitioning Networks with Node Attributes by Compressing Information Flow Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Real-world networks are often organized as modules or communities of similar nodes that serve as functional units. These networks are also rich in content, with nodes having distinguishing features or attributes. In order to discover a network's modular structure, it is necessary to take into account not only its links but also node attributes.We describe an information-theoretic method that identifies modules by compressing descriptions of information flow on a network. Our formulation introduces node content into the description of information flow, which we then minimize to discover groups of nodes with similar attributes that also tend to trap the flow of information.The method has several advantages: it is conceptually simple and does not require ad-hoc parameters to specify the number of modules or to control the relative contribution of links and node attributes to network structure.We apply the proposed method to partition real-world networks with known community structure. We demonstrate that adding node attributes helps recover the underlying community structure in content-rich networks more effectively than using links alone. In addition, we show that our method is faster and more accurate than alternative state-of-the-art algorithms. Linhong Zhu is currently a Postdoctoral Research Associate at Information Sciences Institute, University of Southern California, under the supervision of Dr. Kristina Lerman and Dr. Aram Galstyan. Before that, she worked as a scientist-I at Institute for Infocomm Research Singapore from Oct 2010 to Jan 2013. She got her B Eng. Degree in Computer Science from University of Science and Technology of China in 2006 (2002-2006) and received her Ph.D. Degree in Computer Engineering from Nanyang Technological University (2006-2011). Her research interests focus on large-scale social network analysis and sentiment analysis.
16 Apr 2014	Derek Abbott (University of Adelaide)	The Mystery of the Tamam Shud Code Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: One of the leading unsolved mysteries in Australia, is the case of the Somerton Man. This was a very athletically fit man found in a nice suit lying deceased on a beach in Australia in 1948. The mystery is that there was no mark on him and there was nothing to identify him. No one came forward to identify him either. Over 65 years later we still do not know his name or how he died. He had no ID, but his pocket had a piece of paper with the words "Tamam Shud" on it. It was subsequently found that the piece of paper had been torn out of a copy of a poetry book called the Rubaiyat of Omar Khayyam. Penciled in the back of the book were letters that appeared to be some sort of code. Is this a clue? This talk will outline the key facts of mystery and show how forensic skills in engineering and computing are being used to attempt to both identify the man and shed light on the mysterious letters. Derek Abbott received a B.Sc. (Hons) in physics from Loughborough University, U.K. in 1982 and completed his Ph.D. in electrical and electronic engineering from the University of Adelaide, Adelaide, Australia, in 1995. From 1978 to 1986, he was a research engineer at the GEC Hirst Research Centre, London, U.K. From 1986–1987, he was a VLSI design engineer at Austek Microsystems, Australia. Since 1987, he has been with the University of Adelaide, where he is presently a full Professor with the School of Electrical and Electronic Engineering. Prof. Abbott is a Fellow of the Institute of Physics (IOP) and a Fellow of the IEEE. He has won a number of awards including a Tall Poppy Award for Science (2004), a Premier’s Award in Science and Technology for outstanding contributions to South Australia (2004), and an Australian Research Council (ARC) Future Fellowship (2012). He is on the editorial board of Proceedings of the IEEE. His interests are in complex systems and multidisciplinary applications of physics and engineering.
11 Apr 2014	Farshad Kooti (USC/ISI)	Network Weirdness: Exploring the Origins of Network Paradoxes Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Social networks have many counter-intuitive properties, including the “friendship paradox” that states, on average, your friends have more friends than you do. Recently, a variety of other paradoxes were demonstrated in online social networks. This paper explores the origins of these network paradoxes. Specifically, we ask whether they arise from mathematical properties of the networks or whether they have a behavioral origin. We show that sampling from fat-tailed distributions always gives rise to a paradox in the mean, but not the median. We propose a strong form of network paradoxes, based on utilizing the median, and validate it empirically using data from two online social networks. Specifically, we show that for any user the majority of user’s friends and followers have more friends, followers, etc. than the user, and that this cannot be explained by statistical properties of sampling. Next, we explore the behavioral origins of the paradoxes by using the shuffle test to remove correlations between node degrees and attributes. We find that paradoxes for the mean persist in the shuffled network, but not for the median. We demonstrate that strong paradoxes arise due to the assortativity of user attributes, including degree, and correlation between degree and attribute.
28 Feb 2014	Kenji Sagae (USC/ICT)	Dependency parsing with directed graph output Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Most data-driven dependency parsing approaches assume that thestructure of sentences is represented as trees. Although trees haveseveral desirable properties from a computational perspective, thestructure of linguistic phenomena that go beyond shallow syntax oftencannot be fully captured by tree representations. I will describedata-driven dependency parsing approaches that produce more generalgraphs as output, and present results obtained with these approacheson predicate-argument structures extracted from CCG and HPSG datasets. Kenji Sagae is a Research Scientist in the Institute for Creative Technolgies at the University of Southern California, and a Research Assistant Professor in the USC Computer Science Department. He received his PhD from Carnegie Mellon University in 2006. Prior to joining USC in 2008, he was a research associate at the University of Tokyo. His main area of research is Natural Language Processing, focusing on data-driven approaches for syntactic parsing, predicate-argument analysis and discourse processing. His current work includes the application of these techniques in analysis of personal narratives in blog posts, the study of child language, spoken dialogue systems, and multimodal processing.
14 Feb 2014	Hal Daumé III (University of Maryland)	Predicting Linguistic Structures Accurately and Efficiently Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Many classic problems in natural language processing can be cast as building mapping from a complex input (e.g., a sequence of words) to a complex output (e.g., a syntax tree or semantic graph). This task is challenging both because language is ambiguous (learning difficulties) and represented with discrete combinatorial structures (computational difficulties). Often these are at odds: the features you want to add to decrease learning difficulties cause nontrivial additional structure yielding worse computational difficulties.I will begin by discussing algorithms that side-step the issue of combinatorial blowup and aim to predict an output structure directly. I will then present approaches that explicitly learn to trade-off accuracy and efficiency, applied to a variety of linguistic phenomena. Moreover, I will show that in some cases, we can actually obtain a model that is faster and more accurate by exploiting smarter learning algorithms. Hal's homepage: http://www.umiacs.umd.edu/~hal/
17 Jan 2014	Mohsen Taheriyan (USC/ISI)	A Graph-based Approach to Learn Semantic Descriptions of Data Sources Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Semantic models of data sources and services provide support to automate many tasks such as source discovery, data integration, and service composition, but writing these semantic descriptions by hand is a tedious and time-consuming task. Most of the related work focuses on automatic annotation with classes or properties of source attributes or input and output parameters. However, constructing a source model that includes the relationships between the attributes in addition to their semantic types remains a largely unsolved problem. In this talk, we present a graph-based approach to hypothesize a rich semantic description of a new target source from a set of known sources that have been modeled over the same domain ontology. We exploit the domain ontology and the known source models to build a graph that represents the space of plausible source descriptions. Then, we compute the top k candidates and suggest to the user a ranked list of the semantic models for the new source. The approach takes into account user corrections to learn more accurate semantic descriptions of future data sources. Our evaluation shows that our method produces models that are twice as accurate than the models produced using a state of the art system that does not learn from prior models. Mohsen's webpage: http://www-scf.usc.edu/~taheriya/
06 Dec 2013	Shiwali Mohan (University of Michigan)	Learning Hierarchical Tasks from Situated Interactive Instruction Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Our research aims at building interactive robots and agent that can expand their knowledge by interacting with human users. In this talk, I will give an overview of our ongoing work on learning novel tasks from linguistic, mixed-initiative instructions. The first part of the talk will address the problems of situated language comprehension for cognitive agents in real-world environments. The second part will focus on task learning. I will discuss the knowledge representations we employ to represent hierarchical, goal-oriented tasks and how this knowledge can be learned from interactions using an explanation-based learning framework. Bio: Shiwali Mohan is a Ph.D. candidate in the department of Computer Science and Engineering at the University of Michigan, Ann Arbor. Her research interests include situated language, interactive learning, and cognitive systems.
15 Nov 2013	Vikram Ramanarayanan (USC)	Data-Driven Techniques for Modeling Speech Motor Control Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Modeling the ways in which humans produce and perceive various forms of behavioral communication, such as speech, pose many diverse challenges. For instance, from a controls perspective, it is important to understand and model how control and coordination of various biological actuators in human body is achieved order to produce motor actions. From a signal processing perspective, we would like to discover novel representations or system architectures that are used in order to effect this coordination. We present a computational, data-driven approach to derive interpretable movement primitives from speech articulation data in a bottom-up manner. It puts forth a convolutive Nonnegative Matrix Factorization algorithm with sparseness constraints (cNMFsc) to decompose a given data matrix into a set of spatio-temporal basis sequences and an activation matrix. The algorithm optimizes a cost function that trades off the mismatch between the proposed model and the input data against the number of primitives that are active at any given instant. We further argue that such primitives can be modeled using nonlinear dynamical systems in a control-theoretic framework for speech motor control. Specifically, we extend our approach to extract a spatio-temporal dictionary of control primitives (sequences of control parameters), which can then be used to control a dynamical systems model of the vocal tract to produce any desired sequence of movements. Although the method is particularly applied to measured and synthesized articulatory data in our case, the framework is general and can be applied to any multivariate timeseries. The results suggest that the proposed algorithm extracts movement primitives from human speech production data that are linguistically interpretable.
08 Nov 2013	Giuseppe Carenini (University of British Columbia, Canada)	Modeling Topics, Opinions and Discourse Structure in Asynchronous Conversations Time: 3:00 pm - 4:00 pm Location: 6th Floor Large Conference Room [Rm 689] Abstract: Due to the Internet revolution, human conversational data--in written forms--are accumulating at a phenomenal rate, as more and more people engage in email exchanges, blogging, texting and other social media activities. In this talk, we will present automatic methods for analyzing conversational text generated in asynchronous conversations, i.e., where participants communicate with each other at different times (e.g., email, blog, forum). Our focus will be on novel techniques to detect the topics covered in the conversation, to identify whether an utterance in the conversation is expressing an opinion, as well as to determine the discourse structure of each message. In our work, we apply both graph-based methods and probabilistic graphical models. Giuseppe is an Associate Professor in Computer Science at the University of British Columbia (BC, Canada). Giuseppe has broad interdisciplinary interests. His work on natural language processing and information visualization to support decision making has been published in over 90 peer-reviewed papers. Dr. Carenini was the area chair for “Sentiment Analysis, Opinion Mining, and Text Classification” of ACL 2009 and the area chair for “Summarization and Generation” of NAACL 2012. He has recently co-edited an ACM-TISTSpecial Issue on “Intelligent Visual Interfaces for Text Analysis”. In July 2011, he has published a co-authored book on “Methods for Mining and Summarizing Text Conversations”. In his work, Dr. Carenini has also extensively collaborated with industrial partners, including Microsoft and IBM. Giuseppe was awarded a Google Research Award and anIBM CASCON Best Exhibit Award in 2007 and 2010 respectively.
01 Nov 2013	Greg Ver Steeg (USC/ISI)	Coarse-graining Text Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Because natural language is complex, researchers in many domains look for lower-dimensional representations of text to suit their purposes. Different methods attempt to single out intuitive aspects of language like content, sentiment, or style. I will discuss a new, unsupervised approach to learning abstract representations of text (or other high-dimensional signals). The motivating principle is to use information theory to construct higher-order features that explain correlations between lower-order features. I will present preliminary results using this framework. Greg Ver Steeg is a research professor at ISI. His research explores practical methods for inferring meaningful structure in complex systems like social networks. He did his PhD in quantum physics at Caltech.
25 Oct 2013	Roy Schwartz (NLP Lab, Hebrew University in Jerusalem)	Semantic Representation using Flexible Patterns Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Ever since their introduction in 1992, hand-crafted lexico-semantic patterns have been shown to be useful for many semantic tasks. In recent years, an automatic, fully unsupervised method to generate patterns was developed ("flexible patterns"). I will demonstrate that flexible patterns are useful for extracting semantic information on words, word relations and sentences. I will present in detail the latest results in the field – applying flexible patterns on the task of authorship attribution on tweets (Schwartz et al., EMNLP2013).
24 Oct 2013	Kuzman Ganchev (Google Research)	Cross lingual transfer and learning with side information Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: I will describe a framework for cross-lingual transfer ofprobabilistic models that uses posterior regularization. As a longaside, I will describe several methods for learning with sideinformation: constraint driven learning, posterior regularization,generalized expectation, augmented loss as well as how they relate toeach other and to Bayesian measurements. I will conclude with someapplications from my work and from the literature, including sequenceand tree models.Biography: I was born in Sofia, Bulgaria where I lived until February 1989. Myfamily moved to Zimbabwe and then in 1995 to New Zealand where I wentto high school. I came to the US in 1999 to study at SwarthmoreCollege. I spent the 2001-2002 academic year studying abroad in Paris.After graduating with a Bachelor of Arts in Computer Science in 2003 Iworked at StreamSage Inc. in Washington DC until starting at theUniversity of Pennsylvania in Fall 2004. During the summer of 2007 Iwas an intern at TrialPay in Mountain View, CA and during the summerof 2008 I was an intern at Bank of America in New York. I graduatedfrom UPenn in 2010 and have since been working at Google Inc. in NewYork.
16 Oct 2013	Qing Dou (USC/ISI)	Dependency Based Decipherment for Resource-Limited Machine Translation (EMNLP2013 practice talk) Time: 11:00 am - 12:00 pm Location: 6th Floor Large Conference Room [Rm # 689] Abstract: We introduce dependency relations into deciphering foreign languages and show that dependency relations help improve the state-of-the-art deciphering accuracy by over 500%. We learn a translation lexicon from large amounts of genuinely non parallel data with decipherment to improve a phrase-based machine translation system trained with limited parallel data. In experiments, we observe BLEU gains of 1.2 to 1.8 across three different test sets.
27 Sep 2013	Andrew S. Gordon (USC/ICT)	Heider-Simmel Interactive Theater Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: In a famous 1944 paper, psychologist Fritz Heider and his student Marianne Simmel described an experiment where undergraduates were shown a short animated film depicting the movement of geometric shapes. Asked to describe what happened in the film, these students produced narratives that described the behavior of these shapes in anthropomorphic terms, ascribing to them plans, goals, emotions, and social roles that accounted for their behavior. Fritz Heider later wrote his seminal book, The Psychology of Interpersonal Relations, which articulated the role of Commonsense Psychology in the interpretation of the behavior of other people.In this talk I'll discuss our recent efforts to model the reasoning of the students in Heider and Simmel's original experiment. I'll describe our vision of a "Heider-Simmel Interactive Theater," a software application where people can create their own short movies involving geometric shapes in the style of Heider and Simmel's original film, which are then interpreted by the computer to generate a textual narrative of the author's creation. Then I'll lay out the technical plan, which involves the integration of probabilistic graphical models, weighted abduction, data-driven text generation, logical formalizations of commonsense psychology, and game-based data collection from the public at large.Before coming to the talk, please sign up and play "Triangle Charades" at the following website: http://charades.ict.usc.edu
20 Sep 2013	Yang Feng (USC/ISI)	A Markov Model of Machine Translation using Non-parametric Bayesian Inference (ACL 2013) Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Most modern machine translation systems use phrase pairs as translation units, allowing for accurate modeling of phrase-internal translation and reordering. However phrase-based approaches are much less able to model sentence level effects between different phrase-pairs. We propose a new model to address this imbalance, based on a word-based Markov model of translation which generates target translations left-to-right. Our model encodes word and phrase level phenomena by conditioning translation decisions on previous decisions and uses a hierarchical Pitman-Yor Process prior to provide dynamic adaptive smoothing. This mechanism implicitly supports not only traditional phrase pairs, but also gapping phrases which are non-consecutive in the source.Yang Feng is a posdoc of the natural language group in USC/ISI. She got her ph.D degree from Institute of Computing Technology, Chinese Academy of Sciences. Her research interests are in all aspects of machine translation and machine learning focusing on graphical models and Bayesian inference.
13 Sep 2013	Kevin Knight (USC/ISI)	Some Potential NLP Thesis Topics and Other Fun Research Projects Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: I'll present a dozen interesting, potentially high-impact NLP research projects. I'd like to make this a very interactive session.
06 Sep 2013	Jeon-Hyung Kang (USC/ISI)	LA-CTR: A Limited Attention Collaborative Topic Regression for Social Media Time: 3:00 pm - 4:00 pm Location: 6th Floor Conference Room [RM # 689] Abstract: Probabilistic models can learn users’ preferences from the history of their item adoptions on a social media site, and in turn, recommend new items to users based on learned preferences. However, current models ignore psychological factors that play an important role in shaping online social behavior. One such factor is attention, the mechanism that integrates perceptual and cognitive features to select the items the user will consciously process and may eventually adopt. Recent research has shown that people have finite attention, which constrains their online interactions, and that they divide their limited attention non-uniformly over other people. We propose a collaborative topic regression model that incorporates limited, non-uniformly divided attention. We show that the proposed model is able to learn more accurate user preferences than state-of-art models, which do not take human cognitive factors into account. Specifically we analyze voting on news items on the social news aggregator and show that our model is better able to predict held out votes than alternate models. Our study demonstrates that psycho-socially motivated models are better able to describe and predict observed behavior than models which only consider latent social structure and content.
30 Aug 2013	Tomer Levinboim (USC/ISI)	MKL and Low Rank Multiplicative Shaping Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Multiple Kernel Learning (MKL) has been a subject of intensive research over the past decade.Instead of searching for a good kernel function (implicitly, feature transformation of our data), the idea is to learn a combination of kernels that optimizes our objective.This formulation has found usage in feature selection and interpretability as well as (sometimes) leading to increased classification accuracy.In the talk, I will provide an introduction to MKL as well as present and compare a few MKL formulations for SVM classification.Given time, I will present our own non-linear (yet still convex) MKL formulation that linearly combines kernels that are first multiplied by low-rank matrices.
23 Aug 2013	Jonathan May (SDL Research)	Models of Translation Competitions (long paper at ACL2013) Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: What do we want to learn from a translation competition and how do we learn it with confidence? We argue that a disproportionate focus on ranking competition participants has led to lots of different rankings, but little insight about which rankings we should trust. In response, we provide the first framework that allows an empirical comparison of different analyses of competition results. We then use this framework to compare several analytical models on data from the Workshop on Machine Translation (WMT).
16 Aug 2013	Gully Burns (USC/ISI)	Bridging Between Bioinformatics and Natural Language Processing Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: The Mouse Genome Informatics database (MGI) has participated extensively in shared NLP challenges focussed on developing infrastructure for their use. This collaboration has advanced the field of applying NLP to biomedical text but has not yet generated workable technology for use in the lab. In advance of a workshop (Monday August 19, 2013 at ISI) dedicated to this subject, I will describe the SciKnowMine project to introduce the domain of biomedical NLP and to showcase how we can collaboratively accelerate the process of biocuration, making these important databases far more effective.Students, colleagues! You are very welcome to the workshop: http://www.isi.edu/projects/sciknowmine/sciknowmine_release_workshop_-_bridging_bionlp_and_biocuration
26 Jul 2013	Fabienne Braune (University of Stuttgart)	Multi bottom-up tree transducers in statistical machine translation Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: After a brief overview of applications of tree transducers in statistical machine translation, we introduce multi bottom-up tree transducers (XMBOT).We then present a complete translation system integrating XMBOT. The two main components of our pipeline are (a) rule extraction and (b) decoding. We begin by presenting the extraction of XMBOT rules from an aligned and bi-parsed parallel corpus. In a second step, we introduce our XMBOT decoder which is an adaptation of the syntax-based component of the Moses open-source MT toolkit to handle XMBOT rules. We end this talk with an evaluation of our system on the WMT 2009 English-to-German translation task.
19 Jul 2013	Jacqueline Lee (MIT)	Bayesian Approaches to Acoustic Model and Pronunciation Lexicon Discovery Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: In the first part of the talk, we investigate the problem of acoustic modeling in which prior language-specific knowledge and transcribed data are unavailable. We present an unsupervised model that simultaneously segments the speech, discovers a proper set of sub-word units (e.g., phones) and learns a Hidden Markov Model (HMM) for each induced acoustic unit. Our approach is formulated as a Dirichlet process mixture model in which each mixture is an HMM that represents a sub-word unit. We apply our model to the TIMIT corpus, and the results demonstrate that our model discovers phone units that are highly correlated with English phones as well as produces better segmentation than the state-of-the-art baselines. We test the quality of the learned acoustic models on a spoken term detection task. Compared to the baseline, our model is able to improve the detection precision of top hits by a large margin. The creation of a pronunciation lexicon remains the most inefficient process in developing an automatic speech recognizer. In the second part of the talk, we discuss an unsupervised alternative to the conventional manual approach for creating pronunciation dictionaries. We present a hierarchical Bayesian model, which jointly discovers the phonetic inventory and the Letter-to-Sound (L2S) mapping rules in a language using only transcribed data. When tested on a corpus of spontaneous queries, our results demonstrate the superiority of the proposed joint learning scheme over its sequential counterpart, in which the latent phonetic inventory and L2S mappings are learned separately. Furthermore, the recognizers built with the automatically induced lexicon consistently outperform grapheme-based recognizers and even approach the performance of recognition systems trained using conventional supervised procedures.
12 Jul 2013	Daniel Bauer (Columbia)	Understanding Descriptions of Visual Scenes Using Graph Grammars Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: I will present work on the interpretation of descriptions of visual scenes such as 'A man is sitting on a chair and using the computer'. One application of this research is the automatic generation of 3D scenes which provides a way for non-artists to create graphical content and have wide-ranging applications in entertainment and education.The core task of text-to-scene generation involves understanding the high-level content of a description and translating it into a low-level representation representing a 3D scene as a set of relations between pre-existing 3D models. Linguistic, spatial, and world-knowledge inference is required in this process on different levels.My talk will present VigNet, a repository of lexical- and world knowledge needed for text-to-scene generation, which is based on FrameNet. I will also describe how visual scenes can be represented as directed graphs and how information in VigNet can be encoded in Synchronous Hyperedge Replacement Grammars to enable semantic parsing and generation of a scene. Bio: Daniel Bauer is a PhD candidate at Columbia University. His research interests include lexical and computational semantics, semantic parsing, and formal grammars in syntax and semantics. He is a co-founder of WordsEye Inc, a company that aims to make text-to-3D-scene generation available to everyone on social media. Daniel is currently an intern at ISI for the second summer in a row. He received his undergrad degree in Cognitive Science from the University of Osnabrück, Germany, and a MSc in Language Science and Technology from Saarland University.
10 Jul 2013	Victor Chahuneau (CMU)	Translating into Morphologically Rich Languages with Synthetic Phrases Time: 3:00 pm - 4:00 pm Location: 6th Floor Large Conference Room [689] Abstract: Translation into morphologically rich languages is an important but recalcitrant problem in machine translation. When confronted with the large vocabulary sizes resulting from various morphological phenomena, the independence assumptions made by standard translation models mean that vast amounts of parallel training data (which do not generally exist) would be necessary to reliably estimate the numerous required parameters. On the other hand, previous attempts to remedy this situation have been unsatisfying either because they were highly language-dependent, or because they failed from a modeling perspective (e.g., they improved performance on long-tail types at the expense of frequent types). We present a simple and effective approach that deals with the problem in two phases. First, a discriminative model is learned to predict inflections of target words from rich source-side annotations. Then, this model is used to create additional sentence-specific phrases that are added to a standard translation model prior to decoding. Our approach relies on morphological analysis of the target language but we show that an unsupervised Bayesian model can also be used in place of a standard supervised analyzer. We report significant improvements in translation quality when translating from English to Russian, Hebrew and Swahili.
07 Jun 2013	Malte Nuhn (Aachen University, Germany)	Is Decipherment Difficult? Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Is it possible to learn useful translations from large amounts ofmonolingual data to improve machine translation? The intuitivefeeling is that learning a language without bilingual data is atleast "more difficult than learning from example translations". Inthis talk, I will present recent results on decipherment: I will showthat the decipherment problem is indeed difficult (NP-hard) and whatapproximations to the original problem can be made without hurtingdecipherment accuracy much. Bio: Having studied Physics and Computer Science at RWTH Aachen University, I'm currently a PhD student at Prof. Ney's Human Language Technology and Pattern Recognition Group in Aachen. I'm particularly interested in applying decipherment techniques to improve machine translation.
05 Jun 2013	Dirk Hovy	Learning Whom to Trust with MACE(NAACL Practice Talk) Time: 3:30 pm - 4:30 pm Location: 11th Floor Large Conference Room [1135] Abstract: Non-expert annotation services like Amazon's Mechanical Turk (AMT) are cheap and fast ways to evaluate systems and provide categorical annotations for training data. Unfortunately, some annotators choose bad labels in order to maximize their pay. Manual identification is tedious, so we experiment with an item-response model. It learns in an unsupervised fashion to a) identify which annotators are trustworthy and b) predict the correct underlying labels. We match performance of more complex state-of-the-art systems and perform well even under adversarial conditions. We show considerable improvements over standard baselines, both for predicted label accuracy and trustworthiness estimates. We show that the latter can be further improved by introducing a prior on model parameters and using Variational Bayes inference. Additionally, we present a method for trading precision and recall, achieving even higher performance by focusing on the instances our model is most confident in. We provide an implementation of MACE (Multi- Annotator Competence Estimation) for download at (http://www.isi.edu/publications/licensed-sw/mace/). Bio: Dirk Hovy is a recent PhD graduate from USC's Information Sciences Institute, working with Jerry Hobbs and Ed Hovy. He has a background in socio-linguistics. His current work includes unsupervised and semi-supervised sequential models of relation extraction and WSD, as well as annotator assessment. He has also worked on temporal relations, metaphors, and prosody. A full list of his publications can be found at(http://www.dirkhovy.com/portfolio/papers/index.php). His other interests include cooking, picking up heavy things (and putting them back down), and medieval art and literature.
17 May 2013	Qing Dou	Deciphering Gigaword Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: State of the art machine translation systems learn translation rules from large amounts of parallel data (pairs of sentences that are translation of each other). Unfortunately, the amount of parallel data is very limited for many languages and domains. In general, it is easier to obtain monolingual data. Is it possible to learn useful translations from large amounts of monolingual data to improve machine translation when the amount of parallel data is limited? In this talk, I will present my ongoing work that applies decipherment techniques to decipher hundreds of millions Spanish news texts into English and learns a translation lexicon from the decipherment to improve a translation model learned from limited parallel data.
03 May 2013	Dirk Hovy	Learning Semantic Types and Relations from Text (Defense Practice Talk) Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: NLP applications such as Question Answering (QA), Information Extraction (IE), or Machine Translation (MT) are incorporating increasing amounts of semantic information. A fundamental building block of semantic information is the relation between a predicate and its arguments, e.g. eat(John,burger). In order to reason at higher levels of abstraction, it is useful to group relation instances according to the types of their predicates and the types of their arguments. For example, while eat(Mary,burger) and devour(John,tofu) are two distinct relation instances, they share the underlying predicate and argument types INGEST(PERSON,FOOD).A central question is: where do the types and relations come from?The subfield of NLP concerned with this is relation extraction, which comprises two main tasks:1. identifying and extracting relation instances from text2. determining the types of their predicates and argumentsThe first task is difficult for several reasons. Relations can express their predicate explicitly or implicitly. Furthermore, their elements can be far part, with unrelated words intervening. In this thesis, we restrict ourselves to relations that are explicitly expressed between syntactically related words. We harvest the relation instances from dependency parses.The second task is the central focus of this thesis. Specifically, we will address these three problems: 1) determining argument types 2) determining predicate types 3) determining argument and predicate types. For each task, we model predicate and argument types as latent variables in a hidden Markov models. Depending on the type system available for each of these tasks, our approaches range from unsupervised to semi-supervised to fully supervised training methods. The central contributions of this thesis are as follows:1. Learning argument types (unsupervised): We present a novel approach that learns the type system along with the relation candidates when neither is given. In contrast to previous work on unsupervised relation extraction, it produces human-interpretable types rather than clusters. We also investigate its applicability to downstream tasks such as knowledge base population and construction of ontological structures. An auxiliary contribution, born from the necessity to evaluate the quality of human subjects, is MACE (Multi-Annotator Competence Estimation), a tool that helps estimate both annotator competence and the most likely answer.2. Learning predicate types (unsupervised and supervised): Relations are ubiquitous in language, and many problems can be modeled as relation problems. We demonstrate this on a common NLP task, word sense disambiguation (WSD) for prepositions (PSD). We use selectional constraints between the preposition and its argument in order to determine the sense of the preposition. In contrast, previous approaches to PSD used n-gram context windows that do not capture the relation structure. We improve supervised state-of-the-art for two type systems.3. Argument types and predicates types (semi-supervised): Previously, there was no work in jointly learning argument and predicate types because (as with many joint learning tasks) there is no jointly annotated data available. Instead, we have two partially annotated data sets, using two disjoint type systems: one with type annotations for the predicates, and one with type annotations for the arguments. We present a semisupervised approach to jointly learn argument types and predicate types, and demonstrate it for jointly solving PSD and supersense-tagging of their arguments. To the best of our knowledge, we are the first to address this joint learning task.Our work opens up interesting avenues for both the typing of existing large collections of triple stores, using all available information, and for WSD of various word classes.
12 Apr 2013	Hui Zhang	Beyond Left-to-Right: Multiple Decomposition Structures for SMT Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Standard phrase-based translation models do not explicitly model context dependence between translation units. As a result, they rely on large phrase pairs and target language models to recover contextual effects in translation. In this work, we explore language models over Minimal Translation Units (MTUs) to explicitly capture contextual dependencies across phrase boundaries in the channel model. As there is no single best direction in which contextual information should flow, we explore multiple decomposition structures as well as dynamic bidirectional decomposition. The resulting models are evaluated in an intrinsic task of lexical selection for MT as well as a full MT system, through n-best re-ranking. These experiments demonstrate that additional contextual modeling does indeed benefit a phrase-based system(up to 2.8 BLEU score) and that the direction of conditioning is important. Integrating multiple conditioning orders provides consistent benefit, and the most important directions differ by language pair.
05 Apr 2013	Abe Kazemzadeh	Sentiment and Sarcasm in the 2012 US Presidential Election Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Political discourse is challenging from a sentiment analysis point of view because political issues are subjective and highly dynamic. Political language may contain neologisms that do not occur frequently in general purpose lexical sentiment models. Also, the presence of humor, sarcasm, and comparatives may introduce errors in sentiment analysis. In Twitter, these issues are amplified by the use of Twitter-specific features and constrained message lengths. In this presentation, we will present a collaborative project between the University of Southern California (USC) Signal Analysis and Interpretation Laboratory, USC Annenberg Innovation Laboratory, and IBM. Our system is relies on manual curation of keywords and hashtags, crowd-sourced annotation, statistical machine learned sentiment models, and a real-time visualization that is ideal for display during live events. We describe our corpus and several experiments using different settings of our sentiment models. Among our findings are that sentiment in politics is skewed towards negative, annotation agreement tend to be low, and that sarcasm is a factor that explains some of the annotator disagreement.We have also studied bigger picture questions such as how much weight tweets by Big Bird (or someone pretending to be Big Bird) should be allocated in reporting the results of sentiment analysis. Question about the role of humor and sarcasm in social media lead to some skepticism of naive applications of sentiment analysis but present interesting examples of content that influences social media user behavior and spills over into traditional media. This is joint work with Dogan Can, Nikos Malandrakis, Hao Wang, Alex Leavitt, Kevin Driscoll, Kristen Guth, Theo Mazumdar, Varun Lingaraju, Sagar Jhobalia, Mellisa Loudon, Shrikanth Narayanan, Françs Bar, Kjerstin Thorson, Mike Ananny, Sam Thomson, Ed Elze, Graham Mackintosh, Robert Uleman, Leon Katsnelson, and Chris Gruber.
18 Mar 2013	Carlo Strapparava	Computational explorations of creative language Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Dealing with creative language and in particular with affective, persuasive and even humorous language has often been considered outside the scope of computational linguistics. Nonetheless it is possible to exploit current NLP techniques starting some explorations about it. We briefly review some computational experiences about these typical creative genres. We will start introducing techniques for dealing with emotional and witty language. Then we will talk about the exploitation of some extra-linguistic features: for example music and lyrics in emotion detection, and an audience-reaction tagged corpus of political speeches for the analysis of persuasive language. As examples of practical applications, we will present a system for automatized memory techniques for vocabulary acquisition in a second language, and an application for automatizing creative naming (branding). Bio: Carlo Strapparava is a senior researcher at FBK-irst (Fondazione Bruno Kessler - Istituto per la ricerca scientifica e Tecnologica) in the Human Language Technologies Unit. His research activity covers artificial intelligence, natural language processing, intelligent interfaces, human-computer interaction, cognitive science, knowledge-based systems, user models, adaptive hypermedia, lexical knowledge bases, word-sense disambiguation, affective computing and computational humour. He is the author of over 150 papers, published in scientific journals, book chapters and in conference proceedings. He also played a key role in the definition and the development of many projects funded by European research programmes.He regularly serves in the program committees of the major NLP conferences (ACL, EMNLP, etc.). He was executive board member of SIGLEX, a Special Interest Group on the Lexicon of the Association for Computational Linguistics (2007-2010), Senseval (Evaluation Exercises for the Semantic Analysis of Text) organisation committee (2005-2010).On June 2011, he was awarded with a Google Research Award on Natural Language Processing, specifically on the computational treatment of creative language.
08 Mar 2013	Sujith Ravi	Scalable Unsupervised Learning for Natural Language Processing Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Abstract: Natural language processing (NLP) tools have become ubiquitous for data analysis in digital environments such as the Web and social media. Popular applications include tools for clustering, sequence labeling, machine translation, to name a few. But unfortunately, majority of the existing toolkits rely on supervised learning to train models using labeled data. This poses several challenges---labeled data is not readily available in all languages or domains and building an NLP system from scratch for a new domain (or language, user, etc.) requires significant human effort which is both time-consuming and expensive. Moreover, scaling this strategy on the Web is infeasible.Recent advances in unsupervised algorithms have demonstrated promising results on several NLP tasks without using any labeled data. But despite their utility, scalable unsupervised algorithms rarely provide probabilistic representations of the data which can be useful for predicting on unseen data or integrated as components with a larger model or pipeline. In addition, these methods often favor simple model descriptions (e.g., k-means algorithm for clustering) in exchange for rich statistical models. This leads to the problem of rapidly diminishing returns when applying these methods on increasing amounts of data. Instead, we need to design algorithms that can scale elegantly to large data as well as complex models.In this work, I will present our recent work on scalable probabilistic learning with Bayesian inference. We show a novel algorithm for fitting mixtures of exponential families, which generalizes several models that are typically used in NLP and other areas. A major contribution of our work is a novel sampling method that uses locality sensitive hashing to achieve high throughput in generating proposals during sampling. Using "clustering" as an example application, I will describe our approach and show that it scales elegantly to large numbers of clusters achieving a speedup of several orders of magnitude over existing toolkits, while maintaining high clustering quality. In addition, we also prove probabilistic error guarantees for the new sampling algorithm. This is joint work with Amr Ahmed and Alex Smola. Lastly, I will briefly mention some ongoing work on large-scale unsupervised learning for other NLP applications such as machine translation. Bio: Sujith Ravi is a Research Scientist at Google. He completed his PhD at University of Southern California/Information Sciences Institute and joined Yahoo! Research, Santa Clara as a Research Scientist before joining Google, Mountain View in 2012. His main research interests span various problems and theory related to the fields of Natural Language Processing (NLP) and Machine Learning. He is specifically interested in large-scale unsupervised and semi-supervised methods and their applications to structured prediction problems in NLP, information extraction, user modeling in social media, graph optimization algorithms for summarizing noisy data, computational decipherment and computational advertising. His work has been reported in several magazines such as New Scientist, ACM TechNews, etc. For more information, you can visit his personal page (http://www.sravi.org).
22 Feb 2013	Louis-Philippe Morency	Modeling Human Communication Dynamics: From Depression Assessment to Multimodal Sentiment Analysis Time: 3:00 pm - 4:00 pm Location: 6th Floor Conference Room [689] Abstract: Human face-to-face communication is a little like a dance, in that participants continuously adjust their behaviors based on verbal and nonverbal displays and signals. Human interpersonal behaviors have long been studied in linguistic, communication, sociology and psychology. The recent advances in machine learning, pattern recognition and signal processing enabled a new generation of computational tools to analyze, recognize and predict human communication behaviors during social interactions. This new research direction have broad applicability, including the improvement of human behavior recognition, the synthesis of natural animations for robots and virtual humans, the development of intelligent tutoring systems, and the diagnoses of social disorders (e.g., autism spectrum disorder).In this talk, I will present some of our recent work modeling multiple aspects of human communication dynamics, including behavioral dynamic, multimodal dynamic and interpersonal dynamic. I will describe the different computational models specifically designed model these dynamics, including the Latent-Dynamic Conditional Random Fields, Multi-view Hidden Conditional Random Fields and the Latent Mixture of Discriminative Experts. I will show how these technologies can be applied to real-world problems such as negotiation outcome prediction, YouTube opinion mining, group learning analytics and psychological distress indicators. Finally, I will summarize our recent progress in integrating these sensing technologies with a virtual human for healthcare application. Bio: Louis-Philippe Morency is a Research Assistant Professor in the Department of Computer Science at the University of Southern California (USC) and Research Scientist at the USC Institute for Creative Technologies where he leads the Multimodal Communication and Machine Learning Laboratory (MultiComp Lab). He received his Ph.D. and Master degrees from MIT Computer Science and Artificial Intelligence Laboratory. His research interests are in computational study of nonverbal social communication, a multi-disciplinary research topic that overlays the fields of multimodal interaction, computer vision, machine learning, social psychology and artificial intelligence. Dr. Morency was selected in 2008 by IEEE Intelligent Systems as one of the Ten to Watch for the future of AI research. He received 6 best paper awards in multiple ACM- and IEEE-sponsored conferences for his work on context-based gesture recognition, multimodal probabilistic fusion and computational modeling of human communication dynamics. His work was reported in The Economist, New Scientist and Fast Company magazines.
08 Feb 2013	Kartik Audhkhasi	A Computational Framework for Ensembles of Diverse Experts Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Ensembles of machine experts, from simple linear classifiers to complex hidden Markov models, have out-performed single experts across many applications. Likewise, ensembles have been central to computing with human experts, e.g. for data annotation. This widespread use of ensembles, albeit largely heuristic, is motivated by their better generalization and robustness to ambiguity in the production, representation, and processing of information.This talk will focus on three important problems which contribute towards a unified computational framework for ensembles of diverse experts. The first problem deals with "modeling" a diverse ensemble. I will present our proposed Globally-Variant Locally-Constant (GVLC) model as a statistical framework for answering this question. The second question is about "analysis", where I will address the link between ensemble diversity and performance using statistical learning theory. The final segment of my talk will focus on "designing" an ensemble of diverse linear classifiers, specifically conditional maximum entropy models. Practical applications throughout the talk will include emotion classification from speech, text classification, and crowd-sourcing for automatic speech recognition. Speaker bio: Kartik Audhkhasi received B.Tech. in Electrical Engineering and M.Tech. in Information and Communication Technology from Indian Institute of Technology, Delhi in 2008. He is currently pursuing the Ph.D. degree in Electrical Engineering from University of Southern California, Los Angeles. His thesis research focuses on modeling, analysis, and design of ensembles of multiple human or machine experts. He is also interested in crowd-sourcing for speech and language processing. His broad interests include machine learning and signal processing. Kartik is the recipient of the Annenberg, IBM, and Ming Hsieh Institute PhD fellowships, and best teaching assistant awards of the EE department at USC.
01 Feb 2013	Abeer Alwan	Dealing with Limited and Noisy Data in Speech Processing: A Hybrid Knowledge-Based and Statistical Approach Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: In this talk, I will focus on the importance of integrating knowledge of human speech production and speech perception mechanisms, and language-specific information with statistically-based, data-driven approaches to develop robust and scalable speech processing algorithms. The need for such hybrid systems is especially critical when dealing with data corrupted by background acoustic noise, when training data are limited, and when dealing with accents.
25 Jan 2013	Daniel Marcu	The Things I Learned While Doing Research in the Commercial World Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: When asked, as a PhD student, what I wanted to do when I grow up, I had one and only one answer: academic-oriented, natural language processing research. During the last decade, I have learned though to also appreciate the research opportunities in the commercial world. In this talk, I will compare several academic and commercial research models and ground the comparison in examples derived from my own experience while working as a researcher for USC, Language Weaver, and SDL
24 Jan 2013	Shrikanth Narayanan	Behavioral Signal Processing: Deriving Human Behavioral Informatics from Multimodal Signals Time: 3:00 pm - 4:00 pm Location: 6th Floor Conference Room [689] Abstract: Human behavior is exceedingly complex. Its expression and experience are inherently multimodal, and are characterized by individual and contextual heterogeneity. The confluence of sensing, communication and computing is however allowing access to data, in diverse forms and modalities, that is enabling us understand and model human behavior in ways that were unimaginable even a few years ago. No domain exemplifies these opportunities more than that related to human health and wellbeing. Consider for example the domain of Autism where crucial diagnostic information comes from manually-analyzed audiovisual data of verbal and nonverbal behavior. Behavioral signal processing advances can enable not only new possibilities for gathering data in a variety of settings--from laboratory and clinics to free living conditions--but in offering computational models to advance evidence-driven theory and practice.This talk will describe our ongoing efforts on Behavioral Signal Processing (BSP)--technology and algorithms for quantitatively and objectively understanding typical, atypical and distressed human behavior--with a specific focus on communicative, affective and social behavior. Using examples drawn from different application domains, the talk will also illustrate Behavioral Informatics applications of these processing techniques that contribute to quantifying higher-level, often subjectively described, human behavior in a domain-sensitive fashion.[Work supported by NIH, NSF, DARPA, and ONR].Biography of the Speaker:Shrikanth (Shri) Narayanan is Andrew J. Viterbi Professor of Engineering at USC, where he is Professor of Electrical Engineering, and, jointly in, Computer Science, Linguistics and Psychology. Prior to USC he was with AT&T Bell Labs and AT&T Research. His research focuses on human-centered information processing and communication technologies. He is a Fellow of the Acoustical Society of America, IEEE, and the American Association for the Advancement of Science (AAAS). Shri Narayanan is an Editor for the Computer, Speech and Language Journal and an Associate Editor for the IEEE Transactions on Multimedia, the IEEE Transactions on Affective Computing and the Journal of Acoustical Society of America having previously served an Associate Editor for the IEEE Transactions of Speech and Audio Processing (2000-2004) and the IEEE Signal Processing Magazine (2005-2008). He is a recipient of several honors including the 2005 and 2009 Best Paper awards from the IEEE Signal Processing Society and serving as its Distinguished Lecturer for 2010-11. With his students, he has received a number of best paper awards including winning the Interspeech Challenges in 2009 (Emotion classification), 2011 (Speaker state classification) and in 2012 (Speaker trait classification). He has published over 500 papers and has 13 U.S. patents.
11 Jan 2013	Abe Kazemzadeh	Natural Language Description of Emotion (Ph.D. Thesis Defense Practice Talk) Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: This dissertation studies how people describe emotions with languageand how computers can simulate this descriptive behavior. Althoughmany non-human animals can express their current emotions as socialsignals, only humans can communicate about emotions symbolically.This symbolic communication of emotion allows us to talk aboutemotions that we may not currently be feeling, for example describingemotions that occurred in the past, gossiping about the emotions ofothers, and reasoning about emotions hypothetically. Another feature of thisdescriptive behavior is that we talk about emotions as if they werediscrete entities, even though we may not always have necessary andsufficient observational cues to distinguish one emotion from another,or even to say what is and is not an emotion. This motivates us tofocus on aspects of meaning that are learned primarily throughlanguage interaction rather than by observations through the senses.To capture these intuitions about how people describe emotions, wepropose the following thesis: natural language descriptions of emotionare definite descriptions that refer to intersubjective theoretical entities.We support our thesis using theoretical, experimental, computationalresults. The theoretical arguments use Russell's notion of definitedescriptions, Carnap's notion of theoretical entities, and thequestion-asking period in child language acquisition. The experimentaldata we collected include dialogs between humans and computers andweb-based surveys, both using crowd-sourcing on Amazon MechanicalTurk. The computational models include a dialog agent based onsequential Bayesian belief update within a generalized pushdown automaton,as well as a fuzzy logic model of similarity and subsethood between emotion terms.For future work, we propose a research agenda that includes acontinuation of work on the emotion domain as well as new work onother domains where subjective descriptions are established throughnatural language communication.Short Bio: Abe Kazemzadeh is a PhD candidate at the USC Computer Science Dept anda research assistant at the the Signal Analysis and InterpretationLaboratory (SAIL). His interests include natural language, logic,emotions, games, and algebra. He is currently the chief technologyofficer at the USC Annenberg Innovation Laboratory (AIL).
14 Dec 2012	Ulf Hermjakob	Launching Semantics-Based Machine Translation Time: 3:00 pm - 4:00 pm Location: 11th Floor Conference Room [1135] Abstract: I will present work defining an Abstract Meaning Represention (AMR)(joint work with Kevin Knight et al.) that serves as an intermediatesemantic structure when translating between languages such as Chineseand English as well as automatic and manual annotation efforts tobuild corpora of AMRs.I will give a demo of our web-based AMR Editor, which is used by dozensof annotators at LDC, SDL/LanguageWeaver (Cluj) and other places.Finally, I will give an overview of our initial end-to-end prototype,with rule extraction (own work), decoding from source language to AMR(work by Yinggong Zhao) and AMR to target language generation (Yang Gao).
07 Dec 2012	Shu Cai	Smatch: an Evaluation Metric for Semantic Feature Structures Time: 3:00 pm - 4:00 pm Location: 11th Floor Conference Room [1135] Abstract: Feature structures are useful for capturing logical semantic relationships. In this talk, we present smatch, a metric that determines semantic overlap between two semantic feature structures. We give an ef.cient algorithm to compute the metric, and we show the results of an inter-annotator agreement study.
16 Nov 2012	Jerry Hobbs	Abduction and Metaphor Time: 3:00 pm - 4:00 pm Location: 11th Floor Conference Room [1135] Abstract: I will talk about recent progress in implementing an efficient method for doing a type of inferencing called abduction, or inference to the best explanation. I will illustrate its wide applicability to a variety of language interpretation problems. I'll describe our recent work on implementing ontologies, or logical theories of commonsense domains. Then I will show how we are applying all this to the interpretation of metaphors.
09 Nov 2012	Ashish Vaswani and David Chiang	Neural Networks for NLP Time: 3:00 pm - 4:00 pm Location: 11th Floor Conference Room [1135] Abstract: Recent years have seen a resurgence of Neural Networks in Natural Language Processing. Much of this success can be attributed to learning compact representations (or embeddings) of words, which are used as input to train standard Neural Network architectures. In the first part of the talk I will describe two approaches for learning word embeddings for large vocabularies. In the second part, I will talk about successful applications of Neural Networks in NLP tasks like Part-Of-Speech tagging, Chunking, Parsing etc. without any feature engineering. I will also describe some preliminary work on Neural Networks for unsupervised Part-Of-Speech tagging.
07 Nov 2012	Ashish Vaswani and David Chiang	Neural Networks for NLP Time: 3:00 pm - 4:00 pm Location: 11th Floor Conference Room [1135] Abstract: Recent years have seen a resurgence of Neural Networks in Natural Language Processing. Much of this success can be attributed to learning compact representations (or embeddings) of words, which are used as input to train standard Neural Network architectures. In the first part of the talk I will describe two approaches for learning word embeddings for large vocabularies. In the second part, I will talk about successful applications of Neural Networks in NLP tasks like Part-Of-Speech tagging, Chunking, Parsing etc. without any feature engineering. I will also describe some preliminary work on Neural Networks for unsupervised Part-Of-Speech tagging.
02 Nov 2012	Christian Chiarcos	Linguistic Linked Open Data. Linking Corpora Time: 3:00 pm - 4:00 pm Location: 11th Floor Conference Room [1135] Abstract: In the last 15 years, the interoperability of language resources hasbeen recognized as a major problem in the development of NLPinfrastructures -- partly due to an increased focus on novel,underresourced languages and efforts to bootstrap language resourcesby annotation projection -- partly due to the increased interest inmore abstract levels of linguistic analysis beyond morphosyntax andsyntax, namely semantics, reference and discourse.This talk describes the application of Semantic Web formalisms, RDF,OWL/DL and SPARQL, to facilitate the interoperability of linguisticcorpora and linguistic annotations. Interoperability of linguisticcorpora involves two aspects: Structural interoperability (annotationsof different origin are represented using the same formalism) andconceptual interoperability (annotations of different origin arelinked to a common vocabulary). I will describe ontology-basedapproaches for both aspects, the POWLA ontology that defines a datamodel for annotated corpora, and the Ontologies of LinguisticAnnotation (OLiA) that provide definitions for linguistic categoriesand properties (Chiarcos 2012). As compared to state-of-the-artapproaches based on standoff XML, e.g., the recently published ISOstandard for an Linguistic Annotation Framework, key advantages ofthis approach include the existence of a rich technological ecosystemdeveloped around RDF and OWL, including standardized query languagesfor directed acyclic (multi-) graphs (SPARQL), APIs, data baseimplementations, as well as the availability of OWL reasoners that canbe applied to validate the consistency of linguistic corpora and theirannotations and to infer additional information that is relevant, forexample, for their appropriate visualization.Naturally, representing corpora in OWL and RDF also allows tointerlink resources freely, e.g., different annotation layers of amulti-layer corpus, translated texts in parallel corpora, orlinguistic corpora and lexical-semantic resources. Modeled in thisway, corpora can be fully integrated in a Linked Open Data (sub-)cloudof linguistic resources, along with lexical-semantic resources andknowledge bases of information about languages and linguisticterminology. The second part of my talk will introduce recent effortsto create a Linked Open Data sub-cloud of linguistic resources, theLinguistic Linked Open Data cloud (Chiarcos et al. 2012, cf.http://linguistics.okfn.org).ReferencesChristian Chiarcos, Sebastian Hellmann, Sebastian Nordhoff, et al.(2012), The Open Linguistics Working Group, Proceedings of the 8thInternational Conference on Language Resources and Evaluation(LREC-2012). Istanbul, Turkey, May 2012.[http://www.lrec-conf.org/proceedings/lrec2012/pdf/912_Paper.pdf]Christian Chiarcos (2012), Interoperability of Corpora andAnnotations, In: Christian Chiarcos, Sebastian Nordhoff, and SebastianHellmann (eds.) Linked Data in Linguistics. Representing andConnecting Language Data and Language Metadata. Springer, Heidelberg.[http://www.springer.com/computer/ai/book/978-3-642-28248-5]Bio Christian Chiarcos studied Computer Science and General Linguistics atthe Technical University Berlin, Germany, and received his PhD inComputational Linguistics from the University of Potsdam, Germany in2010. He is currently affiliated with the University of Frankfurt/M.,Germany. Since April 2012, he is visiting scholar at the ISI. Hisprimary areas of expertese include the study and modeling of discoursesemantics, as well as the development of infrastructures for rich andheterogeneous linguistic annotations.
31 Oct 2012	Marcello Federico (FBK Trento, Italy), Marco Trombetti (Translated srl, Rome - Italy)	Towards the integration of human and machine translation Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: We will given an overview of the challenges and early results of an EC-funded project, named MateCat,whose goal is developing an enhanced web-based CAT tool integrating new MT functionalities. In particular,MateCat will investigate the integration of MT into the CAT working process along three main directions:self-tuning MT, user adaptive MT, and informative MT. In this seminar, we will report on recent activitiesconcerning domain and on-line MT adaptation and will introduce the first version of the MateCat tool,that will be officially released in open source by the end of the year.
29 Oct 2012	Douglas W. Oard, University of Maryland	Evaluating E-Discovery Search: The TREC Legal Track Time: 2:00 pm - 3:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Civil litigation in this country relies on each side making relevant evidence available to the other, a process known as "discovery." The explosive growth of information in digital form has led to an increasing focus on how search technology can best be applied to balance costs and responsiveness in what has come to be known as "e-discovery".This is now a multi-billion dollar business, one in which new vendors are entering the market frequently, usually with impressive claims about the efficacy of their products or services. Courts, attorneys, and companies are actively looking to understand what should constitute best practice, both in the design of search technology and in how that technology is employed. In this talk I will provide an overview of the e-discovery process, and then I will use that background to motivate a discussion of which aspects of that process the TREC Legal Track is seeking to model. I will then spend most of the talk describing two novel aspects of evaluation design: (1) recall-focused evaluation in large collections, and (2) modeling an interactive process for "responsive review" with fairly high fidelity. Although I will draw on the results of participating teams to illustrate what we have learned, my principal focus will be on discussing what we presently understand to be the strengths and weaknesses of our evaluation designs.About the Speaker: Douglas Oard is a Professor at the University of Maryland, College Park, with joint appointments in the College of Information Studies and the Institute for Advanced Computer Studies, where he currently serves as director of the Computational Linguistics and Information Processing lab. Dr. Oard earned his Ph.D. in Electrical Engineering from the University of Maryland. His research interests center around the use of emerging technologies to support information seeking by end users. His recent work has focused on interactive techniques for cross-language information retrieval, searching conversational media such as speech and email, evaluation design for e-discovery in the TREC Legal Track, and support for sense-making in large digital archival collections. Additional information is available at http://terpconnect.umd.edu/~oard/.
26 Oct 2012	Philipp Koehn	Computer Aided Translation Time: 3:00 am - 4:00 pm Location: 10th Floor Conference Room [1026] Abstract: Despite all the recent successes of machine translation, when itcomes to high quality publishable translation, human translatorsare still unchallenged. Since we can't beat them, can we helpthem to become more productive? I will talk about some recentwork on developing assistance tools for human translators.You can also check out a prototype at http://www.caitra.org/
19 Oct 2012	Marc Schulder	Metaphor Detection through Term Frequency Time: 3:00 am - 4:00 pm Location: 11th Floor Conference Room [1135] Abstract: Metaphors are used to replace complicated or unfamiliar ideas with familiar, yet unrelated concepts that share an important attribute with the intended idea. The result is a conceptual mapping between metaphoric source and literal target meaning.Computational metaphor processing is divided into detection and interpretation.To detect metaphors, most existing approaches attempt to identify these conceptual mappings. They require resources for the source (metaphor) as well as the target domain, and a set of defined mappings between the two. Creating these resources is expensive and limits the scope of these systemsThey are also usually restricted to well-observed, conventionalized metaphors, and can not deal with neologisms. Since metaphors are a productive area of language, this is a major shortfall.We propose a statistical approach to metaphor detection that utilizes the uncommonness of novel metaphors. Words that do not match a text's typical vocabulary are highlighted as metaphor candidates. No knowledge of semantic concepts or the metaphor's source domain is required for this. We analyze the performance of this approach as an unsupervised standalone classifier and as a feature in a supervised graphical model.
12 Oct 2012	Jagadeesh Jagarlamudi	Discriminative Interlingual Representations for NLP Time: 11:00 am - 12:00 pm Location: 11th Floor Conference Room [1135] Abstract: The language barrier in many of the multilingual natural language processing (NLP) tasks, such as name transliteration, mining bilingual word translations, etc., can be overcome by mapping objects (names and words in the respective tasks) from different languages (or ?iews? into a common low-dimensional subspace. Multi-view models learn such a low-dimensional subspace using a training corpus of paired objects, e.g. name pairs written in different languages.The central idea of my dissertation is to learn low-dimensional subspaces (or interlingual representations) that are effective for various multilingual and monolingual NLP tasks. First, I demonstrate the effectiveness of interlingual representations in mining bilingual word translations for machine translation, and then proceed to developing models for diverse situations that often arise in NLP tasks. In particular, I design models for 1) bridge setting -- when there are more than two views but we only have training data from a single pivot view into each of the remaining views 2) reranking setting -- when an object from one view is associated with a ranked list of objects from another view, and finally 3) when the underlying objects have rich structure, such as a tree. These problem settings arise frequently in real world applications. I choose a canonical task for each of the settings and compare my model with existing state-of-the-art baseline systems. I provide empirical evidence for the first two models on multilingual name transliteration and the part-of-speech tagging tasks, respectively. For the third problem setting, I discuss my ongoing work on vector based compositionality learning task. This task aims to find the meaning, represented as a vector in d-dimensional space, of a sentence or a phrase based on the meaning of its constituent words.
10 Oct 2012	Victoria Fossum	Sequential vs. hierarchical syntactic models of human sentence processing Time: 2:00 pm - 3:00 pm Location: 6th Floor Conference Room [689] Abstract: Human incremental sentence processing is the process by which we reada sentence, word-by-word, and ultimately comprehend its meaning. Acentral question in sentence processing research is to understand theprecise nature of the linguistic representations that we constructwhile comprehending a sentence. Experimental evidence demonstratesthat syntactic structure plays a role in these representations. Butopen questions remain about the type of syntactic structure that ismost relevant to the human sentence processing mechanism: is thissyntactic structure sequential or hierarchical? Does it includelexical information (in which case it is "lexicalized"), or is lexicalinformation processed independently from the syntactic structure (inwhich case the syntactic structure is "unlexicalized")?A previous study (Frank and Bod, 2011) compared unlexicalizedsequential and hierarchical models of human sentence processing, andfound that sequential models explain observed human behavior (e.g. eyemovements) during sentence processing better than hierarchical models.The authors concluded that the human sentence processing mechanism isinsensitive to hierarchical syntactic structure.We investigate this claim, and find a picture that is more complicatedthan the one presented by the previous study. First, we show thatlexicalized syntactic models explain observed human behavior duringsentence processing better than unlexicalized syntactic models.Second, we consider a broader set of sequential and hierarchicalmodels, and show that the findings of (Frank and Bod, 2011) do notgeneralize to this broader set. Finally, we show why, even within theset of models considered by (Frank and Bod, 2011), their findings arenot entirely conclusive. Our results indicate that the claim that thehuman sentence processing mechanism is insensitive to hierarchicalsyntactic structure is premature.
05 Oct 2012	Dirk Hovy	Learning Whom to Trust with MACE Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract:
06 Jul 2012	Stephan Gouws (Stellenbosch University)	Projecting features across domains using deep learning Time: 3:00 pm - 4:00 pm Location: 11th Floor Conference Room [1135] Abstract: Over the last few years, neural network-based deep-learning models achieved good results in various NLP tasks, such as language modelling, POS tagging, parsing, chunking, and NER. In contrast to discrete models like HMMs, neural models operate by jointly learning continuous input representations (embeddings), and the model to interpret them. These embeddings represent words and/or phrases in a lower-dimensional, latent, syntactic-semantic space and can often be learned in an unsupervised manner.We aim to exploit this property of deep learning to transfer knowledge from resource-rich to resource-poor domains. We facilitate the transfer of knowledge by constraining the learned embeddings of both domains to share as much structural similarity as possible. I will discuss preliminary results for noisy text normalization in Twitter, where the task is to transfer the correct clean words from English to the noisy Twitter domain, and review the main deep learning models for NLP (Bengio et al. (2003, Mnih and Hinton (2007), Collobert and Weston (2008), Mikolov et al. (2010), and Socher et al. (2011)). Bio: Stephan Gouws is a PhD student at Stellenbosch University in South Africa. He is currently on a short-term visit at the ISI. His main research focus is on developing robust, semi-supervised techniques for processing language in and across noisy domains. In 2011 he was also on a 6-month visit to the ISI during which he worked on orthographic normalization of non-standard Twitter text.
03 Jul 2012	Ashish Vaswani	Smaller Alignment Models for Better Translations: Unsupervised Word Alignment with the l0-norm Time: 3:00 pm - 4:00 pm Location: 4th Floor Conference Room Abstract: Two decades after their invention, the IBM word-based translation models, widely available in the GIZA++ toolkit, remain the dominant approach to word alignment and an integral part of many statistical translation systems. Although many models have surpassed them in accuracy, none have supplanted them in practice.We propose a simple extension to the IBM models: an l0 prior to encourage sparsity in the word-to-word translation model. This extension has been implemented in GIZA++ and scales to large-scale data . We achieve significant improvements over IBM Model 4 in both word alignment and translation quality.This is a practice talk for ACL. Bio: Ashish Vaswani is a PhD student at ISI.
29 Jun 2012	Bevan Jones	Semantic Parsing with Bayesian Tree Transducers Time: 3:00 pm - 4:00 pm Location: 4th Floor Conference Room Abstract: Many semantic parsing models use tree transformations to map between natural language and meaning representation. However, while tree transformations are central to several state-of-the-art approaches, little use has been made of the literature on tree automata, which could both clarify the relationships between different approaches and increase the generality of new contributions. We attempt to clarify the relationship by presenting a tree transducer model that is closely related to previous work made without appealing to automata theory. We then describe a variational Bayesian inference algorithm that is applicable to a wide class of tree transducers, producing state-of-the-art semantic parsing results when coupled with our model while remaining applicable to any domain employing probabilistic tree transducers (not just semantic parsing).This is joint work with Mark Johnson and Sharon Goldwater to be presented at this year’s ACL Bio: I research computational models of language acquisition, exploring questions of how linguistic structure and meaning might interact during learning. For instance, I have worked on Bayesian models of unsupervised word segmentation, exploring how simultaneous word meaning acquisition influences the identification of lexical boundaries. Currently, I work on semantic parsing, using a combination of Bayesian techniques and automata theory to model more complex structural relationships between compositional meaning and syntactic structure. My PhD began at the department of Cognitive, Linguistic and Psychological Sciences at Brown University but has since moved to the School of Informatics at the University of Edinburgh and the Computing Department of Macquarie University.
22 Jun 2012	Vita Markman (Disney Interactive)	Discovering Latent Similarities in Car Models Based On Customer Reviews: Towards a Consumer-Driven Product Recommendation System Time: 3:00 pm - 4:00 pm Location: 11th Floor Conference Room [1135] Abstract: This pilot study explores the hypothesis that customer reviews of cars can be used to create and/or fine tune a recommendation system that offers a list of ranked top-N matches for a given vehicle. Our main premise is that positive or negative reviews invariably focus on the features relevant to the car being reviewed and hence can be used to uncover subtle similarities among various car models, as well as discover macro-types of cars (e.g. family cars, luxury, high performance sports etc). To discover similar models based on reviews we propose a Weighted Dice Coefficient which weighs each shared or non-shared word token by its tf-idf score. Closest top five cars are then discovered for each of the 226 reviewed car models. We also show that integrating tf-idf scores into the similarity metric improves the accuracy of the top five picks, as compared to the standard Dice Coefficient. Bio: I graduated from Rutgers in 2005 with a PhD in Linguistics. Having taught linguistics at Pomona College and Simon Fraser University between 2006 and 2008, I moved into industry in 2008. I currently work as a Computational Linguist at Disney Interactive Media Group. My work primarily concerns developing natural language processing techniques to ensure that the content of Disney's online chat is safe for kids. My work involves developing various NLP methods that filter online chat for inappropriate content, while taking into account the vast informality, sparsity, and noise of the on-line child chat language. In addition, I conduct independent research on Twitter data, specifically clustering one-line micro-tweets by topic. My additional research includes mining online car reviews to identify common car-types based on the features people rate as positive or negative.
25 May 2012	Liang Huang	Structured Perceptron with Inexact Search (NAACL HLT Practice Talk) Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Most existing theory of structured prediction assumes exact inference, which is often intractable in many practical problems. This leads to the routine use of approximate inference such as beam search but there is not much theory behind it. Based on the structured perceptron, we propose a general framework of "violation-fixing" perceptrons for inexact search with a theoretical guarantee for convergence under new separability conditions. This framework subsumes and justifies the popular heuristic "early-update" for perceptron with beam search (Collins and Roark, 2004). We also propose several new update methods within this framework, among which the "max-violation" method dramatically reduces training time (by 3 fold as compared to early-update) on state-of-the-art part-of-speech tagging and incremental parsing systems.
18 May 2012	Dirk Hovy	Exploiting Partial Annotations with EM Training (NAACL HLT Practice Talk) Time: 3:30 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: For many NLP tasks, EM-trained HMMs are the common models. However, in order to escape local maxima and find the best model, we need to start with a good initial model. Researchers suggested repeated random restarts or constraints that guide the model evolution. Neither approach is ideal. Restarts are time-intensive, and most constraint-based approaches require serious re-engineering or external solvers. In this paper we measure the effectiveness of very limited initial constraints: specifically, annotations of a small number of words in the training data. We vary the amount and distribution of initial partial annotations, and compare the results to unsupervised and supervised approaches. We find that partial annotations improve accuracy and reduce the need for random restarts, which speeds up training time considerably.
18 May 2012	Jason Riesa	Automatic Parallel Fragment Extraction From Noisy Data (NAACL HLT Practice Talk) Time: 3:00 pm - 3:30 pm Location: 11th Floor Large Conference Room [1135] Abstract: We present a novel method to detect parallel fragments within noisy parallel corpora. Isolating these parallel fragments from the noisy data in which they are contained frees us from noisy alignments and stray links that can severely constrain translation-rule extraction. We do this with existing machinery, making use of an existing word alignment model for this task. We evaluate the quality and utility of the extracted data on large-scale Chinese-English and Arabic-English translation tasks and show significant improvements over a state-of-the-art baseline.
03 May 2012	Dirk Hovy	Using Syntactic Information for Unsupervised Relation Extraction and Typing (Thesis Proposal Practice Talk) Time: 4:00 pm - 5:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Question Answering (QA) is a longstanding goal in Natural Language Processing (NLP). In its simplest form, QA relies on keyword matching to find single-word answers (e.g., search engines).But single words taken out of context are ambiguous -- only context disambiguates them. This meaningful context comes in the form of syntactic and/or semantic relations between predicates and arguments. Relations are thus at the core of meaning and information. Systems like Siri or Watson have put QA in more widespread use, and users move away from single-word questions to more complex ones. Finding and classifying relations to answer those questions will thus become the central challenge for future QA systems.The large number of relations makes relation extraction challenging; given a sentence, many possible relations can be extracted. If we can specify the relations we are interested in beforehand, we can annotate data to train supervised systems. Often though, definition beforehand is impossible, and we have to find all possible relations that hold in a text. In those cases, we must rely on unsupervised approaches.A second problem is rapid adaptation to new domains and topics. Relations extracted from one domain may not be relevant to another.A third problem is variation in the ways relations are expressed in text. Often, intervening words and phrases between predicates and arguments cause fixed-window pattern matching approaches to fail.Most previous relation extraction approaches have either relied on annotated data or (semi-) structured sources of information. These approaches require pre-defined relations and manually annotated data. Furthermore, many of these approaches rely on pattern matching over surface strings, which is not robust to variations. If previous approaches used unsupervised training methods, they largely focused on clustering, effectively ignoring sequential structure in the data.The future of QA will require us to quickly adapt to new domains and topics with little annotated data. Only if we can discover and disambiguate relations automatically can we build systems capable of open-ended QA.I present several techniques for discovering relations from text. I show how to use unsupervised sequential models to discover relations from raw text. These methods do not require any existing resources, manual annotation, or pre-defined relations, and can be applied to any domain. I use dependency parse structures as inputs to these methods, making these approaches more robust to surface variations. I show improvements over state-of-the-art systems as well as novel approaches to fully exploit the structure contained in the data.
27 Apr 2012	Christian Chiarcos (Uni Potsdam)	Towards operationalizable models of discourse phenomena: Addressing discourse relations Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: The modeling of discourse has been a major topic of research in the linguistics and AI communities for decades. With respect to language, discourse phenomena refer to the use of linguistic indicators that reflect the functional organization of utterances, relationships between different utterances, with the interlocutors' state of mind and with the situational surrounding.The development of models of discourse that are operationalizable (as a part of NLP applications) is essential, for example, in machine translation:* to interpret, to translate and to generate pronouns, definite and indefinite NPs correctly,* to translate non-canonical constructions (e.g., passive),* to generate the correct word order (e.g., when translating into a free-word order language),* to insert or to drop discourse markers and conjunctions, or* to choose the appropriate type of syntactic embedding in complex sentences.In other branches of NLP, different aspects of discourse are important, e.g., relations between utterances (machine reading), the hierarchical organization of discourse (text summarization) and the sequential organization of utterances in a text (text structuring/natural language generation).Numerous models of different aspects of discourse have been proposed, including discourse structure (the hierarchical organization of utterances in discourse), discourse relations (relations between independent utterances in discourse), information structure (the functional structure of utterances in context), and information status (accessibility of antecedents of pronouns, definite descriptions and elliptic constructions). These approaches range from relatively abstract models from cognitive and functional linguistics (e.g., Givon 1983), over elaborate formal models developed in formal semantics (e.g., Asher 1993), to "parameterized", rule-based models in AI (e.g., Grosz et al. 1995).Since the mid-1990s, this traditional, "theory-centered" line of research has been complemented with an "annotation-centered" methodology, i.e., the development and the use of annotated corpora to test predictions and to develop statistical classifiers. In the first part of the talk, I describe selected activities of the applied computational linguistics group at the University of Potsdam/Germany in this direction, which include* the annotation of discourse structure, coreference, information structure and information status (Stede 2004, Krasavina and Chiarcos 2007, Ritz et al. 2008)* the development of generic multi-layer architectures capable to represent and to access these annotations along with other types of annotation applied to the same stretch of data (Chiarcos et al. 2008), e.g., annotations for constituent syntax, dependency syntax, or frame semantics, and* the application of machine learning techniques to predict discourse features from less abstract annotation layers (Ritz 2007, Chiarcos 2011).The primary drawback of annotation-centered models are the immense cognitive (and thus, financial) efforts necessary to produce reliable discourse annotations. One way to address this problem is to make use of corpora without discourse annotations to test predictions of candidate models, and to develop unsupervised or weakly supervised approaches to support or to replace manual annotation.In the second part of my talk, this "data-centered" approach on discourse will be illustrated for the example of discourse relations, one of the main topics of my work at ISI. I describe a pilot study that shows that significant, reproducible and interpretable insights about the discourse relation (that is likely to be) connecting a pair of events can be achieved from a sufficiently large corpus with syntax annotations only. Further, possible lines for subsequent research will be sketched.Nicholas Asher (1993). Reference to Abstract Objects in Discourse. Kluwer, Dordrecht, 1993.Christian Chiarcos (2011). Evaluating salience metrics for the context-adequate realization of discourse referents. In: Proceedings of the 13th European Workshop on Natural Language Generation (ENLG 2011). Association of Computational Linguistics, Nancy, France, Sep 2011, 32-43.Christian Chiarcos, Stefanie Dipper, Michael Gotze, Ulf Leser, Anke Lüdeling, Julia Ritz, and Manfred Stede (2008). A Flexible Framework for Integrating Annotations from Different Tools and Tagsets. TAL (Traitement automatique des langues) 49 (2): 218-248.Talmy Givon (ed., 1983). Topic Continuity in Discourse: A Quantitative Cross-Language Study. John Benjamins, Amsterdam and Philadelphia.Barbara J. Grosz, Aravind K. Joshi, and Scott Weinstein (1995). Centering: A framework for modelling the local coherence of discourse. Computational Linguistics, 21(2):203–225.Olga Krasavina and Christian Chiarcos (2007). PoCoS - Potsdam Coreference Scheme. In Proceedings of the Linguistic Annotation Workshop. Held in Conjunction with the ACL-2007, Prague, Czech Republic, pages 156–163.Julia Ritz, Svetlana Petrova, Michael Götze, and Stefanie Dipper (2007). Automatic Identification of Information Structure in Small Corpora of Modern and Old High German. GLDV-Fruhjahrstagung 2007, Tubingen, Germany.Julia Ritz, Stefanie Dipper, und Michael Götze (2008). Annotation of Information Structure: An Evaluation Across Different Types of Texts. In Proceedings of the the 6th LREC conference. Marrakech, Morocco.Manfred Stede (2004). The Potsdam Commentary Corpus. In Bonnie Webber and Donna K. Byron, editors, Proceedings of the ACL-2004 Workshop on Discourse Annotation, Barcelona, pages 96–102.Biography:Christian Chiarcos, born 1977, studied Computer Science (MSc, 2002) and General Linguistics (MA, 2004) at the Technical University Berlin, Germany. From 2002 to 2003, he received a scholarship in the context of the project "Collocations in Dictionary" at the Berlin-Brandenburg Academy of Science under the auspicion of Christiane Fellbaum (Princeton). From 2003 to 2005, he participated in the graduate school "Economy and Complexity in Language" at the Humboldt-Unversity at Berlin and the University of Potsdam, Germany, where he developed a corpus-based approach to predict syntactic alternations for Natural Language Generation. This research formed the basis for his PhD thesis "Mental Salience and Grammatical Form" (University of Potsdam, 2010). Since 2006, he worked in the Applied Computational Linguistics group at the University of Potsdam, Germany, where he participated in different research projects dedicated to the development of interoperable infrastructures for NLP and multi-layer corpora. Since 2007, this research was carried out in the context of the Collaborative Research Center "Information Structure", a multidisciplinary network of projects at the University of Potsdam and the Humboldt-University Berlin, dedicated to the study of discourse phenomena.
16 Mar 2012	Jason Riesa	Syntactic Alignment Models for Large-Scale Translation (PhD Defense Practice Talk) Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Word alignment, the process of inferring the implicit links between words across two languages, serves as an integral piece of the puzzle of learning linguistic translation knowledge. It enables us to acquire automatically from data the rules that govern the transformation of words, phrases, and syntactic structures from one language to another. Word alignment is used in many tasks in Natural Language Processing, such as bilingual dictionary induction, cross-lingual information retrieval, and distilling parallel text from within noisy data. In this talk, we focus on word alignment for statistical machine translation. We advance the state-of-the-art in search, modeling, and learning of alignments and show empirically that, when taken together, these contributions significantly improve the output quality of large-scale statistical machine translation, outperforming existing methods. The work we describe may be used for any language-pair, supporting arbitrary and overlapping features from varied sources.
17 Feb 2012	Adam Pauls (UC Berkeley)	Large Scale Syntactic Language Modeling with Treelets Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: We propose a simple generative syntactic language model that conditions on overlapping tree contexts in the same way that n-gram language models condition on overlapping sentence context. We estimate the parameters of our model by collecting counts from automatically parsed text using standard n-gram language model estimation techniques, allowing us to train a model on over one billion tokens of data using a single machine in a mater of hours. We evaluate on a range of grammaticality tasks, and find that we consistently outperform n-gram models and other generative baselines, and even compete with state-of-the-art discriminative models hand-designed for each task, despite training on positive data alone. We also show some improvements in preliminary machine translation experiments.
10 Feb 2012	Liang Huang	Efficient Search and Learning for Language Understanding and Translation Time: 3:00 pm - 4:00 pm Location: 6th Floor Large Conference Room [689] Abstract: What is in common between translating from English into Chinese andcompiling C++ into machine code? And yet what are the differences thatmake the former so much harder for computers? How can computers learnfrom human translators?This talk sketches an efficient (linear-time) "understanding +rewriting" paradigm for machine translation inspired by both humantranslators as well as compilers. In this paradigm, a source languagesentence is first parsed into a syntactic tree, which is thenrecursively converted into a target language sentence viatree-to-string rewriting rules. In both "understanding" and"rewriting" stages, this paradigm closely resembles the efficiency andincrementality of both human processing and compiling. We will discussthese two stages in turn.First, for the "understanding" part, we present a linear-timeapproximate dynamic programming algorithm for incremental parsing thatis as accurate as those much slower (cubic-time) chart parsers, whilebeing as fast as those fast but lossy greedy parsers, thus getting theadvantages of both worlds for the first time, achievingstate-of-the-art speed and accuracy. But how do we efficiently learnsuch a parsing model with approximate inference from huge amounts ofdata? We propose a general framework for structured prediction basedon the structured perceptron that is guaranteed to succeed withinexact search and works well in practice. Next, the "rewriting" stage translates these source-language parsetrees into the target language. But parsing errors from the previousstage adversely affect translation quality. An obvious solution is touse the top-k parses, rather than the 1-best tree, but this only helpsa little bit due to the limited scope of the k-best list. We insteadpropose a "forest-based approach", which translates a packed forestencoding exponentially many parses in a polynomial space by sharingcommon subtrees. Large-scale experiments showed very significantimprovements in terms of translation quality, which outperforms theleading systems in literature. Like the "understanding" part, thetranslation algorithm here is also linear-time and incremental, thusresembles human translation.
13 Jan 2012	Hercules Dalianis (Stockholm University)	Reusing clinical documentation for better health Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Today a large number of Electronic Patient Records (EPRs) are produced for legal reasons but they are very seldom reused, neither for clinical research nor for business (hospital) intelligence reasons. Moreover, the clinician's daily work in documenting the patient status is not always supported in a proper way. Hospital management needs key and real time information of the health care processes. Simultaneously, patients have become more demanding customers that want to be involved in their own health care process. We are aiming to support these demands.Clinical documentation forms an abundant source to extract valuable information that can be used for this purpose, however clinical corpora contain protected health information and must be kept in a safe way. Today only in Sweden (with a population of 10 million) 4-10 million pages of patient records are produced each year.We have studied the Stockholm EPR Corpus, a huge clinical document collection written in Swedish, containing over one million patient records. The document collection is distributed over 900 clinics from the Stockholm area encompassing three years 2006-2008. We have used this clinical corpus as a knowledge base to develop a set of tools that can work as basic building blocks for the future tools for health engineering. We have been assisted by physicians that have interpreted the content in the clinical text to us, they have annotated the clinical text and they have also set requirements on these tools together with their colleagues. We have identified four groups of users in the health domain: physicians, clinical researchers, hospital management and patients. We will show examples on these tools and the benefits they will give to health care.1) For physicians: Automatic ICD-10 assignment 2) For clinical researchers: Comorbidity networks 3) For hospital management: ICD-10 validation and adverse event detection, and finally 4) For patients: automatic text summarization.Brief Bio: Dr. Hercules Dalianis, Professor, born 20 July 1959Dalianis is a professor in Computer and Systems Sciences at Stockholm University. Dalianis received his Ph.D in 1996. Dalianis was a postdoc researcher at University of Southern California/ISI in Los Angeles in 1997. Dalianis was also postdoc researcher (forskarassistent) at KTH-Royal Institute of Technology in Stockholm, 1999-2003. Dalianis held a three year guest professorship at CST, University of Copenhagen during 2002-2005, founded by the Norfa, the Nordic council. Dalianis works in the interface between industry and university and with the aim to make research results useful for society. Dalianis has specialized in the area of human language technology, to make computers understand and process human language text, but also to make computers produce text automatically. Currently Dalianis is working in the area of clinical text mining with the aim to improve health care in form of better electronic patient record systems, presentation of the patient records and extraction of valuable information for clinical researchers as well as for the patients.
16 Dec 2011	Chris Dyer (Carnegie Mellon)	Generate-and-Test Models for Alignment and Machine Translation Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: I discuss translation as an optimization problem subject tothree kinds of constraints: lexical, configurational, and constraintsenforcing target-language wellformedness. Lexical constraints ensurethat the lexical choices in the output are meaning-preserving;configurational constraints ensure that the relationships betweensource words and phrases (e.g., semantic roles and modifier-headrelationships) are properly transformed in translation; andtarget-language wellformedness constraints ensure the grammaticalityof the output. In terms of the traditional source-channel model ofBrown et al. (1993), the "translation model" encodes lexical andconfigurational constraints and the "language model" encodes targetlanguage wellformedness constraints. On the other hand, theconstraint-based framework suggests a generate-and-test(discriminative) model of translation in which features sensitive toinput and output structures, and the feature weights are trained tomaximize the (conditional) likelihood of a corpus of exampletranslations. The specified features represent empirical hypothesesabout what variables correlate (but not why) and thus encodedomain-specific knowledge that is useful for the problem at hand; thelearned weights indicate to what extent these hypotheses are confirmedor refuted.To verify the usefulness of the feature-based approach, I discuss theperformance two models: first, a lexical translation model evaluatedby the word alignments it learns. Unlike previous unsupervisedalignment models, the new model utilizes features that capture diverselexical and alignment relationships, including morphologicalrelatedness, orthographic similarity, and conventional co-occurrencestatistics. Results from typologically diverse language pairsdemonstrate that the generate-and-test model provides substantialperformance benefits compared to state-of-the-art generativebaselines. Second, I discuss the results of an end-to-end translationmodel in which lexical, configurational, and wellformednessconstraints are modeled independently. Because of the independenceassumptions, the model is substantially more compact thanstate-of-the-art translation models, but still performs significantlybetter on languages where source-target word order differences aresubstantial. Bio: Chris Dyer is a postdoctoral researcher in Noah Smith's lab inthe Language Technologies Institute at Carnegie Mellon University. Hecompleted his PhD on statistical machine translation with PhilipResnik at the University of Maryland in 2010. Together with Jimmy Lin,he is author of "Data-Intensive Text Processing with MapReduce",published by Morgan & Claypool in 2010. Current research interestsinclude machine translation, unsupervised learning, Bayesiantechniques, and "big data" problems in NLP.
12 Dec 2011	Gael Dias (University of Caen Basse-Normandie, France)	Cross Domain Subjectivity Classification using Multi-View Learning Time: 4:00 pm - 5:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: In this talk, we will present our research on learning modelswith high cross domain accuracy for subjectivity classification. After asmall introduction about related works and challenges of sentimentanalysis, we will start by presenting new features for subjectivityanalysis. Then, we will present two different paradigms of multi-viewlearning strategies to learn transfer models: multi-view learning withagreement and guided multi-view learning. Then, we will present anexhaustive evaluation based on both paradigms including twostates-of-the-art algorithms and show that accuracy over 91% can beobtained using three views. In our concluding remarks, we will talkabout future extensions of the presented methodology. Then, we willbriefly present the Human Language Technology team of the GREYCLaboratory of the University of Caen Basse-Normandie (France) andpresent projects that are being studied ans further prospects.Biography: Gael Dias is full professor at the University of CaenBasse-Normandie (France). His research interests include unsupervisedmethodologies for text mining, information retrieval and textsummarization. His recent research focuses on Sentiment Analysis,Ontology Learning, Lexical Semantics, Web Personalization andCollaboration, Temporal Information Retrieval, and Paraphrase Extractionand Identification. He has served on program committees of internationalconferences and workshops such as ACL/HLT 2011, COLING 2010, IJCNLP/ACL2009, ACL 2007, HLT-NAACL 2007, COLING/ACL 2006 as well as is/was areviewer for Information Processing and Management, IEEE Transactions onAudio, Speech and Language Processing, Natural Language EngineeringJournal, Journal of Language Resources and Evaluation, Journal ofComputer Speech and Language and ACM Transactions on Speech andLanguage Processing.
04 Nov 2011	Ariya Rastrow (Johns Hopkins)	Going beyond n-grams: Incorporating non-local dependencies for Speech Recognition Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Due to the availability of large amounts of training data andcomputational resources, building more complex models with sentencelevel knowledge and longer dependencies has been an active area ofresearch in automatic speech recognition (ASR). Yet, due to thecomplexity of the speech recognition task, integration of many ofthese complex and sophisticated knowledge sources into the firstdecoding pass is not feasible. Many of these long-span models cannotbe represented as weighted finite-state automata (WFSA), making itdifficult even to incorporate them in a lattice rescoring pass.First, we motivate our work by providing compelling empirical evidencethat n-gram LMs are not sufficient for ASR task and why we need toincorporate non-local features such as syntax. The development oflanguage models with such long-span (non-local) features is underway,but is not addressed in this talk. We instead address how such modelsshould be trained discriminatively and applied effectively.Specifically, we describe a new approach for rescoring speech latticeswith such models (acoustic or language) that does not entailcomputationally intensive lattice expansion or limited rescoring ofonly an N -best list.We view the set of word-sequences in a lattice as a discrete space anddevelop a hill climbing technique to start with, say, the 1-besthypothesis under the lattice-generating model(s) and iterativelyimprove it using the new model. We demonstrate empirically that toachieve the same reduction in error rate using a better estimated,higher order LM, our technique evaluates fewer hypotheses thanconventional N-best rescoring by up to two orders of magnitude.We also propose to integrate the idea of hill climbing into thetraining of discriminative language models with non-local sentencelevel features. Discriminative models provide the flexibility toinclude both local n-gram features and arbitrary sentence levelfeatures. However, unlike generative LMs with long-span dependencieswhere one has to resort to N-best lists only during decoding(rescoring), discriminative models force the use of N-best lists evenfor LM training. We demonstrate significant computational saving during training as well as error-rate reduction over N-best training methods. Bio: Ariya Rastrow is a Ph.D. candidate at Johns Hopkins University,working with Sanjeev Khudanpur and Mark Dredze. He was initiallyadvised by Fred Jelinek. The focus of his PhD research is to advancespeech recognition systems to efficiently incorporate linguisticallymotivated non-local features into language models. In his recent work,he has developed an efficient hill-climbing algorithm to applynon-local complex models for the speech recognition task. He has alsoworked on out-of-vocabulary (OOV) detection, spoken term detection andsemi-supervised adaptation techniques for speech recognition.
07 Oct 2011	Ekaterina Ovchinnikova	Integration of World Knowledge for Natural Language Understanding Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Traditional inference-based natural language understanding (NLU) in acomputational framework suffered mainly from a lack of a sufficientlylarge knowledge base of commonsense knowledge. Recent advances havechanged this situation: A large amount of machine-readable knowledgeis now freely available to the community. This talk focuses onexploiting these developments to model large-scale NLU in aninference-based framework.The three main types of the existing knowledge sources arelexical-semantic dictionaries, distributional resources, andontologies. After comparing these types of resources and outliningtheir differences, I will present an integrative knowledge basecombining lexical-semantic, ontological, and distributional knowledgein a modular way.I will then talk about reasoning procedures able to make use of thelarge scale knowledge base. In particular, I will compare two mainforms of logical inferences applied to NLU: deduction and abduction. In the last part of the talk, I will present experiments on thefollowing knowledge-intensive NLU tasks: recognizing textualentailment, semantic role labeling, and paraphrasing of noun-noundependencies.
04 Oct 2011	Steve DeNeefe	Tree-adjoining Machine Translation (Ph.D. Defense Practice Talk) Time: 4:00 pm - 5:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Machine Translation (MT) is the task of translating a document from asource language (e.g., Chinese) into a target language (e.g., English)via computer. State-of-the-art statistical approaches to MT use largecollections of human-translated documents as training material,gathering statistics on the patterns of correspondence betweenlanguages according to the features specified by the translationmodel. Using this bilingual translation model in conjunction with atarget language model, created by gathering statistics from a largemonolingual corpus, a new document in the source language can beautomatically translated into its target-language equivalent withsurprising accuracy.Much MT research focuses on types of the patterns and features toinclude in a translation model.Recent statistical MT models have used syntax trees to enforcegrammaticality, but the currently popular tree substitution modelsonly memorize sequences of words or constituents, specifying exactlywhat phrases to use and exactly what trees are grammatical, which doesnot generalize well. Adding the operation of tree-adjoining providesthe freedom to splice additional information into an existinggrammatical tree. An adjoining translation model allows general,linguistically-motivated translation patterns to be learned withoutthe clutter of endless variations of optional material. Theappropriate modifiers, such as adjectives, adverbs, and prepositionalphrases, can be grafted into these core patterns as needed totranslate details. We show that the increased generalization powerprovided by adjoining, when used carefully, improves MT qualitywithout becoming computationally intractable. In this thesis, we describe challenges encountered by both word-sequence-basedand syntax-tree-based MT systems today, and present anin-depth, quantitative comparison of both models. Then we describe anovel model for statistical MT which addresses these challenges usinga synchronous tree-adjoining grammar. We introduce a method ofconverting these grammars to a weakly equivalent tree transducer fordecoding. Then we present a method for learning the rules andassociated probabilities of this grammar from aligned tree/stringtraining data, and empirically analyze important characteristics ofthe resulting model, considering and evaluating many variations.Finally, our results show that adjoining delivers a consistentimprovement over a baseline statistical syntax-based MT model on bothmedium and large-scale MT tasks using several language pairs.
30 Sep 2011	Dirk Hovy	Aligning Events and Time Stamps Time: 4:00 pm - 5:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Machine Reading relies to a large extent on information about entities and events. While the definition of events is controversial, most people agree that they have certain properties like a time and a place.We exploit this by trying to establish relations between events (such as ``bombing'' or ``election'') and temporal expressions that can be resolved to a timestamp, i.e., an expression like ``last Tuesday'' to an absolute value like 20110802.This enables a number of interesting applications, such as generation of absolute timelines, cross-document event coreference, and resolution of logical discrepancies. We define a baseline approach and improve upon it by identifying important subproblems (within-sentence vs. across-sentence), casting them as a relation extraction problem and showing that classification with kernel methods works well in capturing the information. Our results are competitive with previous approaches and reach a F-score of 76.6.We also show that resolution across sentences is a lot harder and cannot be approached with the same techniques used for the within-sentence. We outline some promising findings and suggest further research.
16 Sep 2011	Cerstin Mahlow (University of Zurich)	Linguistically supported editing and revising: concept and prototypical implementation based on interactive NLP resources Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Composing, revising, and editing are highly demanding tasks. Even in polishedand published texts from professional writers we can observe errors and mistakes.For many errors, we can infer how they came to be: Word processors offercharacter-based functions only. These functions do not take into accountelements and structures of the language the author is using. Authors are thusforced to translate their high-level goals into long and complex sequencesof low-level character-based functions. Both the translation process and theexecution of such sequences of functions are error-prone.However, in text editors for programmers ww find so-called language-awareediting functions. These functions operate on the elements and structures of aprogramming or mark-up language and help to avoid errors, as language-awarefunctions make revising and editing less tedious and error-prone.We argue that the concept of language awareness can be transferred to writingnatural language texts using word processors. We propose functions that take thestructures of natural languages into consideration. We distinguish informationfunctions, movement functions, and operations to support revising and editing.The design is based on current findings from writing research.Language-aware editing functions rely on the recognition and categorizationof relevant elements and structures with respect to a certain language. Weuse methods and resources from computational linguistics for morphologicalanalysis and generation, and for part-of-speech tagging. When evaluatingrespective resources we face a rather disappointing situation: NLP resourcesfor German are less suitable than assumed and less applicable for real-worldapplications than usually claimed in the literature. Our prototypical implementation of language-aware functions for revising andediting of German texts serves as a proof of concept. The implementationillustrates opportunities and limits of current NLP resources for German.
09 Sep 2011	Richard Socher (Stanford University)	Recursive Deep Learning in Natural Language Processing and Computer Vision Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Hierarchical and recursive structure is commonly found in differentmodalities, including natural language sentences and scene images. Iwill present some of our recent work on three recursive neural networkarchitectures that learn meaning representations for such hierarchicalstructure. These models obtain state-of-the-art performance on severallanguage and vision tasks.The meaning of phrases and sentences is determined by the meanings ofits words and the rules of compositionality. We introduce a recursiveneural network (RNN) for syntactic parsing which can learn vectorrepresentations that capture both syntactic and semantic informationof phrases and sentences. For instance, the phrases "declined tocomment" and "would not disclose" have similar representations.Since our RNN does not depend on specific assumptions for language, itcan also be used to find hierarchical structure in complex sceneimages. This algorithm obtains state-of-the-art performance forsemantic scene segmentation on the Stanford Background and the MSRCdatasets and outperforms Gist descriptors for scene classification by4%.The ability to identify sentiments about personal experiences,products, movies etc. is crucial to understand user generated contentin social networks, blogs or product reviews. The second architectureI will talk about is based on semi-supervised recursive autoencoders (RAE).RAEs learn vector representations for phrases sufficiently well as tooutperform other traditional supervised sentiment classification methodson several standard datasets.Lastly, I describe an alternative unsupervised RAE model that can learnfeatures which outperform previous approaches for paraphrasedetection on the Microsoft Research Paraphrase corpus.This talk presents joint work with Andrew Ng and Chris Manning. Bio: Richard Socher is a Computer Science PhD student at Stanford,co-advised by Chris Manning and Andrew Ng.Most recently, he won the Yahoo! Key Scientific Challenges ProgramAward and the Distinguished Application Paper Award at ICML, 2011for his work on recursive deep learning.
24 Aug 2011	Sravana Reddy	Cracking Running-Key Ciphers and Deciphering Speech (Interns Final Talk) Time: 2:30 pm - 3:00 pm Location: 4th Floor Large Conference Room [460] Abstract: In the first part of this talk, I will discuss our work on deciphering running-key ciphers, which are produced by encrypting the plaintext with a natural language string of the same length as the plaintext (the 'running key'). These ciphers are harder to crack than simple substitution ciphers, and no previous work has succeeded in decoding them. The second part of the talk will address the problem of speech recognition without access to word pronunciations or annotated training data. The problem's motivations arise from languages and domains where pronunciation lexicons and transcribed speech are not available. Given a representation of the speech as a sequence of phonemes, and a language model from non-parallel text, we present methods to find the sequence of words correspoding to the speech input.
24 Aug 2011	Xuchen Yao	Introducing context-dependent features into machine translation (Interns Final Talk) Time: 2:00 pm - 2:30 pm Location: 4th Floor Large Conference Room [460] Abstract: One fundamental assumption in machine translation is that sentences are translated independently of each other. We attack this assumption by trying to achieve lexical translation consistence among sentences within the same document. An additional lexicon reuse feature is introduced to help the decoder select a more consistent translation. In this talk we will discuss the design of the reuse feature and show experimental results.
19 Aug 2011	Stephen Tratz (PhD defense practice talk)	Semantically-Enriched Parsing for Natural Language Understanding Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: This thesis details three contributions to the advancement ofsemantic-enriched parsing for English sentences: inventories of semanticrelations covering three semantically ambiguous linguistic phenomena,large datasets annotated according to the inventories, and, finally, asuite of tools for semantically-enriched parsing built using the data.For the purposes of this thesis, semantically-enriched parsing isdefined as the reconstruction of the underlying grammatical structure oftext along with shallow semantic annotation of semantically-ambiguousstructures. Ultimately, semantically-enriched parsing is one of the mostcritical steps in natural language understanding---the initial step inwhich the text is read by the machine into a knowledge representationfor further processing and reasoning.The first contribution of this thesis is to advance the theoreticalfoundations for the interpretation of three ambiguous linguisticphenomena in English that have significant overlap in terms of therelations expressed: noun compounds, possessive constructions, andprepositions. For these, I define inventories of relations based uponextensive annotation by myself, previous work by others, andinter-annotator agreement studies. In the case of prepositions, therelations are created by refining an existing resource whereas the othertwo are created from scratch. In addition to mappings to prior work,mappings are provided across the different inventories in order tocreate a unified set of relations.Second, I produce large datasets annotated according to theaforementioned sense inventories. Such data is vital for training mostautomatic tools and also provides exemplars for the theory embodied inthe inventories. Some of these datasets are created from scratch,including a collection of over 17,500 noun compounds and a collection ofover 21,900 possessive construction examples. In the case ofprepositions, an existing resource including over 24,000 annotatedexamples is refined.The final contribution is a suite of tools that can constructsemantically-enriched parse trees. The suite is designed to work in asequential, pipeline-like fashion and can be thought of as consisting oftwo subsections. The first part reconstructs the grammatical structureof the text using a dependency parser that extends the non-directionaleasy-first algorithm developed by Goldberg and Elhadad (2010) in orderto support non-projective trees and is trained using my improveddependency tree conversion of the Penn Treebank. Second is a semanticannotation module that adds shallow semantic annotation for nouncompounds, preposition senses, and possessives. Combined, these toolsproduce semantically-enriched parse trees that include both grammaticalstructure and shallow semantics. The core parser itself achievesstate-of-the-art accuracy and can process over 75 sentences per second,which is substantially faster than most of the accurate parsersavailable today.In conclusion, this thesis work provides significant contributions tocomputational linguistics, both in terms of theory and resources. Itadvances our understanding of the relations expressed by threesemantically-ambiguous linguistic phenomena, creates large annotateddatasets useful for machine learning, and produces a fast, accurate, andinformative system for semantically-enriched parsing.
17 Aug 2011	Licheng Fang	Structured Language Modelling for Machine Translation Time: 2:00 pm - 2:30 pm Location: 4th Floor Large Conference Room [460] Abstract: Machine translation can potentially benefit from the guidance of alanguage model that evaluates translation candidates based on syntacticstructures. In this talk we are going to describe the summer project tobuild such an incremental structured language model that can be used inmachine translation systems that generate the target language in aleft-to-right manner. We will describe in detail our work in modelling,search, and parameter smoothing.
05 Aug 2011	Dave Uthus	Overcoming Information Overload in Navy Chat Time: 3:00 pm - 4:00 pm Location: 4th Floor Large Conference Room [460] Abstract: In this talk, I will describe the research we are undertaking at the Naval Research Laboratory which revolves around chat (such as Internet Relay Chat) and the problems it causes in the military domain. Chat has become a primary means for command and control communications in the US Navy. Unfortunately, its popularity has contributed to the classic problem of information overload. For example, Navy watchstanders monitor multiple chat rooms while simultaneously performing their other monitoring duties (e.g., tactical situation screens and radio communications). Some researchers have proposed how automated techniques can help to alleviate these problems, but very little research has addressed this problem.I will give an overview of the three primary tasks that are the current focus of our research. The first is urgency detection, which involves detecting important chat messages within a dynamic chat stream. The second is summarization, which involves summarizing chat conversations and temporally summarizing sets of chat messages. The third is human-subject studies, which involves simulating a watchstander environment and testing whether our urgency detection and summarization ideas, along with 3D-audio cueing, can aid a watchstander in conducting their duties. Short Bio: David Uthus is a National Research Council Postdoctoral Fellow hosted at the Naval Research Laboratory, where he is currently undertaking research focusing on analyzing multiparticipant chat. He received his PhD (2010) and MSc (2006) from the University of Auckland in New Zealand and his BSc (2004) from the University of California, Davis. His research interests include microtext analysis, machine learning, metaheuristics, heuristic search, and sport scheduling.
15 Jul 2011	Markus Dreyer (SDL Language Weaver)	Discovering Morphological Paradigms from Plain Text Using a Dirichlet Process Mixture Model (EMNLP 2011 practice talk) Time: 3:30 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: We present an inference algorithm that organizes observedwords (tokens) into structured inflectional paradigms (types). Italso naturally predicts the spelling of unobserved forms that aremissing from these paradigms, and discovers inflectionalprinciples (grammar) that generalize to wholly unobserved words.Our Bayesian generative model of the data explicitly representstokens, types, inflections, paradigms, and locally conditionedstring edits. It assumes that inflected word tokens are generatedfrom an infinite mixture of inflectional paradigms (stringtuples). Each paradigm is sampled all at once from a graphicalmodel, whose potential functions are weighted infinite-statetransducers with language-specific parameters to be learned. Theseassumptions naturally lead to an elegant empirical Bayesinference procedure that exploits Monte Carlo EM, beliefpropagation, and dynamic programming. Given 50-100 seedparadigms, adding a 10-million-word corpus reduces predictionerror for morphological inflections by up to 10%. This is joint work with Jason Eisner, JHU.
15 Jul 2011	Jonathan May (SDL Language Weaver)	Tuning as Ranking (EMNLP 2011 practice talk) Time: 3:00 pm - 3:30 pm Location: 11th Floor Large Conference Room [1135] Abstract: We offer a simple, effective, and scalable method for statisticalmachine translation parameter tuning based on the pairwise approach toranking. Unlike the popular MERT algorithm, our pairwise rankingoptimization (PRO) method is not limited to a handful of parametersand can easily handle systems with thousands of features. Moreover,unlike recent approaches built upon the MIRA algorithm of Crammer andSinger, PRO is easy to implement. It uses off-the-shelf linear binaryclassifier software and can be built on top of an existing MERTframework in a matter of hours. We establish PRO's scalability andeffectiveness by comparing it to MERT and MIRA and demonstrate parityon both phrase-based and syntax-based systems in a variety of languagepairs, using large scale data scenarios.
07 Jul 2011	Deniz Yuret (Koc University)	The Noisy Channel Model for Unsupervised Word Sense Disambiguation Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: We introduce a generative probabilistic model, the noisychannel model, for unsupervised word sense disambiguation. In ourmodel, each context C is modeled as a distinct channel through whichthe speaker intends to transmit a particular meaning S using apossibly ambiguous word W. To reconstruct the intended meaning thehearer uses the distribution of possible meanings in the given contextP(S\|C) and possible words that can express each meaning P(W\|S). Weassume P(W\|S) is independent of the context and estimate it usingWordNet sense frequencies. The main problem of unsupervised WSD isestimating context dependent P(S\|C) without access to any sense taggedtext. We show one way to solve this problem using a statisticallanguage model based on large amounts of untagged text. Our model usescoarse-grained semantic classes for S internally and we explore theeffect of using different levels of granularity on WSD performance.The system outputs fine grained senses for evaluation and itsperformance on noun disambiguation is better than most previouslyreported unsupervised systems and close to the best supervisedsystems.Short Bio: Deniz Yuret is an assistant professor in Computer Engineering at Koc University in Istanbul. Previously he was at the MIT AI Lab and laterco-founded Inquira, Inc. His research is on lexical semantics andunsupervised approaches to parsing and disambiguation. Currently he isone of the organizers of the SemEval3 semantic evaluation exercise,co-chair for the ACL 2011 semantics area, and an editor for theComputational Linguistics Journal.
28 Jun 2011	Suzy Howlett (Macquarie University)	Confidence in Syntax for Statistical Machine Translation Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Phrase-based statistical machine translation typically uses no syntactic information during translation, but while this information intuitively seems useful, including it has not necessarily helped translation performance. My PhD project is looking at this problem in the context of a syntactically-informed reordering preprocessing step prior to phrase-based translation. My work so far has shown that this preprocessing step does not necessarily improve performance when applied to every sentence; in my project I aim to develop a lattice-based system, armed with a number of syntax-based confidence features, that can choose on a sentence-by-sentence basis whether to use the reordering. In this presentation I will outline my progress so far, and welcome feedback and suggestions, particularly with respect to features to consider. Short Bio: Suzy Howlett is a PhD student at the Centre for Language Technology at Macquarie University, Australia, under the supervision of Mark Dras. She studied computer science and linguistics as an undergraduate at the University of Sydney, finishing in 2008 with an Honours year with James Curran, looking at automatically annotating additional training data for the C&C statistical CCG parser.
17 Jun 2011	Xuchen Yao	Nonparametric Bayesian Word Sense Induction (ACL practice talk) Time: 3:00 pm - 3:40 pm Location: 11th Floor Large Conference Room [1135] Abstract: We propose the use of a nonparametric Bayesian model, the Hierarchical Dirichlet Process (HDP), for the task of Word Sense Induction. Results areshown through comparison against Latent Dirichlet Allocation (LDA), a parametric Bayesian model employed by Brody and Lapata (2009) for this task.We find that the two models achieve similar levels of induction quality, while the HDP confers the advantage of automatically inducing a variable number of senses per word, as compared to manually fixing the number of senses a priori, as in LDA. This flexibility allows for the model to adapt to terms with greater or lesser polysemy, when evidenced by corpus distributional statistics.
17 Jun 2011	Sravana Reddy	Unsupervised Discovery of Rhyme Schemes (ACL practice talk) Time: 3:40 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: We describe an unsupervised, language-independent model for finding rhyme schemes in poetry, using no prior knowledge about rhyme or pronunciation.
10 Jun 2011	Cartic Ramakrishnan	The Role of Information Extraction in the Design of a Document Triage Application for Biocuration Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Traditionally, automated triage of papers is performed using lexical (unigram, bigram, and sometimes trigram) features. This talkexplores the use of information extraction (IE) techniques to create richer linguistic features than traditional bag-of-words models. Ourclassifier includes lexico-syntactic patterns and more-complex features that represent a pattern coupled with its extracted noun,represented both as a lexical term and as a semantic category. Our experimental results show that the IE-based features can improve performance over unigram and bigram features alone. We present intrinsic evaluation results of full-text document classification experiments to determine automatically whether a paper should be considered of interest to biologists at the Mouse Genome Informatics (MGI) system at the Jackson Laboratories. We also further discuss issues relating to design and deployment of our classifiers as an application to support scientific knowledge curation at MGI.
27 May 2011	Shu Cai	Language-Independent Parsing with Empty Elements Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: We present a simple, language-independent method for integrating recovery of empty elements into syntactic parsing. This method outperforms the best published method we are aware of on English and a recently published method on Chinese. This is a joint work with David Chiang and Yoav Goldberg
06 May 2011	Abe Kazemzadeh (USC)	Natural Language Descriptions of Emotions Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: This proposal seeks to explain how humans describe emotions usingnatural language. Thefocus of the proposal is on words and phrases that refer to emotions,rather than the more general phenomena of emotional language. The mainproblem I address is that if natural language descriptions of emotionsrefer to abstract concepts that are local to a particular human (oragent), then how do these concepts vary from person to person and howcan shared meaning be established between people. The thesis of theproposal is that naturallanguage emotion descriptions are definite descriptions that refer totheoretical objects, which provide a logical framework for dealingwith this phenomenon in scientific experimentsand engineering solutions. An experiment, Emotion Twenty Questions(EMO20Q), was devised to study the social natural language behavior ofhumans, who must use descriptions of emotions to play the familiargame of twenty questions when the unknown word is an emotion.The idea of a theory based on natural language propositions isdeveloped and used to formalize the knowledge of a sign-using agent.Based on this pilot data, it was seen that approximately 25% of theemotion descriptions referred to emotions as objects with dimensionalattributes, similarity, or subsethood. This motivated the author touse interval type-2 fuzzy sets as a computational model for theconceptual meaning of emotion descriptions. This model introduces adefinition of a variable that ranges over emotions and allows for bothinter- and intra-subject variability. A second experiment usedinterval surveys and translation tasks to assess this model. Finally,the author proposes the use of spectral graph theory to representemotional knowledge as a network of proposition nodes that areconnected to emotion nodes based on data from EMO20Q. Short Bio: Abe Kazemzadeh is a PhD student at the USC Computer Science Dept and aresearch assistant at the the Signal Analysis and InterpretationLaboratory (SAIL). His interests include natural language, logic,emotions, games, and algebra.
29 Apr 2011	Marie-Catherine de Marneffe (Stanford University)	Computational models of utterance meaning Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Much of the meaning conveyed in language use goes beyond the literal meaning of the words. Suppose someone asks whether I want to go for lunch, and I reply: "I had a very large breakfast". The utterance does not convey only what it literally means, my interlocutor is probably going to infer that I am not hungry and do not want to go for lunch now. Computational systems today understand at most the literal meaning of human language utterances. I aim at capturing aspects of utterance meaning, the kind of information that a reader will reliably extract from an utterance within text.The first part of the talk concentrates on interpreting answers to yes/no questions which do not straightforwardly convey a 'yes' or 'no' answer. I focus on questions involving scalar modifiers (Was it acceptable? It was unprecedented.) and numerical answers (Are you kids little? I have a 10 year-old and a 7 year-old.). I exploit the availability of large amount of text to learn meanings from words and sentences in real context. I show that we can ground scalar modifier meaning based on large unstructured databases, and that such meanings can drive pragmatic inference.The second part of the talk targets veridicality -- whether a speaker intends to convey that the events described are actual, non-actual or uncertain -- which is central to language understanding, but little used in relation and event extraction systems. What do people infer from a sentence such as FBI agents alleged in court documents today that Zazi had admitted receiving weapons and explosives training from al Qaeda operatives? Did Zazi received weapons and explosives training? I show that not only lexical semantic properties but context and world knowledge shape veridicality judgments. Since such judgments are not always categorical, I suggest they should be modeled as distributions, and propose a classifier to do so. The classifier features provide a nuanced picture of the diverse factors that affect veridicality. Short Bio: Marie-Catherine de Marneffe is a fifth-year PhD student in Linguistics at Stanford University. Prior to herdoctoral studies, she visited the Stanford NLP research group for 2 years, working with Christopher D. Manning.In 2000, she received her master degree in Classical Languages, and a master in Computer Science in 2002,both from the Université catholique de Louvain (Belgium). Her work in computational semantics focuses onon detecting entailment and contradiction in texts, grounding meaning from large unstructured databases, and assessing the information status of events from a reader's perspective. She is also interested in language acquisition, studying howchildren acquire verb forms in French.
22 Apr 2011	Dirk Hovy	Models and Training for Unsupervised Preposition Sense Disambiguation Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: We present a preliminary study on unsupervised preposition sense disambiguation (PSD), comparing different models and training techniques (EM, MAP-EM with L0 norm, Bayesian inference using Gibbs sampling). To our knowledge, this is the ﬁrst attempt at unsupervised preposition sense disambiguation. Ultimately, we want to disambiguate prepositions not by and for themselves, but in the context of sequential semantic labeling. This should also improve disambiguation of the words linked by the prepositions (here, morning, shopped, and Rome). We propose using unsupervised methods in order to leverage unlabeled data, since, to our knowledge, there are no annotated data sets. Our best accuracy for PSD reaches 56%, a signiﬁcant improvement (at p < .001) of 16% over the most-frequent-sense baseline. This is a joint work with Ashish Vaswani, Stephen Tratz, David Chiang, and Eduard Hovy
15 Apr 2011	Thomas Schoenemann	Computing Viterbi Alignments via Integer Linear Programming Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: This talk is about an optimization problem that was shown to beNP-hard: computing optimal alignments for the IBM-3 translationmodel. I will show that in practice it can be solved quite efficientlyvia Integer Linear Programming. In addition to using a standard solverI will also show problem-specific preprocessing techniques: byderiving upper and lower bounds, a large number of variables can beremoved from the start.Short Bio: Thomas Schoenemann was born and grew up in Germany. He studiedComputer Science at RWTH Aachen, Germany, where he got a diploma in2005, having written his diploma thesis on the topic of confidencemeasures in machine translation in the group of HermannNey. Afterwards he went to the University of Bonn, Germany, to do hisPh.D. thesis in computer vision in the years 2006-2008. Up to a monthago he was a postdoc in the vision group at Lund University, Sweden,where he also resumed his work on translation. Currently he is takinga time off to explore other fields and broaden his scope.
18 Mar 2011	Sujith Ravi (PhD defense practice talk)	Deciphering Natural Language Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Most state-of-the-art techniques used in natural language processing (NLP) are supervised and require labeled training data. For example, statistical language translation requires huge amounts of bilingual data for training translation systems. But such data does not exist for all language pairs and domains. Using human annotation to create new bilingual resources is not a scalable solution. This raises a key research challenge: How can we circumvent the problem of limited labeled resources for NLP applications? Interestingly, cryptanalysts and archaeologists have tackled similar challenges in solving "decipherment problems". This thesis work aims to bring together techniques from classical cryptography, NLP and machine learning. We introduce a novel approach called "natural language decipherment" that can solve natural language problems without labeled (parallel) data. In this talk, we show how a wide variety of NLP problems can be formulated as decipherment tasks---for example, in statistical language translation one can view the foreign-language text as a cipher for English. Instead of relying on parallel training data, decipherment uses knowledge of the target language (e.g., English) and large quantities of readily available monolingual source (cipher) data to induce bilingual connections between the source and target languages. Using decipherment techniques, we make headway in attacking a hierarchy of problems ranging from letter substitution decipherment to sequence labeling problems (such as part-of-speech tagging) to language translation. Along the way, we make several key contributions---novel unsupervised algorithms that search for minimized models during decipherment and achieve state-of-the-art results on a number of important natural language tasks. Unlike conventional approaches, these decipherment methods can be easily extended to multiple domains and languages (especially resource-poor languages), thereby helping to spread the impact and benefits of NLP research.
11 Mar 2011	Cosmin Adrian Bejan (ICT)	Nonparametric Bayesian Models for Clustering Feature-Rich Linguistic Objects Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: In this talk, I will present how a new class of unsupervised,nonparametric Bayesian models can be effectively applied to solve real data applications that involve clustering feature-rich linguistic objects.First, I will describe a generalization of the hierarchical Dirichletprocess model to account for additional properties associated withobservable objects. In addition, to overcome some of the limitations ofthis new model, I will then describe a new hybrid model which combinesan infinite latent class model with a discrete time series model. Themain advantages of this hybrid model are the abilities to representa potentially infinite number of features associated with observableobjects and to perform an automatic selection of the most salientfeatures. Furthermore, all the models described in this talk aredesigned to account for a potential number of categorical outcomes.The evaluation performed for solving both within- and cross documentevent coreference shows significant improvements of the models whencompared against three baselines for this task.Short Bio: Cosmin Adrian Bejan is a postdoctoral researcher at the USC Institutefor Creative Technologies, where he is currently working on applicationsthat involve extraction and analysis of commonsense knowledge from largecollections of text documents. His research interests include eventsemantics, semantic parsing, commonsense causal reasoning, unsupervised learning, and nonparametric Bayesian methods.
04 Mar 2011	Steve DeNeefe (practice job talk)	Tree Adjoining Machine Translation Time: 4:30 pm - 5:30 pm Location: 11th Floor Large Conference Room [1135] Abstract: Tree adjoining grammars (TAGs) have greater linguistic expressiveness than the tree substitution grammars used in many natural language tasks, but are typically considered too complex or computationally expensive for practical systems. Many current statistical machine translation (MT) models use tree substitution to memorize sequences of words or constituents, specifying exactly what phrases to use or exactly what trees are grammatical. Adding the operation of tree adjoining provides the freedom to splice additional information into an existing grammatical tree. An adjoining translation model allows general, linguistically-motivated translation patterns to be learned without the clutter of endless variations of optional material. The appropriate modifiers, such as adjectives, adverbs, and prepositional phrases, can later be grafted in as needed to translate details. We show that the increased generalization power provided by adjoining, when used carefully, improves MT quality without becoming computationally intractable.In this talk, we describe challenges encountered by phrase-based and syntax-based MT systems today, and present an in-depth, quantitative comparison of both models. Then, we describe a novel model for statistical MT which addresses these challenges using a synchronous tree adjoining grammar. We introduce a method of converting these grammars to a weakly equivalent tree transducer for decoding. Then we present a method for learning the rules and associated probabilities of this grammar from aligned tree/string training data.Finally, our results show that adjoining delivers a consistent improvement over a baseline statistical syntax-based MT model on both medium and large-scale MT tasks using several language pairs.
03 Mar 2011	Christopher Thomas (Wright State University)	What Goes Around Comes Around -- Improving the State of Knowledge on the Web through On-Demand Model Creation Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Information extraction is concerned with the retrieval of structured information from unstructured sources. Knowledge extraction/acquisition will need to go a step further by testing whether the extracted information is actually true. Since none of the extraction systems in current use can guarantee a perfect precision, it is necessary to incorporate manual verification steps into the information extraction pipeline in order to use extracted facts in further reasoning. My talk will present a framework that adopts a cyclic approach to advancing the state of factual knowledge within a system, taking advantage of available formal/structured knowledge sources, information extraction and human/social computing to verify the extracted information. For the fact extraction part, the system uses LoD as training data, a domain hierarchy extractor to delineate domain boundaries and non-NLP surface-pattern-based open IE techniques to connect concepts within the hierarchy. To combat the low recall that most IE approaches face, the system deploys generalization techniques and pertinence computation to increase the number of patterns. Verification is done by means of information use under the assumption that correct information will be utilized more often than incorrect one. Bio: Christopher Thomas is a PhD candidate in the Kno.e.sis Center at Wright State University. His research is in epistemological aspects of Computer Science and Artificial intelligence, namely knowledge extraction, representation, verification and dissemination. To build a coherent framework for this kind of systems epistemology, his publications span technical work on ontology design, ontology learning, information quality and information extraction as well as conceptual work on knowledge representation and social computing methods for knowledge verification.
17 Feb 2011	Alan Ritter (University of Washington)	Status Messages: A Unique Textual Source of Realtime and Social Information Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Recently there has been an explosion in the number of users posting short status messages on Social Media websites such as Facebook and Twitter. Although noisy and informal, this new style of text represents a valuable source of information not available elsewhere: it provides the most up-to-date information on current events, in addition to a massive publicly available corpus of naturally occurring human conversations. In this talk I will present ongoing work which explores both of these aspects.First, I will describe efforts towards Information Extraction from status messages. Because statuses can be posted quickly and are widely disseminated, they often provide the most up-to-date source of information on current events around the world and locally. This dynamically changing source of realtime information is already being processed using keyword extraction techniques, for example the "trends" displayed on Twitter's website provide a list of phrases which are frequent in the current stream of messages. In order to move beyond a flat list of phrases, we have been investigating the feasibility of applying Information Extraction techniques to produce more structured representations of events. A key challenge is the noisy nature of this data; unlike newswire, or biomedical text, status messages contain frequent misspellings and abbreviations, inconsistent capitalization, unique grammar, etc... To deal with these issues, we have been annotating a corpus of Twitter Posts with POS tags and Named Entities, then using these annotations to train Twitter-specific NLP tools. As a demonstration of their utility, the resulting tools are combined to produce a calendar of popular events occurring in the future.In addition, I will discuss work which exploits a corpus of roughly 1.3 million naturally occurring conversations collected from Twitter for building models of human conversation. Three data-driven approaches to generating responses to Twitter status posts are considered, based on either information retrieval or phrase-based statistical machine translation. Although there are many challenges to overcome in adapting phrase-based SMT to dialogue, we show that it is a promising approach to this problem. We compare these approaches in a human evaluation, using annotators from Amazon's Mechanical Turk service. Furthermore, we measure agreement between human evaluators and the BLEU automatic MT evaluation metric. As far as we are aware, this is the first work to investigate the application of phrase-based SMT to dialogue generation. Short Bio: Alan Ritter is a graduate student at the University of Washington advised by Oren Etzioni. His research interests are in Information Extraction, Computational Lexical Semantics, and Language Processing in Social Media.
14 Feb 2011	Hagen Fuerstenau (University of Saarland)	Learning Structured Semantics under Weak Supervision Time: 11:00 am - 12:00 pm Location: 4th Floor Large Conference Room [460] Abstract: In this talk I will present recent work on two topics: syntactically structured representations of word meaning in context and semi-supervised semantic role labeling. These will be presented as two instances of a general theme: acquiring structured meaning representations with little or no manual annotation.Vector space models have become a standard way of representing word meaning that can be learned in an unsupervised way. The problem of polysemy, however, has only recently been addressed within this framework. Several approaches to derive vector representations of words in specific sentential contexts have been proposed. I will present recent work on extending such contextualization operations to vector models incorporating rich syntactic structure, achieving significant improvements in context-dependent lexical substitution tasks.Going beyond the meaning of single words, I will then turn to work on semantic role labeling. Here, a key obstacle is the annotation effort required for the training of high quality role labeling systems. I will present a semi-supervised approach to semantic role labeling, based on generalizing semantic annotations from manually labeled seed sentences to unlabeled sentences via structural alignments, yielding significant improvements in role labeling performance.I will conclude my talk with an outlook onto how the search for adequate models of semantics may profit from formulation in task-specific ways. In particular, I will sketch some ideas on structured semantic models for statistical machine translation. Bio: Hagen Fürstenau is a researcher at Saarland University, Germany. Hereceived an M.Sc. in Mathematics from Bonn University and is about tofinish his Ph.D. in Computational Linguistics. His research interestsinclude data-driven methods in computational semantics and weaklysupervised machine learning.
11 Feb 2011	Hui Zhang	Joint Word Alignment and Synchronous Grammar Induction Time: 3:00 pm - 4:00 pm Location: 4th Floor Large Conference Room [460] Abstract: Synchronous grammars have been shown to be effective as models of translation, and the performance of such systems depends heavily on the quality of the grammar induced from the training data. The standard method for induction of synchronous grammars uses automatic word alignments to constrain possible derivations, which makes them prey to alignment errors. In this work, we propose a method for joint word alignment and grammar induction. Our experiments show that our method significantly outperforms the standard method, while reducing the size of the grammar by more than half.
04 Feb 2011	Stephan Gouws (Stellenbosch University)	Measuring Conceptual Similarity by Spreading Activation over Wikipedia's Hyperlink Graph Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: The World Wide Web brought with it an unprecedented level ofinformation overload. Computers are very effective at processing andclustering numerical and binary data, however, the automated conceptualclustering of natural-language data is considerably harder to automate.Many techniques rely on relatively simple keyword-matching techniques orprobabilistic methods to measure semantic relatedness between words anddocuments. However, these approaches do not always accurately captureconceptual relatedness as measured by humans.In this talk I'll briefly discuss a novel use of spreading activation(SA) techniques (primarily from cognitive science) for computingsemantic relatedness between words and/or documents. This is done bymodelling the article hyperlink structure of Wikipedia as an associativenetwork structure for knowledge representation. The SA technique isadapted and several problems are addressed for it to function over thederived Wikipedia hyperlink graph. We evaluate these approaches overstandard document similarity datasets and by user evaluationexperiments, and achieve results which compare favourably with state ofthe art methods. By making use of the collaboratively-created resource Wikipedia, wehereby also overcome a significant problem in making use of spreadingactivation based techniques for information retrieval up to now, asnoted by Crestani (1997): "The problem of building a network whicheffectively represents the useful relations [between concepts] hasalways been the critical point of many of the attempts to use SA in IR.These networks are very difficult to build, to maintain and keep up todate.
28 Jan 2011	Markus Dreyer (SDL Language Weaver, formerly @ Johns Hopkins)	A Non-Parametric Model for the Discovery of Inflectional Paradigms from Plain Text using Graphical Models over Strings Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Statistical natural language processing can be difficult formorphologically rich languages. The observed vocabularies of suchlanguages are very large, since each word may have been inflected formorphological properties like person, number, gender, tense, orothers. This unfortunately masks important generalizations, leads toproblems with data sparseness and makes it hard to generate correctlyinflected text.The presented dissertation work tackles the problem of inflectional morphology with a novel, unified statistical approach. We present a generative probabilitymodel that can be used to learn from plain text how the words of alanguage are inflected, given some minimal supervision. In otherwords, we discover the inflectional paradigms that are implicit, orhidden, in a large unannotated text corpus.This model consists of several components: a hierarchical Dirichletprocess clusters word tokens of the corpus into lexemes and theirinflections, and graphical models over strings -- a novelgraphical-model variant -- model the interactions of multiplemorphologically related type spellings, using weighted finite-statetransducers as potential functions.We present the components of this model, from (1) weightedfinite-state transducers parameterized as log-linear models, to (2)graphical models over multiple strings, to (3) the final Bayesiannon-parametric model over a corpus, its lexemes, inflections, andparadigms. These three components of the model correspond to thecombined use of (1) dynamic programming, (2) belief propagation and(3) MCMC for inference.We show experimental results for several tasks along the way,including a lemmatization task in multiple languages and, todemonstrate that parts of our model are applicable outside ofmorphology as well, a transliteration task. Finally, we show thatlearning from large unannotated text corpora under our non-parametricmodel significantly improves the quality of predicted wordinflections.
14 Jan 2011	Donald Metzler	Relevance and Ranking in Online Dating Systems Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Match-making systems refer to systems where users want to meet other individuals to satisfy some underlying need. Examples of match-making systems include dating services, resume/job bulletin boards, community based question answering, and consumer-to-consumer marketplaces. One fundamental component of a match-making system is the retrieval and ranking of candidate matches for a given user.We present the first in-depth study of information retrieval approaches applied to match-making systems. Specifically, we focus on retrieval for a dating service. This domain offers several unique problems not found in traditional information retrieval tasks. These include two-sided relevance, very subjective relevance, extremely few relevant matches, and structured queries. We propose a machine learned ranking function that makes use of features extracted from the uniquely rich user profiles that consist of both structured and unstructured attributes. An extensive evaluation carried out using data gathered from a real online dating service shows the benefits of our proposed methodology with respect to traditional match-making baseline systems. Our analysis also provides deep insights into the aspects of match-making that are particularly important for producing highly relevant matches.
15 Nov 2010	Jason Riesa	Structured Models for Bilingual Alignment (Ph.D. Proposal practice talk) Time: 4:00 pm - 5:00 pm Location: 4th Floor Conference Room [460] Abstract: Bilingual alignment serves as an integral step and the foundation inthe building of any state-of-the-art statistical machine translationsystem. It enables us to automatically learn and extract translationrules from hundreds of millions of words of bilingual text.Twenty years ago, the research area of machine translation wasbeginning to make use of the increasing availability and speed ofcomputing resources demanded by the ideas of a previous generation,notably Weaver (1949). The IBM translation models -- statisticalmodels for automatic word-to-word translation (Brown et al., 1990;Brown et al., 1993) - spurred a flurry of new statistical andempirical research in this area. They have become ubiquitous in thefield and are easy to train in an unsupervised fashion; Al-Onaizan etal. (1999) and Och and Ney (2003) have given us open-source toolkitsfor this purpose. However, there are many problems that still exist. The work presentedin this thesis proposal will eliminate many of the problems withalignment systems that have persisted for two decades, significantly improving machine translationquality and decidedly advancing the state-of-the-art. In achievingthis goal, we develop new models of bilingual alignment and efficientsearch algorithms for working with such models.
12 Nov 2010	Stephen Tratz	Semantically-enriched Parsing for Natural Language Understanding (Ph.D. Proposal practice talk) Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Natural language is riddled with many ambiguities, greatly complicatingnatural language processing tasks. Current parsers reconstruct thesyntax of sentences without addressing the numerous ambiguities oflanguage. This talk discusses a proposed solution forsemantically-enriched parsing that consists of ontological resources,datasets, and tools that can be used to produce more informative parsesof English sentences. The resulting parses consist not only of syntacticstructure, but also semantic interpretations for noun compounds,preposition senses, and possessive constructions.
07 Oct 2010	Anselmo Penas	Toward a Reading Machine Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Machine Reading (MR) aims at bridging the gap between texts and a formal representation that a reasoning system can use to make inferences about the text. In the MR Program (MRP), the target ontology is given and the inferences are oriented to answer queries about a set of textual documents. Traditionally, this setting is approached by Information Extraction engines that use annotated texts to learn the mapping between the text and the entity classes and relations of the target ontology. However, in the current MRP setting, almost no annotated data is given, and the systems are expected to adapt to a new domain in a very short time. This setting introduces the need to develop new architectures able to learn from previous readings (of unannotated texts) and to leverage as much as possible the small amount of annotated data. The talk will report the current development of a system with these features.
05 Oct 2010	Eduard Hovy	Toward a Computational Theory of Semantic Content Time: 4:00 pm - 5:30 pm Location: 11th Floor Large Conference Room [1135] Abstract: Semantics has been the object of deep study for many years. Yet representation of content—the actual meaning of the symbols used in semantic propositions—is curiously absent from most of this work. This talk argues that this is so because the most useful way of conceptualizing content is not in the form of symbols but as statistical word(sense) distributions, suitably organized. Over the past few years, NLP research has increasingly treated topic signature word distributions (also called 'context vectors', 'topic models', 'language models', etc.) as a de facto replacement for semantics at various levels of granularity. Whether the task is wordsense disambiguation, certain forms of textual entailment, information extraction, paraphrase learning, and so on, it turns out to be very useful to consider a semantic unit as being defined by the distribution of word(senses) that regularly accompany it (in the classic words of Firth, "you shall know a word by the company it keeps"). This is true for semantic units of all sizes, from individual word(sense)s to sentences to text collections; the information learned and used by WSD engines closely resembles that learned by LDA and similar topic characterization engines.In this talk I argue for a new kind of semantics, which is combines traditional symbolic logic-based proposition-style semantics of the kind used in older NLP with (computation-based) statistical word distribution information (what is being called Distributional Semantics in modern NLP). The core resource is a single lexico-semantic 'lexicon' that can be used for a variety of tasks provided it is reformulated appropriately. I show how to define such a lexicon, how to build and format it, and how to use it for various tasks. The talk pulls together a wide range of related topics, including Pantel-style resources like DIRT, inferences / expectations such as those used in Schank-style expectation-based parsing and expectation-driven NLU, PropBank-style word valence lexical items, and the treatment of negation and modalities. Combining the two views of semantics seems promising but opens many questions that need study, including the operation of logical operators such as negation and modalities over word(sense) distributions, the nature of ontological facets required to define concepts, and the action of compositionality over statistical concepts.
01 Oct 2010	Liang Huang and Haitao Mi	Efficient Incremental Decoding for Tree-to-String Translation (EMNLP 2010 Practice Talk) Time: 3:30 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Syntax-based translation models should in principle be efficient with polynomially-sized search space, but in practice they are often embarassingly slow, partly due to the cost of language model integration. In this paper we borrow from phrase-based decoding the idea to generate a translation incrementally left-to-right, and show that for tree-to-string models, with a clever encoding of derivation history, this method runs in average case polynomial-time in theory, and linear-time with beam search in practice (whereas phrase-based decoding is exponential-time in theory and quadratic-time in practice). Experiments show that, with comparable translation quality, our tree-to-string system (in Python) can run more than 30 times faster than the phrase-based system Moses (in C++).
01 Oct 2010	Erica Greene	Automatic Analysis of Rhythmic Poetry with Applications to Generation and Translation (EMNLP 2010 Practice Talk) Time: 3:00 pm - 3:30 pm Location: 11th Floor Large Conference Room [1135] Abstract: We employ statistical methods to analyze, generate, and translate rhythmic poetry. We first apply unsupervised learning to reveal word-stress patterns in a corpus of raw poetry. We then use these word-stress patterns, in addition to rhyme and discourse models, to generate English love poetry. Finally, we translate Italian poetry into English, choosing target realizations that conform to desired rhythmic patterns.
27 Aug 2010	Sasha Rush	Intern Final Talk: Large-scale, High-dimensional, Discriminative Machine Translation Time: 3:00 pm - 3:30 pm Location: 11th Floor Large Conference Room [1135] Abstract: This talk summarizes my summer work on scaling a machine translationsystem to train on a large data set. Similar system are tuned withMERT on 1k sentences, we train a CRF on 100k sentences. I will discusstechniques for training, features, distributed scaling,regularization, and tuning, and give preliminary results.
27 Aug 2010	Yoav Goldberg	Intern Final Talk: Small is beautiful. Is it any good? Time: 3:30 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: This talk summarizes our experience with searching for small modelsfor syntax-based machine translation. I will first present casessuggesting that smaller models are desirable, and present someevidence that minimizing model size is a reasonable objectivefunction. I will then show cases where this objective may be tooaggressive.
25 Aug 2010	Sravana Reddy	Intern Final Talk: Towards deciphering the Voynich manuscript Time: 2:30 pm - 3:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: The Voynich manuscript is a medieval illustrated book writtenin an undeciphered script. I will present some questions and answersabout the linguistic and statistical properties of the text.
25 Aug 2010	Anni Irvine	Intern Final Talk: Making Discriminative Alignment Smarter Time: 2:00 pm - 2:30 pm Location: 11th Floor Large Conference Room [1135] Abstract: Error analysis on grammars extracted for Machine Translation showsthat bad and useless translation rules are usually caused by badalignments. In this work, we improve previous work on hierarchicaldiscriminative alignment by incorporating knowledge of foreign sideparse trees, output from other aligners, and a look-ahead to grammarextraction. We give examples and results on Chinese to Englishtranslation.
06 Aug 2010	Sasha Rush (MIT)	Dual Decomposition for Natural Language Inference Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: This talk presents dual decomposition as a general technique for NLP.The first part introduces dual decomposition as a framework forderiving inference algorithms for NLP problems. The approach relies onstandard dynamic-programming algorithms as oracle solvers forsub-problems, together with a simple method for forcing agreementbetween the different oracles. The approach provably solves a linearprogramming (LP) relaxation of the global inference problem. It leadsto algorithms that are simple, in that they use existing decodingalgorithms; efficient, in that they avoid exact algorithms for thefull model; and often exact, in that empirically they often recoverthe correct solution in spite of using an LP relaxation.The second part presents an application of dual decomposition tonon-projective parsing . We focus on parsing algorithms fornon-projective head automata, a generalization of head-automata modelsto non-projective structures. The dual decomposition algorithms aresimple and efficient, relying on standard dynamic programming andminimum spanning tree algorithms. They provably solve an LP relaxationof the non-projective parsing problem. Empirically the LP relaxationis very often tight: for many languages, exact solutions are achievedon over 98% of test sentences.The accuracy of our models is higherthan previous work on a broad range of datasets.
30 Jul 2010	William Yang Wang (Columbia)	Automatic Vandalism Detection in Wikipedia (COLING 2010 Practice Talk) Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Discriminating vandalism edits from non-vandalism edits in Wikipedia is a challenging task, as ill-intentioned edits can include a variety of content and be expressed in many different forms and styles. Previous studies are limited to rule-based methods and learning based on lexical features, lacking in deep linguistic analysis. In this talk, I will discuss a novel Web-based syntactic-semantic modeling method, which utilizes Web search results as resource and trains topic-specific n-tag and syntactic n-gram language models to detect vandalism. By combining basic task-specific and lexical features, we have achieved high F-measures using logistic boosting and logistic model trees classifiers, surpassing the results reported by major Wikipedia vandalism detection systems. This is a joint work with Prof. Kathleen McKeown at Columbia University and will appear in the oral session at COLING 2010. Bio: William Yang Wang is a graduate student at Columbia University, and heis currently visiting the NL Dialog Group at USC/ICT, working on phoneticallyaware natural language understanding and speech synthesis. In 2008-2009, he waswith the Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences.
26 Jul 2010	Hoifung Poon (University of Washington)	Statistical Relational Learning for Knowledge Extraction from the Web Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Extracting knowledge from unstructured text has been a long-standinggoal of NLP. The advent of the Web further increases its urgency bymaking available billions of online documents. To represent theacquired knowledge that is complex and heterogeneous, we needfirst-order logic. To handle the inherent uncertainty and ambiguity inextracting and reasoning with knowledge, we need probability.Combining the two has led to rapid progress in the emerging field ofstatistical relational learning. In this talk, I will show thatstatistical relational learning offers promising solutions forconquering the knowledge-extraction quest. I will present Markovlogic, which is the leading unifying framework for representing andreasoning with complex and uncertain knowledge, and has spawned anumber of successful applications for knowledge extraction from theWeb. In particular, I will present OntoUSP, an end-to-end knowledgeextraction system that can read text and answer questions. OntoUSP iscompletely unsupervised and benefits from jointly conducting ontologyinduction, population, and knowledge extraction. Experiments show thatOntoUSP extracted five times as many correct answers compared tostate-of-the-art systems, with a precision of 91%.
23 Jul 2010	Yoav Goldberg (Ben Gurion), Sravana Reddy (Chicago), and Kevin Knight	Three Mini-Talks on Creative Language Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Analyzing and generating creative language (stories, poems, jokes, etc) is a growing field within computational linguistics. We will give three short talks on the topic -- Yoav on Haiku generation, Sravana on understanding eggcorns, and Kevin on poetry translation.
07 Jul 2010	Kenji Sagae	Dynamic Programming for Linear-time Incremental Parsing (ACL 2010 Practice Talk) Time: 3:30 pm - 4:30 pm Location: 11th Floor Large Conference Room [1135] Abstract: Incremental parsing techniques such as shift-reduce have gained popularity thanks to their efficiency, but there remains a major problem: the search is greedy and only explores a tiny fraction of the whole space (even with beam search) as opposed to dynamic programming. We show that, surprisingly, dynamic programming is in fact possible for many shift-reduce parsers, by merging "equivalent" stacks based on feature values. Empirically, our algorithm yields up to a five-fold speedup over a state-of-the-art shift-reduce dependency parser with no loss in accuracy. Better search also leads to better learning, and our final parser outperforms all previously reported dependency parsers for English and Chinese, yet is much faster.
02 Jul 2010	Zornitsa Kozareva	Learning Arguments and Supertypes of Semantic Relations using Recursive Patterns (ACL 2010 Practice Talk) Time: 3:30 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: A challenging problem in open information extraction and text mining is the learning of the selectional restrictions of semantic relations. We propose a minimally supervised bootstrapping algorithm that uses a single seed and a recursive lexico-syntactic pattern to learn the arguments and the supertypes of a diverse set of semantic relations from the Web. We evaluate the performance of our algorithm on multiple semantic relations expressed using "verb", "noun" and "verb prep" lexico-syntactic patterns. We embark on human based evaluation to assess the quality of the harvested information and find out that the overall accuracy of our algorithm is 90%. We also compare our results with existing knowledge base outlining the similarity and differences of the granularity and diversity of the harvested knowledge.
02 Jul 2010	Ashish Vaswani	An MDL-Inspired Objective Function for Unsupervised Training of Generative Models (ACL 2010 Practice Talk) Time: 3:00 pm - 3:30 pm Location: 11th Floor Large Conference Room [1135] Abstract: The Minimum Description Length (MDL) principle is a method for modelselection that trades off between the explanation of the data by the modeland the complexity of the model itself. Inspired by the MDL principle, wedevelop an objective function for generative models that captures thedescription of the data by the model (log-likelihood) and the description ofthe model (model size). We also develop a efficient general search algorithmbased on the MAP-EM framework to optimize this function. Since recent workhas shown that minimizing the model size in a Hidden Markov Model forpart-of-speech (POS) tagging leads to higher accuracies, we test ourapproach by applying it to this problem. The search algorithm involves asimple change to EM and achieves high POS tagging accuracies on both Englishand Italian data sets.
30 Jun 2010	Jonathan May	Efficient Inference Through Cascades of Weighted Tree Transducers (ACL 2010 Practice Talk) Time: 4:00 pm - 4:30 pm Location: 11th Floor Large Conference Room [1135] Abstract: Weighted tree transducers have been proposed as useful formal models for representing syntactic natural language pro- cessing applications, but there has been little description of inference algorithms for these automata beyond formal foundations. We give a detailed description of algorithms for application of cascades of weighted tree transducers to weighted tree acceptors, connecting formal theory with actual practice. Additionally, we present novel on-the-fly variants of these algorithms, and compare their performance on a syntax machine translation cascade based on (Yamada and Knight, 2001).
30 Jun 2010	Jason Riesa	Hierarchical Search for Word Alignment (ACL 2010 Practice Talk) Time: 3:30 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: We present a simple yet powerful hierarchical search algorithm for automatic word alignment. Essentially, we treat word alignment as a parsing problem, and induce a forest of alignments from which we can efficiently extract a ranked k-best list. We score a given alignment within the forest with a flexible, linear discriminative model incorporating hundreds of local and nonlocal features features, trained on a relatively small amount of annotated data. We report results on Arabic-English word alignment and translation tasks. Our model outperforms a GIZA++ Model-4 baseline by 6.3 points in F-measure, yielding a 1.1 BLEU score increase over a state-of-the-art syntax-based machine translation system.
11 Jun 2010	Yoav Goldberg (Ben Gurion University of the Negev)	Easy First Dependency Parsing and How Different Parsers Behave Differently Time: 3:30 pm - 4:30 pm Location: 11th Floor Large Conference Room [1135] Abstract: I will present a new kind of dependency parsing algorithm: easy first,non directional dependency parsing. This is a greedy, bottom upparser, admitting an efficient O(nlogn) implementation. Unlikeshift-reduce based greedy parsers, it does not analyze the sentence ina fixed sequential order, but instead tries to make easier attachmentdecisions between harder ones. The parser performs well on bothHebrew and English. I also present evidence that the parsers producesqualitatively different parses than either the Malt or the MSTparsers. This observation give rise to an intriguing questions: whydo different parsers produce different parses? can we quantify thiskind of difference? In the second part of the talk I will present myattempts to answer these kinds of questions.
10 Jun 2010	Mark Johnson (Macquarie University)	"Bayesian models of language acquisition" or "Where do the rules come from?" (continued from 7 Jun 2010) Time: 4:00 pm - 5:00 pm Location: 10th Floor Conference Room Abstract: This talk will be a continuation of topics from Monday's talk.
09 Jun 2010	Steven Bird (University of Melbourne)	The Human Language Project: Building a Universal Corpus of the World's Languages Time: 3:30 pm - 4:30 pm Location: 10th Floor Conference Room Abstract: We present a grand challenge to build a corpus that will include all of the world's languages, in a consistent structure that permits large-scale cross-linguistic processing, enabling the study of universal linguistics. The focal data types, bilingual texts and lexicons, relate each language to one of a set of reference languages. We propose that the ability to train systems to translate into and out of a given language be the yardstick for determining when we have successfully captured a language. We call on the computational linguistics community to begin work on this Universal Corpus, pursuing the many strands of activity described here, as their contribution to the global effort to document the world's linguistic heritage before more languages fall silent.(This talk will present joint work with Steve Abney.)Brief Bio: Steven Bird is Associate Professor in the Department of ComputerScience and Software Engineering at the University of Melbourne, andalso Senior Research Associate at the Linguistic Data Consortium. In2009 he served as president of the Association for ComputationalLinguistics, and he completed a textbook on Natural LanguageProcessing, published by O'Reilly. Steven studies scalable,semi-automatic methods for analyzing spoken and written language, andfor preserving endangered languages. This involves a mixture ofcomputational modelling and linguistic fieldwork. For further detailsand online publications, please visit http://stevenbird.me/
08 Jun 2010	Reut Tsarfaty (Uppsala University)	Morphology in Parsing: A Taxonomy-Based Approach Time: 10:30 am - 11:30 am Location: 10th Floor Conference Room [1026] Abstract: It has been a prominent empirical fact in the last decade that languages which have properties that are different from those of English, for instance, languages with free word-order and rich morphological structure, do not lend themselves naturally to the application of statistical models developed for processing English. In this talk I focus on the parsing task and based on the kind of correspondence patterns between form and function that characterize richly inflected languages, I aim to identify the properties of models that can successfully cope with parsing such structures. I start by demonstrating complex many-to-many correspondence patterns in Natural Language using data from the Semitic language Modern Hebrew. I review properties of prominent models for morphological analysis (Stump 2001), and isolate the ones that are appropriate for modeling such complex patterns. I then propose to apply the same strategy to the syntactic domain, arguing that this provides not only for a streamlined interface to morphology, but also better yields a better framework for capturing morphosyntactic interactions on the whole. I illustrate this approach via a particular instantiation, the relational-realizational model of (Tsarfaty 2010), applied to parsing Modern Hebrew. I report significant improvements on various measures over competing alternatives and previously reported results. I finally suggest that other modeling frameworks may often be enhanced to cope better with rich morphosyntactic phenomena, by similarly analyzing their underlying properties and enhancing their relational, or realizational, component, accordingly. Speaker website:http://stp.lingfil.uu.se/~tsarfaty/
07 Jun 2010	Mark Johnson (Macquarie University)	"Bayesian models of language acquisition" or "Where do the rules come from?" Time: 2:00 pm - 3:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Each human language contains an astronomically large (if not unbounded) number of different sentences. How can something so large and complex possibly be learnt? Over the past decade and a half we've figured out how to define probability distributions over grammars and the linguistic structures they generate, opening up the possibility of Bayesian models of language acquisition. Bayesian approaches are particularly attractive because they can exploit "prior" (e.g., innate) knowledge as well as statistical generalizations from the input. This opens the possibility of an empirical evaluation of the utility of various kinds of innate knowledge. Structured statistical learners have two major advantages over other approaches. First, because the generalizations they learn and the prior knowledge they utilize are both expressed in terms of explicit linguistic representations, it is clear what is learnt and what information is exploited during learning. Second, because of the "curse of dimensionality", learners that identify and exploit structural properties of their input seem to be the only ones that have a chance of "scaling up" to learn real languages. This talk describes Bayesian methods for learning Context-Free Grammars and a generalization of them that we call Adaptor Grammars, and applies them to problems of morphological acquisition and word segmentation.Joint work with Tom Griffiths (Berkeley) and Sharon Goldwater (Edinburgh)Speaker Bio: Mark Johnson is a Professor of Language Science (CORE) in the Department of Computing at Macquarie University. He was awarded a BSc (Hons) in 1979 from the University of Sydney, an MA in 1984 from the University of California, San Diego and a PhD in 1987 from Stanford University. He held a postdoctoral fellowship at MIT from 1987 until 1988, and has been a visiting researcher at the University of Stuttgart, the Xerox Research Centre in Grenoble, CSAIL at MIT and the Natural Language group at Microsoft Research. He has worked on a wide range of topics in computational linguistics, but his main research area is parsing and its applications to text and speech processing. He was President of the Association for Computational Linguistics in 2003, and was a professor from 1989 until 2009 in the Departments of Cognitive and Linguistic Sciences and Computer Science at Brown University. Professor Johnson's research area is computational linguistics, i.e., explicit computational models of language acquisition, comprehension and production. His recent work has focused on probabilistic models for syntactic parsing (identifying the way words combine to form phrases and sentences) and semantic interpretation, and on Bayesian models of the acquisition of phonology, morphology and the lexicon.
21 May 2010	Zornitsa Kozareva	Not All Seeds Are Equal: Measuring the Quality of Text Mining Seeds Time: 3:00 pm - 3:30 pm Location: 11th Floor Large Conference Room [1135] Abstract: Open-class semantic lexicon induction is of great interest for the current knowledge harvesting algorithms. We propose a general framework that uses patterns in bootstrapping fashion to learn open-class semantic lexicons for different kinds of relations. These patterns require seeds. To estimate the /goodness/ (the potential yield) of new seeds, we introduce a regression model that considers the connectivity behavior of the seed during bootstrapping. The generalized regression model is evaluated on six different kinds of relations with over 10000 different seeds for English and Spanish patterns. Our approach reaches robust performance of 90% correlation coefficient with 15% error rate for any of the patterns when predicting the /goodness/ of seeds.
19 May 2010	Jinho D. Choi (University of Colorado)	K-best, Transition-based Dependency Parsing using Robust Risk Minimization and Automatic Feature Reduction Time: 3:30 pm - 4:30 pm Location: 11th Floor Large Conference Room [1135] Abstract: In this paper, we introduce a way of improving the parsing accuracy of a transition-based dependency parsing model by using k-best ranking. Our approach uses a broader search space than beam search, yet keeps the parsing complexity near a quadratic average running time. In addition, we take a simple post-processing step to ensure the parsing output is a connected dependency tree. As an oracle, we use a high-performing but relatively under-explored machine learning algorithm, Robust Risk Minimization, which gives a higher parsing accuracy than the Perceptron algorithm in the experiments. We also use an automatic feature reduction technique that reduces the feature space by about 49% without compromising the parsing accuracy. We evaluate our approach on the CoNLL '09 shared task English data and improve the transition-based dependency parsing accuracy, showing a 0.64% higher accuracy than the best transition-based CoNLL '09 system.
30 Apr 2010	Walter Daelemans (University of Antwerp)	Robust features for Computational Stylometry Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Computational stylometry is the automatic assignment ofauthor properties (e.g., identity, gender, personality,region, age, period, ideology, ...) to a text. Applicationsrange from forensic use to literary scholarship. Themethodology currently most successful is based on the wellknown approach to text categorization using trainingdata in the form of texts with known classes. The approachworks by extracting text features, selecting the best onesusing statistical methods, representing the text as a vectorof these features, and applying machine learning methods tothe resulting vectors with associated classes. The maindifference with the original text categorization approach isthat the extracted text features may be complex andlinguistically motivated (e.g. syntactic features).I will describe some recent applications at the Universityof Antwerp using this methodology: personality detection,author assignment with many authors and short texts, scribedetection in medieval texts, provenance and ideology detectionin Kenyan news articles, etc.I will then focus on an empirical comparison of therobustness of different feature types in differentsituations. Bio: Walter Daelemans (PhD in Computational Linguistics, University of Leuven, 1987). Trained as a linguist and psycholinguist at the Universities of Antwerpen and Leuven, he specialised in computational linguistics and held research posts at the University of Nijmegen and the AI Lab of the University of Brussels before becoming a lecturer in Computational Linguistics and Artificial Intelligence at Tilburg University where he founded an early research group on machine learning of language (ILK). Since 1999 he is full-time professor at the University of Antwerp where he also heads the computational linguistics group within the CLiPS research centre. His mainresearch interests are in machine learning of language (especially memory-based learning), text analytics, and computational psycholinguistics. He co-founded ACL's Special Interest Group on Natural Language Learning (SIGNLL) and its associated conference and shared task series (CoNLL).
16 Apr 2010	Rutu Mulkar-Mehta	Understanding Granularity in Natural Language Discourse (Ph.D. Proposal practice talk) Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Granularity is the task of breaking down a complex description into simpler concepts of finer detail, such that each of the simpler concepts can be collectively describe the main description. It can be thought of as a hierarchy of varying levels of information, with fine grained and specific information i.e. information with more detail at lower levels, and coarse grained and generic information i.e. information with less detail, at higher levels. Shifting in granularity from lower to higher levels leads to information loss or abstraction of certain fine details which become irrelevant at that level. Similarly, shifting granularity from a coarse level to a fine level involves more specific details as compared to the level above this.Humans can seamlessly shift between various granularity levels when interpreting discourse. Textual descriptions are usually written such that the reader gets to know the key features of fine-grained events, and then theoverall picture from the coarse-grained description of a process. This thesis proposal is towards identification and extraction of such structures from Natural Language Discourse.
14 Apr 2010	Jonathan May	Weighted Tree Automata and Transducers for Syntactic Natural Language Processing (Ph.D. Defense practice talk) Time: 4:00 pm - 5:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Weighted finite-state string transducer cascades are a powerful formalism for models of solutions to many natural language processing problems such as speech recognition, transliteration, and translation. Researchers often directly employ these formalisms to build their systems by using toolkits that provide fundamental algorithms for transducer cascade manipulation, combination, and inference. However, extant transducer toolkits are poorly suited to current research in NLP that makes use of syntax-rich models. More advanced toolkits, particularly those that allow the manipulation, combination, and inference of weighted extended top-down tree transducers, do not exist. In large part, this is because the analogous algorithms needed to perform these operations have not been defined. This thesis solves both these problems, by describing and developing algorithms, by producing an implementation of a functional weighted tree transducer toolkit that uses these algorithms, and by demonstrating the performance and utility of these algorithms in multiple empirical experiments on machine translation data.
05 Apr 2010	Satoshi Sekine (NYU)	On-Demand Information Extraction and Knowledge Discovery Time: 10:30 am - 11:30 am Location: 11th Floor Large Conference Room [1135] Abstract: At present, adapting an Information Extraction system to new topics isan expensive and slow process, requiring some knowledge engineering foreach new topic. We propose a new paradigm of Information Extractionwhich operates 'on demand' in response to a user's query. On-demandInformation Extraction (ODIE) aims to completely eliminate thecustomization effort. Given a user's query, the system willautomatically create patterns to extract salient relations in the textof the topic, and build tables from the extracted information usingparaphrase discovery technology. It relies on recent advances in patterndiscovery, paraphrase discovery, and extended named entity tagging.I will show you a demo system, which produce a table in less than aminute for any give queries.I will also explain the need of linguistic knowledge and introduce someweakly supervised learning methods. I will show a demo of the ngramsearch engine, which extracts ngrams and sentences which match to aquery with arbitrary wildcard.Also, I will give a brief introduction about the Web People Search.It is a task to disambiguate search results of people name and peopleattribute extraction task. We organized WePS1 and 2, and currentlystarted the third evaluation, which includes 2 tasks: 1) the combinedtask of people disambiguation and attribute extraction and 2)organization disambiguation from twitter messages.Brief Bio: Satoshi Sekine is an Research Associate Professor at New York University.He received his MSc at UMIST, UK in 1992 and his PhD in 1998 at NYU. Hehas been working on various topics, including parsing, NE, InformationExtraction and minimally supervised knowledge discovery. He edited a bookabout NE from John Benjamins, organized a JHU summer workshop 2009,WePS task, NSF symposium on Semantic Knowledge Discovery, Organizationand Use in 2008, workshop on Textual Entailment and Parsing 2007 and so on.
02 Apr 2010	Eduard Hovy	Annotation Time: 3:00 pm - 4:30 pm Location: 11th Floor Large Conference Room [1135] Abstract: Despite a lot of recent attention, corpus annotation remains somewhat of an art. This talk is the main part of a tutorial intended to provide the attendee with an in-depth look at the procedures, issues, and problems in corpus annotation. After describing some currently available resources, services, and frameworks (including the QDAP annotation center, Amazon's Mechanical Turk, annotation facilities in GATE, and UIMA), it addresses the open questions, pitfalls, and problems that the annotation manager should avoid, highlighting the seven major issues at the heart of annotation for which there are as yet no standard and fully satisfactory answers or methods. For each of these it provides suggestions and a possibly helpful list of references.Your participation in order to critique the tutorial is appreciated!
31 Mar 2010	Haitao Mi (ICT China)	Lattice and Forest for SMT Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Statistical machine translation (SMT) has witnessed promising progress in recent years. Typically, an SMT system is characterized as a single-best pipeline, whose modules are independent to each other and only take as input single-best results from the previous module. With this assumption, each module will inevitably introduce errors in single-best outputs, which will propagate and accumulate along the pipeline, and eventually hurt the translation quality.In order to alleviate this problem, we use compact structures such as lattices and forests instead of single-best results in each module, and then integrate both lattice and forest into a single tree-to-string system. We explore the algorithms of lattice parsing, lattice-forest-based rule extraction and decoding. Experiments show a statistically significant improvement over a start-of-the-art forest-based baseline. More interestingly, we observe a significant reduction in rule-set size when extracting with a lattice, which implies better generalizability (with a smaller model).About the speaker:Haitao Mi is an Assistant Researcher in the Institute of Computing Technology, Chinese Academy of Sciences (CAS/ICT). He received his Ph.D. from CAS/ICT in 2009. His main research interests include syntax-based machine translation and statistical parsing. Additional information about him and his group can be found at http://nlp.ict.ac.cn/~mihaitao/
30 Mar 2010	Victoria Fossum	Integrating Parsing and Word Alignment in Syntax-Based Machine Translation (Ph.D. Defense practice talk) Time: 4:00 pm - 5:00 pm Location: 11th Floor Conference Room [1135] Abstract: Training a string-to-tree syntax-based statistical machine translationsystem to translate from a source language (e.g. Chinese or Arabic)into a target language (e.g. English) requires the followingresources: a parallel corpus (a large set of example sentences in thesource language that have been translated into the target language bya human); a word alignment (a word-to-word correspondence between eachsource-target sentence pair); and a parse tree (a syntacticrepresentation) of each sentence in the target language. From thesetraining examples, the system learns to translate source-languagesequences of words into target-language trees. In order to ensurebroad coverage, the parallel corpus of training examples must besufficiently large (on the order of millions of sentence pairs).Manually annotating such large corpora would be prohibitivelytime-consuming. Instead, these corpora must be word-aligned andparsed automatically.There are two problems with existing approaches to automatic wordalignment and parsing for syntax-based machine translation. First,these processes are noisy and introduce errors which impacttranslation quality. Second, these processes are typically performedindependently of one another. Since each process produces constraintsthat can be used to guide the other, by more closely integrating them,we can expect to improve the accuracy of each process. In thisthesis, we address these two problems as follows: first, we improveupon the accuracy of a state-of-the-art parser; second, we use wordalignments to improve parse accuracy; third, we use parses to improveword alignment accuracy; and fourth, we optimize parses and wordalignments simultaneously. We examine the impact of each of thesemethods upon parse quality, alignment quality, and translation qualityin a downstream syntax-based machine translation system. Our resultsdemonstrate that more closely integrating word alignment and syntacticparsing can indeed improve the accuracy of each process, and in somecases leads to an improvement in translation quality relative to astate-of-the-art syntax-based statistical machine translation system.
26 Mar 2010	Elsi Kaiser (USC)	Discourse coherence effects in language processing: A psycholinguistic approach Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: In this talk I will discuss some recent results from my lab on the relationship between reference resolution and coherence relations. Previous work found that pronoun interpretation is guided by the coherence relations between clauses (e.g., 'as a result', 'and then', 'and similarly'), e.g. Hobbs (1979), Kehler et al. (2008). For example, consider "Phil tickled Stan, and similarly Liz poked him" (preference to interpret 'him' as Stan) and "Phil tickled Stan, and as a result Liz poked him" (more consideration of Phil as the antecedent of 'him'). However, the linguistic and cognitive properties of these coherence representations are not yet fully understood, and it is also not yet clear whether this kind of coherence sensitivity extends straightforwardly to other kinds of reduced referring expressions in addition to pronouns (e.g. anaphoric demonstratives, which can in many languages be used to refer to humans as well). I will discuss experiments -- conducted using a visual-world eye-tracking paradigm as well as other methods -- that investigate the nature and generality of these coherence representations. In addition to investigating whether coherence effects extend to other reduced referring expressions, I have also explored the domain-generality of coherence representations, for example whether non-linguistic, visuo-spatial input (video clips of moving shapes) can prime (bias) subsequent reference resolution in a seemingly unrelated task. Time permitting, I will also discuss issues related to data analysis and the annotation of data collected through psycholinguistic experiments.Brief bio: Elsi Kaiser is an Assistant Professor of Linguistics at the University ofSouthern California, with a specialization in Psycholinguistics. Shereceived her Ph.D. from the University of Pennsylvania in 2003, and was apost-doc at the University of Rochester for two years before moving toUSC. Her current research focuses on the comprehension of variousreferential forms (including pronouns, reflexives and demonstratives) indifferent languages, which she investigates using a range of tools,including eye-tracking.
05 Mar 2010	Liang Huang	Incremental Parsing Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: (a 20-minute version of this talk was given at the ISD retreat, with no technical details.)Parsing is the task of finding the most probable interpretation for a given sentence, and is a central problem in NLP because it serves as the basis of many downstream applications such as machine translation, summarization, paraphrasing, and question answering. Improving parsing efficiency and accuracy will greatly improve the applicability of those applications.However, unlike human parsing which is amazingly efficient by scanning the sentence incrementally, current state-of-the-art parsers are either extremely slow (standard algorithms like CKY scale cubically with sentence length), or purely greedy in the search algorithm that only touches a tiny fraction of the (exponentially) large search space. We instead propose a dynamic programming algorithm that does incremental parsing and ambiguity packing along the way, such that the running time is (almost) linear, and yet searches over exponentially many trees. Empirical results are very good, but further details withheld -- come to the talk! This is a joint work with Kenji Sagae, USC/ICT.
05 Feb 2010	David Farwell (Universitat Politecnica de Catalunya)	Knowledge Acquisition and Textual Entailment: a proposed research program Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: The aim of this presentation is to describe a program of research in the area of automatic knowledge acquisition which has been submitted in response to the European Information and Communication Technologies FP7 Call 5, Objective 4.3: Intelligent Information Management. The objective of this research program is to develop data-driven techniques and tools for extracting common sense knowledge from unstructured text and applying it for making the approximate inferences needed in order to interpret the ambiguities of human language communication.The central activities include developing techniques and tools for:- converting texts into representations of the particular events and entities they refer to,- identifying relations between these entity and event instances such as shared participants, temporal and spatial juxtapositions, causal connections, entailments, and so on, thereby constructing representations of complex scenarios,- inducing from sets of like entity, event and scenario instances, representations of entity, event and scenario types,- using these entity, event and scenario types as background knowledge to support approximate inferencing (e.g., statistical inference rules such as poisoning probably entails death) within important interactive tasks such as Information Retrieval and web search. The technologies developed will be validated by applying them to two broad NLU tasks: faceted search for Information Retrieval in the domain of health information and open-domain web search for web browsing and UI improvements.
22 Jan 2010	David Chiang	Towards Tree-to-Tree Translation Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Statistical translation models that try to capture the recursive structure of language have been widely adopted over the last few years. These models make use of varying amounts of information from linguistic theory: some use none at all, some use information about the grammar of the target language, some use information about the grammar of the source language. But progress has been slower on tree-to-tree translation models: models that are able to learn the relationship between the grammars of both the source and target language. I will discuss the reasons why tree-to-tree translation has been a challenge, review existing attempts at tree-to-tree models, and present some of our own work-in-progress on robustly modeling source and target language syntax for significant improvements in translation quality.
15 Jan 2010	Min-Yen Kan (National University of Singapore)	ForeCite: towards a more integrated scholarly digital library Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Scholarly digital libraries (DLs) have managed to scale upto handle millions of documents and now feature tools to trackcitations and references between articles. However, users of digitallibraries typically often access the DL merely to check references or todownload the PDF of the document. What features will thenext-generation DL need to inspire scholars to use digital library formore than accessing the document? In ForeCite, our digital libraryproject at NUS, we believe part of the answer lies in integratingcommon end user's concerns: annotation, sharing, off-and-online usageand focusing on the intra-document processing. I will describe anddemonstrate some of the preliminary components of the ForeCite system:including its web based front end, ParsCit (a backend open-sourcecitation segmentation system), and ForeCiteNote (TiddyWiki basedresearch notetaking system) and ForeCiteReader (Google Books-likeinterface for annotation and collaboration on notetaking, and FireCite(browser extension for recognizing citations on webpages).Speaker Bio: Min-Yen Kan (BS;MS;PhD Columbia Univ.) is an associate professor atthe National University of Singapore. His research interests includedigital libraries and applied natural language processing. Specificprojects include work in the areas of scientific discourse analysis,multiword expression extraction and understanding, machine translationand applied text summarization. Currently, he is an associate editorfor "Information Retrieval" and is the Editor for the ACL Anthology,the computational linguistics community's largest archive of publishedresearch. More information about him and his group can be found at theWING homepage: http://wing.comp.nus.edu.sg/
11 Dec 2009	Anselmo Penas (UNED, Spain)	Evaluating Question Answering Validation Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: During the last decade, Question Answering (QA) was redefined inside TREC as a kind of highly-precision-oriented Information Retrieval task where the introduction of NLP was necessary, specially for Answer Extraction purposes. The same general approach was activated at the Cross-Language Evaluation Forum (CLEF) in 2003, but for other European languages different than English, and with some different settings and subtasks. The talk will report the last 4-year cycle of the QA evaluation at CLEF, starting with the general methodology for long term QA evaluation at CLEF and the motivation for the Answer Validation task, continuing with the development of AVE in the three year campaign, and concluding with the goals, evaluation measure and results of the current QA evaluation setting after the AVE experience.
09 Dec 2009	Tomohide Shibata (Kyoto University)	Introduction of Our Research (text analysis and IR) Time: 3:30 pm - 4:30 pm Location: 11th Floor Large Conference Room [1135] Abstract: I am Tomohide Shibata, an assistant professor at Kyoto University,Japan. I am working with Prof. Kurohashi. I have been visitingProf. Hovy for three weeks. In this talk, I introduce ourresearch. Our research roughly consists of three fields: basic textanalysis, information retrieval and machine translation. Among them,basic text analysis and information retrieval, which I am engaged in,are introduced.In basic text analysis, we have been developed Japanese morphologicalanalyzer and parser, which are widely used in research community. Caseframes, which describe the relation between a verb and its casecomponents, are automatically constructed from a large Webcorpus. Synonym and is-a relations are automatically extracted from adictionary and Web corpus.In Information Retrieval, we are running a search engineinfrastructure called TSUBAKI. The features of TSUBAKI are (i) thesentence structure (dependency relation) is considered in the documentranking, and (ii) the expression divergence between a query and adocument is assimilated. We are also running a search resultclustering system based on TSUBAKI.
04 Dec 2009	Donald Metzler (Yahoo! Research)	Learning Query Concept Importance Using a Weighted Dependence Model Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Modeling query concepts through term dependencies has been shown to have a significant positive effect on retrieval performance, especially for tasks such as Web search, where relevance at high ranks is particularly critical. Most previous work, however, treats all concepts as equally important, an assumption that often does not hold, especially for longer, more complex queries. In this talk, I will describe the state-of-the-art practices for modeling query term dependencies for information retrieval using Markov random fields. Within this context I will discuss why many NLP-inspired approaches to the problem, such as query segmentation, have failed to show consistent improvements when applied to information retrieval tasks. Experimental results carried out on a number of TREC and Yahoo! Web search test collections will be presented showing the effectiveness of various types of term (in)dependence models. Brief bio:Donald Metzler is a Research Scientist in the Search and Computational Advertising group at Yahoo! Research. He obtained his Ph.D. from the University of Massachusetts in 2007. He is an active member of the information retrieval and web search communities, having served on the program committees of SIGIR, ECIR, HLT, EMNLP, WSDM, WWW, and ICML. He has published over 35 research papers, has 13 patents pending, and is the co-author of Search Engines: Information Retrieval in Practice. His research interests include information retrieval, web search, computational advertising, and applications of machine learning to large-scale text problems.
20 Nov 2009	Marco Pennacchiotti (Yahoo! Research)	Entity Extraction via Ensemble Semantics Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: In this talk I will present Ensemble Semantics (ES), a new general framework for information extraction developed at Yahoo!, that combines multiple sources of information and extractors. The ES framework is based on the hypothesis that although distributional and pattern-based extraction algorithms are complementary, they do not exhaust the semantic space; other sources of evidence can be leveraged to better combine them. In this presentation, I will focus on a specific implementation of ES for the task of entity extraction. I will report experimental results showing large gains in performance, by combining state-of-the-art distributional and pattern-based systems with a large set of features from a document webcrawl, one year of query logs, and a snapshot of Wikipedia. I will also propose an analysis of feature correlations and interactions showing the value of the different feature sets. I will conclude discussing some issues that can impact on the overall performance of entity extraction algorithms.
23 Oct 2009	Steve DeNeefe	Tree Adjoining Machine Translation (thesis proposal practice talk) Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Tree Adjoining Grammars have well-known advantages but are typically considered too difficult for practical systems. We propose that, when done right, adjoining improves translation quality without becoming computationally intractable. Using adjoining to model optionality allows general translation patterns to be learned without the clutter of endless variations of optional material. The appropriate modifiers can later be spliced in as needed to translate details.In this proposal, we describe challenges encountered by phrase-based and syntax-based machine translation (MT) systems today, and present an in-depth, quantitative comparison of both models. Then, we describe a novel model for statistical MT which addresses these challenges using a Synchronous Tree Adjoining Grammar. We introduce a method of converting these grammars to a weakly equivalent tree transducer for decoding. And we present a method for learning the rules and associated probabilities of this grammar from aligned tree/string training data.Finally, our initial results show that adjoining already delivers an end-to-end improvement of +0.8 BLEU over a baseline statistical syntax-based MT model on a medium-scale Arabic/English MT task. Furthermore, we demonstrate it is a competitive entry in the Urdu-English track of the 2009 NIST MT evaluation. We then propose improvements to the model, decoding, and extraction that promise to allow this new, linguistically-motivated MT model to surpass its syntax-based and phrase-based cousins in a wide range of scenarios and language pairs.
21 Oct 2009	Douglas W. Oard (Maryland)	Who 'Dat? Identity resolution in large email collections Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: Automated techniques that can support the human activities of search andsense-making in large email collections are of increasing importance for abroad range of uses, including historical scholarship, law enforcement andintelligence applications, and lawyers involved in "e-discovery" incidentto civil litigation. In this talk, I'll briefly describe some of the workto date on searching large email collections, and then for most of thetalk I will focus on the more challenging task of support forsense-making. Specifically, I'll describe joint work with Tamer Elsayedto automatically resolve the identity of people who are mentionedambiguously (e.g., just by first name) in a collection of email from afailed corporation (Enron). Our results indicate that for people who arewell represented in the collection we can use a generative model to guessthe right identity about 80% of the time, and for others we are rightabout half the time. I'll conclude the talk with a few remarks on ournext directions for techniques, evaluation, and additional types ofcollections to which similar ideas might be applied.About the Speaker: Douglas Oard is an Associate Professor at the University of Maryland,College Park, with joint appointments in the College of InformationStudies and the Institute for Advanced Computer Studies; he is onsabbatical at Berkeley's iSchool for the Fall 2009 semester. Dr. Oardearned his Ph.D. in Electrical Engineering from the University ofMaryland, and his research interests center around the use of emergingtechnologies to support information seeking by end users. His recent workhas focused on interactive techniques for cross-language informationretrieval and techniques for search and sense-making in conversationalmedia. Additional information is available athttp://www.glue.umd.edu/~oard/.
09 Oct 2009	Nandakishore Kambhatla (IBM India)	Extracting Social Networks and Biographical Facts from Conversational Speech Transcripts Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: We present a general framework for automatically extracting social networks and biographical facts from conversationalspeech. Our approach relies on fusing the output produced by multiple information extraction modules, including entityrecognition and detection, relation detection, and event detection modules. We describe the specific features and algorithmicrefinements effective for conversational speech. These cumulatively increase the performance of social networkextraction from 0.06 to 0.30 for the development set, and from 0.06 to 0.28 for the test set, as measured by f-measure on theties within a network. The same framework can be applied to other genres of text -- we have built an automatic biographygeneration system for general domain text using the same approach.--Brief Bio: Nanda Kambhatla has nearly 17 years of research experience in the areas ofNatural Language Processing (NLP), text mining, information extraction, dialog systems, andmachine learning. He holds 6 U.S patents and has authored over 30 publications in books,journals, and conferences in these areas. Nanda holds a B.Tech in Computer Science and Engineeringfrom the Institute of Technology, Benaras Hindu University, India, and a Ph.D in ComputerScience and Engineering from the Oregon Graduate Institute of Science & Technology, Oregon, USA.Currently, Nanda is the manager of the Data Analytics Group at IBM's India Research Lab (IRL), Bangalore. The group is focused on research on machine translation, Natural Language Processing, text analysis and machine learning techniques for developing analyticssolutions to help IBM's services divisions. Most recently, Nanda was the manager of the StatisticalText Analytics Group at IBM's T.J. Watson Research Center, the Watson co-chair of the NaturalLanguage Processing PIC, and the task PI for the Language Exploitation Environment (LEE) subtaskfor the DARPA GALE project. He has been leading the development of information extractiontools/products and his team has achieved top tier results in successive Automatic Content Extraction(ACE) evaluations conducted by NIST for extracting entities, events and relations from text frommultiple sources, in multiple languages and genres.Earlier in his career, Nanda has worked on natural language web-based and spoken dialog systems at IBM. Before joining IBM, he has worked on information retrieval and filtering algorithms as a senior research scientist at WiseWire Corporation, Pittsburgh and on image compression algorithms while working as a postdoctoral fellow under Prof. Simon Haykin at McMaster University, Canada. Nanda's research interests are focused on NLP and technology solutions for creating, storing, searching, and processing large volumes of unstructured data (text, audio, video, etc.) and specifically on applications of statistical learning algorithms to these tasks.
11 Sep 2009	David Chiang	Tutorial on HPC Time: 3:00 pm - 4:00 pm Location: 11th Floor Large Conference Room [1135] Abstract: This tutorial will be a short introduction to using the Linux cluster atUSC's High-Performance Computing (HPC) facility. Topics will include:(1) basics of starting jobs on the cluster using Torque/PBS,(2) dealing with common problems like jobs not starting orspontaneously dying,(3) maximizing the performance of your jobs (both yours and otherpeople's), e.g., using the correct filesystem and tuning it for better speed,(4) embarrassingly parallel processing and poor-man's workflows.It will NOT coverHadoop,MPI,real workflow management tools like Condor.
28 Aug 2009	Adam Pauls (UC Berkeley) Michael Auli (Edinburgh)	Intern Final Talks Time: 3:00 pm - 4:00 pm Location: 11 Large Abstract: Tree-to-String Alignment ModelsMachine translation systems typically rely on some form alignment as apreprocessing step. Typically, these alignments take the form ofword-to-word alignments. In this talk, we will introduce severalmodels aimed at aligning foreign words to either English words ornodes in the English parse tree. Such word-to-node alignments offerseveral potential advantages over traditional word-to-wordalignments. Firstly, since the extraction process for some syntacticsystems explicitly considers the English trees, we expect that alsoconsidering the trees at alignment time will produce alignments thatwill better suit the extraction process. Secondly, aligning foreignfunction words to English tree nodes can admits highly desirablesyntactic transfer rules which cannot be directly as word-to-wordalignments. Finally, word-to-node alignments can effectively modelmany-to-one alignments. We present four models of increasingcomplexity and show preliminary results for each model.
27 Aug 2009	Erica Greene (Haverford) Paramveer Dhillon (Penn)	Intern Final Talks Time: 3:00 pm - 4:00 pm Location: 11 Large Abstract: TALK 1: Erica GreeneTitle: A Statistical Foray into PoetryAbstract: Although the analysis and generation of poetry is often considered anexclusively human task, we have taken some initial steps to automatethe process. We build a series of finite state transducers to analyzepoetic meter and train them on a handmade corpus of poetry. We thenuse these trained transducers to generate poetry. Specifically, wefocus on generating sonnets and limericks.------------------------------------------TALK 2: Paramveer DhillonTitle: Learning to simplify target language for MT + Unsupervised log-linearmodels for Word AlignmentAbstract: We consider the Machine Translation task for the language pair(Chinese and English), where English is the target language. There arelots of redundancies in English language, e.g. It has capitalization,i.e. the first word of each sentence is capitalized, and it hasdifferent morphology i.e. it has noun and verb endings; none of whichare present in Chinese. In a way, due to these redundancies, we arelearning that a single Chinese word "tamen" translates to "They" and"they" and another Chinese word translates to "run", "runs" and"running". We present generative models which learn to "cluster" thetarget language vocabulary, by removing the above redundancies, namely(Capitalization and Different morphology). We show results on how this"clustering" affects the translation quality in end-to-end MTexperiments. In the last part of the talk, I would talk about using unsupervisedlog-linear(discriminative) models for improving word alignments. Thereare very few precedents of using discriminative models for wordalignment in totally unsupervised settings. (Taskar et. al. '05) and(Lacoste-Julien et. al. '06) used maximum weight bipartite matching in"nearly" unsupervised setting and (Blunsom et. al. '06) used CRFs forsupervised word alignment. We use log-linear models in totallyunsupervised settings to do word alignments. Speicifically we useContrastive Estimation (Smith et. al. '05) to shift the probabilitymass to the correct set of alignments from a well-chosen"neighborhood" of those alignments. In the end I will show somepreliminary word alignment results using our approach.
26 Aug 2009	Sujith Ravi	Natural Language Decipherment: Solving Problems in Natural Language Processing without Labeled Data (Thesis Proposal practice talk) Time: 3:00 pm - 4:00 pm Location: 11 Large Abstract: Natural Language Decipherment: Solving Problems in Natural Language Processing without Labeled Data (Thesis Proposal practice talk)A wide variety of problems in NLP require parallel data to train supervised models to perform different tasks. For example, in machine translation (where the task is to translate between two languages automatically) parallel data containing source/target language sentence pairs is required to train various models which can then be used to translate new sentences or documents. The dependency on parallel data for many of these NLP tasks limits their applications to specific domains, or language pairs for which a lot of training data is readily available. On the other hand, collecting parallel data for new domains, language pairs, etc. is a costly as well as time-intensive operation. For such tasks, the development of novel unsupervised approaches which require only {em non-parallel} data for training can enable their application to new domains and potentially broaden the impact and benefits of NLP research to wider areas.A similar problem has been tackled by cryptographers and archaeologists in a different context---for "decipherment" purposes. During the 1940's and 1950's, mathematicians and scientists worked on code-breaking operations, which spurred the development of many research ideas for modern computer science. For such problems, it is highly unlikely to assume the availability of parallel data relating the ciphertext and plaintext, yet cryptographers and archaeologists have attempted to solve such tasks using various decipherment techniques along with other non-parallel sources of information. In this thesis proposal practice talk, I will show how we combine the two ideas (decipherment and unsupervised learning for NLP problems) together and present a unified decipherment-based approach for modeling a wide range of problems in NLP. Instead of relying on parallel data, I propose to use alternate sources of linguistic knowledge and large quantities of readily available monolingual data to induce strong bilingual connections in problems such as machine transliteration and translation. The talk will describe how various NLP problems such as unsupervised part-of-speech tagging, word alignment, transliteration, and machine translation can be formulated as decipherment tasks. I will present decipherment algorithms for tackling many of these problems and show that it is possible to achieve good results for many problems of interest in NLP without using any parallel data at all.
21 Aug 2009	Liang Huang	Bilingually-Constrained (Monolingual) Shift-Reduce Parsing Time: 3:00 pm - 4:15 pm Location: 4th Floor Conference Room Abstract: Jointly parsing two languages has been shown to improve accuracies oneither or both sides. However, its search space is much bigger thanthe monolingual case, forcing existing approaches to employcomplicated modeling and crude approximations. Here we propose a muchsimpler alternative, bilingually-constrained monolingual parsing,where a source-language parser learns to exploit reorderings asadditional observation, but not bothering to build the target-sidetree as well. We show specifically how to enhance a shift-reducedependency parser to use alignment features to resolve shift-reduceconflicts. Experiments on the bilingual portion of Chinese Treebankshow that, with just 3 bilingual features, we can improve parsingaccuracies by 0.6% for both English and Chinese, with negligible (~6%)efficiency overhead, thus much faster than biparsing. http://www.cis.upenn.edu/~lhuang3/biparsing.pdf
24 Jul 2009	Adam Pauls (UC Berkeley) Ulf Hermjakob	Practice talks for EMNLP Time: 3:00 pm - 4:15 pm Location: 11 Large Abstract: K-Best A* Parsing (Adam Pauls)A* parsing makes 1-best search efficient bysuppressing unlikely 1-best items. Existing k-best extraction methods can efficiently searchfor top derivations, but only after an exhaus-tive 1-best pass. We present a unified algo-rithm for k-best A* parsing which preservesthe efficiency of k-best extraction while giv-ing the speed-ups of A* methods. Our algo-rithm produces optimal k-best parses under thesame conditions required for optimality in a1-best A* parser. Empirically, optimal k-bestlists can be extracted significantly faster thanwith other approaches, over a range of gram-mar types.------------------------------------------Improved Word Alignment with Statistics and Linguistic Heuristics (Ulf Hermjakob)We present a method to align words in a bitext that combines elementsof a traditional statistical approach with linguistic knowledge.We demonstrate this approach for Arabic-English, using an alignmentlexicon produced by a statistical word aligner, as well as linguisticresources ranging from an English parser to heuristic alignment rulesfor function words. These linguistic heuristics have been generalizedfrom a development corpus of 100 parallel sentences.Our aligner, UALIGN, outperforms both the commonly used GIZA++ alignerand the state-of-the-art LEAF aligner on F-measure and producessuperior scores in end-to-end statistical machine translation,+1.3 BLEU points over GIZA++, and +0.7 over LEAF.
23 Jul 2009	Mark Hopkins (Language Weaver)	Cube Pruning as Heuristic Search (Practice talk for EMNLP) Time: 3:00 pm - 3:45 pm Location: 11 Large Abstract: Cube pruning is a fast inexact method for generating the items of abeam decoder. Here we show that cube pruning is essentiallyequivalent to A* search on a specific search space with specificheuristics. We use this insight to develop faster and exact variantsof cube pruning.
17 Jul 2009	Paramveer Dhillon (Penn)	Transfer Learning for WSD & Non-local constraints for Named Entity Recognition Time: 3:00 pm - 4:00 pm Location: 11 Large Abstract: This talk will be divided into two parts. In the first part I willtalk about using Transfer Learning techniques to improve the task ofWord Sense Disambiguation (WSD).Usually in supervised WSD, we suffer due to paucity of labeled data asthere are some words that occur less frequently in the data and itsvery difficult to get enough labeled data for these words. In suchcases it is very difficult to build high accuracy supervised learningmodels for these words. So, we propose an approach called TransFeat(based on the MDL principle) which ``transfers information", fromsimilar words in the form of a feature relevance prior to get improvedaccuracies on these rare words. Besides this, our experiments showthat we also get decent improvement in accuracy for words that havemore amount of labeled data available. TransFeat gives accuracies thatare in the worst case comparable to state-of-the-art on ONTONOTES andSENSEVAL-2 datasets. In the second part of the talk I will talk about incorporatingnon-local constraints in Named Entity Recognition (NER) systems. Themain idea is that some linguistic constraints (e.g. every occurrenceof the word ``Einstein" in the data should have the tag PERi.e. person ) are very useful and can give improved performance butthey are non - local and hence are intractable and can not beefficiently modeled using state-of-the-art sequence modeling methodslike CRFs. Though people have used Skip-chain CRFs (with LoopyBP)(Sutton and McCallum '04) and Gibbs Sampling (Finkel and Manning'05) to enforce these non-local constraints, but they turn out to bereally inefficient and custom-tailored to one particular kind ofconstraints (say) consistency constraints of the type mentionedabove. We propose a constrained version of EM in which a general setof constraints (not limited to consistency constraints!) can beincorporated into the model. In the end I will show some results ofthis approach on CoNLL 03 English and CoNLL 02 Spanish NER shared tasks.
16 Jul 2009	Yang Liu (ICT China)	Weighted Alignment Matrices for Statistical Machine Translation Time: 10:30 am - 11:30 am Location: 11 Large Abstract: Current statistical machine translation systems usuallyextract rules from bilingualcorpora annotated with 1-best alignments. They are prone to learnnoisy rules due to alignment mistakes. We propose a new structurecalled weighted alignment matrixto encode all possible alignments for a parallel text compactly. Thekey idea is to assign a probability to each word pair to indicate howwell they are aligned. We design new algorithms for extracting phrasepairs from weighted alignment matrices and estimating theirprobabilities. Our experiments on multiple language pairs show thatusing weighted matrices achieves consistent improvements over usingn-best lists in significant less extraction time.About the speaker:Yang Liu is an Assistant Researcher at Institute of ComputingTechnology (ICT), Chinese Academy of Sciences. He received his PhDdegree in Computer Science from ICT in 2007. His major researchinterests include statistical machine translation and Chineseinformation processing. He has been working on syntax-based modeling,word alignment, and system combination. His paper on tree-to-stringtranslation won the Meritorious Asian NLP Paper Award of COLING/ACL2006. He served as Reviewers for TALIP, TSLP, JNLE, ACL, EMNLP, AMTA, and SSST.
15 Jul 2009	Yang Liu (ICT China)	An Overview of Tree-to-String Translation Models Time: 4:00 pm - 5:00 pm Location: 11 Large Abstract: Recent research on statistical machine translation has lead to the rapid development of syntax-based translation models, whichexploit syntactic information to direct translation. In this talk, Iwill give an overview of tree-to-string translation models, one of thestate-of-the-art syntax-based models. In a tree-to-string model, the source side is a phrase structure parse tree and the target side is astring. This talk includes the following topics: (1) tree-based tree-to-string model, (2) tree-sequence based tree-to-string model,(3) forest-based tree-to-string model, and (4) context-awaretree-to-string model. Experimental results show that the forest-basedtree-to-string system outperforms Hiero significantly on Chinese-to-English translation.About the speaker:Yang Liu is an Assistant Researcher at Institute of ComputingTechnology (ICT), Chinese Academy of Sciences. He received his PhDdegree in Computer Science from ICT in 2007. His major researchinterests include statistical machine translation and Chineseinformation processing. He has been working on syntax-based modeling,word alignment, and system combination. His paper on tree-to-stringtranslation won the Meritorious Asian NLP Paper Award of COLING/ACL2006. He served as Reviewers for TALIP, TSLP, JNLE, ACL, EMNLP, AMTA, and SSST.
10 Jul 2009	Kevin Knight	Excerpts from ACL-09 Tutorial on "Topics in Machine Translation" Time: 3:00 pm - 4:00 pm Location: 11 Large Abstract: Philipp Koehn and I will do a machine translation tutorial at ACL.Instead of an introductory tutorial, we'll do short 15-minute segmentson various hot topics in MT research. For the ISI NL seminar, I'llpresent 3 or 4 of those topics, determined by audience vote.
26 Jun 2009	Steve DeNeefe	Synchronous Tree Adjoining Machine Translation (Practice talk for EMNLP) Time: 3:00 pm - 3:30 pm Location: 11 Large Abstract: Tree Adjoining Grammars have well-known advantages, but are typicallyconsidered too difficult for practical systems. We demonstrate that,when done right, adjoining improves translation quality withoutbecoming computationally intractable. Using adjoining to modeloptionality allows general translation patterns to be learned withoutthe clutter of endless variations of optional material, with extrainformation spliced in as needed. In this paper, we describe a novel method for learning a type ofSynchronous Tree Adjoining Grammar and associated probabilities fromaligned tree/string training data. We introduce a method ofconverting these grammars to a weakly equivalent tree transducer forefficient decoding. Finally, we show that adjoining results in anend-to-end improvement of +0.8 BLEU over a baseline statisticalsyntax-based MT model on a large-scale Arabic/English MT task.
19 Jun 2009	Adam Pauls (UC Berkeley)	Hierarchical Search for Parsing (and Machine Translation) Time: 3:00 pm - 4:00 pm Location: 11 Large Abstract: Both coarse-to-fine and A* parsing use simple grammars to guide search incomplex ones. We compare the two approaches in a common, agenda-basedframework, demonstrating the tradeoffs and relative strengths of eachmethod. Overall, coarse-to-fine is much faster for moderate levels of searcherrors, but below a certain threshold A* is superior. In addition,we present the first experiments on hierarchical A* parsing, inwhich computation of heuristics is itself guided bymeta-heuristics. Multi-level hierarchies are helpful in bothapproaches, but are more effective in the coarse-to-fine case becauseof accumulated slack in A* heuristics.
29 May 2009	Marta Recasens Potau (Universitat de Barcelona)	Learning-based Coreference Resolution for Spanish and Catalan Time: 3:00 pm - 4:00 pm Location: 11 Large Abstract: The task of coreference resolution identifies those expressions in a text that point to the same discourse entity. Natural language applications such as information extraction, question answering and machine translation can greatly benefit from its output (the different pieces of information in connection with the same entity are linked, pronouns are disambiguated, etc.). The task is extremely complex since a number of knowledge sources come into play, from morphology to discourse structure and world knowledge. In this talk I present the results of my PhD research up to now, including the development of two 400k-word corpora for Spanish and Catalan (AnCora) annotated at various levels (morphology, syntax, semantics, pragmatics), a 100k-word corpus for English, and a series of experiments towards building a learning-based coreference resolution system. More specifically, I'll discuss issues concerning the definition of the annotation scheme, the selection of features for machine learning, the effect of sample selection, and I'll introduce CISTELL, the new learning-approach we propose for coreference resolution.
22 May 2009	Victoria Fossum Dirk Hovy	Practice talks for NAACL HLT Time: 3:00 pm - 4:00 pm Location: 11th flr CR Abstract: Combining Constituent Parsers (Victoria Fossum: 3:00pm -- 3:30pm)Combining the 1-best output of multiple parsers via parse selection orparse hybridization improves f-score over the best individual parser(Henderson and Brill, 1999; Sagae and Lavie, 2006). We propose three ways to improve upon existing methods for parser combination.---------------------------------------------------------Disambiguation of Preposition Sense Using Linguistically MotivatedFeatures (Dirk Hovy: 3:30pm -- 4:00pm) Classifying polysemous words into their proper sense classes ispotentially useful to any NLP application that needs to extractinformation from text or build a semantic representation of thetextual information. Like instances of other word classes, manyprepositions are ambiguous, carrying different semantic meanings(including notions of instrumental, accompaniment, location, etc.)In this paper, we present a supervised classification approach fordisambiguation of preposition senses. We use the SemEval 2007Preposition Sense Disambiguation datasets to evaluate our system andcompare its results to those of the systems participating in theworkshop. We derived linguistically motivated features from both sidesof the preposition. Instead of restricting these to a fixed windowsize, we utilized the phrase structure. Testing with five differentclassifiers, we can report an increased accuracy (76.4%) thatoutperforms the best system in the SemEval task.
15 May 2009	David Chiang	Practice talks for NAACL HLT Time: 3:00 pm - 4:00 pm Location: 4th flr CR Abstract: 11,001 New Features for Statistical Machine Translation (David Chiang)- Winner of Best Paper Award at NAACL/HLT 2009 We use the Margin Infused Relaxed Algorithm of Crammer et al. to add alarge number of new features to two machine translation systems: theHiero hierarchical phrase based translation system and oursyntax-based translation system. On a large-scale Chinese-Englishtranslation task, we obtain statistically significant improvements of+1.5 BLEU and +1.1 BLEU, respectively. We analyze the impact of the new features and the performance of the learning algorithm.
14 May 2009	Sujith Ravi	Practice talks for NAACL HLT Time: 3:00 pm - 4:00 pm Location: 4th flr CR Abstract: Talk-1: Learning Phoneme Mappings for Transliteration without Parallel DataWe present a method for performing machine transliteration without any parallel resources. We frame the transliteration task as a decipherment problem and show that it is possible to learn cross-language phoneme mapping tables using only monolingual resources. We compare various methods and evaluate their accuracies on a standard name transliteration task.This is joint work with Kevin Knight.----------------------------------------------------Talk-2: A New Objective Function for Word AlignmentWe develop a new objective function for word alignment that measures the size of the bilingual dictionary induced by an alignment. A word alignment that results in a small dictionary is preferred over one that results in a large dictionary. In order to search for the alignment that minimizes this objective, we cast the problem as one of integer linear programming. We then extend our objective function to align corpora at the sub-word level, which we demonstrate on a small Turkish-English corpus.This is joint work with Tugba Bodrumlu and Kevin Knight.
08 May 2009	Andrew Kehler (UCSD)	Coherence and the (Psycho-) Linguistics of Pronoun Interpretation Time: 3:00 pm - 4:00 pm Location: 11 Large Abstract: More than three decades of research has sought to uncover theprinciples that determine how hearers interpret pronouns in context.This work has focused predominantly on identifying so-called'preferences' or 'heuristics' that hearers utilize based on linguisticproperties of antecedent expressions. This focus is a departure fromthe type of approach outlined in Hobbs (1979), which argues that themechanisms that drive pronoun interpretation are driven predominantlyby semantics, world knowledge, and inference, with particularreference to how these are used to establish the coherence ofdiscourses.In this talk, I report on new experimental evidence in support of acoherence-driven analysis, and describe how the analysis canaccommodate a range of previous findings suggestive of conflictingpreferences and biases. Case studies of four commonly-citedpreferences are described, specifically (i) the parallel grammaticalrole preference (e.g., Smyth 1994), (ii) thematic role preferences(e.g., Stevenson et al. 1994), (iii) implicit causality biases (e.g.,Caramazza et al. 1977), and (iv) the subject assignment strategy(e.g., Crawley et al. 1990). In each case, the experimental resultsoffer an explanation of what the underlying source of the bias is, andpredicts in what contexts evidence for it will surface.These results suggest that pronoun interpretation is incrementallyinfluenced in part by the probabilistic expectations that hearers haveabout how the discourse will be coherently continued. They are alsoargued to leave various myths by the roadside, e.g., that pronouninterpretation can be profitably thought of as a 'search and match'procedure, and that coherence relations need not be controlled for inexperimental stimuli.This talk includes joint work with Laura Kertz, Hannah Rohde, andJeffrey Elman.
17 Apr 2009	Rahul Bhagat	Learning Paraphrases from Text (Ph.D. Defense practice talk) Time: 3:00 pm - 4:00 pm Location: 11 Large Abstract: Paraphrases are textual expressions that convey the same meaning using different surface forms. Capturing the variability of language, they play an important role in many natural language applications including question answering, machine translation, and multi-document summarization. In linguistics, paraphrases are characterized by approximate conceptual equivalence. Since no automated semantic interpretation systems available today can identify conceptual equivalence, paraphrases are difficult to acquire without human effort. The aim of this thesis is to develop methods for automatically acquiring and filtering phrase-level paraphrases using a monolingual corpus.Noting that the real world uses far more quasi-paraphrases than the logically equivalent ones, we first present a general typology of quasi-paraphrases together with their relative frequencies. To our knowledge the first one ever. We then present a method for automatically learning the contexts in which quasi-paraphrases obtained from a corpus are mutually replaceable. Knowing that quasi-paraphrases are often inexact because they contain semantic implications which can be directional, we present an algorithm called LEDIR to learn the directionality of quasi-paraphrases. Since semantic classes play a crucial role in our work, we also investigate the use of a semi-supervised clustering algorithm for learning semantic classes.We next investigate the task of learning surface paraphrases, i.e., paraphrases that do not require the use of any syntactic interpretation. Since one would need a very large corpus to find enough surface variations, we use a really large but unprocessed corpus of 150GB (25 billion words) obtained from Google News to do this learning. We show that these paraphrases can be used to learn surface patterns for relation extraction. Finally, we use paraphrases to learn patterns for domain-specific information extraction.Thus, in this thesis we define quasi-paraphrases, present methods to learn them from a corpus, and show that quasi-paraphrases are useful for information extraction.
27 Mar 2009	David Chiang	Tutorial on Hadoop Time: 3:00 pm - 4:00 pm Location: 11 Large Abstract: Hadoop is an open-source implementation of the Map/Reduce framework introduced by Google Research. It is a simple abstraction for describing parallelizable algorithms that admits very efficient execution: in one case, one of my (poorly implemented) algorithms was improved from a typical runtime of 72 hours to 3 hours. I will give a short introduction to Hadoop that is highly colored by my experiences with it and the likely experiences of other natural language processing researchers at ISI. I will show how to run Hadoop on HPC, how to use Hadoop Streaming (which allows implementation in any language you choose), and how to define Map/Reduce algorithms for a few incarnations of a typical NLP task, relative-frequency estimation of a large probability distribution. Input from others who are more experienced with Hadoop than I am is welcome!
19 Mar 2009	Rutu Mulkar	Discovering Causal and Temporal Relations in Biomedical Texts (practice talk for AAAI Spring Symposium) Time: 2:00 pm - 2:30 pm Location: 4th floor CR Abstract: In previous work on "Learning by Reading" we successfully extracted entities, states and events from technical natural language descriptions of processes. The research described here is aimed at the automatic discovery of causal and temporal ordering relations among states and events, specifically, among molecular and other events in biomedical articles. We have annotated causal and temporal relations in articles on the cell cycle, and we discuss our annotation guidelines and the issue of inter-annotator agreement. We then describe the natural language parsing and the inference system we use to extract these relations. We have created axioms manually for this system, focusing on temporal, causal and aspectual information and we have used semi-automatic means to augment these axioms. We have evaluated the performance of this system, and the results are promising.
06 Mar 2009	Andreas Maletti	Minimizing Deterministic Weighted Tree Automata Time: 3:00 pm - 4:00 pm Location: 11 Large Abstract: Weighted tree automata are equivalent to weighted tree grammars, which can be used, for example, to easily model weighted context-free grammars. In constrast to context-free grammars, tree automata work directly on a tree representation and not on strings. We will introduce weighted tree automata and review the important results on minimization of them. For example, it is known that deterministic devices over commutative semifields (commutative semirings with multiplicative inverses) can be effectively minimized. In the main part of the talk, we present the first efficient algorithm for this minimization. If the operations can be performed in constant time, then our algorithm constructs an equivalent minimal (with respect to the number of states) deterministic automaton in time linear in the maximal rank of the input symbols, the number of (useful) transitions, and the number of states of the input automaton.
27 Feb 2009	Carlos Busso (USC)	Multimodal Processing of Human Behavior in Intelligent Instrumented Spaces: A Focus on Expressive Human Communication Time: 3:00 pm - 4:00 pm Location: 11 Large Abstract: Advances in technologies to capture and process multimedia signals are enabling new opportunities for understanding and modeling human behavior, and designing new human-centered applications. Intelligent environments equipped with a range of audio-visual sensors provide suitable means for automatically monitoring and tracking the behavior, strategies and engagement of the participants in multiperson interactions such as meetings, at various levels of interest. We describe a case study of a "Smartroom" being developed at USC in which high-level features are calculated from active speaker segmentations, automatically annotated by our system, to infer the interaction dynamics between the participants. The results show that it is possible to accurately estimate in real-time not only the flow of the interaction, but also how dominant and engaged each participant was during the discussion.Additionally, we describe analysis of human expressive behavior that can be afforded by such audio-visual data. We describe an analysis of the interrelation between facial gestures and speech using a multimodal approach. Using a controlled setting, motion capture technology was used to simultaneously acquire speech and detailed facial information. Our results indicate that the verbal and non-verbal channels of human communication are internally and intricately connected. The interplay is observed across the different communication channels such as various aspects of speech, facial expressions, and movements of the hands, head and body, and is greatly affected by the linguistic and emotional content of the message being communicated. As a result of the analysis, applications in automatic emotion recognition and synthesis of expressive communication are presented.[This research was supported in part by funds from the NSF, NIH, and the Department of the Army]
13 Feb 2009	Joseph Tepperman (Signal Analysis and Interpretation Laboratory, USC)	Estimating Subjective Judgments of Speech on Multiple Levels Time: 3:00 pm - 4:00 pm Location: 11 Large Abstract: People make explicit subjective judgments of speech when doing things like tutoring students in a foreign language, or testing a child's reading skills. On what do we base these judgments, and how can they be made automatically? The "quality" of speech does not exist on any one scale alone, and is not limited strictly to pronunciation - it is manifested through a multiplicity of simultaneous and interacting cues of various sizes. In this talk I'll discuss modeling strategies for categorical pronunciation on several scales, cognitive models for estimating student knowledge demonstrated through speech, and applications in the fields of education and speech synthesis.
30 Jan 2009	Kevin Knight	Sixty Years of Statistical Machine Translation Time: 3:00 pm - 4:00 pm Location: 11 Large Abstract: This high-level survey will describe the results of statistical machine translation (SMT) research since 1948. Part of the survey will cover the explosion of work in the past few years that has resulted from intense interest on the part of scientists, funders, and industry. We will also examine the roots of SMT in World War II decipherment activities. Some of the concepts from that era have become core to the field, while others still remain to be picked up.
23 Jan 2009	Roger Levy (UCSD)	Noise and memory in rational human language comprehension Time: 3:00 pm - 4:00 pm Location: 11 Large Abstract: Considering the adversity of the conditions under which linguistic communication takes place in everyday life---ambiguity of the signal, environmental competition for our attention, speaker error, our limited memory, and so forth---it is perhaps remarkable that we are as successful at it as we are. Perhaps the leading explanation of this success is that (a) the linguistic signal is redundant, (b) diverse information sources are generally available that can help us obtain infer the intended message (or something close enough) when comprehending an utterance, and (c) we use these diverse information sources very quickly and to the fullest extent possible. This explanation can be thought of as treating language comprehension as a rational, evidential process. Nevertheless, there are number of prominent phenomena reported in the sentence processing literature that remain clear puzzles for the rational approach. In this talk I address three such phenomena, whose common underlying thread is an apparent failure to use information available in a sentence appropriately in global or incremental inferences about the correct interpretation of a sentence. I argue that the apparent puzzle posed by these phenomena for models of rational sentence comprehension may derive from the failure of existing models to appropriately account for the environmental and cognitive constraints---namely, noisy input and limited memory---under which comprehension takes place. I present two new probabilistic models of language comprehension under noisy input and limited memory, and show that these models lead to solutions to the above puzzles. More generally, these models suggest how appropriately accounting for environmental and cognitive constraints can lead to a more nuanced and ultimately more satisfactory picture of key aspects of human cognition.
17 Dec 2008	Liang Huang (UPenn => Google Research)	Tree-based and Forest-based Translation Time: 3:00 pm - 4:00 pm Location: 4th Floor CR Abstract: What is in common, and what is different, between translating from English to Chinese and compiling C++ into machine code?In this talk I will first introduce a tree-based (aka syntax-directed) paradigm for machine translation, inspired by both human translators and compilers. In this paradigm, a source language sentence is first parsed into a syntactic tree, which is then recursively converted into a target language sentence via tree-to-string transformation rules. Since the translation process is driven by the syntax, this approach resembles the classical "syntax-directed translation" method in compiling theory.However, natural languages are crucially different from programming languages in that they are fundamentally ambiguous. So we don't (and will probably never) have perfect parsers, and parsing errors adversely affect translation quality. To alleviate this problem, an obvious idea is to use the top-k parses, rather than a single 1-best, but this only helps a little bit due to the limited scope of the k-best list. We instead propose a "forest-based approach", which translates a packed forest encoding exponentially many parses in a compact (polynomial) space by sharing common subtrees. Large-scale experiments showed very significant improvements (over the 1-best baseline) in terms of translation quality, which outperforms the best reported systems to date. More interestingly, translating a forest of millions of trees is even faster than translating on top-30 individual trees thanks to dynamic programming.This talk includes joint work with Kevin Knight and Aravind Joshi (first part), and with Haitao Mi and Qun Liu (second/third parts).Short Bio: Liang Huang recently completed his PhD study at the University of Pennsylvania, co-supervised by Aravind Joshi and Kevin Knight (USC/ISI). He is mainly interested in the theoretical aspects of computational linguistics, in particular, efficient algorithms in parsing and machine translation, generic dynamic programming, and formal properties of synchronous grammars. His thesis develops a set of "forest-based methods" that have been applied to many problems in NLP including k-best parsing, forest rescoring and reranking, and forest-based translation. His awards include an Outstanding Paper Award at ACL 2008, and a University Teaching Award at Penn in 2005. http://www.cis.upenn.edu/~lhuang3/
07 Nov 2008	Daniel Marcu	The best/worst Speech Recognition, Language Modeling, and Machine Translation ideas Time: 3:00 pm - 4:00 pm Location: 11 Large Abstract: A group of 60 researchers have been asked to comment on what they perceive to be- the most important contributions in the fields of speech recognition, language modeling, and machine translation;- past ideas that failed to lead to substantial improvements;- and contributions that are most likely to have a material impact in the future. This talk summarizes the perceptions and trends identified in the collection of answers provided by the researchers.
17 Oct 2008	Jens Voeckler	Parsing XRS with(out) regular expressions Time: 3:00 pm - 4:00 pm Location: 11 Large Abstract: If you ever needed to extract information, e.g. LHS, RHS words, features, etc., from an XRS rules, this talk is for you. Over the years, a variety of regular expressions have been used to obtain data from XRS rules. However, in light of recent pipeline efforts, the copy-n-paste culture lead to expressions that were sometimes too complex for the task at hand, unnecessarily slowing down processing steps, or too trivial to work correctly on boundary cases. A unified effort by Steve, David, Wei, Michael and Jens culminated in the NLPRules module for Perl. While the talk centers on the Perl module, and some surprising benchmark results, any language supporting libpcre (perl compatible regular expression) will benefit from the insights, and from knowing the right regular expression for the task at hand.
14 Oct 2008	Victoria Fossum + David Chiang	Practice talks for AMTA/EMNLP Time: 3:00 pm - 4:15 pm Location: 11 Large Abstract: Using Bilingual Chinese-English Word Alignments to Resolve PP-Attachment Ambiguity in English (practice talk for AMTA)Errors in English parse trees impact the quality of syntax-based MT systems trained using those parses. Frequent sources of error for English parsers include PP-attachment ambiguity, NP-bracketing ambiguity, and coordination ambiguity. Not all ambiguities are preserved across languages. We examine a common type of ambiguity in English that is not preserved in Chinese: given a sequenc "VP NP PP", should the PP be attached to the main verb, or to the object noun phrase? We present a discriminative method for exploiting bilingual Chinese-English word alignments to resolve this ambiguity in English. On a heldout test set of Chinese-English parallel sentences, our method achieves 86.3% accuracy on this PP-attachment disambiguation task, an improvement of 4% over the accuracy of the baseline Collins parser (82.3%).Online Large-Margin Training of Syntactic and Structural Translation Features (practice talk for EMNLP)Minimum-error-rate training (MERT) is a bottleneck for current development in statistical machine translation because it is limited in the number of weights it can reliably optimize. Building on the work of Watanabe et al., we explore the use of the MIRA algorithm of Crammer et al. as an alternative to MERT. We first show that by parallel processing and exploiting more of the parse forest, we can obtain results using MIRA that match or surpass MERT in terms of both translation quality and computational cost. We then test the method on two classes of features that address deficiencies in the Hiero hierarchical phrase based model: first, we simultaneously train a large number of Marton and Resnikx92s soft syntactic constraints, and, second, we introduce a novel structural distortion model. In both cases we obtain significant improvements in translation performance. Optimizing them in combination, for a total of 56 feature weights, we improve performance by 2.6 Bleu on a subset of the NIST 2006 Arabic-English evaluation data.(Joint work with Yuval Marton and Philip Resnik)
10 Oct 2008	Sujith Ravi + Steve DeNeefe	Practice talks for AMTA/EMNLP Time: 3:00 pm - 4:15 pm Location: 11 Large Abstract: Automatic Prediction of Parser Accuracy (practice talk for EMNLP)Statistical parsers have become increasingly accurate, to the point where they are useful in many natural language applications. However, estimating parsing accuracy on a wide variety of domains and genres is still a challenge in the absence of gold-standard parse trees.We propose a technique that automatically takes into account certain characteristics of the domains of interest, and accurately predicts parser performance on data from these new domains. As a result, we have a cheap (no annotation involved) and effective recipe for measuring the performance of a statistical parser on any given domain.(Joint work with Kevin Knight and Radu Soricut)Overcoming Vocabulary Sparsity in MT Using Lattices (practice talk for AMTA)Source languages with complex word formation rules present a challenge for statistical machine translation (SMT). In this paper, we take on three facets of this challenge: (1) common stems are fragmented into many different forms in training data, (2) rare and unknown words are frequent in test data, and (3) spelling variation creates additional sparseness problems. We present a novel, lightweight technique for dealing with this fragmentation, based on bilingual data, and we also present a combination of linguistic and statistical techniques for dealing with rare and unknown words. Taking these techniques together, we demonstrate +1.3 and +1.6 BLEU increases on top of strong baselines for Arabic-English machine translation.(Joint work with Ulf Hermjakob and Kevin Knight)
26 Sep 2008	Eugene Charniak (Brown University)	EM Works for Pronoun-Anaphora Resolution Time: 3:00 pm - 4:00 pm Location: 11 Large Abstract: EM (the Expectation Maximization Algorithm) is a well known technique for unsupervised learning (where one does not have any hand labeled solutions available, but instead one must learn from the raw text). Unfortunately EM is known to fail to find good solutions in many (most?) applications on which it is tried. In this talk we present some recent work on using EM to learn how to resolve pronoun-anaphora: determining that "the dog" is the antecedent of "he" and "his" in "When Sally fed the dog he wagged his tail". For this application EM works strikingly well, determining tens of thousands of parameters and resulting in a program that probably produces state of the art results, although because this is preliminary work, and pronoun-anaphora has no standard evaluation metrics, this is just a guess.About the Speaker:Eugene Charniak is Professor of Computer Science. and Cognitive Science at Brown University. He received an A.B. degree in Physics from University of Chicago and a Ph.D. from M.I.T. in Computer Science. He has published four books: Computational Semantics, with Yorick Wilks (1976); Artificial Intelligence Programming (now in a second edition) with Chris Riesbeck, Drew McDermott, and James Meehan (1980, 1987); Introduction to Artificial Intelligence with Drew McDermott (1985); and Statistical Language Learning (1993). He is a Fellow of the American Association of Artificial Intelligence and was previously a Councilor of the organization. His research has always been in the area of language understanding or technologies which relate to it, such as knowledge representation, reasoning under uncertainty, and learning. Over the last few years he has been interested in statistical techniques for language understanding. His research in this area has included work in the subareas of part-of-speech tagging, probabilistic context-free grammar induction, and, more recently, syntactic disambiguation through word statistics, efficient syntactic parsing, and lexical resource acquisition through statistical means.
19 Sep 2008	Fei Sha (USC)	Large margin based parameter estimation for hidden Markov models Time: 3:00 pm - 4:00 pm Location: 11 Large Abstract: In many application domains, we face the task of characterizing the distribution of continuous random variables. For instance, in automatic speech recognition (ASR), these variables are acoustic properties of speech signals. For such tasks, Gaussian mixture models (GMMs) are widely used as an very effective density estimator. Particularly, in the context of ASR, they are embedded in continuous-density hidden Markov models (CD-HMMs) to yield emission probabilities, i.e., the likelihoods of acoustic observations conditioned on hidden states such as phonemes. Meanwhile, the transition probabilities in CD-HMMs attempt to capture temporal properties of speech signals. Similar modeling choices arise in other applications, for instance, in activity recognition.Various techniques have been developed to estimate the parameters of CD-HMMs. In particular, discriminative techniques such as conditional maximum likelihood and minimum classification error have attracted significant research attention. When carefully and skillfully implemented, they often lead to lower error rates (in speech recognition) than traditional techniques of maximum likelihood estimation.In this talk, I will describe a new discriminative technique that is based on the principle of large margin, a key framework in many machine learning algorithms including support vector machines and boosting. The new technique differs from previous discriminative methods for ASR in the goal of margin maximization. In particular, in our large margin training of CD-HMMs, model parameters are optimized to maximize the gap (or the margin) between correct and incorrect classifications. I will present an extensive empirical evaluation of our approach on two benchmark problems in speech recognition: phonetic classification and recognition on the TIMIT speech database. In both tasks, large margin systems obtain significantly better performance than systems trained by maximum likelihood estimation or competing discriminative frameworks. An in-depth analysis also reveals someinteresting features of our approach, which contribute to the superior performance.Towards the end of the talk, I will discuss briefly the connection of our work to the structured prediction problems in the machine learning community. I will also discuss the future direction of this line of work and other application potentials.
22 Aug 2008	Amittai Axelrod (UW)	Intern Final Talk: Structural constraints for efficient decoding. Time: 3:45 pm - 4:15 pm Location: 11 Large Abstract: String-to-tree machine translation decoders are effective but very slow, especially compared to other decoding approaches. We explore various methods to identify constraints on the search space, with the aim of improving the efficiency of the syntax-based decoder.
22 Aug 2008	Catalin Tirnauca (Univ. Rovira i Virgili)	Intern Final Talk: On the Consistency of Probabilistic Context-Free Grammars Time: 3:00 pm - 3:30 pm Location: 11 Large Abstract: Probabilistic context-free grammars can describe probability distributions over strings, i.e., the sum of probabilities of all generated strings is 1.This condition is often called consistency. It has applications in fields of natural language processing such as probabilistic parsing (disambiguate by picking the parse with the highest score), or speech recognition (rank hypotheses returned by a speech recognizer). The talk is a survey of some of the previous results. We investigate how we can determine if a probabilistic context-free grammar is consistent, and if such a test can always be done. Also, we study a method, namely normalization, which guarantees consistent probabilistic context-free grammars. Moreover, we mention briefly some techniques that train probabilistic context-free grammars and guarantee consistency.
20 Aug 2008	John DeNero (Berkeley)	Intern Final Talk: Minimum Risk Decoding over Forests Time: 3:45 pm - 4:15 pm Location: 11 Small Abstract: Minimum Bayes risk (MBR) decoding improves the output ofmachine translation systems by selecting a translation that matches alarge proportion of the k-best hypotheses of a system. We extend thisidea to apply to packed forests by selecting an output sentence thatmatches a large proportion of all hypotheses in the pruned forest ofderivations from a syntax-based translation system.
20 Aug 2008	Kyle Gorman (Penn)	Intern Final Talk: The Entropy of English given French Time: 3:00 pm - 3:30 pm Location: 11 Small Abstract: The fundamental task in statistical machine translation (SMT) is tocharacterize the probability of a target sentence given its sourcetranslation; for translating French as English, P(f \| e). By applyingBayes Rule, we derive the fundamental theorem of SMT: e maximizingP(e) P(f \| e). Advances in SMT come from improving estimations ofthese two terms, or from more efficient ways of searching for optimalsolutions (Brown et al. 1993). In the case of language modeling, Shannon (1949) and Brown et al.(1992) identified upper and lower bounds for the per-character entropyof English, H(e), for humans and machines, respectively. We ask thesame question for SMT, H(e \| f), comparing the results for humantranslators and a simple machine baseline based on IBM Model 1. Thesenumbers are the upper and lower bounds for SMT systems trained onparallel data.
18 Jul 2008	Sujith Ravi	Deciphering Ciphers Optimally Using Only Minimal Knowledge of the Source Language Time: 3:00 pm - 4:00 pm Location: 11 Large Abstract: I will be talking about deciphering letter-substitution ciphers optimally using only minimal knowledge (bigrams, trigrams, etc.) of the source language, instead of relying on large look-up dictionaries. We also plan to show how our empirical results compare with Shannon's predictions on the equivocation curves and unicity distance measure.
11 Jul 2008	Jonathan May	Thesis Proposal Practice Talk: A Weighted Tree Transducer Toolkit for Syntactic Natural Language Processing Models Time: 3:00 pm - 4:00 pm Location: 11 Large Abstract: Solutions for many natural language processing problems such as speech recognition, transliteration, and translation have been described as weighted finite-state transducer cascades. The transducer formalism is very useful for researchers, not only for its ability to expose the deep similarities between seemingly disparate models, but also because expressing models in this formalism allows for rapid implementation of real, data-driven systems. Finite-state toolkits can interpret and process transducer chains using generic algorithms and many real-world systems have been built using these toolkits. Current research in NLP makes use of syntax-rich models that are poorly suited to extant transducer toolkits, which process linear input and output. Tree transducers can handle these models, and a weighted tree transducer toolkit with appropriate generic algorithms will lead to the sort of gains in syntax-based modeling that were achieved with string transducer toolkits. In this thesis proposal practice talk I will briefly trace the history of finite-state transducers and automata as they relate to natural language processing and the evolution of formalisms and the toolkits that support them, leading up to motivation for the design and creation of Tiburon, the toolkit referenced in this talk's title. I will describe previous, current, and future work on Tiburon's algorithms and the effectiveness of both algorithms and software at cleanly representing syntax-based NLP models from the literature and at constructing and evaluating novel models.
13 Jun 2008	Ellen Riloff	Effective Information Extraction with Relevant Regions and Semantic Affinity Patterns Time: 3:00 pm - 4:00 pm Location: 11 Large Abstract: I will briefly overview the landscape of event-oriented informationextraction (IE) systems and explain why it is especially challengingto learn IE systems without annotated training data. Then I willdescribe one attempt to do so by decoupling the tasks of findingrelevant text regions and applying extraction patterns. First, aself-trained relevant sentence classifier identifies relevant regionsin documents. Second, a "semantic affinity" measure identifiesdomain-relevant extraction patterns. We further distinguish between"primary" patterns and "secondary" patterns and apply the patternsselectively in the relevant regions. This approach is weaklysupervised, requiring only a few seed patterns plus relevant andirrelevant (but unannotated) documents for training. The resulting IEsystem achieves reasonably good performance, despite the fact that therelevant region classifier leaves a lot to be desired.
06 Jun 2008	Tom Murray (USC)	Knowledge as a Constraint on Uncertainty for Unsupervised Classification Time: 3:00 pm - 4:00 pm Location: 11 Large Abstract: This talk investigates the use of domain knowledge to constrain and improve the unsupervised learning of a classifier, by placing limits or biases on the possible hypotheses for each input. Theoretically, we view the contribution of the knowledge source as a reduction in the uncertainty of the model's decisions, quantified by the resulting conditional entropy of the label distribution given the input corpus. Evaluating on the simple case of an unsupervised HMM tagger, we find surprising levels of improvement from little knowledge, with more stable and efficient training convergence and label assignment, and a high degree of correlation between classification entropy and model performance. We conclude that, while we should always seek better generic models and techniques, for applications in an unsupervised setting, knowledge may still be key.
30 May 2008	Steve DeNeefe	BLEU Sway Issues: one way to get statistical significance, two ways to get a better score, and three ways to thwart them Time: 3:00 pm - 3:30 pm Location: 11 Large Abstract: BLEU the de facto standard for evaluation and development of statistical machine translation systems. We describe three real-world situations involving comparisons between different versions of the same systems where one can obtain improvements in BLEU scores that are questionable or even absurd. We propose a very conservative modification to BLEU that addresses these issues while improving correlation with human judgements, then explore some deeper modifications that alleviate the problems further.
16 May 2008	David Newman (UCI)	Theory and Applications of Topic Modeling Time: 3:00 pm - 4:00 pm Location: 11 Large Abstract: Topic models, a class of Bayesian probabilistic models for discretedata, have recently gained popularity in applications ranging fromdocument modeling to computer vision. Since the introduction ofLatent Dirichlet Allocation (LDA) in 2003, there have been numerousextensions to this archetype. I will review the theory behind LDA,and discuss subsequent models, including (some of): Correlated TopicModel, Dynamic Topic Model, Hierarchical Topic Model, Special WordsTopic Model, Hierarchical Dirichlet Process Model, Pachinko AllocationMachine, Topics and Syntax Model, Bi-LDA, Author-Topic Model,Supervised Topic Model, Spatial LDA, etc.
09 May 2008	John DeNero (Berkeley)	Inference in phrase alignment models Time: 3:00 pm - 4:00 pm Location: 11 Large Abstract: Models that align phrases instead of words offer anappealing alternative to the standard relative frequency estimates ofphrase translation probabilities. But, while some effective wordalignment models (Model 1, Model 2 & HMM) can be estimated tractablywith EM, phrase alignment models cannot. I'll talk about how to showthat estimation and inference under these models is intractable.Then, I'll present two useful approximation techniques.First, I'll talk about how to cast phrase alignment search as aninteger linear programming (ILP) problem and find the optimalalignment reliably and quickly with off-the-shelf ILP software. Someapplications of this technique include training phrase alignmentmodels and interpreting the output of word alignment models.Second, we'll look at how to estimate translation probabilities undera phrase alignment model using a Gibbs sampling procedure. Thesampler has some nice asymptotic convergence properties and also seemsto produce good results in practice. I'll walk through the differentmodels we've trained and how they performed. Time permitting, I'll also talk about some of the ways in which wecould potentially extend this work to syntactic MT.
02 May 2008	Zornitsa Kozareva	Semantic Class Learning from the Web with Hyponym Pattern Linkage Graphs Time: 3:00 pm - 4:00 pm Location: 11 Large Abstract: We present a novel approach to weakly supervised semantic class learning fromthe web, using a single powerful hyponym pattern combined with graphstructures, which capture two properties associated with pattern-basedextractions: popularity and productivity. Intuitively, a candidate is popularif it was discovered many times by other instances in the hyponym pattern. Acandidate is productive if it frequently leads to the discovery of otherinstances. Together, these two measures capture not only frequency ofoccurrence, but also cross-checking that the candidate occurs both near theclass name and near other class members. We developed two algorithms that beginwith just a class name and one seed instance and then automatically generate aranked list of new class instances. We conducted experiments on four semanticclasses and consistently achieved high accuracies.
25 Apr 2008	David Chiang	Tutorial: Randomized data structures for large statistical NLP models Time: 3:00 pm - 4:00 pm Location: 11 Large Abstract: Randomized algorithms are those which use randomness to achieve efficient performance with a bounded probability of error; typically, the bound is adjustable and the performance depends on the bound. Randomized data structures, likewise, use randomness to achieve efficient storage with a bounded probability of error. I will give an overview of the use of such data structures, namely, Bloom filters and "Bloomier" filters, for storing very large n-gram language models, and will discuss possibilities for using randomized data structures for other purposes as well.
18 Apr 2008	Rahul Bhagat	Learning Paraphrases from Text Time: 3:00 pm - 4:00 pm Location: 11 Large Abstract: Paraphrases are textual expressions that convey the same meaning using different words. They capture variability, which is a common phenomenon in language. Given this, paraphrases have been shown to be useful in many natural language applications like Question-Answering, Machine Translation, Summarization and Information Retrieval. In this talk, I'll discuss the phenomenon paraphrasing and focus on methods for automatically acquiring paraphrases from text.
11 Apr 2008	Jonathan May	Syntactic Re-Alignment Models for Machine Translation Time: 3:00 pm - 4:00 pm Location: 11 Large Abstract: We present a method for improving word alignment for statistical syntax-based machine translation that employs a syntactically informed alignment model closer to the translation model than commonly-used word alignment models. This leads to extraction of more useful linguistic patterns and improved BLEU scores on translation experiments in Chinese and Arabic.
04 Apr 2008	Ulf Hermjakob	Name Translation in Statistical Machine Translation: Learning When to Transliterate Time: 3:00 pm - 4:00 pm Location: 11 Large Abstract: We present a method to transliterate names in the framework ofend-to-end statistical machine translation. The system is trained tolearn when to transliterate. For Arabic to English MT, we developed and trained a transliterator on abitext of 7 million sentences and Google's English terabyte ngrams andachieved better name translation accuracy than 3 out of 4 professionaltranslators. The talk also includes a discussion of challenges in nametranslation evaluation.
25 Mar 2008	Jason Riesa	Tutorial on Arabic Orthography Time: 10:30 am - 11:30 am Location: 11 Large Abstract: This tutorial is intended to provide attendees with working knowledge of the Arabic writing system. No previous experience with Arabic is required. At the end of this tutorial you should be able to read and segment individual Arabic characters, read common ligatures, identify possible affixes on stems, and understand the various lexical normalizations used in Arabic text preprocessing. The focus will be on the formal writing system in printed text for Modern Standard Arabic, although handwriting will be briefly discussed.
18 Jan 2008	Victoria Fossum	Using Syntax to Improve Word Alignment Precision for Syntactic Machine Translation Time: 3:00 pm - 4:30 pm Location: 11 Large Abstract: Automatically word-aligning a parallel bitext in the source and target languages constitutes the first stage of most statistical machine translation pipelines. Automatic word alignment is error-prone, and produces many incorrect links. Incorrect links that violate syntactic correspondences interfere with the extraction of string-to-tree transducer rules for syntactic machine translation. We present an algorithm for identifying and deleting incorrect word alignment links, using features of the extracted rules. We obtain gains in both alignment quality and translation quality in Chinese-English and Arabic-English translation experiments, relative to a GIZA++ union baseline.
11 Jan 2008	Kevin Knight	How to Make EM Do What You Want Time: 3:00 pm - 4:30 pm Location: 11 Large Abstract: I'll talk about some unsupervised learning experiments -- how I was satisfied with the initial results, how I became very dissatisfied, and how I became (somewhat) satisified again.
14 Dec 2007	Marieke van Erp	MITCH: Mining for Information in Texts from the Cultural Heritage Time: 3:00 pm - 4:30 pm Location: 11 Large Abstract: Naturalis, the Dutch National Museum of Natural History, harbours one of the largest treasures of the world: the key specimens of millions of animals found throughout the world through centuries of biological expeditions. While the depot where the animals are stored is a technical marvel, Noah's ark of the 21st century, it is hard to search through it. Research in taxonomy, the evolution of life and biodiversity revolves around the specimens in the depot. The main key to accessing the depot are(mostly) handwritten expedition logs and registration books, which are currently being photographed and keyed in to be stored in searchable digital archives. Such digital logs already enable a kind of "Biogoogle" search, but actual research questions are more complicated ("how did this kind of frog develop over the last century in the Amazon rainforests?"), and demand more intelligent handling. This is where the MITCH project comes in.The goal of MITCH is to turn the field logs and registration books into a populated semantic network, in which concepts such as animal specimens are related to all other concepts that define them: where, when, under which circumstances and by whom were they found, who described them first in the academic literature, who prepared them for storage in the Naturalis depot, which registration number was assigned to them, etc. This means that all textual descriptions of a specimen need to be parsed into exactly these concepts and their relations. All of this needs to be done at a scale that goes far beyond the human capacity, as tens of thousands of digitized but unanalysed textual records are waiting for semantic analysis. This necessitates the use of state-of-the-art machine learning methods that learn from examples automatically. The project addresses its goals on three levels. The basic level is the development and application of automatic data cleaning and markup tools. On top of this, semi-structured textual material such as fieldbook logs and scientific papers, are semi-automatically converted to a searchable knowledge base. Search results are visualised by displaying maps and specimen photos. The conversion phase assumes the active intervention of domain experts, such as collection managers, to correct and steer the automatic extraction procedure. At the top level, information resources are cross-linked using a domain ontology, populating a semantic network that can be hooked up to any other standardised cultural heritage knowledge base or to a search engine.
02 Nov 2007	Bill Rounds (Michigan and Stanford)	Constructions, Constraints, Transducers, and TAGs: A unifying view through Feature Logic Time: 3:00 pm - 4:30 pm Location: 11 Large Abstract: The value of mathematical formalisms for speech recognition, language generation, and machine translation has long been recognized. Not so much work, though, has been spent reconciling these formalisms with linguistic theories. In this talk I'll propose a theoretical descriptive mechanism based on feature logic, which is central to construction and constraint-based linguistic theories like construction grammar and HPSG, and which can be used to view tree transducers and tree-adjoining grammars as giving rise to a construction-based framework.
19 Oct 2007	Slav Petrov (Berkeley)	Learning and Inference for Hierarchically Split PCFGs Time: 10:30 am - 11:30 am Location: 11 Large Abstract: Treebank parsing can be seen as the search for an optimally refinedgrammar consistent with a coarse training treebank. We describe amethod in which a minimal grammar is hierarchically refined using EMto give accurate, compact grammars. The resulting grammars areextremely compact compared to other high-performance parsers, yet theparser gives the best published accuracies on several languages, aswell as the best generative parsing numbers in English. In addition,we give an associated coarse-to-fine inference scheme which vastlyimproves inference time with no loss in test set accuracy.
17 Oct 2007	Jon Patrick (Univ. of Sydney)	Enhancement Technologies for ICU Information Systems Time: 3:30 pm - 4:30 pm Location: 11 Large Abstract: The School of Information Technologies at the University of Sydney hashad a 3 year partnership with the Intensive Care Unit at the RoyalPrince Alfred Hospital, Sydney. In that time they have managed 8 jointprojects aimed at producing software solutions that enhanceproductivity in the Unit and in some cases enabled entirely newfunctionalities in their information systems. The principle motivationfor the research is the processing of the narratives in clinical notesbut concomitant problems in information systems have also been tackledand the combination of the two disciplines have led to the two relatedprocessing systems to be described in this presentation.- Ward Rounds Information Systems (WRIS) & Handovers -The WRIS is designed to support the work of all clinical staff intheir ward rounds activities. The system, when activated,automatically populates from the resident clinical database a proforma report with the most recent relevant data about the patient,such as vital signs, pathology reports, and other diagnosticmeasurements, presented as a web page. The clinical staff then writetheir progress notes into the web page which converts the text toSNOMED CT codes and other relevant concepts and entities. Theclinician is given the opportunity to change any analyses done by theprocessor. This clinician approved data is loaded to the patientrecord. The essential elements of this system, that is computing anextract of the patient record, accepting narrative input, andanalysing the text for coding, is a productivity gain of itself, butmore importantly, also constitutes the beginning of a hospital wideHandovers System for use throughout each step in the patientjourney. This system is being tested at the RPAH ICU in readiness forward usage. The impact of this system in improving the quality andsafety of handovers has the potential to be very significant.- Clinical Data Analytics Language (CDAL) -General purpose access to data from clinical information systems,beyond retrieval for point of care work, is needed for many aspects ofthe hospital's work particularly for clinical research, logistics &operational planning, and auditing patient safety. Most currentclinical systems only provide access to data identified in standardreports with no flexibility to make ad hoc enquiries or to pursue newdirections of enquiry. The clinical data analytics language developedenables the expression of any question that can be answered from thedata in the database in a restricted natural language. A prototype ofthe language has been developed for the CareVue information systemused in the ICU at the Royal Prince Alfred Hospital. It provides forthe use of local medical dialects, SNOMED CT terminology including allforms of collective expressions in SNOMED (e.g. infectious diseases),specification of patient groups, a variety of statistical functions,and constraints over any medical variable, Time, and Location. CDAL isgeneral in that it can be bolted on to any clinical information systemand is applicable to any clinical specialisation.
12 Oct 2007	David Talbot (Edinburgh)	Scalable Language Modeling: Breaking the Curse of Dimensionality Time: 3:00 pm - 4:30 pm Location: 11 Large Abstract: Randomized data structures can help us scale discrete models encountered in NLP. This talk will describe their use in language modeling and present some more general related results.N-gram language models are fundamental to speech recognition and machine translation. Unfortunately, the n-gram parameter space grows exponentially with the dimension of the feature vector. I will describe how randomization can be used to remove the space-dependency of such models on the a priori parameter space.The novel extensions of the Bloom filter that I will present are able to take advantage of the entropy of the distribution of values assigned to feature vectors to save space in a discrete statistical model. I will review some results applying these models to language modeling in machine translation and relate their space-requirements to a novel lower bound on the general problem of querying a map of key/value pairs.No prior knowledge of randomized data structures will be assumed.
05 Oct 2007	Sujith Ravi	Will this parser work with my data? - Predicting Parser Accuracy without Gold-Standard information Time: 3:00 pm - 4:30 pm Location: 11 Large Abstract: There are many tools available to the NLP community for Natural Language Parsing, (i.e converting a raw sentence in to a parse-tree). NLP researchers usually use some "off-the-shelf" parser which has been trained on the Wall Street Journal (WSJ) corpora and then apply the WSJ-trained parser to their data. This works in many cases, especially for systems which use data from WSJ or similar corpora. However, in real life applications, the data may be compiled from many different sources and span different genres, and may not be similar to the WSJ corpora in terms of sentence structure, etc . A particular parser might parse well on some corpora and not so well on others. Choosing the right parser for your data may have an impact on the performance of the NLP system as a whole. But in order to measure the accuracy of any parser for a given corpus, we require a set of gold-standard parse trees corresponding to the sentences within the corpus. Generating gold-standard set takes a lot of manual work and in many real-life applications, it is not a feasible task to generate gold-standard parses for large corpora. We attempted to build a system which can predict the accuracy (in terms of f-measure value) of the Charniak parser (a popular parsing tool) on any given sentence corpus. Without using any additional information (i.e gold std. parses), our system predicts "how accurately the Charniak parser could parse the given corpus". In order to evaluate our system's predictions on a particular corpus, we compute the "Correlation" measure between the "actual accuracies (using Gold-standard)" vs. "predicted accuracies (from our system)" for the given corpus. We tested our system on different corpora and using different methods and will present these results.
29 Aug 2007	Carmen Heger (Dresden) Michael Bloodgood (Delaware)	Summer Intern Presentations: Composition of Tree Transducers AND Using the Perceptron Algorithm to Tune Large Numbers of Feature Weights for Syntax-Based Statistical Machine Translation Time: 3:00 pm - 4:30 pm Location: 11 Large Abstract: Composition of Tree TransducersSince finite state (string) transducers are not expressive enough for many NLPapplications, computational linguistics started to investigate treetransducers for the task of machine translation, for example. Quite somesuccessful work has been done on generalizing results from string transducersto tree transducers. But when it comes to composition results are notsatisfying because generally tree transducers are not closed undercomposition. Still we think that most of the tree transducers used in NLP arecomposable and that is why we defined the problem of the composition for twoindividual transducers instead of the whole class. During the summer westarted with linear nondeleting tree transducers with epsilon rules andapproached an algorithm to decide for two such transducers whether theircomposition is again in the same class.Using the Perceptron Algorithm to Tune Large Numbers of Feature Weights for Syntax-Based Statistical Machine Translation Current state-of-the-art syntax-based statistical machine translationsystems produce many candidate translations out of which the output translationis selected by taking the argmax over all candidates i of <w,f_i> where w is aweight vector and f_i is a vector of the feature values for candidate i. Thefeatures used by the system and their corresponding weights have a major impacton a system's performance. Currently, Minimum Error Rate Training (MERT) is used totune the weights of the features. A drawback of this is that it isn't tractableto tune large numbers of feature weights. I will discuss using the perceptronalgorithm to tune feature weights for statistical machine translation. If I get interestingresults before my talk, I may also dicsuss new classes of features (potentially very largenumbers of features) that can be used for improving MT performance.
24 Aug 2007	Wei Ho (Princeton) Jennifer Gillenwater (Rice)	Summer Intern Presentations: Noisy Language Models AND Context for Syntax-Based Translation Rules Time: 3:30 pm - 5:00 pm Location: 11 Large Abstract: Noisy Language ModelsThe language models used in statistical machine translation are oftenquite large, requiring significant memory and sometimes pre-processingin order to be utilized effectively. It would be desirable to have amore compact representations of language models while minimizing theimpact on translation quality. Various quantization methods and lossystorage of language models will be presented.Context for Syntax-Based Translation RulesThe rules that a translation system employs should be applicable inmany contexts. This ensures that a rich language is expressible witha minimum number of rules. However, when rules that are applicable intoo many contexts are combined, they result in nonsensicaltranslations. How can we keep rules general but constrain the contextof their use? This summer we explored the approach of constrainingthe context by conditioning on various neighboring elements of eachrule.
16 Aug 2007	Anoop Sarkar (Simon Fraser)	Extensions of Regular Tree Grammars and their relation to Tree Adjoining Grammars Time: 3:00 pm - 4:30 pm Location: 11 Large Abstract: There is a hierarchy of generative devices that generate trees:starting with regular tree languages (RTLs), which are containedwithin context-free tree languages (CFTLs), and so on. The stringyield of the RTLs is exactly the set of Context-Free Languages,while the yield of the CFTLs is exactly the set of Indexed Languages.In this talk we introduce Adjoining Tree Languages (ATLs) which sitin between RTLs and CFTLs.The yield of ATGs is exactly the set of Tree-Adjoining Languages.Just like RTGs are stronger than CFGs, ATGs are stronger than TAGs.In addition we will show that the ATG notation simplifies many ofthe foundational proofs for TAGs including proofs of the closureproperties. In particular, ATLs do not use adjunction constraints,and thus are much easier to understand than TAGs.We compare ATGs with previously proposed simplifications of CFTGs,called monadic simple CFTGs, which also have been shown to be weaklyequivalent to TAG (i.e. they generate the same set of stringlanguages). We consider the question of whether these two weaklyequivalent formalisms are strongly equivalent (i.e. generate exactlythe same set of tree languages).Finally, we will show that the standard definition used forprobabilistic TAG is (surprisingly) very different from the naturaldefinition of probabilistic ATL. Using an example of PP-attachmentambiguity we show that the two probabilistic models are differentfrom each other.About the speaker:Anoop Sarkar is an assistant professor in the Department of ComputingScience at Simon Fraser University. He received his PhD in 2002from the Department of Computer and Information Science at theUniversity of Pennsylvania, with Prof. Aravind Joshi as his advisor.His research work is on machine learning, especially semi-supervisedlearning, applied to the processing of natural language and stochasticformal grammars. Anoop Sarkar's web-page: http://www.cs.sfu.ca/~anoop
15 Jun 2007	Donghui Feng	Extracting Data Records from Unstructured Biomedical Full Text Time: 11:00 am - 11:30 am Location: 11 Large Abstract: In this paper, we address the problem of extracting data records andtheir attributes from unstructured biomedical full text. There hasbeen little effort reported on this in the research community. Weargue that semantics is important for record extraction orfiner-grained language processing tasks. We derive a data recordtemplate including semantic language models from unstruc-tured textand represent them with a dis-course level Conditional Random Fields(CRF) model. We evaluate the approach from the perspective ofInformation Extrac-tion and achieve significant improvements on systemperformance compared with other baseline systems.
15 Jun 2007	Alex Fraser	Getting the structure right for word alignment: LEAF Time: 10:30 am - 11:00 am Location: 11 Large Abstract: Automatic word alignment is the problem of automatically annotatingparallel text with translational correspondence. Previous generativeword alignment models have made structural assumptions such as the1-to-1, 1-to-N, or phrase-based consecutive word assumptions, whileprevious discriminative models have either made one of theseassumptions directly or used features derived from a generative modelusing one of these assumptions. We present a new generative alignmentmodel which avoids these structural limitations, and show that it iseffective when trained using both unsupervised and semi-supervisedtraining methods. Experiments show strong improvements in wordalignment accuracy and usage of the generated alignments inhierarchical and phrasal SMT systems improves the BLEU score.
08 Jun 2007	Liang-Chih Yu (Cheng Kung U)	Topic Analysis for Psychiatric Document Retrieval (Practice Talk for ACL) Time: 3:00 pm - 3:30 pm Location: 11 Large Abstract: Psychiatric document retrieval attempts to help people to efficientlyand effectively locate the consultation documents relevant to theirdepressive problems. Individuals can understand how to alleviate theirsymptoms according to recommendations in the relevant documents. Thiswork proposes the use of high-level topic information extracted fromconsultation documents to improve the precision of retrievalresults. The topic information adopted herein includes negative lifeevents, depressive symptoms and semantic relations between symptoms,which are beneficial for better understanding of users'queries. Experimental results show that the proposed approach achieveshigher precision than the word-based retrieval models, namely thevector space model (VSM) and Okapi model, adopting word-levelinformation alone.About the speaker:Liang-Chih Yu ( http://www.isi.edu/~liangchi) isnow a visiting student in the Information Sciences Institute (ISI) ofUniversity of Southern California (USC). My host advisor is Dr. EduardHovy. I am also a PhD candidate in the Department of Computer Scienceand Information Engineering, National Cheng Kung University, Tainan,Taiwan. My advisor is Dr. Chung-Hsien Wu. My research interestsinclude natural language processing, text mining, informationretrieval, ontology construction, spoken dialogue system.
08 Jun 2007	Jonathan May	Bisimulation Minimisation for Weighted Tree Automata Time: 3:30 pm - 4:00 pm Location: 11 Large Abstract: We describe existing forward and backward bisimulation minimisationalgorithms for nondeterministic automata and extend these algorithmsto weighted tree automata. The extended algorithms, which work for allsemirings, retain the time complexity of their counterparts forunweighted tree automata for additively cancellative semirings, andare only slightly higher (linear instead of logarithmic in the numberof states) on other semirings. We describe the effectiveness of animplementation of these algorithms on a typical task in naturallanguage processing. This is joint work with Johanna Hogberg, Umea University and AndreasMaletti, Technische Universitat Dresden.
01 Jun 2007	Jingbo Zhu	Active Learning for Word Sense Disambiguation with Methods for Addressing the Class Imbalance Problem Time: 3:00 pm - 3:30 pm Location: 11 Large Abstract: In this paper, we analyze the effect of resampling techniques,including under-sampling and over-sampling used in active learning forword sense disambiguation (WSD). Experimental results show thatunder-sampling causes negative effects on active learning, butover-sampling is a relatively good choice. To alleviate thewithin-class imbalance problem of over-sampling, we propose abootstrap-based over-sampling (BootOS) method that works better thanordinary over-sampling in active learning for WSD. Finally, weinvestigate when to stop active learning, and adopt two strategies,max-confidence and min-error, as stopping conditions for activelearning. According to experimental results, we sug-gest a predictionsolution by considering max-confidence as the upper bound andmin-error as the lower bound for stopping conditions.
01 Jun 2007	Andrew S. Gordon	Generalizing Semantic Role Annotations Across Syntactically Similar Verbs Time: 3:30 pm - 4:00 pm Location: 11 Large Abstract: Large corpora of parsed sentences with semantic role labels (e.g. PropBank)provide training data for use in the creation of high-performance automaticsemantic role labeling systems. Despite the size of these corpora,individual verbs (or rolesets) often have only a handful of instances inthese corpora, and only a fraction of English verbs have even a singleannotation. In this paper, we describe an approach for dealing with thissparse data problem, enabling accurate semantic role labeling for novelverbs (rolesets) with only a single training example. Our approach involvesthe identification of syntactically similar verbs found in PropBank, thealignment of arguments in their corresponding rolesets, and the use of theircorresponding annotations in PropBank as surrogate training data.
25 May 2007	Wei Wang (Language Weaver)	Binarizing Syntax Trees to Improve Syntax-Based Machine Translation Accuracy Time: 3:00 pm - 3:30 pm Location: 11 Large Abstract: We show that phrase structures in Penn Treebank style parsesare not optimal for syntax-based machine translation. Weexploit a series of binarization methods to restructure thePeen Treebank style trees such that syntactified phrasessmaller than Penn Treebank constituents can be acquired andexploited in translation. We find that by employing the EMalgorithm for determining the binarization of a parse treeamong a set of alternative binarizations gives us the besttranslation result.
18 May 2007	Feng Pan	Computing Semantic Similarity between Skill Statements for Approximate Matching Time: 3:00 pm - 4:30 pm Location: 11 Large Abstract: (This will be an extended version of the talk for NAACL-HLT 2007. It'sbased on my summer internship work at IBM T.J. Watson Research Centerlast year.)The project aimed to address the problems encountered when trying tomatch available employees to open job positions, based on skillmatches. Currently, job search applications, like IBM's ProfessionalMarketplace, only find exact matches. A skill affinity computation isdesired to allow searches to be expanded to related/similar skills,and return more potential matches.In this talk, I will explore the problem of computing text similaritybetween verb phrases describing skilled human behavior for the purposeof finding approximate matches. Four parsers (Charniak's parser,Stanford's parser, IBM XSG slot grammar parser, and Lin's MINIPAR) areevaluated on a corpus of skill statements extracted from anenterprise-wide expertise taxonomy. A similarity measure utilizingcommon semantic role features extracted from parse trees was foundsuperior to an information-theoretic measure of similarity andcomparable to the level of human agreement.
11 May 2007	Steve DeNeefe	What Can Syntax-based MT Learn from Phrase-based MT? Time: 3:00 pm - 4:30 pm Location: 11 Large Abstract: We compare and contrast the strengths and weaknesses of a syntax-basedmachine translation model with a phrase-based machine translationmodel on several levels. We briefly describe each model, highlightingpoints where they differ. We include a quantitative comparison of thephrase pairs that each model has to work with, as well as the reasonswhy some phrase pairs are not learned by the syntax-based model. Wethen propose improvements to the syntax-based extraction techniques tocapture more phrases. We also compare the translation accuracy forall variations.
04 May 2007	Sheelagh Carpendale (Calgary)	Information Visualization and Collaboration Time: 3:00 pm - 4:30 pm Location: 11 Large Abstract: Consider Donald Norman's quote, "The power of the unaided mind ishighly overrated. Without external aids, memory, thought, andreasoning are all constrained. But human intelligence is highlyflexible and adaptive, superb at inventing procedures and objects thatovercome its own limits. The real powers come from devising externalaids that enhance cognitive abilities." (Norman, 1993) Common methodsfor externalization include making sketches on whatever happens to behandy -- paper napkins, program margins, etc. -- and/or finding acolleague or two to discuss the problem with. It would seem then, thatvisualization and collaboration are natural possibilities for creatingpositive cognitive aids. I will discuss our approach to developinginteractive information visualizations both to support individuals andsmall groups of collaborators and briefly describe some of our recentresults.About the speaker: Sheelagh Carpendale holds a Canada Research Chair in InformationVisualization at the University of Calgary. Her research focuses onthe visualization, exploration and manipulation of information;visualizing such topics as ecological dynamics, uncertainty ininformation, social and communication information and investigatingthe development of information visualization environments that supportcollaboration. Dr. Carpendale's research in information visualizationand interaction design draws on her dual background in ComputerScience (BSc. and Ph.D. Simon Fraser University) and Visual Arts(Sheridan College, School of Design and Emily Carr, College of Art).
20 Apr 2007	Christopher Collins (Toronto)	Information Visualization to Support Computational Linguistics Time: 3:00 pm - 4:30 pm Location: 11 Large Abstract: We present a survey of resent research into using information visualizationto reveal new insights about linguistic data. Our recent work includesusing WordNet hyponymy as a basis for document visualization and visualizingthe uncertainty in machine translation in an instant messaging chatcontext. We will present our preliminary findings and prototypevisualization for machine translation data resulting from a week ofcollaboration with ISI researchers.About the speaker:Christopher Collins is a PhD candidate in information visualization andcomputational linguistics at the University of Toronto. He works with Prof.Gerald Penn and Prof. Sheelagh Carpendale (University of Calgary).
30 Mar 2007	Ido Dagan (Bar-Ilan U)	Textual entailment as a framework for applied semantics Time: 3:00 pm - 4:30 pm Location: 11 Large Abstract: We have recently proposed Recognizing Textual Entailment (RTE) as ageneric task that captures major semantic inferences across differentnatural language processing applications. The talk will first reviewthe motivation and definition of the textual entailment task and thePASCAL RTE-1,2&3 Challenges benchmarks. Then we will demonstratedirections for building textual entailment systems, based on knowledgeacquisition and inference, and for utilizing them within concreteapplications. Furthermore, we suggest that textual entailment modelingmay become a comprehensive framework for applied semanticsresearch. Such framework introduces useful variants of known semanticproblems and highlights important tasks which were hardly investigatedso far at an applied computational level. The semantic modelingperspective will be illustrated in more detail by a case study for anentailment-based variant of word sense disambiguation.About the speaker: Ido Dagan is a Senior Lecturer at the Department of Computer Scienceat Bar Ilan University, Israel. His areas of interest are largelywithin empirical NLP, particularly empirical approaches for appliedsemantic processing. In the last few years Ido and his colleaguesintroduced textual entailment as a generic framework for appliedsemantic inference and have organized the first three rounds of thePASCAL Recognizing Textual Entailment Challenges. Ido received hisPh.D. from the Technion. He has been a research fellow at the IBMHaifa Scientific Center and a Member of Technical Staff at AT&T BellLaboratories. During 1998-2003 he was co-founder and CTO ofFocusEngine and VP of Technology of LingoMotors.
23 Mar 2007	Hermann Helbig (U at Hagen, Germany)	Multilayered Extended Semantic Networks as a Knowledge Representation Paradigm and Interlingua for Meaning Representation Time: 3:00 pm - 4:30 pm Location: 4 CR Abstract: The talk gives an overview of Multilayered Extended Semantic Networks(abbreviated MultiNet), which is one of the most comprehensivelydescribed knowledge representation paradigms used as a semanticinterlingua in large-scale NLP applications and for linguisticinvestigations into the semantics and pragmatics of naturallanguage. As with other semantic networks, concepts are represented inMultiNet by nodes, and relations between concepts are represented asarcs between these nodes. Additionally to that, every node isclassified according to a predefined conceptual ontology forming ahierarchy of sorts, and the nodes are embedded in a multidimensionalspace of layer attributes and their values. MultiNet provides a set ofabout 150 standardized relations and functions which are described ina very concise way including an axiomatic apparatus, where the axiomsare classified according to predefined types. The representationalmeans of MultiNet claim to fulfill the criteria of universality,homogeneity, and cognitive adequacy. In the talk, it is also shown,how MultiNet can be used for the semantic representation of differentsemantic phenomena. To overcome the quantitative barrier in buildinglarge knowledge bases and semantically oriented computational lexica,MultiNet is associated with a set of tools including a semanticinterpreter NatLink for automatically translating natural languageexpressions into MultiNet networks, a workbench LIA for the computerlexicographer, and a workbench MWR for the knowledge engineer formanaging and graphically manipulating semantic networks. Theapplications of MultiNet as a semantic interlingua range from naturallanguage interfaces to the Internet and to dedicated databases, overquestion-answering systems, to systems for automatic knowledgeacquisition.About the speaker:Prof. Helbig is head of the chair Intelligent Information and CommunicationSystems at the University of Hagen, Germany. His main research areas areKnowledge Representation, Semantic Natural Language Processing, andQuestion-Answering. A CV can be found here.
09 Mar 2007	Kevin Knight	The Voynich Manuscript Time: 3:00 pm - 4:30 pm Location: 11 Large Abstract: The medieval Voynich Manuscript has been called "the mostmysterious document in the world". Its pages contain bizarre drawingsof strange plants and astrological diagrams, as well as an undecipheredscript of 20,000 running words, written in a character set that has neverbeen seen elsewhere. Its origin is also controversial, with many theoriesabounding. I will describe the document, show samples, explain where itmay have come from, and present some properties of the text. This will more of a history/mystery talk thana computer science talk.
26 Jan 2007	Gerald Penn (Toronto)	The Quantitative Study of Writing Systems Time: 3:00 pm - 4:30 pm Location: 11 Large Abstract: If you understood all of the world's languages, you would still not beable to read many of the texts that you find on the world wide web,because they are written in non-Roman scripts -- often ones that havebeen arbitrarily encoded for electronic transmission in the absence ofan accepted standard. This very modern nuisance reflects a dilemma asancient as writing itself: the association between a language as it isspoken and its written form has a sort of internal logic to it that wecan comprehend, but the conventions are different in every individualcase --- even among languages that use the same script, or betweenscripts used by the same language. This conventional associationbetween language and script, called a writing system, is indeedreminiscent of the Saussurean conception of language itself, aconventional association of meaning and sound, upon which modernlinguistic theory is based. Despite linguists' reliance upon writingto present and preserve linguistic data, however, writing systems werea largely forgotten corner of linguistics until the 1960s, when Gelbpresented their first classification. This talk will describe recent work that aims to place the study ofwriting systems upon a sound computational and statistical foundation.While archaeological decipherment may eternally remain the holy grailof this area of research, it also has applications to speechsynthesis, machine translation, and multilingual document retrieval.
12 Jan 2007	Kevin Knight	Capturing Natural Language Transformations Time: 2:00 pm - 3:30 pm Location: 11 Large Abstract: Knowledge representation is hard. As natural language scientists andengineers, we'd like something that- is expressive enough to capture how natural language works- permits tractable inference- admits learning algorithms for automatic knowledge acquisition- leads to modular system constructionThis talk will look at knowledge representation for capturing naturallanguage transformations. A lot of what we do falls into thiscategory. Examples of transformations include language translation(French to English), question answering (Question to Answer),transliteration (foreign script to Roman alphabet), summarization(long text to short text), parsing (string to tree), languagegeneration (meaning to string), etc.I'll show various knowledge formats (starting with simple finite-statetransducers) and show how they stack up on the 4 criteria above, usingtheorems and examples. We'll see that different types of tree andstring automata lead to good behavior on various subsets of the 4criteria, but getting 4 out of 4 is still elusive. This is a Krazy Theory talk -- since this kind of talk should not goon and on, I promise to finish within 50 minutes.
05 Jan 2007	Beata Klebanov (Hebrew U)	Experimental and Computational Investigation of Lexical Cohesion in Texts Time: 3:00 pm - 4:30 pm Location: 11 Large Abstract: Lexical cohesion refers to structure created in a text by use of words withrelated meanings. Apart from its importance in theoretical and appliedlinguistics, lexical cohesion detection is used in NLP tasks like topicsegmentation, extractive summarization, spelling correction, etc. However, theintuitive potential of lexical cohesion for such tasks is often not realized inpractice, possibly due to shortcomings of detection algorithms.I will briefly describe an experiment with readers aimed at providing reliabledata for a computational investigation of lexical cohesion. We then discuss anumber of informative features for cohesion detection, drawing on sources likeWordNet, distributional information, free associations, and the structure ofinformation in the text itself. Finally, I report experimentswith supervised learning of lexical cohesion.About the speaker: Beata Beigman Klebanov is a PhD candidate at the Hebrew University of Jerusalem,Israel, currently a visiting scholar at Northwestern University. Beata'sinterests are in experimental, computational and applied research in textpragmatics.
15 Dec 2006	Jerry Hobbs	When Will Computers Understand Shakespeare? Time: 3:00 pm - 4:30 pm Location: 11 Large Abstract: In this talk I will examine problems encountered in coming to somekind of understanding of one sonnet by Shakespeare (his 64th), askwhat it would take to solve these problems computationally, andsuggests routes to the solution. The general conclusion is that weare closer to this goal as one might think. Or are we? Bio: Jerry Hobbs is famous primarily for having an office next to KevinKnight's and a parking space next to Ed Hovy's. He has readeverything of Shakespeare's that survives, including his will andplays of dubious authorship. But that was all a long time ago.
14 Dec 2006	Liang Huang (Penn)	Faster Decoding with Synchronous Grammars and n-gram Language Models Time: 1:30 pm - 3:00 pm Location: 11 Large Abstract: A major obstacle in syntax-based machine translation is theprohibitively large search space for decoding with an integratedlanguage model. We develop faster approaches for this problem basedon lazy algorithms for k-best parsing. When comparing againstChiang's technique of cube pruning, our method runs up to twice asfast without making more search errors or decreasing translationaccuracy as measured by BLEU. We demonstrate the effectiveness of thealgorithm on a large-scale translation system.Interestingly, these techniques can be applied to speed up bilexicalparsing as well, where the (bi-) lexical probabilities can be viewedas n-gram probabilities that causes non-monotonicity. This methodfits naturally into the coarse-to-fine grained multi-pass parsingschemes.To push this direction even further, we can generalize cube and lazycube pruning as generic tools for reducing complicated search spaces,as alternatives to the well-known A* and annealing techniques. This is joint work with David Chiang (ISI).
27 Nov 2006	Mark Hopkins (Potsdam)	Towards the Effective Exploitation of Syntax in Machine Translation Time: 3:00 pm - 4:30 pm Location: 11 Large Abstract: We discuss preliminary work on a possible approach to exploitingsyntax in an effective way for machine translation. The drivingguideline is to devise a machine translation system that can performeffectively, given a very limited quantity of parsed training data.
17 Nov 2006	David DeVault (Rutgers)	Scorekeeping in an Uncertain Language Game Time: 3:00 pm - 4:30 pm Location: 11 Large Abstract: Practical dialogue systems must exploit context to interpret userutterances correctly. Received views of context and coordination inpragmatic theory equate utterance context with the occurrentsubjective states of interlocutors using notions like common knowledgeor mutual belief. We argue that these views are not well suited forpractical modeling due to the uncertainty and robustness of contextdependence in human-human dialogue. We present an alternativecharacterization of utterance context as objective and normative. Onthis view, an interlocutor's representation of context reflectsprivate uncertainty about the true objective context as determined byprior speaker meanings. As conversation moves forward, new utterancesprovide interlocutors with retrospective insight about each other'sprior meanings and therefore about what the true context really is.This view reconciles the need for uncertainty with received intuitionsabout coordination, and can directly inform computational approachesto dialogue.Joint work with Matthew Stone, Rutgers and Rich Thomason, MichiganAbout the Speaker: David DeVault is a Ph.D. candidate in the Department of ComputerScience at Rutgers University. He holds a B.S. in Engineering andApplied Science from the California Institute of Technology and anM.A. in Philosophy from Rutgers University. David's research aims todevelop techniques to allow computational agents to participate inflexible task-oriented conversations with human beings. His recentwork has drawn on design challenges encountered in building such anagent to try to articulate practical, learnable, and theoreticallysatisfying representations of context, utterance meaning, and speakerintention for implemented conversational systems.
03 Nov 2006	Jens-Soenke Voeckler	perl part 2 - advanced magick Time: 3:30 pm - 5:00 pm Location: 11 Large Abstract: Since part 1 of the Perl tutorial didn't cover the juicy bits (like aunique function in Perl), based on feedback from participants, I amoffering a part 2 "Perl - Advanced Magick" covering:o the slides from roughly page 40- The Schwartzian Transform- Dissecting a programo What to do, if you do need popen or backticks?o OO Perl - a starto C embedding - definitely only a "start here"o Useful recipes, e.g. interpolating variables in configurationscripts from Perl values. If there is something you are especially interested in seeing, pleasesend me an email
23 Oct 2006	Jens-Soenke Voeckler	perl - how to use it, not abuse it Time: 12:00 pm - 1:30 pm Location: 11 Large Abstract: If you speak a little perl, are an occasional perl-scripter, andwould like to know more about how to use it as a (p)ortable, (e)fficient, and (r)eadible (l)anguage, you may be interested in mybrown bag (read: bring your own) lunch seminar: I will talk about using Perl in a portable fashion, the environmentit is run in, and how avoid common mistakes and misconceptions. Perloffers more than a thousand ways to solve a problem, but some aremore portable or more efficient than others. If time permits, simplehands-on examples can be tried out during the talk, so power forlaptops will be provided.
29 Sep 2006	Ashish Venugopal (CMU)	Delayed LM Intersection and Left-to-Right N-Best Extraction for Syntax-Based MT Time: 3:00 pm - 4:30 pm Location: 11 Large Abstract: We begin by describing a set of pruning constraints that are appliedin the literature to effectively restrict the search space ofsynchronous PCFGs intersected with target language model contexts. Weapply these constraints to non-binarized grammars with a large numberof non-terminals and demonstrate effective parsing within theframework of Wu, 97.We then present a novel parsing approach that avoids language modelcontext intersection during parsing in favor of language model drivenn-best list extraction. The parsing step produces a sentencespanning parse forest which is explored in left-to-right target orderby the N-Best extraction method.This method avoids lossy pruning during the parsing process, searchinga much larger effective parse space than practically possible in thefull intersection scenario, and has the important benefit of allowingintegration of a high order language within the N-Best search process,rather than only in parse re-scoring.We demonstrate the impact of this parsing approach using the SPCFGapproach described in Zollmann, Venugopal, Vogel 06, which is similarto Galley et al., 04 and compare performance against fullintersection.This is joint work with Andreas ZollmannAbout the Speaker:Ashish Venugopal is a Ph.D candidate at the Language TechnologiesInstitute at Carnegie Mellon University, and holds B.S (SCS,Univ. Honors), M.S degrees from the same institution. He is a SeibelScholar and has received the annual Graduate Student Teaching Award atCarnegie Mellon. His research focus is on syntax augmented machinetranslation.
22 Sep 2006	Eduard Hovy	Toward a 'Science' of Annotation: Experiences from OntoNotes Time: 3:00 pm - 4:30 pm Location: 11 Large Abstract: As machine learning algorithms and their application for NLP becomebetter understood, attention turns toward the production of annotatedcorpora to which they can be applied. Numerous phenomena presentthemselves for annotation, including aspects in lexical semantics,discourse, pragmatics, and dialogue. But several questionsimmediately must be answered:1. How does one obtain a balanced corpus to annotate? What is abalanced corpus?2. How does one decide which aspects to annotate? How does oneadequately express the theory behind the phenomena in simple annotation steps?3. Which annotators does one hire? How does one ensure that they are adequately trained?4. How does one establish a simple, fast, and trustworthy annotationprocedure? What interfaces does one build? How does one ensure thatthe interfaces do not affect the annotation results?5. How does evaluate the results? What are the appropriate agreementmeasures? At which cutoff points should one re-do the annotations?How does one ensure improvement?6. How should one formulate and store the results? How does oneensure compatibility with other existing resources? How does one makeresults available for best impact?7. How does one report the annotation effort and results? How doesone actually get a paper on this work published at an importantconference? What should the paper contain? Despite their being so basic, there is almost no established procedureor standard set of answers to these questions today. In this talk Idiscuss some of these aspects, pointing to the lessons learned in theongoing OntoNotes project (joint with BBN, the University of Colorado(PropBank), the University of Pennsylvania (Treebank), and ISI).
25 Aug 2006	Victoria Fossum (Michigan)	Improving Precision of Word Alignments Using GHKM Syntax-Based Rule Extraction Time: 3:00 pm - 3:30 pm Location: 11 Large Abstract: Noisy word alignments negatively affect the quality of the translationrules extracted by the ISI syntax-based MT system. In the literature,alignment is typically treated as a separate process from subsequentstages in the MT pipeline. By contrast, we allow rule extraction toguide the alignment process.We present an unsupervised algorithm for identifying and removing "bad"links using GHKM syntax-based rule extraction. We show thatwe can improve upon the precision of GIZA union (measured against a goldstandard set of manually aligned Chinese-English sentence pairs),while only decreasing recall slightly. (Note: This is part of the Summer Intern Series)
25 Aug 2006	Jason Riesa	Minimally Supervised Morphological Segmentation with Applications to Machine Translation Time: 3:30 pm - 4:00 pm Location: 11 Large Abstract: Inflected languages in a low-resource setting present a data sparsity problem forstatistical machine translation. In this work, we present a minimallysupervised algorithm for morpheme segmentation on Arabic dialectswhich reduces unknown words at translation time by over 50%, totalvocabulary size by over 40%, and yields a significant increase inBLEU score over a previous state-of-the-art phrase-based statistical MT system.
23 Aug 2006	Oana-Diana Postolache	Towards combining Searn and Syntax-Based Machine Translation (SBMT) Time: 3:00 pm - 3:30 pm Location: 11 Large Abstract: This talk is about modeling the Syntax-Based Machine Translation(SBMT) problem within the Searn (Search & Learn) framework developed by Hal Daume inhis PhD thesis. I will present the way we define the states, actionsand the search space and how to implement the cost function. (Note: This is part of the Summer Intern Series)
23 Aug 2006	Joseph Turian (NYU)	Speeding-up Syntax-based Decoding Time: 3:30 pm - 4:00 pm Location: 11 Large Abstract: TBA (Note: This is part of the Summer Intern Series)
18 Aug 2006	Chenhai Xi	Name Entity Transliteration Discovery from Large Bilingual Comparable Corpora Time: 3:00 pm - 3:30 pm Location: 11 Large Abstract: In this summer project, we investigate a scalable method to extractChinese-English name transliterations from large comparable corpora,which consist of two languages discussing same or similar topics. We showthat bigram Jaccard coefficient is a good similarity method to compare Englishand Chinese names, at Chinese pronunciation (Pinyin) level. Based on this phoneticsimilarity score, an efficient randomized algorithm is then used tofind name pair candidates from English and Chinese lists. Finally, contextinformation, such as dates, frequency, place and titles are combined with thephonetic similarity to improve the accuracy of the name pairs list. (Note: This is part of the Summer Intern Series)
11 Aug 2006	Idan Szpektor (Bar-Ilan U)	Textual Entailment: Framework, Learning and Applications Time: 3:00 pm - 4:30 pm Location: 11 Large Abstract: Textual Entailment has been proposed recently as a generic frameworkfor modeling semantic variability in many Natural Language Processingapplications, such as Question Answering, Information Extraction,Information Retrieval and Document Summarization. The TextualEntailment relationship holds between two text fragments, termed textand hypothesis, if the truth of the hypothesis can be inferred fromthe text.In this talk, the Textual Entailment framework will be introduced.I'll then present an algorithm for large-scale Web-based acquisitionof entailment rules, a type of knowledge needed for robust inference.Finally, I will present an unsupervised Relation Extraction approachbased on the Textual Entailment framework.About the speaker:Idan Szpektor is a PhD student under the supervision of Dr. Ido Daganat Bar Ilan University, Israel. His current research activity is inacquisition of knowledge for textual entailment.
04 Aug 2006	Shou-de Lin	Ph.D. defense practice talk Time: 3:30 pm - 4:30 pm Location: 11 Large Abstract: This is a practice talk for my Ph.D. defense, whichwill be held on Aug 24th 3-5pm, SAL 322.An important problem in the area of homeland security and frauddetection is to identify abnormal entities in large datasets.Although there are methods from knowledge discovery and data miningfocusing on finding anomalies in numerical datasets, there has beenlittle work aimed at discovering abnormal or suspicious instances inlarge and complex semantic graphs whose nodes are richly connectedwith many different types of links. In this talk, I will describe anovel, domain-independent and unsupervised framework to identify suchinstances. Besides discovering suspicious instances, we believe thatto complete the discovery process and to deal with the "curse offalse positives", a system has to convince the users by providingexplanations for its findings. Therefore, in the second part of thetalk I will describe an explanation mechanism to automaticallygenerate human-understandable explanations for the discoveredresults. Experimental results show that our discovery systemoutperforms state-of-the-art unsupervised network algorithms used toanalyze the 9/11 terrorist network by a large margin. Additionally, ahuman study we conducted demonstrates that our explanation system,which provides natural language explanations for its findings,allowed human subjects to perform complex data analysis in a muchmore efficient and accurate manner
28 Jul 2006	Qin Iris Wang (Alberta)	Improved Large Margin Dependency Parsing via Local Constraints and Laplacian Regularization Time: 3:00 pm - 4:30 pm Location: 11 Large Abstract: This talk is about an improved approach for learning dependency parsersfrom treebank data. Our technique is based on two ideas for improvinglarge margin training in the context of dependency parsing. First, weincorporate local constraints that enforce the correctness of eachindividual link, rather than just scoring the global parse tree. Second,to cope with sparse data, we smooth the lexical parameters according totheir underlying word similarities using Laplacian Regularization. Todemonstrate the benefits of our approach, we consider the problem ofparsing Chinese treebank data using only lexical features, that is,without part-of-speech tags or grammatical categories. We achieve stateof the art performance, improving upon current large margin approaches.Here is the link for the paper:http://www.cs.ualberta.ca/~wqin/papers/depar_margin_conll06.pdfAbout the speaker: Qin Iris Wang is a Ph.D. student from the University of Alberta,working with Dekang Lin and Dale Schuurmans. Her research interestsare in natural language processing and machine learning. Specifically,she has been working on dependency parsing using both generative anddiscriminative methods.
11 Jul 2006	Dragos Munteanu + Joseph Turian	Practice Talks for ACL Time: 2:30 pm - 4:00 pm Location: 11 Large Abstract: Extracting Parallel Sub-Sentential Fragments from Non-Parallel CorporaDragos MunteanuWe present a novel method for extracting parallel sub-sentential fragmentsfrom comparable bilingual corpora. Currently, the state of the art incomparable corpus mining is only able to extract full sentence pairs whichare judged to be parallel. We advance the state of the art by showing howto obtain useful data even from not-fully-parallel sentences. By analyzingsentence pairs using a signal-processing-inspired approach, we detectwhich segments of the source sentence are translated into segments of thetarget sentence, and which are not. We evaluate the quality of theextracted data by showing that it improves the performance of astate-of-othe-art machine translation system.Advances in Discriminative ParsingJoseph Turian The present work advances the accuracy and training speed ofdiscriminative parsing. Our discriminative parsing method has nogenerative component, yet surpasses a generative baseline on constituentparsing, and does so with minimal linguistic cleverness. Our model canincorporate arbitrary features of the input and parse state, and performsfeature selection incrementally over an exponential feature space duringtraining. We demonstrate the flexibility of our approach by testing itwith several parsing strategies and various feature sets.
30 Jun 2006	David Chiang and Kevin Knight	Synchronous Grammars and Tree Transducers Time: 2:00 pm - 5:00 pm Location: 11 Large Abstract: (Practice tutorial for ACL/COLING 2006)Once upon a time, synchronous grammars and tree transducers were esoterictopics in formal language theory, far removed from the practice ofbuilding real, large-scale natural language systems. However, these toolsare now rapidly becoming essential for modeling machine translation andother complex language transformations. It has therefore become practicaland important to understand the basic properties of tree transformationsystems, which we cover in this tutorial.
23 Jun 2006	Joseph Turian (NYU)	Discriminative Training for Large-Scale NLP Time: 3:00 pm - 4:30 pm Location: 11 Large Abstract: Parsing and translating natural languages can be viewed asstructured-prediction problems. We outline the crucial designdecisions that must be made to build a machine to solve structuredprediction problems, and explain our particular choices for these twolarge-scale NLP problems. Our approach uses a purely discriminativelearning method that scales up well to problems of this size. Unlikecurrently popular methods, this one does not require a great deal offeature engineering a priori, because it performs feature selectionover a compound feature space as it learns. Accuracy on constituentparsing was at least as good as other comparable methods. To ourknowledge, it is the first purely discriminative learning algorithmfor translation with tree-structured models. Experiments demonstratethe method's versatility, accuracy, and efficiency.
26 May 2006	Radu Soricut and Hal Daume III	Defense Practice Talks: Generation and Learning Time: 3:00 pm - 5:00 pm Location: 11 Large Abstract: These are two practice talks for our upcoming thesis defenses. The titlesand abstracts are:--------------------------------------------------------------------------NATURAL LANGUAGE GENERATION FOR TEXT-TO-TEXT APPLICATIONS USING AN INFORMATION-SLIM REPRESENTATIONRadu SoricutIn this talk, I describe a new natural language generation paradigm, basedon direct transformation of textual information into well-formed textualoutput. I support this language generation paradigm with theoreticalcontributions in the field of formal languages, new algorithms, empiricalresults, and software implementations. At the core of this work is a novelrepresentation formalism for probability distributions over finitelanguages. Due to its convenient representation and computationalproperties, this formalism supports a wide range of language generationneeds, from sentence realization to text planning.Based on this formalism, I describe, implement, and analyze theoreticallya family of algorithms that perform language generation using directtransformations of text. These algorithms use stochastic models oflanguage to drive the generation process. I perform extensive empiricalevaluations using my implementation of these algorithms. These evaluationsshow state-of-the-art performance in automatic translation, andsignificant improvements in state-of-the-art performance in abstractiveheadline generation and coherent discourse generation.--------------------------------------------------------------------------PRACTICAL STRUCTURED LEARNING FOR NATURAL LANGUAGE PROCESSINGHal Daume III Natural language processing is replete with problems whose outputs arehighly complex and structured. The current state-of-the-art in machinelearning is not yet sufficiently general to be applied to general problemsin NLP. In this thesis, I present Searn (for "search" + "learn"), anapproach to learning for structured outputs that is applicable to the widevariety of problems encountered in natural language. Searn operates bytransforming structured prediction problems into a collection ofclassification problems, to which any standard binary classifier may beapplied. From a theoretical perspective, Searn satisfies a strongfundamental performance guarantee: given a good classification algorithm,Searn yields a good structured prediction algorithm. To demonstrateSearn's general applicability, I present applications in such diverseareas as automatic document summarization and entity detection andtracking. In these applications, Searn is empirically shown to achievestate-of-the-art performance.
24 May 2006	Hal Daume III	Beyond EM: Bayesian Techniques for Human Language Technology Researchers Time: 9:00 am - 12:00 pm Location: 4th Floor Abstract: This is a practice tutorial for one I am giving at HLT/NAACL one weeklater. Comments/feedback are very welcome.----------------------------------------------------------------------Expectation Maximization (EM) has proved to be a great and usefultechnique for unsupervised learning problems in speech and languageprocessing. Unfortunately, its range of applications is limited either byintractable E- or M-steps, or by its reliance on the maximum likelihoodestimator. The natural language processing community typically resorts toad-hoc approximation methods to get (some reduced form of) EM to apply toNLP tasks. However, many of the problems that plague EM can be solvedwith Bayesian methods, which are theoretically more well justified. Inthis tutorial, I discuss Bayesian methods as they can be used in naturallanguage processing. The two primary foci of this tutorial are specifyingprior distributions and performing the necessary computations to performinference in Bayesian models. I focus on unsupervised techniques (forwhich EM is the obvious choice), but discuss supervised and discriminativetechniques at the conclusion with pointers to relevant literature.Depending on one's inference technique of choice, the math required tobuild Bayesian learning models can be difficult. Compounding this problemis the fact that current written tutorials on Bayesian techniques tend tofocus on continuous-valued problems, a poor match for the high-dimensiondiscrete world of text. This combination makes the cost of entrance tothe Bayesian learning literature often too high. The goal of thistutorial is to provide sufficient motivation, intuition and vocabularymapping so that one can easily understand recent papers in Bayesianlearning that are published at conferences like NIPS, and increasingly atACL. In addition to the standard tutorial materials (slides), thistutorial is accompanied by a technical report that spells out all themathematic derivations in great detail, for those who wish to startresearch projects in this fields. This tutorial should be accessible to anyone with a basic understanding ofstatistics. I use a query-focused summarization task as a motivatingrunning example for the tutorial, which should be of interest toresearchers in natural language processing and in information retrieval.Additionally, though the tutorial does not focus on speech problems, thoseattendees interested in graphical modeling techniques for automatic speechrecognition might also find the tutorial of interest.
19 May 2006	Patrick Pantel	Espresso: Making Use of Generic Patterns for Mining Relations from Small and Large Corpora Time: 3:00 pm - 4:30 pm Location: 11 Large Abstract: In the past decade, researchers have explored many approaches toautomatically extract large collections of knowledge from text. In thistalk, we present Espresso, a weakly-supervised, general-purpose, andbroad-coverage algorithm for harvesting binary semantic relations. Themain contributions are: i) a method for exploiting generic patterns byfiltering incorrect instances using the Web; and ii) a principled measureof pattern and instance reliability enabling the filtering algorithm. Wepresent an empirical comparison of Espresso with various state of the artsystems, on different size and genre corpora, on extracting variousgeneral and specific relations. Experimental results show that ourexploitation of generic patterns substantially increases system recallwith small effect on overall precision.
12 May 2006	Nick Mote and Donghui Feng	Pedagogical Contextualization of Language Learner Speech Errors AND Learning to Detect Conversation Focus of Threaded Discussions Time: 3:00 pm - 4:30 pm Location: 11 Large Abstract: This is two practice talks.-----------------------------------------------------------------------------FIRST TALK:The traditional approach to diagnosing learner speech errors in ComputerAided Language Learning is to create a linguistic profile of thelearner/user. We, however, propose that work must also be done to modelthe linguistic profile of a typcial native listener.Not all errors in second langage learner speech are created equal.Different errors sound more "severe" or "harsh" to native speaker ears andshould therefore be treated with more emphasis in pedagogical interaction.The Tactical Language Training System (TLTS) is a speech-enabledvirtual-reality based computer learning environment designed to teachArabic spoken communication to American English speakers. This talkaddresses the ways the TLTS contextualizes non-native speech errors, andhow this contextualization fits in the corrective exchanges between anon-native learner and a pedagogical agent built to model a nativelistener.The pedagogical system used in TLTS includes: * Automatic Speech Recognition (ASR) models which are built on acombination of both annnotated and unannotated non-native speech withnative speech data. * A stochastic generative model for errors in learner speech thatcreates mispronunciation grammars for the ASR * Reweighting of system-perceived mispronunciation severity based onaggregate native speaker judgements of quality pronunciation andintelligiblity. * Contextualization of feedback based on lexical and phoneticinventories of the native and non-native languages.-----------------------------------------------------------------------------SECOND TALK: We present a novel feature-enriched approach that learns to detect theconversation focus of threaded discussions by combining NLP analysis andIR techniques. Using the graph-based algorithm HITS, we integratedifferent features such as lexical similarity, poster trustworthiness, andspeech act analysis of human conversations with featureoriented linkgeneration functions. It is the first quantitative study to analyze humanconversation focus in the context of online discussions that takes intoaccount heterogeneous sources of evidence. Experimental results using athreaded discussion corpus from an undergraduate class show that itachieves significant performance improvements compared with the baselinesystem.
05 May 2006	Namhee Kwon	Recognizing Argument Structures in Texts Time: 3:00 pm - 4:30 pm Location: 11 Large Abstract: I present our approach to identify an argument structure defined as asimple hierarchical structure of claim and reasons. The claim is alsoclassified into "in favor of" or "against" the topic. The experiment isperformed on the comments from the general public sent to governmentofficials in response to proposed regulations.
28 Apr 2006	Feng Pan	Learning Event Durations from Event Descriptions Time: 3:00 pm - 4:30 pm Location: 11 Large Abstract: The research of extracting event duration information from texts ispotentially very important in applications in which the time course ofevents is to be extracted from news. For example, whether two eventsoverlap or are in sequence often depends very much on their durations. Ifa war started yesterday, we can be pretty sure it is still going on today.If a hurricane started last year, we can be sure it is over by now.In the talk, I will first present our work on constructing an annotatedcorpus for extracting information about the typical durations of eventsfrom texts, including the annotation guidelines, the event classes wecategorized, the way we use normal distributions to model such vague andimplicit temporal information, and how we evaluate inter-annotatoragreement. I will then show that machine learning techniques applied tothis data yield coarse-grained event duration information, considerablyoutperforming a baseline and approaching human performance.At the beginning of the talk, I will also give a brief overview of thetime ontology (OWL-Time, formerly DAML-Time) we have developed, which isrepresented in both first-order logic and the OWL web ontology language.
21 Apr 2006	Soo-Min Kim	Identifying and Analyzing Judgment Opinions Time: 3:00 pm - 4:30 pm Location: 11 Large Abstract: In this talk, we introduce a methodology for analyzing judgment opinions.We define a judgment opinion as consisting of a valence, a holder, and atopic. We decompose the task of opinion analysis into four parts: 1)recognizing the opinion; 2) identifying the valence; 3) identifying theholder; and 4) identifying the topic. We evaluate our methodology usingboth intrinsic and extrinsic measures.
14 Apr 2006	Radu Soricut	Natural Language Generation for Text-to-Text Applications using an Information-Slim Representation Time: 3:00 pm - 4:30 pm Location: 11 Large Abstract: Although a considerable number of generic Natural Language Generation(NLG) systems has been produced over the years, none of them is usuallyemployed in end-to-end, text-to-text applications such as MachineTranslation, Summarization, Question Answering, etc. In this talk, weidentify the likely reasons for this state of affairs, and proposeWIDL-expressions as a flexible formalism that facilitates the integrationof a generic NLG engine within end-to-end language processingapplications.WIDL-expressions represent compactly probability distributions over finitesets of candidate realizations, and have optimal algorithms for textrealization via interpolation with language model probabilitydistributions. We show the effectiveness of our WIDL-based NLG engine forboth sentence realization and document realization tasks. By employinglanguage models that capture sentence-level properties, we perform MachineTranslation and Headline Generation at state-of-the-art levels or better.By employing language models that capture document-level properties suchas text coherence, we synthesize output for Multi-document Summarizationthat displays both high content selection performance and increasedcoherence.
24 Mar 2006	Dragos Munteanu	Automatic creation of parallel corpora Time: 3:00 pm - 4:30 pm Location: 11 Large Abstract: Parallel texts -- texts that are translations of each other -- are animportant resource in many cross-lingual NLP applications, such as lexicalacquisition, cross-language IR, and annotation projection. However, theirimportance is paramount for Statistical Machine Translation (SMT), as theyprovide the training data from which all the translation knowledge islearned. The state of the art in SMT is advanced enough that, givensufficient parallel data (i.e. a few million words) for any language pairin a given domain, a generic SMT system trained on it will achieve areasonable translation performance in that domain. The main reason why SMTsystems exist only for a handful of languages is that, for most languagepairs, parallel training data is simply not available.One way to alleviate this lack of parallel data is to exploit a muchricher and more diverse resource: comparable corpora, texts which are notstrictly parallel but related. The prototypical example of comparabletexts are two news articles in different languages which report on thesame event. I will present methods for automatic extraction of paralleldata from such corpora. I will show how to detect parallel data at variouslevels of granularity: parallel documents, parallel sentences, and evenparallel sub-sentence fragments. The parallel corpora obtained using thesemethods help improve translation performance for both resource-scarcelanguage pairs (such as Romanian-English) and resource-rich ones (such asArabic-English).
17 Mar 2006	Jonathan May	Tiburon: A Finite State Tree Automata Toolkit Time: 3:00 pm - 4:30 pm Location: 4th Floor Abstract: In the 1990s, researchers applied their new developments in transducertheory using widely available easy-to-use toolkits for string transducers,and made well-known advances in parsing, machine translation, and otherareas. Rapid prototyping via software such as the AT&T toolkit and carmelwas useful for proofs of concept and in many cases led to unforseendevelopments in novel areas. In the current nlp research environment treebased strategies and new models have shown promising results in advancingthe state of the art, and recent developments in weighted tree automatatheory are enriching the bedrock created 40 years ago, but as of yet thereis no toolkit available with the necessary capabilities to turn promiseinto solution. Tiburon is the first probablistic tree transducer toolkit. Similar in formand function to the string-based toolkits of yesteryear, it is designed tobe easy to use, with simple but expressive definitions of tree automataand a concise set of vital operations that can be used to construct manyuseful tree-based nlp projects. Although a work in progress, Tiburon isalready a usable tool with active users between the ages of 6 and 41. Iwill describe the current status of the system, demonstrate ease of useand potential power, and discuss the challenges ahead.
10 Mar 2006	Mark Hopkins	Exploring the Potential of Intractable Parsers Time: 3:00 pm - 4:30 pm Location: 10th Floor Abstract: We revisit the idea of history-based parsing, and present a history-basedparsing framework that strives to be simple, general, and flexible. Wealso provide a decoder for this probability model that is linear-space,optimal, and anytime. A parser based on this framework, when evaluated onSection 23 of the Penn Treebank, compares favorably with otherstate-of-the-art approaches, in terms of both accuracy and speed.
03 Mar 2006	Liang Huang (Penn)	Syntax-Directed Translation with Extended Domain of Locality Time: 3:00 pm - 4:30 pm Location: 11th Floor (Large) Abstract: (note: this is a very tentative title -- comments welcome!)We present a novel extension of syntax-directed translation forstatistical MT. Formally speaking, our model is based on tree-to- stringtransducers that recursively convert a parse-tree in the source-languageinto a string in the target-language. These transduction rules havemulti-level trees on the source-side, giving this system moretransformational power due to the extended domain of locality. We alsopresent efficient algorithms for decoding based on dynamic programming.Initial experiments on English-to-Chinese translation show promisingresults in both speed and the translation quality.Joint work with Kevin Knight and Aravind Joshi. Bio: Liang Huang is a 3rd-year PhD student from the University of Pennsylvania.He is mainly interested in algorithms and formalisms for parsing andsyntax-based machine translation. His recent work has been on k-bestparsing algorithms (with David Chiang) and synchronous binarization for MT(with Hao Zhang, Dan Gildea, and Kevin Knight).
24 Feb 2006	Hal Daume III	Search-based Structured Prediction Time: 3:00 pm - 4:30 pm Location: 11 Large Abstract: I present an algorithm, Searn (for "search-learn") that is designed tosolve structured prediction problem: problems whose goal is to learn topredict complex objects such as parts-of-speech, parse trees,translations, etc... Searn functions by "breaking apart" structuredprediction problems into classification problems in the process of search.I analyze Searn in the framework of learning reductions and show that goodperformance on the underlying classification problems implies good searchperformance. Moreover, Searn is computationally efficient in a supersetof the settings where previous algorithms are efficient and is not limitedby conditional independence assumptions (as in CRFs). This excessivelysimple and general algorithm turns out to have excellent state-of-the-artperformance. This is joint work with John Langford (TTI-C) and Daniel Marcu; and, to alesser extent, with Drew Bagnell (CMU) and Bianca Zadrozny (IBM TJWatson).
10 Feb 2006	David Chiang	Parsing Arabic Dialects Time: 3:00 pm - 4:00 pm Location: 11 Large Abstract: The Arabic language exhibits diglossia, i.e., the coexistence of two formsof language, a variety with standard orthography and sociopolitical cloutwhich is not natively spoken by anyone (Modern Standard Arabic, MSA) andvarieties that are primarily spoken and lack writing standards (Arabicdialects). There are important resources currently available for MSA withmuch on-going NLP work; for example, there is an Arabic Treebank andseveral syntactic parsers for MSA. However, Arabic dialect resources andNLP research are still at an infancy stage. I will present work done atthe Johns Hopkins CLSP Summer Workshop on parsing of Arabic dialects, inparticular, Levantine Arabic. We have experimented with three approachesto leveraging MSA resources to create a parser for Levantine Arabic, aswell as methods for induction of MSA-Levantine translation lexicons and aLevantine part-of-speech tagger. Using these methods we obtain errorreductions of up to 15% compared with applying an MSA parser directly toLevantine text.Rambow et al. Parsing Arabic Dialects: Final Report. Johns HopkinsUniversity Center for Language and Speech Processing Workshop 2005.http://www.clsp.jhu.edu/ws2005/groups/arabic/documents/finalreport.pdfChiang et al. Parsing Arabic Dialects. To appear in Proc. EACL 2006. This is joint work with O. Rambow, M. Diab, N. Habash, R. Hwa, K. Sima'an,V. Lacey, R. Levy, C. Nichols and S. Shareef.
03 Feb 2006	Alex Fraser	Measuring Word Alignment Quality for Statistical Machine Translation Time: 3:00 pm - 4:30 pm Location: 11 Large Abstract: Automatic word alignment plays a critical role in statistical machinetranslation. Unfortunately the relationship between alignment quality andstatistical machine translation performance has not been well understood.In the recent literature the alignment task has frequently been decoupledfrom the translation task, and assumptions have been made about measuringalignment quality for machine translation which, it turns out, are notjustified. In particular, none of the tens of papers published over thelast five years has shown that significant decreases in Alignment ErrorRate (AER) result in significant increases in translation quality. I willexplain this state of affairs and present steps towards measuringalignment quality in a way which is predictive of statistical machinetranslation quality.I will also provide a brief overview of some of my other work on trainingand search for word alignment.
27 Jan 2006	John Conroy	Multi-Document Summary Space:What do People Agree is Important? Time: 3:00 pm - 4:30 pm Location: 11 Large Abstract: A multi-document summary gives the "gist" of what is contained in acollection of related documents. But how can we define a "gist?" Weexplore this question by analyzing human written summaries for clusters ofdocument sets. In particular, we estimate the probability that word willbe chosen by a human to be included in a summary. We demonstrate that ifthis probability model were given by an oracle, then a simple automaticmethod of summarization can produce extract summaries which arestatistically indistinguishable from the human summaries.About the Speaker:John M. Conroy received a B.S. in Mathematics from Saint Joseph'sUniversity in 1980 and a Ph.D. in Applied Mathematics from the Universityof Maryland in 1986. Since then he has been a research staff member forthe IDA Center for Computing Sciences in Bowie, MD. His research interestis applications of numerical linear algebra and statistics. He is a memberof the Society for Industrial and Applied Mathematics, Institute ofElectrical and Electronics Engineers (IEEE), and the Association forComputational Linguistics.
26 Jan 2006	Tim Chklovski	GrainPile: Deriving Quantitative Overviews of Free Text Assessments on the Web Time: 1:00 pm - 2:00 pm Location: 4th floor Abstract: Many research efforts are addressing the problem of enabling automaticsummarization of opinions and assessments stated on the web in productreviews, discussion forums, and blogs. One key difficulty is that relevantassessments scattered throughout web pages are obscured by variations innatural language. In this paper, we focus on a novel aspect of enablingaggregations of assessments of degree to which a given property holds fora given entity (for instance, how touristy is Boston). We presentGrainPile, a user interface for extracting from the web, aggregating andquantifying degree assessments of unconstrained topics. The interfaceprovides a variety of functions: a) identification of dimensions ofcomparison (properties) relevant to a particular entity or set ofentities, b) comparisons of like entities on user-specified properties(for example, which university is more prestigious, Yale or Cornell), c)tracing the derived opinions back to their sources (so that the reasonsfor the opinions can be found). A central contribution in GrainPile is theevaluated demonstration of feasibility of mapping the recognizedexpressions (such as fairly, very, extremely, and so on) to a common scaleof numerical values and aggregating across all the extracted assessmentsto derive an overall assessment of degree. GrainPile’s novelassessment and aggregation of degree expressions is shown to stronglyoutperform an interpretation-free, co-occurrence based method.Full paper:http://www.isi.edu/~timc/papers/IUI06-grainpile-chkl.pdf
16 Dec 2005	Jonathan May	A Better N-Best List - Practical Determinization of Weighted Finite Tree Automata Time: 3:00 pm - 4:30 pm Location: 11 Large Abstract: Ranked lists of output trees from syntactic statistical NLP applicationsfrequently contain multiple repeated entries. This redundancy leads tomisrepresentation of tree weight and reduced information for debugging andtuning purposes. It is chiefly due to nondeterminism in the weightedautomata that produce the results. I will introduce an algorithm thatdeterminizes such automata while preserving proper weights, returning thesum of the weight of all multiply derived trees. I will also reportresults of the application of the algorithm to machine translation andData Oriented Parsing.
30 Sep 2005	David Chiang	Some Computational Complexity Results for Synchronous Context-Free Grammars Time: 3:00 pm - 4:30 pm Location: 4 Large Abstract: (This is a practice talk for a paper by Giorgio Satta and Enoch Peserico)This paper investigates some computational problems associated withprobabilistic translation models that have recently been adopted in theliterature on machine translation. These models can be viewed as pairs ofprobabilistic context-free grammars working in a `synchronous' way. Twohardness results for the class NP are reported, along with an exponentialtime lower-bound for certain classes of algorithms that are currently usedin the literature.
29 Sep 2005	Tim Chklovski	Previews of my talks for K-CAP Time: 3:00 pm - 4:30 pm Location: 11 Large Abstract: The topics & approximate start times:(3:00 sharp) My 7-10 min bit for panel discussion on "Manual vs. AutomatedKnowledge Acquisition"Will touch on web extraction vs. learning from volunteers -- strengths andweaknesses, new thoughts on synergies(3:15) Designing Intelligent Acquisition Interfaces for Collecting WorldKnowledge from Web Contributors(paper by Timothy Chklovski, Yolanda Gil) (3:55) Collecting Paraphrase Corpora from Volunteer Contributors (paper byTimothy Chklovski)
26 Aug 2005	Fossum, Huang and Zhang	Summer Student Presentations Time: 3:00 pm - 4:30 pm Location: 11 Large Abstract: 3:00pm Victoria Fossum (Michigan)Exploring the Continuum between Phrase-based and Syntax-based Machine TranslationState-of-the-art statistical machine translation systems use lexicalphrases as the basic unit of translation. Phrase-based systems cancapture those aspects of translation that are sensitive to local context.Syntax-based systems, on the other hand, make use of linguisticallymotivated syntactic structure, can capture long-distance dependencies andreorderings, and offer greater generalization in translation rules.However, their performance lags that of phrase-based systems.Hierarchical phrase-based translation, introduced by [Chiang 05], providesan elegant framework for exploring the continuum between phrase-based andsyntax-based translation. This system combines the "formal machinery" ofsyntax-based systems without any "linguistic commitment" to a particularsyntactic structure [Chiang 05].I will present results from my re-implementation of Chiang's hierarchicalphrase-based system, and (if time permits) compare those results with thefollowing systems on Chinese-English translation: ISI's phrase-basedsystem, and ISI's syntax-based system. Between now and December 2005, Iplan to incrementally explore the space between phrase-based andsyntax-based systems by augmenting these hierarchical phrase-based ruleswith richer syntactic annotation.3:30pm Liang Huang (Penn) and Hao Zhang (Rochester)Efficient Integration of n-gram Language Models with Syntax-based DecodingWe first give an overview of the ISI syntax-based MT system which is basedon tree-to-string (xRs) translation rules. The biggest problem at thisstage is the inefficiency of the integration of n-gram models. Withoutn-gram models, the xRs translation rules can be easily binarized withrespect to the foreign language to ensure cubic-time decoding. With n-grammodels, however, binarization without considering both languages will leadto exponential complexity. Inspired by Inversion Transduction Grammar (ITG) (Wu, 97), we will focuson the so-called ITG binarizable rules which count for over 99% of thewhole rule set. A simple linear-time algorithm will be presented to do thebinarization. Decoding with ITG-like rules is of low polynomial complexityin both time and space. We will discuss experimental results on bothefficiency and accuracy of decoding with the new binarization. If timepermits, we will also present the "hook trick" (inspired by (Eisner andSatta, 99)) to even further reduce the polynomial complexity of thedecoding process.
24 Aug 2005	Hopkins, Riesa, and Nakov	Summer Student Presentations Time: 3:30 pm - 5:00 pm Location: 11 Large Abstract: 3:30pm Mark Hopkins (UCLA)Tree Sequence Automata: A Unifying Framework for Tree Relation FormalismsThere exist a wide variety of competing formalisms for representing alanguage of ordered tree pairs. These include (bottom-up and top-down)tree transducers, synchronous tree-substitution grammars (STSGs),synchronous tree-adjoining grammars (STAGs), and inversion transductiongrammars (ITGs). Since these formalisms have all developed independentlyof one another, it is difficult to compare their respectiverepresentational power. This work seeks to make this task simpler byviewing these formalisms as instances of a general unifying formalism,which we call tree sequence automata (TSA). By casting these differentformalisms in a single framework, we can compare them directly by studyingthe specific subclass of TSA that they fall into.4:00pm Jason Riesa (Johns Hopkins)A case study in building a cost-effective speech-to-speech machine translation system with sparse resources: English - Iraqi ArabicThe Arabic spoken dialect of Iraq is a language deprived of the vastresources that researchers enjoy when working with its writtencounterpart, Modern Standard Arabic (MSA). The Iraqi Arabic lexicon andgrammar are also sufficiently distinct so that the use of existing toolsor corpora for MSA yield little or no positive effect on machinetranslation output quality. One can see that building a machinetranslation system normally dependent on a large parallel corpus is aparticularly difficult task when given just a 37,000 line translatedparallel text based on transcribed speech. This talk will explore theconstraints involved in working with this type of data, how we endeavoredto mitigate such problems as a non-standard orthography and a highlyinflected grammar, and propose a cost- effective way for dealing with suchprojects in the future.4:30pm Preslav Nakov (UC Berkeley)Multilingual Word Alignment Recently there has been a growing number of available multilingualparallel texts. One such source is the European Union, which publishes itsofficial documents in the official languages of all member states(sometimes also in the languages of the candidates). Another source arethe United Nations. These corpora are a great source of training data formachine translation between new language pairs. But they also offer theopportunity to obtain better pairwise word alignments by looking atmultiple languages in parallel. In this talk I will present my research asa summer intern at ISI on getting better French (Fr) to English (En) wordalignments using an additional language (Xx). First, I will introduce twoheuristics which start with pairwise alignments between Fr-Xx, En-Xx andFr-En and then combine them probabilistically (in a linear model) orgraph-theoretically (by looking at in- and out-degrees for each word).Then I will present two Model1 inspired alignment models: (a) from "Fr andXx" to En; and (b) from Fr to "En and Xx".
05 Aug 2005	Jan Hajic (Charles U)	The Family of Prague Dependency Treebanks Time: 10:30 am - 12:00 pm Location: 11 Large Abstract: The Prague Dependency Treebank project is aimed at a linguisticallycomplex, multi-tier annotation of relatively large amounts of naturallyoccuring sentences of natural language. There are four tiers at present:the basic token tier (level 0), and the morphological, surface-syntacic,and semantic (called "tectogrammatics") tiers. The syntactic andtectogrammatic tiers are based on a richly labelled dependencyrepresentation principle. So far, the project produced three corpora: theCzech-language-only Prague Dependency Treebank, the Prague Czech-EnglishDependency Treebank and the Prague Arabic Dependency Treebank. In thetalk, the principles of the Prague Dependency Treebank linguisticannotation scheme will be presented. Some technical details will also bediscussed, as well as some of the tools developed both for the manualannotation itself and for corpus-based NLP of Czech, English and Arabic.
05 Aug 2005	Doug Oard (Maryland)	The CLEF Cross-Language Speech Retrieval Test Collection Time: 3:00 pm - 4:30 pm Location: 11 Large Abstract: Test collections for information retrieval tasks have traditionallyassumed that what we are searching for are documents (e.g., Web pages,news stories, or academic documents). Most information that is generatedis, however, not in originally generated as part of a document, but ratheras what we might refer to as "conversational media" (e.g., email, speech,or instant messaging). In this talk, I'll describe the creation of twotest collections for conversational media, an email collection beingcreated in the TREC Enterprise Search track and a spoken word testcollection for the the Cross-Language Evaluation Forum (CLEF). I'll spendmost of the talk describing the details of the CLEF test collection,illustrating the issues with some of the results that we have obtainedfrom our experiments with that collection. I'll conclude with a fewremarks about the implications of what we are learning for DARPA's newGALE program. This is joint work with Charles University, the IBM TJWatson Research Center, the Johns Hopkins University, the Survivors of theShoah Visual History Foundation, and the University of West Bohemia.About the speaker: Douglas Oard is an Associate Professor at the University of Maryland,College Park, with a joint appointment in the College of InformationStudies and the Institute for Advanced Computer Studies. He holds a Ph.D.in Electrical Engineering from the University of Maryland, and hisresearch interests center around the use of emerging technologies tosupport information seeking by end users. In 2002 and 2003, Doug spent ayear in paradise here at USC-ISI. His recent work has focused oninteractive techniques for cross-language information retrieval and onsearching conversational text and speech. Additional information isavailable at http://www.glue.umd.edu/~oard/.
15 Jul 2005	Victoria Li Fossum (Michigan)	Inducing POS Taggers by Projecting from Multiple Source Languages Time: 3:00 pm - 4:30 pm Location: 11 Large Abstract: (Yarowsky et al., 2001) present an algorithm for bootstrapping a POStagger for an arbitrary target language, using an existing POS tagger fora source language and a parallel corpus in the source and targetlanguages. The source text is annotated with the POS tagger; the parallelcorpus is word-aligned; the POS tags are "projected" from source to targetlanguage; and finally smoothing is performed before training a POS taggerfor the target language on the projected annotations. I will talk about my work (jointly with my advisor, Steve Abney, at U. ofMichigan) in which we extend this algorithm by projecting from multiplesource languages onto a target language, then combining the outputs tocompute a consensus POS tagger. Our hypothesis is that systematictransfer errors from different source-target pairs can be reduced by usingmultiple source languages. I will present experimental results for threedifferent source languages (English, German, and Spanish), and twodifferent target languages (French and Czech). Our results indicate thatusing multiple source languages improves performance.
07 Jul 2005	Radu Soricut	Natural Language Generation for Text-to-Text Applications Using an Information-Slim Representation Time: 3:00 pm - 4:30 pm Location: 11 Small Abstract: Text-to-text applications -- Machine Translation, Summarization, QuestionAnswering -- do not usually involve generic Natural Language Generation(NLG) systems in their generation components, but rather useapplication-specific algorithms. The main reason for this state of affairsis that virtually all the formalisms used by current generic NLG systemsrequire information that cannot be reliably extracted from unrestrictedtext.This thesis proposal is about meeting the demand for natural languagegeneration in the context of text-to-text applications. I introduce a newrepresentation formalism (WIDL-expressions), propose generation algorithmsthat operate on representations specific to this formalism, and discuss ageneric sentence realization framework for text-to-text applications. Thegeneration mechanism is based on algorithms for intersectingWIDL-expressions with probabilistic language models. I present boththeoretical and empirical results concerning the correctness andefficiency of these algorithms. I also discuss the practical aspectsarising from implementing this generation mechanism.In a concrete application of the proposed generation mechanisms, I presentan end-to-end Machine Translation application. I also discuss anotherpossible application for Automated Summarization, namely automatedheadline generation.
06 Jul 2005	Alessandro Moschitti (Rome)	Kernel Methods for Semantic Role Labeling Time: 2:00 pm - 3:30 pm Location: 11 Large Abstract: Automatic Natural Language applications often require the processing ofstructured data. Traditional machine learning approaches attempt torepresent structured syntactic/semantic objects by means of flat featurerepresentations, i.e. attribute-value vectors. This raises two problems:1. There is no well defined theoretical motivation for such feature model.Structural properties may not fit in any flat feature representation.2. To define effective flat features, a deep knowledge about thelinguistic phenomenon is required.Kernel methods for Natural Language Processing aim to solve both the aboveproblems as kernel functions can be used to define similarities betweenlinguistic objects without explicitly defining the target feature space.In this way, a linguistic phenomenon can be modeled at a more abstractlevel where the modeling is easier. Such property is extremely useful whenthe representation of linguistic phenomena is still not well understood.For example, the feature design of semantic role labeling appear to bequite complex since several and non-definitive feature sets have beenproposed.As a viable alternative to manual feature design, kernel methods proposetwo steps: (1) they generate all substructures of the targetsyntactic/semantic structures and (2) they let the learning algorithm(e.g. Support Vector Machines) to select the most relevant substructures.In this talk, we (1) introduce the PropBank and FrameNet predicateargument structures, (2) present the standard approaches to the automaticlabeling of semantic roles and (3) show advanced semantic role labelingmodels based on kernel methods.About the speaker:Alessandro Moschitti is a researcher at the Computer Science Department ofthe University of Rome ^ÓTor Vergata^Ô. In 1998 he took his master degreein Computer Science at the University of Rome ^ÓLa Sapienza^Ô. In 2003 hefinished his PhD in Computer Science at ^ÓTor Vergata^Ô University.Between 2002 and 2004 he worked as an associate researcher in theUniversity of Texas at Dallas. His research interests concern machinelearning approaches for Natural Language Processing and InformationRetrieval. His deep expertise relates to automated text categorization andsemantic role labeling. Recently, he has devised new kernels which enableSupport Vector and other kernel-based machines to carry out advancedsemantic processing.
23 Jun 2005	Michael Fleischman (MIT)	Intentional Context in Situated Language Learning Time: 10:30 am - 12:00 pm Location: 11 Small Abstract: Natural language interfaces designed for agents that interact with usersin shared environments (e.g. training simulators, videogames) mustincorporate knowledge about the users' context in order to address themany ambiguities of situated language use. We introduce a model ofsituated language acquisition that operates in two phases. First,intentional context is represented and inferred from user actions usingprobabilistic context free grammars. Then, utterances are mapped ontothis representation in a noisy channel framework. The acquisition modelis trained on unconstrained speech collected from subjects playing aninteractive game, and tested using an understanding task. Discussion ofresults focuses both on the implications for theoretical models ofcognition, as well as, for natural language applications in sharedenvironments.
22 Jun 2005	Hal Daume III	Beyond EM: Bayesian Techniques for NLP Researchers Time: 1:00 pm - 4:30 pm Location: 11 Large Abstract: EM has proved to be a great and useful technique for unsupervised learningproblems in natural language. Unfortunately, it cannot solve everyproblem out there, either because the E-step is intractable, the M-step isintractable or both. Typically our community resorts to a Viterbiapproximation in this case, which really isn't very justified and caneasily diverge from our expectations (no pun intended). Moreover, EM --like all maximum likelihood methods -- suffers from a need for ad-hoc andundesirable smoothing. All of these problems -- intractable E- orM-steps, the Viterbi approximation, and the annoyance of smoothing -- aresolved by using Bayesian methods. Moreover, from a theoretic point ofview, the Bayesian paradigm is much more foundationally well justifiedthan the frequentist use of estimators (such as the maximum likelihoodestimator), at some cost in computation (though not as much as you mightbelieve).In this tutorial, I will discuss Bayesian methods as they can be used innatural language processing. The first half will be background (some ofwhich you probably won't have seen, some of which you probably will haveseen, but which will probably be presented in a different way that you'reused to) including graphical models, EM, priors and pro- (and con-)Bayesian arguments. The second half of the tutorial will focus on solvingcomplex inference problems, essentially building on what we've seen fromEM. I'll cover MAP (not Bayesian -- if you can't tell me why, then youshould come to the tutorial!), summing, Monte Carlo, MCMC, Laplace,variational and expectation propagation. Time permitting, I will brieflydiscuss Bayesian discriminative models (basically what a Bayesian usesinstead of SVMs), non-parametric (infinite) models and Bayesian decisiontheory, all of which make use of the inference techniques we will havealready covered.This tutorial is intended to be largely self contained, though I willexpect that you know what probabilities are, what distributions are andthe standard manipulations of conditional/joint distributions. Familiaritywith EM would be helpful, but I'll cover this topic in some depth since itwill be important for understanding the rest of the tutorial. I hope --though this never really seems to come to fruition -- that this will be asemi-interactive talk and I will attempt to adjust according to whatpeople are interested in and what is putting people to sleep.(see http://www.isi.edu/~hdaume/bayesnlp/ for more information)
22 Jun 2005	Mitsunori Matsushita	Lumisight Table: A Face-to-face Collaboration Support System That Optimizes Direction of Projected Information to Each Stakeholder Time: 11:00 am - 12:00 pm Location: 11 Large Abstract: (This talk occurs in the morning on the same day as the Bayesian tutorial.)The goal of our research is to support cooperative work performed bystakeholders sitting around a table. To support such cooperation, varioustable-based systems with a shared electronic display on the tabletop havebeen developed. These systems, however, suffer the common problem of notrecognizing shared information such as text and images equally because theorientation of their view angle is not favorable. To solve this problem,we propose the Lumisight Table. This is a system capable of displayingpersonalized information to each required direction on one horizontalscreen simultaneously by multiplexing them and of capturing stakeholders'gestures to manipulate the information.About the Speaker:Mitsunori Matsushita is a research scientist of NTT Communication ScienceLabs., Nippon Telegraph and Telephone Corporation (NTT). He received B.E.,M.E., and Dr.E. degrees from Osaka University, in 1993, 1995 and 2003respectively. In 1995, he joined NTT, and has been engaged in researcheson natural language understanding, information visualization, andinteraction design.
20 Jun 2005	Birte Loenneker (Hamburg)	Between Story Generation and Natural Language Generation Time: 10:00 am - 11:30 am Location: 11 Small Abstract: Narratology analyzes the discursive structure of narratives as finalizedproducts of human invention, such as novels, short-stories, orfairy-tales. Those narratives are rendered in a given surface form;Narratology focuses on narratives in natural language. Narratologistsassume that each narrative surface representation is associated with aneutral, abstract event sequence, the "Story" (histoire, sjuzhet). Theabstractness of Story is illustrated by the fact that the same Story canbe realized in different surface texts. By discursive structure or"Discourse" (discours, fabula), narralogists mean the relation between anabstract Story and its concrete expression in a sequential text. Forexample, if the chronological order of the Story is not respected in itstextual recount, we are dealing with the Discourse parameter of order.Other Discourse parameters include the frequency with which Story eventsare evoked, the point of view from which they are narrated (perceived,evaluated,...), or framed narratives with several narrative levels.The Story Generator Algorithms project at the University of Hamburgevaluated several existing Story Generators with respect to theirdiscursive abilities. It became obvious that most Story Generatorsconcentrate on creating a coherent and chronological abstract Story,which is directly mapped onto natural language. This results in apredominance of 1:1 relations between Story and surface, and in mostcases corresponds to a default or zero instantiation of Discourseparameters. As a consequence, Story Generator outputs tend to be veryexplicit and straightforward, and are likely to be perceived as uniformand boring.Narratological expert knowledge might be useful to future enhanced StoryGenerators and to Natural Language Generation systems dealing withnarrative. One of the aims of Computational Narratology is to model thatexpert knowledge. Ideally, narratological knowledge will be integratedinto a Narratological Structurer, as a processing component of anadvanced system that creates narratives. In such a system, theNarratological Structurer will be the interface between a Story Generatorand subsequent Natural Language Generation modules. The talk alsopresents examples of the knowledge that is being modelled.About the Speaker:Birte Lönneker graduated from the University of Hamburg, Germany, with adegree in French with Finno-Ugristics (Finnish) and BusinessAdministration. Since then, her main fields of publication are CognitiveLinguistics and electronic resources for Natural Language Processing,with special focus on frames and metaphors, as well as electronicdictionaries, corpora, and recently part-of-speech tagging. Her PhD onConcept Frames and Relations, also published as a book in 2003, wasco-supervised at the Institute for Romance Languages and at theDepartment of Informatics in Hamburg. For her Slovenian-German onlinedictionary, Birte Lönneker was twice awarded the EURALEX Laurence UrdangAward. From 2002 to 2004, she received various research grants forSlovenia, where she was working in the Corpus Laboratory of the Instituteof Slovenian Language.Since 2004, Birte Lönneker carries out research on Story GeneratorAlgorithms within the Narratology Research Group Hamburg. She is also aboard member of the German Cognitive Linguistics Association.
17 Jun 2005	Gully Burns	The neuroscience laboratory as a knowledge factory: challenges, approaches and tools Time: 10:30 am - 12:00 pm Location: 11 Large Abstract: As a discipline of biology, the field of neuroscience suffers greatly frominformation overload, non-standardization and complexity. In the absenceof a mathematical theoretical structure for the subject, scientists usetheir own ad-hoc methods of collating and synthesizing information fromboth the primary literature and their own data. In order to eventuallyformalize and accelerate the development of theoretical approaches in thesubject, we are combining an Electronic Laboratory Notebook (ELN) withasset management of the primary research literature to construct aknowledge engineering framework based around the organizational unit of aneuroscience laboratory. This project, called ¡NeuroScholar¢(http://www.neuroscholar.org/) is open-source, and is being tested andused in the laboratories of Prof. Larry Swanson and Prof. Alan Watts atUSC. In each laboratory, the system will operate on top of a ¡laboratorycorpus¢ of knowledge resources (data files, full-text pdf files , etc.)that summarizes the relevant knowledge for that laboratory. Not only willthis collection provide a valuable resource for the members of thelaboratory, it provides a platform for natural language processing andknowledge engineering to answer formally-defined research questions. TheSociety for Neuroscience¢s annual meeting attracts over 30,000 attendees,who collectively form potential user-base of this software.I will talk about the ideas underlying the project, the currentimplementation of NeuroScholar, developments from collaboration with thenatural language group at ISI and possible collaborations for the future.
13 Jun 2005	Hal Daume III	Search, Learning and Features (my thesis proposal proposal) Time: 10:30 am - 12:00 pm Location: 11 Small Abstract: I'm going to talk about what I've been working on recently. My thesisproposal is something having to do with the interaction of search,learning and features in supervised natural language problems. I will befocusing on the task of coreference, since it is a well-studied problem,yet nevertheless not really solved and quite difficult. It is also agreat pedagogical example for why we should care about something otherthan standard Markov random fields for structured prediction, since, forthe coreference problem (and pretty much every other "real" naturallanguage problem) inference in such models is intractable. The contents of this talk will be roughly 40% from a paper I have at ICMLthis year on efficient, accurate supervised learning techniques forstructured prediction (and why I feel inclined to make the verycontroversial statement that supervised learning for NLP problems issolved); it will be roughly 40% about an application of this technique tothe coreference resolution problem and an exploration of the feature spacefor solving this problem (submitted to HLT); and it will be roughly 20%about looking forward to what I want to accomplish in the remainder of mythesis, not covered by the first 80%.
10 Jun 2005	Liang Huang (Penn)	Better k-best Parsing, Hypergraphs and Dynamic Programming Time: 3:00 pm - 4:30 pm Location: 11 Large Abstract: We discuss the relevance of k-best parsing to recent applications innatural language parsing, and develop algorithms that substantiallyimprove on previously-used algorithms with respect to efficiency,scalability, and accuracy. We demonstrate these algorithms in experimentson Bikel's implementation of Collins' lexicalized PCFG model, and on asynchronous CFG based decoder for statistical machine translation. We showin particular how the improved output of our algorithms has the potentialto improve results from parse reranking systems and other applications.In this talk, I will demonstrate the convergence of several popularparsing formalisms (weighted deduction, shared forest, semiring) under thepowerful hypergraph formalism. If time permits, I will also show howgeneric Dynamic Programming can be formalised as hypergraph searching.Joint work with David Chiang (University of Maryland)
08 Jun 2005	Hao Zhang (Rochester)	Lexicalization and A* Searching for Inversion Transduction Grammar Time: 3:00 pm - 4:30 pm Location: 4th floor Abstract: The Inversion Transduction Grammar (ITG) of cite{DekaiCL} generates asynchronous parse tree for a given pair of sentences in two languages. Byallowing inversion of the order of children at any level of thesynchronous parse tree, ITG can do recursive, systematic word reordering.We made a version of ITG where the nonterminals are lexicalized by wordpairs and the inversions are dependent on the so-lexicalized nonterminals.We found out that after lexicalization, the Alignment Error Rate (AER)against gold standard is reduced for short sentences. ITG parsingcomplexity is high polynomial. We proposed a pruning techique thatutilizes IBM Model 1 to estimate the inside and outside probability of abitext cell. Taking a step further, we applied the A* parsing having beenused for monolingual parsing to ITG. I will talk about the heuristicestimates we used for A* parsing for Viterbi alignment selection anddecoding.
27 May 2005	Radu Soricut	Towards Developing Generation Algorithms for Text-to-Text Time: 3:00 pm - 4:30 pm Location: 11 Small Abstract: We describe a new sentence realization framework for text-to-textapplications. This framework uses IDL-expressions as a representationformalism, and a generation mechanism based on algorithms for intersectingIDL-expressions with probabilistic language models. We present boththeoretical and empirical results concerning the correctness andefficiency of these algorithms.
13 May 2005	Ed Stabler (UCLA)	Natural Logic Time: 3:00 pm - 4:30 pm Location: 11 Large Abstract: I will describe some recent work on "natural logics", logics for languagesthat are more similar to human languages than traditional first orderpredicate logic, giving particular attention to questions about what thesyntax encodes about semantic relations among sentences. On everyone'sview, some but not all entailments are syntactically encoded (in a sensethat I will define precisely), but, beyond this starting point,controversy starts almost immediately. Considering some particularexamples, I will sketch methods for addressing some of the basicquestions.
22 Apr 2005	Deepak Ravichandran	Working with Large Corpus, High speed clustering and its applications Time: 3:00 pm - 4:30 pm Location: 11 Large Abstract: I am going to be talking about stuff that I have been working over thepast 6-9 months. This includes randomized algorithms and its applicationto 2 NLP problems: noun clustering and noun-pair clustering. I will alsobe commenting on my experience of working with very very large amounts ofreal Natural Language text (This includes processing and working with dataavailable from the web. This corpus is not the standard newspaper textthat we are so used to in the NLP community.) This talk will also cover alarge part of my thesis work.
08 Apr 2005	Jamie Callan (CMU)	Search Engines for HLT Applications Time: 3:00 pm - 4:30 pm Location: 11 Large Abstract: TBA
25 Mar 2005	Dagen Wang	Metalinguistic feature study for spontaneous speech in human computer interaction Time: 3:00 pm - 4:30 pm Location: 11 Large (THIS HAS CHANGED!!!) Abstract: Speech is a crucial component in human computer interaction. Whiletremendous progress has been made in automatic speech recognition, speechtranscription -- which is the output of automatic speech recognition -- isfar from providing all the information that one could retrieve fromspeech. For example, prominence, pause, rhythm, and rate of speech allcarry important information in speech and are crucial in speechperception. Inclusion of such information can facilitate better machinerecognition and understanding of speech. In this talk, we will introduce the research effort and result in speechrate, prominence, disfluency and utterance boundary detection. We willalso show some interesting applications utilizing these features innatural language understanding and dialog management.
18 Mar 2005	Ed Hovy	Methodologies of ontology content construction Time: 3:00 pm - 4:30 pm Location: 11 Large Abstract: This talk is the second in three tutorial lectures on ontologies. Itfirst shows some details of various Upper Ontologies-ResearchCYC, SUMO,DOLCE, and the Penman Upper Model. It then discusses the problem ofcreating content for the 'Middle Model' zone of ontologies, and outlines amethodology for moving from words to word senses to concepts. Itconcludes by describing ISI's Omega ontology and showing how Omega hasbeen used in annotation projects to support semantic labeling of texts.Please bring a pen or pencil and some paper; there is a small exercise!
18 Feb 2005	Inderjeet Mani (Georgetown)	TBA Time: 3:00 pm - 4:30 pm Location: 11 Large Abstract: TBA
14 Feb 2005	Tim Chklovski	Collecting Broad-Coverage Knowledge Bases from Volunteers Time: 3:00 pm - 4:30 pm Location: 11 Large Abstract: (Note that this is a MONDAY!)
11 Feb 2005	Hae-Chang Rim	Unsupervised Word Sense Disambiguation Using Wordnet Relatives Time: 3:00 pm - 4:30 pm Location: 11 Large Abstract:
28 Jan 2005	Yutaka Sasaki (ATR)	Research Activities in Speech Translation at ATR/QA as Question-Biased Term Extraction Time: 3:00 pm - 4:30 pm Location: 11 Large Abstract: This talk has two parts. In the first part, I will introduce researchactivities in Speech-to-Speech Translation at ATR, including on-goingresearch on statistical machine translation. In the second part, I willpresent a new approach to QA named Question-Biased Term Extraction (QBTE).The QBTE directly extracts answers as terms biased by the question. Toconfirm the feasibility of our QBTE approach, we conducted experiments onthe CRL QA Data based on 10-fold cross validation, using Maximum EntropyModels as an ML technique. Experimental results showed that the trainedsystem achieved approximately 0.35 in MRR and 50% in TOP5 accuracy. Thispart is an English version of my presentation given in IPSJ SIGNL-163 in2004 in Japanese. If time allows, I would like to introduce the NTCIR-5(2004/2005) Cross-Lingual QA task (CLQA) that I am going to organize.About the speaker:Yutaka Sasaki received his Ph.D. in Engineering from the University ofTsukuba, Japan in 2000 for his work on generating Information Extractionrules with hierarchically sored Inductive Logic Programming. He joined NTTLaboratories in 1988. Since then, he was involved in research inrule-based CAI, inductive logic programming, Information Extraction, andQuestion Answering. From 1995 to 1996, he spent one year at Simon FraserUniversity, Canada as a visiting researcher. From 1999, he led a subgroupto develop the first practical Japanese Question Answering System SAIQA.Then, he applied SVMs to automatically construct the QA system SAIQA-IIfrom QA and NE data. In June 2004, he moved to ATR Spoken LanguageTranslation Research Laboratories. Currently, he is the head of Departmentof Natural Language Processing. He is also an organizer of the NTCIR 5Cross-Lingual Question Answering Task.
17 Dec 2004	Nicola Ueffing	Word-Level Confidence Measures for SMT Time: 3:00 pm - 4:30 pm Location: 11 Large Abstract: This talk will address the problem of assessing the correctness of MToutput on the word level. I will give an overview on word confidencemeasures for SMT. Different variants of word posterior probabilities thatcan be directly used as confidence measure will be presented. Theirconnection with the Bayes decision rule and the underlying error measurewill be shown. Experimental comparison of different word confidencemeasures will be presented on a translation task consisting of technicalmanuals.Additionally, I will show how word confidence measures can be applied inan interactive SMT system. This system predicts translations, taking partsof the sentence into account that have already been accepted or typed bythe user. Through the use of confidence measures, the performance of theprediction engine can be improved.About the Speaker:Nicola Ueffing is a graduate research assistant at the group for "HumanLanguage Technology and Pattern Recognition" (Lehrstuhl fuer InformatikVI) at RWTH Aachen University. She received her diploma in mathematicsfrom RWTH Aachen University in 2000. Her research topic is statisticalmachine translation, focusing on confidence measures for SMT. In 2003, shewas a member of the team working on "Confidence Estimation for SMT" at theCLSP workshop at JHU.
10 Dec 2004	Nick Mote	Developing a Language Model for Second Language Learner Speech Time: 3:00 pm - 4:30 pm Location: 11 Large Abstract: ISI's Tactical Language Project is a system designed to teach Americanshow to speak Arabic through a video game environment. We've taken a FPSengine (Unreal 2003) and re-did the graphics so it looks like you're in atypical Lebanese village. We took away the guns, added speech recognition,and set the player in the middle of it all. The theory is that if youlearn well in a classroom, you'll perform well in a classroom, but if youlearn well in a pseudo-naturalistic environment, you'll perform better inreal life.In a pedagogical context, speech recognition is a hard thing we're tryingto recover signal from noisy language-learner speech--with all of itsmispronunciations, disfluencies, and grammatical errors . Languageunderstanding is hopeless unless you have a good approximation of whatkinds of mistakes learners make, and you can build a system to anticipatethem.Suppose an English language learner says "Water". Is he asking you forwater? Is he telling you there's a puddle in front of you? Is he sayinghis name is "Walter", but with horrible pronunciation? There's a lot ofambiguity involved. In order to disambiguate, we need to look at thespeech signal itself, the utterance's context, the learner's past languageperformance, and details about the learner's mother language as it relatesto English, etc., etc... Only then can we hope to guess what the learneris actually trying to say.And then, of course, once we've made a good guess at the learner's speechintentions, what do we do about it? How do we correct him? How do webalance the consideration of inherent qualities of learner motivation,language errors, learning objectives, and possibly low-confidence speechrecognition, as we generate good pedagogical feedback?This is NLP (primarily statistical) with a bit of pedagogy theory andlinguistic (SLA and phonology) theory sprinkled in.
19 Nov 2004	Chin-Yew Lin	After TIDES, What's Left? - Finding Basic Elements Time: 3:00 pm - 4:30 pm Location: 11 Large Abstract: As DARPA's TIDES (Translingual Information Detection, Extraction, andSummarization) program coming to an end, I will give a summary of what wehave learned from TIDES in summarization and a brief overview of ourcurrent effort in developing automatic evaluation methods that go beyondsurface n-gram matching. Topics to be covered: (1) Summary of DUCs 2001 - 2004(2) Automatic Evaluations in Summarization and MT(3) Basic Elements - New Efforts in Summarization at ISI
15 Nov 2004	Thiago Pardo	Unsupervised learning of verb argument structures Time: 3:00 pm - 4:30 pm Location: 8th floor multipurpose room (#849) -- NOT the conference room Abstract: In this talk, I'll present the investigation I'm carrying out in ISIlately under Daniel Marcu's supervision. Following the noisy-channelframework, we propose a statistical model for learning the argumentstructures of verbs automatically. We show that we are able to learn bothlexicalized and generalized structures and achieve good results, relyingonly on basic NLP tools like a POS tagger and named-entity recognizer. Wealso present a comparison of the structures we learn with the predictedones in PropBank.
12 Nov 2004	Dragomir Radev	Words, links, and patterns: novel representations for Web-scale text mining Time: 3:00 pm - 4:30 pm Location: 11 Large Abstract: Textual data is everywhere, in email and scientific papers, in onlinenewspapers and e-commerce sites. The Web contains more than 200 terabytesof text not even counting the contents of dynamic textual databases. Thisenormous source of knowledge is seriously underexploited. Textualdocuments on the Web are very hard to model computationally: they aremostly unstructured, time-dependent, collectively authored, multilingual,and of uneven importance. Traditional grammar-based techniques don'tscale up to address such problems. Novel representations and analyticaltools are needed.I will discuss several current projects at Michigan related to text miningfrom a variety of genres. Depending on the amount of time, I will talkabout (a) lexical centrality for multidocument summarization, (b)syntax-based sentence alignment, (c) graph-based classification,(d)lexical models of Web growth, and (e) mining protein interactions fromscientific papers. As it turns out, the right representations, whencomplemented with traditional NLP and IR techniques, turn many of theseinto instances of better studied problems in areas such as socialnetworks, statistical mechanics, sequence analysis, and computationalphylogenetics.About the Speaker:Dragomir R. Radev is Assistant Professor of Information, ElectricalEngineering and Computer Science, and Linguistics at the University ofMichigan, Ann Arbor. He leads the CLAIR (Computational LingusiticsAnd Information Retrieval) group which currently includes 12undergraduate and graduate students. Dragomir holds a Ph.D. inComputer Science from Columbia University. Before joining Michigan,he was a Research Staff Member at IBM's TJ Watson Research Center inHawthorne, NY. He is the author of more than 45 papers on informationretrieval, text summarization, graph models of the Web, questionanswering, machine translation, text generation, and informationextraction. Dr. Radev's current research on probabilistic andlink-based methods for exploiting very large textual repositories,representing and acquiring knowledge of genome regulation, andsemantic entity and relation extraction from Web-scale text documentcollections is supported by NSF and NIH. Dragomir serves on theHLT-NAACL advisory committee, was recently reelected as treasurer ofNAACL, is a member of the editorial boards of JAIR and InformationRetrieval, and is a four-time finalist at the ACM internationalprogramming finals (as contestant in 1993 and as coach in1995-1997). Dragomir received a graduate teaching award at Columbiaand recently, the U. of Michigan award for Outstanding ResearchMentorship (UROP).
05 Nov 2004	Mary Wood (Manchester)	A Human-Computer Collaborative Approach to Computer Aided Assessment Time: 3:00 pm - 4:30 pm Location: 11 Large Abstract: The ABC (Assess by Computer) system has been developed and used in theSchool of Computer Science at the University of Manchester for formativeand (principally) summative assessment at undergraduate and postgraduatelevel. We believe that fully automatic marking of constructed answers -especially free text answers - is not a sensible aim. Instead - drawing onparallels in the history of machine translation - we take a"human-computer collaborative" approach, in which the system does what itcan to support the efficiency and consistency of the human marker, whokeeps the final judgement.Our current work focuses on what are generally referred to as "short textanswers" as contrasted to "essays". However we prefer to contrast"factual" with "discursive" answers, and speculate that the former may beamenable to simple statistical techniques, while the latter require moresophisticated natural language analysis. I will show some examples of realexam data and the techniques we are using and developing to handle them.
22 Oct 2004	Jerry Hobbs	Like Now: Two Explorations in Deep Lexical Semantics Time: 3:00 pm - 4:30 pm Location: 11 Large Abstract: As part of an effort to encode the commonsense knowledge we need innatural language understanding, I have been looking at several very commonwords and their uses in diverse corpora, and asking what we have to knowto understand this word in this context. In this talk, I will describethe investigations of the uses of two words -- the adverb "now" and thepreposition "like".One might think that "now" simply expresses a temporal property of anevent. But in fact in almost every instance, it is used to point up acontrast -- "This is true now. Something else was true then." It is thusmore of a relation than a property. I will describe several categories ofsuch relations. Another question of interest about "now" is "How long aperiod is the word "now" describing in its various uses?": "I'm typing anabstract now" vs. "We travel by automobile now." I suggest somecategories of knowledge that need to be encoded to answer this question.When we successfully understand "A is like B", we have figured out someproperty that A and B have in common. How can we find that propertycomputationally? In the data I looked at, in 80% of the instances, theproperty is explicit in the nearby text, and I will talk about how we canidentify it. For the remainder I examine the knowledge we would need inorder to infer the common property.
24 Sep 2004	Hal Daume III	Domain Adaptation in Maximum Extropy Models Time: 3:00 pm - 4:30 pm Location: 11 Large Abstract: I will present some preliminary results on the problem of domainadaptation in maximum entropy models, specifically in the case when thereis a large amount of "out of domain" data, and only a very small amount of"in domain" data. The model and algorithms I present are based on thetechnique of conditional Expectation Maximization (CEM) and allow forrelatively fast optimization of these models. Preliminary results on sometasks are quite promising.
17 Sep 2004	Various	About Syntax Fest 2004 (Part II) Time: 3:00 pm - 4:30 pm Location: 11 Large Abstract: This summer we held a three-month workshop on syntax-driven machinetranslation, in which we learned syntactic transformations automaticallyfrom Chinese/English translated corpora and applied them to translate newtext. We'll give a progress report!
10 Sep 2004	Various	About Syntax Fest 2004 (Part I) Time: 3:00 pm - 4:30 pm Location: 11 Large Abstract: This summer we held a three-month workshop on syntax-driven machinetranslation, in which we learned syntactic transformations automaticallyfrom Chinese/English translated corpora and applied them to translate newtext. We'll give a progress report!
16 Aug 2004	Patrick Pantel & Tim Chklovski	VerbOcean: Mining the Web for Fine-Grained Semantic Verb Relations Time: 2:00 pm - 3:30 pm Location: 11 Large Abstract: Broad-coverage repositories of semantic relations between verbs couldbenefit many NLP tasks. We present a semi-automatic method for extractingfine-grained semantic relations between verbs. We detect similarity,strength, antonymy, enablement, and temporal happens-before relationsbetween pairs of strongly associated verbs using lexico-syntactic patternsover the Web. On a set of 29,165 strongly associated verb pairs, ourextraction algorithm yielded 65.5% accuracy. We provide the resource,called VerbOcean, for download at http://semantics.isi.edu/ocean/. We willalso discuss current work on disambiguating the verbs in the network aswell as refining the semantic relations using path analysis.
13 Aug 2004	Deepak Ravichandran	Randomized algorithms and its application to NLP Time: 3:00 pm - 4:30 pm Location: 11 Large Abstract: The last decade has seen a plethora of papers in NLP devoted to MachineLearning algorithms. However, most of these papers have devoted theireffort exclusively to improving the system performance on the accuracyaxis. Most of the sophisticated NLP algorithms are extremely slow and donot scale up easily when applied to large amounts of data.I will talk about the importance of randomized algorithms and theirpotential in speeding up some NLP algorithms. This talk will be a surveyof some recent advances in Theoretical Computer Science/Math seen with anNLP point-of-view. I am not going to present any results. But I am hopingthat this talk will clarify my thinking process, get feedback from peopleand help me colloborate with others.
09 Aug 2004	Justin Busch, Hai Huang, Jens Stephan & Chen-kang Yang	CL Student Presentations Time: 3:00 pm - 4:30 pm Location: 11 Large Abstract: Justin Busch:Weight and Semantic Class Issues in Japanese Noun Phrase OrderingMany current designs for automatic parsers learn probabilities for therelative frequencies of parts-of-speech and syntactic rules, and this hasproven to be generally reliable. In spite of the ubiquity of probabilistictechniques for parsing, however, little attention has been given to thelinguistic significance of the probabilistic data and what it might sayabout human performance.Hawkins proposes a general theory of grammaticalization based on theminimization of syntactic domains. Given that a sentence of any languagewill contain at least one noun phrase, one verb, and possibly additionalnoun phrases and prepositional phrases, "minimize domains" suggests thatthese phrases will order themselves according to whichever patternrequires the least effort to recognize the higher syntactic structure ofthe sentence. These effects are directly measurable through corpusstatistics, and can be interpreted as potential heuristics forprobabilistic parsers. In this study, we examine Japanese data from theKyoto Treebank and test Hawkins' predictions for noun phrase ordering bynoun phrase weight as well as by generic semantic types. The discussionwill focus primarily on how accurately Hawkins' predictions are reflectedin the corpus statistics, and will conclude with observations about howthey might be applied to the decision mechanisms of probabilistic parsers.--------------------------------------------------------------------------Hai Huang:TBA--------------------------------------------------------------------------Jens Stephan:Evaluation and Visualization of a Dialogue SystemEvaluations have become a necessary standard to almost any type ofresearch. However, there are many areas where there is no common agreementon how to evaluate, which is the case for complex problem of evaluatingdialogue systems. The evaluation of the multi party multi modal dialoguesystem MRE(1) provides a good example of what questions are important forsuch an evaluation, how to actually do the evaluation and finally how tohow make special problems of the system visible to use the evaluationresults to improve the systems performance.After a brief introduction of the MRE domain and architecture, I willbreak the task town to a set of general evaluation questions. From there Iwill explain what kinds of metrics and visualizations are suited to answerthose questions and what kind of data is needed, as well as how that datawas obtained. Along the road, examples of actual system problems andperformances will be presented. The topics of data formatting andvisualization will receive some special attention by introducing the MREEvaluation Toolkit as well as the corpus it operates on.--------------------------------------------------------------------------Chen-kang Yang:Using the Omega Ontology to Determine Selectional Restrictions for Word Sense Disambiguation Word sense disambiguation is fundamental for language processing. Thoughpurely statistical methods are effective for this task, they neglect thesyntactic and semantic aspects. In this study, we adopt a hybrid approachby applying an unsupervised machine learning method to learn verbsselectional restrictions on their subjects/objects. The system then usesthese learned selectional restrictions for word sense disambiguation ofthe subjects/objects. Instead of words, the training data containsontological taxonomy hierarchies that are retrieved from the Omegaontology. Unlike other similar systems, we are able to automatically findthe best match among classes from different levels of the ontology. Thisprovides us more flexibility and is closer to human instinct. Our systemperforms better than other similar systems, though it still needscooperating methods for better results.
06 Aug 2004	Hae-Chang Rim	Information Retrieval using Word Senses: Root Sense Tagging Approach Time: 3:00 pm - 4:30 pm Location: 11 Large Abstract: Information retrieval using word senses is emerging as a good researchchallenge on semantic information retrieval. In this presentation, I amgoing to propose a new method using word senses in information retrieval:root sense tagging method. This method assigns coarse-grained word sensesdefined in WordNet to query terms and document terms by unsupervised wayusing co-occurrence information constructed automatically. The sensetagger is crude, but performs consistent disambiguation by consideringonly the single most informative word as evidence to disambiguate thetarget word. We also allow multiple-sense assignment to alleviate theproblem caused by incorrect disambiguation.Experimental results on a large-scale TREC collection show that theproposed approach to improve retrieval effectiveness is successful, whilemost of the previous work failed to improve performances even on smalltext collection. The proposed method also shows promising results when iscombined with pseudo relevance feedback and state-of-the-art retrievalfunction such as BM25.
16 Jul 2004	Hal Daume III and Radu Soricut	Practice Talks for ACL (+workshops) Time: 3:00 pm - 4:30 pm Location: 11 Large Abstract: TBA
09 Jul 2004	Kevin Knight	Survey of Trees and Grammars Time: 3:00 pm - 4:30 pm Location: 11 Large Abstract: I'll give a survey of trees and grammars, at least the parts that seemmost relevant to ongoing work at ISI. This will be a theory talk. I'llstart with context-free grammars, which were developed in the 1950s, andcover other tree-generating systems. I'll also talk abouttree-transforming systems.
02 Jul 2004	Hal Daume III	A Phrase-Based HMM Approach to Document/Abstract Alignment Time: 1:30 pm - 3:00 pm Location: 11 Large Abstract: I will present work that extends the standard hidden Markov model to aversion that can emit multiple symbols in a single time step. Using thismodel, we are able to automatically create phrase-to-phrase mappings in analignment process. I've applied this model to the task of creatingalignments between documents and their human-written abstracts, yieldingan overall alignment F-score of 0.548, a significant improvement on thebest results to date of 0.363. These results are published in an EMNLPpaper this year, but the talk will be an extended version of the talk Iwill give there (namely, I will discuss the mechanics of the extended HMMin more detail in this seminar).
25 Jun 2004	Dan Gildea	Syntactic Supervision and Tree-Based Alignment Time: 3:00 pm - 4:00 pm Location: 11 Large Abstract: Tree-based probability models of translation have been proposed to takeadvantage of parse trees on one, both, or neither sides of a parallelcorpus. I will present comparative results for these three approaches forthe task of word alignment on Chinese-English and French-English data, aswell as some analysis of what is going on behind the numbers.
21 Jun 2004	Emil Ettelaie	Speech-to-Speech Translation: A Phrase Classification Approach Time: 3:00 pm - 4:00 pm Location: 11 Large Abstract: This talk will be about automatic speech-to-speech translation. In oursystem, a doctor speaks one language, the patient speaks another language,and the machine translates their utterances from one language to theother. The talk will be followed by a demo of our system.One approach we have been successful with is phrase classification, i.e.,classifying a noisy speech-recognized utterance into one of many meaningcategories. Phrase classification is computationally cheap and canprovide high quality translations for in domain utterances almostinstantaneously. Speed is important for speech translation, whereprocessing delay is a great concern.In this talk, different aspects of building a classification-based speechtranslator are discussed. Following an overview of automaticspeech-to-speech translation and its challenges, a comparison of differentclassification methods is presented and data collection techniques forthat application are introduced.
17 Jun 2004	Marcello Federico	Statistical Machine Translation at ITC-irst Time: 3:00 pm - 4:30 pm Location: 4th Floor Abstract: My presentation will overview recent activities on Chinese-English SMTcarried out at ITC-irst (Trento, Italy). After an overview of thecomplete architecture of our system, I will focus on progress made inChinese word-segmentation, phrase-based modeling and decoding, log-linearmodeling and minimum error training, and language model adaptation.Experimental results will be provided in terms of Bleu and Nist scores ontwo translation tasks: basic traveling expressions and news reports,respectively adopted by the C-STAR consortium and for the 2002 and 2003NIST MT evaluation campaigns. Bio: Marcello Federico has been a permanent researcher at ITC-irst since 1991.During 1998-2003, he led the "Multilingual natural speech technologies"(MUNST) research line at ITC-irst. Since 2004, he is head of the"Cross-language information processing" (Hermes) research line. Hisinterests include automatic speech recognition, statistical languagemodeling, information retrieval, and machine translation.
24 May 2004	Philipp Koehn	Challenges in Statistical Machine Translation Time: 4:00 pm - 5:00 pm Location: 11 Large Abstract: In the last years a standard model in statistical machinetranslation has emerged, which is based on the translationof sequences of words (so-called "phrases") at a time.I will describe this model, how to train and decode with it,but the focus of this talk will be how to address thechallenges to advance and move beyond the model: my thesiswork on noun phrase translation, making use of syntax, andbetter modeling, such as discriminative training. Bio: Philipp Koehn is the author of papers on natural languageprocessing, machine translation, and machine learning. Hereceived his PhD from the University of Southern Californiain 2003 (advisor: Kevin Knight), and is currently employed asa postdoc at the Massachusetts Institute of Technology, workingwith Michael Collins. He has worked at AT&T Laboratories ontext-to-speech systems, and at WhizBang! Labs on textcategorization.
21 May 2004	Tom Murray and Rahul Bhagat	Statistical Learning for Dialogue System and A Community of Words Time: 3:00 pm - 4:30 pm Location: 11 Large Abstract: Natural Language Understanding: A fast and accurate Statistical Learning Approach for Dialogue Systems Natural Language Understanding (NLU) is an essential module of a gooddialogue system. To achieve satisfactory performance levels, real timedialogue systems need the NLU module to be both fast and accurate. FiniteState Model (FSM) based systems are fast and accurate but lack robustnessand flexibility. The Statistical Learning Model (SLM) based systems arerobust and flexible but lack accuracy and are at most times slow.In this talk, I am going to talk about an SLM based NLU approach fordialogue utterances that is both accurate and fast. The system has highaccuracy and produces frames in real time. A Community of Words: Understanding Social Relationships from E-mail A corpus of e-mail messages presents a number of challenges for NLPtechniques, with its nearly unconstrained structure and vocabulary,mistyped words and ungrammatical sentences, and extensive contextualinformation that is never explicitly stated. Yet, the intrinsically socialnature of such communication provides an opportunity to study not just abag of words, but also the relationships, competencies, and activitiesbehind them.This talk presents work with Eduard Hovy as part of the MKIDS project.
30 Apr 2004	Liang Zhou	Automating the Building of Summarization Systems Time: 3:00 pm - 4:30 pm Location: 11 Large Abstract: Summarization requires one to identify the internal structure ofinformation and to bring that to the surface both operationally andorganizationally.How does one put this theory to practice and build real summarizationsystems? How do the systems built based on this idea perform?
28 Apr 2004	Dragos Muntanu, Radu Soricut and Hal Daume III	Practice Talks for HLT/NAACL Time: 3:00 pm - 5:00 pm Location: 11 Large Abstract: TBA
23 Apr 2004	Hal Daume III	A Tree-Position Kernel for Document Compression Time: 3:00 pm - 4:00 pm Location: 10 Large Abstract: I'll describe our entry into the DUC 2004 automatic document summarizationcompetition. We competed only in the single document, headline generationtask. Our system is based on a novel kernel dubbed the tree positionkernel, combined with two other well-known kernels. Our system performswell on white-box evaluations, but does very poorly in the overall DUCevaluation. C'est la vie.
16 Apr 2004	Rada Mihalcea (UNT)	Graph-based Ranking Algorithms for Language Processing Time: 10:30 am - 12:00 pm Location: 11 Large Abstract: Although we live in a predominantly statistical world, there are stillmany language processing applications that long for accuraterepresentations of text meaning. Even applications that found partialsolutions in statistical modeling, including information retrieval,machine translation, or automatic summarization, are likely to get asignificant boost from deeper text understanding.In this talk, I will present an innovative method for automatic extractionof conceptual graphs as a means to represent text meaning. The methodrelies on a novel adaptation of graph-based ranking algorithms -traditionally (and successfully) used in citation analysis, Web pageranking, and social networks. I will show how such algorithms can beadapted to semantic networks, resulting in an efficient unsupervisedmethod for resolving the semantic ambiguity of all words in open text, andidentifying relations between entities in the text. I will also outline anumber of applications that are enabled by this representation, includingkeyphrase extraction, domain classification, and extractive summarization. BIO: Rada Mihalcea is an Assistant Professor of Computer Science atUniversity of North Texas. Her research interests are in lexicalsemantics, minimally supervised natural language learning, andmultilingual natural language processing. She is currently involved in anumber of research projects, including word sense disambiguation, shallowsemantic parsing, (non-traditional) methods for building annotated corporawith volunteer contributions over the Web, word alignment for languagepairs with scarce resources, and graph-based ranking algorithms forlanguage processing. Her research is supported by NSF and the state ofTexas.
13 Apr 2004	Jill Burstein (ETS)	Automated Essay Evaluation: From NLP research through deployment as a business Time: 3:00 pm - 4:30 pm Location: 4 Large Abstract: Automated essay scoring was initially motivated by its potential costsavings for large-scale writing assessments. However, as automated essayscoring became more widely available and accepted, teachers and assessmentexperts realized that the potential of the technology could go way beyondjust essay scoring. Over the past five years or so, there has been rapiddevelopment, and commercial deployment of automated essay evaluation forboth large-scale assessment and classroom instruction. A number offactors contribute to an essay score, including varying sentencestructure, grammatical correctness, appropriate word choice, errors inspelling and punctuation, use of transitional words/phrases, andorganization and development. Instructional software capabilities existthat provide essay scores and evaluations of student essay writing in allof these domains. The foundation of automated essay evaluation softwareis rooted in NLP research. This talk will walk through the development ofCriterionSM, e-rater, and Critique writing analysis tools, automated essayevaluation software developed at Educational Testing Service - from NLPresearch through deployment as a business.(Preview of an HLT/NAACL-2004 Invited Speaker Presentation)Jill BursteinEducational Testing ServicePrinceton, NJ
09 Apr 2004	Eduard Hovy	Three (and a half?) Trends: The Future of NLP Time: 3:00 pm - 4:30 pm Location: 11 Large Abstract: An interesting (disturbing?) new trend is beginning to manifest itself inNLP, one that is focused on performance and hence very attractive in thecontext of inter-system competitive evaluations such as TREC and DUC, butone that does not provide much insight about language or NLP methods tothe researcher interested in these topics. This addition of a newparadigm to NLP has implications for all of us.
02 Apr 2004	Stephan Vogel	The CMU Statistical Machine Translation System Time: 3:00 pm - 4:00 pm Location: 11 Large Abstract: The presentation will give an overview of the SMT activities at theLanguage Technologies Institute, Carnegie Mellon University, in largevocabulary text translation tasks, esp. the Chinese-English andArabic-English, as well as in limited domain speech-to-speech translationtasks. The CMU SMT system is, like most modern statistical MT systems,based on phrase translation. Several approaches have been developed toextract the phrase pairs from parallel corpora and current researchinvestigates different scoring approaches for these translation pairs.Details of the decoder, esp. on hypothesis recombination, pruning, andefficient n-best list generation will be given. Recently, the SMT systemhas been extended to use partial translations generated from example basedand grammar based translation system, thereby performing multi-enginemachine translation. Bio: Stephan Vogel is a researcher at the Language Technologies Institute,Carnegie Mellon University, where he heads the statistical machinetranslation team. He received a Diploma in Physics from PhilipsUniversity Marburg, Germany, and a Masters of Philosophy from theUniversity of Cambridge, England. After working for a number of years onthe history of science, he turned to computer science, especially naturallanguage processing. Before coming to CMU, he worked for several years atthe Technical Univerity of Aachen on statistical machine translation, andalso in the Interactive Systems Lab at the University of Karlsruhe.
26 Mar 2004	Shlomo Argamon	On Writing, Our Selves: Explorations in Stylistic Text Categorization Time: 1:30 pm - 3:00 pm Location: 11 Large Abstract: This talk will survey results of several recent projects we have beenundertaking in automated text categorization based upon the style,rather than the topic, of the documents. I will describe a generaltext-categorization framework using machine learning along with generalprinciples for choosing stylistically relevant sets of features forlearning effective classification models. Applications of these methodsinclude determining author gender and text genre in published books andarticles, authorship attribution of email messages, and analysis oflanguage use in different scientific fields. In many cases, the modelsthat are learned also give some insight into the respective styles beingdistinguished, which I will also discuss.Shlomo Argamon is an associate professor at the Illinois Institute ofTechnology Chicago.
25 Mar 2004	Jon Patrick (U. of Sydney)	ScamSeek: Capturing Financial Scams at the Coalface by Language Technology Time: 10:30 am - 12:00 pm Location: 11 Large Abstract: The Scamseek project aims to build a surveillance tool for identifyingfinancial scams on the Internet by performing document classification ofInternet pages. There are three principle types of documents of concern:those that give financial advice by unregistered advisors, unlawfulinvestment schemes, and share ramping.The first phase of the project has been completed and a working system,known as ScamAlert installed at the Australian Securities and InvestmentCommission (ASIC). The independent audit of the performance of the systemproved satisfactory with a result for precision of .75, recall .43, andF=. 54, along with identification of 4 scams misclassified by the client.Significant improvement in recall is foreshadowed in the 2nd phase of theproject. The results are satisfying in the context of the structure ofthe data where the density of scam documents is about 1.8% of the totalcorpus.The good performance of the operational system is ascribed to thecombination of using a strong linguistic model of language (SystemicFunctional Linguistics) to define the scam documents in parallel with arich statistical analysis of the structure of non-scam documents and scamlook-alikes. A large amount of the experimental program has concentratedon understanding and exploiting the interaction between the linguisticallydescribed aspects of the documents and the statistical properties. Eachtype of data has been used to inform and modify the usage of the other.The operational aspects of the project have proven to be as challenging asthe research objectives. The project has a budget of $2.2M over 15 months.It has been managed so as to create a balance in resources between theneeds of both the research objectives and the engineering objectives.Software development has concentrated on three aspects. Firstly, toproduce an environment for the strong directive management ofcomputational linguistics experiments, secondly, in the aid of thelinguists to create tools to support their manual analysis, and thirdlythe best practice of software engineering principles to ensure a cleanautomated rollout of the production system for ASIC. The contributing partners in the Scamseek project are The Capital MarketsCo-operative Research Centre (CMCRC), ASIC, the University of Sydney andMacquarie University.
12 Mar 2004	Deepak Ravichandran	About My Thesis Proposal Time: 3:00 pm - 4:30 pm Location: 11 Large Abstract: TBA
20 Feb 2004	Hal Daume III	Some Results in Automatic Evaluation for Summarization and MT Time: 3:00 pm - 4:00 pm Location: 4 Large Abstract: I will be presenting some recent results of mine regarding the possibilityof automatic evaluation in summarization. I will discuss both my ownfindings, as well of those of people here and at Columbia, and attempt toexplain in a principled fashion why there are disparate opinions on theplausibility of performing automatic evaluation in this task. I willdiscuss my (perhaps pessimistic) views on the plausibility of doing anysort of evaluation of summarization, automatic or otherwise.The results and experimental setups developed in connection withsummarization will be extended to the machine translation. I will reviewpossible reasons why metrics such a bleu have experienced significantlymore success in machine translation than in summarization. I will alsoconnect the evaluation criterea developed in the context of summarizationto machine translation, and discuss the automation of these methods.In short: I'll talk about why I've been doing so much data elicitaitonrecently.This will be a highly informal seminar and participation is highlyencouraged.
06 Feb 2004	Mark Hopkins	What's in a Translation Rule? Time: 3:00 pm - 4:00 pm Location: 11 Large Abstract: We propose a theory that gives formal semantics to word-levelalignments defined over parallel corpora. We use our theory tointroduce a linear algorithm that can be used to derive fromword-aligned, parallel corpora the minimal set of syntacticallymotivated transformation rules that explain human translation data.(joint work with Michel Galley, Kevin Knight, and Daniel Marcu)
30 Jan 2004	Paul Kingsbury (Penn)	PropBank: the next stage of Treebank and Inducing a Chronology of the Pali Canon Time: 3:00 pm - 4:30 pm Location: 11 Large Abstract: PropBank: the next stage of TreebankNatural-language engineers the world over are coming to a consensus that adegree of semantic knowledge is a necessary addition to purely structuralrepresentations of language. This talk describes the Propbank project atPenn, which provides a complete shallow semantic parse of the Treebank IIcorpus.Inducing a Chronology of the Pali Canon:Works such as Kroch (1989), Taylor (1994) and Han (2000) have demonstratedthat syntactic change can be described mathematically as the competitionbetween innovating and archaic formations. This paper demonstrates howthis same mathematical description can be turned around to predict thedate of a historical text. The Middle Indic period showed dramatic changein the morphological system, such as the collapse of the past-tense verbalsystem. Whereas Sanskrit had three competing formations, each withmultiple possible morphological realizations, Pali (a Middle Indo-Aryanlanguage) had only a single formation, based mostly on the sigmatic aoristalthough many archaic nonsigmatic aorists are also attested. Theproportions of the archaic and innovative forms can be easily calculatedfor each text in the Pali Canon and these proportions used to assign anapproximate date for each text. The accuracy of the method can beassessed qualitatively by comparing the derived chronology to chronologiesbased on various non-linguistic criteria, or quantitatively by comparingthe derived chronology to a known dating scheme. For the latter it isnecessary to turn to a different dataset, such as that describing the riseof do-support in Early Modern English, as described in Ellegard (1953) andKroch (1989). Bio: Paul Kingsbury graduated summa cum laude in linguistics from Ohio StateUniversity in 1993 with a thesis on "Some sources for L-words inSanskrit". He subsequently entered the University of Pennsylvania tostudy historical linguistics and Sanskrit, but (like most historicalstudents) was diverted to computational issues. He joined the Propbankproject in 2000 and soon thereafter engineered a major rethinking of themethods and goals of the project, in order to make the annotationlinguistically meaningful. He completed his doctorate in 2002 with athesis entitled 'The Chronology of the Pali Canon: the case of theaorist'.
16 Jan 2004	John Prager (IBM)	Using Constraints to Improve Question-Answering Accuracy Time: 2:00 pm - 3:00 pm Location: 11 Large Abstract: Leading Question-Answering systems employ a variety of means to boost theaccuracy of their answers. Such methods include redundancy (getting thesame answer from multiple documents/sources), deeper parsing of questionsand texts (hence improving the accuracy of confidence measures),inferencing (proving the answer from information in texts plus backgroundknowledge) and sanity-checking (verifying that answers are consistent withknown facts). To our knowledge, however, no QA system deliberately asksadditional questions in order to derive constraints on the answers to theoriginal questions. We present in this talk the method of QA-by-Dossier-with-Constraints (QDC).This is an extension of the simpler method of QA-by-Dossier, in whichdefinitional questions ("Who/what is X") are addressed by asking a set ofquestions about anticipated properties of X. In QDC, the collection ofDossier candidate answers, along with possibly other answers to questionsasked expressly for this purpose, are subjected to satisfying a set ofnaturally-arising constraints. For example, for a "Who is X" question, thesystem will ask about birth, accomplishment and death dates, which, if theyexist, must occur in that order, and also obey other constraints such aslifespan. Temporal, spatial and kinship relationships seem to beparticularly amenable to this treatment, but it would seem that almost any"factoid" question can benefit from QDC. We will discuss the setting-upand application of constraint networks, and talk about how (and whether) todevelop the constraint sets automatically. We will demonstrate severalapplications of QDC, and present one evaluation in which the F-measure fora set of questions improved with QDC from .39 to .69.
19 Dec 2003	Robert Krovetz (Ask Jeeves)	More than One Sense Per Discourse Time: 3:00 pm - 4:30 pm Location: 11 Large Abstract: Previous research has indicated that when a polysemous word appears twoor more times in a discourse, it is extremely likely that they will allshare the same sense (Gale et al. 92). However, those results werebased on a coarse-grained distinction between senses (e.g, {emsentence} in the sense of a `prison sentence' vs. a `grammaticalsentence'). I conducted an analysis of multiple senses within twosense-tagged corpora, Semcor and DSO. These corpora used WordNet fortheir sense inventory. I found significantly more occurrences ofmultiple-senses per discourse than reported in (Gale et al. 92) (33%instead of 4%). I also found classes of ambiguous words in which asmany as 45% of the senses in the class co-occur within a document. Iwill discuss the implications of these results for the task ofword-sense tagging and for the way in which senses should berepresented.
25 Nov 2003	Hang Li (MSR Beijing)	Using Bilingual Data to Mine and Rank Translations Time: 10:30 pm - 12:00 pm Location: 11th Floor Large Abstract: In this talk, I will introduce some of the technologies whichwe have developed in the project on an English reading assistant systemcalled English Reading Wizard. The technologies include a method formining translations from web (unparallel corpora), a method for wordtranslation disambiguation based on bootstrapping, which is calledBilingual Bootstrapping, and a general method of bootstrapping, which iscalled Collaborative Bootstrapping. First, I will introduce the mainfeatures of English Reading Wizard. Next, I will introduce each of themethods. The translation mining method is based on a naïve Bayesianensemble and the EM algorithm. Bilingual Bootstrapping uses theasymmetric translation relationship between words in the two languagesin translation and can construct reliable classifiers for wordtranslation disambiguation. Collaborative Bootstrapping contains theco-training algorithm as its special case, and it uses the strategy ofuncertainty reduction in training of the two classifiers. Bio: Hang Li is a researcher at the Natural Language Computing Groupof Microsoft Research in Beijing, China. He is also adjunct professor ofXian Jiaotong University. Hang Li obtained a B.S. in ElectricalEngineering from Kyoto University (Japan) in 1988 and a M.S. in ComputerScience from Kyoto University in 1990. He earned his Ph.D. in ComputerScience from the University of Tokyo in 1998. >From 1990 to 2001, HangLi worked at the Research Laboratories of NEC Corporation in Kawasaki,Japan. He joined Microsoft Research in 2001. His research interestincludes statistical learning, natural language processing, data mining,and information retrieval. Hang Li's web site:http://research.microsoft.com/users/hangli/
17 Nov 2003	Dr. Kato and Dr. Fukomoto (NTCIR)	An Overview of the QA Challenge + NTCIR -- The Way Ahead Time: 10:30 am - 12:00 pm Location: 4th Floor Abstract: An Overview of Question Answering ChallengeJun'ichi Fukumoto and Tsuneaki KatoIn this talk, we will present an overview of Question AnsweringChallenge(QAC), which is the question answering task of the NTCIRWorkshop. QAC-1 (the first evaluation of QAC) was carried outat NTCIR Workshop 3 in October 2002, and QAC-2 will be atNTCIR Workshop 4 in December 2003. In the QAC, systems to beevaluated are expected to return exact answers consisting of a nounor noun compound denoting, for example, the names of persons,organizations, or various artifacts or numerical expressions suchas money, size, or date. Those basically range over the NamedEntity (NE) elements of MUC and IREX but is not limited to them.QAC consists of three kinds of subtasks: Task 1, where the systemsare allowed to return ranked five possible answers; Task 2, wherethe systems are required to return a complete list of answers; andTask 3, the systems are required to answer series of questions, thathave anaphora and zero-anaphora. We will present the results ofQAC-1, and vision and prospect of QAC-2.NTCIR -- the Way AheadNoriko KandoDr. Noriko Kando is the leader of NTCIR(Test Collections and Evaluationof IR, Text Summarization, Q&A, etc) project, and an associate professorof National Institute of Informatics (NII). She got her Ph. D in 1995from Keio University. Her research interest includes evaluation ofinformation retrieval systems, technologies to "Make Information Usablefor Users", cross-lingual information retrieval, and analysis of textstructure, genre, citation & link She is a member of editorial boards ofInternational Journal on Information Processing and Management,ACM-Transaction on Asian Language Information Processing, etc.Jun'ichi Fukumoto and Tsuneaki Kato are task organizers of QAC.Dr. Jun'ichi Fukumoto is an associate professor of RitsumeikanUniversity. He got his Ph. D in 1999 from University of ManchesterInstitute of Science and Technology. His research interest includesQ&A, automatic summarization, and dialogue processing.Dr. Tsuneaki Kato is an associate professor of the University of Tokyo.He got his Dr. of Engineering in 1995 from Tokyo Institute ofTechnology. His research interests includes multimodal dialogueprocessing, multimodal presentation generation and domain independentquestion and answering. He is a member of editorial committee oftransaction on information and systems of The Institute of Electronics,Information and Communication Engineers.
27 Oct 2003	Christopher Manning (Stanford)	Natural Language Parsing: Graphs, the A* Algorithm, and Modularity Time: 10:00 am - 11:00 am Location: 11 Large Abstract: Probabilistic parsing methods have in recent years transformed our ability torobustly find correct parses for open domain sentences. Much of this work hasbeen within a common architecture of heuristic search for good pares inlexicalized probabilistic context-free grammars, with many layers of back-offto avoid problems of sparse data.In this talk, I will outline some different ideas that we have been pursuing.I will connect stochastic parsing with finding shortest paths in hypergraphs,and show how this approach naturally provides a chart parser for arbitraryprobabilistic context-free grammars (finding shortest paths in a hypergraph iseasy; the central problem of parsing is that the hypergraph has to beconstructed on the fly). From this viewpoint, a natural approach is to use theA* algorithm to cut down the work in finding the best parse. On unlexicalizedgrammars, this can reduce the parsing work done dramatically, by at least 97%.This approach is competitive with methods standardly used in statisticalparsers, while ensuring optimality, unlike most heuristic approaches tobest-first parsing.Finally, I will present a novel modular generative model in which semantic(lexical dependency) and syntactic structures are scored separately. Thisfactored model is conceptually simple, linguistically interesting, admits exactinferenence with an extremely effective A* algorithm, and providesstraightforward opportunities for separately improving the component models. Inparticular, I will mention some of the work we have done focusing on the PCFGcomponent to produce a very high accuracy unlexicalized grammar.This is joint work with Dan Klein.About the Speaker:Christopher Manning is an Assistant Professor of Computer Science andLinguistics at Stanford University. He received his Ph.D. from StanfordUniversity in 1995, and served on the faculty of the Computational LinguisticsProgram at Carnegie Mellon University (1994-1996) and the University of SydneyLinguistics Department (1996-1999) before returning to Stanford. His researchinterests include probabilistic models of language, natural language parsing,constraint-based linguistic theories, syntactic typology, informationextraction and text mining, and computational lexicography. He is the author ofthree books, including Foundations of Statistical Natural Language Processing(MIT Press, 1999, with Hinrich Schuetze). Chris' schedule is available in Postscript or PDF format.
17 Oct 2003	Hovy, Marcu, Knight, Byrd, Narayanan, Traum, Gordon	Introduction to CL Research Time: 3:00 pm - 4:30 pm Location: 11 Large Abstract: The annual Computational Linguistics Open House will be held at USC's InformationSciences Institute from 3:00-4:30pm in the 11th floor Conference Room. Researchers fromISI, including Eduard Hovy, Daniel Marcu, and Kevin Knight will present overviews oftheir latest research. We will also hear about the research activities of Dani Byrd ofthe Linguistics Department, Shri Narayanan's group in EE, and David Traum and AndrewGordon of USC's Institute for Creative Technologies.
10 Oct 2003	Philipp Koehn	Advances in Statistical MT: Phrases, Noun Phrases and Beyond Time: 3:00 pm - 4:00 pm Location: 11 Large Abstract: (This is a practice run for I talk I will give a few times over the nextweeks when interviewing for job positions.)I will review the state of the art in statistical machine translation(SMT), present my dissertation work, and sketch out the researchchallenges of syntactically structured statistical machine translation.The currently best methods in SMT build on the translation of phrases (anysequences of words) instead of single words. Phrase translation pairs areautomatically learned from parallel corpora. While SMT systems generatetranslation output that often conveys a lot of the meaning of the originaltext, it is frequently ungrammatical and incoherent.The research challenge at this point is to introduce syntactic knowledgeto the state of the art in order to improve translation quality. Myapproach breaks up the translation process along linguistic lines. I willpresent my thesis work on noun phrase translation and ideas about clausestructure.
03 Oct 2003	Anton Leuski	A Year in Paradise Time: 3:00 pm - 4:00 pm Location: 11 Large Abstract: I would like to talk about some of the things I did during the lastyear. I will discuss and demonstrate CuSTaRD, a cross-lingualinformation retrieval, organization, summarization, and visualizationsystem that was built for the Surprise Language exercise. I will focusin more details on iNeATS, the interactive multi-document summarizationpart of CuSTaRD. The other project I plan to present is eArchivarius, asystem for accessing collections of electronic mail.
02 Oct 2003	Ana-Maria Popescu	TBA Time: 4:00 pm - 5:00 pm Location: 11 Large Abstract:
15 Sep 2003	Beata Klebanov	Analyzing Sentences into Facts: Simple is Beautiful Time: 2:30 pm - 4:00 pm Location: 11 Large Abstract: I present my summer project - writing rule-based software forsimplifying texts. Task definition and motivations will bediscussed, as well as human and automatic evaluation, thelatter using a question answering system.This is joint work with Daniel Marcu and Kevin Knight.
12 Sep 2003	Lara Taylor	Discourse Coherence for Ordering Information Time: 2:30 pm - 4:00 pm Location: 11 Large Abstract: In this talk, I look at how the notion of discourse coherence can bemodeled computationally. I begin with the following idea: if you takea text and shuffle its sentences into a random order, that text willno longer make sense. In other words, the text will be "incoherent".Our task is to learn how to reassemble a shuffled text into an orderthat humans would consider to be coherent.I discuss practical and theoretical motivations for the task,evaluations of our model, increases in performance achieved over thesummer, and directions for future research.This work was done in collaboration with Kevin Knight, Daniel Marcu,Jonathan Graehl and Nick Mote.
05 Sep 2003	Nishit Rathod and Anish Nair	Deciphering Hindi Scripts Time: 3:00 pm - 4:00 pm Location: 11 Large Abstract: A major hurdle in building automated information retrieval systems forHindi text is the lack of an uniform encoding for text representation.Standards do exist, but noone seems interested. Every web contentpublisher seems to have their encoding system, making informationextraction a nightmare. We explore an unsupervised approach toconvert any given "unknown" encoding to UTF-8, by treating it as adecipherment problem. We also study how a little amount of supervisioncan improve decoding accuracy.
03 Sep 2003	Alex Fraser and Franz Och	JHU MT Workshop Time: 3:00 pm - 4:00 pm Location: 11 Large Abstract: We will present the results of the 2003 Johns Hopkins UniversitySummer Workshop on "Syntax for Statistical Machine Translation".We will describe a large effort to extend a high-performingphrase-based MT system as baseline by adding new features representingsyntactic knowledge that deal with specific problems of the underlyingbaseline. We investigate a broad range of possible feature functions,from very simple binary features to sophisticated tree-to-treetranslation models. Simple feature functions test if a certainconstituent occurs in the source and the target language parsetree. More sophisticated features will be derived from an alignmentmodel where whole sub-trees in source and target can be aligned nodeby node. We present results on the Chinese-English large data track ofthe recent TIDES MT evaluations.This is joint work with the other workshop team members: DanielGildea, Anoop Sarkar, Sanjeev Khudanpur, Kenji Yamada, Libin Shen,Shankar Kumar, David Smith, Viran Jain, Katherine Eng, Jin Zhen andDragomir Radev.See http://www.clsp.jhu.edu/ws03/groups/translate/for more.
29 Aug 2003	Stefan Riezler	Deepening Representations Time: 3:00 pm - 4:00 pm Location: 11 Large Abstract:
27 Aug 2003	Michel Galley and Mark Hopkins	Syntax for Statistical MT Time: 3:00 pm - 4:00 pm Location: 11 Large Abstract:
22 Aug 2003	Satoshi Sekine	Information Extraction, IR and QA Time: 3:00 pm - 4:00 pm Location: 11 Large Abstract:
15 Aug 2003	Beata Klebanov	On Her Masters Research Time: 3:00 pm - 4:00 pm Location: 11 Large Abstract:
01 Aug 2003	Shou-de Lin	Toward deciphering the 2-dimensional ancient Luwian script by discovering its writing order Time: 3:00 pm - 4:00 pm Location: 11 Large Abstract:
29 Jul 2003	Michael Brasser	A Model of Word Movement for Machine Translation Time: 3:00 pm - 4:00 pm Location: 11 Small Abstract:
25 Jul 2003	Jonathan Graehl and Kevin Knight	Super-Carmel for Trees Time: 3:00 pm - 4:00 pm Location: 11 Large Abstract:
18 Jul 2003	Doug Oard	A Maryland Yankee in King Eduard's Court: Some Remarks on a Year in Paradise Time: 3:00 pm - 4:00 pm Location: 11 Large Abstract:
27 Jun 2003	Michael Fleischman	Offline Strategies for Online Question Answering: Answering Questions Before They Are Asked and Maximum Entropy Models for FrameNet Classification Time: 3:00 pm - 4:00 pm Location: 10 Large Abstract:
12 Jun 2003	Dina Demner-Fushman	Measuring the Effect of Dictionary Coverage on Cross-Language Retrieval Time: 11:00 am - 12:00 pm Location: 11 Large Abstract: Bilingual term lists have proven to be a useful basis fordictionary-based Cross-Language Information Retrieval (CLIR), butthere is ample anecdotal evidence that differences in vocabularycoverage can have a substantial impact on retrieval effectiveness.This issue has recently been explored using ablation studies in whichprogressively smaller term lists were synthesized using samplingtechniques. The ablation techniques used in those studies have not,however, been validated using real terms lists. In this talk I willreport the results of what we believe is the first large coveragestudy use naturally occurring term lists. Thirty-five bilingual termslists were obtained from a variety of sources, each with English asone of the two paired languages. From these, we created 35English-to-English term lists by taking each term that was present inthe English side of the list as its own translation. When used withan English information retreval test collection, this allowed us tomeasure the reduction in retrieval effectivenss that could beattributed to deficiencies in the coverage of English terms. Eighttypes of untranslatable terms were identified in a collection of newsstories, of which named entitles were found to have the greatestimpact on retrieval effectiveness. Differences in named entitycoverage were found to produce large differences in retrievaleffectiveness for term lists of similar sizes. Controlling for namedentity effects yielded a clear relationship between retrievaleffectiveness and the size of the translatable English vocabulary.The functional dependence that we observed is consistent with onepreviously applied ablation technique and inconsistent with another.Our results indicate that the outcome of a widely cited landmark studyof query expansion effects for CLIR was likely affected by a flawedablation model. We conclude our talk with a suggestion for furtherwork on that topic, and a simple prescription for avoiding suchproblems in the future.
23 May 2003	Liang Zhou	A Web-Trained Extraction Summarization System and Headline Summarization at ISI Time: 3:00 pm - 4:00 pm Location: 11 Large Abstract: 1) A serious bottleneck in the development of trainable text summarizationsystems is the shortage of training data. Constructing such data is a verytedious task, especially because there are in general many differentcorrect ways to summarize a text. Fortunately we can utilize the Internetas a source of suitable training data. In this paper, we present asummarization system that uses the web as the source of training data. Theprocedure involves structuring the articles downloaded from variouswebsites, building adequate corpora of (summary, text) and (extract,text) pairs, training on positive and negative data, and automaticallylearning to perform the task of extraction-based summarization systems.2) Headlines are useful for users who only need information on the maintopics of a story. We present a headline summarization system that isbuilt at ISI for this purpose and is a top performer for DUC2003's task 1,generating very short summaries (10 words or less).
20 May 2003	Michel Galley	Discourse Segmentation of Multi-Party Conversation Time: 3:00 pm - 4:00 pm Location: 11 Large Abstract:
16 May 2003	Chin-Yew Lin	Automatic Evaluation of Summaries Using N-gram Co-Occurrence Statistics Time: 3:00 pm - 4:00 pm Location: 11 Large Abstract: Following the recent adoption by the machine translation community ofautomatic evaluation using the BLEU/NIST scoring process, we conduct anin-depth study of a similar idea for evaluating summaries. The resultsshow that automatic evaluation using unigram co-occurrences betweensummary pairs correlates surprising well with human evaluations, basedon various statistical metrics; while direct application of the BLEUevaluation procedure does not always give good results.
09 May 2003	Doug Oard	Coping with Surprise: The Case of Cebuano Time: 3:00 pm - 4:00 pm Location: 11 Large Abstract: For ten days in March, nine research teams worked together to buildCebuano language resources and systems for a "dry run" the TIDES SupriseLanguage experiment. Cebuano is spoken widely in the southernPhillipines, but there had previously been little work on computationallinguistics for that language. As we prepare for the actual SupriseLanguage experiment this June, we will use this talk to look back on whatworked, what didn't, and what lessons there are to be learned from ourexperience in March. Come prepared to share the excitement, offer yourideas, and understand why we have tried to ask Ed to cancel all vacationsduring the month of June (just kidding...).
02 May 2003	Hal Daumé III	Acquiring Paraphrase Templates from Document/Abstract Pairs Time: 3:00 pm - 4:00 pm Location: 11 Large Abstract: We present an approach to automatically extracting paraphrase templatesfrom document/abstract pairs. This methodology relies on word-basedalignments created by off-the-shelf software. Our paraphrases areevaluated by human evaluators for precision and automatically forapplicability. We find that 77% of the extracted paraphrases are judgedto be always correct and that the generalized templates of 60% arejudged to be applicable most of the time and 87% are judged to beapplicable sometimes.
25 Apr 2003	Quamrul Tipu	Statistical MT with Bilingual Morphology Time: 3:00 pm - 4:00 pm Location: 11 Large Abstract: Traditional statistical MT systems mostly work on the word-andphrase-level. For different language pairs, the performance of suchsystems vary from some 15% to 35%. These systems suffer from problemssuch as sparse data, with huge vocabulary sizes leading to lessreliable probability estimates. In our current research, we aim tocome up with a better MT system by looking inside the words. Almost inevery language, a root (stem) can have many different forms(inflectional, derivational, etc.). If we can identify the roots, thesize of the vocabulary will quite small, and we can have betterprobability estimates, reducing the sparse data problem andpotentially leading to higher accuracy. We are trying to come up witha model that induces morphology automatically from a bilingual corpusand achieves this improvement.
04 Apr 2003	Donghui Feng	Natural Language Understanding in MRE Time: 3:00 pm - 4:00 pm Location: 11 Large Abstract: In this talk, I will present my current work on language understandingin the project, Mission Rehearsal Exercise(MRE). One of the challengesin a dialogure system is to provide a robust understanding/parsingcompoment. We applied both Finte State Model and Statistical LearningModel for the parsing of separate sentences of dialogue utterances.Their performances are evaluated and compared with a new blind set.And we hope to incorporate them to make a better solution in thisspecific application.
21 Mar 2003	Gareth Jones	An Investigation of the Application of Broad Coverage Automatic Pronoun Resolution in Information Retrieval Time: 3:00 pm - 4:00 pm Location: 11 Large Abstract: Term weighting methods have been shown to give significant increasesin information retrieval performance. Term weights are typicallycalculated using frequency counts across the whole retrievalcollection, frequency of each term within individual documents andcompensation for varying document length. The presence of pronomialreferences in documents effectively reduces the within document termfrequency of associated words with a consequent effect on term weightsand information retrieval behaviour. This presentation will describean experimental investigation into the impact on information retrievalperformance of broad coverage automatic pronoun resolution. Resultsusing a standard information retieval test collection indicate thatcalculating term weights using a pronoun resolved version of thedocument test collection can improve both fixed cutoff and averageretrieval precision.
14 Mar 2003	Kareem Darwish	Improving the Efficiency and Effectiveness of Structured Query Methods Time: 3:00 pm - 4:00 pm Location: 11 Large Abstract: One of the key challenges in retrieval is what to do when a query termneeds to be replaced with more than one term. This problem arises inapplications such as cross language information retrieval andthesaurus expansion. One solution is to use structured query methods,which treat all the possible replacements as if they were one queryterm by computing a joint document frequency and a joint termfrequency. This presentation will review prior work on structuredquery techniques and then introduce three new variants that aim toimprove computational efficiency and to leverage estimates ofreplacement probabilities to improve retrieval effectiveness. Themethods have now been tested in cross-language retrieval andOCR-degraded text retrieval applications in which replacementprobability estimates could be estimated. In both applications, thenew structured query methods showed statistically significantimprovements in retrieval effectiveness over previously knownstructured query methods.
07 Mar 2003	Scott Klemmer	Books with Voices: Paper Transcripts as a Tangible Interface to Oral Histories Time: 3:00 pm - 4:00 pm Location: 11 Large Abstract: Our contextual inquiry into the practices of oralhistorians uneartheda curious incongruity. While oral historians consider interviewrecordings a central historical artifact, these recordingssit unusedafter a written transcript is produced. We hypothesizedthat this islargely because books are more usable than recordings.Therefore, wecreated Books with Voices: bar-code augmented paper transcriptsenabling fast, random access to digital video interviews ona PDA. Wepresent quantitative results of an evaluation of this tangibleinterface with 13 participants. They found this lightweight,structured access to original recordings to offersubstantial benefitswith minimal overhead. Oral historians found a level ofemotion in thevideo not available in the printed transcript. The videoalso helpedreaders clarify the text and observe nonverbal cues. http://guir.berkeley.edu/oral-history/
28 Feb 2003	Radu Soricut	Sentence Level Discourse Parsing using Syntactic and Lexical Information Time: 3:00 pm - 4:00 pm Location: 11 Large Abstract: We introduce two probabilistic models that can be used to identifyelementary discourse units and build sentence-level discourse parsetrees. The models use syntactic and lexical features. A discourse parsingalgorithm that implements these models derives discourse parse trees withan error reduction of 18.8% over a state-of-the-art decision-baseddiscourse parser. A set of empirical evaluations shows that our discourseparsing model is sophisticated enough to yield discourse trees at anaccuracy level that matches near-human levels of performance.
21 Feb 2003	Nate Chambers	Statistical Language Generation in a Dialogue System Time: 3:00 pm - 4:00 pm Location: 11 Large Abstract: The large corpora of written text that is available to the languagecommunity has largely been utilized for language understanding; it hassomewhat been ignored in the context of language generation. Recentdevelopments in stochastic generation have allowed such systems to shiftthe burden from hand crafted databases (lexicons, grammars, ontologies) tothe knowledge implicitly found in written text. However, when building adialogue system, generation is largely interactive, very different fromthe written structure of most corpora.In this talk, I will discuss my recent work at applying a stochasticgenerator, HALogen, and its newswire language model to a dialogue system,TRIPS. I'll describe the difficulties in mapping the TRIPS semantic forminto HALogen's representation, the critical differences between newswireand dialogue, and the possibility of using HALogen and a large newswiremodel as a domain independent generator.
07 Feb 2003	Jeongwon Cha	Automatic Pattern Learning for Information Extraction using Web Data Time: 3:00 pm - 4:00 pm Location: 11 Large Abstract: I will give a status report work on information extraction during last10 months. The motivation of this work is to learn extractionpatterns automatically using seed template and web search engine. Myapproach is to generate linguistics patterns and surface patterns andcombine them to compenstate for the respective weaknesses of twopatterns. On the DUC01-test-disasters (67 documents),DUC01-training-disasters (54 documents) I got a 0.34/0.26 f-measurerespectively. In this talk, I will give a status report on ReADproject (with Dr. Chin-Yew Lin).
31 Jan 2003	Philipp Koehn	Noun Phrase Translation Time: 3:00 pm - 4:00 pm Location: 11 Large Abstract: I will give a status report on my current thesis work onnoun phrase translation. The motivation of this work isto break up the machine translation problem into smaller,more manageable units. The treatment of noun phrase translationas a subtask of machine translation is both linguisticallyand empirically motivated. My approach is to generatea n-best list of candidate translations with a statisticalmachine translation system and rerank the candidates withadditional features. For about 90% of all noun phrases wecan find an acceptable translation in the 100-best list, whilean acceptable translation comes out on the very top for onlyabout 60% of the noun phrases. I will discuss a variety oflinguistic and empirical features that (may) help to movethe acceptable translations higher in the list. I will alsopresent results modeling issues such as phrase basedtranslation and compound splitting. This talk is alsointended as a fishing expedition for feature suggestions bythe audience.
24 Jan 2003	Doug Oard & Anton Leuski	Access to Archival Collections of Electronic Mail Time: 3:00 pm - 4:00 pm Location: 11 Large Abstract: Since its inception more than 30 years ago, electronic mail (email)has developed into a powerful communication medium with applicationsthat extend well beyond simple asynchronous message exchange betweenindividuals. Automated tools to support the use of email inindividual, organizational and social contexts have receivedincreasing attention in recent years. Among the tasks that are nowsupported are filtering (e.g., spam detection), aggregation (e.g.,mailing list digests), workflow management (e.g., help desk routing),and reuse (e.g., retrospective search). We are interested in howtoday's email will be used in the future -- some will certainly bepreserved (indeed, some MUST be preserved!), and those records willserve as powerful evidence of how we lived our lives and organized oursocieties. The challenges of managing many types of electronic recordcollections are receiving increasing attention, but we are not awareof any work yet on supporting access to electronic mail archives.That will be the focus of this talk.We will introduce the Open Archival Information Systems (OAIS) model,and then focus on two key processes: ingestion and access. Our focusin ingestion is on support for review and redaction, which we believewill be key enablers to acquisition and near-term access. For access,we will address both browsing based on provenance (original order) anduser-guided reorganization based on search and visualization. Alongthe way, we will identify potentially productive opportunities toapply natural language processing technologies such as topicsegmentation, link detection, and summarization. We will thendescribe two test collections, and demonstrate a system that we havedeveloped to explore user-guided reorganization through visualizationfor one of those collections. We will conclude the talk by sketchingout a research agenda. At that point, we will expect suggestions andcomments from the audience. Knowing this audience, it is unlikelythat we will need to wait that long :-).

Information Sciences Institute

Natural Language Group

Visitor Information

Upcoming Talks

Past Talks