Eduard Hovy, Ph.D.

 

Information Sciences Institute
of the University of Southern California
4676 Admiralty Way
Marina del Rey, CA 90292-6695
U.S.A.

tel: +1-310-448-8731
fax: +1-310-823-6714
email: hovy@isi.edu

Projects webpage: http://www.isi.edu/natural-language/nlp-at-isi.html


Dr. Hovy currently holds several positions:

·       Director of the Natural Language Group at ISI.  The NL Group, which currently contains about 40 people, consists of several related projects, conducting research in various aspects of natural language processing, including text summarization, machine translation, text parsing and generation, question answering, information retrieval, discourse and dialogue processing, and ontologies.  For details see below. 

·       Deputy Director of the Intelligent Systems Division of ISI, which performs Artificial Intelligence research.  In this capacity Dr. Hovy helps administer the division, which currently contains about 150 people. 

·       Research Associate Professor of Computer Science at USC.  Dr. Hovy regularly co-teaches a graduate course and advises Ph.D. and M.S. students.

·       Director of the Center for Knowledge Integration and Discovery (CKID).  CKID is one of four research centers funded by the Department of Homeland Security that form the Institute for Discrete Sciences (IDS), performing research on the extraction of interesting information from various media, its integration into a single repository, and theoretically well-founded techniques for trend analysis.

·       Director of Research for the Digital Government Research Center (DGRC).  The DGRC is one of three NSF-supported centers in the US that perform research in various aspects of Digital Government.  The other two are the Center for Technology in Government (CTG) at the University at Albany, NY, which focuses on the use of technology in government organizations, and the National Center for Digital Government (NCDG) at Harvard University and the University of Massachusetts in Amherst, MA, which focuses on political science.  The DGRC focuses on Information and Communications Technology (ICT), housing several projects at any time.

·       Advisory Professor at the Beijing University of Posts and Telecommunications. 

·       Regular High-Level Visiting Scientist, International Guest Academic Talents (IGAT) Program for the Development of University Disciplines in China (111 Program), Jan 2008–Dec 2012.

 

Dr. Hovy’s research can be organized into three principal, somewhat overlapping directions:

(1) Natural Language Processing / Computational Linguistics / Human Language Technology

The development of automated text summarization systems and automated summarization evaluation theory and technology.  Summarization engines include SUMMARIST (single documents), NeATS (multiple documents), and GOSP (producing headlines); summarization is also used in multilingual text access and management systems such as C*ST*RD and MuST (with Dr. Chin-Yew Lin and others at ISI).  Summarization evaluation systems include ROUGE (2003–04), developed by Dr. Chin-Yew Lin of ISI, and the BE package (2005), developed by Dr. Hovy, Dr. Lin, and others. Relevant publications.

Research on various aspects of machine translation and MT evaluation.   For MT evaluation, work in 2002–04 includes the FEMTI survey (a systematization of all major machine translation evaluation measures); partners are Prof. Maghi King and Dr. Andre Popescu-Belis at the University of Geneva, as well as students and researchers at other universities and commercial MT companies.  Work on machine translation included development of the Pangloss MT system (1990–94) together with researchers at CMU and New Mexico State University, which helped establish ISI’s Gazelle system headed by Dr. Kevin Knight.  The NSF-sponsored IL-Annotation project IAMTC (2003–04), joint with researchers at CMU, University of Maryland, MITRE, Columbia University, and New Mexico State University, focused on Interlingua design and text annotation; see under lexical semantics below. Relevant publications.

Development of sophisticated information extraction, opinion identification, parsing, and text analysis technology.  The Psyop project (2004–) employs technology to extract entities, events, beliefs, goals, opinions, and other information of interest from online texts, and to compose the results into psychologically informative descriptions of people.  This research relates to the analysis of public commentary below.  The SASO (2004–) and Mission Rehearsal Exercise (MRE, 2001–04) projects at the Institute for Creative Technologies of the University of Southern California develop virtual humans for virtual reality simulations; these employ text-to-semantics parsers (statistical and rule-based) being developed by Dr. Hovy and students.  Focusing on Social Network Analysis, the MKIDS-ISI project (2002–05) developed methods to analyze emails for expertise (of people and groups) and relative social status, using topic signature and speech act recognition technology. Relevant publications.

The development of automated question answering systems (2001–03) such as Textmap and Webclopedia (with Dr. Daniel Marcu, Dr. Ulf Hermjakob, Dr. Chin-Yew Lin, and others at ISI).  This work employs information retrieval, clustering, text summarization, parsing, and text harvesting methods described elsewhere. Relevant publications.

The development of theories and systems to perform automated text generation, including multi-sentence text planners, sentence planners, and sentence generators.  Current work is developing a generator used by the virtual humans in the SASO (2004–) and Mission Rehearsal Exercise (2001–04) virtual reality simulations being developed at the Institute for Creative Technologies of the University of Southern California (in collaboration with Dr. David Traum and others).  Another project, Quick!Help, is a small effort focusing on the generation of tailored recipes for poor people (in collaboration with Prof. Peter Clarke and Dr. Susan Evans from USC and Andrew Philpot from ISI).  This work relates to language tailoring done earlier in the HealthDoc project with Prof. Chrysanne DiMarco from the University of Waterloo, Canada, and Prof. Graeme Hirst from the University of Toronto.  Previous work (1987–92) includes the RST Text Structurer and the Penman sentence generator (in collaboration with researchers in various countries).  Relevant publications.

The development of discourse relations and planners that employ them to ensure the production of coherent multisentential text.  Results include a taxonomy of discourse relations collected from various sources (1992) and the RST Text Structurer (1987–92). Relevant publications.

The development of theory to address problems in multimedia human-computer communication; specifically, the question of dynamic planning and allocation of information to media during presentation design.  This work (1989–2002) was conducted with Dr. Yigal Arens of ISI and students.  Relevant publications.

(2) Ontologies, Text Mining/Harvesting, and Lexical Semantics

The development of shallow semantic representation notations and tools that support manual annotation of large amounts of text with shallow semantic information.  The current DARPA-funded OntoNotes (formerly OntoBank) project, joint with Dr. Ralph Weischedel and Dr. Lance Ramshaw of BBN, Prof. Mitch Marcus of the University of Pennsylvania, and Prof. Martha Palmer of the University of Colorado, focuses on the creation of a large corpus of texts in English, Chinese, and Arabic, annotated with shallow semantic information.  The NSF-funded IL-Annotation project IAMTC (2003–04), joint with researchers at CMU, University of Maryland, MITRE, Columbia University, and New Mexico State University, focused on stepwise Interlingua design and verification by annotation of texts in 7 languages.  In both these projects, the Omega ontology (see below) provides the symbol set for semantic annotation.  Relevant publications.

The development of large concept taxonomies/ontologies through a combination of merging existing ontologies, adding to the knowledge by extracting information from online text (see below), and enriching the interdependency relations by extracting information from dictionaries. Omega, our current ontology (2003–), contains over 120,000 concept terms and several million instances, in addition to various other information, acquired from a variety of sources, including Princeton's WordNet, NMSU's Mikrokosmos, and ISI's previous ontology SENSUS (1996–2000).  This work was performed in collaboration with Dr. Patrick Pantel, Mr. Michael Fleischman, Mr. Andrew Philpot, and Dr. Jerry Hobbs from ISI.  Relevant publications.

The development of technology to extract large amounts of instantial and conceptual information from online text.  In several projects since 1996, Dr. Hovy, students, and collaborators have developed a series of text mining and information extraction engines, and built collections of several million facts (about people, locations, objects, etc.).  This information, stored in a database, is connected to the Omega ontology (see above).  The most recent projects focus on Learning by Reading (2005–), in which tagging, parsing, semantic analysis, and inference techniques are combined to create a knowledge base automatically from text (including high school textbooks of Biology and Chemistry and webpages about the heart and engines), and to answer high school test questions from it.  Relevant publications.

(3) Digital Government

The development of systems to automatically find alignments or aliases across and within databases (2003–06). The SiFT system uses mutual information technology to detect patterns in the distribution of data values.  The current government partner in this NSF-funded project is the Environmental Protection Agency (EPA), which provides databases with air quality measurement data.  (This work is with Mr. Andrew Philpot and Dr. Patrick Pantel from ISI.)  Relevant publications.

The development of sophisticated text analysis of public commentary, such as emails, letters, and reports (2004–07).  Government staff who have to create regulations regularly face tens or hundreds of thousands of emails and other comments about proposed regulations, sent to them by the public.  Funded by the NSF, the eRule project is a collaboration between Prof. Stuart Shulman (a political scientist at the University of Pittsburgh), Prof. Jamie Callan (a computer scientist at CMU), Prof. Steven Zavestoski (a sociologist at the University of San Francisco), and Prof. Hovy. Government partners providing data are the Environmental Protection Agency (EPA) and the Department of Transportation (DOT).  Research at ISI focuses on technology to perform opinion detection and argument structure extraction.  This research relates to the analysis of text for psychological profiling above.  Relevant publications.

The development of text analysis of public communications with city government via email (2005).  The NSF funded a one-year project to collaborate with the QUALEG group, a European consortium of businesses, researchers, and three cities funded by the EU’s eGovernment program to develop ICT for city-to-citizen interaction.  Work at ISI focuses on the development of a system to classify emails and extract speech acts, opinions, and stakeholders, in German, and possibly French and Polish. 

The development of systems to access multiple heterogeneous databases (1999–2003).  Funded by the NSF, the EDC and AskCal systems provided access to over 50,000 tables of information about gasoline, produced by various Federal Statistics agencies, including the Census Bureau, the Bureau of Labor Statistics, and the Energy Information Administration. The system includes a large ontology and a natural language question interpreter (this work in collaboration with Dr. Jose-Luis Ambite and Andrew Philpot at ISI).  Partners in this project were the DGRC team at Columbia University, New York, headed by Dr. Judith Klavans.  Relevant publications.

 

Dr. Hovy's Ph.D. work focused on the development of a text generation program PAULINE that took into account the pragmatic aspects of communication, since the absence of sensitivity toward hearer and context has been a serious shortcoming of generator programs written to date. In general, he is interested in all facets of communication, especially language, as situated in the wider context of intelligent behavior. Related areas include Artificial Intelligence (work on planning and learning), Linguistics (semantics and pragmatics), Psychology, Philosophy (ontologies), and Theory of Computation.

 


Work Experience

Deputy Division Director (October 2002–), ISI Fellow (August 2000–03), Senior Project Leader (May 1997–), Project Leader (July 1989 – April 1997), and Computer Scientist (March 1987 – June 1989), Information Sciences Institute, University of Southern California.

Research Associate Professor (November 1999–) and Research Assistant Professor (December 1989–October 1999), Department of Computer Science, University of Southern California.

Co-Director (May 1999–), Master's Degree Program in Computational Linguistics, University of Southern California.

Advisory Professor (October 2005 – September 2008), Beijing University of Posts and Telecommunications.

Adjunct Associate Professor (February 1997 – January 2003), School of Computer Science, University of Waterloo.


 

 
