Eduard Hovy, Ph.D.
Information Sciences Institute
tel: +1-310-448-8731
Projects webpage: http://www.isi.edu/natural-language/nlp-at-isi.html
Dr. Hovy currently holds several
positions:
· Director
of the Natural
Language Group at ISI. The NL
Group, which currently contains about 40 people, consists of several related
projects, conducting research in various aspects of natural language
processing, including text summarization, machine translation, text parsing and
generation, question answering, information retrieval, discourse and dialogue
processing, and ontologies. For
details see below.
· Deputy
Director of the Intelligent
Systems Division of ISI, which performs Artificial Intelligence
research. In this capacity Dr.
Hovy helps administer the division, which currently contains about 150
people.
· Research
Associate Professor of Computer Science at USC. Dr. Hovy regularly co-teaches a
graduate course and advises Ph.D. and M.S. students.
· Director
of the Center for Knowledge Integration and
Discovery (CKID). CKID is one
of four research centers funded by the Department of Homeland Security that
form the Institute for Discrete
Sciences (IDS), performing research on the extraction of interesting
information from various media, its integration into a single repository, and
theoretically well-founded techniques for trend analysis.
· Director
of Research for the Digital Government Research
Center (DGRC). The DGRC is one
of three NSF-supported centers in the US that perform research in various
aspects of Digital Government (the other two are the Center for Technology in Government (CTG)
at the University at Albany, NY, which focuses on the use of technology in
government organizations, and the National Center for Digital
Government (NCDG) at Harvard University and the University of Massachusetts
in Amherst, MA, which focuses on political science). The DGRC focuses on Information and Communications
Technology (ICT), housing several projects
at any time.
· Advisory
Professor at the Beijing
University of Posts and Telecommunications.
· Regular
High-Level Visiting Scientist, International Guest Academic Talents (IGAT) Program
for the Development of University Disciplines in China (111 Program), Jan 2008–Dec
2012.
Dr. Hovy’s research can be
organized into three principal directions (that somewhat overlap):
(1) Natural Language
Processing / Computational Linguistics / Human Language Technology
The development of automated text
summarization systems and automated summarization
evaluation theory and technology. Summarization engines include SUMMARIST
(single documents), NeATS
(multiple documents), and GOSP
(producing headlines); summarization is also used in multilingual text access
and management systems such as C*ST*RD and
MuST (with Dr. Chin-Yew Lin and others at ISI). Summarization evaluation systems include ROUGE (2003–04), developed by Dr.
Chin-Yew Lin of ISI, and the BE package
(2005), developed by Dr. Hovy, Dr. Lin, and others. Relevant publications.
Research on various aspects of machine
translation and MT evaluation.
For MT evaluation, work in 2002–04 includes the FEMTI survey (a systematization of all major
machine translation evaluation measures); partners are Prof. Maghi King and Dr. Andrei Popescu-Belis at
the University of Geneva, as well as students and researchers at other
universities and commercial MT companies.
Work on machine translation included development of the Pangloss MT
system (1990–94) together with researchers at CMU and New Mexico State
University, which helped establish ISI’s Gazelle
system headed by Dr. Kevin Knight.
The NSF-sponsored IL-Annotation project IAMTC (2003–04), joint with
researchers at CMU, University of Maryland, MITRE, Columbia University, and New
Mexico State University, focused on Interlingua design and text annotation; see
under lexical semantics below. Relevant publications.
Development of sophisticated information
extraction, opinion identification, parsing,
and text analysis technology. The Psyop project (2004–) employs technology to extract from
online texts entities, events, beliefs, goals, opinions, and other information
of interest, and to compose the results into psychologically informative
descriptions of people. This
research relates to the analysis of public commentary below. The SASO (2004–) and Mission Rehearsal Exercise (MRE, 2001–04) projects
at the Institute for Creative Technologies
of the University of Southern California develop virtual humans for virtual
reality simulations; these virtual humans employ text-to-semantics parsers (statistical and rule-based)
being developed by Dr. Hovy and students.
Focusing on Social Network Analysis, the MKIDS-ISI project (2002–05)
developed methods to analyze emails for expertise (of people and groups) and
relative social status, using topic signature and speech act recognition
technology. Relevant publications.
The development of automated
question answering systems (2001–03) such
as Textmap
and Webclopedia
(with Dr. Daniel Marcu, Dr. Ulf Hermjakob, Dr. Chin-Yew Lin, and others at
ISI). This work employs
information retrieval, clustering, text summarization, parsing, and text
harvesting methods described elsewhere. Relevant
publications.
The development of theories and
systems to perform automated text generation, including multi-sentence text planners, sentence planners, and
sentence generators. Current work
develops a generator used by the virtual humans in the SASO (2004–) and Mission
Rehearsal Exercise (2001–04) virtual reality simulations being developed at the
Institute for Creative Technologies of the University of Southern California (this work in
collaboration with Dr. David Traum
and others). Another project, Quick!Help, is a small
effort focusing on the generation of tailored recipes for poor people (this
work in collaboration with Prof. Peter Clarke
and Dr. Susan Evans from USC and Andrew Philpot from ISI). This work relates to language tailoring
done earlier in the HealthDoc
project (with Prof. Chrysanne DiMarco from the
University of Waterloo, Canada, and Prof. Graeme Hirst from the University of
Toronto). Previous work (1987–92)
includes the RST Text Structurer and the Penman sentence generator (this work
in collaboration with researchers in various countries). Relevant
publications.
The development of discourse
relations and planners that employ them to
ensure the production of coherent multisentential text. Results include a taxonomy of discourse relations
collected from various sources (1992) and the RST Text Structurer (1987–92). Relevant publications.
The development of theory to
address problems in multimedia
human-computer communication; specifically, the question of dynamic planning
and allocation of information to media during presentation design. This work (1989–2002) was conducted with
Dr. Yigal Arens of ISI and students.
Relevant publications.
(2) Ontologies, Text
Mining/Harvesting, and Lexical Semantics
The development of shallow
semantic representation notations and tools
that support manual annotation of large amounts of text with shallow
semantic information. The current DARPA-funded OntoNotes (formerly OntoBank) project, joint with Dr. Ralph Weischedel and Dr.
Lance Ramshaw of BBN, Prof. Mitch Marcus
of the University of Pennsylvania, and Prof. Martha Palmer of the University of
Colorado, focuses on the creation of a large corpus of texts in English,
Chinese, and Arabic, annotated with shallow semantic information. The NSF-funded IL-Annot project IAMTC (2003–04), joint with
researchers at CMU, University of Maryland, MITRE, Columbia University, and New
Mexico State University, focused on stepwise Interlingua design and
verification by annotation of texts in 7 languages. In both these projects, the Omega ontology (see below)
provides the symbol set for semantic annotation. Relevant publications.
The development of large concept
taxonomies/ontologies through a
combination of merging together existing ontologies, adding to the knowledge by
extracting information from online text (see below), and enriching the
interdependency relations by extracting information from dictionaries. Omega, our current ontology (2003–), contains
over 120,000 concept terms and several million instances, in addition to
various other information, acquired from a variety of sources, including
Princeton's WordNet, NMSU's
Mikrokosmos, and ISI's previous ontology SENSUS (1996–2000). This work was performed in collaboration
with Dr. Patrick Pantel, Mr. Michael Fleischman, Mr. Andrew Philpot, and Dr. Jerry
Hobbs from ISI. Relevant publications.
The development of technology to extract
large amounts of instantial and conceptual information from online text. In
several projects since 1996, Dr. Hovy, students, and collaborators have
developed a series of text mining and information extraction engines, and built
collections of several million facts (about people, locations, objects,
etc.). This information, stored in
a database, is connected to the Omega ontology (see above). The most recent projects focus on Learning
by Reading (2005–), in which tagging,
parsing, semantic analysis, and inference techniques are combined to create a
knowledge base automatically from text (including high school textbooks of Biology
and Chemistry and webpages about the heart and engines), and to answer high
school test questions from this. Relevant publications.
(3) Digital Government
The development of systems to automatically
find alignments or aliases across and within databases (2003–06). The SiFT
system uses mutual information technology to detect patterns in the distribution
of data values. The current government
partner in this NSF-funded project is the Environmental Protection
Agency (EPA), which provides databases with air quality measurement data. (This work with Mr. Andrew Philpot and
Dr. Patrick Pantel from ISI.)
Relevant publications.
The development of
sophisticated text analysis of public commentary, such as emails, letters, and reports
(2004–07). Government staff who
have to create regulations regularly face tens or hundreds of thousands of
emails and other comments about proposed regulations, sent to them by the
public. Funded by the NSF, the eRule project is a collaboration
between Prof. Stuart Shulman (a
political scientist at the University of Pittsburgh), Prof. Jamie Callan (a computer scientist at CMU),
Prof. Steven Zavestoski
(a sociologist at the University of San Francisco), and Prof. Hovy. Government
partners providing data are the Environmental Protection Agency (EPA) and the
Department of Transportation (DOT).
Research at ISI focuses on technology to perform opinion detection and
argument structure extraction.
This research relates to the analysis of text for psychological
profiling above.
Relevant publications.
The development of text
analysis of public communications with city government via email (2005). The NSF funded a one-year project to collaborate with the QUALEG group, a
European consortium of businesses, researchers, and three cities funded by the
EU’s eGovernment program to develop ICT for city-to-citizen interaction. Work at ISI focuses on the development
of a system to classify emails and extract speech acts, opinions, and
stakeholders, in German, and possibly French and Polish.
The development of systems to
access multiple heterogeneous databases
(1999–2003). Funded by the NSF,
the EDC and AskCal systems provided access
to over 50,000 tables of information about gasoline, produced by various Federal
Statistics agencies, including the Census Bureau, the Bureau of Labor
Statistics, and the Energy Information Administration. The system includes a
large ontology and a natural language question interpreter (this work in
collaboration with Dr. Jose-Luis Ambite and Andrew Philpot at ISI). Partners in this project were the DGRC
team at Columbia University, New York, headed by Dr. Judith Klavans. Relevant
publications.
Dr. Hovy's Ph.D. work focused on
the development of a text generation program PAULINE that took into account the
pragmatic aspects of communication, since the absence of sensitivity toward hearer
and context has been a serious shortcoming of generator programs written to
date. In general, he is interested in all facets of communication, especially
language, as situated in the wider context of intelligent behavior. Related
areas include Artificial Intelligence (work on planning and learning),
Linguistics (semantics and pragmatics), Psychology, Philosophy (ontologies),
and Theory of Computation.
Work Experience
Deputy Division Director (October
2002–), ISI Fellow (August 2000–03), Senior Project Leader (May 1997–), Project
Leader (July 1989 – April 1997), and Computer Scientist (March 1987 – June
1989), Information Sciences Institute,
University of Southern California.
Research Associate Professor
(November 1999–) and Research Assistant Professor (December 1989–October 1999),
Department of Computer Science, University of Southern California.
Co-Director (May 1999–), Master's
Degree Program in Computational Linguistics, University
of Southern California.
Advisory Professor (October 2005
– September 2008), Beijing
University of Posts and Telecommunications.
Adjunct Associate Professor
(February 1997 – January 2003), School of
Computer Science, University of Waterloo.