eRulemaking
Language Processing Technology for Electronic Rulemaking

Description

Many people today, including news analysts, opinion pollsters, advertisers, and government regulation writers, need to interpret, structure, and rapidly master large quantities of opinion-based text. New research is needed to develop text processing tools that can perform advanced analysis of large text collections. This research builds on current text processing technologies, such as text clustering, text search using information retrieval, and extractive summarization, to build and test tools tailored to the specific needs of government personnel working in an electronic rulemaking environment.

We focus on the federal government's several thousand regulation writers, employed in some 200 agencies, who formulate, in a tightly scripted procedure, the rules and regulations that define the details of our laws. Part of this procedure requires them to invite, and then process in detail, comments from the public on the proposed regulations. In high-profile cases, comments have exceeded a million items, including individual emails, batches of form and near-form letters, legal commentary, and formal industry or university studies comprising several hundred pages each. By law, regulation writers must respond in the final regulation to every substantive and relevant issue raised in the comments.

This project attempts to solve several novel problems central to language processing research. In turn, it will deploy and evaluate a Rule-Writer's Workbench: a set of language tools that enables regulation writers, singly or jointly, to obtain a detailed and multidimensional overview of the material. Our initial work with DOT, EPA, and a host of other regulation-writing agencies has highlighted the need for the following capabilities:

  • basic information retrieval, for gathering relevant texts both within and outside the comment set;
  • text classification, for channeling comments to the appropriate rule writer's desk;
  • overall text characterization using word frequency counts, for identifying key issues;
  • duplicate detection, for quickly identifying form letters;
  • near-duplicate detection, for identifying and extracting text changes to form letters;
  • text summarization, for creating a first rough cut through the data;
  • author typing, for stakeholder analyses during and after a public comment period; and
  • opinion/affect determination, for determining what stakeholder concerns exist.
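To make a few of these capabilities concrete, here is a minimal Python sketch of word frequency counting, duplicate detection, and near-duplicate detection with extraction of the text added to a form letter. It is an illustration only, not the workbench's implementation: the function names, the tiny stopword list, and the choice of hashing normalized text and comparing word-shingle sets by Jaccard similarity are all assumptions made for this example.

```python
import hashlib
import re
from collections import Counter
from difflib import SequenceMatcher

# Hypothetical, deliberately tiny stopword list for the example.
STOPWORDS = frozenset({"the", "a", "an", "of", "to", "and", "our", "from"})


def normalize(text):
    """Lowercase the text and keep only letter/digit runs, joined by spaces."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def top_terms(comments, n=5):
    """Most frequent non-stopword terms across all comments."""
    counts = Counter(
        word
        for comment in comments
        for word in normalize(comment).split()
        if word not in STOPWORDS
    )
    return counts.most_common(n)


def fingerprint(text):
    """Hash of the normalized text; equal fingerprints mark exact duplicates."""
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()


def shingles(text, k=3):
    """Set of overlapping k-word sequences from the normalized text."""
    words = normalize(text).split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def added_text(form_letter, comment):
    """Word runs the commenter inserted or substituted relative to the form letter."""
    base = normalize(form_letter).split()
    new = normalize(comment).split()
    matcher = SequenceMatcher(a=base, b=new, autojunk=False)
    return [
        " ".join(new[j1:j2])
        for tag, _i1, _i2, j1, j2 in matcher.get_opcodes()
        if tag in ("insert", "replace")
    ]


form = "Please protect our national forests from new road construction."
exact = "please protect our national forests from new road construction!!"
edited = form + " My family hikes there every summer."

is_duplicate = fingerprint(form) == fingerprint(exact)
similarity = jaccard(shingles(form), shingles(edited))
personal_text = added_text(form, edited)
```

In this sketch a comment would be treated as a near-duplicate when `similarity` exceeds some threshold (for example 0.5, an arbitrary value chosen here), and `personal_text` holds the wording a commenter added to the form letter. A collection of a million comments would need a scalable scheme in place of pairwise comparison.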

Four institutions are collaborating in this research: USC/ISI, CMU, University of Pittsburgh, and San Francisco University. IT research is conducted at the first two; Social and Political Science work at the last two. The two IT PIs (Dr. Jamie Callan from CMU and Dr. Eduard Hovy from USC/ISI) are recognized leaders in research and development of language technology, and already have prototypes of several of these capabilities. The two Social Science PIs (Dr. Stuart Shulman of Pittsburgh and Dr. Steven Zavestoski of San Francisco) have in-depth connections with government agencies and the expertise required to conduct a variety of evaluations of end-user attitudes toward the workbench and its separate functionalities. Our government collaborators have committed personnel for evaluation experiments, including contrasting regulation writer efficiency with and without the workbench.

This research is the culmination of three years' work in the form of meetings, workshops, data gathering, and preliminary studies conducted under the auspices of two prior small SGER grants from the NSF's Digital Government program. During this time, the PIs have built up an extensive network of contacts with government personnel in several agencies, including DOT, EPA, USDA, BLM, and the USFS-Content Analysis Team. The team has obtained commitments for personnel time, public comment data, and funding supplements.

This research has the potential for impact far beyond IT and social science academia. It will explore such novel issues as author typing, opinion/affect determination, and near-duplicate detection. If even a handful of the new technologies prove effective, they may eventually help thousands of regulation writers communicate more effectively with millions of citizens, better understand their comments, and produce better regulatory rules for everyone in our increasingly digitized society.

Please see the eRulemaking home page for more information.