This module ``parses'' the text into text segments. At a minimum, it separates the formatted regions from the unformatted ones. Some systems go further and segment the unformatted text into topic areas, either by looking for discourse particles like ``meanwhile'' or by statistical means. In the Tipster texts, the header is the only formatted region. In MUC-5 systems, this module stores the date and source information from the header for entry into the template, and the date is used to interpret temporal deictics like ``last month'' during subsequent processing. Some header information is often discarded as irrelevant.
Few if any systems have a systematic treatment of text zoning; most rely on ad hoc code that is developed manually.
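To make the description concrete, here is a minimal sketch of an ad hoc text zoner of the kind described above. It is an illustration, not a reconstruction of any MUC-5 system. The header layout (``NAME: value'' lines followed by a blank line), the field names, and the functions `zone_text`, `resolve_last_month`, and `split_on_particles` are all assumptions made for this example; actual Tipster headers use a different markup.

```python
import re
from datetime import date

# Hypothetical header layout: "NAME: value" lines, a blank line, then the body.
# Real Tipster headers are marked up differently; this is only an illustration.
HEADER_FIELD = re.compile(r"^(?P<name>[A-Z]+):\s*(?P<value>.*)$")

# Header fields kept for the template; everything else is discarded.
KEEP = {"date", "source"}


def zone_text(raw):
    """Separate the formatted header from the unformatted body."""
    head, _, body = raw.partition("\n\n")
    fields = {}
    for line in head.splitlines():
        match = HEADER_FIELD.match(line.strip())
        if match and match.group("name").lower() in KEEP:
            fields[match.group("name").lower()] = match.group("value").strip()
    return fields, body.strip()


def resolve_last_month(doc_date):
    """Return the (year, month) that 'last month' denotes relative to doc_date."""
    if doc_date.month == 1:
        return doc_date.year - 1, 12
    return doc_date.year, doc_date.month - 1


def split_on_particles(body, particles=("meanwhile",)):
    """Crude topic segmentation: open a new segment at each sentence
    that begins with one of the given discourse particles."""
    sentences = re.split(r"(?<=[.!?])\s+", body)
    segments = []
    for sentence in sentences:
        if not sentence:
            continue
        starts_topic = sentence.lower().startswith(tuple(particles))
        if segments and not starts_topic:
            segments[-1].append(sentence)
        else:
            segments.append([sentence])
    return [" ".join(segment) for segment in segments]


if __name__ == "__main__":
    raw = (
        "DATE: 1993-01-15\n"
        "SOURCE: Example Wire\n"
        "PAGE: 3\n"
        "\n"
        "The firms agreed last month. Meanwhile, a rival announced a plant."
    )
    fields, body = zone_text(raw)
    doc_date = date.fromisoformat(fields["date"])
    print(fields)
    print(resolve_last_month(doc_date))
    print(split_on_particles(body))
```

The sketch hard-codes one header shape and one discourse particle. That brittleness is the point: each new text source would need its own hand-written rules, and a statistical segmenter would replace `split_on_particles` entirely.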