Writing Plans and Datafiles

Contents

  1. Overview
  2. Constructing a plan
  3. Constructing a datafile

1. Overview

Theseus takes input that consists of (a) a plan and (b) a set of input data and then produces a set of output data that reflects the processing of the input data by the plan.

A plan is a set of one or more input variables, a set of zero or more output variables, and a set of one or more operators. A variable is a relation of data - that is, a set of attributes and zero or more tuples that contain values for each of those attributes. An operator is a function that processes some set of input data and produces some set of output data. Input data may consist of both variables and literals. Output data are always variables. A plan therefore forms a graph in which operator nodes are connected by edges that correspond to variables.
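
As an illustration of the graph view, the following Python sketch models a plan as a list of operators and derives the edges from shared variable names. The Operator class, the edges function, and the two-operator plan are hypothetical names invented for this illustration; they are not part of Theseus.

```python
from dataclasses import dataclass


@dataclass
class Operator:
    name: str
    inputs: list   # names of input variables (literals omitted for brevity)
    outputs: list  # names of output variables


def edges(operators):
    """Return (producer, variable, consumer) triples: one edge for each
    variable produced by one operator and consumed by another."""
    result = []
    for producer in operators:
        for var in producer.outputs:
            for consumer in operators:
                if var in consumer.inputs:
                    result.append((producer.name, var, consumer.name))
    return result


# Hypothetical plan: "select" feeds "project" through the variable "selected".
ops = [
    Operator("select", ["input-data", "criteria"], ["selected"]),
    Operator("project", ["selected"], ["filtered-data"]),
]
print(edges(ops))
```

Variables that no operator in the plan produces (here "input-data" and "criteria") are the plan inputs, and variables that no operator consumes (here "filtered-data") are the plan outputs.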

Input data is declared in a special file called the input datafile. Entries in this file correspond to the input variables for the plan. During execution, data in the input datafile is fed into the plan. Operators that consume the plan input variables may thus be triggered to execute - this results in the production of more data, which triggers the execution of other operators, and so on.

During execution, data is streamed from producers to consumers through bounded FIFO queues. Streaming means that tuples are communicated to the consumer as they are produced, so that the consumer can begin processing its input as soon as possible. Communication between producer and consumer is asynchronous until the consumer's queue becomes full, at which point the producer must wait until there is room in the queue for more data.
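
The blocking behavior described above can be sketched in Python with the standard queue and threading modules. This is only an analogy for how a bounded FIFO queue couples a producer to a consumer, not a description of how Theseus is implemented; the DONE sentinel, the function names, and the capacity of 2 are assumptions of the sketch.

```python
import queue
import threading

DONE = object()  # end-of-stream marker (an assumption of this sketch)


def producer(out_q, tuples):
    for t in tuples:
        out_q.put(t)   # blocks while the queue is full
    out_q.put(DONE)


def consumer(in_q, results):
    while True:
        t = in_q.get()  # blocks while the queue is empty
        if t is DONE:
            break
        results.append(t)  # "process" each tuple as soon as it arrives


def run_stream(tuples, capacity=2):
    q = queue.Queue(maxsize=capacity)  # bounded FIFO queue
    results = []
    threads = [
        threading.Thread(target=producer, args=(q, tuples)),
        threading.Thread(target=consumer, args=(q, results)),
    ]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return results


print(run_stream(["t1", "t2", "t3", "t4", "t5"]))
```

With a capacity of 2 and five tuples, the producer can run at most two tuples ahead of the consumer before it has to wait, and the consumer receives the tuples in the order they were produced.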

2. Constructing a plan

To write a plan, one needs to specify a set of input variables, a set of output variables, and a set of operators. For example, if we want to write a plan called FILTER, we can start by declaring:

PLAN filter
{
  INPUT: stream input-data, stream criteria
  OUTPUT: stream filtered-data
}

Thus, the plan is named and its variables are declared. Note that every variable is a data stream. Currently, this is the only type available and all plan writers must use this when declaring their input and output variables.

Next, we need to specify the set of operations for the FILTER plan. Let's assume that our plan filters out a set of movies from a larger input movie list. In particular, we want to keep only the input tuples whose "year" attribute satisfies a comparison against some number (the criteria, specified on input) and then show only the "film-name" and "year" attributes as output.

To do this, we need to use the relational SELECT and PROJECT operators. Generally speaking, SELECT admits only the tuples that meet a given criterion, and PROJECT forwards a subset of the attributes of its input. More details on these operators are available in the operator list documentation. So, we can augment our existing plan like this:

PLAN filter
{
  INPUT: stream input-data, stream criteria
  OUTPUT: stream filtered-data
  
  BODY
  {
    select (input-data, criteria : selected)
    project (selected, "film-name, year" : filtered-data)
  }
}

Notice that a ":" is used to separate operator input from output.
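
To make the behavior of the two operators concrete, the following Python sketch models a relation as a list of dictionaries and applies a selection followed by a projection. It is an illustration only: the function names mirror the operators, but the criterion syntax accepted here ("attr op number", separated by spaces) and the sample rows are assumptions of the sketch, and the real operators may accept richer conditions.

```python
import operator

# Comparison operators supported by this sketch only.
OPS = {"<=": operator.le, ">=": operator.ge, "<": operator.lt,
       ">": operator.gt, "=": operator.eq}


def select(rows, criterion):
    """Keep the rows that satisfy a criterion of the form 'attr op number'."""
    attr, op, value = criterion.split()
    return [r for r in rows if OPS[op](float(r[attr]), float(value))]


def project(rows, attr_list):
    """Keep only the attributes named in a comma-separated list."""
    attrs = [a.strip() for a in attr_list.split(",")]
    return [{a: r[a] for a in attrs} for r in rows]


# Hypothetical input relation with three attributes.
rows = [
    {"film-name": "Film A", "director": "X", "year": "1950"},
    {"film-name": "Film B", "director": "Y", "year": "1990"},
]
selected = select(rows, "year < 1970")
print(project(selected, "film-name, year"))
```

As in the plan body, the output of the selection is passed to the projection through an intermediate variable, here the Python name selected.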

Also, notice that it is possible to use literals. For example, "film-name, year" is specified outright as input to project. Internally, each literal is converted to a relation with a single attribute (called "DUMMY") and a single row (the data). To specify multiple tuples in literal form, one needs to use the "|" symbol. For example:

  select ("the fox|the bear", "dummy = 'the fox'" : out)

produces one tuple as output.
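
The conversion of a literal into a relation can be modeled in a few lines of Python. This is a sketch of the rule described above, not Theseus code; the function name literal_to_relation is hypothetical, and the attribute is spelled "dummy" in lower case to match the condition in the example.

```python
def literal_to_relation(literal):
    """Model a literal as a relation with the single attribute 'dummy'
    and one tuple per '|'-separated value."""
    return [{"dummy": value} for value in literal.split("|")]


relation = literal_to_relation("the fox|the bear")
# Emulate the condition dummy = 'the fox' with a plain Python comparison.
matches = [t for t in relation if t["dummy"] == "the fox"]
print(len(relation), len(matches))
```

The literal yields a relation of two tuples, of which only one satisfies the condition. A literal without a "|", such as "film-name, year", becomes a relation with a single tuple.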

Referencing Subplans

In an agent plan, one can also reference other agent plans, just as easily as referencing an operator. To do this, the name of the subplan is specified in the body of the parent plan, making sure that the inputs and outputs passed to it match those that the subplan declares. For example, to reference the FILTER plan as a subplan of a parent plan called COMBINE-FILM-LIBRARIES:

PLAN combine-film-libraries
{
  INPUT: stream query-a, stream query-b, stream criteria
  OUTPUT: stream result

  BODY
  {
    dbquery(query-a, "foo!user!password" : data-a)
    dbquery(query-b, "bar!user!password" : data-b)
    union (data-a, data-b : u-data)
    filter(u-data, criteria : result)
  }
}

Recursive subplans are also permitted - recursion is often a simple and elegant way to implement "while (condition) do" style looping.

3. Constructing a datafile

All plans are required to have at least one input variable. Thus, datafiles that specify values for input variables are required.

The structure of a datafile is simple. Relations are declared and then their tuples are specified - in this order. To declare two relations of data in a single datafile, one needs to declare the first relation, specify its tuples, declare the second relation, and then specify its tuples. Each relation declaration follows the format:

RELATION [name]: [attribute 1 name] [char|number|date], [attribute 2 name] ...
Value1|Value2

This shows that relation attributes must be named and typed. Then, in the tuples that follow, the data values (separated by a "|") correspond, in order, to the attributes specified above.
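
The format is simple enough to read with a short parser. The Python sketch below is one possible reading of the format described above, not the parser Theseus uses. It assumes that attribute names contain no spaces, that blank lines can be ignored, and that values are kept as raw strings without type conversion; the function name parse_datafile and the sample relations are hypothetical.

```python
def parse_datafile(text):
    """Parse datafile text into {relation name: (attributes, tuples)}.

    attributes is a list of (name, type) pairs; each tuple is a list of
    raw string values.
    """
    relations = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue  # ignoring blank lines is an assumption of this sketch
        if line.startswith("RELATION "):
            name, _, decl = line[len("RELATION "):].partition(":")
            attrs = []
            for part in decl.split(","):
                attr_name, attr_type = part.split()
                attrs.append((attr_name, attr_type))
            current = (attrs, [])
            relations[name.strip()] = current
        else:
            if current is None:
                raise ValueError("tuple found before any RELATION line")
            values = line.split("|")
            if len(values) != len(current[0]):
                raise ValueError("wrong number of values: " + line)
            current[1].append(values)
    return relations


# Hypothetical datafile with two relations.
sample = """RELATION pets: name char, age number
Rex|3
Tom|5
RELATION limit: dummy char
age < 4
"""
parsed = parse_datafile(sample)
print(parsed["pets"])
print(parsed["limit"])
```

Each tuple line is attached to the most recently declared relation, which is why a relation's tuples must follow its declaration directly.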

As an example, the FILTER plan requires two inputs, "input-data" and "criteria". These can be specified in the datafile as:

RELATION input-data: film-name char, director char, year char, price number
Raiders of the Lost Ark|Spielberg|1981|35.99
North by Northwest|Hitchcock|1956|24.99
The Apartment|Wilder|1960|19.99
Star Wars|Lucas|1977|32.99
Vertigo|Hitchcock|1958|24.99
Forrest Gump|Zemeckis|1996|28.99
RELATION criteria: dummy char
year < 1970

This input, when combined with the plan, will lead to an output of:

----------------------------------------------
RELATION: filter_filtered-data
   attrs: film-name, year
----------------------------------------------
North by Northwest, 1956
Vertigo, 1958
The Apartment, 1960
----------------------------------------------