The Theseus Operator Set
Theseus operators are designed to process data as fast as possible,
especially when a set of data is arriving incrementally from a remote
network data source. Operators share some common definitions and
traits, which are listed below:
- Each data variable processed by operators is a relation.
- Relations consist of a list of attributes and zero or more
tuples of data that contain values corresponding to the attributes
of the relation to which they belong.
- Relations are streamed to/from operators during execution. When streamed,
the set of tuples (each with a pointer to the attribute list for the
relation) is communicated to the consuming operator from the producing
operator. The data is followed by an end-of-stream (EOS) token,
which denotes that all of the data for that relation has been
communicated.
- There is limited support for literal relations in the plan. To express
a literal relation, enclose it in double quotes and use the "|" character
to separate one tuple from the next. Literal relations always have the
same schema ("value char").
- EXAMPLE: select ("foo|bar", "value != 'foo'" : output)
results in only "bar" being emitted.
- NULL literals (i.e., streams that consist of only an EOS) can
be specified by using the reserved keyword NULL.
- EXAMPLE: select ("foo|bar", NULL : output)
results in both "foo" and "bar" being emitted.
- Note: NULL has not been extensively tested in all operators or
under all conditions. The most common use of NULL is as a
parameter to the Join operator, to indicate that the Cartesian
product is desired.
- Several non-relational operators automatically join their input
with their output - this is called a dependent join.
The Wrapper and Xquery examples demonstrate this
feature.
- Tuples communicated between two operators are not guaranteed to
arrive in the order sent.
- Data that is communicated to more than one consumer is deep copied
automatically by the system, to avoid having the processing of one
flow corrupt the status of another.
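
As a rough illustration of the literal-relation, EOS, and NULL conventions
listed above, the following Python sketch models a stream as an iterator of
tuples terminated by an EOS marker. It is not Theseus code. The names Row,
EOS, stream_literal, select, and emitted are hypothetical, the select
condition is modeled as a Python callable rather than a condition string,
and None stands in for the NULL keyword. The sketch also delivers tuples in
order, which Theseus itself does not guarantee.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List, Optional, Tuple, Union


class EndOfStream:
    """Hypothetical stand-in for the end-of-stream (EOS) token."""


EOS = EndOfStream()


@dataclass(frozen=True)
class Row:
    """One tuple, carrying a reference to its relation's attribute list."""
    attributes: Tuple[str, ...]
    values: Tuple[str, ...]


# Literal relations always have a single attribute named "value".
LITERAL_SCHEMA = ("value",)

Item = Union[Row, EndOfStream]


def stream_literal(text: Optional[str]) -> Iterator[Item]:
    """Stream a literal such as "foo|bar"; None models the NULL literal."""
    if text is not None:
        for value in text.split("|"):
            yield Row(LITERAL_SCHEMA, (value,))
    # Every stream ends with EOS; a NULL stream consists of EOS alone.
    yield EOS


def select(source: Iterator[Item],
           condition: Optional[Callable[[Row], bool]]) -> Iterator[Item]:
    """Forward rows that satisfy the condition; None (NULL) keeps every row."""
    for item in source:
        if isinstance(item, EndOfStream):
            yield EOS  # propagate end-of-stream to the consumer
            return
        if condition is None or condition(item):
            yield item


def emitted(stream: Iterator[Item]) -> List[str]:
    """Collect the first value of each row, dropping the EOS marker."""
    return [item.values[0] for item in stream if isinstance(item, Row)]


# Mirrors: select ("foo|bar", "value != 'foo'" : output)
filtered = emitted(select(stream_literal("foo|bar"),
                          lambda row: row.values[0] != "foo"))

# Mirrors: select ("foo|bar", NULL : output)
unfiltered = emitted(select(stream_literal("foo|bar"), None))
```

Here filtered holds only "bar" and unfiltered holds both "foo" and "bar",
matching the two EXAMPLE entries above.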