| QUESTION | ANSWER |
|
The manual shows an example of operators like SELECT that take strings as inputs. Can I use streams instead of strings? |
Yes; actually, every operator input is a stream. As the note at
the front of the manual says, string literals are converted to
streams of 1 tuple and 1 attribute (name of attribute is "dummy",
but this name often does not matter).
Thus, it is possible to have an input file consisting of:
RELATION data: val char 10 20
RELATION criteria: name_does_not_matter char val < 15
and a plan:
PLAN p1
{
INPUT: stream data, stream criteria
OUTPUT: stream output
BODY
{
select(data, criteria : output)
}
}
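To make the effect of the plan above concrete, here is a minimal Java sketch of what SELECT is expected to do with these two streams. It is not Theseus code: the class `SelectSketch` and its `select` method are hypothetical stand-ins, and the sketch assumes that the values of "val" are compared numerically.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.IntPredicate;

public class SelectSketch {
    // Hypothetical stand-in for SELECT: keep only the values that satisfy the criteria.
    public static List<Integer> select(List<Integer> data, IntPredicate criteria) {
        List<Integer> output = new ArrayList<>();
        for (int val : data) {
            if (criteria.test(val)) {
                output.add(val);
            }
        }
        return output;
    }

    public static void main(String[] args) {
        // The "val" column of relation "data" from the input file.
        List<Integer> data = Arrays.asList(10, 20);
        // The single tuple of relation "criteria", written as a Java predicate.
        IntPredicate criteria = val -> val < 15;
        System.out.println(select(data, criteria)); // prints [10]
    }
}
```

Under these assumptions only the tuple with val 10 reaches the output stream, because 20 fails the test val < 15.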
|
|
How complex can SELECT criteria be? |
Complex enough to handle most Boolean expressions
that you would like to write. For example, you could change the
criteria relation in the input file above to:
RELATION data: val char 10 20
RELATION criteria: name_does_not_matter char (val > 2 or (val > 10 and val < 15)) and val < 18 |
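As a worked check of the compound criteria above, the following Java sketch transcribes the expression and evaluates it for the two rows of "data". It assumes that "and" and "or" behave like Java's `&&` and `||` and that "val" is compared numerically; the class and method names are made up for illustration.

```java
public class CriteriaSketch {
    // Java transcription of: (val > 2 or (val > 10 and val < 15)) and val < 18
    public static boolean matches(int val) {
        return (val > 2 || (val > 10 && val < 15)) && val < 18;
    }

    public static void main(String[] args) {
        // Evaluate the criteria for each row of the "data" relation.
        for (int val : new int[] {10, 20}) {
            System.out.println(val + " -> " + matches(val));
        }
        // prints:
        // 10 -> true
        // 20 -> false
    }
}
```

With these assumptions, the row with val 10 is kept and the row with val 20 is rejected by the final val < 18 clause.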
|
How do we write an APPLY function? |
Suppose you wanted to write a single-row (apply) function
that increments values in specified relation columns by one.
For example, suppose you wanted to increment the "val" attribute
of the data in the relation shown in the FAQ answer above.
To do this, you would need to do the following:
|
|
Can I hardcode the SELECT criteria? If not, how do I construct the SELECT criteria from data in the input file?
|
You should NOT need to modify the input file hw4a.data. Instead, look through the operator manual for operators that let you take the data (such as bbox) and rewrite it in the form that SELECT expects for its criteria. Take a close look at the operator manual. |
|
What is the difference between APPLY and AGGREGATE? How do I write an AGGREGATE function? |
The apply operator is meant for single-row computations (i.e.,
it works on each tuple) while the aggregate operator is
meant for multi-row computations and works on a set of tuples
(typically, the entire relation). A typical single-row function
is "incr" (as described in an earlier FAQ entry) and a typical
multi-row function is something like "sum", which would add up
all of the numbers in a column.
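The difference in calling pattern can be sketched in plain Java. This is only an illustration of the idea, not how Theseus invokes functions: the generic `apply` and `aggregate` helpers below are hypothetical, and the column is modeled as a list of integers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public class ApplyVsAggregate {
    // Apply-style: the function is called once per tuple,
    // so the output has one row for every input row.
    public static <T, R> List<R> apply(List<T> rows, Function<T, R> f) {
        List<R> out = new ArrayList<>();
        for (T row : rows) {
            out.add(f.apply(row));
        }
        return out;
    }

    // Aggregate-style: the function is called once with all tuples
    // and yields a single value.
    public static <T, R> R aggregate(List<T> rows, Function<List<T>, R> f) {
        return f.apply(rows);
    }

    public static void main(String[] args) {
        List<Integer> val = Arrays.asList(10, 20);

        // An "incr"-like single-row function: each value is handled on its own.
        List<Integer> incremented = apply(val, v -> v + 1);
        System.out.println(incremented); // prints [11, 21]

        // A "sum"-like multi-row function: all values are handled together.
        Integer total = aggregate(val, col -> col.stream().mapToInt(Integer::intValue).sum());
        System.out.println(total); // prints 30
    }
}
```

The single-row call returns as many values as it was given, while the multi-row call collapses the whole column into one value.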
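So, how does one implement "sum"? It is very similar to the way that you implemented "incr" in the earlier example:
// Requires: import java.util.ArrayList;
// Multi-row function: adds up every value in the column it is given.
public static Integer sum(ArrayList<?> a1) {
    int sum = 0;
    if (a1 != null) {
        for (int i = 0; i < a1.size(); i++) {
            sum += Integer.parseInt(a1.get(i).toString());
        }
    }
    return Integer.valueOf(sum);
}
You could then call it from a plan such as: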
PLAN p3
{
INPUT: stream data
OUTPUT: stream output
BODY
{
aggregate(data, "sum(val)", "newrel", "the_sum" : tmp)
project(tmp, "the_sum" : output)
}
}
which would produce:
----------------------------------------------
RELATION: p3_0_output attrs: the_sum
----------------------------------------------
30
----------------------------------------------
|
|
How do we run plans? |
To keep things simple, do the following:
|
|
How do we express IF/THEN/ELSE in Theseus (or, how do we express a termination condition for a recursive plan)? |
Recursion in dataflow is the preferred method of looping
because it requires fewer operators and less synchronization.
To express IF/THEN/ELSE in your plans, you generally need to do the following:
|