Theseus FAQ

 

The manual shows an example of operators like SELECT that take strings as inputs. Can I use streams instead of strings?

Yes. In fact, every operator input is a stream. As the note at the front of the manual says, a string literal is converted to a stream with one tuple and one attribute (the attribute is named "dummy", but the name rarely matters).

Thus, it is possible to have an input file consisting of:

RELATION data: val char
10
20
RELATION criteria: name_does_not_matter char
val < 15
and a plan:
PLAN p1
{
  INPUT: stream data, stream criteria
  OUTPUT: stream output

  BODY
  {
    select(data, criteria : output)
  }
}
 

How complex can SELECT criteria be?

Complex enough to handle most boolean expressions that you are likely to write, including nested parentheses with "and" and "or". For example, you could change the input file above to:
RELATION data: val char
10
20
RELATION criteria: name_does_not_matter char
(val > 2 or (val > 10 and val < 15)) and val < 18
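
To check what this criteria string should keep, it can help to evaluate the same boolean expression by hand. The following is only a sketch in plain Java, not Theseus code: the class name SelectCriteriaSketch and the helper select are hypothetical, and the predicate is a hand translation of the criteria string (Theseus parses the string itself).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

public class SelectCriteriaSketch {
    // Hand-translated Java equivalent of the criteria string (an assumption,
    // not how Theseus evaluates it internally).
    static final IntPredicate CRITERIA =
        val -> (val > 2 || (val > 10 && val < 15)) && val < 18;

    // Keep only the values that satisfy the criteria, the way SELECT keeps tuples.
    static List<Integer> select(int[] vals) {
        List<Integer> kept = new ArrayList<>();
        for (int v : vals) {
            if (CRITERIA.test(v)) {
                kept.add(v);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // The two rows of the "data" relation from the input file above.
        System.out.println(select(new int[] {10, 20})); // prints [10]
    }
}
```

Under this reading, the row with val 10 passes (10 > 2 and 10 < 18) and the row with val 20 fails the final "val < 18" test.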
 

How do we write an APPLY function?

Suppose you want to write a single-row (apply) function that increments the values in a specified relation column by one. For example, suppose you want to increment the "val" attribute of the data relation shown in the FAQ answers above.

To do this, you would need to do the following:

  • Add a function called "incr" (you could choose whatever name) in the file Functions.java (which is in your theseus directory). The function would look like this:
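      // Single-row function: works on the attribute value of one tuple.
      public static ArrayList incr(Object s1) {
        ArrayList result = new ArrayList();
        // Parse the text form of the value; trim() guards against stray spaces.
        int cur = Integer.parseInt(s1.toString().trim());
        // Integer.valueOf replaces the deprecated new Integer(...) constructor.
        result.add(Integer.valueOf(cur + 1));
        return result;
      }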
    
  • Compile Functions.java successfully (Functions.class is produced).
  • Next, call that function from a plan. Call the function by its name and pass it a valid attribute of the incoming relation. For example, since the attribute "val" is in the input data from the FAQs above, we could write the plan this way:
    PLAN p2
    {
      INPUT: stream data, stream criteria
      OUTPUT: stream output
    
      BODY
      {
        select(data, criteria : selected_data)
        apply(selected_data, "incr(val)", "incremented" : output)
      }
    }
    
    which would produce:
    ----------------------------------------------
    RELATION: p2_0_output
       attrs: val, incremented
    ----------------------------------------------
    10, 11
    ----------------------------------------------
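
The per-row behavior of "incr" can be tried outside Theseus before wiring it into a plan. The sketch below is a standalone approximation: the class name IncrSketch and the helper applyIncr are hypothetical stand-ins for what APPLY does with each tuple, and the copy of incr here uses generics only so that it compiles without warnings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class IncrSketch {
    // Standalone copy of the FAQ's incr function.
    public static ArrayList<Object> incr(Object s1) {
        ArrayList<Object> result = new ArrayList<>();
        int cur = Integer.parseInt(s1.toString().trim());
        result.add(Integer.valueOf(cur + 1));
        return result;
    }

    // Hypothetical stand-in for APPLY: call incr once per input row and
    // append its result as a new column next to the original value.
    static List<String> applyIncr(List<String> vals) {
        List<String> rows = new ArrayList<>();
        for (String v : vals) {
            rows.add(v + ", " + incr(v).get(0));
        }
        return rows;
    }

    public static void main(String[] args) {
        // Only the row with val 10 survives the "val < 15" selection.
        applyIncr(Arrays.asList("10")).forEach(System.out::println); // prints 10, 11
    }
}
```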
    
 

Can I hardcode the SELECT criteria? If not, how do I construct the SELECT criteria from data in the input file?  

You should NOT need to modify the input file hw4a.data. Instead, look closely through the operator manual for operators that let you take the data (such as bbox) and rewrite it in the form that SELECT criteria require.
 

What is the difference between APPLY and AGGREGATE? How do I write an AGGREGATE function?

The apply operator is meant for single-row computations (i.e., it works on each tuple) while the aggregate operator is meant for multi-row computations and works on a set of tuples (typically, the entire relation). A typical single-row function is "incr" (as described in an earlier FAQ entry) and a typical multi-row function is something like "sum", which would add up all of the numbers in a column.

So, how does one implement "sum"? It's very similar to the way that you implemented "incr" in the earlier example:

  • Add a function called "sum" (you could choose whatever name) in the file Functions.java (which is in your theseus directory). The function would look like this:
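      // Multi-row function: receives the values of a column as a list.
      public static Integer sum(ArrayList a1) {
        int sum = 0;
        if (a1 != null) {
          for (int i = 0; i < a1.size(); i++) {
            sum += Integer.parseInt(a1.get(i).toString().trim());
          }
        }
        // Integer.valueOf replaces the deprecated new Integer(...) constructor.
        return Integer.valueOf(sum);
      }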
    
  • Compile Functions.java successfully (Functions.class is produced).
  • Call that function from a plan. For example, noting that the attribute "val" is in the input data from the above FAQs, we could write the plan this way:
    PLAN p3
    {
      INPUT: stream data
      OUTPUT: stream output
    
      BODY
      {
        aggregate(data, "sum(val)", "newrel", "the_sum" : tmp)
        project(tmp, "the_sum" : output)
      }
    }
    
    which would produce:
    ----------------------------------------------
    RELATION: p3_0_output
       attrs: the_sum
    ----------------------------------------------
    30
    ----------------------------------------------
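
As with "incr", the multi-row function can be exercised on its own first. This is a minimal sketch, assuming the aggregate operator hands the function every value of the named column in one list; the class name SumSketch is hypothetical, and the wildcard parameter type is used only to keep the standalone copy warning-free.

```java
import java.util.ArrayList;
import java.util.Arrays;

public class SumSketch {
    // Standalone copy of the FAQ's sum function.
    public static Integer sum(ArrayList<?> a1) {
        int sum = 0;
        if (a1 != null) {
            for (int i = 0; i < a1.size(); i++) {
                sum += Integer.parseInt(a1.get(i).toString().trim());
            }
        }
        return Integer.valueOf(sum);
    }

    public static void main(String[] args) {
        // The whole "val" column of the data relation: one call, one result.
        ArrayList<String> column = new ArrayList<>(Arrays.asList("10", "20"));
        System.out.println(sum(column)); // prints 30
    }
}
```

The contrast with "incr" is in the call pattern: the single-row function runs once per tuple, while this one runs once for the whole set of tuples and yields a single value.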
    

    How do we run plans?

    To keep things simple, do the following:
    1. Develop your plans and input files in the directory where you installed Theseus (e.g., c:\theseus).
    2. Start DOS. In Windows, you need to click on Start->Run and then type "command". Or you can click on the DOS icon (if you have one).
    3. cd c:\theseus (or wherever you installed it)
    4. trcli [basename of plan] [full name of datafile]
      EXAMPLE: trcli hw4a.sample hw4a.data
     

    How do we express IF/THEN/ELSE in Theseus (or, how do we express a termination condition for a recursive plan)?

    Recursion in dataflow is the preferred method of looping because it requires fewer operators and less synchronization.

    To express IF/THEN/ELSE in your plans, you generally need to do the following:

    • Apply some filtering condition to a stream (e.g., "city = Chicago")
    • Test if the stream is NULL or not. This allows you to route dataflow conditionally.
    To use this in your plans, consider:
    • Using the SELECT operator to filter on the condition you want to test (e.g., the average rating of the set of restaurants).
    • Using the NULL operator to test whether the stream is NULL or not. Make sure that you understand the semantics of the NULL operator. It has 3 inputs (stream to be tested, stream to route when true, stream to route when false) and 2 outputs (new name for the "true" stream, new name for the "false" stream).
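
    The routing idea can be modeled in plain Java to make the control flow concrete. This is only a sketch under stated assumptions, not the Theseus implementation: it assumes "true" means the tested stream is empty, that only the chosen output receives data, and that a stream can be modeled as a list. The names NullRoutingSketch, nullOp, and selectCity are hypothetical; check the operator manual for the actual NULL semantics.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class NullRoutingSketch {
    // Assumed model of NULL: if `tested` is empty, forward `whenTrue` on the
    // first output; otherwise forward `whenFalse` on the second output.
    static List<List<String>> nullOp(List<String> tested,
                                     List<String> whenTrue,
                                     List<String> whenFalse) {
        boolean isNull = tested == null || tested.isEmpty();
        List<String> none = Collections.emptyList();
        return Arrays.asList(isNull ? whenTrue : none, isNull ? none : whenFalse);
    }

    // Stand-in for SELECT with a criteria such as "city = Chicago".
    static List<String> selectCity(List<String> cities, String wanted) {
        List<String> hits = new ArrayList<>();
        for (String c : cities) {
            if (c.equals(wanted)) {
                hits.add(c);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> data = Arrays.asList("Chicago", "Boston");
        List<String> hits = selectCity(data, "Denver"); // no match, so empty
        // IF hits is empty THEN pass data along ELSE pass hits along.
        List<List<String>> routed = nullOp(hits, data, hits);
        System.out.println("true branch:  " + routed.get(0)); // [Chicago, Boston]
        System.out.println("false branch: " + routed.get(1)); // []
    }
}
```

    In a recursive plan, one branch would feed the recursive call and the other would feed the plan's output, which gives the termination condition.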