The core of the WINGS architecture is a workflow generation and validation algorithm that propagates component and dataset constraints throughout the structure of the workflow. This algorithm and its underlying reasoning are central to our research. Here is how it works.
From an underspecified user request, WINGS generates workflow candidates that are consistent with the request and completes their specification so they can be submitted for execution. In the process, WINGS validates each workflow candidate and eliminates those that contain inconsistent constraints for their datasets and components. The algorithm elaborates workflow candidates in three major phases:
Phase 1. Component selection: Starting from the end data products, the workflow is traversed backwards, and for each component the algorithm finds additional constraints on its input arguments given the constraints on its outputs. If an abstract component is encountered in the workflow, the component catalog is asked for the set of specialized components that match the current workflow constraints. A new workflow candidate is created for each specialized component, and the backward sweep proceeds. Some parameter values may be set during this phase as more constraints on arguments are introduced. At the end of this phase, the constraints have been propagated to the workflow inputs of each workflow candidate.
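As a rough illustration of the backward sweep, the following Python sketch treats constraints as simple property/value dictionaries and the workflow as a linear chain of abstract steps. The `Component` class, the `CATALOG` contents, and the `preserves` field are hypothetical names introduced for this example only; they are not the actual WINGS representations or catalog interface.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Constraints = Dict[str, str]


@dataclass
class Component:
    """Hypothetical specialized component (illustration only)."""
    name: str
    out: Constraints                 # properties guaranteed on the output
    inp: Constraints                 # properties required of the input
    preserves: Tuple[str, ...] = ()  # properties passed through unchanged


# Hypothetical component catalog: abstract component -> specializations.
CATALOG: Dict[str, List[Component]] = {
    "Sample": [
        Component("RandomSample", out={}, inp={}, preserves=("values",)),
        Component("BinnedSample", out={"values": "continuous"},
                  inp={"values": "continuous"}),
    ],
    "Model": [
        Component("ID3", out={"model": "tree"}, inp={"values": "discrete"}),
        Component("LinReg", out={"model": "linear"},
                  inp={"values": "continuous"}),
    ],
}


def merge(a: Constraints, b: Constraints) -> Optional[Constraints]:
    """Combine two constraint sets; return None if they conflict."""
    merged = dict(a)
    for key, value in b.items():
        if key in merged and merged[key] != value:
            return None
        merged[key] = value
    return merged


def backward_sweep(steps: List[str], goal: Constraints):
    """Traverse the chain from the end product back to the inputs.

    Each candidate is (chosen components, constraints on the data that
    feeds the earliest component chosen so far).
    """
    candidates = [([], goal)]
    for abstract in reversed(steps):
        next_candidates = []
        for chosen, required in candidates:
            for comp in CATALOG[abstract]:
                # Reject specializations whose output conflicts with
                # the constraints coming from downstream.
                if merge(required, comp.out) is None:
                    continue
                # Propagate: the component's own input requirements plus
                # any downstream constraints it passes through unchanged.
                carried = {k: v for k, v in required.items()
                           if k in comp.preserves}
                inputs = merge(comp.inp, carried)
                if inputs is None:
                    continue
                next_candidates.append(([comp.name] + chosen, inputs))
        candidates = next_candidates
    return candidates


print(backward_sweep(["Sample", "Model"], {"model": "tree"}))
```

In this toy catalog, asking for a tree model eliminates the specializations whose constraints conflict, and the constraint on `values` ends up attached to the workflow input, which is the information the dataset selection phase needs.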
Phase 2. Dataset selection: Given the constraints on inputs from the prior phase, the data catalog is queried to find available datasets that match those constraints. There can be more than one combination of datasets for a given set of inputs. For each combination of datasets, a new workflow candidate is created by binding the input data variables to matching data objects. Candidates with no matching data sources are rejected as invalid. The data catalog is then queried for additional metadata properties of each dataset in a workflow candidate, which are used in the next phase.
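The dataset binding described above can be sketched as follows. This is a minimal illustration that assumes an in-memory dictionary stands in for the data catalog; `DATA_CATALOG`, the dataset names, and the metadata properties are all made up for the example.

```python
from itertools import product
from typing import Dict, List, Tuple

# Hypothetical data catalog: dataset name -> metadata properties.
DATA_CATALOG: Dict[str, dict] = {
    "weather-2005": {"domain": "weather", "values": "discrete", "size": 1200},
    "weather-2006": {"domain": "weather", "values": "discrete", "size": 90},
    "stocks-raw": {"domain": "finance", "values": "continuous", "size": 5000},
}


def matching_datasets(constraints: dict) -> List[str]:
    """Return the datasets whose metadata satisfies every constraint."""
    return [
        name
        for name, meta in DATA_CATALOG.items()
        if all(meta.get(key) == value for key, value in constraints.items())
    ]


def bind_datasets(input_constraints: Dict[str, dict]) -> List[Tuple[dict, dict]]:
    """Create one candidate per combination of matching datasets.

    Returns (bindings, metadata) pairs. If any input variable has no
    matching dataset, the product is empty and no candidate survives.
    """
    variables = list(input_constraints)
    options = [matching_datasets(input_constraints[v]) for v in variables]
    candidates = []
    for combo in product(*options):
        bindings = dict(zip(variables, combo))
        # Fetch the remaining metadata for use in the forward sweep.
        metadata = {var: DATA_CATALOG[ds] for var, ds in bindings.items()}
        candidates.append((bindings, metadata))
    return candidates


for bindings, _ in bind_datasets({"train": {"values": "discrete"},
                                  "test": {"domain": "weather"}}):
    print(bindings)
```

Using a Cartesian product makes the rejection rule fall out naturally: an input variable with an empty list of matches yields no combinations at all.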
Phase 3. Parameter selection: Given the metadata properties of the input datasets from Phase 2 and the additional constraints propagated in Phase 1, the workflow is traversed forward, and for each step the component catalog is queried for additional constraints on the step's outputs given the constraints on its inputs. This phase results in workflows that contain metadata properties for all intermediate and final workflow data products. For each step, the component catalog is also queried for parameter values that are appropriate given the constraints on the input and output datasets. For each choice of parameter settings, a new workflow candidate is created. Candidates are rejected if no parameter settings are possible, or if a component's constraints are incompatible with the constraints on its input data.
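A forward sweep of this kind might look like the sketch below. It assumes a linear workflow and models each component's catalog entry as three hypothetical callables (`accepts`, `parameter_choices`, `output_metadata`); the example steps, thresholds, and parameter values are invented for illustration and are not taken from WINGS.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Step:
    """Hypothetical catalog entry for one workflow step (illustration only)."""
    name: str
    accepts: Callable[[dict], bool]                 # input constraint check
    parameter_choices: Callable[[dict], list]       # valid parameter values
    output_metadata: Callable[[dict, object], dict]  # metadata of the output


STEPS = [
    Step(
        "RandomSample",
        accepts=lambda meta: True,
        parameter_choices=lambda meta: [0.25, 0.5],  # sampling fraction
        output_metadata=lambda meta, frac: {**meta,
                                            "size": int(meta["size"] * frac)},
    ),
    Step(
        "ID3",
        accepts=lambda meta: meta.get("values") == "discrete",
        # Assumed rule: no valid depth setting for fewer than 500 rows.
        parameter_choices=lambda meta: [3] if meta["size"] >= 500 else [],
        output_metadata=lambda meta, depth: {**meta, "model": "tree"},
    ),
]


def forward_sweep(steps: List[Step], input_metadata: dict) -> List[Tuple[dict, dict]]:
    """Propagate metadata forward and branch on each parameter choice.

    Each candidate is (parameter settings so far, metadata of the most
    recent data product).
    """
    candidates = [({}, input_metadata)]
    for step in steps:
        next_candidates = []
        for params, meta in candidates:
            if not step.accepts(meta):
                continue  # component constraints conflict with its input
            # An empty list of choices means the candidate is rejected.
            for value in step.parameter_choices(meta):
                next_candidates.append(
                    ({**params, step.name: value},
                     step.output_metadata(meta, value))
                )
        candidates = next_candidates
    return candidates


print(forward_sweep(STEPS, {"size": 1200, "values": "discrete"}))
```

Here the candidate that samples too few rows is dropped because the second step offers it no parameter setting, while the surviving candidate carries metadata for its final data product.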
Only valid and completely specified workflows are generated with this algorithm. The user can select a subset of them, which WINGS converts to a format appropriate for the execution engine of choice.
Further details are given in our papers, and a sample of the internal representations is available in the posting of our entry to the First Provenance Challenge.