Publications

A general approach to real-time workflow monitoring

Abstract

Scientific workflow systems support different workflow representations, operational modes and configurations. However, independent of the system used, end users need to track the status of their workflows in real time, be notified of execution anomalies and failures automatically, perform troubleshooting, and automate the analysis of the workflow to help categorize and qualify the results. In this paper, we describe how the Stampede monitoring infrastructure, which was previously integrated in the Pegasus Workflow Management System, was employed in Triana to add generic real-time monitoring and troubleshooting capabilities across both systems. Stampede is an infrastructure that attempts to address interoperable monitoring needs by providing a three-layer model: a common data model to describe workflow and job executions; high-performance tools to load workflow logs conforming to the data model …

Date
November 10, 2012
Authors
Karan Vahi, Ian Harvey, Taghrid Samak, Daniel Gunter, Kieran Evans, Dave Rogers, Ian Taylor, Monte Goode, Fabio Silva, Eddie Al-Shakarchi, Gaurang Mehta, Andrew Jones, Ewa Deelman
Conference
2012 SC Companion: High Performance Computing, Networking Storage and Analysis
Pages
108-118
Publisher
IEEE