Publications

 
[1] Paul Groth and Yolanda Gil. A scientific workflow construction command line. In International Conference on Intelligent User Interfaces 2009 (IUI2009), 2009.
[ bib ]
Workflows have emerged as a common tool for scientists to express their computational analyses. While there are a multitude of visual data flow editors for workflow construction, to date there are none that support the input of workflows using natural language. This work presents the design of a hybrid system that combines natural language input through a command line with a visual editor. The design of the system is scoped by an extensive analysis of a corpus of workflow descriptions.
[2] Paul Groth and Yolanda Gil. Scaffolding instructions to learn procedures from users. In AAAI 2009 Spring Symposium Agents that Learn from Human Teachers, 2009.
[ bib ]
Humans often teach procedures through tutorial instruction to other humans. For computers, learning from natural human instruction remains a challenge as it is plagued with incompleteness and ambiguity. Instructions are often given out of order and are not always consistent. Moreover, humans assume that the learner has a wealth of knowledge and skills, which computers do not always have. Our goal is to develop an electronic student that can be made increasingly capable through research to learn from human tutorial instruction. Initially, we will provide our student with human-understandable instruction that is extended with many scaffolding statements that supplement the limited initial capabilities of the student. Over time, improvements to the system are driven and quantified by the removal of scaffolding instructions that are not considered to be natural for users to provide to humans. This paper describes our initial design and implementation of this system, how it learns from scaffolded instruction in two different domains, and the initial relaxations of scaffolding that the system supports.
[3] Paul Groth. Exposing privacy obligation policies in social networking sites. In AAAI 2009 Spring Symposium on Social Semantic Web, 2009.
[ bib ]
Increasingly, web-based applications are created through the composition of multiple functional components provided by different institutions. These so-called "mash-ups" are an effective means to rapidly develop new applications. However, when these mash-ups are embedded within social networking sites that aggregate and expose personal data, such as Facebook, MySpace, or LinkedIn, serious privacy issues arise because personal data can be transmitted outside the application's hosting institution. In this paper, we describe an initial architecture and implementation to address these privacy concerns through the exposure of privacy obligation policies to the user using a workflow-based representation of mash-ups.
[4] Simon Miles, Paul Groth, and Michael Luck. Handling Mitigating Circumstances for Electronic Contracts. In Proceedings of the AISB 2008 Symposium on Behaviour Regulation in Multi-agent Systems, pages 37-42, Aberdeen, UK, April 2008. The Society for the Study of Artificial Intelligence and Simulation of Behaviour.
[ bib | .pdf ]
Electronic contracts are a means of representing agreed responsibilities and expected behaviour of autonomous agents acting on behalf of businesses. They can be used to regulate behaviour by providing negative consequences, penalties, where the responsibilities and expectations are not met, i.e. the contract is violated. However, long-term business relationships require some flexibility in the face of circumstances that do not conform to the assumptions of the contract, that is, mitigating circumstances. In this paper, we describe how contract parties can represent and enact policies on mitigating circumstances. As part of this, we require records of what has occurred within the system leading up to a violation: the provenance of the violation. We therefore bring together contract-based and provenance systems to solve the issue of mitigating circumstances.
[5] Paul T. Groth. A distributed algorithm for determining the provenance of data. In Proceedings of the fourth IEEE International Conference on e-Science (e-Science'08), 2008.
[ bib | .pdf ]
As computational techniques for tracking provenance have become more widely used, applications are beginning to produce large quantities of provenance information. Furthermore, many of these applications are composed from distributed components (e.g. scientific workflows) that may, for reasons of scalability, security or policy, need to store this information across multiple sites. In this paper, we describe an algorithm, D-PQuery, for determining the provenance of data from distributed sources of provenance information in a parallel fashion. To enable scientists to use D-PQuery on already existing Grid infrastructure, we present an implementation of the algorithm as a Condor DAGMan workflow that works across Kickstart records, which are produced in several production e-Science applications including the example application used in this paper, the astronomy application Montage. Initial performance benchmarks are also presented.
[6] Paul Groth and Luc Moreau. Recording process documentation for provenance. IEEE Transactions on Parallel and Distributed Systems, 2008.
[ bib | .pdf ]
Scientific and business communities are adopting large scale distributed systems as a means to solve a wide range of resource intensive tasks. These communities also have requirements in terms of provenance. We define the provenance of a result produced by a distributed system as the process that led to that result. This paper describes a protocol for recording documentation of a distributed system's execution. The distributed protocol guarantees that documentation with characteristics suitable for accurately determining the provenance of results is recorded. These characteristics are confirmed through a number of proofs based on an abstract state machine formalisation.
[7] Paul Groth, Steve Munroe, Simon Miles, and Luc Moreau. Applying the Provenance Data Model to a Bioinformatics Case. In Lucio Grandinetti, editor, HPC and Grids in Action. IOS Press, January 2008.
[ bib | .pdf ]
Scientists and, more generally, end users of computer systems need to be able to trust the data they use. Understanding the origin or provenance of data can provide this trust. Attempts have been made to develop systems for recording provenance; however, most are not generic and cannot be applied in a general manner across different systems and different technologies. Moreover, many existing systems confuse the concept of provenance with its representation. In this article, we discuss an open, technology-neutral model for provenance. The model can be applied across many different systems and makes the important distinction between provenance and the way it can be generated from a concrete representation of process. The model is described and applied to a grid-based example bioinformatics application.
[8] Paul Groth, Simon Miles, and Luc Moreau. A Model of Process Documentation to Determine Provenance in Mash-ups. Transactions on Internet Technology (TOIT), 9(1), 2008.
[ bib | .pdf ]
Through technologies such as RSS (Really Simple Syndication), Web Services, and AJAX (Asynchronous JavaScript And XML), the Internet has facilitated the emergence of applications that are composed from a variety of services and data sources. Through tools such as Yahoo Pipes, these "mash-ups" can be composed in a dynamic, just-in-time manner from components provided by multiple institutions (e.g. Google, Amazon, your neighbour). However, when using these applications, it is not apparent where data comes from or how it is processed. Thus, to inspire trust and confidence in mash-ups, it is critical to be able to analyse their processes after the fact. These trailing analyses, in particular the determination of the provenance of a result (i.e. the process that led to it), are enabled by process documentation, which is documentation of an application's past process created by the components of that application at execution time. In this paper, we define a generic conceptual data model that supports the autonomous creation of attributable, factual process documentation for dynamic multi-institutional applications. The data model is instantiated using two Internet formats, OWL and XML, and is evaluated with respect to questions about the provenance of results generated by a complex bioinformatics mash-up.
[9] Simon Miles, Paul Groth, Ewa Deelman, Karan Vahi, Gaurang Mehta, and Luc Moreau. Provenance: The bridge between experiments and data. Computing in Science and Engineering, 10(3):38-46, May/June 2008.
[ bib | .pdf ]
Current scientific applications are often structured as workflows and rely on workflow systems to compile abstract experiment designs into enactable workflows that utilise the best available resources. The automation of this step and of the workflow enactment hides the details of how results have been produced. Knowing how compilation and enactment occurred allows results to be reconnected with the experiment design. We investigate how provenance helps scientists to connect their results with the actual execution that took place, their original experiment and its inputs and parameters.
[10] Jie Xu, Paul Townend, Nik Looker, and Paul T. Groth. FT-Grid: a system for achieving fault tolerance in grids. Concurrency and Computation: Practice and Experience, 20(3):297-309, 2008.
[ bib | .pdf ]
The FT-Grid system introduces a fault-tolerance framework that allows faults occurring in service-oriented systems to be tolerated, thus increasing the dependability of such systems. This paper presents the design, development and evaluation of FT-Grid. We show empirical evidence of the dependability benefits offered by FT-Grid by performing an experimental dependability analysis using fault-injection testing performed with the WS-FIT tool. We then illustrate a potential problem with voting-based fault-tolerance schemes in the service-oriented paradigm, namely that individual channels within a fault-tolerant system, supposed to be independent of each other, may in fact invoke common services as part of their workflow, thus increasing the potential for common-mode failure of those channels. We propose a solution to this issue by using the technique of provenance to provide FT-Grid with topological awareness. We implement a large experimental system and, with the use of the Provenance Recording for Services system developed as part of the PASOA project at the University of Southampton, perform a large number of experiments that show that a topologically aware FT-Grid system serves as a much more dependable system than any other configuration tested, while imposing a negligible timing overhead.
[11] Luc Moreau, Paul Groth, Simon Miles, Javier Vazquez-Salceda, John Ibbotson, Sheng Jiang, Steve Munroe, Omer Rana, Andreas Schreiber, Victor Tan, and Laszlo Varga. The provenance of electronic data. Communications of the ACM, 51(4):52-58, 2008.
[ bib | http ]
[12] Simon Miles, Ewa Deelman, Paul Groth, Karan Vahi, Gaurang Mehta, and Luc Moreau. Connecting scientific data to scientific experiments with provenance. In Proceedings of the third IEEE International Conference on e-Science and Grid Computing (e-Science'07), Bangalore, India, December 2007.
[ bib | .pdf ]
As scientific workflows, and the data they operate on, grow in size and complexity, the task of defining how those workflows should execute (which resources they should use, where those resources should be in preparation for processing etc.) becomes proportionally more difficult. While 'workflow compilers', such as Pegasus, aid greatly in reducing this burden, a further problem arises: as specifying the details of execution is now automatic, a workflow's results are harder to interpret, as they are in part due to the specifics of execution. By automating the steps between the original experiment design and its results, we lose the connection between them, making results harder to interpret. To reconnect the scientific data with the original experiment, we argue that scientists should have access to the full provenance of their data, including not only parameters, input data and intermediary results, but also the abstract experiment, refined into a concrete execution by the 'workflow compiler'. In this paper, we describe our preliminary work on adapting Pegasus to capture the process of workflow refinement in the PASOA provenance system.
[13] Paul T. Groth. The Origin of Data: Enabling the Determination of Provenance in Multi-institutional Scientific Systems through the Documentation of Processes. PhD thesis, University of Southampton, September 2007.
[ bib | .pdf ]
The Oxford English Dictionary defines provenance as (i) the fact of coming from some particular source or quarter; origin, derivation. (ii) the history or pedigree of a work of art, manuscript, rare book, etc.; concr., a record of the ultimate derivation and passage of an item through its various owners. In art, knowing the provenance of an artwork lends weight and authority to it while providing a context for curators and the public to understand and appreciate the work's value. Without such a documented history, the work may be misunderstood, unappreciated, or undervalued. In computer systems, knowing the provenance of digital objects would provide them with greater weight, authority, and context just as it does for works of art. Specifically, if the provenance of digital objects could be determined, then users could understand how documents were produced, how simulation results were generated, and why decisions were made. Provenance is of particular importance in science, where experimental results are reused, reproduced, and verified. However, science is increasingly being done through large-scale collaborations that span multiple institutions, which makes the problem of determining the provenance of scientific results significantly harder. Current approaches to this problem are not designed specifically for multi-institutional scientific systems and their evolution towards greater dynamic and peer-to-peer topologies. Therefore, this thesis advocates a new approach, namely, that through the autonomous creation, scalable recording, and principled organisation of documentation of systems' processes, the determination of the provenance of results produced by complex multi-institutional scientific systems is enabled. The dissertation makes four contributions to the state of the art. First is the idea that provenance is a query performed over documentation of a system's past process. Thus, the problem is one of how to collect and collate documentation from multiple distributed sources and organise it in a manner that enables the provenance of a digital object to be determined. Second is an open, generic, shared, principled data model for documentation of processes, which enables its collation so that it provides high-quality evidence that a system's processes occurred. Once documentation has been created, it is recorded into specialised repositories called provenance stores using a formally specified protocol, which ensures documentation has high-quality characteristics. Furthermore, patterns and techniques are given to permit the distributed deployment of provenance stores. The protocol and patterns are the third contribution. The fourth contribution is a characterisation of the use of documentation of process to answer questions related to the provenance of digital objects and the impact recording has on application performance. Specifically, in the context of a bioinformatics case study, it is shown that six different provenance use cases are answered given a time overhead of 13%. Beyond the case study, the solution has been applied to other applications including fault tolerance in service-oriented systems, aerospace engineering, and organ transplant management.
[14] Luc Moreau, Bertram Ludäscher, Ilkay Altintas, Roger S. Barga, Shawn Bowers, Steven Callahan, George Chin Jr., Ben Clifford, Shirley Cohen, Sarah Cohen-Boulakia, Susan Davidson, Ewa Deelman, Luciano Digiampietri, Ian Foster, Juliana Freire, James Frew, Joe Futrelle, Tara Gibson, Yolanda Gil, Carole Goble, Jennifer Golbeck, Paul Groth, David A. Holland, Sheng Jiang, Jihie Kim, David Koop, Ales Krenek, Timothy McPhillips, Gaurang Mehta, Simon Miles, Dominic Metzger, Steve Munroe, Jim Myers, Beth Plale, Norbert Podhorszki, Varun Ratnakar, Emanuele Santos, Carlos Scheidegger, Karen Schuchardt, Margo Seltzer, Yogesh L. Simmhan, Claudio Silva, Peter Slaughter, Eric Stephan, Robert Stevens, Daniele Turi, Huy Vo, Mike Wilde, Jun Zhao, and Yong Zhao. The First Provenance Challenge. Concurrency and Computation: Practice and Experience, 20(5):409-418, April 2007.
[ bib | .pdf ]
The first Provenance Challenge was set up in order to provide a forum for the community to help understand the capabilities of different provenance systems and the expressiveness of their provenance representations. To this end, a Functional Magnetic Resonance Imaging workflow was defined, which participants had to either simulate or run in order to produce some provenance representation, from which a set of identified queries had to be implemented and executed. Sixteen teams responded to the challenge, and submitted their inputs. In this paper, we present the challenge workflow and queries, and summarise the participants' contributions.
[15] Paul Groth, Sheng Jiang, Simon Miles, Steve Munroe, Victor Tan, Sofia Tsasakou, and Luc Moreau. An Architecture for Provenance Systems. Technical report, University of Southampton, February 2007.
[ bib | http ]
This document covers the logical and process architectures of provenance systems. The logical architecture identifies key roles and their interactions, whereas the process architecture discusses distribution and security. A fundamental aspect of our presentation is its technology-independent nature, which makes it reusable: the principles that are exposed in this document may be applied to different technologies.
[16] David W. Eccles and Paul T. Groth. Wolves, bees, and football: Enhancing coordination in sociotechnological problem solving systems through the study of human and animal groups. Computers in Human Behavior, 23(6):2778-2790, 2007.
[ bib | .pdf ]
This paper describes how sociotechnological systems comprising human and technological agents can be considered problem solving systems. Problem solving systems typically comprise many agents, each characterized by at least partial autonomy. A challenge for problem solving systems is to coordinate system agent operations during problem solving. This paper explores how competence models of human-human and animal-animal coordination might be used to inform the design of problem solving systems so that the potential for agent coordination is enhanced. System design principles are identified based on a review of competent coordination in human groups, such as work and sport teams, and animal groups, such as wolf packs and bee colonies. These principles are then discussed in relation to agent coordination in the domains of E-Science, future combat systems, and medicine, which typify real-world environments comprising problem solving systems.
[17] Simon Miles, Paul Groth, Miguel Branco, and Luc Moreau. The requirements of using provenance in e-science experiments. Journal of Grid Computing, 5(1):1-25, 2007.
[ bib | .pdf ]
In e-Science experiments, it is vital to record the experimental process for later use such as in interpreting results, verifying that the correct process took place or tracing where data came from. The process that led to some data is called the provenance of that data, and a provenance architecture is the software architecture for a system that will provide the necessary functionality to record, store and use process documentation to determine the provenance of data items. However, there has been little principled analysis of what is actually required of a provenance architecture, so it is impossible to determine the functionality they would ideally support. In this paper, we present use cases for a provenance architecture from current experiments in biology, chemistry, physics and computer science, and analyse the use cases to determine the technical requirements of a generic, technology- and application-independent architecture. We propose an architecture that meets these requirements, analyse its features compared with other approaches and evaluate a preliminary implementation by attempting to realise two of the use cases.
[18] Simon Miles, Paul Groth, Steve Munroe, Sheng Jiang, Thibaut Assandri, and Luc Moreau. Extracting causal graphs from an open provenance data model. Concurrency and Computation: Practice and Experience, 2007. To appear.
[ bib | .pdf ]
The open provenance architecture (OPA) approach to the challenge was distinct in several regards. In particular, it is based on an open, well-defined data model and architecture, allowing different components of the challenge workflow to independently record documentation, and for the workflow to be executed in any environment. Another noticeable feature is that we distinguish between the data recorded about what has occurred, process documentation, and the provenance of a data item, which is all that caused the data item to be as it is and is obtained as the result of a query over process documentation. This distinction allows us to tailor the system to separately best address the requirements of recording and querying documentation. Other notable features include the explicit recording of causal relationships between both events and data items, an interaction-based world model, intensional definition of data items in queries rather than relying on explicit naming mechanisms, and styling of documentation to support non-functional application requirements such as reducing storage costs or ensuring privacy of data. In this paper we describe how each of these features aids us in answering the challenge provenance queries.
[19] Simon Miles, Paul Groth, Steve Munroe, Michael Luck, and Luc Moreau. AgentPrIMe: Adapting MAS Designs to Build Confidence. In Proceedings of the 8th International Workshop on Agent Oriented Software Engineering, 2007.
[ bib | .pdf ]
[20] Simon Miles, Sylvia C. Wong, Weijian Fang, Paul Groth, Klaus-Peter Zauner, and Luc Moreau. Provenance-based validation of e-science experiments. Journal of Web Semantics: Science, Services and Agents on the World Wide Web, 5:28-38, 2007.
[ bib | .pdf ]
[21] David W. Eccles and Paul T. Groth. Agent coordination and communication in sociotechnological systems: Design and measurement issues. Interacting with Computers, 18(6):1170-1185, 2006.
[ bib | .pdf ]
This article is concerned with enhancing agent coordination in modern sociotechnological systems. To this end, sociotechnological systems are conceptualized as problem solving systems that comprise human and technological agents engaged in dynamic collaboration. Following this, there is a discussion of the challenge of achieving agent coordination in problem solving systems as technological agents become increasingly autonomous. A key assertion is that agent coordination in problem solving systems might be enhanced through the study of competent coordination in living systems such as human and animal groups. Based on a review of research on competent coordination in human and animal groups, design principles for problem solving systems are then presented. Finally, methods are proposed for measuring the extent to which a given agent operates in accordance with these principles.
[22] David W. Eccles and Paul T. Groth. Problem solving systems theory: Implications for the design of socio-technological systems. Technology, Instruction, Cognition and Learning (TICL), 3(3-4):323-343, 2006.
[ bib | .pdf ]
[23] Paul Groth, Simon Miles, and Steven Munroe. Principles of high quality documentation for provenance: A philosophical discussion. In Luc Moreau and Ian Foster, editors, Proceedings of Third International Provenance and Annotation Workshop (IPAW'06), volume 4145 of Lecture Notes in Computer Science, Chicago, IL, 2006. Springer.
[ bib | .pdf ]
Computer technology enables the creation of detailed documentation about the processes that create or affect entities (data, objects, etc.). Such documentation of the past can be used to answer various kinds of questions regarding the processes that led to the creation or modification of a particular entity. The answers to such questions are known as an entity's provenance. In this paper, we derive a number of principles for documenting the past, grounded in work from philosophy and history, which allow for provenance questions to be answered within a computational context. These principles lead us to argue that an interaction-based model is particularly suited for representing high quality documentation of the past.
[24] Victor Tan, Paul Groth, Simon Miles, Sheng Jiang, Steve Munroe, Sofia Tsasakou, and Luc Moreau. Security issues in a SOA-based provenance system. In Luc Moreau and Ian Foster, editors, Proceedings of Third International Provenance and Annotation Workshop (IPAW'06), volume 4145 of Lecture Notes in Computer Science, Chicago, IL, 2006. Springer.
[ bib | .pdf ]
Recent work has begun exploring the characterization and utilization of provenance in systems based on the Service Oriented Architecture (such as Web Services and Grid based environments). One of the salient issues related to provenance use within any given system is its security. Provenance presents some unique security requirements of its own, which are additionally dependent on the architectural and environmental context that a provenance system operates in. We discuss the security considerations pertaining to a Service Oriented Architecture based provenance system. Concurrently, we outline possible approaches to address them.
[25] Liming Chen, Victor Tan, Fenglian Xu, Alexis Biller, Paul Groth, Simon Miles, John Ibbotson, Michael Luck, and Luc Moreau. A proof of concept: Provenance in a service oriented architecture. In Proceedings of the Fourth All Hands Meeting (AHM'05), September 2005.
[ bib | .pdf ]
[26] Paul Groth, Simon Miles, and Luc Moreau. PReServ: Provenance recording for services. In Proceedings of the UK OST e-Science Fourth All Hands Meeting (AHM05), September 2005.
[ bib | .pdf ]
The importance of understanding the process by which a result was generated in an experiment is fundamental to science. Without such information, other scientists cannot replicate, validate, or duplicate an experiment. We define provenance as the process that led to a result. With large scale in-silico experiments, it becomes increasingly difficult for scientists to record process documentation that can be used to retrieve the provenance of a result. Provenance Recording for Services (PReServ) is a software package that allows developers to integrate process documentation recording into their applications. PReServ has been used by several applications and its performance has been benchmarked.
[27] Paul Townend, Paul Groth, Nik Looker, and Jie Xu. FT-Grid: A fault-tolerance system for e-science. In Proceedings of the UK OST e-Science Fourth All Hands Meeting (AHM05), September 2005.
[ bib | .pdf ]
The size and complexity of many e-Science applications suggests that they may be very prone to errors and failures; the cost of recovering from failures may also be high. The FT-Grid system, developed as part of the e-Demand project at the University of Leeds, introduces a replication-based fault tolerance scheme that allows faults occurring in service-based systems to be tolerated, thus increasing the dependability of such systems. This paper details the progress that has been made in the development of FT-Grid, including both a GUI client and also an FT-Grid web service interface. We show empirical evidence of the dependability benefits offered by FT-Grid, by performing a dependability analysis on the results of fault injection testing performed with the WS-FIT tool at the University of Durham. We then illustrate a potential problem with voting-based fault tolerance approaches in the service-oriented paradigm, namely that individual channels within fault-tolerant systems may invoke common services as part of their workflow, thus increasing the potential for common-mode failure. We propose a solution to this issue by using the technique of provenance to provide FT-Grid with topological awareness. We implement a large test system and, with the use of the PReServ provenance system developed as part of the PASOA e-Science project at the University of Southampton, perform a large number of experiments which show that a provenance-aware FT-Grid results in a much more dependable system than any of the other configurations tested, whilst imposing a negligible timing overhead.
[28] Sylvia C. Wong, Simon Miles, Weijian Fang, Paul Groth, and Luc Moreau. Validation of e-science experiments using a provenance-based approach. In Proceedings of Fourth All Hands Meeting (AHM'05), Nottingham, September 2005.
[ bib | http ]
E-science experiments typically involve many distributed services maintained by different organisations. As part of the scientific process, it is important for scientists to be able to verify the correctness of their own experiments, or to review the correctness of their peers' work. There is no existing framework for validating such experiments. Users therefore have to rely on error checking performed by the services, or adopt other ad hoc methods. This paper introduces a platform-independent framework for validating workflow executions. The validation relies on reasoning over the documented provenance of experiment results and semantic descriptions of services advertised in a registry. This validation process ensures experiments are performed correctly, and thus results generated are meaningful. The framework is tested in a bioinformatics application that performs protein compressibility analysis.
[29] Paul Groth, Simon Miles, Weijian Fang, Sylvia C. Wong, Klaus-Peter Zauner, and Luc Moreau. Recording and using provenance in a protein compressibility experiment. In Proceedings of the 14th IEEE International Symposium on High Performance Distributed Computing (HPDC'05), July 2005.
[ bib | .pdf ]
Very large scale computations are now becoming routinely used as a methodology to undertake scientific research. In this context, "provenance systems" are regarded as the equivalent of the scientist's logbook for in silico experimentation: provenance captures the documentation of the process that led to some result. Using a protein compressibility analysis application, we derive a set of generic use cases for a provenance system. In order to support these, we address the following fundamental questions: what is provenance? how to record it? what is the performance impact for grid execution? what is the performance of reasoning? In doing so, we define a technology-independent notion of provenance that captures interactions between components, internal component information and grouping of interactions, so as to allow us to analyse and reason about the execution of scientific processes. In order to support persistent provenance in heterogeneous applications, we introduce a separate provenance store, in which provenance documentation can be stored, archived and queried independently of the technology used to run the application. Through a series of practical tests, we evaluate the performance impact of such a provenance system. In summary, we demonstrate that the provenance recording overhead of our prototype system remains under 10% and that the recorded information successfully supports our use cases in a performant manner.
[30] Paul T. Groth. On the record: Provenance in large scale, open, distributed systems. Minithesis, University of Southampton; Faculty of Engineering, Science and Mathematics; School of Electronics and Computer Science, July 2005.
[ bib | .pdf ]
Scientists increasingly rely on large scale, open distributed systems such as Grids in order to investigate a wide variety of research questions. In such systems, it is difficult to know exactly how a result is generated; however, such information is necessary for the scientific process. Therefore, it is vital that these systems have an automated mechanism for documenting process from which a result's provenance can be retrieved. The provenance of a result is the process that led to that result. This thesis defines what provenance is for distributed systems based on the Service Oriented Architecture model. It presents a structure for the documentation of process from which the provenance of a result can be retrieved. Based on this structure, a set of patterns and a protocol are presented for recording assertions about processes in Service Oriented Architecture-based systems. An implementation of these specifications is then detailed, followed by an evaluation of that implementation. Finally, a direction for future work is outlined. Esse sequitur operari: being follows functioning.
[31] Paul Townend, Paul Groth, and Jie Xu. A provenance-aware weighted fault tolerance scheme for service-based applications. In Proc. of the 8th IEEE International Symposium on Object-Oriented Real-Time Distributed Computing (ISORC 2005), May 2005.
[ bib | .pdf ]
Service-orientation has been proposed as a way of facilitating the development and integration of increasingly complex and heterogeneous system components. However, there are many new challenges to the dependability community in this new paradigm, such as how individual channels within fault-tolerant systems may invoke common services as part of their workflow, thus increasing the potential for common-mode failure. We propose a scheme that, for the first time, links the technique of provenance with that of multi-version fault tolerance. We implement a large test system and perform experiments with a single-version system, a traditional MVD system, and a provenance-aware MVD system, and compare their results. We show that for this experiment, our provenance-aware scheme results in a much more dependable system than either of the other systems tested, whilst imposing a negligible timing overhead.
[32] David W. Eccles and Paul T. Groth. Creating expert problem solving systems. In Proceedings of the 38th Annual Hawaii International Conference on System Sciences (HICSS'05), Jan 2005.
[ bib | .pdf ]
This paper takes the content of our wolves, football... paper and addresses it to the System Science community.
[33] Sylvia C. Wong, Simon Miles, Weijian Fang, Paul Groth, and Luc Moreau. Provenance-based validation of e-science experiments. In Proceedings of 4th International Semantic Web Conference (ISWC'05), volume 3729 of Lecture Notes in Computer Science, pages 801-815, Galway, Ireland, November 2005. Springer-Verlag.
[ bib | .pdf ]
[34] Paul Groth, Michael Luck, and Luc Moreau. A protocol for recording provenance in service-oriented grids. In Proceedings of the 8th International Conference on Principles of Distributed Systems (OPODIS'04), Grenoble, France, December 2004.
[ bib | .pdf ]
Both the scientific and business communities, which are beginning to rely on Grids as problem-solving mechanisms, have requirements in terms of provenance. The provenance of some data is the documentation of process that led to the data; its necessity is apparent in fields ranging from medicine to aerospace. To support provenance capture in Grids, we have developed an implementation-independent protocol for the recording of provenance. We describe the protocol in the context of a service-oriented architecture and formalise the entities involved using an abstract state machine or a three-dimensional state transition diagram. Using these techniques we sketch a liveness property for the system.
[35] David W. Eccles and Paul T. Groth. Wolves, football, and ambient computing: facilitating collaboration in problem solving systems through the study of human and animal groups. In Proceedings of the Third Nordic Conference on Human-Computer Interaction, pages 269-275. ACM Press, October 2004.
[ bib | .pdf ]
This paper describes how computer-human interaction in ambient computing environments can be best informed by conceptualizing such environments as problem solving systems. Typically, such systems comprise multiple human and technological agents that meet the demands imposed by problem constraints through dynamic collaboration. A key assertion is that the design of ambient computing environments towards efficacious human-machine collaboration can benefit from an understanding of competence models of human-human and animal-animal collaboration. Consequently, design principles for such environments are derived from a review of competent collaboration in human groups, such as sport teams, and animal groups, such as wolf packs.
[36] Paul Groth, Michael Luck, and Luc Moreau. Formalising a protocol for recording provenance in grids. In Proceedings of the UK OST e-Science Second All Hands Meeting 2004 (AHM'04), Nottingham, UK, September 2004.
[ bib | .pdf ]
Both the scientific and business communities are beginning to rely on Grids as problem-solving mechanisms. These communities also have requirements in terms of provenance. Provenance is the documentation of process and the necessity for it is apparent in fields ranging from medicine to aerospace. To support provenance capture in Grids, we have developed an implementation-independent protocol for the recording of provenance. We describe the protocol in the context of a service-oriented architecture and formalise the entities involved using an abstract state machine or a three-dimensional state transition diagram. Using these techniques we sketch a liveness property for the system.
[37] Paul T. Groth. Recording provenance in service-oriented architectures. 9 month report, University of Southampton; Faculty of Engineering, Science and Mathematics; School of Electronics and Computer Science, 2004.
[ bib | .pdf ]
Provenance is the documentation of process for some result. This report addresses provenance recording in Service-Oriented Architectures, specifically for Grids and Web Services. The document begins by motivating the need for provenance recording. It then presents background information for Service-Oriented Architectures, Grids, Web Services and provenance software. Given this background, an architecture and protocol for recording provenance are presented along with an implementation of a provenance service. Finally, a direction for future work is outlined.
[38] Niranjan Suri, Jeffrey Bradshaw, Andrzej Uszok, Maggie Breedy, Marco Carvalho, Paul Groth, Renia Jeffers, Matt Johnson, Shri Kulkarni, James Lott, Mark Burstein, Brett Benyo, and David Diller. Towards DAML-based policy enforcement for semantic data transformation and filtering in multi-agent systems. In Proceedings of the Second International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS 2003), pages 1132-1133. ACM Press, 2003.
[ bib | .pdf ]
This paper describes an approach to provide runtime policy-based control over information exchange. Two different control mechanisms are discussed: semantic (content-based) filtering of messages as well as in-stream transformation of messages. Both of these control mechanisms are driven by policies at run-time. These mechanisms allow a far more fine-grained control over dynamic and autonomous agent interactions. With such an approach, we hope to increase the confidence with which system designers will adopt agent-based approaches to building dynamic, heterogeneous systems.
[39] Niranjan Suri, Jeffrey M. Bradshaw, Marco M. Carvalho, Thomas B. Cowin, Maggie R. Breedy, Paul T. Groth, and Raul Saavedra. Agile Computing: Bridging the Gap between Grid Computing and Ad-hoc Peer-to-Peer Resource Sharing. In Proceedings of the 3rd International Symposium on Cluster Computing and the Grid, page 618. IEEE Computer Society, 2003.
[ bib | .pdf ]
Agile computing may be defined as opportunistically (or on user demand) discovering and taking advantage of available resources in order to improve capability, performance, efficiency, fault tolerance, and survivability. The term agile is used to highlight both the need to quickly react to changes in the environment as well as the need to exploit transient resources only available for short periods of time. Agile computing builds on current research in grid computing, ad-hoc networking, and peer-to-peer resource sharing. This paper describes both the general notion of agile computing as well as one particular approach that exploits mobility of code, data, and computation. Some performance metrics are also suggested to measure the effectiveness of any approach to agile computing.
[40] Niranjan Suri, Marco Carvalho, Jeffrey M. Bradshaw, Maggie R. Breedy, Thomas B. Cowin, Paul T. Groth, Raul Saavedra, and Andrzej Uszok. Enforcement of communications policies in software agent systems through mobile code. In Proceedings of the 4th IEEE International Workshop on Policies for Distributed Systems and Networks, page 247. IEEE Computer Society, 2003.
[ bib | .pdf ]
This paper introduces the use of mobile agents as the mechanism for policy enforcement in multi-agent multidomain systems. The focus is on the effective application of communication policies in the setup and maintenance of spanning data streams that cross multiple hosts in different domains. We have designed and implemented a mobile agent based framework (FlexFeed) that works in concert with the KAoS framework for policy management.
[41] Niranjan Suri, Paul T. Groth, and Jeffrey M. Bradshaw. While You're Away: A System for Load-Balancing and Resource Sharing Based on Mobile Agents. In Proceedings of the 1st International Symposium on Cluster Computing and the Grid, page 470. IEEE Computer Society, 2001.
[ bib | .pdf ]
While You're Away (WYA) is a distributed system that aggregates the computational power of individual computer systems. WYA introduces the notion of Roaming Computations: Java-based programs that move around the network utilizing the resources of idle workstations. WYA provides architectural independence and addresses issues of convenience, security, and incentive for owners of workstations. WYA is based on the NOMADS mobile agent system, which uses the Aroma Virtual Machine (VM) to provide strong mobility, resource control, and resource accounting. WYA currently runs on Win32 and UNIX workstations but is being extended to work on other computational devices such as television set-top boxes, video game consoles, and Internet appliances.
[42] Paul T. Groth and Niranjan Suri. CPU Resource Control and Accounting in the NOMADS Mobile Agent System. In Proceedings of the ACM OOPSLA Workshop on Experiences with Autonomous Mobile Objects and Agent Based Systems, Minneapolis, USA, October 2000.
[ bib | .pdf ]
NOMADS is a mobile agent system for Java-based mobile agents. One of the key enhancements provided by NOMADS is the ability to monitor and control resources consumed by agents running within the NOMADS environment. This paper describes the CPU resource control mechanism, which complements the disk and network resource controls already available in the NOMADS environment.
[43] Niranjan Suri, Jeffrey Bradshaw, Maggie R. Breedy, Paul T. Groth, Gregory A. Hill, and Renia Jeffers. Strong Mobility and Fine-Grained Resource Control in NOMADS. In Friedemann Mattern and David Kotz, editors, Proceedings of the Second International Symposium on Agent Systems and Applications and Fourth International Symposium on Mobile Agents, ASA/MA 2000, Zurich, Switzerland, volume 1882 of Lecture Notes in Computer Science, pages 2-15. Springer-Verlag, 2000.
[ bib | .pdf ]
NOMADS is a Java-based agent system that supports strong mobility (i.e., the ability to capture and transfer the full execution state of migrating agents) and safe agent execution (i.e., the ability to control resources consumed by agents, facilitating guarantees of quality of service while protecting against denial of service attacks). The NOMADS environment is composed of two parts: an agent execution environment called Oasis and a new Java-compatible Virtual Machine (VM) called Aroma. The combination of Oasis and the Aroma VM provides key enhancements over today's Java agent environments.
[44] Niranjan Suri, Jeffrey M. Bradshaw, Maggie R. Breedy, Paul T. Groth, Gregory A. Hill, Renia Jeffers, and Timothy S. Mitrovich. An Overview of the NOMADS Mobile Agent System. In Proceedings of ECOOP'2000, Nice, France, 2000.
[ bib | .pdf ]
[45] Niranjan Suri, Jeffrey M. Bradshaw, Maggie R. Breedy, Paul T. Groth, Gregory A. Hill, Renia Jeffers, Timothy S. Mitrovich, Brian R. Pouliot, and David S. Smith. NOMADS: toward a strong and safe mobile agent system. In Proceedings of the Fourth International Conference on Autonomous Agents, pages 163-164. ACM Press, 2000.
[ bib | .pdf ]
