Publications
SWARM: Reimagining scientific workflow management systems in a distributed world
Abstract
Modern scientific workflows process massive amounts of data from diverse instruments and sensors, leveraging geographically distributed, heterogeneous compute and storage resources (from leadership-class systems to edge devices) connected by high-performance networks. The diversity of resources introduces challenges in harnessing their full potential, with resilience issues arising across applications, system software, networks, storage, and hardware. Today, workflow management systems (WMSs) coordinate the execution of computation and data management tasks across target resources. However, the centralized nature of WMSs makes them vulnerable to faults and scalability issues that may result in failures of entire computational campaigns. This paper introduces a novel agentic framework for workflow management, fully distributing and decentralizing the WMS functions and modeling them as swarm …
- Date: December 26, 2024
- Authors: Prasanna Balaprakash, Krishnan Raghavan, Franck Cappello, Ewa Deelman, Anirban Mandal, Hongwei Jin, Imtiaz Mahmud, Komal Thareja, Shixun Wu, Pawel Zuk, Mariam Kiran, Zizhong Chen, Sheng Di, Kesheng Wu
- Journal: The International Journal of High Performance Computing Applications
- Pages: 10943420251339317
- Publisher: SAGE Publications