Publications

Increased Fault-Tolerance and Real-Time Performance Resiliency for Stream Processing Workloads through Redundancy

Abstract

Data analytics and telemetry have become paramount to monitoring and maintaining quality of service, in addition to their role in business analytics. Stream processing, a model in which a network of operators receives and processes continuously arriving discrete elements, is well-suited to these needs. Current and previous studies and frameworks have focused on continuity of operations and aggregate performance metrics. However, real-time performance and tail latency are also important. Timing errors caused by either performance faults or failed communication also affect real-time performance more drastically than they affect aggregate metrics. In this paper, we introduce redundancy in the stream data to improve real-time performance and resiliency to timing errors caused by either performance faults or failed communication. We also address limitations in previous solutions using a fine-grained acknowledgment tracking scheme …

Date: July 8, 2019
Authors: Geoffrey Phi Tran, John Paul Walters, Stephen Crago
Conference: 2019 IEEE International Conference on Services Computing (SCC)
Pages: 51-55
Publisher: IEEE