Publications
Root cause analysis of data integrity errors in networked systems with incomplete information
Abstract
Interest in data integrity error detection and localization has been renewed recently, driven by exponentially growing data volumes in large-scale networked systems. Most existing root cause analysis (RCA) systems take an infrastructure operator's view and rely on dedicated, expensive monitoring capabilities to instrument and facilitate the analysis. Unfortunately, in our targeted wide area network environment, complete network information and monitoring capability are normally lacking. In this paper, we present an RCA system that leverages end-to-end flow monitoring information from the application layer, augmented by limited network information. We demonstrate that root cause localization with high accuracy can be obtained using multi-class classification models. We specifically study the impact of different realistic combinations of features based on the available yet incomplete information from …
- Date
- October 20, 2021
- Authors
- Yufeng Xin, Shih-Wen Fu, Anirban Mandal, Ilya Baldin, Ryan Tanaka, Mats Rynge, Karan Vahi, Ewa Deelman, Ishan Abhinit, Von Welch
- Conference
- 2021 International Conference on Information and Communication Technology Convergence (ICTC)
- Pages
- 735-740
- Publisher
- IEEE