An Approach to Generalized Pattern Identification Based on Prototype Instances

 

A major question in instant data mining is the recognition of recurring patterns that are similar to a class of previously known cases. The problem has two parts: pattern representation and case-based pattern recognition. Although there has been considerable effort in both directions in the past, the design and implementation of a system that provides both aspects is essential. Applications in the context of time-series data mining include, but are not limited to, analysis of time series, monitoring and diagnosis of critical systems, classification and clustering of time series, unsupervised and supervised discovery of recurring cases, outlier detection, and case-based recognition. In this work, we propose a novel two-tier technique for instant data mining. In the first tier, we represent a class of time series in terms of a set of features and model the process behind the time series as a Hidden Markov Model (HMM). In the second tier, we couple the pattern representation module to a proven, efficient optimal pattern search for online and offline pattern recognition. Our experimental results are encouraging and show that this model can serve as a fast and acceptable case-based pattern recognizer.
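The abstract does not specify the features, the HMM topology, or the search procedure, so the following is only a minimal sketch of the two-tier idea under stated assumptions. Tier 1 reduces a series to hypothetical slope symbols (down, flat, up) and scores them against per-class discrete HMMs with the standard scaled forward algorithm. Tier 2 is a plain exhaustive sliding-window search that stands in for the unspecified optimal pattern search. The names `slope_symbols`, `best_match`, and `EXAMPLE_MODELS` are hypothetical, and the model parameters are hand-set for illustration rather than learned.

```python
import math


def slope_symbols(series, eps=0.1):
    """Hypothetical tier-1 features: step sign, 0 = down, 1 = flat, 2 = up."""
    symbols = []
    for prev, cur in zip(series, series[1:]):
        d = cur - prev
        symbols.append(2 if d > eps else 0 if d < -eps else 1)
    return symbols


def log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: returns log P(obs | discrete HMM)."""
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    total = sum(alpha)
    logp = math.log(total)
    alpha = [a / total for a in alpha]  # rescale to avoid numeric underflow
    for o in obs[1:]:
        alpha = [sum(alpha[j] * trans[j][i] for j in range(n)) * emit[i][o]
                 for i in range(n)]
        total = sum(alpha)
        logp += math.log(total)
        alpha = [a / total for a in alpha]
    return logp


def best_match(series, window, models, eps=0.1):
    """Assumed tier-2 search: exhaustive sliding window over every class model.

    Returns (per-symbol log-likelihood, window start index, class label).
    """
    best = None
    for s in range(len(series) - window + 1):
        obs = slope_symbols(series[s:s + window], eps)
        for label, (start, trans, emit) in models.items():
            score = log_likelihood(obs, start, trans, emit) / len(obs)
            if best is None or score > best[0]:
                best = (score, s, label)
    return best


# Hand-set (not learned) example models: (start, trans, emit), where
# emit[state][symbol] uses the symbol codes produced by slope_symbols.
EXAMPLE_MODELS = {
    "peak": ([1.0, 0.0],
             [[0.7, 0.3], [0.0, 1.0]],
             [[0.05, 0.15, 0.8], [0.8, 0.15, 0.05]]),
    "flat": ([1.0], [[1.0]], [[0.1, 0.8, 0.1]]),
}
```

For example, `best_match([0, 0, 0, 0, 1, 2, 3, 2, 1, 0, 0, 0, 0], 7, EXAMPLE_MODELS)` picks the window starting at index 3 under the `"peak"` model. The exhaustive scan costs roughly (series length) × (window) × (states)² per model, which is exactly the kind of cost a more efficient search would aim to reduce.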

A major problem in data mining and pattern recognition is recognizing a segment of a waveform in a time series based on its shape. Applications in the context of time-series data mining include analysis of time series, monitoring and diagnosis of critical systems, classification and clustering of time series, unsupervised and supervised discovery of recurrent patterns, outlier recognition, and phase-shift detection. We propose a novel hybrid technique that represents waveform shapes in terms of their features, models the process behind the waveform as an HMM, and couples this representation to an efficient optimal pattern search for online and offline pattern recognition.

There are two major tracks of work on this problem in the data mining literature. Much of the work in the first track has emphasized scalability; for instance, a data miner must be able to scale the representation and matching algorithms to large databases of time series. The second track has focused on signal representation and the matching function rather than on scalability. While there is a substantial body of interesting, powerful, and applicable work in both areas, we believe that scalability, representation, and matching are still not adequately solved and need further improvement. Meanwhile, the design and implementation of a system that provides both aspects in time-series data mining is essential. More specifically, a data mining engine must provide an acceptable representation and matching technique while remaining scalable and applicable to large databases. Our main goal is to present a new representation and matching technique; how our approach can be scaled up in a simple, easy-to-use, yet strong and efficient fashion has been explained in other papers and is beyond the scope of this paper.
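The two tracks described here separate the matching function from scalability. For illustration only (this is a common baseline, not the HMM-based method proposed in this paper), the sketch below matches a query prototype against every subsequence using z-normalized Euclidean distance, so that segments are compared by shape rather than by offset or amplitude. It also uses early abandoning, one simple way to reduce the cost of scanning a long series. The function names `znorm`, `dist_early_abandon`, and `nearest_segment` are hypothetical.

```python
import math


def znorm(x):
    """Z-normalize so matching compares shape, not offset or amplitude."""
    n = len(x)
    mu = sum(x) / n
    sd = math.sqrt(sum((v - mu) ** 2 for v in x) / n)
    if sd == 0:
        return [0.0] * n  # a constant segment has no shape to compare
    return [(v - mu) / sd for v in x]


def dist_early_abandon(a, b, best_so_far):
    """Squared Euclidean distance; returns inf once it reaches best_so_far."""
    total = 0.0
    for u, v in zip(a, b):
        total += (u - v) ** 2
        if total >= best_so_far:
            return math.inf  # cannot beat the current best match
    return total


def nearest_segment(series, prototype):
    """Offline scan: start index and distance of the closest-shaped subsequence."""
    m = len(prototype)
    q = znorm(prototype)
    best, best_start = math.inf, -1
    for s in range(len(series) - m + 1):
        d = dist_early_abandon(znorm(series[s:s + m]), q, best)
        if d < best:
            best, best_start = d, s
    return best_start, best
```

For instance, `nearest_segment([5, 5, 5, 5, 6, 7, 8, 7, 6, 5, 5, 5], [0, 1, 2, 1, 0])` returns start index 4 with a distance of essentially zero, because the segment `[6, 7, 8, 7, 6]` has the same shape as the prototype after z-normalization. Early abandoning does not change the result; it only skips work on candidates that cannot win.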