Self-Simailrty in Data Mining and Predictve Modeling

 

Recently there have been an increasing interest in knowledge discovery using fractal dimension and self-similarity. Due to the new observations, finding and discoveries in several domains such as network data, multimedia databases, geographical datasets, Internet user behavior and individual shopping behavior; using fractal dimension and self-similarity for data mining seems inevitable. In this paper we intent to review the current state of the art techniques of using fractal dimension and self similarity for data mining in brief and to point out on new challenges in this field. We review on several recent data mining techniques using of such phenomenon including our works on Self-Similar Layered Hidden Markov Model and early stage result of sequential association rules. Both provided techniques has been investigated in the context of Network data and Patient data in Critical Care Units.

Introduction Almost all biological systems contain self-similar structures that are made through recurrent processes, while many physical systems contain a form of functional self-similarity that owes its richness to recursion. Ant colonies, human brains, economic markets, network data and nature create enormously complex behavior that is much richer than the behavior of the individual component units. In time series analysis, self-similar phenomenon show structural similarities across all or at least a wide range of time scales. In geometry self-similarity comes with the term fractal. Fractals have two interesting features. First they are self-similar on multiple scales, in that a small portion of fractal will often look similar to the whole object or the whole structure. Second, fractals have a fractional dimension, as opposed to an integer dimension that idealized objects or structures have. As all fractals are self-similar, they have a built-in form of recursion. The first type of fractals can typically be defines by a program-like specification, while the second type of fractal is usually related to a random or stochastic process. The typical assumption in many data mining techniques is that data is uniformly distributed, with independence between attributes. However, real data sets disobey these assumptions; rather, they typically are exhibit fractal dimensionalities that are much lower than their embedding dimension. Recently there have been several attempt to apply data mining techniques through fractal dimensions and self-similar characteristics in different domains including: dimensionality curse, using self-similar characteristics to mine databases and discover hidden patterns such as association rules, clustering, classification modeling and finding outliers.

Network Data

In this body of work we study, analyze and discuss the exploitation of self-similar structure embedded in Network data for prediction through Self Similar Layered Hidden Markov Model (SSLHMM). SSLHMM is a novel alternative of Hidden Markov Models (HMM) which proven to be useful in a variety of real world applications. SSLHMM leverage HMM power and extend such capability to self-similar structures and exploit this property to reduce the complexity of predictive modeling process. We show that SSLHMM approach can captures self-similar information and provides more accurate and interpretable model comparing to conventional techniques and one-step, flat method for model constructions such as HMM. 1 Introduction In recent years there have been a handful of research and development for traffic modeling and forecasting the Network data. Recent measurements of local-area and wide-area traffic have shown that network traffic exhibits variability at a wide range of time scales. For instance, analysis of traffic data from networks and services such as ISDN traffic, Ethernet LAN’s, Common Channel Signaling Network (CCNS) and Variable Bit Rate (VBR) video have all convincingly demonstrated the presence of features such as self-similarity, long range dependence, slowly decaying variances, heavy-tailed distributions and fractal dimensions. Through past decade, different models suggested for Network behavior analysis. Although several works attempt to analyze, model and predict Network data such as Markov Model, but there are only a few remarkable works which have shown successful result and the problem of modeling Network data still is an open issue. While there have been much effort on observing self-similar structures in scientific databases and natural structures there are few works on using self-similar structure and fractal dimension for data mining , predictive modeling and forecasting. Recently there have been several attempts to apply data mining techniques through fractal dimensions and self-similar characteristics in different domains. Among these works, using fractal dimension and self-similarity for managing the dimensionally curse, learning association rules and application in spatial joint are considerable. In this paper we study and analyze the application of Self Similar Layered Hidden Markov Model (SSLHMM) introduced by Adibi and Shen in for predictive modeling of Network data. Meanwhile we use the fractal dimension of data for a better estimation of SSLHMM. Our result is encouraging and it shows a better performance comparing with other methods.