Recently
there have been an increasing interest in knowledge discovery using fractal
dimension and self-similarity. Due to the new observations, finding and
discoveries in several domains such as network data, multimedia databases,
geographical datasets, Internet user behavior and individual shopping
behavior; using fractal dimension and self-similarity for data mining
seems inevitable. In this paper we intent to review the current state
of the art techniques of using fractal dimension and self similarity for
data mining in brief and to point out on new challenges in this field.
We review on several recent data mining techniques using of such phenomenon
including our works on Self-Similar Layered Hidden Markov Model and early
stage result of sequential association rules. Both provided techniques
has been investigated in the context of Network data and Patient data
in Critical Care Units.
Introduction
Almost all biological systems contain self-similar structures that are
made through recurrent processes, while many physical systems contain
a form of functional self-similarity that owes its richness to recursion.
Ant colonies, human brains, economic markets, network data and nature
create enormously complex behavior that is much richer than the behavior
of the individual component units. In time series analysis, self-similar
phenomenon show structural similarities across all or at least a wide
range of time scales. In geometry self-similarity comes with the term
fractal. Fractals have two interesting features. First they are self-similar
on multiple scales, in that a small portion of fractal will often look
similar to the whole object or the whole structure. Second, fractals have
a fractional dimension, as opposed to an integer dimension that idealized
objects or structures have. As all fractals are self-similar, they have
a built-in form of recursion. The first type of fractals can typically
be defines by a program-like specification, while the second type of fractal
is usually related to a random or stochastic process. The typical assumption
in many data mining techniques is that data is uniformly distributed,
with independence between attributes. However, real data sets disobey
these assumptions; rather, they typically are exhibit fractal dimensionalities
that are much lower than their embedding dimension. Recently there have
been several attempt to apply data mining techniques through fractal dimensions
and self-similar characteristics in different domains including: dimensionality
curse, using self-similar characteristics to mine databases and discover
hidden patterns such as association rules, clustering, classification
modeling and finding outliers.
Network
Data
In this body of work we study, analyze and discuss the exploitation of
self-similar structure embedded in Network data for prediction through
Self Similar Layered Hidden Markov Model (SSLHMM). SSLHMM is a novel alternative
of Hidden Markov Models (HMM) which proven to be useful in a variety of
real world applications. SSLHMM leverage HMM power and extend such capability
to self-similar structures and exploit this property to reduce the complexity
of predictive modeling process. We show that SSLHMM approach can captures
self-similar information and provides more accurate and interpretable
model comparing to conventional techniques and one-step, flat method for
model constructions such as HMM. 1 Introduction In recent years there
have been a handful of research and development for traffic modeling and
forecasting the Network data. Recent measurements of local-area and wide-area
traffic have shown that network traffic exhibits variability at a wide
range of time scales. For instance, analysis of traffic data from networks
and services such as ISDN traffic, Ethernet LAN’s, Common Channel Signaling
Network (CCNS) and Variable Bit Rate (VBR) video have all convincingly
demonstrated the presence of features such as self-similarity, long range
dependence, slowly decaying variances, heavy-tailed distributions and
fractal dimensions. Through past decade, different models suggested for
Network behavior analysis. Although several works attempt to analyze,
model and predict Network data such as Markov Model, but there are only
a few remarkable works which have shown successful result and the problem
of modeling Network data still is an open issue. While there have been
much effort on observing self-similar structures in scientific databases
and natural structures there are few works on using self-similar structure
and fractal dimension for data mining , predictive modeling and forecasting.
Recently there have been several attempts to apply data mining techniques
through fractal dimensions and self-similar characteristics in different
domains. Among these works, using fractal dimension and self-similarity
for managing the dimensionally curse, learning association rules and application
in spatial joint are considerable. In this paper we study and analyze
the application of Self Similar Layered Hidden Markov Model (SSLHMM) introduced
by Adibi and Shen in for predictive modeling of Network data. Meanwhile
we use the fractal dimension of data for a better estimation of SSLHMM.
Our result is encouraging and it shows a better performance comparing
with other methods.