Hidden
Markov Models (HMM) have proven to be useful in a variety of real world
applications where considerations for uncertainty are crucial. Such an
advantage can be more leveraged if HMM can be scaled up to deal with complex
problems. In this paper, we introduce, analyze and demonstrate Self-Similar
Layered HMM (SSLHMM), for a certain group of complex problems which show
self-similar property, and exploit this property to reduce the complexity
of model construction. We show how the embedded knowledge of self-similar
structure can be used to reduce the complexity of learning and increase
the accuracy of the learned model. Moreover, we introduce three different
types of self-similarity in SSLHMM, and investigate their performance
in the context of synthetic data and real-world network databases. We
show that SSLHMM has several advantages comparing to conventional HMM
techniques and it is more efficient and accurate than one-step, flat method
for model construction.
There is a vast amount of natural structures and physical systems which
contain self-similar structures that are made through recurrent processes.
To name a few: ocean flows, changes in the yearly flood levels of rivers,
voltages across nerve membranes, musical melodies, human brains, economic
markets, Internet web logs and network data create enormously complex
self-similar data. While there have been much effort on observing self-similar
structures in scientific databases and natural structures, there are few
works on using self-similar structure and fractal dimension for the purpose
of data mining and predictive modeling. Among these works, using fractal
dimension and self-similarity to reduce the dimensionally curse, learning
association rules and applications in spatial joint selectivity in databases
are considerable. In this paper we introduce a novel technique which uses
the self-similar structure for predictive modeling using a Self-Similar
Layered Hidden Markov Model (SSLHMM). Despite the broad range of application
areas shown for classic HMMs, they do have limitations and do not easily
handle problems with certain characteristics. For instance, classic HMM
has difficulties to model complex problems with large states spaces. Among
the recognized limitations, we only focus on complexity of HMM for a certain
category of problems with the following characteristics: 1) The uncertainty
and complexity embedded in these applications make it difficult and impractical
to construct the model in one step. 2) Systems are self-similar, contain
self-similar structures and have been generated through recurrent processes.
For instance, analysis of traffic data from networks and services such
as ISDN traffic, Ethernet LAN’s, Common Channel Signaling Network (CCNS)
and Variable Bit Rate (VBR) video have all convincingly demonstrated the
presence of features such as self-similarity, long range dependence, slowly
decaying variances, heavy-tailed distributions and fractal dimensions.
In a companion paper, Adibi and Shen introduced a domain independent novel
technique to mine sequential databases through Mining by Layered Phases
(MLP) in both discrete and continuous domains. In this work we introduce
a special form of MLP as Self-Similar Layered HMM (SSLHMM) for self-similar
structures. We show how SSLHMM uses the information embedded in a self-similar
structure to reduce the complexity of the problem and learn a more accurate
model than a general HMM. Our result is encouraging and show a significant
improvement when a self-similar data are modeled through SSLHMM in comparison
with HMM.