Chun-Nan Hsu
Institute of Information Science, Academia Sinica
donotspam.chunnan@iis.sinica.edu.tw
http://www.iis.sinica.edu.tw/~chunnan
"Why Discretization Works For Naive Bayesian Classifiers"
6/28/2000: [time not recorded]
[location not recorded]
Abstract: This paper explains why well-known discretization methods, such as
entropy-based and ten-bin, work well for naive Bayesian classifiers
with continuous variables, regardless of their complexity. These
methods usually assume that discretized variables have Dirichlet priors.
Since perfect aggregation holds for Dirichlets, we can show that,
in general, a wide variety of discretization methods perform well,
with no significant difference among them. We identify situations where
discretization may degrade performance and show that they
are unlikely to occur with well-known methods. We empirically test our
explanation on synthetic and real data sets, and the results confirm it.
Our analysis leads to a lazy discretization method that can simplify
training for naive Bayes. In our experiment, this new method performs
as well as the well-known methods.
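To make the perfect-aggregation argument concrete, here is a minimal sketch that is not taken from the paper. It assumes equal-width ten-bin discretization of one continuous variable within a single class, and a Dirichlet prior with hypothetical uniform hyperparameters `alpha`. The sketch estimates P(bin | class) by the Dirichlet posterior mean. It then checks that merging two intervals, and adding their hyperparameters, gives the merged interval exactly the sum of the two original estimates. The function names and the sample values are illustrative only.

```python
import bisect
import math
from collections import Counter


def ten_bin_edges(values, k=10):
    """Return the k-1 interior cut points of an equal-width k-bin split."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + i * width for i in range(1, k)]


def to_bin(x, edges):
    """Map a value to its interval index in 0..len(edges)."""
    return bisect.bisect_right(edges, x)


def dirichlet_estimates(bin_ids, alpha):
    """Posterior mean of P(bin) given bin counts and a Dirichlet(alpha) prior."""
    counts = Counter(bin_ids)
    total = len(bin_ids) + sum(alpha)
    return [(counts[j] + alpha[j]) / total for j in range(len(alpha))]


# Hypothetical training values of one continuous variable within one class.
values = [0.3, 1.1, 1.4, 2.0, 2.2, 2.9, 3.5, 4.1, 4.4, 5.0,
          5.3, 6.6, 7.2, 7.8, 8.1, 8.9, 9.4, 9.9]
edges = ten_bin_edges(values)
bins = [to_bin(v, edges) for v in values]
alpha = [1.0] * 10  # assumed uniform Dirichlet prior (i.e. Laplace smoothing)

fine = dirichlet_estimates(bins, alpha)

# Merge intervals 0 and 1: relabel the data and add their hyperparameters.
merged_bins = [max(b - 1, 0) for b in bins]
merged_alpha = [alpha[0] + alpha[1]] + alpha[2:]
coarse = dirichlet_estimates(merged_bins, merged_alpha)

# Perfect aggregation: the merged interval's estimate is the sum of its parts,
# and every untouched interval keeps its original estimate.
assert math.isclose(coarse[0], fine[0] + fine[1])
assert all(math.isclose(c, f) for c, f in zip(coarse[1:], fine[2:]))
print([round(p, 3) for p in coarse])
```

The check holds because the normalizing total, n + sum(alpha), is the same before and after merging. Coarsening or refining the intervals therefore only regroups probability mass. That is one way to see why the choice of discretization method tends to matter little.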
Last updated: Mon Jun 19 17:44:06 2006