Chun-Nan Hsu
Institute of Information Science, Academia Sinica
donotspam.chunnan@iis.sinica.edu.tw
http://www.iis.sinica.edu.tw/~chunnan


"Why Discretization Works For Naive Bayesian Classifiers"

6/28/2000: [time not recorded]
[location not recorded]

Abstract: This paper explains why well-known discretization methods, such as entropy-based and ten-bin discretization, work well for naive Bayesian classifiers with continuous variables, regardless of their complexity. These methods usually assume that discretized variables have Dirichlet priors. Since perfect aggregation holds for Dirichlet distributions, we can show that, in general, a wide variety of discretization methods perform well, with insignificant differences among them. We identify situations where discretization may degrade performance and show that they are unlikely to arise with well-known methods. We empirically test our explanation on synthesized and real data sets and obtain confirming results. Our analysis leads to a lazy discretization method that can simplify training for naive Bayes. In our experiment, this new method performs as well as the well-known methods.
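
For readers unfamiliar with the setup the abstract refers to, the following is a minimal sketch of ten-bin (equal-width) discretization feeding a naive Bayes classifier whose per-bin probabilities are estimated as posterior means under a symmetric Dirichlet prior. It is an illustration under stated assumptions, not the speaker's implementation and not the lazy discretization method mentioned above. The function names (ten_bin_edges, to_bin, train, predict), the model dictionary layout, and the default alpha=1.0 are all hypothetical choices made for this example.

```python
import math
from collections import defaultdict


def ten_bin_edges(values, bins=10):
    """Equal-width cut points spanning the observed range of one variable."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values are equal
    return [lo + width * i for i in range(1, bins)]


def to_bin(x, edges):
    """Index of the interval containing x, from 0 to len(edges)."""
    return sum(x >= e for e in edges)


def train(X, y, bins=10, alpha=1.0):
    """Count class and (class, feature, bin) occurrences after discretizing."""
    n_features = len(X[0])
    edges = [ten_bin_edges([row[j] for row in X], bins) for j in range(n_features)]
    class_counts = defaultdict(int)
    bin_counts = defaultdict(int)
    for row, label in zip(X, y):
        class_counts[label] += 1
        for j, x in enumerate(row):
            bin_counts[(label, j, to_bin(x, edges[j]))] += 1
    return {
        "edges": edges,
        "class_counts": dict(class_counts),
        "bin_counts": dict(bin_counts),
        "bins": bins,
        "alpha": alpha,  # assumed symmetric Dirichlet hyperparameter
    }


def predict(model, row):
    """Return the class with the highest log posterior score for one example."""
    total = sum(model["class_counts"].values())
    best, best_score = None, -math.inf
    for label, n_c in model["class_counts"].items():
        score = math.log(n_c / total)
        for j, x in enumerate(row):
            k = to_bin(x, model["edges"][j])
            count = model["bin_counts"].get((label, j, k), 0)
            # Posterior mean of the bin probability under Dirichlet(alpha, ..., alpha)
            score += math.log(
                (count + model["alpha"]) / (n_c + model["alpha"] * model["bins"])
            )
        if score > best_score:
            best, best_score = label, score
    return best
```

A call such as predict(train(X, y), row) classifies one example. Swapping ten_bin_edges for a different cut-point rule is one way to compare discretization methods while leaving the classifier unchanged, which is the kind of comparison the abstract describes.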


Last updated: Mon Jun 19 17:44:06 2006

 

 

 

 

 