Insufficient training data for Naive Bayes in practice


Craig (Mingtao) Zhang

unread,
Feb 1, 2013, 8:41:40 PM2/1/13
to 10-701-spri...@googlegroups.com
In Page 43 of "2 StatisticsPPT", there is an explanation about insufficient training data for NB.

How should we solve this problem in practice?

Could we omit this feature when predicting every label for this instance?

Craig

Xuezhi Wang

unread,
Feb 2, 2013, 4:47:47 AM2/2/13
to 10-701-spri...@googlegroups.com
Actually, you can use a prior to solve this problem.
For example, if you use a uniform prior in the discrete-feature case, you end up adding a pseudo-count to every feature-class pair. This solves the zero-count problem, because every count is now at least 1.
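To make the pseudo-count idea concrete, here is a minimal sketch of add-one (Laplace) smoothing for the per-class likelihoods of a single discrete feature. The function name and data layout are just for illustration, not from the lecture slides:

```python
from collections import Counter

def smoothed_likelihoods(feature_values, labels, alpha=1.0):
    """Estimate P(x = v | y = c) with add-alpha (Laplace) smoothing.

    feature_values: observed values of one discrete feature
    labels:         class labels, parallel to feature_values
    alpha:          pseudo-count added to every (value, class) pair
    """
    values = sorted(set(feature_values))
    classes = sorted(set(labels))
    counts = Counter(zip(feature_values, labels))        # joint counts
    class_totals = Counter(labels)                       # per-class totals
    probs = {}
    for c in classes:
        # each of the |values| cells gets alpha extra, so the
        # denominator grows by alpha * |values| to keep a distribution
        denom = class_totals[c] + alpha * len(values)
        for v in values:
            probs[(v, c)] = (counts[(v, c)] + alpha) / denom
    return probs

# "blue" was never seen with class "+", yet its likelihood is nonzero:
probs = smoothed_likelihoods(["red", "red", "blue"], ["+", "+", "-"])
# P(blue | +) = (0 + 1) / (2 + 2) = 0.25
```

With alpha = 1 this is exactly the "add one to each count" heuristic; smaller alpha values give weaker smoothing.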

Krikamol Muandet

unread,
Feb 2, 2013, 4:55:05 AM2/2/13
to Xuezhi Wang, 10-701-spri...@googlegroups.com
Laplace smoothing, i.e., adding one to each feature count, is a simple heuristic but may not work well in practice. You might want to check out Good-Turing estimation, which is the gold standard for estimating the probability of unseen instances. See http://en.wikipedia.org/wiki/Good%E2%80%93Turing_frequency_estimation and the references therein.
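One piece of Good-Turing that is easy to show in a few lines is its estimate of the total probability mass of unseen events: the number of events observed exactly once, divided by the total number of observations. This is only that one component, not a full Good-Turing smoother (which also re-estimates the counts of seen events):

```python
from collections import Counter

def good_turing_unseen_mass(observations):
    """Good-Turing estimate of the total probability of unseen events:
    P(unseen) ~ N1 / N, where N1 is the number of event types seen
    exactly once and N is the total number of observations."""
    counts = Counter(observations)
    n = len(observations)
    n1 = sum(1 for c in counts.values() if c == 1)
    return n1 / n

# "b", "c", and "d" each appear once, so half of the 6 observations
# are singletons: half the mass is reserved for unseen events.
mass = good_turing_unseen_mass(["a", "a", "b", "c", "d", "a"])
# mass == 0.5
```

The intuition is that rare events stand in for the unseen ones: if many types appeared only once, you should expect many more types you have not seen yet.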

Krik


--
http://alex.smola.org/teaching/cmu2013-10-701 (course website)
http://www.youtube.com/playlist?list=PLZSO_6-bSqHQmMKwWVvYwKreGu4b4kMU9 (YouTube playlist)
---
You received this message because you are subscribed to the Google Groups "10-701 Spring 2013 CMU" group.
To unsubscribe from this group and stop receiving emails from it, send an email to 10-701-spring-201...@googlegroups.com.
To post to this group, send email to 10-701-spri...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

--
Krikamol Muandet
PhD Student                                                   
Max Planck Institute for Intelligent Systems            
Spemannstrasse 38, 72076 Tübingen, Germany      
Telephone: +49-(0)7071 601 554
http://www.kyb.mpg.de/~krikamol

Craig (Mingtao) Zhang

unread,
Feb 2, 2013, 5:25:18 PM2/2/13
to 10-701-spri...@googlegroups.com
That's helpful!

A follow up question:

In the continuous case, what can we do if we are missing the value of one feature for an instance?

Could we just assume it's 1?

Craig



Krikamol Muandet

unread,
Feb 2, 2013, 5:41:49 PM2/2/13
to Craig (Mingtao) Zhang, 10-701-spri...@googlegroups.com
That's a good question. There is actually a large body of work devoted to answering it; this is known as the missing data problem. A good reference is the following book:

Little, Roderick J. A.; Rubin, Donald B. (2002). Statistical Analysis with Missing Data (2nd ed.). New York: Wiley. ISBN 0-471-18386-5.

Assuming a particular value, e.g., 1, for the missing feature is a bit dangerous, since the right choice depends on the range of the feature. The simplest solution I can think of is to replace the missing feature with the average of that feature's values over the other instances. Whether or not this works well depends on the problem.
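The mean-imputation idea above can be sketched in a few lines. The function name and the list-of-lists data layout are just illustrative, with `None` marking a missing entry:

```python
def impute_mean(rows, col):
    """Replace missing (None) entries in column `col` with the mean
    of the observed entries in that column."""
    observed = [r[col] for r in rows if r[col] is not None]
    mean = sum(observed) / len(observed)
    return [[mean if (i == col and v is None) else v
             for i, v in enumerate(r)]
            for r in rows]

data = [[1.0, 2.0],
        [3.0, None],   # second feature is missing here
        [5.0, 4.0]]
filled = impute_mean(data, 1)
# the missing entry becomes (2.0 + 4.0) / 2 = 3.0
```

For Naive Bayes specifically there is an even simpler option: since the features are assumed conditionally independent, you can just drop the missing feature's likelihood term from the product for that instance, rather than inventing a value for it.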

Krik

