Re: Naive bayes in PyMC


Chris Fonnesbeck

Apr 10, 2013, 10:02:39 PM
to py...@googlegroups.com
Hi Tommy,

PyMC is designed to fit models using MCMC methods, which are not necessary for naive Bayes. Scikit-learn has an implementation that is pretty straightforward:
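For instance, something along these lines (a minimal sketch, assuming binary features and that X_train, y_train, and X_test are your data arrays; BernoulliNB is scikit-learn's Bernoulli naive Bayes classifier):

from sklearn.naive_bayes import BernoulliNB

# fit the class-conditional feature probabilities on the training data
clf = BernoulliNB()
clf.fit(X_train, y_train)

# predicted labels and class probabilities for the test data
y_pred = clf.predict(X_test)
p_hat = clf.predict_proba(X_test)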


If you want to get started with PyMC, I recommend looking at the User's Guide and at the multitude of examples on the PyMC wiki:


Not to say that you could not implement one in PyMC -- maybe someone has -- but it would be overkill to do so.

John Salvatier

Apr 10, 2013, 10:07:19 PM
to py...@googlegroups.com
Also, it would be somewhat non-trivial, because naive Bayes is not a Bayesian model (despite the name, IIRC), so the implementation would not be completely straightforward.

However, you might try logistic regression, which is somewhat similar to naive Bayes and which is a Bayesian model.
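A minimal sketch of such a model in PyMC 2 syntax, assuming a feature matrix X of shape (n, p), binary labels y_obs, and the feature count p are already in scope (the priors and sampler settings here are arbitrary choices, not recommendations):

import numpy as np
import pymc as mc

b0 = mc.Normal('b0', mu=0., tau=1e-2)          # intercept, vague prior
b = mc.Normal('b', mu=0., tau=1e-2, size=p)    # coefficients, vague priors

@mc.deterministic
def theta(b0=b0, b=b):
    # inverse-logit link: Pr[y = 1 | x]
    return 1. / (1. + np.exp(-(b0 + np.dot(X, b))))

y = mc.Bernoulli('y', theta, value=y_obs, observed=True)

m = mc.MCMC([b0, b, theta, y])
m.sample(iter=20000, burn=10000)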

Rhiannon Weaver

Apr 11, 2013, 8:00:44 AM
to <pymc@googlegroups.com>
A quick Google search finds the following, which may provide some enlightenment.


Naive Bayes, if I recall, basically uses a plug-in point estimate of the conditional probabilities in a Bayes net, based on an independence assumption.  In order to be Bayesian, those point  estimates need to be replaced by a proper prior.  Hence the use of the Dirichlet distribution.
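To make that concrete for binary features (where the Dirichlet reduces to a Beta), here is a minimal sketch in PyMC 2 syntax, assuming a binary training matrix X_train, labels y_train, and the feature count p are in scope:

import pymc as mc

# proper Beta(1, 1) priors replace the plug-in point estimates
alpha = [mc.Beta('alpha_%d' % j, 1., 1.) for j in range(p)]  # Pr[X_j = 1 | y = 1]
beta = [mc.Beta('beta_%d' % j, 1., 1.) for j in range(p)]    # Pr[X_j = 1 | y = 0]

# the training rows with y = 1 enter as observed Bernoulli nodes
# (and symmetrically, nodes with p = beta[j] for the y = 0 rows)
X1 = [mc.Bernoulli('X1_%d_%d' % (i, j), alpha[j], value=X_train[i, j], observed=True)
      for i in range(len(y_train)) if y_train[i] == 1 for j in range(p)]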

-Rhiannon

On Apr 11, 2013, at 3:23 AM, Tommy Engström wrote:

Could you please explain why naive Bayes is not a Bayesian model?


On Thursday, April 11, 2013 at 04:07:19 UTC+2, John Salvatier wrote:
Also, it would be somewhat non-trivial, because naive Bayes is not a Bayesian model (despite the name, IIRC), so the implementation would not be completely straightforward.

However, you might try logistic regression, which is somewhat similar to naive Bayes and which is a Bayesian model.


Abraham D. Flaxman

Apr 13, 2013, 7:08:33 PM
to py...@googlegroups.com

As Chris said, PyMC is not necessary for naïve Bayes, but I agree with Tommy that this is no reason not to use it. Here is a simple model that does the trick:

 

import numpy as np
import pymc as mc

n_test = len(data['X_test'])
p = data['X_train'].shape[1]

# unobserved class labels for the test points, uniform prior on each
y = [mc.Bernoulli('y_%d' % i, .5) for i in range(n_test)]

alpha = np.empty(p)
beta = np.empty(p)
for j in range(p):
    # alpha[j] is Pr[X_j = 1 | y = 1] in the training data
    alpha[j] = (data['X_train'][:, j] * data['y_train']).sum() / data['y_train'].sum()

    # beta[j] is Pr[X_j = 1 | y = 0] in the training data
    beta[j] = (data['X_train'][:, j] * (1 - data['y_train'])).sum() / (1 - data['y_train']).sum()

# observed test features, conditionally independent given the class label
X = [mc.Bernoulli('X_%d_%d' % (i, j), alpha[j]*y[i] + beta[j]*(1 - y[i]),
                  value=data['X_test'][i, j], observed=True)
     for i in range(n_test) for j in range(p)]
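To actually fit the model and read off posterior class probabilities, something like the following should work (a sketch; the sampler settings are arbitrary):

# sample with MCMC; the posterior mean of each y_i estimates Pr[y_i = 1 | X_test]
m = mc.MCMC([y, X])
m.sample(iter=10000, burn=5000)
p_hat = [m.trace('y_%d' % i)[:].mean() for i in range(n_test)]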

 

And here is an IPython notebook that takes it for a test drive:

http://nbviewer.ipython.org/5380476

 

--Abie

Abraham D. Flaxman
Assistant Professor
Institute for Health Metrics and Evaluation | University of Washington
2301 5th Avenue, Suite 600 | Seattle, WA 98121 | USA
Tel: +1-206-897-2800 | Fax: +1-206-897-2899
ab...@uw.edu | http://healthmetricsandevaluation.org | http://healthyalgorithms.com

From: py...@googlegroups.com [mailto:py...@googlegroups.com] On Behalf Of Tommy Engström
Sent: Thursday, April 11, 2013 12:22 AM
To: py...@googlegroups.com
Subject: [pymc] Re: Naive bayes in PyMC

 

Thank you for your answer, Chris.

I realize that there are other tools that are much easier to use for implementing NB. My idea, however, was to implement it partly as an exercise and partly as a baseline model to compare future models against. I intend to improve the model later on.

Reading the PyMC tutorial and skimming the wiki, I still don't understand how to "bind" observations together. Do I need to make something like an observation function that takes a random number representing a sample and sets the individual attribute-observation variables? Is there a smarter way?
