NLTK naive Bayes: raw probabilities instead of labels


J

Nov 28, 2013, 5:50:06 PM
to nltk-...@googlegroups.com
Hi,

For the NLTK naive Bayes classifier, is there a way to return the raw probabilities (and not just the label)?

Alex Rudnick

Nov 28, 2013, 7:23:26 PM
to nltk-...@googlegroups.com
Yes!

The function you want there is prob_classify instead of just classify.
It returns a distribution over the different possible labels.
--
-- alexr

J

Nov 28, 2013, 7:49:05 PM
to nltk-...@googlegroups.com
Hm, it's just giving me:
 <ProbDist with 2 samples>

Alex Rudnick

Nov 28, 2013, 8:01:07 PM
to nltk-...@googlegroups.com
Right! That's a ProbDist object. It's basically a mapping from labels
to probabilities.

You can ask it what the possible labels are with .samples() and get
the probability for a given label with .prob(). For example, consider
this two-way classification task:

>>> import nltk
>>> examples = [({"blue": True, "red": False}, "BlueOne"),
...             ({"blue": False, "red": True}, "RedOne")]
>>> classifier = nltk.classify.naivebayes.NaiveBayesClassifier.train(examples)
>>> classifier.prob_classify({"blue":True,"red":False})
<ProbDist with 2 samples>
>>> dist = classifier.prob_classify({"blue":True,"red":False})
>>> list(dist.samples())
['BlueOne', 'RedOne']
>>> dist.prob("BlueOne")
0.9
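
If you want every label's probability at once, here is a minimal sketch along the same lines. It assumes the same toy training data as the session above and uses only samples(), prob(), and max() on the returned distribution; label_probs is just a name picked for illustration.

```python
from nltk.classify import NaiveBayesClassifier

# Same toy training set as in the session above.
examples = [
    ({"blue": True, "red": False}, "BlueOne"),
    ({"blue": False, "red": True}, "RedOne"),
]
classifier = NaiveBayesClassifier.train(examples)

dist = classifier.prob_classify({"blue": True, "red": False})

# Collect each label's probability into a plain dict.
# (label_probs is a hypothetical name used only in this sketch.)
label_probs = {label: dist.prob(label) for label in dist.samples()}

for label, p in sorted(label_probs.items()):
    print("%s: %.4f" % (label, p))

# The single most probable label, and the total probability mass.
best = dist.max()
total = sum(label_probs.values())
print("best: %s, total: %.4f" % (best, total))
```

For this classifier, dist.max() should be the same label that classify() returns, so the dict gives you the label and the numbers behind it in one pass.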

--
-- alexr

J

Nov 28, 2013, 8:17:21 PM
to nltk-...@googlegroups.com
Okay. I'm confused, as my probabilities aren't adding up to 1:

dist = classifier.prob_classify(dialogue_act_features(new_row[28]))
print(list(dist.samples()))
prob_one = dist.prob("1")
prob_zero = dist.prob("0")
print(prob_one)
print(prob_zero)


['1', '0']
2.10487845223e-140
4.40636834365e-194