Evaluating the predictive accuracy of the NB model

9 views
Skip to first unread message

potato_head

unread,
Apr 16, 2016, 1:05:17 PM4/16/16
to scikit-image

Please guys,

What am I doing wrong with using scikitlearn from nltk to check the accuracy of the naive bayes classifier?


...readFile definition not needed 
#divide the data into training and testing sets
data = readFile('Data_test/')
training_set = list_nltk[:2000000]
testing_set = list_nltk[2000000:]

#applied Bag of words as a way to select and extract feature
count_vect = CountVectorizer()
X_train_counts = count_vect.fit_transform(training_set.split('\n'))

#apply tfd
tf_transformer = TfidfTransformer(use_idf=False).fit(X_train_counts)
X_train_tf = tf_transformer.transform(X_train_counts)

#Train the data
clf = MultinomialNB().fit(X_train_tf, training_set.split('\n'))

#now test the accuracy of the naive bayes classifier
test_data_features = count_vect.transform(testing_set)
X_new_tfidf = tf_transformer.transform(test_data_features)

predicted = clf.predict(X_new_tfidf)
print "%.3f" % nltk.classify.accuracy(clf, predicted)


The problem is when I print the nltk.classify.accuracy, it takes forever and I am suspecting this is because I have done something wrong but since I get no error, I can't figure out what it is that is wrong. I would really appreciate any pointer.

Stéfan van der Walt

unread,
Apr 16, 2016, 2:31:51 PM4/16/16
to scikit-image
Sorry, wrong mailing list.
> --
> You received this message because you are subscribed to the Google Groups
> "scikit-image" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to scikit-image...@googlegroups.com.
> To post to this group, send email to scikit...@googlegroups.com.
> To view this discussion on the web, visit
> https://groups.google.com/d/msgid/scikit-image/acb93ac9-a262-40f0-ac4d-3d710a4c0053%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
Reply all
Reply to author
Forward
0 new messages