How to use jug with a classifier

Negacy Hailu

unread,

Jun 29, 2015, 12:21:47 PM6/29/15

to jug-...@googlegroups.com

I need to run a classifier for a fairly large text. My task is sentiment analysis. I am following chapter 6 of the book entitled "Building Machine Learning Systems with Python" by Luis Pedro Coelho and Willi Richert.

Currently, when I run my program for the whole dataset, I am having a memory issue. I need to run the program on multiple cores.

I can load the dataset into numpy X and y arrays with no memory issue error. The error is raised when I called the training model method. So, I added the TaskGenrator annotation at the top if the train_model function. The remaining of the code the same as the original code, which is available here:

@TaskGenerator

def train_model(clf_factory, X, Y, name="NB ngram", plot=False):

cv = ShuffleSplit(

n=len(X), n_iter=10, test_size=0.3, random_state=0)

train_errors = []

test_errors = []

scores = []

pr_scores = []

precisions, recalls, thresholds = [], [], []

...

return np.mean(train_errors), np.mean(test_errors)

Is there anything I missed here?

Thanks.

Luis Pedro Coelho

unread,

Jun 29, 2015, 12:32:05 PM6/29/15

to jug-...@googlegroups.com

Thanks for the message, I hope you are enjoying the book

I don't really know if there is a solution to the problem. Even if you have enough memory to run program in a single core you might need N times that amount to run N cores at the same time.

if Scikit learn has the ability to use multiple cores in the classifier you're training ( often through the argument called n_jobs), then that might be a much better solution than trying to do it through jug, Although you can still use jug for memoization, of course.

Do make sure you're running an up-to-date version of scikit learn, its memory usage has gone down dramatically in the last few versions .

Sorry, I don't know if there's much more that can be done for your case.

Luis

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

Negacy Hailu

unread,

Jun 29, 2015, 12:36:49 PM6/29/15

to jug-...@googlegroups.com

Interesting!

Let me check if scikit-learn supports multicore. I will be back to you with what I got from there.

As for the book, yes, I am enjoying it. Really, a practical book.

Thanks.

N.

Reply all

Reply to author

Forward