How to use jug with a classifier


Negacy Hailu

Jun 29, 2015, 12:21:47 PM
to jug-...@googlegroups.com
I need to run a classifier on a fairly large text dataset. My task is sentiment analysis. I am following chapter 6 of the book "Building Machine Learning Systems with Python" by Luis Pedro Coelho and Willi Richert.

Currently, when I run my program on the whole dataset, I run into a memory error. I need to run the program on multiple cores.

I can load the dataset into numpy X and y arrays with no memory error. The error is raised when I call the model training function. So, I added the TaskGenerator decorator on top of the train_model function. The rest of the code is the same as the original code, which is available here

@TaskGenerator
def train_model(clf_factory, X, Y, name="NB ngram", plot=False):
    cv = ShuffleSplit(
        n=len(X), n_iter=10, test_size=0.3, random_state=0)

    train_errors = []
    test_errors = []

    scores = []
    pr_scores = []
    precisions, recalls, thresholds = [], [], []

    ...

    return np.mean(train_errors), np.mean(test_errors)

Is there anything I missed here? 

Thanks.

Luis Pedro Coelho

Jun 29, 2015, 12:32:05 PM
to jug-...@googlegroups.com
Thanks for the message. I hope you are enjoying the book.

I don't really know if there is a solution to the problem. Even if you have enough memory to run the program on a single core, you might need N times that amount to run on N cores at the same time.

If scikit-learn has the ability to use multiple cores in the classifier you're training (often through an argument called n_jobs), then that might be a much better solution than trying to do it through jug, although you can still use jug for memoization, of course.
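
As a rough illustration of the n_jobs suggestion, here is a minimal sketch. It assumes a recent scikit-learn, where ShuffleSplit and cross_val_score live in sklearn.model_selection (the code in the question uses the older ShuffleSplit(n=..., n_iter=...) signature). MultinomialNB itself takes no n_jobs argument, so in this sketch the parallelism comes from cross_val_score, which can evaluate folds in separate workers. The make_classifier factory and the toy data are hypothetical stand-ins for the book's setup.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import ShuffleSplit, cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline


def make_classifier():
    # Hypothetical factory, standing in for clf_factory in the question.
    return Pipeline([
        ("vect", TfidfVectorizer(ngram_range=(1, 2))),
        ("clf", MultinomialNB()),
    ])


def cross_validate(texts, labels, n_jobs=2):
    # Same split settings as in the question: 10 shuffles, 30% held out.
    cv = ShuffleSplit(n_splits=10, test_size=0.3, random_state=0)
    # n_jobs controls how many folds are evaluated at the same time.
    scores = cross_val_score(
        make_classifier(), texts, labels, cv=cv, n_jobs=n_jobs)
    return scores.mean(), scores.std()


if __name__ == "__main__":
    # Toy data, for illustration only.
    texts = ["good movie", "great film", "loved it", "really good",
             "bad movie", "awful film", "hated it", "really bad"] * 5
    labels = [1, 1, 1, 1, 0, 0, 0, 0] * 5
    mean, std = cross_validate(texts, labels)
    print("accuracy: %.3f +/- %.3f" % (mean, std))
```

Each worker holds the data for the fold it is fitting, so the caveat about needing roughly N times the memory for N cores applies here as well; starting with a small n_jobs and watching memory use is the cautious choice.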

Do make sure you're running an up-to-date version of scikit-learn; its memory usage has gone down dramatically in the last few versions.

Sorry, I don't know if there's much more that can be done for your case.

Luis

Negacy Hailu

Jun 29, 2015, 12:36:49 PM
to jug-...@googlegroups.com
Interesting!

Let me check whether scikit-learn supports multiple cores. I will get back to you with what I find.

As for the book, yes, I am enjoying it. Really, a practical book.

Thanks.

N.