I need to run a classifier on a fairly large text dataset. My task is sentiment analysis, and I am following chapter 6 of the book "Building Machine Learning Systems with Python" by Luis Pedro Coelho and Willi Richert.
Currently, when I run my program on the whole dataset, I run out of memory, so I want to distribute the work across multiple cores.
I can load the dataset into NumPy `X` and `y` arrays without any memory error; the error is only raised when I call the training method. So I added the `@TaskGenerator` decorator on top of the `train_model` function. The rest of the code is unchanged from the original, which is available
here:
@TaskGenerator
def train_model(clf_factory, X, Y, name="NB ngram", plot=False):
    cv = ShuffleSplit(
        n=len(X), n_iter=10, test_size=0.3, random_state=0)
    train_errors = []
    test_errors = []
    scores = []
    pr_scores = []
    precisions, recalls, thresholds = [], [], []
    ...
    return np.mean(train_errors), np.mean(test_errors)
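In case it helps frame the question: since each of the 10 shuffle-split folds is independent, I also considered parallelizing the folds myself with the standard library instead of jug. Below is a minimal sketch of that idea using `concurrent.futures`; `train_fold` is a hypothetical stand-in for training and scoring one fold, not code from the book.

```python
from concurrent.futures import ProcessPoolExecutor

def train_fold(seed):
    # Hypothetical placeholder: in the real code this would fit the
    # classifier on one train/test split and return its test error.
    return seed * 0.1  # stand-in value for a fold's test error

if __name__ == "__main__":
    # Each fold runs in its own process, so each worker only holds
    # the data for its own split in memory.
    with ProcessPoolExecutor(max_workers=4) as ex:
        test_errors = list(ex.map(train_fold, range(10)))
    print(sum(test_errors) / len(test_errors))
```

This avoids the jug machinery entirely, but I would prefer to keep the book's jug-based approach if I am simply using `@TaskGenerator` incorrectly.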
Is there anything I missed here?
Thanks.