LdaMulticore spawning #workers processes but using a single processor


Stephen Wu

Jun 18, 2015, 5:02:47 PM
to gen...@googlegroups.com
I'm running on a machine with 16 cores.  LdaMulticore seems to recognize that I have 16 cores and by default starts 16 workers.  However, all the workers are divvying up work on the same processor.  So on my 900k-document corpus, this is taking a while.

I had a few hypotheses about why this might be happening and talked to others about some of them.  So far, I don't think the culprit is any of the below, but I could be wrong:
  • I wrapped LdaMulticore in a custom scikit-learn estimator, and this estimator does give real results after being trained.
  • I am running on a 900k-document corpus that sits in memory at about 10+GB.
  • I'm kicking it off within IPython inside a screen session.
  • I've tested running a few other Python processes, and they all use the same CPU.  E.g., I'm trying to parse Wikipedia using gensim, and its worker(s) also use the same CPU.
Any help appreciated.
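[Editor's note] For anyone debugging this, a quick way to check whether the interpreter itself has been pinned is to inspect its CPU affinity mask. A minimal sketch, assuming Linux and Python 3.3+ (`os.sched_getaffinity` is not available on other platforms):

```python
import os

# The set of CPU ids this process is *allowed* to run on (pid 0 = self).
# If every gensim worker reports a one-element set on a 16-core machine,
# something (commonly a BLAS library at import time) has pinned them.
allowed = os.sched_getaffinity(0)
print("process may run on %d of %d CPUs" % (len(allowed), os.cpu_count()))
pinned = len(allowed) < os.cpu_count()
```

Running the same check inside each worker (e.g. via `os.sched_getaffinity(worker_pid)`) shows whether the restriction was inherited from the parent.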

Stephen Wu

Jun 19, 2015, 12:21:16 PM
to gen...@googlegroups.com
I killed the processes and reran them with no/minimal changes and parallelization is working just fine.  Unclear why, which is a bit unsatisfying after several hours of digging.
Leading hypothesis: this was probably some OS-level thing, e.g., processes might have wanted to stay on the same processor to make use of caches efficiently.  
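[Editor's note] If the cause is a hard affinity restriction rather than the scheduler's cache preference, it can be undone at runtime. A hypothetical fix, Linux-only, Python 3.3+ (`os.sched_setaffinity` rewrites the allowed-CPU set for a pid):

```python
import os

# Re-allow the current process (pid 0) to run on every online CPU.
# Workers forked *after* this call inherit the widened mask, so this
# must run before LdaMulticore spawns its worker processes.
os.sched_setaffinity(0, range(os.cpu_count()))
widened = os.sched_getaffinity(0)
```

Note that a scheduler merely *preferring* one core would not show up here; only an explicit affinity mask does.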

stephen

Radim Řehůřek

Jun 19, 2015, 12:39:53 PM
to gen...@googlegroups.com, ste...@trapit.com
Hello Stephen,

do you happen to have a log from when things didn't work (INFO level, or preferably DEBUG)?

I'm thinking maybe one of the processes failed / died for some reason, and the multiprocessing didn't recover. If that's the case, there should be a stack trace in the log.

Just a wild hypothesis :)

Radim

Stephen Wu

Jun 19, 2015, 1:34:33 PM
to gen...@googlegroups.com
Thanks for following up.  In the end, I haven't actually gotten the training to work, so I'd welcome you looking at the issue!

I didn't see anything notable at INFO level, but unfortunately I don't have the logs for LdaMulticore.  I was running make_wiki simultaneously, though, and it was trying to do everything on the same core that LdaMulticore was -- so maybe there's something in that.  The make_wiki process would have completed but was just going really slowly.  Below is the fairly normal INFO output of make_wiki, up to the point where I cut it off.

stephen


2015-06-18 10:17:54,373 : INFO : adding document #2990000 to Dictionary(2000000 unique tokens: [u'tripolitan', u'ftdna', u'fi\u0250', u'soestdijk', u'phintella']...)
2015-06-18 10:20:31,873 : INFO : discarding 37835 tokens: [(u'giravee', 1), (u'actuariesindia', 1), (u'wonho', 1), (u'nerdocrumbesia', 1), (u'jidova', 1), (u'alfredomacias', 1), (u'ysa\u04f1e', 1), (u'saraldi', 1), (u'belvilacqua', 1), (u'cargharay', 1)]...
2015-06-18 10:20:31,879 : INFO : keeping 2000000 tokens which were in no less than 0 and no more than 3000000 (=100.0%) documents
2015-06-18 10:20:43,771 : INFO : resulting dictionary: Dictionary(2000000 unique tokens: [u'tripolitan', u'ftdna', u'fi\u0250', u'soestdijk', u'phintella']...)
2015-06-18 10:20:43,940 : INFO : adding document #3000000 to Dictionary(2000000 unique tokens: [u'tripolitan', u'ftdna', u'fi\u0250', u'soestdijk', u'phintella']...)^C
Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/home/swu/trapit/research/.virt/lib/python2.7/site-packages/gensim/scripts/make_wiki.py", line 83, in <module>
    wiki = WikiCorpus(inp, lemmatize=lemmatize) # takes about 9h on a macbook pro, for 3.5m articles (june 2011)
  File "/home/swu/trapit/research/.virt/local/lib/python2.7/site-packages/gensim/corpora/wikicorpus.py", line 270, in __init__
    self.dictionary = Dictionary(self.get_texts())
  File "/home/swu/trapit/research/.virt/local/lib/python2.7/site-packages/gensim/corpora/dictionary.py", line 58, in __init__
    self.add_documents(documents, prune_at=prune_at)
  File "/home/swu/trapit/research/.virt/local/lib/python2.7/site-packages/gensim/corpora/dictionary.py", line 124, in add_documents
    logger.info("adding document #%i to %s", docno, self)
  File "/usr/lib/python2.7/logging/__init__.py", line 1140, in info
    self._log(INFO, msg, args, **kwargs)
  File "/usr/lib/python2.7/logging/__init__.py", line 1258, in _log
    self.handle(record)
  File "/usr/lib/python2.7/logging/__init__.py", line 1268, in handle
    self.callHandlers(record)
  File "/usr/lib/python2.7/logging/__init__.py", line 1308, in callHandlers
    hdlr.handle(record)
  File "/usr/lib/python2.7/logging/__init__.py", line 748, in handle
    self.emit(record)
  File "/usr/lib/python2.7/logging/__init__.py", line 867, in emit
    stream.write(fs % msg)
KeyboardInterrupt
Process PoolWorker-15:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 85, in worker
    task = get()
  File "/usr/lib/python2.7/multiprocessing/queues.py", line 374, in get
    racquire()



ode...@berkeley.edu

Jun 25, 2015, 12:01:18 AM
to gen...@googlegroups.com
Hello, 

I'm having the same problem and would also really appreciate some help. 

Checking "ps -F -A | grep NameOfMyProgram" shows that gensim is spawning the correct number of processes by default, but that they are all on the same processor (I'm on a 24-core Red Hat machine). I'm running inside a virtual environment, but it looks like that shouldn't affect things; when I launched from outside the virtual environment, processes ran on 4 cores, which was better, but still not good. Note that I think I'm calling gensim correctly, as it does distribute to the two cores on my laptop when I run the same code there. 
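[Editor's note] The PSR column that `ps -F` prints can also be read programmatically. A sketch (Linux-only; `last_cpu` is a hypothetical helper name) that pulls the `processor` field — the CPU a task last ran on — out of /proc/&lt;pid&gt;/stat:

```python
import os

def last_cpu(pid):
    """CPU the process last ran on (the PSR column of `ps -F`)."""
    with open("/proc/%d/stat" % pid) as f:
        # the comm field may contain spaces/parens, so split after the
        # last ')' and index into the remaining space-separated fields
        rest = f.read().rsplit(")", 1)[1].split()
    return int(rest[36])  # `processor` is field 39 of /proc/<pid>/stat

print("this process last ran on CPU", last_cpu(os.getpid()))
```

Sampling this for every worker pid a few times during training would confirm whether they are genuinely stuck on one core or just happened to be there at the instant `ps` ran.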

Any help or suggestions are really appreciated, as I'm not really sure where to go from here.

Thanks.
Orianna

Stephen Wu

Jun 25, 2015, 12:01:40 PM
to gen...@googlegroups.com
Interesting, Orianna.  My problem did reappear as well -- shutting down the processes and restarting them doesn't always work.  I also suspect that some of the workers may end up jumping onto the same core later on in processing?  I could be totally wrong about that.  Radim, is there gensim-specific logging that you're looking for?

stephen

ode...@berkeley.edu

Jun 25, 2015, 4:45:25 PM
to gen...@googlegroups.com
Hi, 

Yes, this is a very unfortunate problem that I'd be very happy to see fixed. 

OK, so I double-checked that running in the virtual environment isn't causing any problems. When I run outside it, I also get 26 processes allocated to one processor (I have 24 processors). The output of ps looks like:

>> ps -F -A
UID        PID  PPID  C    SZ   RSS PSR STIME TTY          TIME CMD
[*snip*]
odemasi  61669 59981  0 2738764 9821704 14 03:42 pts/5 00:00:00 python RunLDA.py 2
odemasi  61670 59981  0 2738764 9821704 14 03:42 pts/5 00:00:00 python RunLDA.py 2
odemasi  61671 59981  0 2738764 9821704 14 03:42 pts/5 00:00:00 python RunLDA.py 2
odemasi  61672 59981  0 2738764 9821696 14 03:42 pts/5 00:00:00 python RunLDA.py 2
odemasi  61673 59981  0 2738764 9821696 14 03:42 pts/5 00:00:00 python RunLDA.py 2
odemasi  61674 59981  0 2738764 9821704 14 03:42 pts/5 00:00:00 python RunLDA.py 2
odemasi  61675 59981  0 2738764 9821704 14 03:42 pts/5 00:00:00 python RunLDA.py 2
odemasi  61676 59981  0 2738764 9821704 14 03:42 pts/5 00:00:00 python RunLDA.py 2
odemasi  61681 59981  0 2738764 9821680 14 03:42 pts/5 00:00:00 python RunLDA.py 2
odemasi  61682 59981  0 2738764 9821704 14 03:42 pts/5 00:00:00 python RunLDA.py 2
odemasi  61683 59981  0 2738764 9821704 14 03:42 pts/5 00:00:00 python RunLDA.py 2
odemasi  61684 59981  0 2738764 9821704 14 03:42 pts/5 00:00:00 python RunLDA.py 2
odemasi  61685 59981  0 2738764 9821704 14 03:42 pts/5 00:00:00 python RunLDA.py 2
odemasi  61686 59981  0 2738764 9821704 14 03:42 pts/5 00:00:00 python RunLDA.py 2
odemasi  61687 59981  0 2738764 9821704 14 03:42 pts/5 00:00:00 python RunLDA.py 2
odemasi  61688 59981  0 2738764 9821704 14 03:42 pts/5 00:00:00 python RunLDA.py 2
odemasi  61689 59981  0 2738764 9821704 14 03:42 pts/5 00:00:00 python RunLDA.py 2
odemasi  61694 59981  0 2738764 9821704 14 03:42 pts/5 00:00:00 python RunLDA.py 2
odemasi  61698 59981  0 2738764 9821704 14 03:42 pts/5 00:00:00 python RunLDA.py 2
odemasi  61699 59981  0 2738764 9821704 23 03:42 pts/5 00:00:00 python RunLDA.py 2
odemasi  61700 59981  0 2738764 9821704 14 03:42 pts/5 00:00:00 python RunLDA.py 2
odemasi  61701 59981  0 2738764 9821704 14 03:42 pts/5 00:00:00 python RunLDA.py 2
odemasi  61702 59981  0 2738764 9821696 14 03:42 pts/5 00:00:00 python RunLDA.py 2
odemasi  61703 59981  0 2738764 9821696 14 03:42 pts/5 00:00:00 python RunLDA.py 2
[*snip*]

The standard out that I'm getting is: 
/home/odemasi/Packages/venv/lib/python2.6/site-packages/numpy/lib/utils.py:95: DeprecationWarning: `scipy.sparse.sparsetools` is deprecated!
scipy.sparse.sparsetools is a private module for scipy.sparse, and should not be used.
  warnings.warn(depdoc, DeprecationWarning)
/home/odemasi/Packages/venv/lib/python2.6/site-packages/scipy/lib/_util.py:67: DeprecationWarning: Module scipy.linalg.blas.fblas is deprecated, use scipy.linalg.blas instead
  DeprecationWarning)
2015-06-25 03:36:38,835 : INFO : adding document #0 to Dictionary(0 unique tokens: [])
2015-06-25 03:39:34,893 : INFO : built Dictionary(5060602 unique tokens: [u'loyalsubscribers', u'iftheyclosedchipotleiddie', u'\u666e\u6bb5\u306e\u53e3\u8abf\u3067\u4f55\u6ce3\u3044\u3066\u308b\u3093\u3067\u3059\u304b\u79c1\u306f\u3069\u3053\u306b\u3082\u884c\u304d\u307e\u305b\u3093\u304b\u3089\u5927\u4e08\u592b\u3067\u3059\u3092\u8a00\u3046', u'deargodmakeatrade', u'billycorgan']...) from 1 documents (total 5060602 corpus positions)
2015-06-25 03:39:36,283 : INFO : using symmetric alpha at 0.01
2015-06-25 03:39:36,283 : INFO : using serial LDA version on this node
2015-06-25 03:42:20,479 : WARNING : input corpus stream has no len(); counting documents
2015-06-25 03:42:25,018 : INFO : running online LDA training, 100 topics, 1 passes over the supplied corpus of 100000 documents, updating every 48000 documents, evaluating every ~100000 documents, iterating 50x with a convergence threshold of 0.001000
2015-06-25 03:42:25,018 : WARNING : too few updates, training might not converge; consider increasing the number of passes or iterations to improve accuracy
2015-06-25 03:42:25,023 : INFO : training LDA model using 24 processes
2015-06-25 03:42:27,407 : INFO : PROGRESS: pass 0, dispatched chunk #0 = documents up to #2000/100000, outstanding queue size 1
Traceback (most recent call last):
  File "/usr/lib64/python2.6/multiprocessing/queues.py", line 242, in _feed
    send(obj)
SystemError: NULL result without error in PyObject_Call
2015-06-25 03:42:30,449 : INFO : PROGRESS: pass 0, dispatched chunk #1 = documents up to #4000/100000, outstanding queue size 2
2015-06-25 03:42:30,612 : INFO : PROGRESS: pass 0, dispatched chunk #2 = documents up to #6000/100000, outstanding queue size 3
2015-06-25 03:42:30,793 : INFO : PROGRESS: pass 0, dispatched chunk #3 = documents up to #8000/100000, outstanding queue size 4


A little more about my application: each document is very tiny, and right now I'm constraining the training to 100,000 documents. It takes < 1 min to load and stream through the data. I know that running with this little data won't give me much performance gain, but until I can get it distributing the work I can't run with more data. The process has already been running for 17 hours, which seems like a ridiculously long time for a corpus that is a few MB (9 million documents is ~1.5GB). 

Any suggestions of what to check next? 

Thanks!
Orianna

ode...@berkeley.edu

Jun 26, 2015, 2:16:17 PM
to gen...@googlegroups.com
Hi Stephen, 

tl;dr: I'm hoping it's just a problem with OpenBLAS pinning CPU affinity to the processor the job is launched from, but I can't resolve the issue with the fixes I found online, so I'm sharing with you in hopes that you have brighter ideas than I had. 


Are your scipy and numpy also compiled against OpenBLAS or GotoBLAS? I think that's what I'm working with (OpenBLAS), and it seems that other people have also had trouble getting multiple Python processes to associate with different cores. In particular, I was looking at the following, and it looked like it pertained to our problem:


I tried both launching gensim with:
export OPENBLAS_MAIN_FREE=1 
python myLDAscript.py

and by putting 

import os
os.system('taskset -p 0xffffffff %d' % os.getpid()) # also tried os.system('taskset -p 0xff %d' % os.getpid())

at the beginning of myLDAscript.py. Sometimes that gave me a memory error, so I took it back out:

2015-06-26 00:21:20,469 : INFO : using symmetric alpha at 0.01
2015-06-26 00:21:20,469 : INFO : using serial LDA version on this node
Traceback (most recent call last):
  File "RunLDA_copy.py", line 52, in <module>
    lda = models.ldamulticore.LdaMulticore(corpus_memory_friendly, id2word=dictionary, num_topics=NUMTOPICS, workers=None)
  File "/usr/lib/python2.6/site-packages/gensim/models/ldamulticore.py", line 141, in __init__
    gamma_threshold=gamma_threshold)
  File "/usr/lib/python2.6/site-packages/gensim/models/ldamodel.py", line 313, in __init__
    self.sync_state()
  File "/usr/lib/python2.6/site-packages/gensim/models/ldamodel.py", line 326, in sync_state
    self.expElogbeta = numpy.exp(self.state.get_Elogbeta())
  File "/usr/lib/python2.6/site-packages/gensim/models/ldamodel.py", line 161, in get_Elogbeta
    return dirichlet_expectation(self.get_lambda())
  File "/usr/lib/python2.6/site-packages/gensim/models/ldamodel.py", line 157, in get_lambda
    return self.eta + self.sstats
MemoryError


or 

2015-06-26 00:22:24,037 : INFO : using symmetric alpha at 0.01
2015-06-26 00:22:24,037 : INFO : using serial LDA version on this node
Traceback (most recent call last):
  File "RunLDA_copy2.py", line 52, in <module>
    lda = models.ldamulticore.LdaMulticore(corpus_memory_friendly, id2word=dictionary, num_topics=NUMTOPICS, workers=None)
  File "/usr/lib/python2.6/site-packages/gensim/models/ldamulticore.py", line 141, in __init__
    gamma_threshold=gamma_threshold)
  File "/usr/lib/python2.6/site-packages/gensim/models/ldamodel.py", line 311, in __init__
    self.state = LdaState(self.eta, (self.num_topics, self.num_terms))
  File "/usr/lib/python2.6/site-packages/gensim/models/ldamodel.py", line 79, in __init__
    self.sstats = numpy.zeros(shape)
MemoryError

I tried editing gensim/utils.py and gensim/matutils.py, putting os.system('taskset -p 0xff %d' % os.getpid()) after the imports in there, but that didn't seem to fix things either, so I took it out. I did try running the toy script (with an SVD at the heart of the loop) from the Stack Overflow question above. It ran and distributed to the multiple cores just fine, so I couldn't reproduce the error that user had, even though I'm also running against OpenBLAS. However, gensim still won't work; I tried the fixes above to no avail. 

After all that, I was inspired by http://xcorr.net/2013/05/19/python-refuses-to-use-multiple-cores-solution/ and tried following that by putting 

import numpy
import scipy
import affinity
import multiprocessing
affinity.set_process_affinity_mask(0,2**multiprocessing.cpu_count()-1)

at the top of myLDAscript.py. That also didn't work. 

On a related note, I also tried to get distributed gensim running on my machine, but, well, that didn't go too well. If you got it working and have any suggestions, that would be great. 


I'm at my wits' end. If you have any thoughts I'd love to hear them, otherwise I might switch to another package that my team has used before. Thanks!
Orianna 

Related: 

ode...@berkeley.edu

Jun 26, 2015, 2:28:09 PM
to gen...@googlegroups.com
I just made the vocabulary smaller and now it seems to be distributing and, even more importantly, flying. I set the OPENBLAS_MAIN_FREE environment variable and nothing else.
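[Editor's note] For anyone hitting this later: OpenBLAS reads its environment variables when the library is first loaded, so the setting has to happen before the first numpy/scipy import anywhere in the process. A sketch of doing it from Python rather than the shell (`OPENBLAS_NUM_THREADS` is an optional extra knob to avoid BLAS threads fighting with LdaMulticore workers, not part of Orianna's fix):

```python
import os

# These must be set before the *first* `import numpy` / `import scipy`
# in the process; OpenBLAS reads them at load time.
os.environ["OPENBLAS_MAIN_FREE"] = "1"    # don't pin CPU affinity
os.environ["OPENBLAS_NUM_THREADS"] = "1"  # one BLAS thread per worker

# ... only now import the numeric stack, e.g.:
# import numpy
# from gensim.models import LdaMulticore
```

Setting the variables in the shell (`export OPENBLAS_MAIN_FREE=1`) before launching Python achieves the same thing and is harder to get wrong.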

Cloud Marked

Dec 6, 2016, 12:52:29 PM
to gensim
Has anyone ever figured out what the problem was? I am seeing identical behavior, with only one difference: the error message. Instead of complaining about a NULL result, it complains about an invalid length. All the other symptoms are the same: processes get spawned, but all except one are doing nothing. I did reset affinity in the code and set OPENBLAS_MAIN_FREE=1.

If anyone has figured it out, please, share.