running path_similarity in parallel fails


rtwomey

Apr 4, 2013, 7:32:25 PM
to nltk-...@googlegroups.com
Hi all, 

Apologies for the formatting; this is my first post.

I'm searching a set of sentences to find those that match search terms with a path_similarity() value above a threshold. This works well as a linear, single-process search, but it is rather slow: a medium-sized dataset (1500 sentences) takes about a minute, which is too slow for the real-time interactive project I'm working on.
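For reference, the serial search described above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the attached script: the names find_related, similarity and threshold (default 0.5) are hypothetical, tokenization is a naive whitespace split, and the similarity measure is passed in as a function. In the real script it would presumably wrap WordNet, e.g. the best Synset.path_similarity() score between the synsets of a word and those of a search term. NLTK's path_similarity() can return None when no path connects two synsets, so the sketch treats None as 0.

```python
from typing import Callable, Iterable, List, Optional

# Hypothetical signature: returns a score in [0, 1], or None when the two
# words cannot be compared (as WordNet path_similarity() can).
Similarity = Callable[[str, str], Optional[float]]

PUNCTUATION = ".,;:?!'\""


def find_related(sentences: Iterable[str], terms: List[str],
                 similarity: Similarity, threshold: float = 0.5) -> List[str]:
    """Return the sentences containing at least one word whose similarity
    to some search term is at or above `threshold` (serial, one process)."""
    matches = []
    for sentence in sentences:
        # Naive tokenization: split on whitespace, strip punctuation, lowercase.
        words = [w.strip(PUNCTUATION).lower() for w in sentence.split()]
        # None (no comparison possible) counts as a score of 0.0.
        if any((similarity(word, term) or 0.0) >= threshold
               for word in words if word
               for term in terms):
            matches.append(sentence)
    return matches
```

With a stand-in similarity that scores only exact matches, such as lambda a, b: 1.0 if a == b else None, the terms ['teeth', 'pleasure'] pick out the same three sentences as the serial run shown in this post. The work is roughly one similarity call per word/term pair, which is presumably where the minute goes when each call has to go through WordNet.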

In the attached example code, the function that finds related sentences, findrelatedMP(), succeeds when I run it with only one process (serial execution):

rtwomeys-work-object-4:nltk2.0 rtwomey$ python find_related_exampleMP.py 

sentences:  ["OK.  I've got a ham sandwich here.  Is it energy or is it information?", 'where in the field of activity can you derive pleasure.', 'I keep wanting to run into kate.', 'school made more sense in high school.', "oil, chemical and atomic workers int'l union.", 'find some teeth, buddy.', 'donald judd had teeth.', "there's a part of me that goes out and meets something in each of these things that I see.  why am I so eager to identify?", "I don't think it's ever been exhausted, that sense of potential.", 'life as an uncoverable aesthetic phenomenon.'] 

search terms:  ['teeth', 'pleasure'] 

searching... * * * 
3 matches:
find some teeth, buddy.
where in the field of activity can you derive pleasure.
donald judd had teeth.

parallel: time elapsed: 1.11793398857

It fails when I try to execute with more than one process:

twomeys-work-object-4:nltk2.0 rtwomey$ python find_related_exampleMP.py 

sentences:  ["OK.  I've got a ham sandwich here.  Is it energy or is it information?", 'where in the field of activity can you derive pleasure.', 'I keep wanting to run into kate.', 'school made more sense in high school.', "oil, chemical and atomic workers int'l union.", 'find some teeth, buddy.', 'donald judd had teeth.', "there's a part of me that goes out and meets something in each of these things that I see.  why am I so eager to identify?", "I don't think it's ever been exhausted, that sense of potential.", 'life as an uncoverable aesthetic phenomenon.'] 

search terms:  ['teeth', 'pleasure'] 

searching...Process Process-1:
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "find_related_exampleMP.py", line 42, in worker
    result = findRelatedStatementsSyns(sent, term_syns)
  File "find_related_exampleMP.py", line 27, in findRelatedStatementsSyns
    if checkSyns(word, term_syns, wn.NOUN):
  File "find_related_exampleMP.py", line 13, in checkSyns
    for syns in wn.synsets(word):
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nltk/corpus/reader/wordnet.py", line 1201, in synsets
    for offset in index[form].get(p, [])]
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nltk/corpus/reader/wordnet.py", line 1059, in _synset_from_pos_and_offset
    synset = self._synset_from_pos_and_line(pos, data_file_line)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nltk/corpus/reader/wordnet.py", line 1159, in _synset_from_pos_and_line
    raise WordNetError('line %r: %s' % (data_file_line, e))
WordNetError: line 'n 0000 ~ 05808794 n 0000 | the cognitive processes involved in producing and understanding linguistic communication; "he didn\'t have the language to express his feelings"  \n': invalid literal for int() with base 10: 'n'
 + +^C
Traceback (most recent call last):
  File "find_related_exampleMP.py", line 133, in <module>
    main()
  File "find_related_exampleMP.py", line 117, in main
    results = findrelatedMP(sentences, 2, term_syns)
  File "find_related_exampleMP.py", line 70, in findrelatedMP
    resultdict.update(out_q.get())
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/queues.py", line 117, in get
    res = self._recv()
 
Has anyone used the synset path_similarity() function in parallel, with either the Python multiprocessing or threading module?

I've worked through multiprocessing examples that do numerical calculations, and those run fine.
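One plausible reason the numerical examples work while this one fails, offered as a hypothesis rather than a diagnosis: judging from the traceback, the WordNet reader looks up a synset by seeking to its byte offset in a data file and reading a single line (_synset_from_pos_and_offset, then _synset_from_pos_and_line), and it appears to keep that data file open between lookups. If the parent process opened it before starting the workers (for example while building term_syns), every forked worker inherits the same open file, and on POSIX systems they all share one file offset. One worker can then move the offset between another worker's seek and read, which would explain why the line in the WordNetError starts in the middle of a record ('n 0000 ~ 05808794 ...') instead of with an offset. The sketch below (POSIX only, since it uses os.fork; the function name is hypothetical) shows the shared offset directly: the child seeks, and the parent, which never seeks again, ends up at the child's position.

```python
import os
import tempfile


def offset_seen_by_parent(child_offset: int) -> int:
    """Fork a child that seeks an inherited file descriptor, then return the
    offset the parent sees afterwards (POSIX only: relies on os.fork)."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"x" * 100)        # some data to seek within
        os.lseek(fd, 0, os.SEEK_SET)    # parent positions itself at offset 0
        pid = os.fork()
        if pid == 0:
            # Child: move the offset of the *inherited* descriptor, then exit
            # immediately (os._exit skips the parent's cleanup code).
            try:
                os.lseek(fd, child_offset, os.SEEK_SET)
            finally:
                os._exit(0)
        os.waitpid(pid, 0)
        # The parent has not seeked since the fork, yet it sees the child's move.
        return os.lseek(fd, 0, os.SEEK_CUR)
    finally:
        os.close(fd)
        os.unlink(path)
```

Calling offset_seen_by_parent(42) returns 42 rather than 0. If the child had opened the file itself instead of inheriting the descriptor, the two offsets would be independent, which is the idea behind giving each worker its own file handle.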

Thanks!

Robert

find_related_exampleMP.py

Steven Bird

Apr 5, 2013, 6:02:26 AM
to nltk-...@googlegroups.com
Perhaps the WordNet corpus reader is not thread-safe. You might like to discuss this on the nltk-dev mailing list.

-Steven
--
You received this message because you are subscribed to the Google Groups "nltk-users" group.