Unexpected performance decrease when shared objects are locally compiled


Ferhat Arslan

Apr 5, 2024, 1:19:48 PM
to Gensim
Hi all,

I am doing some experiments with the inner workings of the word2vec model.
When I make my changes to the relevant files (word2vec.py, word2vec_inner.pyx, word2vec_inner.pxd), generate a C file, and compile it (the exact commands I use are below), I get a huge decrease in performance.
I wanted to check whether the decrease is due to the changes I introduced, so I generated a C file from the original, unmodified pyx and pxd files and compiled it. To my surprise, this also led to a huge performance decrease.

For example, for the specific task I have, with a single worker the shipped .so file runs in ~14.4s while the .so file I compiled runs in ~35s. Even worse, when I switch to multiple threads, the .so file I compiled runs in ~125s (!) compared to ~10s for the shipped one.
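
To put the reported timings in perspective, a small helper can time a call and express the gap between two builds as a ratio. This is only a sketch: `time_call` and `slowdown` are hypothetical helper names (not part of gensim), and the numbers are the ones quoted above.

```python
import time


def time_call(fn, *args, repeats=3, **kwargs):
    """Return the best wall-clock time (in seconds) over `repeats` calls to fn."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args, **kwargs)
        best = min(best, time.perf_counter() - start)
    return best


def slowdown(baseline_s, candidate_s):
    """How many times slower the candidate build is than the baseline build."""
    return candidate_s / baseline_s


# Timings reported in the post above (seconds): shipped .so vs locally compiled .so.
print(f"single worker:    {slowdown(14.4, 35.0):.1f}x slower")
print(f"multiple workers: {slowdown(10.0, 125.0):.1f}x slower")
```

Assuming `corpus` is your own iterable of tokenized sentences, something like `time_call(lambda: Word2Vec(sentences=corpus, workers=1))` could be run once per build to reproduce the comparison under identical conditions.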

--
Below is how I generate the C file (based on https://github.com/piskvorky/gensim/blob/develop/setup.py):
[python]
>>> import Cython.Build
>>> from setuptools import Extension
>>> extension = Extension('word2vec_inner', sources=['word2vec_inner.pyx'], language='c', extra_compile_args=[])
>>> Cython.Build.cythonize([extension], language_level=3)

And to compile this into a shared object I use the following (again based on https://github.com/piskvorky/gensim/blob/develop/setup.py):
[bash]
$ gcc -pthread -shared -I/usr/include/python3.10/ -O3 -o word2vec_inner.cpython-310-x86_64-linux-gnu.so -fPIC word2vec_inner.c

Any hints as to what I might be doing wrong are much appreciated.

Kind regards,
Ferhat

Ferhat Arslan

Apr 6, 2024, 7:32:25 PM
to Gensim
Hi all,

I was able to solve this problem thanks to an earlier issue on GitHub (https://github.com/piskvorky/gensim/issues/3490), which hinted that the problem lies with Cython rather than with gensim.

Basically, I uninstalled my installed Cython and installed an earlier version (v0.29.37, to be exact) and, voilà, all is well now.
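
When switching between Cython versions, it can help to confirm which one actually produced a given C file before compiling it. The sketch below assumes the generated file carries a `Generated by Cython <version>` comment in its first few lines; `cython_version_of` is a hypothetical helper name, not a Cython API.

```python
import re

# Matches the header comment expected at the top of a Cython-generated C file,
# e.g. "/* Generated by Cython 0.29.37 */" (assumed format).
_HEADER = re.compile(r"Generated by Cython\s+([0-9][\w.+-]*)")


def cython_version_of(c_path, max_lines=5):
    """Return the Cython version named in the file's header, or None if absent."""
    with open(c_path, encoding="utf-8", errors="replace") as fh:
        # Only inspect the first few lines; the header sits at the very top.
        for _, line in zip(range(max_lines), fh):
            match = _HEADER.search(line)
            if match:
                return match.group(1)
    return None
```

For example, `cython_version_of("word2vec_inner.c")` could be checked right after running `cythonize` to make sure the downgrade took effect and a stale C file is not being recompiled.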

Gordon Mohr

Apr 10, 2024, 11:59:19 AM
to Gensim
Thanks for the update & confirmation of a possible workaround.

If by chance you're on Linux and can compare builds that are identical except for the Cython version, could you check whether the output of `ldd word2vec_inner.so` is the same for both builds? If it differs, please let us know the differences (ideally in issue #3490).

- Gordon
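
The `ldd` comparison suggested above could be scripted roughly as follows. This is a sketch only: the helper names `parse_ldd`, `ldd`, and `diff_ldd` are hypothetical, and the parser assumes the usual Linux `ldd` line shapes (`name => path (0xADDR)` or `name (0xADDR)`). Load addresses are stripped because they can differ between runs.

```python
import re
import subprocess

# Trailing load address such as " (0x00007f1c2a3b4000)".
_ADDR = re.compile(r"\s*\(0x[0-9a-fA-F]+\)\s*$")


def parse_ldd(output):
    """Map each library name in `ldd` output to its resolved path ('' if none)."""
    libs = {}
    for line in output.splitlines():
        line = _ADDR.sub("", line).strip()
        if not line:
            continue
        name, sep, path = line.partition(" => ")
        libs[name.strip()] = path.strip() if sep else ""
    return libs


def ldd(so_path):
    """Run `ldd` on a shared object and return its parsed dependencies."""
    result = subprocess.run(
        ["ldd", so_path], capture_output=True, text=True, check=True
    )
    return parse_ldd(result.stdout)


def diff_ldd(old, new):
    """Return (only_in_old, only_in_new, changed_path) as sorted lists of names."""
    only_old = sorted(set(old) - set(new))
    only_new = sorted(set(new) - set(old))
    changed = sorted(k for k in set(old) & set(new) if old[k] != new[k])
    return only_old, only_new, changed
```

With two builds kept in separate directories (the paths here are placeholders), `diff_ldd(ldd("build_old/word2vec_inner.so"), ldd("build_new/word2vec_inner.so"))` would list any libraries that appear in only one build or resolve to different paths.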
