Compiling a huge cython file

Shalom Rav

May 15, 2011, 8:51:45 PM
to cython-users
Suppose that one has 1000s of different 'pyx' files. How practical
would it be to 'unite' all of them into one BIG '.so' file?

I am talking about a 'pyx' file that starts like this:
______________________________________
import numpy
cimport numpy
cimport cython
include "module1.pyx"
include "module2.pyx"
include "module3.pyx"
...................................
...................................
include "module300000.pyx"
______________________________________

Is it at all realistic / beneficial? can anyone approximate how long
it would take to compile such a file?

Thanks.

Glenn Hutchings

May 16, 2011, 3:31:57 AM
to cython...@googlegroups.com
On Monday, 16 May 2011 01:51:45 UTC+1, Shalom Rav wrote:
> Is it at all realistic / beneficial? can anyone approximate how long
> it would take to compile such a file?

One problem is that if you change a single line of one file, you'll have to recompile everything again.  With separate files, it's only the one that changed.
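
As a rough illustration of the "only the one that changed" point, here is a minimal sketch that lists which sources would need rebuilding, by comparing modification times. The function name `stale_sources` is hypothetical, and the sketch assumes each compiled file sits next to its source as `name.so`; real builds usually produce platform-tagged names, so the suffix would need adjusting.

```python
from pathlib import Path

def stale_sources(src_dir, ext_suffix=".so"):
    """Return .pyx files whose compiled file is missing or older than the source.

    Assumption: the compiled file for foo.pyx is foo<ext_suffix> in the same directory.
    """
    stale = []
    for pyx in sorted(Path(src_dir).glob("*.pyx")):
        built = pyx.with_suffix(ext_suffix)
        if not built.exists() or built.stat().st_mtime < pyx.stat().st_mtime:
            stale.append(pyx)
    return stale
```

With one file per module, editing a single source puts only that file in the list; with everything included into one big file, any edit makes the whole build stale.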

Robert Bradshaw

May 16, 2011, 11:41:31 AM
to cython...@googlegroups.com

300000 times as long as compiling a single file at best, probably a
lot longer (one heuristic is that C compilers are somewhere between
linear and quadratic in the time they take to compile code), and many a
C compiler would have trouble compiling such a large .c output file.
So, not realistic or beneficial--why would you want to do this rather
than write modular code in several .so files?
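
To put numbers on the linear-versus-quadratic heuristic, here is a back-of-the-envelope sketch. The one-second single-file compile time is purely an assumed figure for illustration, not a measurement from this thread, and `combined_compile_time` is a hypothetical helper.

```python
def combined_compile_time(n_files, single_file_seconds, exponent):
    """Estimate compile time for n_files merged into one translation unit.

    exponent=1 models a compiler that is linear in code size,
    exponent=2 one that is quadratic.
    """
    return single_file_seconds * n_files ** exponent

n = 300_000
t1 = 1.0  # ASSUMED: one second to compile a single file (illustrative only)

linear = combined_compile_time(n, t1, 1)
quadratic = combined_compile_time(n, t1, 2)
print(f"linear:    {linear / 86_400:.1f} days")
print(f"quadratic: {quadratic / (86_400 * 365):.0f} years")
```

Under that assumption the linear case already takes days, and the quadratic case is in the thousands of years, which is why the single huge file is not realistic.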

- Robert

Shalom Rav

May 16, 2011, 12:00:13 PM
to cython-users
Robert,

Here's my situation: I will have a need to use (from within one python
script) up to 10000s of individual '.so' modules.

I have no problem compiling & importing each one, yet, have been told
that python will take a lot of time to load the given number (10000s)
of imports.

Hence, I thought it might be beneficial to 'unite' many '.so' files
together into one BIG '.so' (or perhaps, several BIG '.so' files) --
this way, python will only have 3-4 '.so' files to load.

Do you have any advice on how to go about it in the best way? thanks.

Robert Bradshaw

May 16, 2011, 12:59:26 PM
to cython...@googlegroups.com
On Mon, May 16, 2011 at 9:00 AM, Shalom Rav <csharppl...@gmail.com> wrote:
> Robert,
>
> Here's my situation: I will have a need to use (from within one python
> script) up to 10000s of individual '.so' modules.
>
> I have no problem compiling & importing each one, yet, have been told
> that python will take a lot of time to load the given number (10000s)
> of imports.

It will also take a non-trivial amount of time to load multi-GB .so
files, not to mention that you can't load only part of it if you only
need part of it.

> Hence, I thought it might be beneficial to 'unite' many '.so' files
> together into one BIG '.so' (or perhaps, several BIG '.so' files) --
> this way, python will only have 3-4 '.so' files to load.
>
> Do you have any advice on how to go about it in the best way? thanks.

Premature optimization is the root of all evil. I'd write whatever
you're trying to write naturally and then worry about something like
this only if it becomes a problem. Unless your modules are absolutely
trivial, I'd imagine you'll spend much less time importing each one
than executing each one. Either you actually use all of them in one
sitting (in which case the total runtime will be huge compared to the
importing (seconds?)), or you could only import the one(s) you need
right before you use them (with the obvious runtime and memory
benefits).
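
A minimal sketch of the "import right before use" idea, assuming modules are looked up by name at call time. `call_lazily` is a hypothetical helper, and `math` stands in for a compiled extension module here.

```python
import importlib

def call_lazily(module_name, func_name, *args, **kwargs):
    """Import module_name only when it is first needed, then call func_name."""
    # import_module returns the cached entry from sys.modules on later calls,
    # so each module pays its load cost at most once.
    module = importlib.import_module(module_name)
    return getattr(module, func_name)(*args, **kwargs)

# Standard-library stand-in for one of the compiled modules.
result = call_lazily("math", "sqrt", 16.0)
```

Modules that are never requested are never loaded, so startup cost and memory scale with what is used rather than with what exists on disk.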

- Robert

Shalom Rav

May 16, 2011, 10:22:56 PM
to cython-users
Robert,

I did some simple benchmarking for several of my '.so' modules
(generated by cython). Here's what I got:
______________________________________
'.so' import took: 0.00460481643677 [sec]
function call took: 1.59740447998e-05 [sec]
______________________________________

Obviously, it takes much more time to import the module than to
execute the function. Given the time it takes to load modules,
assuming linear behavior, it would require a wait of ~50[sec] to
import 10,000 modules.
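
For reference, a measurement like the one above can be sketched as follows, together with the extrapolation. `time_import_and_call` is a hypothetical helper, not the script used for the numbers in this post; the only figure taken from the post is the per-import time.

```python
import importlib
import time

def time_import_and_call(module_name, func_name, *args):
    """Return (import_seconds, call_seconds) for one module and one function.

    Note: if the module is already in sys.modules, the import time only
    reflects a cache lookup, so measure in a fresh interpreter.
    """
    start = time.perf_counter()
    module = importlib.import_module(module_name)
    import_seconds = time.perf_counter() - start

    func = getattr(module, func_name)
    start = time.perf_counter()
    func(*args)
    call_seconds = time.perf_counter() - start
    return import_seconds, call_seconds

# Extrapolation from the reported figure, assuming imports scale linearly.
per_import = 0.00460481643677          # seconds, from the benchmark above
total_for_10000 = per_import * 10_000  # about 46 seconds
```

The ~50 [sec] estimate is this product rounded up; whether scaling is really linear with 10,000 modules is an assumption, not something measured here.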

Is there anything that can be done to speed-up the time it takes to
import? (without pre-importing all modules ahead of time...)


Robert Bradshaw

May 17, 2011, 1:49:25 AM
to cython...@googlegroups.com
On Mon, May 16, 2011 at 7:22 PM, Shalom Rav <csharppl...@gmail.com> wrote:
> Robert,
>
> I did some simple benchmarking for several of my '.so' modules
> (generated by cython). Here's what I got:
> ______________________________________
> '.so' import took: 0.00460481643677 [sec]
> function call took: 1.59740447998e-05 [sec]
> ______________________________________
>
> Obviously, it takes much more time to import the module than to
> execute the function. Given the time it takes to load modules,
> assuming linear behavior, it would require a wait of ~50[sec] to
> import 10,000 modules.
>
> Is there anything that can be done to speed-up the time it takes to
> import? (without pre-importing all modules ahead of time...)

So, to clarify, each module has one function that's only called once?
Atypical, but in that case, maybe it does make sense to consolidate
them into fewer .pyx files (if it makes sense to statically compile so
many different functions rather than have runtime variation). You
haven't really stated what you're trying to accomplish, that would
help.
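
One way to consolidate into a few .pyx files rather than one giant file is to generate several umbrella files, each including a batch of sources. This is only a sketch: `write_umbrella_files`, the `bundle` prefix, and the batch size are hypothetical, and it assumes the listed files can be found from the umbrella file's directory (or the Cython include path) and that no two included files define the same name, since `include` pastes the text into one module.

```python
from pathlib import Path

def write_umbrella_files(pyx_names, out_dir, batch_size, prefix="bundle"):
    """Write umbrella .pyx files that each include up to batch_size sources."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for start in range(0, len(pyx_names), batch_size):
        batch = pyx_names[start:start + batch_size]
        target = out_dir / f"{prefix}{start // batch_size}.pyx"
        # One Cython include statement per source file in this batch.
        lines = [f'include "{name}"' for name in batch]
        target.write_text("\n".join(lines) + "\n")
        written.append(target)
    return written
```

Adding a new source then only forces a rebuild of the bundle it lands in, which trades import count against recompilation cost.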

Shalom Rav

May 17, 2011, 9:13:30 AM
to cython-users
Robert,

Yes, every module has one function that is called once.

I get the impression that consolidation of 'pyx' files does make
sense. Yet, it will force me to re-compile to '.so' even when just one
new 'pyx' file is added.
Instead of doing so, is there a way to take existing cython-generated
'.so' files and simply 'unite' them together in some way?


Yury V. Zaytsev

May 17, 2011, 9:39:26 AM
to cython...@googlegroups.com
On Tue, 2011-05-17 at 06:13 -0700, Shalom Rav wrote:

> Instead of doing so, is there a way to take existing cython-generated
> '.so' files and simply 'unite' them together in some way?

In theory you can compile them as static libraries (.a files) and then
link them in one big .so. But it seems to me that you are addressing the
problem backwards.

Maybe you should describe why you have such a weird collection of
functions in the first place and what you want to achieve, then it could
turn out that a totally different approach is more practical.

--
Sincerely yours,
Yury V. Zaytsev


Dag Sverre Seljebotn

May 18, 2011, 7:30:57 AM
to cython...@googlegroups.com
On 05/17/2011 03:13 PM, Shalom Rav wrote:
> Yes, every module has one function that is called once.

But then why isn't using pure Python faster, since that skips Cython and
gcc compilation time? Do you mean the Python process is started
thousands of times, each of which calls each function once?

> Instead of doing so, is there a way to take existing cython-generated
> '.so' files and simply 'unite' them together in some way?

This is not very well supported. It is not technically impossible, but
involves some ugly hacks etc., since Cython doesn't directly support it.

Keep in mind that compilation has a large constant-time overhead.
Depending on your code, perhaps recompiling 1000 pyx files each time one
changes isn't too bad *shrug*.

To get sensible answers I really think you need to provide more
information about what you attempt to do. Your use case appears very
strange to most of us.

Dag Sverre
