Re: [cython-users] embedding many cython modules in C++ project

Gabriel Jacobo

Jul 31, 2012, 4:03:32 PM
to cython...@googlegroups.com
2012/7/31 jcopeland <mon...@gmail.com>
> I'm looking for thoughts on my approach to see if any of you cython wizards have dealt with this before.
>
> I've got a C++ base project and I'd like to be able to write portions of the project in python/cython. This is primarily to speed development but also to take advantage of speed enhancements and the slight code obfuscation offered by translating python code to C and compiling.
>
> So currently I've got a makefile that identifies all pyx/py files and runs cython on them to generate .cpp files. The compiler is run against the .cpp files to generate .o files which are then linked with the larger project.
>
> From experimentation and looking at the code generated when using the --embed option I've determined the general flow (less error checking, conditional compilation, etc) to run my cython code is Py_Initialize -> initMODULENAME -> call whatever cdef'd functions of interest -> Py_Finalize when all done. To make it easier for the C++ coder I'm thinking I'll have a single function to call and init things, and then a single function to take care of cleanup and error checking.
>
> Here's the problem. This works well for a single module but the issue I'm grappling with now is how to deal with scaling this to numerous modules. I've found that any module that is included by any python function must also have its initMODULENAME function called before calling my functions.
>
> I've started looking at auto-generating some code so that the proper init functions get called either directly after Py_Initialize or before Py_Initialize using either PyImport_AppendInittab() or PyImport_ExtendInittab().
>
> Am I going about this the right way? Any thoughts on a better approach?


I'm doing something like that in my game engine: I'm statically linking cythonized modules together with the Python interpreter and C/C++ code. You may want to take a look at the source code; here's the utility that puts it together: https://bitbucket.org/gabomdq/ignifuga/src/b39b0e131f74/tools/schafer.py

In short, the utility goes through the source files and flattens the directory structure, so for example backends/sdl/Renderer.pyx becomes backends+sdl+Renderer.pyx (this solves the issue of modules in different subdirectories having the same name). A few search/replaces are done for this (changing the names of the Cython-generated init functions), and then the script generates a "glue" file like the example at the end of this message. This also requires that each folder contains an __init__.py file that you cythonize as well, and which must include a __path__.
For a statically linked Python 2.7.2 interpreter, you also have to apply a small patch to the import.c file so that these added builtin modules are found.
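
The flattening step can be sketched in Python roughly as follows. This is not taken from schafer.py: the helper name `flatten_module_path` is hypothetical, the `+` separator follows the example in this message, and the `init<name>` symbol format is an assumption inferred from the Python 2 glue code at the end of this message.

```python
from pathlib import PurePosixPath


def flatten_module_path(source_path, separator="+"):
    """Return (flat_filename, dotted_name, init_symbol) for a source file.

    Hypothetical helper for illustration only. The init symbol format
    ("init" + path parts joined by "_") is an assumption based on the
    glue example in this thread, not a documented Cython rule.
    """
    path = PurePosixPath(source_path)
    # Directory components plus the file name without its extension.
    parts = list(path.parent.parts) + [path.stem]
    flat_filename = separator.join(parts) + path.suffix
    dotted_name = ".".join(parts)
    init_symbol = "init" + "_".join(parts)
    return flat_filename, dotted_name, init_symbol


print(flatten_module_path("backends/sdl/Renderer.pyx"))
```

The dotted name is what would be registered with the interpreter, while the flat file name and init symbol keep two same-named modules in different directories from colliding at link time.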

I hope some of this gives you a better idea of what you are up against.

An example of the glue code:

#include "Python.h"
static PyMethodDef nomethods[] = {  {NULL, NULL}};
extern void initignifuga(void);
extern void initignifuga_Scene(void);
extern void initignifuga_Task(void);
extern void initignifuga_Entity(void);
extern void initignifuga_Gilbert(void);
...

PyMODINIT_FUNC
initignifuga(void) {
    PyObject* module;
    PyObject* __path__;

    // Add a __path__ attribute so Python knows that this is a package
    PyObject* package_ignifuga = PyImport_AddModule("ignifuga");
    Py_InitModule("ignifuga", nomethods);
    __path__ = PyList_New(1);
    PyList_SetItem(__path__, 0, PyString_FromString("ignifuga"));
    PyModule_AddObject(package_ignifuga, "__path__", __path__);

    // Register the package and its submodules under their dotted names
    PyImport_AppendInittab("ignifuga", initignifuga);
    PyImport_AppendInittab("ignifuga.Scene", initignifuga_Scene);
    PyImport_AppendInittab("ignifuga.Task", initignifuga_Task);
    PyImport_AppendInittab("ignifuga.Entity", initignifuga_Entity);
    PyImport_AppendInittab("ignifuga.Gilbert", initignifuga_Gilbert);
    ...
}

--
Gabriel.

Stefan Behnel

Aug 1, 2012, 1:14:08 AM
to cython...@googlegroups.com
jcopeland, 31.07.2012 20:57:
> I'm looking for thoughts on my approach to see if any of you cython wizards
> have dealt with this before.
>
> I've got a C++ base project and I'd like to be able to write portions of
> the project in python/cython. This is primarily to speed development but
> also to take advantage of speed enhancements and the slight code
> obfuscation offered by translating python code to C and compiling.
>
> So currently I've got a makefile that identifies all pyx/py files and runs
> cython on them to generate .cpp files. The compiler is run against the
> .cpp files to generate .o files which are then linked with the larger
> project.
>
> From experimentation and looking at the code generated when using the
> --embed option I've determined the general flow(less error checking,
> conditional compilation, etc) to run my cython code is Py_Initialize ->
> initMODULENAME -> call whatever cdef'd functions of interest -> Py_Finalize
> when all done. To make it easier for the C++ coder I'm thinking I'll have
> a single function to call and init things, and then a single function to
> take care of cleanup and error checking.
>
> Here's the problem. This works well for a single module but the issue I'm
> grappling with now is how to deal with scaling this to numerous modules.
> I've found that any module that is included by any python function must
> also have its initMODULENAME function called before calling my functions.

There's a "cython_freeze" script in the bin/ directory that can generate an
init function for a set of Cython modules.

Stefan

jcopeland

Aug 1, 2012, 11:11:45 AM
to cython...@googlegroups.com, stef...@behnel.de
Stefan, thanks for that. I'd found cython_freeze earlier but quickly moved past it, mistakenly thinking it was the code generator for --embed. Looking more closely, it seems to be pretty close to what I need. The only difference is that instead of running a single script as main (or py_main), I want to just initialize the Python interpreter, call a bunch of my own cdef'ed/C functions to do interesting things, return data, etc., and then later finalize the interpreter. It looks like hacking cython_freeze would work pretty well for my simple case.

I'm still interested in the applicable part of Gabriel's schafer tool. It seems to do some name mangling on package modules to prevent issues with identically named modules at different levels in the directory structure, which would probably be useful in cases less simple than the one I'm dealing with right now.

Gabriel, one thing I haven't yet figured out is why the __init__.py packages need the __path__ set for embedding. I'm not so familiar with putting things in packages and would like to understand this better. Also, FYI, I had a relatively clean VM set up with Ubuntu 12.04 64-bit, so I attempted a linux64 build with your tool to see things in action, as well as the -demo. I ran into build problems, with it complaining about two missing libraries, -lgccp and -lgc. I have all dependencies installed (-D) without issue, and my google-fu was not enough to figure out what I may be missing.

Anyone else, keep the thoughts coming!

Bradley Froehle

Aug 1, 2012, 11:30:30 AM
to cython...@googlegroups.com, stef...@behnel.de
Try `apt-get install libgc1c2`.

Gabriel Jacobo

Aug 1, 2012, 11:59:50 AM
to cython...@googlegroups.com
As already mentioned, you have to install the libgc1c2 package. I've added it to the dependencies Schafer installs; thanks for the report!

As to the __init__.py files, this is related to the way Python searches for packages, and IIRC it's required for the case where you import a module from a different subpackage. Suppose you have:

myapp/a/ModuleA.py
myapp/b/ModuleB.py

If you want to do "from myapp.a.ModuleA import something" in ModuleB.py, you need those __init__.py files and the __path__ attribute in them. There's an alternative that involves more glue code for setting things up, but I found this approach to be the simplest one in my case.
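
A small Python 3 illustration of why the __path__ attribute matters: the import system only searches for submodules of a module that carries __path__. This is a sketch, not part of Schafer; the module name `demo_parent` and the helper `import_child_error` are made up for the demo, and the exact error wording is that of Python 3 (as far as I know Python 2.7 applies the same package rule with different messages).

```python
import importlib
import sys
import types


def import_child_error(parent_has_path):
    """Try to import 'demo_parent.child' and return the error message.

    'demo_parent' is a throwaway in-memory module used only for this demo;
    no 'child' submodule exists, so the import always fails. What differs
    is the reason for the failure.
    """
    parent = types.ModuleType("demo_parent")
    if parent_has_path:
        parent.__path__ = []  # marks the module as a package
    sys.modules["demo_parent"] = parent
    try:
        importlib.import_module("demo_parent.child")
    except ModuleNotFoundError as exc:
        return str(exc)
    finally:
        del sys.modules["demo_parent"]
    return None


print(import_child_error(parent_has_path=False))
print(import_child_error(parent_has_path=True))
```

Without __path__ the import is rejected immediately because the parent is not a package; with __path__ the finders are at least asked for the submodule, which is the precondition for any builtin or custom finder to supply it.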

-- 
Gabriel.

jcopeland

Aug 4, 2012, 4:03:42 PM
to cython...@googlegroups.com
So first the successes, and then I'll explain my current struggle...

I've had more of a chance to mess around with things. I got things working using a modified version of cython_freeze against a set of modules in a directory. My modifications were just slicing up the existing cython_freeze so that instead of generating a main() function, it generates an initialize function, a close function, and optionally a main function (which makes use of the initialize and close routines). The resulting code ends up looking like this when run against 2 modules (main_w and pysrc):

#include <Python.h>
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>

# define MODINIT(name)  init ## name

PyMODINIT_FUNC MODINIT(main_w) (void);
PyMODINIT_FUNC MODINIT(pysrc) (void);

static struct _inittab inittab[] = {
    {"main_w", MODINIT(main_w)},
    {"pysrc", MODINIT(pysrc)},
    {NULL, NULL}
};

extern int __pyx_module_is_main_main_w;

void Python_Initialize_Session() {
    if (PyImport_ExtendInittab(inittab)) {
        fprintf(stderr, "No memory\n");
        exit(1);
    }
    Py_Initialize();
}

int Python_Close_Session() {
    int r = 0;
    if (PyErr_Occurred()) {
        r = 1;
        PyErr_Print(); /* This exits with the right code if SystemExit. */
        if (Py_FlushLine()) PyErr_Clear();
    }
    Py_Finalize();
    return r;
}
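
For illustration, the table part of the listing above could be generated by something like the following Python sketch. It is not the actual cython_freeze source: `render_inittab` is a hypothetical helper, it assumes flat module names that are valid C identifiers (no packages), and it emits Python 2 style `init<name>` entry points as in the listing.

```python
def render_inittab(module_names):
    """Render C source declaring an _inittab table for `module_names`.

    Hypothetical generator for illustration; it mirrors the Python 2
    style listing in this message and does not handle dotted names.
    """
    lines = ["#include <Python.h>", ""]
    # Forward declarations of the Cython-generated init functions.
    for name in module_names:
        lines.append("PyMODINIT_FUNC init%s(void);" % name)
    lines.append("")
    lines.append("static struct _inittab inittab[] = {")
    for name in module_names:
        lines.append('    {"%s", init%s},' % (name, name))
    lines.append("    {NULL, NULL}")  # sentinel entry terminating the table
    lines.append("};")
    return "\n".join(lines) + "\n"


print(render_inittab(["main_w", "pysrc"]))
```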

Again, the intent is to be able to embed compiled Python in a C project but allow more flexibility than just running a main file. In this case, in C one could do something like this:

    Python_Initialize_Session();
    initpysrc(); // This is essentially the same as "import pysrc"
    // Call whatever "cdef public" functions you want from pysrc and do processing
    Python_Close_Session(); // When all done



This setup seems like it would work quite well for a set of modules in a single directory, but it doesn't really have the ability to deal with sub-directories containing packages (correct me if I'm wrong). So now I'm pursuing something more along the lines of Gabriel's approach and looking to do some name mangling to get things to work with packages.

Gabriel, I've had a go at hacking up the schafer tool you referenced in an earlier post. I've got it pulling all desired files, name mangling, and putting the resulting cpp files in a single build/<arch>/cython_src/ directory. What I'm struggling with is how to integrate the resulting <PKG>_glue.c file (with its associated init<PKG>() function) so that I can import and use that package elsewhere in my project.

I've tried many things, all without success. First I tried calling the function prior to my Py_Initialize() call, but that resulted in segfaults, since you're making use of Python API calls in the function. I tried calling it after Py_Initialize(), which avoided the segfault, but I can't seem to get any of my other code to import any modules within the package.

I also tried hooking the init<PKG> call into my cython_freeze stuff so that the pkg would get mapped to the _glue init function when I did the PyImport_ExtendInittab call before Py_Initialize.

I'm thinking that the issues are either related to some misunderstanding I have about the __path__ stuff in the __init__.py of the packages, or that there is something I don't have because I'm not using your patched Python interpreter. On the __path__ stuff, I've mirrored what you did in your project. I'm still looking into your patches to the interpreter. You said before that you only needed the patch to import.c when dealing with a statically compiled Python. Right now I'm still doing dynamic linking, but perhaps the patch is still needed for something being done in the glue code.

I'll be working out a minimal example project to post, but wanted to see if anyone had any ideas.

jcopeland

Aug 4, 2012, 6:28:11 PM
to cython...@googlegroups.com


With further investigation I found a blog article (http://mdqinc.com/blog/2011/08/statically-linking-python-with-cython-generated-modules-and-packages/, probably yours, Gabriel) that gave a little more insight into the patch to Python's import.c. I found that this patch was needed even when dynamically linking to Python: I guess that modules added to the inittab are treated as builtins and so suffer from the same shallow module search issue.

That gets me moving at least, but it does make me wonder whether there is another way to get package imports working when embedding Cython-compiled Python scripts in a C project without requiring a custom Python build.

Gabriel Jacobo

Aug 4, 2012, 7:27:28 PM
to cython...@googlegroups.com
That blog post is indeed mine. I believe you can achieve what you want by doing a 1:1 mapping of the pyx module structure to .so files...

a/Module1.pyx
b/Module2.pyx

becomes...

a/Module1.so
b/Module2.so

And you just import using the standard Python functionality. The problem starts when you want to embed packages (not single modules, but packages where the hierarchy is important), because the builtin module plumbing in Python (at least 2.x) doesn't seem to be aware of hierarchies and just assumes a flat structure. So, if you want to use the standard interpreter without any changes, keep each compiled module as a separate .so file and it should work fine.

--
Gabriel.

dgym....@gmail.com

Nov 11, 2016, 2:43:33 PM
to cython-users
Sorry to resurrect this thread, but I was trying to statically link packages with Python 3.5.2 and the posts here were extremely helpful; I just needed one more step.

As above, I edited the Cython-generated C files to give each a uniquely named init function, and changed the name each registers with to include the full path.
I also used PyImport_AppendInittab to register the init functions, but did so before calling Py_InitializeEx() as per the Python documentation.

At this stage the modules were visible in sys.builtin_module_names, but attempting to import them would fail. It seems that the builtin module loader doesn't work quite how I want it to, so I added a custom import hook that did.

My complete code ended up looking like this:

int main(int argc, char** argv) {
    // Use PyImport_ExtendInittab or PyImport_AppendInittab to register
    // the additional builtins first.
    PyImport_ExtendInittab(builtins);

    // Then initialize Python.
    Py_InitializeEx(0);

    // Then add an import hook to handle the builtins.
    PyRun_SimpleString(
        "import importlib.abc\n" \
        "import importlib.machinery\n" \
        "import sys\n" \
        "\n" \
        "\n" \
        "class Finder(importlib.abc.MetaPathFinder):\n" \
        "    def find_spec(self, fullname, path, target=None):\n" \
        "        if fullname in sys.builtin_module_names:\n" \
        "            return importlib.machinery.ModuleSpec(\n" \
        "                fullname,\n" \
        "                importlib.machinery.BuiltinImporter,\n" \
        "            )\n" \
        "\n" \
        "\n" \
        "sys.meta_path.append(Finder())\n" \
    );

    // Now everything should work, carry on as normal...

    return 0;
}
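
The hook embedded in the C string above can also be exercised on its own in a stock Python 3 interpreter, which may help when debugging it. This is only a sketch: a stock interpreter has no dotted builtin names, so it checks the finder against `sys` (which is always builtin), and `not_a_builtin_module` and the path value are placeholders.

```python
import importlib.abc
import importlib.machinery
import sys


class Finder(importlib.abc.MetaPathFinder):
    """Resolve any name listed in sys.builtin_module_names, dotted or not."""

    def find_spec(self, fullname, path, target=None):
        if fullname in sys.builtin_module_names:
            return importlib.machinery.ModuleSpec(
                fullname,
                importlib.machinery.BuiltinImporter,
            )
        return None  # let the other finders on sys.meta_path try


finder = Finder()
# A non-None path mimics what the import system passes for a submodule
# import (the parent package's __path__); the finder ignores it.
spec = finder.find_spec("sys", ["some/package/path"])
print(spec.name, spec.loader)
print(finder.find_spec("not_a_builtin_module", None))
```

The point of the design is that the decision depends only on the full dotted name being registered as a builtin, not on where the parent package claims to live.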


Thank you for all the help getting this working.
