At PyCon, Mark and I talked with David Malcolm, the author of the
Python plugin for gcc [1]. It lets you very easily hook your own
Python script into the compilation pipeline of the gcc already
installed on your system. I.e., given that gccpython.so is built, one
can do

    LD_PRELOAD=python27.so gcc -fplugin=/path/to/gccpython.so \
        -fplugin-arg-python-script=myscript.py -DMYFLAG=1 -o foo.o -c foo.c

Then, given myscript.py (example in footnote [2]), one gets direct
access to the internal AST of gcc.
An obvious use case is to parse C header files and generate pxd files
from them. One simply creates a .c file that includes a given header
file, then runs the above command with a Python script that walks the
abstract syntax tree and emits the pxd file.
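
To make the first half of that concrete, here is a minimal sketch of a
driver that writes the one-line .c file and assembles the gcc command
line from the flags shown above. The function names and the stub.c /
stub.o file names are made up for illustration, and the LD_PRELOAD
setting from the command above is left out; it would have to be set in
the environment when the command is actually run.

```python
import os
import tempfile


def write_stub(header, directory):
    """Write a one-line .c file that includes `header`; return its path."""
    stub_path = os.path.join(directory, "stub.c")  # hypothetical file name
    with open(stub_path, "w") as f:
        f.write('#include "%s"\n' % header)
    return stub_path


def plugin_command(stub_path, plugin_path, script_path):
    """Build the gcc argument list, using only the flags quoted in this mail."""
    object_path = os.path.splitext(stub_path)[0] + ".o"
    return [
        "gcc",
        "-fplugin=" + plugin_path,
        "-fplugin-arg-python-script=" + script_path,
        "-o", object_path,
        "-c", stub_path,
    ]


if __name__ == "__main__":
    # Only prints the command; nothing is compiled here.
    workdir = tempfile.mkdtemp()
    stub = write_stub("foo.h", workdir)
    print(" ".join(plugin_command(stub, "/path/to/gccpython.so", "myscript.py")))
```

The resulting list could be handed to subprocess once gccpython.so and
the script that emits the pxd file exist.
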
Existing work on auto-generation of pxd files has mostly used gccxml,
which is a pain to compile and rather unsupported, if my understanding
is correct. This approach is much more elegant, and will often work
with the gcc the user already has installed (even though an extension
module must be compiled, and the GCC development headers must be
present on the system).
For prior work, there is already cwrap,
https://github.com/enthought/cwrap
which separates the backend and the frontend. So one could write a
gcc-python-plugin frontend in addition to the current gccxml frontend.
OTOH, I worry that this project is a tad over-engineered, and I
wouldn't oppose something that just uses gcc-python-plugin and emits
declarations directly as strings.
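
As a rough sketch of what "emits declarations directly as strings"
could look like: the function below formats a list of function
signatures into a pxd extern block. The (return_type, name, arg_types)
tuples are a hypothetical stand-in for whatever the plugin script would
pull out of the gcc tree; nothing here touches the gcc module itself.

```python
def emit_pxd(header, functions):
    """Return pxd text for `functions`.

    `functions` is a list of (return_type, name, arg_types) tuples, an
    assumed intermediate form made up for this sketch.
    """
    lines = ['cdef extern from "%s":' % header]
    if not functions:
        # An extern block needs a body even when nothing is declared.
        lines.append("    pass")
    for return_type, name, arg_types in functions:
        lines.append("    %s %s(%s)" % (return_type, name, ", ".join(arg_types)))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(emit_pxd("foo.h", [("int", "add", ["int", "int"]),
                             ("double", "scale", ["double", "double"])]))
```

Structs, typedefs and enums would need their own formatting, which is
where most of the real work in such a project would be.
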
I think this makes for a *great* project for a GSoC student, or anyone
else. It is something that Cython users are really craving, though one
doesn't need to dig into the dirty internals of Cython. I'd be happy to
be a GSoC mentor for this.
I sat down with David Malcolm and made sure I could make this do what I
wanted to do, so I may be able to provide further details to anyone
interested.
Dag
[1]
http://gcc-python-plugin.readthedocs.org/en/latest/index.html
https://fedorahosted.org/gcc-python-plugin/
[2]
# This script is adapted from another use case, so not everything makes
# immediate sense, but it shows how easy it is to get access to the
# guts of the GCC AST.
import gcc
from gccutils import get_src_for_loc, cfg_to_dot, invoke_dot

def on_pass_execution(p, fn):
    if p.name == '*free_lang_data':
        # The '*free_lang_data' pass is called once, rather than
        # per-function, and occurs immediately after
        # "*build_cgraph_edges", which is the pass that initially
        # builds the callgraph.
        #
        # So at this point we're likely to get a good view of the
        # callgraph before further optimization passes manipulate it.
        for u in gcc.get_translation_units():
            for decl in u.block.vars:  # vars means decls
                print('%r:%s:%s' % (decl.location, decl, decl.type))

gcc.register_callback(gcc.PLUGIN_PASS_EXECUTION,
                      on_pass_execution)

I've heard clang has great bindings too, especially for C++, but it is
of course not as commonly used as gcc.