python-based idl parser / codegenerator


lkcl

Jul 10, 2013, 7:14:32 PM
to blin...@chromium.org
http://git.savannah.gnu.org/cgit/pythonwebkit.git/tree/pywebkitgtk/wkcodegen?h=python_codegen

i notice that people are beginning work on a stub version of a python-based codegenerator for blink.  the above link already contains one - something i thought people might like to be aware of before doing all that work.

it's based on... don't laugh: mozilla's xpidl and gobject's codegen.  xpidl i cut/pasted and then modified to comprehend the webkit IDL syntax (which was easy to do).  codegen i then modified to comprehend xpidl's AST.  that joined the two disparate bits of code together.  i then modified codegen further to comprehend the webkit datatypes (mostly this was in argtypes.py), and the rest of the work was in modifying it to output a python-based module.

the last bit was obviously specific to the task being done: to provide direct python bindings to webkit.  however, it should not be a difficult task to add in alternatives that cover other programming languages.

to give you some idea of how easy this was, i'm also the author of the gobject webkit bindings.  those took me six to eight weeks.  by contrast, creating the python webkit bindings *including* the above work to create *from scratch* the entire wkcodegen mash-up took only eight DAYS.  much of that time was spent in other areas such as modifying GNUmakefile.am to create the python webkit module.

the only dependency: python-ply.
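the grammar-driven approach that PLY automates can be illustrated without PLY itself. below is a minimal stdlib-only sketch of tokenising and parsing a toy IDL fragment - the token set and AST shape are invented for illustration, not wkcodegen's actual ones:

```python
# A minimal, stdlib-only sketch of the grammar-driven approach PLY automates:
# tokenise a toy IDL fragment, then parse it into a little AST.  The token
# set and AST shape here are invented, not wkcodegen's actual ones.
import re

TOKEN_RE = re.compile(r'\s*(?:(interface|attribute|readonly)\b'
                      r'|([A-Za-z_]\w*)|([{};()])|$)')

def tokenize(src):
    """Split a tiny IDL fragment into (kind, value) tokens."""
    tokens, pos = [], 0
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if m is None or m.end() == pos:
            raise SyntaxError('bad input at %r' % src[pos:pos + 10])
        keyword, ident, punct = m.groups()
        if keyword:
            tokens.append(('KEYWORD', keyword))
        elif ident:
            tokens.append(('IDENT', ident))
        elif punct:
            tokens.append(('PUNCT', punct))
        pos = m.end()
    return tokens

def parse_interface(tokens):
    """Parse 'interface Name { [readonly] attribute Type name; ... }'."""
    it = iter(tokens)
    assert next(it) == ('KEYWORD', 'interface')
    kind, name = next(it)
    assert kind == 'IDENT'
    assert next(it) == ('PUNCT', '{')
    members = []
    tok = next(it)
    while tok != ('PUNCT', '}'):
        readonly = (tok == ('KEYWORD', 'readonly'))
        if readonly:
            tok = next(it)
        assert tok == ('KEYWORD', 'attribute')
        _, attr_type = next(it)
        _, attr_name = next(it)
        assert next(it) == ('PUNCT', ';')
        members.append({'type': attr_type, 'name': attr_name,
                        'readonly': readonly})
        tok = next(it)
    return {'interface': name, 'members': members}
```

PLY replaces the hand-written loop above with declarative token and grammar rules, but the output - a plain AST the code generator walks - is the same idea.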

copyright on the modified code has been assigned to the FSF.  Licenses vary depending on the original authors [of xpidl and codegen].

l.

Kentaro Hara

Jul 10, 2013, 7:46:34 PM
to lkcl, ko...@chromium.org, Nils Barth, blink-dev
Thanks for the info!

We're also working on implementing an IDL parser and code generator in Python. Yesterday we landed a stub to trunk [1].

Regarding the IDL parser, we're using Pepper's IDL parser [2] with a couple of Blink-specific tweaks. nbarth@ has already completed the new IDL parser in his local branch and I think we can land it next week.

Regarding the code generator, we're using the Jinja template engine [3]. See this CL [4] for what the template-based code generator looks like. kojih@ is implementing the new code generator, and it now passes 60% of all IDL files. After landing the new IDL parser, we plan to move IDL files from the old Perl code generator to the new Python code generator incrementally.
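As a rough illustration of template-based code generation, here is a sketch using the stdlib's string.Template as a stand-in for Jinja (which adds loops, filters, and template inheritance). The template text and the C++ it emits are schematic, not the real V8 binding code:

```python
# A rough illustration of template-based code generation, using the stdlib's
# string.Template as a stand-in for Jinja (which adds loops, filters, and
# template inheritance).  The template text and the C++ it emits are
# schematic, not the real V8 binding code.
from string import Template

ATTRIBUTE_GETTER = Template('''\
static void ${name}AttributeGetter(
    const v8::FunctionCallbackInfo<v8::Value>& info) {
  ${cpp_class}* imp = ${cpp_class}::toNative(info.Holder());
  v8SetReturnValue(info, imp->${name}());
}
''')

def generate_getter(interface, attribute):
    """Fill in the getter template from parsed IDL data."""
    return ATTRIBUTE_GETTER.substitute(
        cpp_class='V8' + interface, name=attribute)

print(generate_getter('Node', 'nodeName'))
```

The advantage over string-concatenation codegen (as in the Perl compiler) is that the emitted C++ is visible as a contiguous block, so reviewers can read the template rather than reconstructing the output in their heads.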

If you're interested in the IDL compiler work in Blink, you can follow this meta bug [5].




--
Kentaro Hara, Tokyo, Japan

Nils Barth

Jul 10, 2013, 10:38:13 PM
to lkcl, Kentaro Hara, Koji Hara, Noel Allen, blink-dev
Hi Luke,
Thanks so much for the info!

I don't think we can effectively reuse your code (or even could have had we known about it earlier), and we have in fact already done most of the work, but your description of what you're doing is very useful.
(Perhaps WebKit could use your parser or code generator approach for all bindings?)


(Outline of what we're doing below; for updates follow:
Issue 239771: Rewrite the IDL compiler in Python.)

We're in fact taking a very similar approach: we're also using PLY (Python Lex-Yacc) for the parser/frontend, and a template engine, in our case Jinja, for the code generator/backend.

These libraries make the rewrite clean and straightforward, but actual implementation takes lots of detailed work, similar to what you saw:
* complexity of existing V8 C++ bindings
(main complexity, language-specific);
* reading AST
(converting to Python objects as Intermediate Representation; straightforward but long, especially as we want to maintain exact compatibility with Perl during the transition);
* build changes
(GYP rather than make);
* reusing existing Chromium code
(see below).

In terms of code reuse, rather than copy-pasting and modifying existing programs, we're using existing Chromium code as much as possible. We currently have 4 IDL parsers + code generators: Blink, Chromium bindings, Dart, and Pepper – actually 6 if you include old/new Blink and Pepper – and we'd like this to be 1, both to avoid code duplication and for maintainability.

There is already a PLY-based IDL parser in the Chromium trunk, namely the Pepper parser version 2 at src/tools/idl_parser, so for code reuse, the new Blink IDL parser derives from this base one. Ideally they'd in fact be merged, but we probably need a few compatibility tweaks. Thus we get the same result as using (py?)xpidl (a Python Lex-Yacc based parser), without the code duplication.
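A sketch of that derivation pattern (class and method names here are hypothetical, not the actual src/tools/idl_parser API):

```python
# A sketch of the reuse pattern: the Blink parser derives from a shared base
# parser and overrides only the dialect-specific pieces.  Class and method
# names are hypothetical, not the actual src/tools/idl_parser API.
class BaseIDLParser:
    """Shared grammar: understands a plain 'interface Foo {' line."""
    def parse_definition(self, line):
        kind, name = line.split()[:2]
        return {'kind': kind, 'name': name.rstrip('{} '),
                'extended_attributes': {}}

class BlinkIDLParser(BaseIDLParser):
    """Adds Blink's square-bracket extended attributes, e.g. [Custom]."""
    def parse_definition(self, line):
        attrs = {}
        if line.startswith('['):
            attr_text, line = line[1:].split(']', 1)
            attrs = {a.strip(): True for a in attr_text.split(',')}
        node = super().parse_definition(line.strip())
        node['extended_attributes'] = attrs
        return node
```

The base grammar stays in one place; each dialect (Blink, Pepper) only carries its deltas.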

As haraken noted, we've already made the build changes and the parser is complete locally, so we should land that in the next week, and then incrementally land the template-based code generator.

Thanks again for the note!

Best,
Nils

luke.leighton

Jul 16, 2013, 6:13:10 PM
to Nils Barth, Kentaro Hara, Koji Hara, Noel Allen, blink-dev
On Thu, Jul 11, 2013 at 3:38 AM, Nils Barth <nba...@chromium.org> wrote:
> Hi Luke,
> Thanks so much for the info!

no problem - sorry for taking a while to get back to you.

> I don't think we can effectively reuse your code

heck it's not even mine! i just took the best-of-breed proven code
from two disparate projects and spliced them together :)

>(or even could have had we
> known about it earlier), and we in fact already have done most of the work,
> but your description of what you're doing is very useful.
> (Perhaps WebKit could use your parser or code generator approach for all
> bindings?)

yehhh, about that: as many people will know already, the efforts (in i
think it was 2009) to add gobject bindings using the existing
perl-based code generator illustrated very graphically why the
perl-based codegenerator needs to have its arms and legs ripped
off, flung to the furthest corners of the earth and the rest buried
deep at the bottom of the mariana trench.

i don't know if you know the background but the discussion got to
*three hundred and fifty* separate messages on the one bugreport, and
at one time i had over 650 simultaneous vi sessions open - so many
were open in one xterm that i had to do e.g. "jobs | grep Node" in
order to find them.

the complexity was so overwhelming that it caused the webkit
developers to be unable to cope: i simply couldn't explain to them
what the decisions were, and i certainly didn't have time to go back
and "redo" things step-by-step in a way that would allow them to
"follow along" so to speak, as the IDL files are, as you've no doubt
already discovered, horribly, horribly inter-linked.

the bottom line is: they blamed me for the mess, and rather than fix
the problems, one of the developers began what can only be described
as a very carefully arranged vendetta which gave him perfect
justification to ban me from webkit's mailing list and bugtracker.

answer: they could... but they're so embarrassed by their behaviour
that the chances are quite remote. i've done the best i can by
assigning the copyright of the code to the FSF but they'll have to
work things out for themselves.

eric seidel whom i respect enormously unfortunately got caught up in
the cross-fire.

but... anyway, an interesting lesson that i'm glad to see that you're
avoiding by doing a decent redesign.

with that in mind - one thing to consider: given the background and
the stress caused by the existing perl-based design, do you *really*
want to keep it around, even by copying it accidentally by following a
conversion path?


> (Outline of what we're doing below; for updates follow:
> Issue 239771: Rewrite the IDL compiler in Python.)
>
> We're in fact taking a very similar approach: we're also using PLY (Python
> Lex-Yacc) for the parser/frontend,

ply's pretty awesome. but the funny thing is i didn't choose it: the
decision was made by the mozilla foundation when they did xpidl all
those years ago.

> and a template engine, in our case Jinja,
> for the code generator/backend.

if it's got the ability to do pre, middle and post function
generation, as well as return types, and pre- and post- code-generation
on argument types (as distinct from their use as return types), then
you have everything you need here.

the pre- and post- code-generation as associated with variable
"types" is rather important as you need to do type-conversion prior to
using an argument and then clean-up afterwards.

that's what the argtypes.py stuff from gobject's codegen is all
about. one class per "type" - int, float, object, date and so on -
and each "type" takes care of itself.

that then made the main "module" generation - where the functions and
property codegeneration etc. are created - much easier to do.

makes for vastly simpler and manageable code. i really didn't have
to do a vast amount of testing. churned out the right pre/post stuff
for each argtype one by one, then ran it.

testing-wise this stuff is so critically dependent on getting it right
that bugs were, to be honest, blindingly obvious :) miss out even a
single refcount on a pre-gen and you *really* know about it, instantly
:) miss out a single decref on a post-gen and the sheer overwhelming
amount of memory leaked is, again, blindingly obvious.
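roughly, the argtypes pattern looks like this (a sketch only; the
emitted C lines and function names are schematic, not the real
pywebkit binding code):

```python
# A sketch of the argtypes.py pattern: one handler class per IDL "type",
# each emitting its own pre-call conversion and post-call cleanup.  The
# emitted C lines and function names are schematic, not the real pywebkit
# binding code.
class ArgType:
    def write_pre(self, name):
        return []                     # convert/acquire before the call
    def write_post(self, name):
        return []                     # release/clean up after the call

class IntArg(ArgType):
    def write_pre(self, name):
        return ['long c_%s = PyLong_AsLong(py_%s);' % (name, name)]

class ObjectArg(ArgType):
    # the kind of type where a missed ref on the pre side, or a missed
    # unref on the post side, shows up instantly at runtime
    def write_pre(self, name):
        return ['WebKitObject* c_%s = unwrap(py_%s);' % (name, name),
                'webkit_object_ref(c_%s);' % name]
    def write_post(self, name):
        return ['webkit_object_unref(c_%s);' % name]

def generate_call(func, args):
    """args is a list of (name, ArgType); emit pre, call, post in order."""
    lines = []
    for name, argtype in args:
        lines += argtype.write_pre(name)
    lines.append('%s(%s);' % (func, ', '.join('c_' + n for n, _ in args)))
    for name, argtype in args:
        lines += argtype.write_post(name)
    return lines
```

each "type" takes care of itself, so the main module generator just asks every argument for its pre lines, emits the call, then asks for the post lines.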


> These libraries make the rewrite clean and straightforward, but actual
> implementation takes lots of detailed work, similar to what you saw:

... detailed and *very* intense concentration in order to maintain
waaay above-average number of different tasks/issues/languages in your
brain at once...

... i sympathise :)

> * complexity of existing V8 C++ bindings

ahh, it's not so baaaad :) really!

> (main complexity, language-specific);
>
> * reading AST

... here i avoided that effort entirely, by making an
already-existing tried-and-tested and proven codebase comprehend
webkit IDL.

> (converting to Python objects as Intermediate Representation;
> straightforward but long, especially as we want to maintain exact
> compatibility with Perl during the transition);

[ yehh, i invite you to reconsider that decision in light of the
legacy that its design will carry forward, namely: the strain that the
design (or lack thereof) placed on anyone who got involved with it. ]

again, here's where i avoided that effort entirely, by not only
dropping that legacy but also just... re-using that proven codebase.
i think the only changes i needed to do were to make it comprehend the
webkit type-qualifiers (the ones in square brackets).

> * build changes
>
> (GYP rather than make);

oo fuuun :)

> * reusing existing Chromium code
>
> (see below).
>
>
> In terms of code reuse, rather than copy-pasting and modifying existing
> programs, we're using existing Chromium code as much as possible. We
> currently have 4 IDL parsers + code generators: Blink, Chromium bindings,
> Dart, and Pepper – actually 6 if you include old/new Blink and Pepper – and
> we'd like this to be 1, both to avoid code duplication and for
> maintainability.

mmmm.... my guess is that you'll end up getting bitten in the ass if
you try that :)

oh wait.... someone's already added a *completely separate* blink IDL
parser+generator (as separate and distinct from the perl-based
codegenerator one)?

if so, maybe people have learned from that absolute nightmare attempt
to add gobject bindings, after all.

but... yeah, if you're looking to do common code-generation with
different front-ends for each language, and the codebase reflects that
through providing infrastructure and then front-end language-specific
generators, then yeah that's a great idea.



> There already is a PLY-based IDL parser in the Chromium trunk, namely the
> Pepper parser version 2 at src/tools/idl_parser, so for code reuse, the new
> Blink IDL parser derives from this base one. Ideally they'd in fact be
> merged, but we probably need a few compatibility tweaks. Thus we get the
> same result as using (py?)xpidl (a Python Lex-Yacc based parser), without
> the code duplication.

it's the AST that ideally needs to be the same in each. gosh there
appears to have been quite a lot of disparate efforts and a lot of
non-communication going on. hmmm....

> As haraken noted, we've already made the build changes and the parser is
> complete locally, so we should land that in the next week, and then
> incrementally land the template-based code generator.

... bear in mind that the main reason i had such an enormous amount
of difficulty with the webkit team was that this stuff was flat-out
impossible to verify, or even code up, in anything remotely approaching
an "incremental" fashion. it was so inter-dependent that it was very
much an all-or-nothing deal, and verification very much had to be done
at the higher levels. i.e. because it's so low-level, and the code
gets called such an overwhelming number of times, any problems
*instantly* showed up, and so even doing unit tests turned out to be
completely unnecessary.

i mention this because if you are deploying the exact same
ultra-strict code-review procedures that the webkit developers are
deploying, you're going to run into problems.

just... something to watch out for: because of the critical
inter-dependence of the IDL files, and because that inter-dependence
gets naturally reflected into the code, at some point you may end up
with a rather large patch that brings everything together, one that
simply can't be split into smaller chunks in any meaningful way, and
you may end up having to take a leap of faith at a top level that it
just.... "works" :)


> Thanks again for the note!

no problem sah.

/peace

l.

Nils Barth

Jul 23, 2013, 1:44:26 AM
to luke.leighton, Kentaro Hara, Koji Hara, Noel Allen, blink-dev
Thanks for the detailed background Luke!
I'll follow up on more technical points off-list, but will answer a few points of general interest here.

Summary
The Python parser is 100% complete and has landed (r154633); the Python code generator (Koji's) should start landing incrementally this week, and is currently 80% complete in a local branch.

We will not have any technical debt from the Perl compiler per se (it does not constrain the new design), and we will not be keeping it around once we've finished the Python.

However, we are using the Perl compiler in the interim so that we can land the new Python compiler incrementally, avoiding the massive CL/leap of faith problem.

Once that's done, we'll delete the Perl.


Details
with that in mind - one thing to consider: given the background and
the stress caused by the existing perl-based design, do you *really*
want to keep it around, even by copying it accidentally by following a
conversion path? 

We will eliminate the Perl compiler as soon as the conversion to Python is complete; we're not keeping it around.
It's being used for files that can't yet be processed by the Python compiler (e.g., SVG IDLs).

Also, the designs (both front-end parser and back-end code generator) are completely new, very modular, and not constrained by the legacy Perl: data structures strictly follow the Web IDL spec.
haraken's detailed reviewing was very helpful in preventing subtle legacies from creeping into the new code.

We are of course using the hard-won knowledge in the existing Perl to see where the pitfalls are and which hacks we need to reimplement for now, but any hacks are tacked on to the clean design and flagged, and will be removed when fixed.
(There are many, so this will take time.)


> (converting to Python objects as Intermediate Representation;
> straightforward but long, especially as we want to maintain exact
> compatibility with Perl during the transition);

 [ yehh, i invite you to reconsider that decision in light of the
legacy that its design will carry forward, namely: the strain that the
design (or lack thereof) placed on anyone who got involved with it. ]
 
This "compatibility" is just an export function, so we can check that the parsers have the same data. Python uses a different IR than Perl (Python closely follows the spec), so the export code is a bit twisty, but this will all be removed when the Perl compiler is gone, and there will be no technical debt after that.
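A sketch of the export-and-compare idea (structure and field names invented for illustration):

```python
# A sketch of the export-and-compare idea: each parser exports its IR in a
# canonical, key-order-independent form, and the transition harness just
# checks the two exports are identical.  Structure and field names invented.
import json

def canonicalize(ir):
    """Reduce a parser's IR (nested dicts/lists) to a comparable string."""
    return json.dumps(ir, sort_keys=True)

def parsers_agree(perl_ir, python_ir):
    return canonicalize(perl_ir) == canonicalize(python_ir)
```

The export code itself can be twisty, as noted, but since it lives entirely in the comparison harness, deleting it along with the Perl compiler leaves no residue in the new design.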


 oh wait.... someone's already added a *completely separate* blink IDL
parser+generator (as separate and distinct from the perl-based
codegenerator one)?

Yes, that would be me (CL 18190004).
The initial parser+generator were stubs;
the actual parser is CL 15801003, which just landed,
and the code generator will be rolled out by Koji over a number of CLs as it is completed, allowing incremental testing and review.


 but... yeah, if you're looking to do common code-generation with
different front-ends for each language, and the codebase reflects that
through providing infrastructure and then front-end language-specific
generators, then yeah that's a great idea.

Actually it's the other way around: the IDLs are the input language, so the frontend is the same (modulo slight dialect differences), but we'll need several backends (Blink, Chrome extensions, Pepper, Dart).

At least the parser can be shared entirely or almost entirely, and hopefully even some of the backends can be combined (e.g., Blink and Chrome extensions), if that makes sense engineering-wise.


 ... bear in mind that the main reason i had such an enormous amount
of difficulty with the webkit team was that this stuff was flat-out
impossible to verify or even code up in anything remotely approaching
an "incremental" fashion.

Since we already have a working (Perl) compiler, we can work incrementally by checking that the Python compiler generates the same code as the Perl, and then switching files over one by one.

This requires having separate Perl/Python flows (which took some setup), and is why we're keeping the Perl compiler around until the Python one is complete: it lets us work incrementally.

The parser was one big CL, but the code generator will land incrementally (starting this week), and will accordingly be coded and reviewed incrementally. We can do it one IDL file at a time if necessary.
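A sketch of the per-file switchover check (the compiler runners here are stand-ins: callables mapping an IDL filename to its generated source):

```python
# A sketch of the per-file switchover check: an IDL file is moved onto the
# Python flow only once the Python compiler's output matches the Perl
# compiler's byte-for-byte.  The compiler runners are stand-ins (callables
# mapping an IDL filename to its generated source).
def files_safe_to_switch(idl_files, run_perl, run_python):
    """Return the IDL files whose Python-generated output matches Perl's."""
    return [f for f in idl_files if run_python(f) == run_perl(f)]
```

This is what makes the incremental landing possible: the Perl output serves as a golden reference, so each file's migration is independently verifiable.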


 just... something to watch out for: because of the critical
inter-dependence of the IDL files, and because that inter-dependence
gets naturally reflected into the code,

The dependencies do make it complicated, and make writing the code generator backend quite difficult, as you saw.
Once I finish followup on the parser, I'll be tackling this next.

Thanks again for sharing your experiences!
I've a few questions that I'll send separately.

Best,
Nils

Adam Barth

Jul 23, 2013, 2:34:33 AM
to Nils Barth, luke.leighton, Kentaro Hara, Koji Hara, Noel Allen, blink-dev
On Mon, Jul 22, 2013 at 10:44 PM, Nils Barth <nba...@chromium.org> wrote:
Thanks for the detailed background Luke!
I'll follow up on more technical points off-list, but will answer a few points of general interest here.

Summary
The Python parser is 100% complete and has landed (r154633); the Python code generator (Koji's) should start landing incrementally this week, and is currently 80% complete in a local branch.

We will not have any technical debt from the Perl compiler per se (it does not constrain the new design), and we will not be keeping it around once we've finished the Python.

However, we are using the Perl compiler in the interim so that we can land the new Python compiler incrementally, avoiding the massive CL/leap of faith problem.

Once that's done, we'll delete the Perl.

I'm excited that this project is progressing!  It's important that the Perl implementation not linger too long in the tree because it means we're (temporarily) back in the situation of having two code generators to maintain.

Adam

Nils Barth

Jul 23, 2013, 2:45:25 AM
to Adam Barth, luke.leighton, Kentaro Hara, Koji Hara, Noel Allen, blink-dev
Adam Barth:
I'm excited that this project is progressing!  It's important that the Perl implementation not linger too long in the tree because it means we're (temporarily) back in the situation of having two code generators to maintain.

Thanks, and agreed!
Koji is 85% finished with the code generator locally (CL 17572008), so once this lands (in stages), hopefully all new development will be on Python, with Perl only for a few especially hairy IDLs.
I don't know how long the remaining IDLs will take, but we're very motivated to get this to 100% ASAP.

luke.leighton

Jul 23, 2013, 3:38:50 AM
to Nils Barth, Kentaro Hara, Koji Hara, Noel Allen, blink-dev
On Tue, Jul 23, 2013 at 6:44 AM, Nils Barth <nba...@chromium.org> wrote:
> Thanks for the detailed background Luke!
> I'll follow-up on more technical points off-list, but to answer a few points
> of general interest.

... all of which i think says you're well on the way to success here :)

luke.leighton

Jul 23, 2013, 3:57:15 AM
to Nils Barth, Adam Barth, Kentaro Hara, Koji Hara, Noel Allen, blink-dev
quick summary about the inter-dependencies: don't complicate the
parser by making it a 2-stage design with analysis - punt that to the
compiler (c/c++), which is designed for that job. so: auto-generate
code on the assumption that e.g. the headers reflect the exact
same inter-dependencies, emit all the forward-declarations needed and
so on; the target compiler (c/c++) will sort it all out.
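a sketch of what that looks like (the header layout is schematic):

```python
# A sketch of the "punt it to the C++ compiler" suggestion: emit a forward
# declaration for every interface type an IDL file mentions, with no
# dependency analysis at all, and let the target compiler resolve the
# graph.  The header layout is schematic.
def emit_header(interface, referenced_types):
    """Generate a C++ header body with blanket forward declarations."""
    lines = ['// generated: forward-declare everything referenced,',
             '// and let the C++ compiler sort out the ordering']
    for t in sorted(set(referenced_types) - {interface}):
        lines.append('class %s;' % t)
    lines.append('class %s { /* ... */ };' % interface)
    return '\n'.join(lines)
```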
gotta go.
l.

lkcl .

Mar 12, 2014, 8:37:04 AM
to Nils Barth, Adam Barth, Kentaro Hara, Koji Hara, Noel Allen, blink-dev
hi folks, just checking in to see how this went - did it go ok? you
encountered the issue of maintaining a cache (on a per-binding basis)
which uses the base class's 'type' to manually re-cast back up on the
language side (e.g. a result returned from appendChild is down-cast to
type Node; how do you get back to its correct object type... *on the
language* side?), and you didn't fall for the trick of thinking you
have to increase the refcount on the webkit object in the cache
*every* time the bindings-cache object references it (like i did,
once), did you? :)
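for anyone following along, a sketch of the cache behaviour being described (all names hypothetical; id() is used as the cache key purely for illustration):

```python
# A sketch of the wrapper-cache issue: one wrapper per native object,
# dispatch on the *most-derived* native type when re-wrapping (so a call
# declared to return Node still hands back an Element wrapper), and take
# exactly one native ref per cache entry, no matter how many times the
# wrapper is looked up.  All names are hypothetical; id() is used as the
# cache key purely for illustration.
_wrapper_cache = {}       # id(native) -> language-side wrapper
_wrapper_classes = {}     # native class name -> wrapper class

def wrap(native):
    key = id(native)
    if key in _wrapper_cache:
        return _wrapper_cache[key]                  # NO extra ref taken here
    cls = _wrapper_classes[type(native).__name__]   # actual, not declared, type
    wrapper = cls(native)                           # ctor takes the single ref
    _wrapper_cache[key] = wrapper
    return wrapper

# toy native hierarchy and wrappers to exercise the cache
class NativeNode: pass
class NativeElement(NativeNode): pass

class NodeWrapper:
    def __init__(self, native):
        self.native = native
        self.refs_taken = 1                         # exactly one ref, taken once

class ElementWrapper(NodeWrapper): pass

_wrapper_classes.update({'NativeNode': NodeWrapper,
                         'NativeElement': ElementWrapper})
```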

l.

p.s. interesting decision on the choice to use aura - one i can say,
given the two main toolkit options currently available, is actually a
good one. have you considered going *even lower level* (even simpler,
e.g. a la directfb's liblite 1.2) and then implementing the window
manager in e.g. javascript or another high-level language, just like
is done in gecko?
http://slashdot.org/comments.pl?sid=4886551&cid=46461855