Opcode changes: any documentation?

Claudio Freire

Jul 30, 2010, 10:42:38 AM
to Unladen Swallow
Hi guys.

I've been itching to test a theory with unladen-swallow:

Unladen does a lot of low-level optimizations, right? It won't do any python-python inlining, last time I checked.

Python's biggest performance issue in normal code is function calls. Not only that, Python's lack of syntactic support for signature overloading means any function that wants to be "polymorphic" must do so programmatically (e.g. if isinstance(param1, list): ...). This means a lot of runtime overhead that could be optimized out.
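To illustrate what I mean (a made-up `total` function, not from any real codebase): every call pays for the dispatch branches, even at a call site that only ever passes one type.

```python
def total(x):
    # Runtime type dispatch: every call executes these checks,
    # even when a given call site always passes the same type.
    if isinstance(x, list):
        return sum(x)
    elif isinstance(x, dict):
        return sum(x.values())
    else:
        return x

# A call site that only ever passes lists could, in principle,
# be specialized down to just `sum(x)` with the other branches removed.
print(total([1, 2, 3]))
```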

Since Unladen is not a tracing JIT, it fails to optimize *those* situations.

(I'm talking out of pure theory here, I haven't checked, so correct me if I'm wrong).

Now, as global lookups are "in theory dynamic but in practice static", polymorphic calls are just like that. In theory, in general, they're dynamic. The function, considered in all its uses, must handle all possible arguments. But if you consider only a specific call site, a lot of python functions (even library functions) can be heavily specialized and a lot of cruft removed.

I wrote an optimizer once that inspects bytecode and does that sort of specialization and inlining. It's rough, but I've been using it to perform bytecode validation (it so happens that during the course of specialization it finds many common programming mistakes), and it complements pylint/pychecker nicely.

But I've been wondering if these optimizations and site-specific specializations would also complement unladen-swallow's low level optimizations. If it works, it would be like adding python-python inlining to unladen-swallow. I wanted to see if this would produce a significant performance increase across the board - on synthetic benchmarks, the optimizer often speeds up the benchmark many-fold. In the function call benchmark, for instance, the python-python inlining practically removes the entire benchmark (because that specific one only calls no-op functions, so after inlining it's simply a for loop that does nothing).
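A rough sketch of what that looks like at the bytecode level, using plain CPython's `dis` module (exact opcode names vary across interpreter versions, and this is an illustration, not my actual optimizer):

```python
import dis

def noop():
    pass

def bench():
    # Every iteration pays the full call machinery for noop().
    for _ in range(3):
        noop()

def bench_inlined():
    # After inlining the no-op body, the loop body does nothing.
    for _ in range(3):
        pass

# Disassembling both shows bench carrying an extra call opcode
# for noop() that bench_inlined no longer has.
dis.dis(bench)
dis.dis(bench_inlined)
```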

Thing is... sorry for the big preamble... I've had some trouble adapting the optimizer to unladen-swallow.

As you can imagine, it depends on precise modelling of how each opcode works, and a few minor changes to the opcodes used in unladen-swallow's bytecode have been giving me trouble. I've been trying to adapt the optimizer by peeking at Unladen's dispatch loop, but it's burdensome.

Is there any documentation on the changes made other than the source? A summary of differences in opcodes perhaps?
Having a list of things to look out for would work wonders.

Reid Kleckner

Jul 30, 2010, 12:19:00 PM
to Claudio Freire, Unladen Swallow
Are you essentially trying to do bytecode specialization and inlining?
Reminds me of the quickening approach that guy was talking about on
python-dev. We want to do inlining as well for all the reasons that
you mentioned, and ebo has been working on it.

I don't think we have a complete list of the modifications we've made
anywhere, but to my knowledge we haven't done too much. What comes to
mind is:

- There's feedback code in the eval loop that you will probably bump into
- We removed IMPORT_NAME and made it a builtin function, and then added it back
- In the future, I'm trying to add LOAD_METHOD and CALL_METHOD
- There's the recent WITH_CLEANUP and END_FINALLY changes

Reid

Jörg Blank

Jul 30, 2010, 12:42:19 PM
to unladen...@googlegroups.com

Hi,

> Unladen does a lot of low-level optimizations right? It won't do any
> python-python inlining, last time I checked.

Not yet. But I have a patch that needs to be updated against trunk :-]

> Since unladen is no tracing JIT, it fails to optimize *those* situations.

Not necessarily. You just have to implement all those little optimizations.

> I wrote an optimizer once that inspects bytecode and does that sort of
> specialization and inlining.

I guess you are generating new code objects or changing old ones. We do
not do that because we like to keep the complete CPython semantics, i.e.
an optimized program should behave exactly like normal (as long as you
don't touch a _* function).
In case of inlining that means preserving all the stack frames and
being able to bail into the eval loop, even when in inlined code.

> In
> the function call benchmark, for instance, the python-python inlining
> practically removes the entire benchmark (because that specific one only
> calls no-op functions, so after inlining it's simply a for loop that
> does nothing).

That benchmark is only good for measuring the time the call code takes.
Anecdote: During my work on P/P inlining, that benchmark ran 20 times
faster or 20 times slower depending on how much was inlined.


> Is there any documentation on the changes made other than the source? A
> summary of differences in opcodes perhaps?
> Having a list of things to look out for would help wonders.

You can take a look at Python/import.c: There is a list of all opcode
changes.

Regards,
Jörg "ebo" Blank

Claudio Freire

Jul 30, 2010, 1:35:08 PM
to Jörg Blank, unladen...@googlegroups.com
On Fri, Jul 30, 2010 at 1:42 PM, Jörg Blank <e...@4geeks.de> wrote:

> I wrote an optimizer once that inspects bytecode and does that sort of
> specialization and inlining.

> I guess you are generating new code objects or changing old ones.

Yep
 
> We do
> not do that because we like to keep the complete CPython semantics, i.e.
> an optimized program should behave exactly like normal (as long as you
> don't touch a _* function).
> In case of inlining that means preserving all the stack frames and
> being able to bail into the eval loop, even when in inlined code.

I see.
Well, I would certainly not have a problem with messed-up frames. I bet lots of people would be comfortable with the idea of slightly convoluted tracebacks when running optimized code.
So you could consider it for -OO, I guess?
 

> In
> the function call benchmark, for instance, the python-python inlining
> practically removes the entire benchmark (because that specific one only
> calls no-op functions, so after inlining it's simply a for loop that
> does nothing).

> That benchmark is only good for measuring the time the call code takes.
> Anecdote: During my work on P/P inlining, that benchmark ran 20 times
> faster or 20 times slower depending on how much was inlined.

Yep, I've noticed the same (with regular python). If you inline too much, the increased bytecode volume hurts the cache big time.
I've coded a lot of size and recursion limits to get around that. I can't gauge how beneficial inlining a particular call would be, though; that's something Unladen could certainly do with proper runtime feedback.
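The size limit is just a static byte count on the code object. A crude sketch of the gate (the function name and the 64-byte threshold are arbitrary illustrations, not values from my actual optimizer):

```python
def inline_candidate(func, max_size=64):
    # Gate inlining on raw bytecode size to limit code growth;
    # the 64-byte default is an arbitrary illustration, not a tuned value.
    return len(func.__code__.co_code) <= max_size

def tiny():
    return 1

def big(x):
    # A dozen statements: comfortably past the 64-byte gate on any CPython.
    x = x + 1; x = x + 2; x = x + 3; x = x + 4
    x = x + 5; x = x + 6; x = x + 7; x = x + 8
    x = x + 9; x = x + 10; x = x + 11; x = x + 12
    return x
```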
 
> Is there any documentation on the changes made other than the source? A
> summary of differences in opcodes perhaps?
> Having a list of things to look out for would help wonders.

> You can take a look at Python/import.c: There is a list of all opcode
> changes.

Cool, thanks for the tip.

Jörg Blank

Jul 30, 2010, 4:36:11 PM
to unladen...@googlegroups.com
Hi,

> Well, I would certainly not have a problem with messed up frames. I bet
> lots of people would be comfortable with the idea of slightly convoluted
> tracebacks when running optimized code.

There are more programs that look at such stuff than you might think.
Being comfortable with it is likely true for code written with inlining
in mind, but not for 'legacy code'.
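A made-up example of the kind of code that breaks: anything that walks the caller's frame (logging and warning machinery does this) observes different frames once callers are inlined.

```python
import sys

def caller_name():
    # Walks one frame up the stack. Under frame-preserving semantics
    # this must keep seeing the logical caller's frame, even if that
    # caller's bytecode has been inlined somewhere else.
    return sys._getframe(1).f_code.co_name

def outer():
    return caller_name()

print(outer())
```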

> So you could consider it for -OO I guess ?

Nope, sorry. While I generally encourage bytecode hackery (see:
http://bitbucket.org/ebo/pyvm), for US this is entirely userspace,
nothing we want to implement.

Regards,

Jörg Blank
