[Python-ideas] Bytecode JIT

61 views
Skip to first unread message

Soni L.

unread,
Jun 30, 2017, 11:10:55 AM6/30/17
to python...@python.org
CPython should get a tracing JIT that turns slow bytecode into fast
bytecode.

A JIT doesn't have to produce machine code. bytecode-to-bytecode
compilation is still compilation. bytecode-to-bytecode compilation works
on iOS, and doesn't require deviating from C.

(This "internal bytecode" should do things like know that 2 variables
necessarily hold integers, doing just "x = y + z" in C in an IADD
instruction as opposed to all those middle-of-function typechecks and
overhead. You can typecheck once at the start of the function and run
separate traces on that. Since this "internal bytecode" is extremely
unsafe, it should be considered an implementation detail and never
exposed to external code.)

_______________________________________________
Python-ideas mailing list
Python...@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/

Oleg Broytman

unread,
Jun 30, 2017, 11:16:26 AM6/30/17
to python...@python.org
On Fri, Jun 30, 2017 at 12:09:52PM -0300, "Soni L." <faked...@gmail.com> wrote:
> CPython should get a

You're welcome to create one. Go on, send your pull requests!

Oleg.
--
Oleg Broytman http://phdru.name/ p...@phdru.name
Programmers don't die, they just GOSUB without RETURN.

Steven D'Aprano

unread,
Jun 30, 2017, 11:22:04 AM6/30/17
to python...@python.org
On Fri, Jun 30, 2017 at 12:09:52PM -0300, Soni L. wrote:

> CPython should get a tracing JIT that turns slow bytecode into fast
> bytecode.

Are you volunteering to do the work?



--
Steve

David Mertz

unread,
Jun 30, 2017, 11:51:44 AM6/30/17
to Steven D'Aprano, python-ideas
PyPy does basically this. So does the tentative project Pyjion. Also Numba, but on a pre-function basis. It's not a bad ideas, and one that currently exists with varying degrees of refinement in several projects. I may have forgotten a few others. I suppose Brython in a sense.

This is very unlikely to make it into core CPython because code simplicity is one of its goals. But all those other projects would welcome your help.

Koos Zevenhoven

unread,
Jun 30, 2017, 1:38:15 PM6/30/17
to python-ideas
On Jun 30, 2017 5:16 PM, "Oleg Broytman" <p...@phdru.name> wrote:
On Fri, Jun 30, 2017 at 12:09:52PM -0300, "Soni L." <faked...@gmail.com> wrote:
> CPython should get a

   You're welcome to create one. Go on, send your pull requests!

But if you are planning to do that, it is still a good idea to ask for feedback here first. That will increase the chances of acceptance by a lot. Also, it doesn't necessarily need to be your own idea :)

-- Koos

Mark Lawrence via Python-ideas

unread,
Jun 30, 2017, 2:53:00 PM6/30/17
to python...@python.org
On 30/06/2017 16:09, Soni L. wrote:
> CPython should get a tracing JIT that turns slow bytecode into fast
> bytecode.
>
> A JIT doesn't have to produce machine code. bytecode-to-bytecode
> compilation is still compilation. bytecode-to-bytecode compilation works
> on iOS, and doesn't require deviating from C.
>
> (This "internal bytecode" should do things like know that 2 variables
> necessarily hold integers, doing just "x = y + z" in C in an IADD
> instruction as opposed to all those middle-of-function typechecks and
> overhead. You can typecheck once at the start of the function and run
> separate traces on that. Since this "internal bytecode" is extremely
> unsafe, it should be considered an implementation detail and never
> exposed to external code.)
>

Patches are always welcome. When do you intend delivering yours?

--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence

---
This email has been checked for viruses by AVG.
http://www.avg.com

Victor Stinner

unread,
Jun 30, 2017, 6:19:20 PM6/30/17
to Soni L., python...@python.org
2017-06-30 17:09 GMT+02:00 Soni L. <faked...@gmail.com>:
> CPython should get a tracing JIT that turns slow bytecode into fast
> bytecode.
>
> A JIT doesn't have to produce machine code. bytecode-to-bytecode compilation
> is still compilation. bytecode-to-bytecode compilation works on iOS, and
> doesn't require deviating from C.

Optimizations require to make assumptions on the code, and deoptimize
if an assumption becomes wrong. I call these things "guards". If I
understood correctly, PyPy is able to deoptimize a function in the
middle of the function, while executing it. In my FAT Python project,
I tried something simpler: add guards at the function entry point, and
decide at the entry which version of the code should be run (FAT
Python allows to have more than 2 versions of the code for the same
function).

I described my implementation in the PEP 510:
https://www.python.org/dev/peps/pep-0510/

I agree that you *can* emit more efficient bytecode using assumptions.
But I'm not sure that the best speedup will be worth it. For example,
if your maximum speedup is 20% but the JIT compiler increases the
startup time and uses more memory, I'm not sure that users will use
it. The design will restrict indirectly the maximum speed.

At the bytecode level, you cannot specialize bytecode for 1+2 (x+y
with x=1 and y=2) for example. The BINARY_ADD instruction calls
PyNumber_Add(), but a previous experience showed that the dispatch
inside PyNumber_Add() to reach long_add() is expensive.

I'm trying to find a solution to not make CPython 20% faster, but 2x
faster. See my talk at the recent Python Language Summit (at Pycon
US):
https://github.com/haypo/conf/raw/master/2017-PyconUS/summit.pdf
https://lwn.net/Articles/723949/

My mid-term/long-term plan for FAT Python is to support multiple
optimizers, and allow developers to choose between bytecode ("Python"
code) and machine code ("C" code). For example, work on an optimizer
reusing Cython rather than writing a new compiler from scratch. My
current optimizer works at the AST level and emits more efficient
bytecode by rewriting the AST.

But another major design choice in FAT Python is to run the optimizer
ahead-of-time (AoT), rather than just-in-time (JIT). Maybe it will not
work. We will see :-)

I suggest you to take a look at my notes to make CPython faster:
http://faster-cpython.readthedocs.io/

FAT Python homepage:
http://faster-cpython.readthedocs.io/fat_python.html

--

You may also be interested by my Pycon US talk about CPython
optimization in 3.5, 3.6 and 3.7:
https://lwn.net/Articles/725114/

Victor

Soni L.

unread,
Jun 30, 2017, 6:33:22 PM6/30/17
to Victor Stinner, python...@python.org
If you can assert that the sum(s) never overflow an int, you can avoid
hitting long_add() entirely, and avoid all the checks around it. IADD
would be IADD as opposed to NADD because it would add two ints
specifically, not two numbers. And it would do no overflow checks
because the JIT already told it no overflow can happen.

Brett Cannon

unread,
Jul 1, 2017, 12:53:37 PM7/1/17
to Koos Zevenhoven, python-ideas
I think Oleg was more responding to the fact that Soni said "CPython should" do something. Phrasing it that way comes off as demanding instead of just sharing an idea. Oleg tried to turn it around and point out that if Soni thinks this should happen then he should be ready to contribute the work to see it happen.

-brett


-- Koos

Oleg Broytman

unread,
Jul 1, 2017, 1:18:02 PM7/1/17
to python...@python.org
Hi, All!

On Sat, Jul 01, 2017 at 04:22:31PM +0000, Brett Cannon <br...@python.org> wrote:
> On Fri, Jun 30, 2017, 10:38 Koos Zevenhoven, <k7h...@gmail.com> wrote:
> > On Jun 30, 2017 5:16 PM, "Oleg Broytman" <p...@phdru.name> wrote:
> >
> > On Fri, Jun 30, 2017 at 12:09:52PM -0300, "Soni L." <faked...@gmail.com>
> > wrote:
> > > CPython should get a
> >
> > You're welcome to create one. Go on, send your pull requests!
> >
> > But if you are planning to do that, it is still a good idea to ask for
> > feedback here first. That will increase the chances of acceptance by a lot.
> > Also, it doesn't necessarily need to be your own idea :)
>
> I think Oleg was more responding to the fact that Soni said "CPython
> should" do something. Phrasing it that way comes off as demanding instead

Exactly!

> of just sharing an idea. Oleg tried to turn it around and point out that if
> Soni thinks this should happen then he should be ready to contribute the
> work to see it happen.

I think the sentence "Python should have <some big and complex to
implement feature>" should be ;-) forbidden if it is not followed with
"I'm in the middle of development. Expect the 1st PR in <a short timeframe>."

Python can only have features that You, the <User>, implemented (or
paid for) and submitted.

> -brett
>
> > -- Koos

Oleg.
--
Oleg Broytman http://phdru.name/ p...@phdru.name
Programmers don't die, they just GOSUB without RETURN.

Nick Timkovich

unread,
Jul 1, 2017, 1:36:30 PM7/1/17
to python-ideas
On Sat, Jul 1, 2017 at 1:17 PM, Oleg Broytman <p...@phdru.name> wrote:
   I think the sentence "Python should have <some big and complex to
implement feature>" should be ;-) forbidden if it is not followed with
"I'm in the middle of development. Expect the 1st PR in <a short timeframe>."

   Python can only have features that You, the <User>, implemented (or
paid for) and submitted.

Devil's advocate: why prepare a patch and submit it if it is going to be dismissed out of hand. Trying to gauge support for the idea is a reasonable first-step.

Devil's devil's advocate: if it has value, it could stand on it's own and gain it's own group of supporters as a CPython fork.

Nick

Paul Moore

unread,
Jul 1, 2017, 5:52:50 PM7/1/17
to Nick Timkovich, python-ideas
On 1 July 2017 at 18:35, Nick Timkovich <promet...@gmail.com> wrote:
> Devil's advocate: why prepare a patch and submit it if it is going to be
> dismissed out of hand. Trying to gauge support for the idea is a reasonable
> first-step.

That's perfectly OK, but it's important to phrase the email in a way
that makes that clear - "I'm considering putting together a PR for
Python to implement X. Does that sound like a good idea, or does
anyone have suggestions for potential issues I might consider? Also,
is there any prior work in this area that I should look into?"

"Python should have X" implies (a) that you are criticising the python
developers for missing that feature out, (b) that you consider your
position self-evident, and (c) that you expect someone to implement
it.

People have different ways of expressing themselves, so we should all
be prepared to allow some leeway in how people put their ideas across.
But the writer has some responsibility for the tone, too.

Paul

Victor Stinner

unread,
Jul 1, 2017, 6:35:45 PM7/1/17
to Soni L., python-ideas
Let's say that you have a function "def mysum (x; y): return x+y", do you always want to use your new IADD instruction here? What if I call mysum ("a", "b")?

Victor

Soni L.

unread,
Jul 1, 2017, 6:54:05 PM7/1/17
to Victor Stinner, python-ideas
Let's say that you do. Given how short it is, it would just get inlined.
Your call of mysum ("a", "b") would indeed not use IADD, nor would it be
a call. It would potentially not invoke any operators, but instead get
replaced with "ab".

When you have a tracing JIT, you can do away with a lot of overhead. You
can inline functions, variables, do away with typechecks, and many other
things. This holds true even if that JIT never emits a single byte of
machine code.

Chris Angelico

unread,
Jul 1, 2017, 7:33:41 PM7/1/17
to python-ideas
On Sun, Jul 2, 2017 at 8:52 AM, Soni L. <faked...@gmail.com> wrote:
> On 2017-07-01 07:34 PM, Victor Stinner wrote:
>>
>> Let's say that you have a function "def mysum (x; y): return x+y", do you
>> always want to use your new IADD instruction here? What if I call mysum
>> ("a", "b")?
>>
>> Victor
>
>
> Let's say that you do. Given how short it is, it would just get inlined.
> Your call of mysum ("a", "b") would indeed not use IADD, nor would it be a
> call. It would potentially not invoke any operators, but instead get
> replaced with "ab".
>
> When you have a tracing JIT, you can do away with a lot of overhead. You can
> inline functions, variables, do away with typechecks, and many other things.
> This holds true even if that JIT never emits a single byte of machine code.

Let's try a more complicated example.

# demo.py
def mysum(x, y):
return x + y

def do_stuff(a, b):
print(mysum("foo", "bar"))
print(mysum(5, 7))
print(mysum(a, 42))
print(mysum(b, "spam"))


What can you optimize here? Now let's look at a file that might call it:

# cruel.py
import demo

def nasty(x, y):
demo.mysum = random.choice([
lambda x, y: x + y,
lambda x, y: f"{x} + f{y}",
lambda x, y: "muahahaha",
])
return Ellipsis

demo.mysum = nasty
demo.do_stuff("what", "now?")

Unless you can prove that this doesn't happen, you can't really
optimize much of mysum away. That's where a tracing JIT compiler has
the advantage: it can notice *at run time* that you're not doing this
kind of thing, and in effect, forfeit the optimizations when you're
running your tests (since test suites are where this kind of
monkey-patching tends to happen).

ChrisA

rym...@gmail.com

unread,
Jul 1, 2017, 10:58:52 PM7/1/17
to Soni L., Victor Stinner, python-ideas
This is literally PyPy. There's little reason for something like this to end up in official CPython, at least for now.


--
Ryan (ライアン)
Yoko Shimomura, ryo (supercell/EGOIST), Hiroyuki Sawano >> everyone else
http://refi64.com
On Jul 1, 2017 at 5:53 PM, <Soni L.> wrote:



On 2017-07-01 07:34 PM, Victor Stinner wrote:
> Let's say that you have a function "def mysum (x; y): return x+y", do
> you always want to use your new IADD instruction here? What if I call
> mysum ("a", "b")?
>
> Victor

Let's say that you do. Given how short it is, it would just get inlined.
Your call of mysum ("a", "b") would indeed not use IADD, nor would it be
a call. It would potentially not invoke any operators, but instead get
replaced with "ab".

When you have a tracing JIT, you can do away with a lot of overhead. You
can inline functions, variables, do away with typechecks, and many other
things. This holds true even if that JIT never emits a single byte of
machine code.

Soni L.

unread,
Jul 1, 2017, 11:15:51 PM7/1/17
to rym...@gmail.com, Victor Stinner, python-ideas



On 2017-07-01 11:57 PM, rym...@gmail.com wrote:
This is literally PyPy. There's little reason for something like this to end up in official CPython, at least for now.

It's literally not PyPy. PyPy's internal bytecode, for one, does have typechecks. And PyPy emits machine code, which is not something I wanna deal with because you shouldn't need to write a C compiler AND a whole assembly backend just to port python to a new CPU architecture. A C compiler should be enough.

Steven D'Aprano

unread,
Jul 2, 2017, 1:43:24 AM7/2/17
to python...@python.org
On Sat, Jul 01, 2017 at 07:52:55PM -0300, Soni L. wrote:
>
>
> On 2017-07-01 07:34 PM, Victor Stinner wrote:
> >Let's say that you have a function "def mysum (x; y): return x+y", do
> >you always want to use your new IADD instruction here? What if I call
> >mysum ("a", "b")?
> >
> >Victor
>
> Let's say that you do. Given how short it is, it would just get inlined.
> Your call of mysum ("a", "b") would indeed not use IADD, nor would it be
> a call. It would potentially not invoke any operators, but instead get
> replaced with "ab".

What you are describing sounds more like the output of a keyhole
optimizer that folds constants, only extended to look inside functions.
I expect that it would have to be a VERY clever optimizer, since it
would have to do a complete whole-of-program static analysis to be sure
that mysum has not been replaced, shadowed or redefined by the time it
is called.

I won't say that is outright impossible, but it would be *extremely*
hard to do something like that at compile time.


> When you have a tracing JIT, you can do away with a lot of overhead. You
> can inline functions, variables, do away with typechecks, and many other
> things. This holds true even if that JIT never emits a single byte of
> machine code.

What you are describing sounds more like an "Ahead Of Time" (AOT)
compiler to me. Especially the part about doing away with typechecks. As
far as I know you can really only do away with typechecks or other
guards if you know ahead of time (at compile time) what the types of
values are, and that requires static typing.



--
Steve

Chris Angelico

unread,
Jul 2, 2017, 1:53:38 AM7/2/17
to python-ideas
On Sun, Jul 2, 2017 at 3:41 PM, Steven D'Aprano <st...@pearwood.info> wrote:
>> Let's say that you do. Given how short it is, it would just get inlined.
>> Your call of mysum ("a", "b") would indeed not use IADD, nor would it be
>> a call. It would potentially not invoke any operators, but instead get
>> replaced with "ab".
>
> What you are describing sounds more like the output of a keyhole
> optimizer that folds constants, only extended to look inside functions.
> I expect that it would have to be a VERY clever optimizer, since it
> would have to do a complete whole-of-program static analysis to be sure
> that mysum has not been replaced, shadowed or redefined by the time it
> is called.
>
> I won't say that is outright impossible, but it would be *extremely*
> hard to do something like that at compile time.

Isn't that the sort of thing that the "versioned globals dictionary"
was supposed to do? If your globals haven't changed, you know that the
optimizer was correct.

But that's still a hard problem. Or at very least, it's decidedly
non-trivial, and the costs are significant, so the net benefits aren't
proven.

ChrisA

Steven D'Aprano

unread,
Jul 2, 2017, 8:14:14 AM7/2/17
to python...@python.org
On Sun, Jul 02, 2017 at 03:52:34PM +1000, Chris Angelico wrote:
> On Sun, Jul 2, 2017 at 3:41 PM, Steven D'Aprano <st...@pearwood.info> wrote:
> >> Let's say that you do. Given how short it is, it would just get inlined.
> >> Your call of mysum ("a", "b") would indeed not use IADD, nor would it be
> >> a call. It would potentially not invoke any operators, but instead get
> >> replaced with "ab".
> >
> > What you are describing sounds more like the output of a keyhole
> > optimizer that folds constants, only extended to look inside functions.
> > I expect that it would have to be a VERY clever optimizer, since it
> > would have to do a complete whole-of-program static analysis to be sure
> > that mysum has not been replaced, shadowed or redefined by the time it
> > is called.
> >
> > I won't say that is outright impossible, but it would be *extremely*
> > hard to do something like that at compile time.
>
> Isn't that the sort of thing that the "versioned globals dictionary"
> was supposed to do? If your globals haven't changed, you know that the
> optimizer was correct.

That only solves the problem of mysum being modified, not whether the
arguments are ints. You still need to know whether it is safe to call
some low-level (fast) integer addition routine, or whether you have to
go through the (slow) high-level Python code.

In any case, guards are a kind of runtime check. It might not be an
explicit isinstance() check, but it is logically implies one. If x was
an int, and nothing has changed, then x is still an int.

If Victor is around, he might like to comment on how his FAT Python
handles this.

> But that's still a hard problem. Or at very least, it's decidedly
> non-trivial, and the costs are significant, so the net benefits aren't
> proven.

In fairness, they are proven for other languages, and they certainly
worked for things like Psyco. So this isn't completely pie-in-the-sky
dreaming.



--
Steve

Chris Angelico

unread,
Jul 2, 2017, 8:19:02 AM7/2/17
to python-ideas
On Sun, Jul 2, 2017 at 10:13 PM, Steven D'Aprano <st...@pearwood.info> wrote:
>> But that's still a hard problem. Or at very least, it's decidedly
>> non-trivial, and the costs are significant, so the net benefits aren't
>> proven.
>
> In fairness, they are proven for other languages, and they certainly
> worked for things like Psyco. So this isn't completely pie-in-the-sky
> dreaming.

Yeah. It's the realm of "let's put in some solid research, then do
some proof of concept work, and maybe it'll be worth going ahead with"
- it's not "Python should be able to optimize this away".

ChrisA

Soni L.

unread,
Jul 2, 2017, 9:33:49 AM7/2/17
to python...@python.org


On 2017-07-02 02:41 AM, Steven D'Aprano wrote:
> On Sat, Jul 01, 2017 at 07:52:55PM -0300, Soni L. wrote:
>>
>> On 2017-07-01 07:34 PM, Victor Stinner wrote:
>>> Let's say that you have a function "def mysum (x; y): return x+y", do
>>> you always want to use your new IADD instruction here? What if I call
>>> mysum ("a", "b")?
>>>
>>> Victor
>> Let's say that you do. Given how short it is, it would just get inlined.
>> Your call of mysum ("a", "b") would indeed not use IADD, nor would it be
>> a call. It would potentially not invoke any operators, but instead get
>> replaced with "ab".
> What you are describing sounds more like the output of a keyhole
> optimizer that folds constants, only extended to look inside functions.
> I expect that it would have to be a VERY clever optimizer, since it
> would have to do a complete whole-of-program static analysis to be sure
> that mysum has not been replaced, shadowed or redefined by the time it
> is called.

Runtime. Not static. This is the same kind of stuff LuaJIT (and any
other JIT) does.

>
> I won't say that is outright impossible, but it would be *extremely*
> hard to do something like that at compile time.
>
>
>> When you have a tracing JIT, you can do away with a lot of overhead. You
>> can inline functions, variables, do away with typechecks, and many other
>> things. This holds true even if that JIT never emits a single byte of
>> machine code.
> What you are describing sounds more like an "Ahead Of Time" (AOT)
> compiler to me. Especially the part about doing away with typechecks. As
> far as I know you can really only do away with typechecks or other
> guards if you know ahead of time (at compile time) what the types of
> values are, and that requires static typing.
>
>

You can do that at runtime with a JIT and flushing the JIT cache when
your assumptions (guards) change. (altho in reality you wouldn't flush
the whole JIT cache because that'd be expensive.)

Victor Stinner

unread,
Jul 3, 2017, 7:49:26 AM7/3/17
to Steven D'Aprano, python-ideas
2017-07-02 14:13 GMT+02:00 Steven D'Aprano <st...@pearwood.info>:
> That only solves the problem of mysum being modified, not whether the
> arguments are ints. You still need to know whether it is safe to call
> some low-level (fast) integer addition routine, or whether you have to
> go through the (slow) high-level Python code.
>
> In any case, guards are a kind of runtime check. It might not be an
> explicit isinstance() check, but it is logically implies one. If x was
> an int, and nothing has changed, then x is still an int.
>
> If Victor is around, he might like to comment on how his FAT Python
> handles this.

FAT Python design is generic, you are free to implement any kind of
check. A check is an object which provides a C callback.

FAT Python provides the following guards:

GuardArgType
GuardBuiltins
GuardDict
GuardFunc
GuardGlobals

About "mysym being modified", it's handled by this guard:

http://fatoptimizer.readthedocs.io/en/latest/fat.html#GuardFunc

Right now, only the func.__code__ is watched. It's not enought, but
it's a compromise to keep my implementation simple :-)

Tomorrow, if FAT Python becomes a real thing, the builtin function
type can be modified to add a version as we have for dictionaries, and
the version will be increased for any function modification: argument
defaults, arguments, name, etc.

We would only have to modify GuardFunc implementation, not users of this guard.

To really respect the Python semantics, guards became more complex
than expected. GuardBuiltins doesn't only check that len() is still
the same function in builtins. It only has to the globals name of the
current frame, globals()[name] and builtins of the current frame.
Python allows crazy stuff like running a single function with custom
builtin functions: see exec() function.

Victor
Reply all
Reply to author
Forward
0 new messages