
Python Front-end to GCC


Philip Herron

Oct 20, 2013, 1:56:46 PM
Hey,

I've been working on GCCPY since roughly November 2009, at least in
concept. It was announced as a GSoC 2010 project and also a GSoC 2011
project. I was mentored by Ian Taylor, who has been an extremely big
influence on my software development career.

Gccpy is an ahead-of-time implementation of Python on top of GCC. It
works as you would expect with a traditional compiler: you invoke it
much as you would GCC to compile C code, or G++ to compile C++.

What's interesting and deserves a significant mention is that my work is
heavily inspired by Paul Biggar's PhD thesis on optimizing dynamic
languages and his work on phc, an ahead-of-time PHP compiler. I've had
so many ups and downs in this project, and I need to thank Andi Hellmund
for his contributions to the project.
http://paulbiggar.com/research/#phd-dissertation

The project has taken so many years as a spare-time project to get to
this point. It has taken me this long simply to understand and stabilise
the core fundamentals of the compiler and how it could all work.

The release can be found here; I will probably rename the tag to the
milestone (lucy) later on.
https://github.com/redbrain/gccpy/releases/tag/v0.1-24
(Lucy is our dog, by the way: a German Shepherd, 6 years young, who
loves to lick your face off :).)

Documentation can be found at http://gcc.gnu.org/wiki/PythonFrontEnd.
(This is sparse partly on purpose, since I do not want people thinking
this is by any means ready to compile real Python applications.)

I've found some good success with this project in compiling Python,
though it's largely unknown to the world, simply because I am nervous of
the compiler world, and more specifically the Python compiler world.

But, at least to me, there is an unanswered question in current compiler
implementations: AOT vs. JIT.

Is a JIT implementation of a language (not just Python) better than
traditional ahead-of-time compilation?

What I can say is that ahead-of-time compilation at least strips out the
machinery not needed for the user's code to run. In my opinion, people
are forgetting the basics of how a computer works when it comes to
making code run faster: you simply need to reduce the number of
instructions that must be executed in order to perform what needs to be
done. It's not about JIT and buzzwords like LLVM and instruction
scheduling.
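(As a rough illustration of the instruction-count point, using nothing but CPython's standard `dis` module; this is an illustrative sketch, not gccpy output:)

```python
import dis

def add(a, b):
    return a + b

# Every bytecode below is dispatched through CPython's interpreter loop
# at runtime; an ahead-of-time compiler could lower the same function to
# a handful of machine instructions with no dispatch overhead.
ops = [ins.opname for ins in dis.get_instructions(add)]
print(len(ops), ops)
```

The exact opcode names vary between CPython versions, but the point is the same: even a one-line function costs several interpreted dispatches per call.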

I could go into the arguments, but I feel I should let the project speak
for itself. It's very immature, so you really can't compare it to
anything like it, but it does compile little bits and bobs fairly well,
though there is much more work needed.

There is nothing at stake; it's simply an idea provoked by a great PhD
thesis, and I want to see how it works out. I don't get funded or paid.
I love working on compilers and languages, but I don't have a day job
doing it, so it's my little open-source pet project, and I believe it's
at least worth some research.

I would really like to hear feedback, good and bad. I can't describe how
much work I've put into this and how much persistence I've had to have
in light of recent Reddit threads talking about my project.

I have so many people to thank for getting to this point! Namely Ian
Taylor, Paul Biggar, Andi Hellmund, Cyril Roelandt, Robert Bradshaw,
PyBelfast, and the Linux Outlaws community. I really couldn't have got
to this point in my life without the help of these people!

Thanks!

--Phil

victorg...@gmail.com

Oct 20, 2013, 6:10:20 PM
On Sunday, October 20, 2013 3:56:46 PM UTC-2, Philip Herron wrote:
> I've been working on GCCPY since roughly November 2009, at least in
> concept. It was announced as a GSoC 2010 project and also a GSoC 2011
> project. I was mentored by Ian Taylor, who has been an extremely big
> influence on my software development career.

Cool!

> Documentation can be found at http://gcc.gnu.org/wiki/PythonFrontEnd.
> (This is sparse partly on purpose, since I do not want people thinking
> this is by any means ready to compile real Python applications.)

Is there any document describing what it can already compile and, if possible, showing some benchmarks?

> But, at least to me, there is an unanswered question in current
> compiler implementations: AOT vs. JIT.
>
> Is a JIT implementation of a language (not just Python) better than
> traditional ahead-of-time compilation?
>
> What I can say is that ahead-of-time compilation at least strips out
> the machinery not needed for the user's code to run. In my opinion,
> people are forgetting the basics of how a computer works when it comes
> to making code run faster: you simply need to reduce the number of
> instructions that must be executed in order to perform what needs to
> be done. It's not about JIT and buzzwords like LLVM and instruction
> scheduling.

Maybe a less aggressive tone (and a lot more research before making sweeping statements that do nothing to further your own project) could result in a better level of discussion?

> I could go into the arguments, but I feel I should let the project
> speak for itself. It's very immature, so you really can't compare it
> to anything like it, but it does compile little bits and bobs fairly
> well, though there is much more work needed.

Can you compare it to Nuitka (http://nuitka.net/), ShedSkin, and Pythran (http://pythonhosted.org/pythran/) when you think it's mature enough? These projects have test suites you can borrow to chart your progress along the full-language-support road.

It'd be good to find a place for your project on http://compilers.pydata.org/ as soon as you better describe its workings.

> There is nothing at stake; it's simply an idea provoked by a great PhD
> thesis, and I want to see how it works out. I don't get funded or
> paid. I love working on compilers and languages, but I don't have a
> day job doing it, so it's my little open-source pet project, and I
> believe it's at least worth some research.

It's very interesting indeed. Congratulations and thank you for your work.

Victor

Mark Janssen

Oct 20, 2013, 11:35:03 PM
to Philip Herron, Python List
> Gccpy is an ahead-of-time implementation of Python on top of GCC. It
> works as you would expect with a traditional compiler: you invoke it
> much as you would GCC to compile C code, or G++ to compile C++.

That is amazing. I was just talking about how someone should make a
front-end to GCC on this list a couple of months ago. Awesome!

> Documentation can be found at http://gcc.gnu.org/wiki/PythonFrontEnd.
> (This is sparse partly on purpose, since I do not want people thinking
> this is by any means ready to compile real Python applications.)

What's missing?

> I've found some good success with this project in compiling Python,
> though it's largely unknown to the world, simply because I am nervous
> of the compiler world, and more specifically the Python compiler world.
>
> But, at least to me, there is an unanswered question in current
> compiler implementations: AOT vs. JIT.
>
> Is a JIT implementation of a language (not just Python) better than
> traditional ahead-of-time compilation?

Not at all. The value of jit compilation, I believe, is purely for
the dynamic functionality that it allows. AOT compilation will never
allow that, but in return you get massive performance and runtime-size
gains (that is, you don't need a massive interpreter environment
anymore!) If your compiler produces an executable program without the
need for the python interpreter environment: Two major wins.

> What I can say is that ahead-of-time compilation at least strips out
> the machinery not needed for the user's code to run. In my opinion,
> people are forgetting the basics of how a computer works when it comes
> to making code run faster.

Agreed.

> I could go into the arguments, but I feel I should let the project
> speak for itself. It's very immature, so you really can't compare it
> to anything like it, but it does compile little bits and bobs fairly
> well, though there is much more work needed.

I wish I had the resources to try it myself, but I would love to see
some performance numbers (say, factorizations or bubble sorts, etc.),
and also runtime executable sizes.


> I would really like to hear feedback, good and bad. I can't describe
> how much work I've put into this and how much persistence I've had to
> have in light of recent Reddit threads talking about my project.

Please reference the threads in question; I would like to see the issues raised.
--
MarkJ
Tacoma, Washington

Steven D'Aprano

Oct 21, 2013, 3:46:07 AM
On Sun, 20 Oct 2013 20:35:03 -0700, Mark Janssen wrote:

[Attribution to the original post has been lost]
>> Is a JIT implementation of a language (not just Python) better than
>> traditional ahead-of-time compilation?
>
> Not at all. The value of jit compilation, I believe, is purely for the
> dynamic functionality that it allows. AOT compilation will never allow
> that, but in return you get massive performance and runtime-size gains

On the contrary, you have that backwards. An optimizing JIT compiler can
often produce much more efficient, heavily optimized code than a static
AOT compiler, and at the very least they can optimize different things
than a static compiler can. This is why very few people think that, in
the long run, Nuitka can be as fast as PyPy, and why PyPy's ultimate aim
to be "faster than C" is not moonbeams:

http://morepypy.blogspot.com.au/2011/02/pypy-faster-than-c-on-carefully-crafted.html

http://morepypy.blogspot.com.au/2011/08/pypy-is-faster-than-c-again-string.html


See here for a discussion on what JIT compilers can do which static
compilers can't:

http://en.wikipedia.org/wiki/Just-in-time_compilation


Of course, the disadvantage of a JIT compiler is that there's a certain
amount of runtime overhead: it takes time to analyse the running code,
and more time to compile it on the fly. This is why, for example, PyPy
likely won't be as fast for really short-running scripts as a static
compiler could be. Again, see the Wikipedia article.

JIT compilation is really about optimization, which is why languages like
Java and .NET which could easily be compiled to machine code at compile
time generally use an intermediate bytecode and a JIT compiler instead.
They're not doing it for dynamism since they aren't dynamic languages.
It's a way of generating more aggressive (i.e. better but harder)
optimizations based on information only available at runtime.
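To make the "information only available at runtime" point concrete, here is a toy sketch in plain Python (all names invented for illustration) of an inline cache, one of the basic tricks a JIT uses: specialise for the first type actually observed at a call site, and fall back to the generic path if that assumption is ever violated.

```python
# A toy "inline cache": specialise an operation on the first observed
# type, and fall back to the generic path if the assumption breaks.
# Real JITs do this with generated machine code, not closures.

def make_cached_add():
    cache = {"type": None, "fast": None}

    def generic_add(a, b):
        return a + b  # fully dynamic dispatch

    def cached_add(a, b):
        t = type(a)
        if cache["type"] is not t:
            # "compile" a specialised fast path for the observed type
            cache["type"] = t
            if t is int:
                cache["fast"] = lambda x, y: x + y  # stands in for raw machine adds
            else:
                cache["fast"] = generic_add
        return cache["fast"](a, b)

    return cached_add

add = make_cached_add()
print(add(1, 2))        # → 3  (first call observes int, installs fast path)
print(add("a", "b"))    # type changed at runtime: falls back to generic path
```

A static compiler cannot install or revoke such a specialisation, because the concrete type at the call site is only known once the program runs.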



--
Steven

Oscar Benjamin

Oct 21, 2013, 5:55:10 AM
to Steven D'Aprano, Python List
On 21 October 2013 08:46, Steven D'Aprano <st...@pearwood.info> wrote:
> On Sun, 20 Oct 2013 20:35:03 -0700, Mark Janssen wrote:
>
> [Attribution to the original post has been lost]
>>> Is a JIT implementation of a language (not just Python) better than
>>> traditional ahead-of-time compilation?
>>
>> Not at all. The value of jit compilation, I believe, is purely for the
>> dynamic functionality that it allows. AOT compilation will never allow
>> that, but in return you get massive performance and runtime-size gains
>
> On the contrary, you have that backwards. An optimizing JIT compiler can
> often produce much more efficient, heavily optimized code than a static
> AOT compiler, and at the very least they can optimize different things
> than a static compiler can. This is why very few people think that, in
> the long run, Nuitka can be as fast as PyPy, and why PyPy's ultimate aim
> to be "faster than C" is not moonbeams:

That may be true, but both the examples below are spurious at best. A
decent AOT compiler would reduce both programs to the null program, as
noted by haypo:
http://morepypy.blogspot.co.uk/2011/02/pypy-faster-than-c-on-carefully-crafted.html?showComment=1297205903746#c2530451800553246683

> http://morepypy.blogspot.com.au/2011/02/pypy-faster-than-c-on-carefully-crafted.html
>
> http://morepypy.blogspot.com.au/2011/08/pypy-is-faster-than-c-again-string.html

I just modified the add() example so that none of the operations can
be entirely optimised away:

def main():
    i = 0
    a = 0.0
    b = 0.0
    while i < 1000000000:
        a += 1.0
        b += add(a, a)
        i += 1
    print(b)

Similarly for the C version:

#include <stdio.h>

double add(double a, double b);

int main()
{
    int i = 0;
    double a = 0;
    double b = 0;
    while (i < 1000000000) {
        a += 1.0;
        b += add(a, a);
        i++;
    }
    printf("%f", b);
}

My timings:

$ gcc -O3 x.c y.c
$ time ./a.exe
1000000000134218000.000000
real 0m5.609s
user 0m0.015s
sys 0m0.000s
$ time pypy y.py
1.00000000013e+18

real 0m9.891s
user 0m0.060s
sys 0m0.061s

So the pypy version takes twice as long to run this. That's impressive
but it's not "faster than C".

I also compared a script that uses intensive decimal computation run
under CPython 3.3 and PyPy 2.1 (pyver 2.7). This is essentially a
comparison between the C implementation of the decimal module and
PyPy's JIT optimisation of the pure-Python module. CPython 3.3 takes
10 seconds and PyPy 2.1 takes 45 seconds. Again, that's impressive (a
lot of work went into making the C implementation of the decimal
module as fast as it is), but it's not faster than C.
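A comparison along those lines can be reproduced with a small harness such as the following (a sketch with an invented kernel, not the actual script used above; absolute times will differ by machine and interpreter):

```python
import timeit
from decimal import Decimal, getcontext

def kernel(n=10000):
    # A decimal-heavy loop: on CPython >= 3.3 this exercises the C
    # _decimal module; on PyPy it exercises the pure-Python decimal
    # module plus the JIT.
    getcontext().prec = 50
    x = Decimal(1)
    d = Decimal("1.0000000001")
    total = Decimal(0)
    for _ in range(n):
        x *= d
        total += x
    return total

elapsed = timeit.timeit(kernel, number=10)
print(f"10 runs: {elapsed:.3f}s")
```

Running the same file under both interpreters gives a like-for-like timing of the two decimal implementations.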

I don't mean to criticise PyPy. I've just tested it and I am impressed,
and I think I'll definitely try to use it where possible. I do think
that some of the marketing there is misleading, though.


Oscar

Philip Herron

Oct 21, 2013, 7:08:13 AM
Hey all,

Thanks. I've been working on this basically on my own; 95% of the compiler is my own code, written in my spare time. It's been fairly scary, all of this, for me. I personally find this a real source of interest, to demystify compilers and what JIT compilation really is under the hood.

For example, if you're really interested to see what JIT compilation really is (not magic), look at:
http://compilers.iecc.com/comparch/article/10-03-063

And I really believe in this project, though I want to stay away from comparing my project to any others out there, as they are much more mature than mine.

For example, it's only a few commits back that you were successfully able to compile:

import sys

So I think unfair comparisons could get me into trouble. What's taken so long is that I do not reuse the Python runtime like the other projects do. They do this mostly to maintain compatibility, but I wanted to try doing this entirely from scratch, partly for my own interest and partly because the Python runtime was designed for an interpreter, not for compilers, at least not ahead-of-time ones.

It's interesting, a few things come up, such as:

exec and eval. I didn't really have a good answer for this at my talk at PyCon IE 2013, but I am going to say no: I am not going to implement these. Partly because eval and exec, at least to me, come mostly from developing interpreters, as a debugging exercise so that a test doesn't have to invoke the program properly and can feed in strings to interpret; at least that's what I have done in the past with a virtual machine I wrote before gccpy.

I think anything you look up on eval and exec says you shouldn't use them in production code, as they can lead to a lot of issues. I can see their use in quick, dirty hacks, but I don't think they are really part of the Python language; in my opinion they are just some builtin functions that you can invoke. Another issue I had was:

class myClass: pass

I currently don't allow this, but I've rethought it and I am going to try to implement it properly. Before, I had a really complicated pass to guess the type of a struct, but I realise that with how things currently work this may not be necessary.

As a personal comment, I think this is kind of funny: why not just use a dict? But I can see the use for it now, and I want to implement it.

What I will say is that gccpy is moving along. With each milestone I achieve over the first half of 2014, I reckon I could at least compile half of a decent Python test suite. It's taken me so long to get used to the GCC code base and to settle how I want to implement things; I've rewritten the runtime and the compiler at least 4 times, and I am going to rewrite part of the runtime over the next week or so. I think it's a worthwhile project.

I don't think this will ever be on par with PyPy or CPython. As professional as those projects are, I really respect their work; I mean, I look up to them (loads). I am just a guy who likes compilers; it isn't my day job and I don't get paid for this. I just enjoy the challenge, and I hope you understand that this is my baby and I am very protective of my project :).

I hope in a few months, when I start compiling test suites, to publish benchmarks. What I will say is that it looks pretty good right now: there is only one case so far (that I have tested) where I am not faster than CPython, and that's only because I haven't been compiling the runtime library with correct CFLAGS like -O2 (I wasn't passing anything), and because I have tonnes of debugging mallocs all over the place slowing it down, since I need to rewrite part of the runtime. I've been fighting GCC for 4 years; now I am fighting my own code ;).

A final note: I am not saying JIT is bad or not the way to do things. I personally think this question isn't answered, and I can see the case for it, but there are downsides that JIT people won't talk about.

The complexity of maintaining a JIT in a project is probably the main one, and optimisations at runtime make it even more mental; to say they are hard to do is an understatement, never mind that they aren't as common as you would want to believe outside of target specifics that GCC already knows (-mtune*). I do believe JIT is the way forward, but I think languages, and maybe even CPUs, need to be designed from that perspective, perhaps with some kind of instructions to help maintain a distinction between runtime and user code (making tail-call optimisations easier on dynamic languages), maybe?

Thanks.

Mark Janssen

Oct 21, 2013, 4:14:51 PM
to Steven D'Aprano, Python List
On Mon, Oct 21, 2013 at 12:46 AM, Steven D'Aprano <st...@pearwood.info> wrote:
> On Sun, 20 Oct 2013 20:35:03 -0700, Mark Janssen wrote:
>
> [Attribution to the original post has been lost]
>>> Is a JIT implementation of a language (not just Python) better than
>>> traditional ahead-of-time compilation?
>>
>> Not at all. The value of jit compilation, I believe, is purely for the
>> dynamic functionality that it allows. AOT compilation will never allow
>> that, but in return you get massive performance and runtime-size gains
>
> On the contrary, you have that backwards. An optimizing JIT compiler can
> often produce much more efficient, heavily optimized code than a static
> AOT compiler,

This is horseshit.

> and at the very least they can optimize different things
> than a static compiler can.

Okay, sure. But now you've watered down your claim so that it's not
saying much of anything.

> This is why very few people think that, in
> the long run, Nuitka can be as fast as PyPy, and why PyPy's ultimate aim
> to be "faster than C" is not moonbeams:

It is moonbeams, but that's a good thing. I think you don't
understand how computers work, Steven.
In any event, PyPy is a great project for those who want to experiment
with compiler and language design.

> JIT compilation is really about optimization,

No.

> which is why languages like
> Java and .NET which could easily be compiled to machine code at compile
> time generally use an intermediate bytecode and a JIT compiler instead.
> They're not doing it for dynamism since they aren't dynamic languages.
> It's a way of generating more aggressive (i.e. better but harder)
> optimizations based on information only available at runtime.

This much is true, but not for reasons you understand.

--
MarkJ
Tacoma, Washington

Mark Janssen

Oct 21, 2013, 4:26:06 PM
to Philip Herron, Python List
On Mon, Oct 21, 2013 at 4:08 AM, Philip Herron
<herron...@googlemail.com> wrote:
> Thanks. I've been working on this basically on my own; 95% of the compiler is my own code, written in my spare time. It's been fairly scary, all of this, for me. I personally find this a real source of interest, to demystify compilers and what JIT compilation really is under the hood.

So I'm curious, not having looked at your code: are you just
translating Python code into C code to make your front-end to GCC?
Like converting "[1,2,3]" into a C linked-list data structure and
making 1 an int (or bignum)?

--
MarkJ
Tacoma, Washington

Ned Batchelder

Oct 21, 2013, 4:29:36 PM
to pytho...@python.org, Mark Janssen
On 10/21/13 4:14 PM, Mark Janssen wrote:
> On Mon, Oct 21, 2013 at 12:46 AM, Steven D'Aprano <st...@pearwood.info> wrote:
>> On Sun, 20 Oct 2013 20:35:03 -0700, Mark Janssen wrote:
>>
>> [Attribution to the original post has been lost]
>>>> Is a JIT implementation of a language (not just Python) better than
>>>> traditional ahead-of-time compilation?
>>> Not at all. The value of jit compilation, I believe, is purely for the
>>> dynamic functionality that it allows. AOT compilation will never allow
>>> that, but in return you get massive performance and runtime-size gains
>> On the contrary, you have that backwards. An optimizing JIT compiler can
>> often produce much more efficient, heavily optimized code than a static
>> AOT compiler,
> This is horseshit.

Surely we can discuss this without resorting to swearing.
>> and at the very least they can optimize different things
>> than a static compiler can.
> Okay, sure. But now you've watered down your claim so that it's not
> saying much of anything.
>
>> This is why very few people think that, in
>> the long run, Nuitka can be as fast as PyPy, and why PyPy's ultimate aim
>> to be "faster than C" is not moonbeams:
> It is moonbeams, but that's a good thing. I think you don't
> understand how computers work, Steven.
I think Steven has earned the benefit of the doubt when it comes to
understanding computers. Did you read the links he provided? Do you
believe that author also doesn't understand how computers work?

Mark, disagreement doesn't have to lead to ad-hominem attacks. If you
think Steven is wrong about what JIT compilers can do, please explain.
Simply calling his statements horseshit and saying he doesn't understand
computers is not a rebuttal.

> In any event, PyPy is a great project for those who want experiment
> with compiler and language design.
>
>> JIT compilation is really about optimization,
> No.

Please tell us what JIT compilation is about.
>
>> which is why languages like
>> Java and .NET which could easily be compiled to machine code at compile
>> time generally use an intermediate bytecode and a JIT compiler instead.
>> They're not doing it for dynamism since they aren't dynamic languages.
>> It's a way of generating more aggressive (i.e. better but harder)
>> optimizations based on information only available at runtime.
> This much is true, but not for reasons you understand.
>
Again, a statement with no supporting information.

*sigh*

If someone says something that seems wrong to you, there are a number of
possibilities: 1) you are wrong, 2) they are wrong, or 3) you haven't
understood each other yet. Please have the humility to at least
consider option 3 first.

--Ned.

Philip Herron

Oct 21, 2013, 5:03:51 PM
No, it's not like those 'compilers'. I don't really agree with a compiler generating C/C++ and saying it's producing native code; I don't really believe it's truly native compilation. Compilers that do that tend to put in a lot of type-safety code and debugging internals at a high level to get things working. I am not saying Python compilers do this; I haven't analysed them enough to say.

What I mean by a front-end is just like GCC, G++, gccgo, and gfortran: it all works the same. Each of these is a front-end; you can pass all those mental GCC options like -O3, -mtune, -fblabla. It is implemented as part of GCC, you can 'bootstrap Python', and you can -fdump-tree-all, etc.

What I can say is that JIT compilation is really 'mystified' in a big way when it comes to projects like PyPy: when it's implemented in Python, how can it call mmap to make an executable memory block, etc.? When it comes to compilation in general, I think it gets mystified in the Python compiler community even more.

Mark Janssen

Oct 21, 2013, 7:04:03 PM
to Philip Herron, Python List
> No, it's not like those 'compilers'. I don't really agree with a compiler generating C/C++ and saying it's producing native code; I don't really believe it's truly native compilation. Compilers that do that tend to put in a lot of type-safety code and debugging internals at a high level to get things working. I am not saying Python compilers do this; I haven't analysed them enough to say.

Hmm, well what I'd personally find interesting from a computer-science
point of view is an app that takes a language specification in BNF
(complete with keywords and all) and outputs C code, which is then
compiled to an executable as normal. This is how a front-end should
be designed. A middle layer for translating common language elements
like lists, sets, etc. could make it easy.

--
MarkJ
Tacoma, Washington

Steven D'Aprano

Oct 21, 2013, 7:41:29 PM
On Mon, 21 Oct 2013 10:55:10 +0100, Oscar Benjamin wrote:

> On 21 October 2013 08:46, Steven D'Aprano <st...@pearwood.info> wrote:

>> On the contrary, you have that backwards. An optimizing JIT compiler
>> can often produce much more efficient, heavily optimized code than a
>> static AOT compiler, and at the very least they can optimize different
>> things than a static compiler can. This is why very few people think
>> that, in the long run, Nuitka can be as fast as PyPy, and why PyPy's
>> ultimate aim to be "faster than C" is not moonbeams:
>
> That may be true, but both the examples below are spurious at best. A
> decent AOT compiler would reduce both programs to the null program, as
> noted by haypo:
> http://morepypy.blogspot.co.uk/2011/02/pypy-faster-than-c-on-carefully-crafted.html?showComment=1297205903746#c2530451800553246683

Are you suggesting that gcc is not a decent compiler? If "optimize away
to the null program" is such an obvious thing to do, why doesn't the most
popular C compiler in the [FOSS] world do it?


[...]
> So the pypy version takes twice as long to run this. That's impressive
> but it's not "faster than C".

Nobody is saying that PyPy is *generally* capable of making any arbitrary
piece of code run as fast as hand-written C code. You'll notice that the
PyPy posts are described as *carefully crafted* examples.

I believe that, realistically, PyPy has potential to bring Python into
Java and .Net territories, namely to run typical benchmarks within an
order of magnitude of C speeds on the same benchmarks. C is a very hard
target to beat, because vanilla C code does *so little* compared to other
languages: no garbage collection, no runtime dynamism, very little
polymorphism. So benchmarking simple algorithms plays to C's strengths,
while ignoring C's weaknesses.



--
Steven

rusi

Oct 21, 2013, 11:40:05 PM
On Tuesday, October 22, 2013 1:59:36 AM UTC+5:30, Ned Batchelder wrote:
> On 10/21/13 4:14 PM, Mark Janssen wrote:
>
> > On Mon, Oct 21, 2013 at 12:46 AM, Steven D'Aprano wrote:
> >> An optimizing JIT compiler can
> >> often produce much more efficient, heavily optimized code than a static
> >> AOT compiler,
> > This is horseshit.
>
> Surely we can discuss this without resorting to swearing.

Thanks, Ned, for the intervention.
It is good to have 'interesting' and even 'hot' discussions.
Beyond a point they remain hot but not interesting.

Piet van Oostrum

Oct 21, 2013, 11:45:56 PM
A language specification in BNF is just syntax. It doesn't say anything
about semantics. So how could this be used to produce executable C code
for a program? BNF is used to produce parsers. But a parser isn't
sufficient.
--
Piet van Oostrum <pi...@vanoostrum.org>
WWW: http://pietvanoostrum.com/
PGP key: [8DAE142BE17999C4]

Mark Janssen

Oct 22, 2013, 12:24:38 AM
to Piet van Oostrum, Python List
> A language specification in BNF is just syntax. It doesn't say anything
> about semantics. So how could this be used to produce executable C code
> for a program? BNF is used to produce parsers. But a parser isn't
> sufficient.

A C program is just syntax also. How does the compiler generate
executable machine code? Extrapolate into a Python front-end to C.

--
MarkJ
Tacoma, Washington

Dave Angel

Oct 22, 2013, 12:39:21 AM
to pytho...@python.org
On 22/10/2013 00:24, Mark Janssen wrote:

>> A language specification in BNF is just syntax. It doesn't say anything
>> about semantics. So how could this be used to produce executable C code
>> for a program? BNF is used to produce parsers. But a parser isn't
>> sufficient.
>
> A C program is just syntax also. How does the compiler generate
> executable machine code? Extrapolate into a Python front-end to C.
>

Did you even read the paragraph you quoted above? The BNF specification
does NOT completely describe a language; it only defines its syntax. So
if the only thing you knew about C was its BNF, you could certainly not
write a C compiler, and neither could anyone else. Fortunately for the
C community, the language specification includes much more than a BNF
grammar. At a minimum, you have to specify both the syntax and the
semantics.

All that has nothing to do with an actual program written in an
actually defined language, whether C or Python.

--
DaveA


Steven D'Aprano

Oct 22, 2013, 1:25:46 AM
Like every other language, C programs are certainly not *just* syntax.
Here is some syntax:

&foo bar^ :=

What machine code should be produced? Right now, you should be saying
"How the hell do I know? What does that line of code *do*???" and you
would be right to ask that question.

Syntax on its own doesn't mean anything. You also need to know the
*semantics* of the code, in other words, *what it means*. That knowledge
is inherent to the compiler: a C compiler "understands" what C code
means, a Pascal compiler "understands" Pascal code, a Forth compiler
"understands" Forth.

Merely parsing code doesn't capture that understanding. Parsing a line
like "foo <= bar" will give you something like this:

NAME "foo"
OPERATOR "<="
NAME "bar"
END_LINE

What does the operator "<=" actually do? What does "foo" represent? Is it
a variable, a constant, a global, a local, something else? What are the
rules for deciding whether a variable is in the local scope, a global
scope, or something else? Does the language even have scopes?
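A token stream like the one above can be produced by a few lines of purely syntactic code (a toy sketch using only the standard library); note that nothing in it knows, or needs to know, what "<=" means:

```python
import re

# A toy tokenizer: it recovers the structure of "foo <= bar" but carries
# no semantics -- nothing here says what "<=" does or where "foo" lives.
TOKEN = re.compile(r"\s*(?:(?P<NAME>[A-Za-z_]\w*)"
                   r"|(?P<OPERATOR><=|>=|==|[<>=+\-*/]))")

def tokenize(line):
    tokens, pos = [], 0
    while pos < len(line):
        m = TOKEN.match(line, pos)
        if not m:
            break
        kind = m.lastgroup
        tokens.append((kind, m.group(kind)))
        pos = m.end()
    tokens.append(("END_LINE", ""))
    return tokens

print(tokenize("foo <= bar"))
# → [('NAME', 'foo'), ('OPERATOR', '<='), ('NAME', 'bar'), ('END_LINE', '')]
```

Turning that token list into machine code is exactly the part a BNF grammar cannot supply.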



--
Steven

Antoine Pitrou

Oct 22, 2013, 4:55:15 AM
to pytho...@python.org
Philip Herron <herron.philip <at> googlemail.com> writes:
>
> It's interesting, a few things come up, such as:
>
> exec and eval. I didn't really have a good answer for this at my talk
> at PyCon IE 2013, but I am going to say no: I am not going to implement
> these. Partly because eval and exec, at least to me, come mostly from
> developing interpreters, as a debugging exercise so that a test doesn't
> have to invoke the program properly and can feed in strings to
> interpret; at least that's what I have done in the past with a virtual
> machine I wrote before gccpy.

If you don't implement exec() and eval(), then people won't be able to
use namedtuples, which are a common datatype factory.
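For context, the class-building trick namedtuple relies on can be sketched like this (a simplified illustration of the technique, not CPython's actual implementation): the field names are only known at runtime, so the class source is assembled as a string and handed to exec(), which is exactly what a pure AOT compiler cannot see.

```python
# Simplified sketch of an exec()-based record-type factory, in the style
# of collections.namedtuple (not CPython's actual code). The hypothetical
# name make_record is for illustration only.

def make_record(typename, field_names):
    params = ", ".join(field_names)
    source = (
        f"class {typename}:\n"
        f"    def __init__(self, {params}):\n"
        + "".join(f"        self.{f} = {f}\n" for f in field_names)
    )
    namespace = {}
    exec(source, namespace)   # the class only comes into existence here
    return namespace[typename]

Point = make_record("Point", ["x", "y"])
p = Point(1, 2)
print(p.x, p.y)   # → 1 2
```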

As for the rest: well, good luck writing an AOT compiler that produces
interesting results on average *pure* Python code. It's already been
tried a number of times, and has generally failed. Cython mitigates the
issue by exposing a superset of Python (including type hints, etc.).

Regards

Antoine.


Philip Herron

Oct 22, 2013, 5:08:00 AM
Thanks for that interesting example. I haven't looked into how it's implemented, but on an initial look I am nearly sure I can implement this without using exec or eval. I've found this a lot in implementing my runtime. In the past I've only used exec and eval as debug hooks into a toy virtual machine I wrote; I don't particularly think they are part of the language, nor should people really use them.

Thanks

Oscar Benjamin

unread,
Oct 22, 2013, 5:14:16 AM10/22/13
to Steven D'Aprano, Python List
On 22 October 2013 00:41, Steven D'Aprano
<steve+comp....@pearwood.info> wrote:
> On Mon, 21 Oct 2013 10:55:10 +0100, Oscar Benjamin wrote:
>
>> On 21 October 2013 08:46, Steven D'Aprano <st...@pearwood.info> wrote:
>
>>> On the contrary, you have that backwards. An optimizing JIT compiler
>>> can often produce much more efficient, heavily optimized code than a
>>> static AOT compiler, and at the very least they can optimize different
>>> things than a static compiler can. This is why very few people think
>>> that, in the long run, Nuitka can be as fast as PyPy, and why PyPy's
>>> ultimate aim to be "faster than C" is not moonbeams:
>>
>> That may be true but both the examples below are spurious at best. A
>> decent AOT compiler would reduce both programs to the NULL program as
>> noted by haypo:
>> http://morepypy.blogspot.co.uk/2011/02/pypy-faster-than-c-on-carefully-
> crafted.html?showComment=1297205903746#c2530451800553246683
>
> Are you suggesting that gcc is not a decent compiler?

No.

> If "optimize away
> to the null program" is such an obvious thing to do, why doesn't the most
> popular C compiler in the [FOSS] world do it?

It does if you pass the appropriate optimisation setting (as shown in
haypo's comment). I should have been clearer.

gcc compiles programs in two phases: compilation and linking.
Compilation creates the object files x.o and y.o from x.c and y.c.
Linking creates the output binary a.exe from x.o and y.o. The -O3
optimisation setting used in the blog post enables optimisation in the
compilation phase. However each .c file is compiled independently so
because the add() function is defined in x.c and called in y.c the
compiler is unable to inline it. It also can't remove it as dead code
because although it knows that the return value isn't used it doesn't
know if the call has side effects.

You might think it's silly that gcc can't optimise across source files
and if so you're right because actually it can if you enable link time
optimisation with the -flto flag as described by haypo. So if I do
that with the code from the blog post I get (using mingw gcc 4.7.2 on
Windows):

$ cat x.c
double add(double a, double b)
{
    return a + b;
}
$ cat y.c
double add(double a, double b);

int main()
{
    int i = 0;
    double a = 0;
    while (i < 1000000000) {
        a += 1.0;
        add(a, a);
        i++;
    }
}
$ gcc -O3 -flto x.c y.c
$ time ./a.exe

real 0m0.063s
user 0m0.015s
sys 0m0.000s
$ time ./a.exe # warm cache

real 0m0.016s
user 0m0.015s
sys 0m0.015s

So gcc can optimise this all the way to the null program which takes
15ms to run (that's 600 times faster than pypy).

Note that even if pypy could optimise it all the way to the null
program it would still be 10 times slower than C's null program:

$ touch null.py
$ time pypy null.py

real 0m0.188s
user 0m0.076s
sys 0m0.046s
$ time pypy null.py # warm cache

real 0m0.157s
user 0m0.060s
sys 0m0.030s

> [...]
>> So the pypy version takes twice as long to run this. That's impressive
>> but it's not "faster than C".

(Actually if I enable -flto with that example the C version runs 6-7
times faster due to inlining.)

> Nobody is saying that PyPy is *generally* capable of making any arbitrary
> piece of code run as fast as hand-written C code. You'll notice that the
> PyPy posts are described as *carefully crafted* examples.

They are more than carefully crafted. They are useless and misleading.
It's reasonable to contrive of a simple CPU-intensive programming
problem for benchmarking. But the program should do *something* even
if it is contrived. Both programs here consist *entirely* of dead
code. Yes it's reasonable for the pypy devs to test things like this
during development. No it's not reasonable to showcase this as an
example of the potential for pypy to speed up any useful computation.

> I believe that, realistically, PyPy has potential to bring Python into
> Java and .Net territories, namely to run typical benchmarks within an
> order of magnitude of C speeds on the same benchmarks. C is a very hard
> target to beat, because vanilla C code does *so little* compared to other
> languages: no garbage collection, no runtime dynamism, very little
> polymorphism. So benchmarking simple algorithms plays to C's strengths,
> while ignoring C's weaknesses.

As I said I don't want to criticise PyPy. I've just started using it
and it is impressive. However both of those blog posts are
misleading. Not only that, but the authors must know exactly why they
are misleading. Because of that I will take any other claims with a
big pinch of salt in future.


Oscar

Philip Herron

unread,
Oct 22, 2013, 5:32:30 AM10/22/13
to
On Tuesday, 22 October 2013 10:14:16 UTC+1, Oscar Benjamin wrote:
> On 22 October 2013 00:41, Steven D'Aprano
>
You sir deserve a medal! I think a lot of people are taking these sorts of benchmarks completely out of context and it's great to see such a well-rounded statement.

I applaud you so much! I've been banging my head against the wall trying to describe what you just did as succinctly as that, and couldn't.

Steven D'Aprano

unread,
Oct 22, 2013, 6:10:50 AM10/22/13
to
On Tue, 22 Oct 2013 08:55:15 +0000, Antoine Pitrou wrote:

> If you don't implement exec() and eval() then people won't be able to
> use namedtuples, which are a common datatype factory.


Philip could always supply his own implementation of namedtuple that
doesn't use exec.

But either way, if he doesn't implement eval and exec, what he has is not
Python, but a subset of Python. Perhaps an interesting and useful subset.
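(For what it's worth, an exec-free version is not hard to sketch. The code below is purely illustrative -- it is *not* the stdlib implementation, which builds a class definition as a source string and exec()s it, and it skips _make, _replace, __repr__ and friends:)

```python
from operator import itemgetter

def simple_namedtuple(typename, field_names):
    """A cut-down namedtuple factory that avoids exec/eval entirely.
    Supports only construction and attribute access, unlike the real
    collections.namedtuple."""
    fields = tuple(field_names.split())

    def __new__(cls, *args):
        if len(args) != len(fields):
            raise TypeError("expected %d arguments" % len(fields))
        return tuple.__new__(cls, args)

    namespace = {"__new__": __new__, "_fields": fields, "__slots__": ()}
    for index, name in enumerate(fields):
        # Each field becomes a read-only property indexing the tuple.
        namespace[name] = property(itemgetter(index))
    return type(typename, (tuple,), namespace)

Point = simple_namedtuple("Point", "x y")
p = Point(3, 4)
print(p.x, p.y)
```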

--
Steven

Steven D'Aprano

unread,
Oct 22, 2013, 8:00:37 AM10/22/13
to
On Tue, 22 Oct 2013 10:14:16 +0100, Oscar Benjamin wrote:

> On 22 October 2013 00:41, Steven D'Aprano
> <steve+comp....@pearwood.info> wrote:
>> On Mon, 21 Oct 2013 10:55:10 +0100, Oscar Benjamin wrote:
>>
>>> On 21 October 2013 08:46, Steven D'Aprano <st...@pearwood.info> wrote:
>>
>>>> On the contrary, you have that backwards. An optimizing JIT compiler
>>>> can often produce much more efficient, heavily optimized code than a
>>>> static AOT compiler, and at the very least they can optimize
>>>> different things than a static compiler can. This is why very few
>>>> people think that, in the long run, Nuitka can be as fast as PyPy,
>>>> and why PyPy's ultimate aim to be "faster than C" is not moonbeams:
>>>
>>> That may be true but both the examples below are spurious at best. A
>>> decent AOT compiler would reduce both programs to the NULL program as
>>> noted by haypo:
>>> http://morepypy.blogspot.co.uk/2011/02/pypy-faster-than-c-on-
carefully-
>> crafted.html?showComment=1297205903746#c2530451800553246683

Keep in mind that the post's author, Maciej Fijalkowski, is not a native
English speaker (to the best of my knowledge). You or I would probably
have called the post a *contrived* example, not a "carefully crafted one"
-- the meaning is the same, but the connotations are different.

Micro-benchmarks are mostly of theoretical interest, and contrived ones
even more so, but still of interest. One needs to be careful not to read
too much into them, but also not to read too little into them.


>> Are you suggesting that gcc is not a decent compiler?
>
> No.
>
>> If "optimize away
>> to the null program" is such an obvious thing to do, why doesn't the
>> most popular C compiler in the [FOSS] world do it?
>
> It does if you pass the appropriate optimisation setting (as shown in
> haypo's comment). I should have been clearer.

"C can do nothing 10 times faster than Python!" -- well, okay, but what
does that tell you about my long-running web server app? Benchmarks at
the best of time are only suggestive, benchmarks for null programs are
even less useful.

The very next comment after Haypo is an answer to his observation:

[quote]
@haypo print the result so the loop don't get removed as dead
code. Besides, the problem is really the fact that's -flto is
unfair since python imports more resemble shared libraries
than statically-compiled files.


I'll be honest, I don't know enough C to really judge that claim, but I
have noticed that benchmarks rarely compare apples with apples,
especially when C is involved. You can't eliminate all the differences
between the code being generated, or at least not easily, since different
languages have deep-seated differences in semantics that can't be
entirely eliminated. But you should at least make some effort to compare
code that does the same thing the same way.

Here's an example: responding to a benchmark showing a Haskell compiler
generating faster code than a C compiler, somebody re-wrote the C code
and got the opposite result:

http://jacquesmattheij.com/when-haskell-is-not-faster-than-c

Again, I can't judge the validity of all of the changes he made, but one
stood out like a sore thumb:

[quote]
C does not require you to set static global arrays to ‘0’, so the
for loop in the main function can go...

Wait a minute... Haskell, I'm pretty sure, zeroes memory. C doesn't. So
the C code is now doing less work. Yes, your C compiler will allow you to
avoid zeroing memory before using it, and you'll save some time
initially. But eventually[1] you will need to fix the security
vulnerability by adding code to zero the memory, exactly as Haskell and
other more secure languages already do. So *not* zeroing the memory is
cheating. It's not something you'd do in real code, not if you care about
security and correctness. Even if you don't care about security, you
should care about benchmarking both languages performing the same amount
of work.

Now, I may be completely off-base here. Some Haskell expert may chime up
to say that Haskell does not, in fact, zero memory. But it does
*something*, I'm sure, perhaps it tracks what memory is undefined and
prevents reads from it, or something. Whatever it does, if it does it at
runtime, the C benchmark better do the same thing, or it's an unfair
comparison:

"Safely drive to the mall obeying all speed limits and traffic signals in
a Chevy Volt, versus speed down the road running red lights and stop
signs in a Ford Taurus" -- would it be any surprise that the Taurus is
faster?

[...]
> They are more than carefully crafted. They are useless and misleading.
> It's reasonable to contrive of a simple CPU-intensive programming
> problem for benchmarking. But the program should do *something* even if
> it is contrived. Both programs here consist *entirely* of dead code.

But since the dead code is *not* eliminated, it is actually executed. If
it's executed, it's not really dead, is it? Does it really matter that
you don't do anything with the result? I'm with Maciej on this one --
*executing* the code given is faster in PyPy than in C, at least for this
C compiler. Maybe C is faster to not execute it. Is that really an
interesting benchmark? "C does nothing ten times faster than PyPy does
something!"

Given a sufficiently advanced static analyser, PyPy could probably
special-case programs that do nothing. Then you're in a race to compare
the speed at which the PyPy runtime environment can start up and do
nothing, versus a stand-alone executable that has to start up and do
nothing. If this is a benchmark that people care about, I suggest they
need to get out more :-)

Ultimately, this is an argument as what counts as a fair apples-to-apples
comparison, and what doesn't. Some people consider that for a fair test,
the code has to actually be executed. If you optimize away code and don't
execute it, that's not a good benchmark. I agree with them. You don't. I
can see both sides of the argument, and think that they both have
validity, but on balance agree with the PyPy guys here: a compiler that
optimizes away "for i = 1 to 1000: pass" to do-nothing is useful, but if
you wanted to find out the runtime cost of a for-loop, you would surely
prefer to disable that optimization and time how long it takes the for
loop to actually run.

The actual point that the PyPy developers keep making is that a JIT
compiler can use runtime information to perform optimizations which a
static compiler like gcc cannot, and I haven't seen anyone dispute that
point. More in the comments here:


[quote]
The point here is not that the Python implementation of
formatting is better than the C standard library, but that
dynamic optimisation can make a big difference. The first
time the formatting operator is called its format string is
parsed and assembly code for assembling the output generated.
The next 999999 times that assembly code is used without
doing the parsing step. Even if sprintf were defined locally,
a static compiler can’t optimise away the parsing step, so
that work is done redundantly every time around the loop.

http://morepypy.blogspot.com/2011/08/pypy-is-faster-than-c-again-string.html?showComment=1312357475889#c6708170690935286644
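The effect described in that comment can be mimicked crudely in pure Python: parse the format string once, then reuse the parse on every iteration. (A toy illustration only -- nothing like PyPy's actual machinery, and it handles just %d and %s:)

```python
import re

def compile_format(fmt):
    """Parse a %-style format string once, returning a renderer that
    just joins pieces on every call -- a toy stand-in for the kind of
    specialization a JIT performs automatically on a hot loop."""
    parts = re.split(r"(%[ds])", fmt)  # literals and conversions, in order

    def render(*args):
        it = iter(args)
        return "".join(str(next(it)) if p in ("%d", "%s") else p
                       for p in parts)

    return render

render = compile_format("x = %d, name = %s")
print(render(42, "spam"))  # x = 42, name = spam
```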


Also possibly of interest:

http://beza1e1.tuxen.de/articles/faster_than_C.html





[1] Probably not until after the Zero Day exploit is released.

--
Steven

Chris Angelico

unread,
Oct 22, 2013, 8:20:52 AM10/22/13
to pytho...@python.org
On Tue, Oct 22, 2013 at 11:00 PM, Steven D'Aprano
<steve+comp....@pearwood.info> wrote:
> Given a sufficiently advanced static analyser, PyPy could probably
> special-case programs that do nothing. Then you're in a race to compare
> the speed at which the PyPy runtime environment can start up and do
> nothing, versus a stand-alone executable that has to start up and do
> nothing. If this is a benchmark that people care about, I suggest they
> need to get out more :-)

Like every benchmark, it has its uses. Just last week I was tinkering
with several high level languages in order to see which one would
start, do a fairly trivial task, and shut down, in the shortest space
of time. Why? Because I wanted that trivial task to be added to our
rapid-deployment sequence at a point where a human would be waiting on
it, and the task itself wasn't critical (it was an early-catcher for a
particular type of bug). Delaying my fellow developers by even one
second at that point would be unacceptable; the various options at my
disposal took anywhere from 250 to 750 ms (cold cache, which is what
matters) to run. Yes, I know a second isn't long. But I was trying to
sell a concept, and if I can say that it adds "practically no time" to
an interactive action, that's a lot better than even one second.
Considering that rapiding took about 1200ms (ish - again, cold cache)
previously, adding even just 250ms is noticeable. Benchmarking an
empty program would get very close to this actual real-world scenario.

(Eventually I merged the functionality into an unrelated script just
for the sake of saving on interpreter startup time.)

ChrisA

Dave Angel

unread,
Oct 22, 2013, 10:04:57 AM10/22/13
to pytho...@python.org
On 22/10/2013 08:00, Steven D'Aprano wrote:

> On Tue, 22 Oct 2013 10:14:16 +0100, Oscar Benjamin wrote:
>
<snip>
> Here's an example: responding to a benchmark showing a Haskell compiler
> generating faster code than a C compiler, somebody re-wrote the C code
> and got the opposite result:
>
> http://jacquesmattheij.com/when-haskell-is-not-faster-than-c
>
> Again, I can't judge the validity of all of the changes he made, but one
> stood out like a sore thumb:
>
> [quote]
> C does not require you to set static global arrays to '0', so the
> for loop in the main function can go...
>
> Wait a minute... Haskell, I'm pretty sure, zeroes memory. C doesn't. So

Static int variables are in fact zeroed. However, most C compilers do
it by putting four bytes (or whatever) into the image of the
executable so it has no runtime cost.

<snip>
> But eventually[1] you will need to fix the security
> vulnerability by adding code to zero the memory, exactly as Haskell and
> other more secure languages already do. So *not* zeroing the memory is
> cheating. It's not something you'd do in real code, not if you care
> about security and correctness.

I agree with most of what you say in the message, but here you go on to
say the C code is unsafely skipping initialization, which is not the
case.

By the way, a C compiler typically handles any initialization of a
static variable the same way. So if you declare and initialize a static
variable as

int myvar = 34 * 768;

it'll put the product directly in the executable image, and no runtime
code is generated.

Perhaps you were thinking of an automatic variable, which is not
initialized unless the programmer says so, and is then initialized with
code.

--
DaveA


Oscar Benjamin

unread,
Oct 22, 2013, 10:15:32 AM10/22/13
to Steven D'Aprano, Python List
On 22 October 2013 13:00, Steven D'Aprano
<steve+comp....@pearwood.info> wrote:
> On Tue, 22 Oct 2013 10:14:16 +0100, Oscar Benjamin wrote:
>
>> On 22 October 2013 00:41, Steven D'Aprano <steve+comp....@pearwood.info> wrote:
>>>
>>> Are you suggesting that gcc is not a decent compiler?
>>
>> No.
>>
>>> If "optimize away
>>> to the null program" is such an obvious thing to do, why doesn't the
>>> most popular C compiler in the [FOSS] world do it?
>>
>> It does if you pass the appropriate optimisation setting (as shown in
>> haypo's comment). I should have been clearer.
>
> "C can do nothing 10 times faster than Python!" -- well, okay, but what
> does that tell you about my long-running web server app? Benchmarks at
> the best of time are only suggestive, benchmarks for null programs are
> even less useful.

This is precisely my point. They should show a benchmark that is not
semantically equivalent to the null program. I modified their example
to do that so that it wasn't simply a case of removing dead code and
then found that the C version performed 6-7 times faster than the PyPy
version. Had they simply stated that I would have been impressed.

At the bottom of this post I show a much better benchmark that shows
how PyPy can come very close to C performance for intensive floating
point computation. Note that although it is simple the benchmark
actually produces a result and none of the computation can be skipped
as dead code (why would I put that in?). Also note that both the C
binary and the script produce exactly the same numeric output. It's
also a simple example of numerical integration - something that is
often a bottleneck in scientific computation.

For the benchmark I find that the gcc -O3 binary runs in 4.6s and PyPy
runs the script in 6.9s (CPython 2.7 takes 600 seconds). That is
impressive and makes me think that there may be no need for me to use
C for things like that. To be sure I'd have to scale it up a bit to
see what happens when I break it apart into many functions and use
lists in PyPy vs arrays in C.

> [...]
>> They are more than carefully crafted. They are useless and misleading.
>> It's reasonable to contrive of a simple CPU-intensive programming
>> problem for benchmarking. But the program should do *something* even if
>> it is contrived. Both programs here consist *entirely* of dead code.
>
> But since the dead code is *not* eliminated, it is actually executed. If
> it's executed, it's not really dead, is it? Does it really matter that
> you don't do anything with the result? I'm with Maciej on this one --
> *executing* the code given is faster in PyPy than in C, at least for this
> C compiler. Maybe C is faster to not execute it. Is that really an
> interesting benchmark? "C does nothing ten times faster than PyPy does
> something!"

I don't think it is reasonable to compare those things and I was
joking when I said that the optimised C version was 600 times faster
because this is an absurd benchmark - see the much better one below.

> Given a sufficiently advanced static analyser, PyPy could probably
> special-case programs that do nothing. Then you're in a race to compare
> the speed at which the PyPy runtime environment can start up and do
> nothing, versus a stand-alone executable that has to start up and do
> nothing. If this is a benchmark that people care about, I suggest they
> need to get out more :-)

Like Chris I have also had situations where startup time mattered and
it can vary substantially between different interpreters and binaries.
I have GNU Octave on Windows and it literally takes 20 seconds to
start up. Matlab is worse: it takes about 1 minute so I don't tend to
use it for CLI scripts much.


Oscar


The benchmark:

$ cat euler.py
#!/usr/bin/env pypy

import math

def main():
    x = 1
    v = 0
    t = 0
    T = 1
    dt = 2**-30
    while t < T:
        dxdt = v
        dvdt = - x
        x += dt * dxdt
        v += dt * dvdt
        t += dt
    print('t = %.2e' % t)
    print('x = %.2e' % x)
    print('v = %.2e' % v)
    print('x_err = %.e' % (x - math.cos(t)))
    print('x_err = %.2e' % (v + math.sin(t)))

main()
$ time pypy euler.py
t = 1.00e+00
x = 5.40e-01
v = -8.41e-01
x_err = 3e-10
x_err = -3.92e-10

real 0m6.907s
user 0m0.076s
sys 0m0.045s
$ cat euler.c
#include <stdio.h>
#include <math.h>

int main()
{
    double x = 1;
    double v = 0;
    double t = 0;
    double T = 1;
    double dt = pow(2, -30);
    double dxdt, dvdt;
    while (t < T)
    {
        dxdt = v;
        dvdt = - x;
        x += dt * dxdt;
        v += dt * dvdt;
        t += dt;
    }
    printf("t = %.2e\n", t);
    printf("x = %.2e\n", x);
    printf("v = %.2e\n", v);
    printf("x_err = %.e\n", x - cos(t));
    printf("x_err = %.2e\n", v + sin(t));

    return 0;
}
$ gcc -O3 euler.c
$ time ./a.exe
t = 1.00e+000
x = 5.40e-001
v = -8.41e-001
x_err = 3e-010
x_err = -3.92e-010

real 0m4.609s
user 0m0.015s
sys 0m0.000s

$ time python euler.py # CPython 2.7
t = 1.00e+00
x = 5.40e-01
v = -8.41e-01
x_err = 3e-10
x_err = -3.92e-10

real 9m51.818s
user 0m0.015s
sys 0m0.015s

Mark Janssen

unread,
Oct 22, 2013, 11:04:21 AM10/22/13
to Dave Angel, Steven D'Aprano, Python List
I love it. Watch this...

[context]
>>> A language specification in BNF is just syntax. It doesn't say anything
>>> about semantics. So how could this be used to produce executable C code
>>> for a program? BNF is used to produce parsers. But a parser isn't
>>> sufficient.
>>
>> A C program is just syntax also. How does the compiler generate
>> executable machine code? Extrapolate into a Python front-end to C.

[Dave Angel responds:]
> Did you even read the paragraph you quoted above? The BNF specification
> does NOT completely describe a language, it only defines its syntax.

[Steven D'Aprano responds:]
> Like every other language, C programs are certainly not *just* syntax.
> Here is some syntax:
>
> &foo bar^ :=

Now, I don't know where y'all were taught Computer Science, but BNF
specifies not only syntax (which would be the *tokens* of a language),
but also its *grammar*; how syntax relates to linguistic categories
like keywords, and tokens relate to each other.

Dave is claiming that BNF only defines the syntax of a language, but
then Steven goes on to supply some syntax that a BNF specification of
the language would not allow (even though Steven calls it "syntax",
which is what BNF in Dave's claim parses).

So which of you is confused? I ask that in the inclusive (not
exclusive OR) sense.... ;^) <-- face says "both".

Mark Janssen
Tacoma, Washington.

Steven D'Aprano

unread,
Oct 22, 2013, 11:22:59 AM10/22/13
to
On Tue, 22 Oct 2013 14:04:57 +0000, Dave Angel wrote:

[...]
> I agree with most of what you say in the message,

Glad to hear I wasn't completely full of it. As a non-C developer, I'm
very conscious that a lot of what I know about C is second hand.


> but here you go on to
> say the C code is unsafely skipping initialization, which is not the
> case.

Are you talking generically, or specifically about the C code referenced
in the link I gave?

"Memory is always zeroed" is one of the advantages of Go over C and C++,
at least according to Rob Pike:

http://commandcenter.blogspot.com.au/2012/06/less-is-exponentially-
more.html



> By the way, a C compiler typically handles any initialization of a
> static variable the same way. So if you declare and initialize a static
> variable as
>
> int myvar = 34 * 768;
>
> it'll put the product directly in the executable image, and no runtime
> code is generated.

Yes, that's a peephole optimization; CPython does the same sort of thing:

py> import dis
py> dis.dis(compile("myvar = 34 * 768", "", "exec"))
1 0 LOAD_CONST 3 (26112)
3 STORE_NAME 0 (myvar)
6 LOAD_CONST 2 (None)
9 RETURN_VALUE



> Perhaps you were thinking of an automatic variable, which is not
> initialized unless the programmer says so, and is then initialized with
> code.

No, I was thinking of an array. Arrays aren't automatically initialised
in C.


--
Steven

Grant Edwards

unread,
Oct 22, 2013, 11:36:38 AM10/22/13
to
On 2013-10-22, Dave Angel <da...@davea.name> wrote:
> On 22/10/2013 08:00, Steven D'Aprano wrote:

>> [quote]
>> C does not require you to set static global arrays to '0', so the
>> for loop in the main function can go...
>>
>> Wait a minute... Haskell, I'm pretty sure, zeroes memory. C doesn't. So
>
> Static int variables are in fact zeroed. However, most C compilers
> do it by putting four bytes (or whatever) into the image of the
> executable so it has no runtime cost.

No, that's not how gcc works (nor is it how any other C compiler I've
ever seen works). Static variables get located in a "bss" section[1],
which is zeroed out at run-time by startup code that gets executed
before main() is called. The ELF executable contains headers that
describe the size/location of bss section, but the object file
contains no actual _data_.

[1] IIRC, the name "bss" is a historical hold-over from the PDP-11
assembler directive that is used to declare a section of memory
that is to be filled with zeros. Not all compilers use that
section name, but they all use the same mechanism.

> int myvar = 34 * 768;
>
> it'll put the product directly in the executable image, and no
> runtime code is generated.

That is true.

--
Grant Edwards grant.b.edwards Yow! My EARS are GONE!!
at
gmail.com

Grant Edwards

unread,
Oct 22, 2013, 11:39:42 AM10/22/13
to
On 2013-10-22, Steven D'Aprano <steve+comp....@pearwood.info> wrote:
> On Tue, 22 Oct 2013 14:04:57 +0000, Dave Angel wrote:
>
> [...]
>> I agree with most of what you say in the message,
>
> Glad to hear I wasn't completely full of it. As a non-C developer, I'm
> very conscious that a lot of what I know about C is second hand.
>
>
>> but here you go on to
>> say the C code is unsafely skipping initialization, which is not the
>> case.
>
> Are you talking generically, or specifically about the C code
> referenced in the link I gave?

In C, static/global variables are always zeroed.

> "Memory is always zeroed" is one of the advantages of Go over C and C++,
> at least according to Rob Pike:
>
> http://commandcenter.blogspot.com.au/2012/06/less-is-exponentially-more.html

Perhaps he's talking about automatic variables or malloc()ed memory?

You'd have to ask him.

>> Perhaps you were thinking of an automatic variable, which is not
>> initialized unless the programmer says so, and is then initialized with
>> code.
>
> No, I was thinking of an array. Arrays aren't automatically initialised
> in C.

If they are static or global, then _yes_they_are_. They are zeroed.

--
Grant Edwards grant.b.edwards Yow! I selected E5 ... but
at I didn't hear "Sam the Sham
gmail.com and the Pharoahs"!

Ned Batchelder

unread,
Oct 22, 2013, 11:46:11 AM10/22/13
to Mark Janssen, Dave Angel, Steven D'Aprano, Python List
On 10/22/13 11:04 AM, Mark Janssen wrote:
> I love it. Watch this...
>
> [context]
>>>> A language specification in BNF is just syntax. It doesn't say anything
>>>> about semantics. So how could this be used to produce executable C code
>>>> for a program? BNF is used to produce parsers. But a parser isn't
>>>> sufficient.
>>> A C program is just syntax also. How does the compiler generate
>>> executable machine code? Extrapolate into a Python front-end to C.
> [Dave Angel responds:]
>> Did you even read the paragraph you quoted above? The BNF specification
>> does NOT completely describe a language, it only defines its syntax.
> [Steven D'Aprano responds:]
>> Like every other language, C programs are certainly not *just* syntax.
>> Here is some syntax:
>>
>> &foo bar^ :=
> Now, I don't know where y'all were taught Computer Science, but BNF
> specifies not only syntax (which would be the *tokens* of a language),
> but also its *grammar*; how syntax relates to linguistic categories
> like keywords, and tokens relate to each other.

Mark, you had expressed interest in "an app that will take a language
specification in BNF (complete with keywords and all) and output C code
which is then compiled to an executable". I'm interested in how that
app might work.

Here's a BNF for a (very!) simple language:

<program> ::= <number> <op> <number>
<op> ::= "*!?" | "--+" | "..:"

That means these are three valid programs:

123 *!? 456
2 --+ 2
1001 ..: 4

What will the app output as C code for each of these?
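For what it's worth, the recognizer half of such an app is easy to sketch (hypothetical code, not Mark's app). It accepts all three programs -- and nothing in the BNF tells it what C code to emit for any of them:

```python
OPS = ("*!?", "--+", "..:")

def parse(program):
    """Accept exactly the programs the toy grammar allows and return
    the parse as (number, op, number). Everything the BNF specifies is
    captured here -- and it still says nothing about what to *emit*."""
    tokens = program.split()
    if (len(tokens) == 3 and tokens[0].isdigit()
            and tokens[2].isdigit() and tokens[1] in OPS):
        return (int(tokens[0]), tokens[1], int(tokens[2]))
    raise SyntaxError("not a valid program: %r" % program)

for prog in ("123 *!? 456", "2 --+ 2", "1001 ..: 4"):
    print(parse(prog))
```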

>
> Dave is claiming that BNF only defines the syntax of a language, but
> then Stephen goes on to supply some syntax that a BNF specification of
> the language would not allow (even though Steven calls it "syntax"
> which is what BNF in Dave's claim parses).
>
> So which of you is confused? I ask that in the inclusive (not
> exclusive OR) sense.... ;^) <-- face says "both".

Could you please be less snarky? We're trying to communicate here, and
it is not at all clear yet who is confused and who is not. If you are
interested in discussing technical topics, then discuss them.

--Ned.

>
> Mark Janssen
> Tacoma, Washington.

Antoine Pitrou

unread,
Oct 22, 2013, 11:51:48 AM10/22/13
to pytho...@python.org
If you go that way, we already have Cython (which is both a subset and
superset of Python, although I don't know if it's still a strict subset these
days).

Regards

Antoine.


Mark Lawrence

unread,
Oct 22, 2013, 11:52:47 AM10/22/13
to pytho...@python.org
On 22/10/2013 16:46, Ned Batchelder wrote:
>
> Could you please be less snarky? We're trying to communicate here, and
> it is not at all clear yet who is confused and who is not. If you are
> interested in discussing technical topics, then discuss them.
>
> --Ned.
>

Well put, particularly when considering that both Dave Angel and Steven
D'Aprano, the targets of the snarkiness, have been regular, helpful
contributors over many years.

--
Python is the second best programming language in the world.
But the best has yet to be invented. Christian Tismer

Mark Lawrence

Benjamin Kaplan

unread,
Oct 22, 2013, 12:03:49 PM10/22/13
to pytho...@python.org
On Tue, Oct 22, 2013 at 8:04 AM, Mark Janssen <dreamin...@gmail.com> wrote:
> I love it. Watch this...
>
> [context]
>>>> A language specification in BNF is just syntax. It doesn't say anything
>>>> about semantics. So how could this be used to produce executable C code
>>>> for a program? BNF is used to produce parsers. But a parser isn't
>>>> sufficient.
>>>
>>> A C program is just syntax also. How does the compiler generate
>>> executable machine code? Extrapolate into a Python front-end to C.
>
> [Dave Angel responds:]
>> Did you even read the paragraph you quoted above? The BNF specification
>> does NOT completely describe a language, it only defines its syntax.
>
> [Steven D'Aprano responds:]
>> Like every other language, C programs are certainly not *just* syntax.
>> Here is some syntax:
>>
>> &foo bar^ :=
>
> Now, I don't know where y'all were taught Computer Science, but BNF
> specifies not only syntax (which would be the *tokens* of a language),
> but also its *grammar*; how syntax relates to linguistic categories
> like keywords, and tokens relate to each other.
>
> Dave is claiming that BNF only defines the syntax of a language, but
> then Stephen goes on to supply some syntax that a BNF specification of
> the language would not allow (even though Steven calls it "syntax"
> which is what BNF in Dave's claim parses).
>
> So which of you is confused? I ask that in the inclusive (not
> exclusive OR) sense.... ;^) <-- face says "both".
>

I don't know where you were taught English, but syntax is " the way in
which linguistic elements (as words) are put together to form
constituents (as phrases or clauses) ", not the set of valid words
(tokens) in a language. A grammar, such as those grammars written in
BNF, describe the rules for the syntax of a language. And, as Steven
said, it still doesn't describe the semantics of a language, which is
the set of instructions described by the syntax.

Steven D'Aprano

unread,
Oct 22, 2013, 12:40:32 PM10/22/13
to
On Tue, 22 Oct 2013 15:39:42 +0000, Grant Edwards wrote:

>> No, I was thinking of an array. Arrays aren't automatically initialised
>> in C.
>
> If they are static or global, then _yes_they_are_. They are zeroed.

Not that I don't believe you, but do you have a reference for this?
Because I keep finding references to uninitialised C arrays filled with
garbage if you don't initialise them.

Wait... hang on a second...

/fires up the ol' trusty gcc


[steve@ando c]$ cat array_init.c
#include<stdio.h>

int main()
{
    int i;
    int arr[10];
    for (i = 0; i < 10; i++) {
        printf("arr[%d] = %d\n", i, arr[i]);
    }
    printf("\n");
    return 0;
}

[steve@ando c]$ gcc array_init.c
[steve@ando c]$ ./a.out
arr[0] = -1082002360
arr[1] = 134513317
arr[2] = 2527220
arr[3] = 2519564
arr[4] = -1082002312
arr[5] = 134513753
arr[6] = 1294213
arr[7] = -1082002164
arr[8] = -1082002312
arr[9] = 2527220


What am I missing here?


--
Steven

Mark Lawrence

unread,
Oct 22, 2013, 12:50:29 PM10/22/13
to pytho...@python.org
arr is local to main, not static or global.

Chris Kaynor

unread,
Oct 22, 2013, 12:52:09 PM10/22/13
to pytho...@python.org
The array you made there is an auto variable (stack), not a static or a global. Try one of the following (neither has been tested):

Static:

#include <stdio.h>

int main()
{
    int i;
    static int arr[10];

    for (i = 0; i < 10; i++) {
        printf("arr[%d] = %d\n", i, arr[i]);
    }
    printf("\n");
    return 0;
}



Global:

#include <stdio.h>

int arr[10];

int main()
{
    int i;

    for (i = 0; i < 10; i++) {
        printf("arr[%d] = %d\n", i, arr[i]);
    }
    printf("\n");
    return 0;
}

Frank Miles

unread,
Oct 22, 2013, 12:53:07 PM10/22/13
to
What you're missing is that arr[] is an automatic variable. Put
a "static" in front of it, or move it outside the function (to become
global) and you'll see the difference.

Steven D'Aprano

unread,
Oct 22, 2013, 1:09:01 PM10/22/13
to
On Tue, 22 Oct 2013 08:04:21 -0700, Mark Janssen wrote:


>>>> A language specification in BNF is just syntax. It doesn't say
>>>> anything about semantics. So how could this be used to produce
>>>> executable C code for a program? BNF is used to produce parsers. But
>>>> a parser isn't sufficient.
>>>
>>> A C program is just syntax also. How does the compiler generate
>>> executable machine code? Extrapolate into a Python front-end to C.
>
> [Dave Angel responds:]
>> Did you even read the paragraph you quoted above? The BNF
>> specification does NOT completely describe a language, it only defines
>> its syntax.
>
> [Steven D'Aprano responds:]
>> Like every other language, C programs are certainly not *just* syntax.
>> Here is some syntax:
>>
>> &foo bar^ :=
>
> Now, I don't know where y'all were taught Computer Science,

Melbourne University Computer Science Department. How about you?


> but BNF
> specifies not only syntax (which would be the *tokens* of a language),
> but also its *grammar*; how syntax relates to linguistic categories
> like keywords, and tokens relate to each other.

I'm not about to get into a debate about the difference between syntax
and grammar as they apply to computer languages, because it doesn't
matter. Neither syntax nor grammar tell you what something *means*, the
semantics of the code. The parser knows that (say) "x ^% y" is legal and
(say) "x y ^%" is not, but it doesn't know what machine code to generate
when it sees "x ^% y". That's not the job of the parser.

I expect that some compilers -- ancient ones, or lousy ones, or simple
ones -- have a single routine that do all the parsing, lexing, code
generation, linking, optimizing, etc., rather than separate routines that
do the parsing, the code generation, and so on. But even those compilers
cannot just take a description of the syntax and intuit what it means.

Syntax isn't enough. At some point, the compiler needs to know that "^"
means to generate code to dereference pointers (like in Pascal), or
perhaps it's bitwise or (like in C), or maybe even exponentiation (like
in VisualBasic), or perhaps it's something completely different.


> Dave is claiming that BNF only defines the syntax of a language, but
> then Stephen goes on to supply some syntax that a BNF specification of
> the language would not allow (even though Steven calls it "syntax" which
> is what BNF in Dave's claim parses).

I think you have misunderstood me. I gave an example of some invented
syntax that your hypothetical language might choose to use. Here it is
again:

&foo bar^ :=

Since I didn't provide the BNF specification for that syntax, you aren't
in a position to say it is illegal. You should assume that it is legal
syntax. If you really insist, I'll waste my time and yours and generate a
BNF specification that allows that as valid syntax, or you could just
accept that it's legal for this imaginary language.

Your task is to describe what the code does, and hence what machine code
your hypothetical compiler will generate, when it sees that line of code.

You should be asking "How the hell can I tell what that does?" Exactly.
That's the point. Neither can the compiler, not from syntax alone.


--
Steven

Neil Cerutti

unread,
Oct 22, 2013, 1:15:50 PM10/22/13
to
On 2013-10-22, Steven D'Aprano <steve+comp....@pearwood.info> wrote:
> On Tue, 22 Oct 2013 14:04:57 +0000, Dave Angel wrote:
>> but here you go on to say the C code is unsafely skipping
>> initialization, which is not the case.
>
> Are you talking generically, or specifically about the C code
> referenced in the link I gave?
>
> "Memory is always zeroed" is one of the advantages of Go over C and C++,
> at least according to Rob Pike:
>
> http://commandcenter.blogspot.com.au/2012/06/less-is-exponentially-more.html

Go initializes variables to defined zero values, not simply to
all-bits zero as (I think) C does.

This isn't as great a feature as it seems, since the zero value
for some built in types, e.g., map, is unusable without manual
construction. In addition, you can't define a zero value for your
own types.

--
Neil Cerutti

Piet van Oostrum

unread,
Oct 22, 2013, 1:20:51 PM10/22/13
to
Mark Janssen <dreamin...@gmail.com> writes:

> I love it. Watch this...
>
> [context]
>>>> A language specification in BNF is just syntax. It doesn't say anything
>>>> about semantics. So how could this be used to produce executable C code
>>>> for a program? BNF is used to produce parsers. But a parser isn't
>>>> sufficient.
>>>
>>> A C program is just syntax also. How does the compiler generate
>>> executable machine code? Extrapolate into a Python front-end to C.
>
> [Dave Angel responds:]
>> Did you even read the paragraph you quoted above? The BNF specification
>> does NOT completely describe a language, it only defines its syntax.
>
> [Steven D'Aprano responds:]
>> Like every other language, C programs are certainly not *just* syntax.
>> Here is some syntax:
>>
>> &foo bar^ :=
>
> Now, I don't know where y'all were taught Computer Science, but BNF
> specifies not only syntax (which would be the *tokens* of a language),
> but also its *grammar*; how syntax relates to linguistic categories
> like keywords, and tokens relate to each other.

Syntax is grammar. Tokens are part of the grammar too, though they are often specified separately with a different grammar, usually regular expressions (which are a subset of what BNF can express).

So are you just confused, or are you trolling?

--
Piet van Oostrum <pi...@vanoostrum.org>
WWW: http://pietvanoostrum.com/
PGP key: [8DAE142BE17999C4]

Steven D'Aprano

unread,
Oct 22, 2013, 1:23:35 PM10/22/13
to
On Tue, 22 Oct 2013 16:53:07 +0000, Frank Miles wrote:

[snip C code]
> What you're missing is that arr[] is an automatic variable. Put a
> "static" in front of it, or move it outside the function (to become
> global) and you'll see the difference.

Ah, that makes sense. Thanks to everyone who corrected my
misunderstanding.

Well, actually, no it doesn't. I wonder why C specifies such behaviour?
Why would you want non-global arrays to be filled with garbage?


--
Steven

Steven D'Aprano

unread,
Oct 22, 2013, 1:27:29 PM10/22/13
to
On Tue, 22 Oct 2013 23:20:52 +1100, Chris Angelico wrote:

> Considering that rapiding took about 1200ms (ish - again, cold cache)
> previously, adding even just 250ms is noticeable.

Please excuse my skepticism, but in my experience, that would probably
mean in practice:

... rapiding took about 1200ms, plus or minus 200ms, plus 500ms if the
system is under load, plus 800ms if the developer vagued out for a
moment, plus 1900ms if he happened to be scratching an itch, plus 2700ms
if the anti-virus happened to be scanning something, plus 4100ms if the
dev decided this was a good time to take a sip of coffee, plus 437000ms
if he needed to make the coffee first, plus 72000000ms if he was just
taking a moment to check something on Reddit or answer an email...

I don't have a lot of sympathy for this sort of micro-optimization of
interactive software, where the random variation from run to run is often
larger than the time being optimized, and where the human element is
usually one or two orders of magnitude greater still. Yes, developers
complain when they have to wait for the computer for half a second, or at
least the one time in thirty that they *notice* that they're waiting. The
other 29 times the computer is actually waiting for them.

Still, I'm probably no better. Only instead of optimizing code, I tend to
"optimize" travel time with "short cuts" that are guaranteed[1] to shave
off a minute from a journey that takes thirty minutes, plus or minus six
minutes. Show me a way to avoid waiting at traffic lights for 30s, and
I'll take it, even if it means waiting for a break in the traffic at a
Give Way sign for three minutes. So I shouldn't mock too much :-)

You're right, of course, that 1/4 second is noticeable. I just find it
hard to credit that it's *significant* in the circumstances you're
describing. But I could be wrong.



[1] Guarantee void in the presence of rain, fog, heavy winds, light
winds, police radar traps, industrial action, civil protests, riots,
wars, earthquakes, acts of God, or other cars.


--
Steven

Chris Kaynor

unread,
Oct 22, 2013, 1:35:51 PM10/22/13
to Steven D'Aprano, pytho...@python.org
It's a performance benefit for cases where you are going to fill the array by other means, such as from a file or from literals. Global and static variables are effectively free to zero thanks to standard OS behaviour.

The issue is actually more general than arrays, as the same behaviour exists for all types in C. Stack and heap variables have undefined values until initialised, while global and static variables (which are really the same thing, just with differing lexical scopes) are zero-filled.

Neil Cerutti

unread,
Oct 22, 2013, 1:37:30 PM10/22/13
to
On 2013-10-22, Steven D'Aprano <steve+comp....@pearwood.info> wrote:
Fish(enc)ey.

--
Neil Cerutti

Oscar Benjamin

unread,
Oct 22, 2013, 1:37:54 PM10/22/13
to Steven D'Aprano, Python List
On 22 October 2013 18:23, Steven D'Aprano
It's not that you want them filled with garbage but rather that you
know you will fill them with useful values later and you don't want to
waste time pre-filling them with zeros first. Or perhaps you have some
other way of knowing which values are to be considered initialised.
This is reasonable in some contexts.

OTOH why in particular would you want to initialise them with zeros? I
often initialise arrays to nan which is useful for debugging. It means
that you can see if you made a mistake when you were supposed to have
initialised everything to useful values. In many contexts it would be
difficult to distinguish between a valid zero and a zero because you
haven't yet inserted a value. This kind of error can show up more
quickly if you don't zero the memory since the uninitialised values
will often be out of range for what you expected and will give very
noticeable results (such as a seg-fault).
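Oscar's NaN-prefill trick is easy to sketch in Python itself (a hedged illustration, not anything from his numeric code): seed a buffer with NaN so a forgotten slot is loudly wrong instead of looking like a legitimate zero.

```python
import math

# Seed the buffer with NaN so "never assigned" is distinguishable from
# "assigned a valid zero".
def make_buffer(n):
    return [float("nan")] * n

buf = make_buffer(5)
buf[0] = 1.5
buf[1] = 0.0          # a *valid* zero, distinguishable from "never set"

# Any slot still NaN was genuinely never initialised.
uninitialised = [i for i, v in enumerate(buf) if math.isnan(v)]
print(uninitialised)  # -> [2, 3, 4]
```

Had the buffer been zero-filled instead, `buf[1]` and the forgotten slots would be indistinguishable, which is exactly the debugging problem described above.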


Oscar

MRAB

unread,
Oct 22, 2013, 1:42:37 PM10/22/13
to pytho...@python.org
Static variables need be initialised only once, whereas auto variables
exist on the stack, so they would need to be initialised repeatedly,
which was considered too expensive, especially as they would be
assigned to before use anyway (hopefully!).

Of course, these days, with our much faster CPUs, we're not so
bothered, and prefer to allocate on the heap.

Mark Janssen

unread,
Oct 22, 2013, 1:50:33 PM10/22/13
to Ned Batchelder, Python List, Dave Angel, Steven D'Aprano
>> So which of you is confused? I ask that in the inclusive (not
>> exclusive OR) sense.... ;^) <-- face says "both".
>
> Could you please be less snarky? We're trying to communicate here, and it
> is not at all clear yet who is confused and who is not. If you are
> interested in discussing technical topics, then discuss them.

Okay. The purpose of BNF (at least as I envision it) is to
produce/specify a *context-free* "grammar". A lexer parses the tokens
specified in the BNF into an Abstract Syntax Tree. If one can produce
such a tree for any given source, the language, in theory, can be
compiled by GCC into an executable.

Boom.
--
MarkJ
Tacoma, Washington

Skip Montanaro

unread,
Oct 22, 2013, 2:08:42 PM10/22/13
to Mark Janssen, Python List, Dave Angel, Steven D'Aprano
On Tue, Oct 22, 2013 at 12:50 PM, Mark Janssen
<dreamin...@gmail.com> wrote:
> Okay. The purpose of BNF (at least as I envision it) is to
> produce/specify a *context-free* "grammar". A lexer parses the tokens
> specified in the BNF into an Abstract Syntax Tree. If one can produce
> such a tree for any given source, the language, in theory, can be
> compiled by GCC into an executable.


Well, not quite. The lexer breaks the stream of characters up into
tokens, which are fed to the parser, which produces an abstract syntax
tree. From the Wikipedia entry:

"In computer science, an abstract syntax tree (AST), or just syntax
tree, is a tree representation of the abstract syntactic structure of
source code written in a programming language. Each node of the tree
denotes a construct occurring in the source code."
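That pipeline can be watched directly from Python's standard library (a quick sketch; the token filtering is only to tidy the display):

```python
import ast
import io
import tokenize

source = "x = 123 * 456"

# The lexer's job: turn characters into a flat stream of tokens.
tokens = [t.string
          for t in tokenize.generate_tokens(io.StringIO(source).readline)
          if t.string.strip()]
print(tokens)  # -> ['x', '=', '123', '*', '456']

# The parser's job: build an abstract syntax tree from those tokens.
tree = ast.parse(source)
print(ast.dump(tree.body[0]))  # an Assign node wrapping a BinOp
```

Note that neither stage says anything about what `*` *does*; that meaning is supplied later, when the tree is compiled and executed.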

Skip

rusi

unread,
Oct 22, 2013, 2:11:58 PM10/22/13
to
Mark Janssen said:
> Unattributed
> > No, it's not like those 'compilers'; I don't really agree with a compiler
> > generating C/C++ and saying it's producing native code. Compilers that do
> > that tend to put in a lot of type-safety code and debugging internals at a
> > high level to get things working in other projects. I'm not saying Python
> > compilers do this; I haven't analysed enough of them to say.
>
> Hmm, well what I'd personally find interesting from a computer science
> point of view is a app that will take a language specification in BNF
> (complete with keywords and all) and output C code which is then
> compiled to an executable as normal. This is how a front-end should
> be designed. A middle-layer for translating common language elements
> like lists, sets, etc, could make it easy.

Taking this opening sortie, I was going to try to justify what Mark is saying, somewhat along the following lines.

Things like lex and yacc (and their equivalents in ecosystems other than C) offer a kind of holy grail in the following sense.

When the writer of a lex/yacc spec does his thing, he does not need to think at the C level at all. Given that executable C is produced (and ignoring mishaps/bugs like shift-reduce conflicts etc.), it's close to an ideal situation: the yacc grammar (and its lex helper) is completely declarative, and yet it results in perfectly good O(n) C code.

Taking this cue from the syntax domain, one can treat it as a holy grail for semantics, i.e. can one specify the semantics of a language completely declaratively, wave a lex/yacc-*analogous* magic wand, and get out a compiler/interpreter etc.?

Many people have pursued this holy grail, but it remains elusive. Some examples:
1. Attribute grammars
2. Utrecht University's UUAG
3. Action semantics
etc.

Note a key word above is "analogous"

However when I see this:


> Okay. The purpose of BNF (at least as I envision it) is to
> produce/specify a *context-free* "grammar". A lexer parses the tokens
> specified in the BNF into an Abstract Syntax Tree. If one can produce
> such a tree for any given source, the language, in theory, can be
> compiled by GCC into an executable.

all I will say is "eyes-roll"

MRAB

unread,
Oct 22, 2013, 2:16:04 PM10/22/13
to pytho...@python.org
On 22/10/2013 18:50, Mark Janssen wrote:
>>> So which of you is confused? I ask that in the inclusive (not
>>> exclusive OR) sense.... ;^) <-- face says "both".
>>
>> Could you please be less snarky? We're trying to communicate here, and it
>> is not at all clear yet who is confused and who is not. If you are
>> interested in discussing technical topics, then discuss them.
>
> Okay. The purpose of BNF (at least as I envision it) is to
> produce/specify a *context-free* "grammar". A lexer parses the tokens
> specified in the BNF into an Abstract Syntax Tree. If one can produce
> such a tree for any given source, the language, in theory, can be
> compiled by GCC into an executable.
>
> Boom.
>
But you still need to specify the semantics.

Mark Janssen

unread,
Oct 22, 2013, 2:16:54 PM10/22/13
to Ned Batchelder, Python List, Dave Angel, Steven D'Aprano
>>>> So which of you is confused? I ask that in the inclusive (not
>>>> exclusive OR) sense.... ;^) <-- face says "both".
>>>
>>> Could you please be less snarky?
>>
>> Okay. The purpose of BNF (at least as I envision it) is to
>> produce/specify a *context-free* "grammar". A lexer parses the tokens
>> specified in the BNF into an Abstract Syntax Tree. If one can produce
>> such a tree for any given source, the language, in theory, can be
>> compiled by GCC into an executable.
>>
>> Boom.
>
> Hmm, I don't hear the boom yet. An Abstract Syntax Tree is a tree
> representation of a program. To use my previous example: the program "123
> *!? 456" would become a tree:
>
> op: "*!?"
> num: 123
> num: 456
>
> There's still not enough information to compile this.

....Is your language Turing complete?

--
MarkJ
Tacoma, Washington

Neil Cerutti

unread,
Oct 22, 2013, 2:18:18 PM10/22/13
to
On 2013-10-22, Mark Janssen <dreamin...@gmail.com> wrote:
>>> So which of you is confused? I ask that in the inclusive (not
>>> exclusive OR) sense.... ;^) <-- face says "both".
>>
>> Could you please be less snarky? We're trying to communicate here, and it
>> is not at all clear yet who is confused and who is not. If you are
>> interested in discussing technical topics, then discuss them.
>
> Okay. The purpose of BNF (at least as I envision it) is to
> produce/specify a *context-free* "grammar".

Context-sensitive grammars can be parse, too.

> A lexer parses the tokens specified in the BNF into an Abstract
> Syntax Tree.

A lexer traditionally reads the text and generates a stream of
tokens for the parser.

> If one can produce such a tree for any given source, the
> language, in theory, can be compiled by GCC into an executable.

What executable would GCC compile from a program that matched
this grammar?

spamgram = spam1, { ', ', more_spam }, '.'
spam1 = 'Spam'
more_spam = spam, { ', ', spam }
spam = 'spam'
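One way to make the gap concrete: the grammar above is trivially machine-recognisable, yet recognition produces nothing to execute. A hedged Python sketch, where the regex is just a compact encoding of the EBNF above:

```python
import re

# Recogniser for the spamgram EBNF:
#   spamgram = 'Spam', { ', ', more_spam }, '.'
#   more_spam = 'spam', { ', ', 'spam' }
# which flattens to: 'Spam' followed by zero or more ', spam', then '.'.
def is_spamgram(text):
    return re.fullmatch(r"Spam(, spam)*\.", text) is not None

print(is_spamgram("Spam."))              # True
print(is_spamgram("Spam, spam, spam."))  # True
print(is_spamgram("eggs"))               # False
```

The recogniser can accept or reject strings all day, but it gives a compiler no clue what machine code a valid spamgram should turn into, which is the point of the question.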

--
Neil Cerutti

Mark Janssen

unread,
Oct 22, 2013, 2:22:41 PM10/22/13
to Python List
>> Okay. The purpose of BNF (at least as I envision it) is to
>> produce/specify a *context-free* "grammar". A lexer parses the tokens
>> specified in the BNF into an Abstract Syntax Tree. If one can produce
>> such a tree for any given source, the language, in theory, can be
>> compiled by GCC into an executable.
>>
>> Boom.
>>
> But you still need to specify the semantics.

In my world, like writing pseudo-code or flow-charts, the AST *is* the
semantics. What world are you guys from?
--
MarkJ
Tacoma, Washington

Mark Janssen

unread,
Oct 22, 2013, 2:28:27 PM10/22/13
to Ned Batchelder, Python List
>> ....Is your language Turing complete?
>>
>
> 1) No, it's not.
> 2) So what? That should make it easier to compile to C, if anything.
> 3) Don't change the subject.

Well, if your language is not Turing complete, it is not clear that
you will be able to compile it at all. That's the difference between
a calculator and a computer.

Thank you. You may be seated.

Mark J
Tacoma, Washington

Ned Batchelder

unread,
Oct 22, 2013, 2:23:22 PM10/22/13
to Mark Janssen, Python List
On 10/22/13 2:16 PM, Mark Janssen wrote:
>>>>> So which of you is confused? I ask that in the inclusive (not
>>>>> exclusive OR) sense.... ;^) <-- face says "both".
>>>> Could you please be less snarky?
>>> Okay. The purpose of BNF (at least as I envision it) is to
>>> produce/specify a *context-free* "grammar". A lexer parses the tokens
>>> specified in the BNF into an Abstract Syntax Tree. If one can produce
>>> such a tree for any given source, the language, in theory, can be
>>> compiled by GCC into an executable.
>>>
>>> Boom.
>> Hmm, I don't hear the boom yet. An Abstract Syntax Tree is a tree
>> representation of a program. To use my previous example: the program "123
>> *!? 456" would become a tree:
>>
>> op: "*!?"
>> num: 123
>> num: 456
>>
>> There's still not enough information to compile this.
> ....Is your language Turing complete?
>

1) No, it's not.
2) So what? That should make it easier to compile to C, if anything.
3) Don't change the subject.

A BNF doesn't provide enough information to compile a program to C.
That's all I'm trying to help you understand. If you don't agree, then
we have to talk about the meaning of the words BNF, compile, program, and C.

I applaud your interest in this topic. I think you need to learn more
before you try to claim expertise, though.

--Ned.

rusi

unread,
Oct 22, 2013, 2:40:40 PM10/22/13
to
On Tuesday, October 22, 2013 11:53:22 PM UTC+5:30, Ned Batchelder wrote:
> A BNF doesn't provide enough information to compile a program to C.
> That's all I'm trying to help you understand. If you don't agree, then
> we have to talk about the meaning of the words BNF, compile, program, and C.

I believe we need to talk about the Dunning-Kruger effect

Ned Batchelder

unread,
Oct 22, 2013, 2:40:49 PM10/22/13
to Mark Janssen, Python List
On 10/22/13 2:22 PM, Mark Janssen wrote:
>>> Okay. The purpose of BNF (at least as I envision it) is to
>>> produce/specify a *context-free* "grammar". A lexer parses the tokens
>>> specified in the BNF into an Abstract Syntax Tree. If one can produce
>>> such a tree for any given source, the language, in theory, can be
>>> compiled by GCC into an executable.
>>>
>>> Boom.
>>>
>> But you still need to specify the semantics.
> In my world, like writing pseudo-code or flow-charts, the AST *is* the
> semantics. What world are you guys from?

Mark, you started this by describing a program that compiled to C. Now
you say you are in the world of pseudo-code and flow-charts. Which is it?

In the world where actual programs get compiled and run, you need to
have semantics, and they aren't expressed in an AST or a BNF. I think
you think that an AST is enough, but it isn't. I'm also not sure what
you actually think because we don't stay on a topic long enough to get
into the details that would shed light.

It's fun to feel like you are right, and it's easy if you change the
topic when the going gets tough. You've done this a number of times.

I've tried to be polite, and I've tried to be helpful, but I'm sorry:
either you don't understand a lot of the terms you are throwing around,
or you aren't disciplined enough to focus on a topic long enough to
explain yourself. Either way, I don't know how else to move the
discussion forward.

--Ned.

MRAB

unread,
Oct 22, 2013, 2:47:01 PM10/22/13
to pytho...@python.org
On 22/10/2013 19:22, Mark Janssen wrote:
>>> Okay. The purpose of BNF (at least as I envision it) is to
>>> produce/specify a *context-free* "grammar". A lexer parses the tokens
>>> specified in the BNF into an Abstract Syntax Tree. If one can produce
>>> such a tree for any given source, the language, in theory, can be
>>> compiled by GCC into an executable.
>>>
>>> Boom.
>>>
>> But you still need to specify the semantics.
>
> In my world, like writing pseudo-code or flow-charts, the AST *is* the
> semantics. What world are you guys from?
>
The real world. :-)

So what you're saying is that you generate an AST where there are
certain pre-defined operations (primitives?) available?

Grant Edwards

unread,
Oct 22, 2013, 2:49:58 PM10/22/13
to
Firstly, it's not non-global arrays that have undefined contents.
It's _auto_ arrays that have undefined contents.

void foo(void)
{
    int a[4];         // non-global, _auto_ variable: has undefined contents
}

void bar(void)
{
    static int a[4];  // non-global, _static_ variable: initially all 0's
}

As to _why_ it's that way, you'd have to ask the guys who decided
that. I suspect it's because zeroing static variables involves very
little run-time overhead, while zeroing auto variables could cause
huge amounts of overhead if that auto variable is declared inside the
innermost of nested loops. [Presumably a good optimizing compiler
would not zero an auto variable that was always set before it was
referenced, but it takes a lot of smarts for a compiler to figure that
out correctly 100% of the time -- probably more smarts than a PDP-11
had room for.]

--
Grant Edwards               grant.b.edwards at gmail.com
Yow! Let's send the Russians defective lifestyle accessories!

Grant Edwards

unread,
Oct 22, 2013, 2:58:19 PM10/22/13
to
On 2013-10-22, Neil Cerutti <ne...@norwich.edu> wrote:
> On 2013-10-22, Steven D'Aprano <steve+comp....@pearwood.info> wrote:
>> On Tue, 22 Oct 2013 14:04:57 +0000, Dave Angel wrote:
>>> but here you go on to say the C code is unsafely skipping
>>> initialization, which is not the case.
>>
>> Are you talking generically, or specifically about the C code
>> referenced in the link I gave?
>>
>> "Memory is always zeroed" is one of the advantages of Go over C and C++,
>> at least according to Rob Pike:
>>
>> http://commandcenter.blogspot.com.au/2012/06/less-is-exponentially-more.html
>
> Go initializes variables to defined zero values, not simply to
> all-bits zero as (I think) C does.

C initializes to defined zero values. For most machines in use today,
those values _happen_ to be all-bits-zero.

This makes the implementation trivial: chuck them all into some
pre-defined section (e.g. ".bss"), and then on startup, you zero-out
all the bits in the section without having to know what's where within
that section. If you design a machine such that integer, pointer, and
FP representations where 0, NULL, and 0.0 are all zero-bits, then life
get's tougher for the guys writing the compiler and startup code.

But the language doesn't require or assume that initializer values are
all-bits-zero. FP values have to be initialized to 0.0. Ints have
to be initialized to integer value 0, pointers have to be initialized
to (void*)0. Nowhere in the language is it required or assumed that
any or all of those values are represented by contiguous blocks of
all-zero-bits.
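You can poke at the "happens to be all-bits-zero" coincidence from Python (a hedged illustration using IEEE-754 doubles, the FP format mainstream machines actually use; nothing here is mandated by the C standard):

```python
import struct

# On typical hardware, zero *values* really are all-zero *bits*,
# which is why dumping statics into a zero-filled .bss section works.
print(struct.pack("<q", 0))     # integer 0: eight 0x00 bytes
print(struct.pack("<d", 0.0))   # IEEE-754 double 0.0: eight 0x00 bytes

# But a zero value need not mean all-zero bits: negative zero compares
# equal to 0.0 yet has its sign bit set.
print(struct.pack("<d", -0.0))  # last byte is 0x80, not 0x00
```

A hypothetical machine whose 0.0 or NULL had a non-zero bit pattern would still be a conforming C target, it would just make the startup code's job harder, as described above.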

> This isn't as great a feature as it seems, since the zero value for
> some built in types, e.g., map, is unusable without manual
> construction. In addition, you can't define a zero value for your own
> types.

--
Grant Edwards grant.b.edwards Yow! Oh my GOD -- the
at SUN just fell into YANKEE
gmail.com STADIUM!!

Mark Lawrence

unread,
Oct 22, 2013, 2:58:03 PM10/22/13
to pytho...@python.org
No need for me to discuss that as I used to be big headed but now I'm
perfect.

--
Python is the second best programming language in the world.
But the best has yet to be invented. Christian Tismer

Ned Batchelder

unread,
Oct 22, 2013, 1:56:09 PM10/22/13
to Mark Janssen, Python List, Dave Angel, Steven D'Aprano
On 10/22/13 1:50 PM, Mark Janssen wrote:
>>> So which of you is confused? I ask that in the inclusive (not
>>> exclusive OR) sense.... ;^) <-- face says "both".
>> Could you please be less snarky? We're trying to communicate here, and it
>> is not at all clear yet who is confused and who is not. If you are
>> interested in discussing technical topics, then discuss them.
> Okay. The purpose of BNF (at least as I envision it) is to
> produce/specify a *context-free* "grammar". A lexer parses the tokens
> specified in the BNF into an Abstract Syntax Tree. If one can produce
> such a tree for any given source, the language, in theory, can be
> compiled by GCC into an executable.
>
> Boom.

Hmm, I don't hear the boom yet. An Abstract Syntax Tree is a tree
representation of a program. To use my previous example: the program
"123 *!? 456" would become a tree:

op: "*!?"
num: 123
num: 456

There's still not enough information to compile this. The operator *!?
needs to have a meaning assigned to it. That's the role of the semantic
specification of a language. That has to be provided somehow. A BNF,
or a grammar, or a syntax, or an AST can't provide the semantics. That
was the original point: syntax isn't enough.
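Python's own toolchain demonstrates this split neatly (a hedged sketch; `'foo' - 3` plays the role of the syntactically-legal-but-meaningless "123 *!? 456"):

```python
import ast

# The parser happily builds a tree for anything syntactically legal;
# it assigns no meaning to the operator.
tree = ast.parse("'foo' - 3", mode="eval")
print(ast.dump(tree.body))  # a BinOp node: the syntax is fine

# Meaning is applied later -- and 'str minus int' turns out not to have one.
try:
    eval(compile(tree, "<demo>", "eval"))
except TypeError as e:
    print("semantic error:", e)
```

Syntax got us all the way to a valid AST; it's the semantic layer, not the grammar, that rejects the program.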

--Ned.

Piet van Oostrum

unread,
Oct 22, 2013, 3:20:02 PM10/22/13
to
Neil Cerutti <ne...@norwich.edu> writes:

>
> Context-sensitive grammars can be parse, too.
>
That's not English. Do you mean "parsed"?

But context-sentitive grammars cannot be specified by BNF.

Neil Cerutti

unread,
Oct 22, 2013, 3:27:03 PM10/22/13
to
On 2013-10-22, Piet van Oostrum <pi...@vanoostrum.org> wrote:
> Neil Cerutti <ne...@norwich.edu> writes:
>> Context-sensitive grammars can be parse, too.
>
> That's not English. Do you mean "parsed"?

Thanks, yes, I meant parsed.

> But context-sentitive grammars cannot be specified by BNF.

Yes. I thought Mark might have had a misconception that all
programming languages have to be defined in BNF.

--
Neil Cerutti

Mark Lawrence

unread,
Oct 22, 2013, 3:32:07 PM10/22/13
to pytho...@python.org
On 22/10/2013 20:20, Piet van Oostrum wrote:
> Neil Cerutti <ne...@norwich.edu> writes:
>
>>
>> Context-sensitive grammars can be parse, too.
>>
> That's not English. Do you mean "parsed"?
>
> But context-sentitive grammars cannot be specified by BNF.
>

That's not English. Do you mean "context-sensitive"? :)

Mark Lawrence

unread,
Oct 22, 2013, 3:38:33 PM10/22/13
to pytho...@python.org
Please be kind enough to disambiguate Mark, as I would not wish to be
tarred with the same brush.

TIA.

Neil Cerutti

unread,
Oct 22, 2013, 4:00:27 PM10/22/13
to
On 2013-10-22, Mark Lawrence <bream...@yahoo.co.uk> wrote:
> On 22/10/2013 20:27, Neil Cerutti wrote:
>> On 2013-10-22, Piet van Oostrum <pi...@vanoostrum.org> wrote:
>>> Neil Cerutti <ne...@norwich.edu> writes:
>>>> Context-sensitive grammars can be parse, too.
>>>
>>> That's not English. Do you mean "parsed"?
>>
>> Thanks, yes, I meant parsed.
>>
>>> But context-sentitive grammars cannot be specified by BNF.
>>
>> Yes. I thought Mark might have had a misconception that all
>> programming languages have to be defined in BNF.
>
> Please be kind enough to disambiguate Mark, as I would not wish
> to be tarred with the same brush.

Oops! I didn't notice he'd dropped out of the attributions. I was
speculating about Mark Janssen, not you.

--
Neil Cerutti

Grant Edwards

unread,
Oct 22, 2013, 4:26:00 PM10/22/13
to
On 2013-10-22, Grant Edwards <inv...@invalid.invalid> wrote:

> C initializes to defined zero values. For most machines in use today,
> those values _happen_ to be all-bits-zero.
>
> This makes the implementation trivial: chuck them all into some
> pre-defined section (e.g. ".bss"), and then on startup, you zero-out
> all the bits in the section without having to know what's where within
> that section. If you design a machine such that integer, pointer, and
> FP representations where 0, NULL, and 0.0 are all zero-bits, then life
^
not

> get's tougher for the guys writing the compiler and startup code.

--
Grant Edwards grant.b.edwards Yow! We are now enjoying
at total mutual interaction in
gmail.com an imaginary hot tub ...

Chris Angelico

unread,
Oct 22, 2013, 5:43:12 PM10/22/13
to pytho...@python.org
On Wed, Oct 23, 2013 at 4:27 AM, Steven D'Aprano
<steve+comp....@pearwood.info> wrote:
> On Tue, 22 Oct 2013 23:20:52 +1100, Chris Angelico wrote:
>
>> Considering that rapiding took about 1200ms (ish - again, cold cache)
>> previously, adding even just 250ms is noticeable.
>
> Please excuse my skepticism, but in my experience, that would probably
> mean in practice:
>
> ... rapiding took about 1200ms, plus or minus 200ms, plus 500ms if the
> system is under load, plus 800ms if the developer vagued out for a
> moment, plus 1900ms if he happened to be scratching an itch, plus 2700ms
> if the anti-virus happened to be scanning something, plus 4100ms if the
> dev decided this was a good time to take a sip of coffee, plus 437000ms
> if he needed to make the coffee first, plus 72000000ms if he was just
> taking a moment to check something on Reddit or answer an email...

Yes; and more importantly, at the times when it actually would be
used, the cache will likely be warm.

> You're right, of course, that 1/4 second is noticeable. I just find it
> hard to credit that it's *significant* in the circumstances you're
> describing. But I could be wrong.

Yeah, but the problem here is a fundamental of human nature. I
mentioned earlier that I was trying to "sell" the notion of an instant
pre-check of the code. I use one myself, have done for ages, and it's
helped me often enough that I don't care if it occasionally adds a
second to my time. (I have a few such assistants that involve
searching through recent git commit messages, so they can take
multiple seconds if the cache is cold. I don't care; once it's in
cache, it's way faster.) But imagine sitting beside a skeptical fellow
developer and saying, "Enable this line here and it'll catch these
sorts of bugs before you even spin it up in your testbox" - and he
hits F7 and notices that it takes twice as long as he's used to.
That's going to make it a pretty hard sell. That's why I wanted to get
the time cost of the smoke test as low as I possibly could, even on a
cold cache.

(I'm used to tough sells at my workplace. It took years before we
adopted source control.... yes, seriously. And even once we did, I had
to fight to get other developers to make useful commits and commit
messages. One in particular - who, fortunately, is no longer with us -
saw the whole thing as a nuisance that got in the way of his genius,
so he'd just commit once at the end of the day with a barely-useful
message. But then, he was pretty astounding in a lot of other areas,
too - really amazing... like using the For-Case paradigm in PHP with
an off-by-one error...)

ChrisA

Piet van Oostrum

unread,
Oct 22, 2013, 6:11:41 PM10/22/13
to
Mark Janssen <dreamin...@gmail.com> writes:

>>> ....Is your language Turing complete?
>>>
>>
>> 1) No, it's not.
>> 2) So what? That should make it easier to compile to C, if anything.
>> 3) Don't change the subject.
>
> Well, if your language is not Turing complete, it is not clear that
> you will be able to compile it at all. That's the difference between
> a calculator and a computer.

You think a language that is not Turing-complete cannot be compiled?
What nonsense is that? Please, Mark, spare us your nonsense.

Grant Edwards

unread,
Oct 22, 2013, 6:47:31 PM10/22/13
to
On 2013-10-22, Mark Janssen <dreamin...@gmail.com> wrote:
>>> ....Is your language Turing complete?
>>>
>>
>> 1) No, it's not.
>> 2) So what? That should make it easier to compile to C, if anything.
>> 3) Don't change the subject.
>
> Well, if your language is not Turing complete, it is not clear that
> you will be able to compile it at all.

Whether a language is turing complete or not has _NOTHING_ to do with
whether it can be compiled or not.

> That's the difference between a calculator and a computer.

No, it isn't.

> Thank you. You may be seated.

Dunno what you're smoking, but you sure are getting your money's
worth...

--
Grant

alex23

unread,
Oct 22, 2013, 9:36:40 PM10/22/13
to
On 23/10/2013 4:40 AM, Ned Batchelder wrote:
> I've tried to be polite, and I've tried to be helpful, but I'm sorry:
> either you don't understand a lot of the terms you are throwing around,
> or you aren't disciplined enough to focus on a topic long enough to
> explain yourself. Either way, I don't know how else to move the
> discussion forward.

You forgot to end with a well-warranted "Boom".

Mark Janssen is rapidly becoming Xah Lee 2.0, identical down to the
repugnant misogyny he expresses elsewhere. The only difference is one of
verbosity.

rusi

unread,
Oct 23, 2013, 12:04:19 AM10/23/13
to
Hey! Time to stop beating up on the guy -- even if (because?) he's a nut -- rather a hard one, though.

Michael Torrie

unread,
Oct 23, 2013, 12:05:12 AM10/23/13
to pytho...@python.org
On 10/22/2013 12:28 PM, Mark Janssen wrote:
> Thank you. You may be seated.

Ranting Rick, is that you?

Mark Lawrence

unread,
Oct 23, 2013, 2:06:54 AM10/23/13
to pytho...@python.org
Wow, no punches pulled here. But IMHO thoroughly deserved.

Mark Lawrence

unread,
Oct 23, 2013, 2:13:24 AM10/23/13
to pytho...@python.org
I think that's unfair; rr can be very helpful when discussing IDLE-type
issues. In comparison, all that appears to have emanated from Tacoma,
Washington is inaccurate nonsense.

Chris Angelico

unread,
Oct 23, 2013, 2:28:29 AM10/23/13
to pytho...@python.org
On Wed, Oct 23, 2013 at 9:11 AM, Piet van Oostrum <pi...@vanoostrum.org> wrote:
> Mark Janssen <dreamin...@gmail.com> writes:
>
>>>> ....Is your language Turing complete?
>>>>
>>>
>>> 1) No, it's not.
>>> 2) So what? That should make it easier to compile to C, if anything.
>>> 3) Don't change the subject.
>>
>> Well, if your language is not Turing complete, it is not clear that
>> you will be able to compile it at all. That's the difference between
>> a calculator and a computer.
>
> You think a language that is not Turing-complete cannot be compiled?
> What nonsense is that. Please Mark, spare us your nonsense.

HQ9+ can be compiled. By the simple technique of making the original
source into a literal and appending the interpreter, you can make a
compiler. Any language that has an interpreter can, therefore, be
compiled, regardless of Turing completeness or any other attribute.
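Chris's "compile by bundling" trick can be sketched in a few lines of Python. This is purely illustrative -- the interpreter below is a toy HQ9+ implementation and all names (`hq9_interpret`, `compile_to_python`) are made up for the example:

```python
# "Compilation by bundling": emit a self-contained program that carries
# both the interpreter and the original source as a literal.

INTERPRETER_SRC = '''
def hq9_interpret(source):
    out = []
    acc = 0  # the '+' accumulator is internal state with no output
    for ch in source:
        if ch == 'H':
            out.append('Hello, world!')
        elif ch == 'Q':
            out.append(source)  # Q prints the program's own source
        elif ch == '9':
            out.append('99 bottles of beer on the wall...')  # abbreviated
        elif ch == '+':
            acc += 1
    return '\\n'.join(out)
'''

def compile_to_python(source):
    """Return Python source that behaves like the HQ9+ program `source`."""
    return INTERPRETER_SRC + '\nprint(hq9_interpret(%r))\n' % (source,)

compiled = compile_to_python('HQ')
# `compiled` is now an ordinary Python program equivalent to the
# HQ9+ program 'HQ' -- no Turing completeness required anywhere.
```

The same construction works for any language with an interpreter, which is exactly the point being made: compilability and Turing completeness are independent properties.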

ChrisA

John Nagle

unread,
Oct 23, 2013, 2:48:41 AM10/23/13
to
On 10/20/2013 3:10 PM, victorg...@gmail.com wrote:
> On Sunday, October 20, 2013 3:56:46 PM UTC-2, Philip Herron wrote:
>> I've been working on GCCPY since roughly november 2009 at least in its
>> concept. It was announced as a Gsoc 2010 project and also a Gsoc 2011
>> project. I was mentored by Ian Taylor who has been an extremely big
>> influence on my software development carrer.
>
> Cool!
>
>> Documentation can be found http://gcc.gnu.org/wiki/PythonFrontEnd.
>> (Although this is sparse partialy on purpose since i do not wan't
>> people thinking this is by any means ready to compile real python
>> applications)
>
> Is there any document describing what it can already compile and, if possible, showing some benchmarks?

After reading through a vast amount of drivel below on irrelevant
topics, looking at the nonexistent documentation, and finally reading
some of the code, I think I see what's going on here. Here's
the run-time code for integers:

http://sourceforge.net/p/gccpy/code/ci/master/tree/libgpython/runtime/gpy-object-integer.c

The implementation approach seems to be that, at runtime,
everything is a struct which represents a general Python object.
The compiler is, I think, just cranking out general subroutine
calls that know nothing about type information. All the
type handling is at run time. That's basically what CPython does,
by interpreting a pseudo-instruction set to decide which
subroutines to call.

It looks like integers and lists have been implemented, but
not much else. Haven't found source code for strings yet.
Memory management seems to rely on the Boehm garbage collector.
Much code seems to have been copied over from the GCC library
for Go. Go, though, is strongly typed at compile time.

There's no inherent reason this "compiled" approach couldn't work,
but I don't know if it actually does. The performance has to be
very low. Each integer add involves a lot of code, including two calls
of "strcmp (x->identifier, "Int")". A performance win over CPython
is unlikely.
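The dispatch cost described above can be sketched in Python (purely illustrative -- this is not gccpy's actual code, and the names are invented): every value is a boxed object carrying a string type tag, and each add pays for tag inspection before any arithmetic happens.

```python
# Rough analogue of a boxed-object runtime: the tag check on every
# operation is the moral equivalent of strcmp(x->identifier, "Int").

class GpyObject:
    __slots__ = ('identifier', 'value')
    def __init__(self, identifier, value):
        self.identifier = identifier
        self.value = value

def gpy_object_add(x, y):
    # Tag comparison on every single '+', regardless of context.
    if x.identifier == 'Int' and y.identifier == 'Int':
        return GpyObject('Int', x.value + y.value)
    raise TypeError('gpy_object_add: unsupported operand tags')

a = GpyObject('Int', 2)
b = GpyObject('Int', 3)
assert gpy_object_add(a, b).value == 5
```

An ahead-of-time compiler that emits calls like `gpy_object_add` for every operation has moved the interpreter loop into generated code, but the per-operation dynamic dispatch is still there.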

Compare Shed Skin, which tries to infer the type of Python
objects so it can generate efficient type-specific C++ code. That's
much harder to do, and has trouble with very dynamic code, but
what comes out is fast.

John Nagle

Philip Herron

unread,
Oct 23, 2013, 3:25:39 AM10/23/13
to
I think your analysis is probably grossly unfair, for many reasons. But you're entitled to your opinion.

Currently I do not use Boehm-GC (I don't have a GC yet); I re-use principles from gccgo in the _compiler_, not the runtime. At runtime everything is a gpy_object_t. Yes, you could do a little dataflow analysis for some really specific code in very specific cases and get some performance gains, but the problem is that libpython.so was designed for an interpreter.

So first off, you're comparing a project done on my own to something like CPython, with loads of developers and 20 years of work on it, or PyPy, which has funding and loads of developers.

Where I speed up is that there are absolutely no runtime lookups on data access. Look at CPython: it's loads of little dictionaries. All references are on the stack, at a much lower level than C. All constructs are compiled in, and I can reuse native C++ exceptions in the whole thing. I can hear you shouting at the email already, but fast lookup is _NOT_ one of the middle layers that a VM and interpreter have to provide. If you truly understand how an interpreter works, you know you can't do this.

Plus, you're referencing really old code on SourceForge. And I don't want to put out benchmarks (I would get so much grief from people it's really not worth it), but I can say it is faster than everything else on the code I compile so far. Not only that, you're referencing a strncmp to say it's slow; it isn't 100% ideal, but in my current git tree I have changed that. So I think it's completely unfair to reference tiny things and pretend you know everything about my project.

One thing people might find interesting is classes: I do data-flow analysis to generate a complete type for each class, and each member function is a compiled function as in C++, but at a much lower level. The whole project has been about stripping out the machinery needed to run user code, and I have been successful so far, but you're comparing an in-my-spare-time project to people who work on their stuff full time, with loads of people, etc.

Anyway, I am just going to stay out of this from now on, but your email made me want to reply and rage.

Mark Lawrence

unread,
Oct 23, 2013, 4:42:59 AM10/23/13
to pytho...@python.org
On 23/10/2013 08:25, Philip Herron wrote:

Personally I have no interest in your project but do wish you the best
of luck with your endeavours.

However I do have a personal interest in my eyesight, which gets
strained by reading posts such as yours. In order to assist me in
looking after my health, would you please be kind enough to read, digest
and action this https://wiki.python.org/moin/GoogleGroupsPython.

TIA.

John Nagle

unread,
Oct 23, 2013, 4:51:27 PM10/23/13
to
On 10/23/2013 12:25 AM, Philip Herron wrote:
> On Wednesday, 23 October 2013 07:48:41 UTC+1, John Nagle wrote:
>> On 10/20/2013 3:10 PM, victorg...@gmail.com wrote:
>>
>>> On Sunday, October 20, 2013 3:56:46 PM UTC-2, Philip Herron
>>> wrote:
> Nagle replies:
>>
>>>> Documentation can be found
>>>> http://gcc.gnu.org/wiki/PythonFrontEnd.
...
>
> I think your analysis is probably grossly unfair for many reasons.
> But your entitled to your opinion.
>
> Current i do not use Bohem-GC (I dont have one yet),

You included it in your project:

http://sourceforge.net/p/gccpy/code/ci/master/tree/boehm-gc


> i re-use
> principles from gccgo in the _compiler_ not the runtime. At runtime
> everything is a gpy_object_t, everything does this. Yeah you could do
> a little of dataflow analysis for some really really specific code
> and very specific cases and get some performance gains. But the
> problem is that the libpython.so it was designed for an interpreter.
>
> So first off your comparing a project done on my own to something
> like cPython loads of developers 20 years on my project or something
> PyPy has funding loads of developers.
>
> Where i speed up is absolutely no runtime lookups on data access.
> Look at cPython its loads of little dictionaries. All references are
> on the Stack at a much lower level than C. All constructs are
> compiled in i can reuse C++ native exceptions in the whole thing. I
> can hear you shouting at the email already but the middle crap that a
> VM and interpreter have to do and fast lookup is _NOT_ one of them.
> If you truely understand how an interpreter works you know you cant
> do this
>
> Plus your referencing really old code on sourceforge is another
> thing.

That's where you said to look:

http://gcc.gnu.org/wiki/PythonFrontEnd

"To follow gccpy development see: Gccpy SourceForge
https://sourceforge.net/projects/gccpy"

> And i dont want to put out bench marks (I would get so much
> shit from people its really not worth it) but it i can say it is
> faster than everything in the stuff i compile so far. So yeah... not
> only that but your referncing a strncmp to say no its slow yeah it
> isn't 100% ideal but in my current git tree i have changed that.

So the real source code isn't where you wrote that it is?
Where is it, then?

> So i
> think its completely unfair to reference tiny things and pretend you
> know everything about my project.

If you wrote more documentation, people might understand
what you are doing.

> One thing people might find interesting is class i do data flow
> analysis to generate a complete type for that class and each member
> function is a compiled function like C++ but at a much lower level
> than C++.

It's not clear what this means. Are you trying to determine, say,
which items are integers, lists, or specific object types?
Shed Skin tries to do that. It's hard to do, but very effective
if you can do it. In CPython, every time "x = a + b" is
executed, the interpreter has to invoke the general case for
"+", which can handle integers, floats, strings, NumPy, etc.
If you can infer types, and know it's a float, the run
time code can be float-specific and about three machine
instructions.
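A toy sketch of the type-inference win described above (illustrative only -- this is not Shed Skin's implementation): if whole-program analysis can prove both operands of `x = a + b` are floats, the generic dispatching add can be swapped for a float-only fast path.

```python
# Generic vs. type-specialized add, as a compiler might select them.

def generic_add(a, b):
    # CPython-style: one entry point that handles ints, floats,
    # strings, NumPy arrays, ... by dispatching on runtime type.
    return a + b

def specialize_add(inferred_type):
    if inferred_type is float:
        # Float-specific path: in real compiled code this collapses
        # to a handful of machine instructions.
        return float.__add__
    # Inference failed or found a mixed type: fall back to dispatch.
    return generic_add

add_ff = specialize_add(float)
assert add_ff(1.5, 2.25) == 3.75
assert specialize_add(object)('a', 'b') == 'ab'
```

The hard part, of course, is the inference itself -- very dynamic code defeats it, which is exactly the trouble Shed Skin runs into.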

> The whole project has been about stripping out the crap
> needed to run user code and i have been successful so far but your
> comparing a in my spare time project to people who work on their
> stuff full time. With loads of people etc.

Shed Skin is one guy.

> Anyways i am just going to stay out of this from now but your email
> made me want to reply and rage.

You've made big claims without giving much detail. So people
are trying to find out if you've done something worth paying
attention to.

John Nagle



Stefan Behnel

unread,
Oct 24, 2013, 2:47:59 AM10/24/13
to pytho...@python.org
Antoine Pitrou, 22.10.2013 10:55:
> Philip Herron writes:
>> Its interesting a few things come up what about:
>> exec and eval. I didn't really have a good answer for this at my talk at
>> PYCon IE 2013 but i am going to say no. I am
>> not going to implement these. Partly because eval and exec at least to me
>> are mostly from developing
>> interpreters as a debugging exercise so the test doesn't have to invoke
>> the program properly and feed in
>> strings to interpret at least thats what i have done in the past with an
>> virtual machine i wrote before gccpy.
>
> If you don't implement exec() and eval() then people won't be able to use
> namedtuples, which are a common datatype factory.

FWIW, for Cython, I personally consider eval() and exec() more of a handy
way to insert plain Python code (potentially even generated at runtime)
into compiled code, so they are not currently compiled. You can see that
from the low Mako benchmark results, for example. We may eventually add an
option to compile that code at runtime (assuming you have Cython and a C
compiler installed), but I doubt that people would want that as the default.

Obviously, Cython has the advantage of being backed by a CPython runtime,
so it's easy for us to choose one or the other. An independent Python
implementation doesn't easily have that choice.


> As for the rest: well, good luck writing an AOT compiler producing
> interesting results on average *pure* Python code. It's already been tried
> a number of times, and has generally failed. Cython mitigates the issue by
> exposing a superset of Python (including type hints, etc.).

Agreed, although the word "mitigate" makes it sound more like a work-around
than the actual feature it represents. I wrote down my thoughts on
this topic a while ago.

http://blog.behnel.de/index.php?p=241

Stefan


Mark Lawrence

unread,
Oct 24, 2013, 11:40:30 PM10/24/13
to pytho...@python.org
On 22/10/2013 18:37, Oscar Benjamin wrote:

>
> OTOH why in particular would you want to initialise them with zeros? I
> often initialise arrays to nan which is useful for debugging. It means
> that you can see if you made a mistake when you were supposed to have
> initialised everything to useful values. In many contexts it would be
> difficult to distinguish between a valid zero and a zero because you
> haven't yet inserted a value. This kind of error can show up more
> quickly if you don't zero the memory since the uninitialised values
> will often be out of range for what you expected and will give very
> noticeable results (such as a seg-fault).
>

In his book "Writing Solid Code" Steve Maguire states that he
initialises with 0xA3 for Macintosh programs, and that Microsoft uses
0xCC, for exactly the reasons that you describe above.

Mark Janssen

unread,
Oct 25, 2013, 7:55:43 AM10/25/13
to Mark Lawrence, Python List
On Thu, Oct 24, 2013 at 8:40 PM, Mark Lawrence <bream...@yahoo.co.uk> wrote:
> On 22/10/2013 18:37, Oscar Benjamin wrote:
>> OTOH why in particular would you want to initialise them with zeros? I
>> often initialise arrays to nan which is useful for debugging.

Is this some kind of joke? What has this list become?

--
MarkJ
Tacoma, Washington

Dan Sommers

unread,
Oct 25, 2013, 8:55:22 AM10/25/13
to
We used to initialize RAM to 0xdeadbeef on CPU reset (and sometimes on
calls to free() in a debugging environment) for the same reason: if a
program crashed and I saw that value in one of the CPU registers, then
I knew that some part of the program had accessed "uninitialized" (or
freed) memory. That pattern also sticks out like a sore thumb (insert
your own C++/hammer joke here) in a core dump.

That said, I seem to recall that somewhere along the way, ANSI C began
to guarantee that certain static (in the technical sense) values were
initialized to 0, or NULL, or something like that, on program startup,
before any user-level code executed.

Dan

Mark Lawrence

unread,
Oct 25, 2013, 10:18:59 AM10/25/13
to pytho...@python.org
On 25/10/2013 12:55, Mark Janssen wrote:
> On Thu, Oct 24, 2013 at 8:40 PM, Mark Lawrence <bream...@yahoo.co.uk> wrote:
>> On 22/10/2013 18:37, Oscar Benjamin wrote:
>>> OTOH why in particular would you want to initialise them with zeros? I
>>> often initialise arrays to nan which is useful for debugging.
>
> Is this some kind of joke? What has this list become?
>

What did *I* write? Something about real world practical programming
that a text book theorist such as yourself wouldn't understand. The
only snag is you haven't quoted anything that I've written.

Ned Batchelder

unread,
Oct 25, 2013, 10:35:16 AM10/25/13
to Mark Janssen, Python List
On 10/25/13 7:55 AM, Mark Janssen wrote:
> On Thu, Oct 24, 2013 at 8:40 PM, Mark Lawrence <bream...@yahoo.co.uk> wrote:
>> On 22/10/2013 18:37, Oscar Benjamin wrote:
>>> OTOH why in particular would you want to initialise them with zeros? I
>>> often initialise arrays to nan which is useful for debugging.
> Is this some kind of joke? What has this list become?
>

It's a useful debugging technique to initialize memory to distinctive
values that should never occur in real data. Perhaps it's better to say
"pre-initialize". If the program is working correctly, then that data
will be written over with actual initial values, and you'll never see
the distinctive values. But if your program does encounter one of those
values, it's clear that there's a bug that needs to be fixed.
Additionally, if you have a number of different distinctive values, then
the actual value encountered provides a clue as to what might have gone
wrong.

In an array of floats, initializing to NaN would be very useful, since
NaNs propagate through calculations, or raise exceptions.
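The technique is easy to demonstrate in Python itself -- a minimal sketch:

```python
import math

# Pre-fill with NaN so slots we forgot to initialize stay loud
# instead of masquerading as a valid zero.
data = [float('nan')] * 4
data[0] = 1.0
data[1] = 2.0          # oops -- data[2] and data[3] were never filled

total = sum(data)
assert math.isnan(total)                  # the NaNs propagated
missing = [i for i, v in enumerate(data) if math.isnan(v)]
assert missing == [2, 3]                  # and pinpoint what we missed
```

Had the array been zero-filled instead, `total` would have been a plausible-looking 3.0 and the bug would have sailed through unnoticed.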

--Ned.

Mark Janssen

unread,
Oct 25, 2013, 2:26:06 PM10/25/13
to Ned Batchelder, Python List
>>>> OTOH why in particular would you want to initialise them with zeros? I
>>>> often initialise arrays to nan which is useful for debugging.
>>
>> Is this some kind of joke? What has this list become?
>
> It's a useful debugging technique to initialize memory to distinctive values
> that should never occur in real data.

If you're doing this, you're doing something wrong. Please give me
the hex value for NaN so I can initialize with my array.

--
MarkJ
Tacoma, Washington

Mark Lawrence

unread,
Oct 25, 2013, 2:40:28 PM10/25/13
to pytho...@python.org
It is clear that you know as much about debugging as you do about
objects and message passing, a summary here for the uninitiated
http://code.activestate.com/lists/python-ideas/19908/. I can see why the
BDFL described you as an embarrassment, and if he didn't, he certainly
should have done.

Mark Janssen

unread,
Oct 25, 2013, 2:45:43 PM10/25/13
to Mark Lawrence, Python List
>>>>>> OTOH why in particular would you want to initialise them with zeros? I
>>>>>> often initialise arrays to nan which is useful for debugging.
>>>>
>>>> Is this some kind of joke? What has this list become?
>>>
>>> It's a useful debugging technique to initialize memory to distinctive
>>> values that should never occur in real data.
>>
>> If you're doing this, you're doing something wrong. Please give me
>> the hex value for NaN so I can initialize with my array.
>
> It is clear that you know as much about debugging as you do about objects
> and message passing [...] can see why the
> BDFL described you as an embarrassment, and if he didn't, he certainly
> should have done.

Clearly the python list has been taken over by TheKooks. Notice he
did not respond to the request. Since we are talking about digital
computers (with digital memory), I'm really curious what the hex value
for NaN is to initialize my arrays....

All hail chairman Meow. Dismissed.

--
MarkJ
Tacoma, Washington

rusi

unread,
Oct 25, 2013, 2:59:12 PM10/25/13
to
On Saturday, October 26, 2013 12:15:43 AM UTC+5:30, zipher wrote:
> Clearly the python list has been taken over by TheKooks. Notice he
> did not respond to the request. Since we are talking about digital
> computers (with digital memory), I'm really curious what the hex value
> for NaN is to initialize my arrays....

I don't see how that's any more relevant than:
What's the hex value of the add instruction?

Presumably floating-point numbers are used for FP operations.
FP operations barf on NaNs.
So a mis-initialized array in which a NaN shows up can be easily caught.

Yes, NaNs are not one value but a class:
http://docs.oracle.com/cd/E19957-01/806-3568/ncg_math.html

but so what?

Mark Lawrence

unread,
Oct 25, 2013, 3:02:00 PM10/25/13
to pytho...@python.org
On 25/10/2013 19:45, Mark Janssen wrote:
>>>>>>> OTOH why in particular would you want to initialise them with zeros? I
>>>>>>> often initialise arrays to nan which is useful for debugging.
>>>>>
>>>>> Is this some kind of joke? What has this list become?
>>>>
>>>> It's a useful debugging technique to initialize memory to distinctive
>>>> values that should never occur in real data.
>>>
>>> If you're doing this, you're doing something wrong. Please give me
>>> the hex value for NaN so I can initialize with my array.
>>
>> It is clear that you know as much about debugging as you do about objects
>> and message passing [...] can see why the
>> BDFL described you as an embarrassment, and if he didn't, he certainly
>> should have done.
>
> Clearly the python list has been taken over by TheKooks. Notice he
> did not respond to the request. Since we are talking about digital
> computers (with digital memory), I'm really curious what the hex value
> for NaN is to initialize my arrays....
>
> All hail chairman Meow. Dismissed.
>

Reinstating http://code.activestate.com/lists/python-ideas/19908/ where
you're described as a quack, which I assume in this context makes you an
expert on duck typing, which should obviously be abbreviated to ducking.

As for the hex value for NaN, who really gives a toss? The whole point
is that you initialise to something that you do not expect to see. Do
you not have a text book that explains this concept?

Grant Edwards

unread,
Oct 25, 2013, 3:06:27 PM10/25/13
to
On 2013-10-25, Mark Janssen <dreamin...@gmail.com> wrote:
>>>>> OTOH why in particular would you want to initialise them with zeros? I
>>>>> often initialise arrays to nan which is useful for debugging.
>>>
>>> Is this some kind of joke? What has this list become?
>>
>> It's a useful debugging technique to initialize memory to distinctive
>> values that should never occur in real data.
>
> If you're doing this, you're doing something wrong.

Pardon me if I don't take your word for it.

> Please give me the hex value for NaN so I can initialize with my
> array.

Seriously? You haven't discovered Google and Wikipedia yet?

http://www.google.com/
http://en.wikipedia.org/

Assuming you're using IEEE-754, all 1's is a quiet NaN:

http://en.wikipedia.org/wiki/IEEE_floating_point
http://en.wikipedia.org/wiki/NaN

If you want a signaling NaN you've got to change one of the bits (see
the above links).

IIRC, the Pascal language required that using uninitialized variables
caused an error. Initializing FP values to a signaling NaN is a very
convenient way to do that.
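For the concrete question raised upthread, the bit pattern is easy to inspect from Python (assuming IEEE-754 binary64, which CPython uses on all common platforms):

```python
import math
import struct

# CPython's default NaN on common IEEE-754 platforms is the quiet NaN
# 0x7ff8000000000000: exponent all ones, top mantissa bit set. Many
# other NaN bit patterns exist -- NaN is a class of values, not one.
bits = struct.pack('>d', float('nan')).hex()
assert bits == '7ff8000000000000'

# Round-trip: those bytes decode to a value that is not equal to
# itself -- the defining property of NaN.
x = struct.unpack('>d', bytes.fromhex('7ff8000000000000'))[0]
assert math.isnan(x) and x != x
```

So "the hex value" a debugger would fill memory with is simply one chosen NaN pattern; any all-ones exponent with a nonzero mantissa would do.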

--
Grant Edwards grant.b.edwards Yow! I'm also against
at BODY-SURFING!!
gmail.com

Mark Janssen

unread,
Oct 25, 2013, 3:09:09 PM10/25/13
to rusi, Python List
On Fri, Oct 25, 2013 at 11:59 AM, rusi <rusto...@gmail.com> wrote:
> On Saturday, October 26, 2013 12:15:43 AM UTC+5:30, zipher wrote:
>> Clearly the python list has been taken over by TheKooks. Notice he
>> did not respond to the request. Since we are talking about digital
>> computers (with digital memory), I'm really curious what the hex value
>> for NaN is to initialize my arrays....
>
> I dont see how thats any more relevant than:
> Whats the hex value of the add instruction?

You "don't see". That is correct. Btw, I believe the hex value for
the add instruction on the (8-bit) Intel 8088 is x0. Now what were
you saying?

--
MarkJ
Tacoma, Washington

rusi

unread,
Oct 25, 2013, 3:15:45 PM10/25/13
to
On Saturday, October 26, 2013 12:39:09 AM UTC+5:30, zipher wrote:
> On Fri, Oct 25, 2013 at 11:59 AM, rusi wrote:
> >
> > I dont see how thats any more relevant than:
> > Whats the hex value of the add instruction?
>
>
> You "don't see". That is correct. Btw, I believe the hex value for
> the add instruction on the (8-bit) Intel 8088 is x0. Now what were
> you saying?

There are a dozen different hex values for the add instruction -- depending on operand sizes/directions/immediate operands/special registers etc.

So just as there is one add instruction with many hex values,
there is one NaN as a concept with many different hex values.

Are you arguing for the sake of arguing?
If so, let's at least agree on what you are arguing about!!

Mark Janssen

unread,
Oct 25, 2013, 3:18:58 PM10/25/13
to Mark Lawrence, Python List
> As for the hex value for Nan who really gives a toss? The whole point is
> that you initialise to something that you do not expect to see. Do you not
> have a text book that explains this concept?

No, I don't think there is a textbook that explains such a concept of
initializing memory to anything but 0 -- UNLESS you're from Stupid
University.

Thanks for providing fodder...

Mark Janssen, Ph.D.
Tacoma, WA

Mark Lawrence

unread,
Oct 25, 2013, 3:36:16 PM10/25/13
to pytho...@python.org
We've been discussing *DEBUGGING*. If you can't be bothered to read
what's written please do not respond.

Mark Janssen

unread,
Oct 25, 2013, 3:49:34 PM10/25/13
to Mark Lawrence, Python List
>>> As for the hex value for Nan who really gives a toss? The whole point is
>>> that you initialise to something that you do not expect to see. Do you
>>> not have a text book that explains this concept?
>>
>> No, I don't think there is a textbook that explains such a concept of
>> initializing memory to anything but 0 -- UNLESS you're from Stupid
>> University.
>>
>> Thanks for providing fodder...
>
> We've been discussing *DEBUGGING*.

Are you making it LOUD and *clear* that you don't know what you're
talking about?

Input: Yes/no

MarkJanssen

Mark Lawrence

unread,
Oct 25, 2013, 4:14:13 PM10/25/13
to pytho...@python.org
[sorry if things below are a bit messy]
no

Now please explain what you do not understand about the data below
that's been written by Oscar Benjamin, myself and Ned Batchelder,
specifically the use of the word *DEBUGGING*. Is this a word that does
not appear in your text books? If that is in fact the case would you
like one of the experienced practical programmers on this list to
explain it to you? Have you ever bothered to read "The Zen of Python",
specifically the bit about "Practicality beats purity"?

>
> MarkJanssen
>

On 25/10/2013 04:40, Mark Lawrence wrote:
> On 22/10/2013 18:37, Oscar Benjamin wrote:
>
>>
>> OTOH why in particular would you want to initialise them with zeros? I
>> often initialise arrays to nan which is useful for debugging. It means
>> that you can see if you made a mistake when you were supposed to have
>> initialised everything to useful values. In many contexts it would be
>> difficult to distinguish between a valid zero and a zero because you
>> haven't yet inserted a value. This kind of error can show up more
>> quickly if you don't zero the memory since the uninitialised values
>> will often be out of range for what you expected and will give very
>> noticeable results (such as a seg-fault).
>>
>
> In his book "Writing Solid Code" Steve Maguire states that he
> initialises with 0xA3 for Macintosh programs, and that Microsoft uses
> 0xCC, for exactly the reasons that you describe above.
>

On 25/10/2013 15:35, Ned Batchelder wrote:
> On 10/25/13 7:55 AM, Mark Janssen wrote:
>> On Thu, Oct 24, 2013 at 8:40 PM, Mark Lawrence
>> <bream...@yahoo.co.uk> wrote:
>>> On 22/10/2013 18:37, Oscar Benjamin wrote:
>>>> OTOH why in particular would you want to initialise them with zeros? I
>>>> often initialise arrays to nan which is useful for debugging.
>> Is this some kind of joke? What has this list become?
>>
>
> It's a useful debugging technique to initialize memory to distinctive
> values that should never occur in real data. Perhaps it better to say,
> "pre-initialize". If the program is working correctly, then that data
> will be written over with actual initial values, and you'll never see
> the distinctive values. But if your program does encounter one of those
> values, it's clear that there's a bug that needs to be fixed.
> Additionally, if you have a number of different distinctive values, then
> the actual value encountered provides a clue as to what might have gone
> wrong.
>
> In an array of floats, initializing to NaN would be very useful, since
> NaNs propagate through calculations, or raise exceptions.
>
> --Ned.

