Message from discussion Bison deterministic LALR(1) parser for Java/C++ (kind of complex langauge) without 'lexar hack' support
From: BGB <cr88...@hotmail.com>
Subject: Re: Bison deterministic LALR(1) parser for Java/C++ (kind of complex langauge) without 'lexar hack' support
Date: Tue, 21 Aug 2012 14:45:33 -0500
Keywords: C, performance
On 8/20/2012 8:35 AM, Anton Ertl wrote:
> Hans-Peter Diettrich <DrDiettri...@aol.com> writes:
>>> [Get it working in bison, then in the unlikely event that's not fast
>>> enough, profile your compiler to see where it's spending its time and
>>> fix what needs to be fixed. Although in theory GLR can be very slow,
>>> in practice the ambiguities are generally resolved within a few tokens
>>> and the performance is fine. compilers always spend way more time in
>>> the lexer than the parser anyway. Writing RD parsers by hand can be
>>> fun, but you never know what language it actually parses. -John]
>> There exist parser generators for several models. I also doubt that -
>> except in misdesigned C-ish languages - a compiler spends significant
>> time in the lexer. This may be true for dummy parsers, which do
>> nothing but syntax checks, but not for compilers with code generation,
>> optimization and more.
> Code generation and optimization do not change the relation between
> the time spent in scanning and in parsing. Moreover, if the compiler
> spends most of the time in code generation, optimization and/or
> "more", there is even less reason to worry about parsing speed.
(sorry in advance for sort of wandering off on a tangent...).
a major thing that I think skews things so much in the case of C is just
how much stuff can get pulled in by the preprocessor, the bulk of which
ends up quickly being discarded by subsequent stages (and much of the
rest is only minimally processed by later stages).
actually, a fairly large portion of the time may be spent in the
preprocessor, which basically sits around, for each source line:
splitting it into tokens;
checking if any of the identifiers in the source-line are names for
macros (and expanding as necessary);
repeating the above until no more macro expansions are seen;
recomposing the line and spitting it into the output.
then, for example, there are typically large numbers of typedefs and
structure definitions, very few of which may end up actually being used
by the code in question (most just get put into a table and then forgotten).
then, after all this, the compiler is typically left looking at a fairly
small chunk of (actual) code.
if we spend 1000ms preprocessing and parsing the code, and 10ms or 100ms
compiling it, where has most of the time gone?...
OTOH, in my scripting language (which doesn't use headers), I have
generally been looking at millisecond-range compile times.
although admittedly most of my compilers have tended to be pretty dumb,
typically working like this (for the upper end):
split into tokens and parse the code directly into an AST;
perform basic simplifications on the AST (evaluating constant expressions);
perform basic type-analysis / inference (generally forward propagation
driven by declarations and assignment operations);
flatten this out into a bytecode-style format.
this is then currently followed by a backend which translates the
bytecode into threaded-code and runs this (generally also pretty fast,
and generally functions/methods are translated on call).
my code-generator backend was a little more involved, basically doing
its own type-analysis, mapping the stack to registers and temporaries,
and using peephole optimization. this backend fell into disuse mostly
because I was having a hard time really debugging it, and it was eating
up a lot of my time trying to do so (threaded code generally being a
lot easier to debug).
unlike many other compilers though, my stuff is generally based on a
stack-machine model, so pretty much everything (from intermediate
values, to handling local variables and environment scopes) is handled
using a stack-machine model.
for example, in my bytecode format, local variables (and function
arguments, and parent scopes) are identified by their position on the
variable stack (with declarations pushing new variables onto the stack,
and other operations popping them back off).
note that my bytecode is lexically scoped, but tends to see the entire
lexical environment as a big stack (which may actually be split across
multiple call frames, or have parts be allocated on the heap as well).
there are also the dynamic, object ("this" scope), and toplevel
environments, but lookups into these are given by name or by
name+signature, rather than by an explicit stack index as in the case of
the lexical environment. note that packages/import build on delegation,
which is built on the object scope (packages are actually implemented as
objects, and imports are actually lexical variables used to delegate to
these package objects).
but, in general, I just seem to find the stack model generally a lot
easier to understand and work with.
my code generator also tied register allocation and lifetime to these
stacks as well, although later on it had added temporaries which were
identified by handle and had an independent lifetime, rather than being
tied to the stack-machine model (and were, ironically, much closer to
how the CPU stack was actually being utilized by the code generator, as
the evaluation stack and local/environment stack were not mapped 1:1
with the CPU stack, whereas the temp-vars were keyed 1:1 with their
CPU-stack locations).
however, it was introducing temporaries and registers which existed
independently of the stack-machine model which is where the pain began.
if I ever get back to working on or writing a new native code generator,
I will probably just put the temporaries on their own stack, thus any
temporary entering its lifetime is pushed onto the temp-var stack, and
popped off when its lifetime ends, mostly in the name of being less of a
PITA to work with.
note that, in any case, the actual stack-frame (at the machine-level) is
basically a fixed-size region which is divided up and allocated (as
appropriate) for things like holding variables and evaluation-stack
entries (and treated as an array of machine-words allocated using a bitmap).
note that some minor ABI hacks were made to allow for things like
lexical scoping, but otherwise it was using the same ABI as C (C style
block functions being callable directly as C function pointers).
currently, the threaded-code uses a more narrowly defined subset of the
C ABI, basically where all calls have exactly 2 arguments: the VM
context and the current threaded-opcode context. this, combined with
most of the logic being interpreter-style and written in C, does a bit
of harm to the performance though (but, it is still "fast enough" for
general application scripting tasks).
but, with luck, I will be able to justify the effort needed to return
to using a full-featured JIT.
granted, I could always just be like, "hell with it, I am just going to
do it..." (whether or not it is worthwhile), which sort of ends up being
invoked occasionally (for things I can't really justify doing, but want
to do anyways...).