IMCC Reentrancy

Vishal Soni

Jul 16, 2006, 11:43:00 PM
to Perl 6 Internals
Hi,

I have been working on making IMCC reentrant and/or thread-safe. A couple
of things have come up that might make it difficult to make the existing
implementation thread-safe/reentrant.

The current implementation uses Flex and YACC. Flex has limitations in C
mode: the C lexer generated by flex cannot be reentrant/thread-safe. Flex
generates thread-safe parsers only in C++ mode. This limitation of flex will
defeat the whole effort of removing global variables from IMCC. In my
opinion, if we cannot get global-variable-free code from flex, there is no
sense in proceeding with cleaning up the other global variables.

Audrey Tang recommended using re2c as an alternative to flex (with the Lemon
parser as a replacement for Yacc). re2c generates reentrant/thread-safe
parsers. I also spent some time reading the paper published with re2c.
Initial indications are that it produces scanners that run faster than
flex's.

So here are some options that I have come up with that we have. I would like
you guys and especially Allison and Chip to provide some feedback on how to
proceed further.

1st Option: Hack it and patch it to death !!!
-------------------------------------------------------
Since flex is not generating reentrant code, this option will get rid of
flex altogether and replace it with re2c. This would require significant
reworking of the code. So the plan of action would be as follows:
a. Remove flex and implement re2c

2nd: Inaction is the best action !!!
-------------------------------------------

3rd Option: Back to drawing board !!!
------------------------------------------------


--
Thanks,
Vishal

Vishal Soni

Jul 16, 2006, 11:57:07 PM
to Perl 6 Internals
Hi,

Please disregard the previous mail. Hit the wrong shortcut key!!

I have been working on making IMCC reentrant and/or thread-safe. A couple
of things have come up that might make it difficult to make the existing
implementation thread-safe/reentrant.

The current implementation uses Flex and YACC. Flex has limitations in C
mode: the C lexer generated by flex cannot be reentrant/thread-safe. Flex
generates thread-safe parsers only in C++ mode. This limitation of flex will
defeat the whole effort of removing global variables from IMCC. In my
opinion, if we cannot get global-variable-free code from flex, there is no
sense in proceeding with cleaning up the other global variables.
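
To make the concern concrete, here is a minimal hand-written sketch of the
difference between a scanner that keeps its state in globals and one that
keeps it in a caller-owned struct. This is not flex output and not IMCC
code; every name in it (g_input, scanner_t, scanner_next_word, and so on)
is made up for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Non-reentrant style: scanner state lives in file-scope variables,
 * so only one scan can be in progress at a time. */
static const char *g_input;
static size_t g_pos;

void global_scan_init(const char *text) {
    g_input = text;
    g_pos = 0;
}

/* Returns the length of the next word, or 0 at end of input. */
size_t global_next_word(void) {
    size_t len = 0;
    while (g_input[g_pos] != '\0' && isspace((unsigned char)g_input[g_pos]))
        g_pos++;
    while (g_input[g_pos] != '\0' && !isspace((unsigned char)g_input[g_pos])) {
        g_pos++;
        len++;
    }
    return len;
}

/* Reentrant style: the same state, held in a struct the caller owns.
 * Any number of scans can be interleaved, one struct per scan. */
typedef struct scanner {
    const char *input;
    size_t pos;
} scanner_t;

void scanner_init(scanner_t *s, const char *text) {
    assert(s != NULL && text != NULL);
    s->input = text;
    s->pos = 0;
}

/* Returns the length of the next word, or 0 at end of input. */
size_t scanner_next_word(scanner_t *s) {
    size_t len = 0;
    assert(s != NULL);
    while (s->input[s->pos] != '\0' && isspace((unsigned char)s->input[s->pos]))
        s->pos++;
    while (s->input[s->pos] != '\0' && !isspace((unsigned char)s->input[s->pos])) {
        s->pos++;
        len++;
    }
    return len;
}
```

With the struct version, two scanner_t values can be advanced alternately
without disturbing each other. With the global version, a second call to
global_scan_init() silently discards the first scan's position.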

Audrey Tang recommended using re2c as an alternative to flex (with the Lemon
parser as a replacement for Yacc). re2c generates reentrant/thread-safe
parsers. I also spent some time reading the paper published with re2c.
Initial indications are that it produces scanners that run faster than
flex's.

Here are the options I have come up with. I would like you guys, and
especially Allison and Chip, to provide some feedback on how to proceed
further.

1st Option: Hack it and patch it to death !!!
-------------------------------------------------------

Since flex is not generating reentrant code, this option will get rid of
flex altogether and replace it with re2c. This would require significant
reworking of the code. So the plan of action would be as follows:
a. Remove flex and implement re2c

b. Remove static and global variables

Apart from this, we also need to refactor the macro code to replace its
arrays with a hash table implementation.
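
As a rough idea of what the array-to-hash-table change could look like,
here is a small chained hash table keyed by macro name. It is only a sketch
under my own assumptions: it does not reflect IMCC's actual macro
structures, and macro_table_t, macro_define, macro_lookup and
macro_table_free are hypothetical names. Note that the table is owned by
the caller, so it adds no new global state.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MACRO_BUCKETS 64

typedef struct macro_entry {
    char *name;
    char *body;
    struct macro_entry *next;      /* chain for names in the same bucket */
} macro_entry_t;

typedef struct macro_table {
    macro_entry_t *buckets[MACRO_BUCKETS];
} macro_table_t;

/* djb2-style string hash, reduced to a bucket index. */
static size_t macro_hash(const char *s) {
    unsigned long h = 5381;
    while (*s != '\0')
        h = h * 33 + (unsigned char)*s++;
    return (size_t)(h % MACRO_BUCKETS);
}

static char *copy_string(const char *s) {
    size_t n = strlen(s) + 1;
    char *p = malloc(n);
    if (p != NULL)
        memcpy(p, s, n);
    return p;
}

void macro_table_init(macro_table_t *t) {
    size_t i;
    assert(t != NULL);
    for (i = 0; i < MACRO_BUCKETS; i++)
        t->buckets[i] = NULL;
}

/* Returns the macro body, or NULL if the name is not defined. */
const char *macro_lookup(const macro_table_t *t, const char *name) {
    const macro_entry_t *e;
    for (e = t->buckets[macro_hash(name)]; e != NULL; e = e->next)
        if (strcmp(e->name, name) == 0)
            return e->body;
    return NULL;
}

/* Defines or redefines a macro. Returns 0 on success, -1 if out of memory. */
int macro_define(macro_table_t *t, const char *name, const char *body) {
    size_t i = macro_hash(name);
    macro_entry_t *e;
    char *new_body = copy_string(body);
    if (new_body == NULL)
        return -1;
    for (e = t->buckets[i]; e != NULL; e = e->next) {
        if (strcmp(e->name, name) == 0) {
            free(e->body);         /* redefinition replaces the old body */
            e->body = new_body;
            return 0;
        }
    }
    e = malloc(sizeof *e);
    if (e == NULL) {
        free(new_body);
        return -1;
    }
    e->name = copy_string(name);
    if (e->name == NULL) {
        free(e);
        free(new_body);
        return -1;
    }
    e->body = new_body;
    e->next = t->buckets[i];
    t->buckets[i] = e;
    return 0;
}

/* Frees every entry and leaves the table empty and reusable. */
void macro_table_free(macro_table_t *t) {
    size_t i;
    for (i = 0; i < MACRO_BUCKETS; i++) {
        macro_entry_t *e = t->buckets[i];
        while (e != NULL) {
            macro_entry_t *next = e->next;
            free(e->name);
            free(e->body);
            free(e);
            e = next;
        }
        t->buckets[i] = NULL;
    }
}
```

Lookup cost then depends on chain length rather than on the total number of
macros, and there is no fixed upper limit on how many macros can be defined.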

All in all, this would mean overhauling a lot of code.


2nd: Inaction is the best action !!!
-------------------------------------------

Let's not do anything and leave the code as it is. Just say IMCC is not
reentrant/thread-safe and leave it there. We will address this issue in
the future. I highly doubt this is the route we want to take.

3rd Option: Back to drawing board !!!
------------------------------------------------

This option would require a complete rewrite of IMCC (we could possibly call
it PIRC). The downside of this approach is that we will have to re-implement
all of IMCC. The programming languages will have to live with IMCC's
limitations until the new version is ready.

The pros of this approach are:
a. A clean implementation rather than a prototypish implementation.
b. It makes the PIR compiler production-release ready. The way the compiler
sits right now, it is not a good release candidate.
c. The code can be structured in a way that is easy to maintain and extend.

The 3rd option is a lot of work but might be a good option in the long run.

These are just some of my thoughts.

Please let me know what you guys think, or whether you have other options in
mind. Whatever it is, we need to come to a consensus to make IMCC reentrancy
a reality.

As usual please provide feedback.

--
Thanks,

Vishal

Audrey Tang

Jul 17, 2006, 12:22:11 AM
to Vishal Soni, Perl 6 Internals

On 2006/7/16, at 11:57 PM, Vishal Soni wrote:

> a. A clean implementation rather than a prototypish implementation

I think that the lemon+re2c, being the more modern parsing tools,
will make refactoring/hacking considerably easier. Whilst you are
converting the current IMCC implementation into the new toolchain,
you'll be in the best position to find out inconsistencies, to-be-
deprecated spots, as well as best strategies to hack in new features,
such as nested expressions and composable macros.

So I'd regard the lemon+re2c refactoring as a good preparation step
("synchronize with the PIR mindset") before a full-fledged rewrite.
Once that's in place, the "rewrite" may be reachable via a set of
gradual refactor+deprecation cycles, which will make the transition much,
much easier.

Thanks,
Audrey


Joshua Hoblitt

Jul 17, 2006, 3:41:21 PM
to Vishal Soni, Perl 6 Internals
4th Option: fix flex. ;)

-J

--

Allison Randal

Jul 17, 2006, 5:49:09 PM
to Vishal Soni, Perl 6 Internals
Vishal Soni wrote:
>
> The current implementation is implemented using Flex and YACC. Flex
> implementation has limitations in C mode. The C lexer generated by flex
> cannot be reentrant/threadsafe. Flex generates thread-safe parsers only in
> C++ mode. This limitation of flex will defeat the whole effort of removing
> global variables from IMCC. In my opinion if we cannot get global variable
> free code from flex there is no sense in proceeding with cleaning up the
> other global variables.

This is unfortunate, but not entirely surprising.

> 1st Option: Hack it and patch it to death !!!
> -------------------------------------------------------

> Since flex is not generating reentrant code, this option will get rid of
> flex altogether and replace it with re2c. This would require significant
> reworking on the code. So the plan of action would be as follows:
> a. Remove flex and implement re2c
> b. Remove static and global variables
>
> Apart from this we also need to refactor the code to get rid of arrays to a
> hash table implementation for macros.
>
> All in all this would be over hauling lot of code.

The cost/benefit balance on this solution is not good. A lot of people
are depending on IMCC now, and a refactor of that magnitude will throw
several important projects on Parrot into a dead stall.

So, my answer is: No.

> 2nd: Inaction is the best action !!!
> -------------------------------------------
> Lets not do anything a leave the code as it is. Just say IMCC is not
> re-entrant/thread-safe and leave it there We will address this issue in
> future. I highly doubt it this is the route we want to take

For the short-term, this is the route we want to take. A new PIR/PASM
compiler isn't absolutely necessary for a 1.0 release. IMCC doesn't
really need to be reentrant, it just needs to produce bytecode.

So, my answer is: Yes, but...

> 3rd Option: Back to drawing board !!!
> ------------------------------------------------
>
> This option would require a complete re-write of IMCC ( possibly could call
> it PIRC). The cons of this approach is we will have to re-implement the
> whole IMCC again. The programming languages will have to live with IMCC
> limitations as long as the new version is ready.
>
> The pros of this approach are
> a. A clean implementation rather than a prototypish implementation
> b. Make PIR compiler production release ready. The way the compiler sits
> right now it is not a good release candidate.
> c. Structure the code in a way that is easy to maintain and extend.
>
> The 3rd option is lot of work but might be a good option in the long run.

IMCC was originally implemented as a separate compiler. After a while,
we found it to be so much better than the existing assembler that we
made it the primary way of producing bytecode. It's okay to repeat the
cycle by experimenting with a new compiler that produces bytecode, and
later decide if we want to replace IMCC with it. This doesn't interfere
with IMCC's development.

So, my answer is: Yes, but...

re2c and lemon aren't enough of an improvement over flex and bison to be
worth the pain of rewriting IMCC from scratch. If we do create a new way
of producing bytecode (and it's a safe bet that we will at some point),
I would lean toward using our own tools.

- Patrick is already looking into implementing a version of PGE in C.
This will be an infinitely better parser than any existing alternatives,
so it's worth waiting for.

- We already want an OST(opcode syntax tree)-to-bytecode compiler that
bypasses PIR for the compiler tools. That same compiler could be used to
implement PIR (combined with a lightweight version of TGE in C).

- IMCC is not a straight translator, it also performs optimizations.
These should be implemented in a modular way, with a standard interface,
so that developers can swap in new and improved optimizers as we go
along. The best place to hook them is probably off the OST-to-bytecode
compiler.
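
One possible shape for such a standard optimizer interface is sketched
below. It is only an illustration under stated assumptions: the op_t
instruction record and its opcodes are invented and are not Parrot's, and
optimizer_pass_t, run_passes and the two example passes are hypothetical
names. The idea is that every pass has the same function signature, so the
list of passes is just data that can be reordered or extended.

```c
#include <assert.h>
#include <stddef.h>

/* Invented instruction record, not a real Parrot or OST structure. */
typedef enum { OP_NOOP, OP_SET, OP_ADD, OP_PRINT } opcode_t;

typedef struct op {
    opcode_t code;
    int arg;
} op_t;

/* Standard interface: a pass rewrites ops[0..n) in place and returns
 * the new instruction count. pass_data is private to the pass. */
typedef size_t (*optimizer_fn)(op_t *ops, size_t n, void *pass_data);

typedef struct optimizer_pass {
    const char *name;
    optimizer_fn run;
    void *pass_data;
} optimizer_pass_t;

/* Example pass: drop OP_NOOP instructions. */
size_t strip_noops(op_t *ops, size_t n, void *pass_data) {
    size_t in, out = 0;
    (void)pass_data;
    for (in = 0; in < n; in++)
        if (ops[in].code != OP_NOOP)
            ops[out++] = ops[in];
    return out;
}

/* Example pass: drop additions of zero, which have no effect here. */
size_t strip_add_zero(op_t *ops, size_t n, void *pass_data) {
    size_t in, out = 0;
    (void)pass_data;
    for (in = 0; in < n; in++)
        if (!(ops[in].code == OP_ADD && ops[in].arg == 0))
            ops[out++] = ops[in];
    return out;
}

/* Runs the passes in order and returns the final instruction count. */
size_t run_passes(op_t *ops, size_t n,
                  const optimizer_pass_t *passes, size_t npasses) {
    size_t i;
    assert(ops != NULL || n == 0);
    for (i = 0; i < npasses; i++)
        n = passes[i].run(ops, n, passes[i].pass_data);
    return n;
}
```

Swapping in a new optimizer then means adding one more optimizer_pass_t
entry to the array handed to run_passes(), without touching the code that
emits bytecode.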


This approach does mean that the tools to start an IMCC rewrite aren't
available yet. It's a long-term solution (possibly post-1.0), so we can
afford to take a long-term view.

Allison

Sam Phillips

Jul 17, 2006, 6:21:44 PM
to Perl 6 Internals

On 17 Jul 2006, at 05:22, Audrey Tang wrote:

>
> On 2006/7/16, at 11:57 PM, Vishal Soni wrote:
>

> I think that the lemon+re2c, being the more modern parsing tools,
> will make refactoring/hacking considerably easier.

For future reference Ragel is definitely worth a look too:

http://www.cs.queensu.ca/~thurston/ragel/

Cheers,
Sam Phillips

Vishal Soni

Jul 17, 2006, 7:59:48 PM
to Allison Randal, Perl 6 Internals
On Mon, 2006-07-17 at 14:49 -0700, Allison Randal wrote:

> re2c and lemon aren't enough of an improvement over flex and bison to be
> worth the pain of rewriting IMCC from scratch. If we do create a new way
> of producing bytecode (and it's a safe bet that we will at some point),
> I would lean toward using our own tools.

> - Patrick is already looking into implementing a version of PGE in C.
> This will be an infinitely better parser than any existing alternatives,
> so it's worth waiting for.
>
> - We already want an OST(opcode syntax tree)-to-bytecode compiler that
> bypasses PIR for the compiler tools. That same compiler could be used to
> implement PIR (combined with a lightweight version of TGE in C).
>
> - IMCC is not a straight translator, it also performs optimizations.
> These should be implemented in a modular way, with a standard interface,
> so that developers can swap in new and improved optimizers as we go
> along. The best place to hook them is probably off the OST-to-bytecode
> compiler.

Allison, given that we need an API for bytecode generation that supports
plug-and-play optimizers, would it make sense to start implementing this
layer? This API could be used for OST-to-bytecode generation. Later, when
Patrick's PGE-to-C parser generator is ready, we could use his code to
implement the PIR compiler and just use the APIs that we write for
bytecode generation. Initially, for prototyping purposes, we might just
use the existing flex/yacc or re2c/lemon.

Allison, should this development wait, or can we start working on it? Will
we need a PDD before we can commence working on this API? Let me know
your thoughts.

It might not hurt to start working on a prototype API and see how it
fits with the OST-to-bytecode compiler.

Allison Randal

Jul 17, 2006, 9:11:25 PM
to visha...@gmail.com, Perl 6 Internals
Vishal Soni wrote:
>
> Allison having said that we need an API for byte code generation that
> supports plug n play optimizers would it make sense to start
> implementing this layer. This API could be used for OST to byte code
> generation. Later when Patrick's PGE to C parser generator is ready we
> could use his code to implement the PIR compiler and just use the API's
> that we write for byte code generation.

Yes, this will be valuable.

> Initially for prototyping
> purposes we might just use the existing flex/yacc or re2c/lemon.

The current PGE implementation is the best prototyping substitute: a)
the output from it will be nearly identical to the output from the C
version, and b) we also want to be able to use the OST-to-bytecode
compiler from language-compilers that use the PIR versions of PGE/TGE,
so it makes sense to build it that way from the start.

Ultimately we'll want to remove the PIR->PGE->PIR dependency loop, but
this is a good start.

> Allison should this development wait or can we start working on it? Will
> we need a PDD before we can commence working on this API. Let me know
> your thoughts.
>
> It might not hurt to start working on a Prototype API and see how it
> fits withe OST-to-bytecode compiler.

Let's go for an agile, iterative approach to the spec. Write up some
initial thoughts on the shape of the API and post them to
parrot-porters. The group can do sanity-checking/brainstorming, and then
you can start a prototype based on the result. After we've played with
the prototype a bit (and probably after you've modified it a few times
based on feedback from the group), I'll write a PDD to flesh out the
spec, fill in any holes, and address any problems encountered along the way.

Thanks,
Allison

Vishal Soni

Jul 17, 2006, 9:32:34 PM
to Allison Randal, Perl 6 Internals

> Let's go for an agile, iterative approach to the spec. Write up some
> initial thoughts on the shape of the API and post them to
> parrot-porters. The group can do sanity-checking/brainstorming, and then
> you can start a prototype based on the result. After we've played with
> the prototype a bit (and probably after you've modified it a few times
> based on feedback from the group), I'll write a PDD to flesh out the
> spec, fill in any holes, and address any problems encountered along the way.

Allison, this sounds great. To get started I will need some reference to
the OST format. Can you please point me in the right direction (some
documentation or sample code will do)?

I will assume the Bytecode Generation/Optimization API will be
implemented in C (TGE could use loadlib or some PMC mechanism to call it).
Let me know whether my assumption is correct or whether this API needs to
be in PIR.


> Thanks,
> Allison

Allison Randal

Jul 17, 2006, 10:04:53 PM
to visha...@gmail.com, Perl 6 Internals
Vishal Soni wrote:
>
> Allison this sounds great. To get started I will need some reference to
> the OST format. Can you please point me in the right direction (some
> documentation or sample code shall do.)?

Start with languages/punie/lib/POST/ and
languages/punie/lib/PIRGrammar.tg. This is the most developed existing
prototype implementation of OST nodes and an OST-to-PIR translator,
which should give you a general idea of what we'll be looking for.

> I will assume the implementation of the Byte Code Generation/
> Optimization API will be implemented in C (TGE could use loadlib or some
> PMC mechanism to call it). Let me know if my assumption is correct or
> does this API need to be in PIR.

Yes, C is the right way to go.

Allison

Audrey Tang

Jul 17, 2006, 11:54:07 PM
to Joshua Hoblitt, Vishal Soni, Perl 6 Internals

On 2006/7/17, at 3:41 PM, Joshua Hoblitt wrote:

> 4th Option: fix flex. ;)

Turns out flex 2.5.30+ has a reentrant mode. However, its API is
incompatible with earlier flex, even in non-reentrant mode. I've attached
a patch under http://rt.perl.org/rt3//Public/Bug/Display.html?id=34669
(needs flex 2.5.30+ to run) that updates imcc.l to deal with "%option
reentrant" and the additional yyscanner parameter.

However, imcc.y currently only allows one additional param to be passed as
YYLEX_PARAM, and it's already taken by the Parrot interpreter, so until we
put yyscanner into the interp somehow, or change the generated bison parser,
this wouldn't work.
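
The general shape of "one slot, two pieces of state" can be sketched as
below. This is not the attached patch and it uses no real flex, bison or
Parrot declarations: interp_t, scanner_state_t, parse_context_t and lex_one
are stand-ins invented for the example. It only shows how a single extra
lexer argument can carry both the interpreter and the scanner handle by
pointing at a struct that holds both.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real Parrot and flex types are not shown. */
typedef struct interp {
    unsigned tokens_seen;          /* per-interpreter bookkeeping */
} interp_t;

typedef struct scanner_state {
    const char *cursor;            /* per-scanner input position */
} scanner_state_t;

/* One object carrying both pieces of state, so a lexer interface with
 * room for only one extra argument can still reach each of them. */
typedef struct parse_context {
    interp_t        *interp;
    scanner_state_t *scanner;
} parse_context_t;

/* Toy lexer: each non-space character is one token.
 * Returns 1 and stores the character in *value, or 0 at end of input. */
int lex_one(int *value, void *param) {
    parse_context_t *ctx = param;
    assert(value != NULL && ctx != NULL);
    while (*ctx->scanner->cursor == ' ')
        ctx->scanner->cursor++;
    if (*ctx->scanner->cursor == '\0')
        return 0;
    *value = (unsigned char)*ctx->scanner->cursor++;
    ctx->interp->tokens_seen++;
    return 1;
}
```

Because nothing here is global, two parse_context_t values can be lexed
alternately without interfering, which is the property a reentrant scanner
needs from the parser side as well.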

As I'm writing this, I noticed that Allison has ruled that we go with
PIR/PGE and eventually C-based libpge instead -- since a lexer refactoring
that doesn't affect the IMCC API will somehow throw important projects on
Parrot into a "dead stall", and thread safety for PIR compilation is not a
1.0 goal anyway -- I'll abandon working on this, and focus on helping get a
C-based libpge started instead. :-)

Thanks,
Audrey


Allison Randal

Jul 18, 2006, 1:21:17 AM
to Audrey Tang, Perl 6 Internals
Audrey Tang wrote:
>
> As I'm writing this, I noticed that Allison has ruled that we go with
> PIR/PGE and eventually C-based libpge instead
> -- since a lexer refactoring that doesn't affect the IMCC API will
> somehow throw important projects on Parrot into a
> "dead stall", and thread safety for PIR compilation is not a 1.0 goal
> anyway -- I'll abandon working on this, and
> focus on helping getting a C-based libpge started instead. :-)

LOL :) Audrey, I love you dear, but you always have an interesting way
of interpreting what I say. :)

Yes, I'm not willing to start a 6+ month project to gut IMCC. The cost
is too great and the benefit isn't great enough.

If you have a way to make IMCC reentrant that involves upgrading to a
more recent version of flex and passing one additional parameter, go for
it! Send us a patch and if it passes all the tests, we'll apply it.


It's still true that:

- We need an OST-to-bytecode compiler for the compiler tools. (I suspect
it will solve some of your problems too, as you'll no longer need to
embed Parrot to generate Parrot bytecode. You'll be able to generate it
from a C library instead and just run the bytecode on Parrot.)

- A PIR parser written in PGE is a good idea (and will be dead simple
anyway, as PIR is a simple language).

- A version of PGE written in C is a good idea, because it will spread
Perl 6 regexes/grammars far and wide. (It will be difficult, because of
all the Parrot features that will have to be reimplemented in a
standalone PGE. But, it is possible.)

- If those things combine to produce a cleaner, more maintainable
alternative to IMCC, it's good for Parrot. If not, then the separate
components are still good for Parrot.


There's more than one way to do it,
Sometimes you should do both,
Allison

Audrey Tang

Jul 18, 2006, 1:54:45 AM
to Allison Randal, Perl 6 Internals

On 2006/7/18, at 1:21 AM, Allison Randal wrote:

> LOL :) Audrey, I love you dear, but you always have an interesting
> way of interpreting what I say. :)
>
> Yes, I'm not willing to start a 6+ month project to gut IMCC. The
> cost is too great and the benefit isn't great enough.

Indeed, and I'd like to apologize publicly for the snipping.

However, the re2c or Ragel-based scanner refactoring isn't different
from a "flex upgrade patch", as it (by definition) can't affect
IMCC's public API at all. An additional advantage is that they will
let us get rid of the flaky API situation with flex. In any case it
wouldn't take 6 months.

In vsoni's original words:

> a. Remove flex and implement re2c
> b. Remove static and global variables

And you answered:

> The cost/benefit balance on this solution is not good. A lot of
> people are depending on IMCC now, and a refactor of that magnitude
> will throw several important projects on Parrot into a dead stall.
>
> So, my answer is: No.

It will involve overhauls, but again, the public interface -- at
bison level and above -- cannot break. So the "dead stall" ruling --
effectively dismissing re2c and other scanner alternatives instantly
-- strikes me as extremely surprising.

> If you have a way to make IMCC reentrant that involves upgrading to
> a more recent version of flex and passing one additional parameter,
> go for it! Send us a patch and if it passes all the tests, we'll
> apply it.

As flex 2.5.30+ is not API compatible with the flex IMCC is currently
using, I wonder how it is different from re2c or Ragel, particularly
since shoehorning an additional YYLEX parameter to make it work with
bison will also involve overhauls beyond the original bison interface.

I guess my question is: If I send two patches, of equal size, one
uses re2c and is much cleaner and faster; another uses a kluged-up
flex with its new, backward-incompatible reentrant API, would you
reject one and apply the other? If you are willing to let
alternative scanners go in, I'd much rather work on that instead
of trying to work around the bison/flex interface.

> - A version of PGE written in C is a good idea, because it will
> spread Perl 6 regexes/grammars far and wide. (It will be difficult,
> because of all the Parrot features that will have to be
> reimplemented in a standalone PGE. But, it is possible.)

Well, as discussed in #parrot, an offline parser (i.e. one that does
not permit changes to the grammar during parsing) with rule syntax
can be much more easily generated as a C-emitter backend from either
PIR/PGE or Perl5/PCR. I'm looking into it with vsoni right now.

Audrey



Audrey Tang

Jul 18, 2006, 3:24:32 AM
to Allison Randal, Perl 6 Internals

On 2006/7/18, at 1:54 AM, Audrey Tang wrote:

>> If you have a way to make IMCC reentrant that involves upgrading
>> to a more recent version of flex and passing one additional
>> parameter, go for it! Send us a patch and if it passes all the
>> tests, we'll apply it.
>
> As flex 2.5.30+ is not API compatible with the current flex IMCC is
> using, I wonder how it is different from re2c or Ragel, in
> particular that shoehorning an additional YYLEX parameter to make
> it work with bison will also involve overhauls beyond the original
> bison interface.
>
> I guess my question is: If I send two patches, of equal size, one
> uses re2c and is much cleaner and faster; another uses a kluged-up
> flex with its new, backward-incompatible reentrant API, would you
> reject one and apply the other? If you are willing to let
> alternative scanners go in, I'd much rather working on that instead
> of trying to work around the bison/flex interface.

Code is easier for me to write than English. Hence:

09:22 <@audreyt> imcc scanner is now reentrant.
09:22 <@audreyt> I think it wouldn't take more than another hour to
get it based on re2c
09:22 <@audreyt> but I'm willing to take what is felt more comfortable.

:-)

Audrey


Audrey Tang

Jul 18, 2006, 3:38:36 AM
to Vishal Soni, Perl 6 Internals
Vishal,

On 2006/7/16, at 11:57 PM, Vishal Soni wrote:

> a. Remove flex and implement re2c
> b. Remove static and global variables

Now that the flex part is done, are you still willing to help
remove the remaining static/global state?

> Apart from this we also need to refactor the code to get rid of
> arrays to a
> hash table implementation for macros.

This would rock, too.

Thanks,
Audrey


Allison Randal

Jul 18, 2006, 4:15:22 AM
to Audrey Tang, Perl 6 Internals
Audrey Tang wrote:
>
> Indeed, and I'd like to apologize publicly for the snipping.

Accepted and forgiven.

> However, the re2c or Ragel-based scanner refactoring isn't different
> from a "flex upgrade patch", as it (by definition) can't affect IMCC's
> public API at all. An additional advantage is that they will let us rid
> of the flaky API situation with flex. In any case it wouldn't take 6
> months.
>
> In vsoni's original words:
>
>> a. Remove flex and implement re2c
>> b. Remove static and global variables

The full quote in context is:

----
Since flex is not generating reentrant code, this option will get rid of
flex altogether and replace it with re2c. This would require significant
reworking on the code. So the plan of action would be as follows:

a. Remove flex and implement re2c
b. Remove static and global variables

Apart from this we also need to refactor the code to get rid of arrays to a
hash table implementation for macros.

All in all this would be over hauling lot of code.

----

> And you answered:
>
>> The cost/benefit balance on this solution is not good. A lot of people
>> are depending on IMCC now, and a refactor of that magnitude will throw
>> several important projects on Parrot into a dead stall.

Yup. Always take the estimate of the developer and multiply it by at
least 3. If the developer thinks it will require "significant
reworking", it's likely to be a massive overhaul.

> It will involve overhauls, but again, the public interface -- at bison
> level and above -- cannot break. So the "dead stall" ruling --
> effectively dismissing re2c and other scanner alternatives instantly --
> strikes me as extremely surprising.

It's not the definition of the interface I'm concerned about, it's the
behavior behind the interface. Can you guarantee that you can substitute
re2c for flex without changing any behavior of IMCC? If you say "Yes",
I'll still be suspicious the answer will turn out to be "No".

I'm also not convinced that re2c is a significant improvement over flex.
I'd rather spend that developer time on things that are significant
improvements.

I am convinced that we need to avoid yanking working systems out from
under developers whenever possible.

Allison
