Part 11 Parser rewrite

Christopher Horler

Jun 2, 2017, 12:50:15 PM
to STEPcode - Developers Mailing List
Due to issues I encountered while working on my complete rewrite of exp2py (I have thousands of lines of unshared code because of this...), I decided to rewrite the EXPRESS parser to fix the underlying cause of the difficulties I've been having.

You can grab the code here for the moment:
https://github.com/cshorler/stepcode/tree/parser_rewrite

I've tested all schemas under data/*/*.exp.
While developing I also tested the unitary schemas and a few additional ones I have for exp2py.

If you want to test other schemas, you can build the utility in src/expscan. It isn't integrated into the stepcode build process, but the usual way of building a cmake project applies.
(NOTE: this doesn't give you any code; it only checks whether you get any syntax or lexing errors.)

Further details:

The new parser is in two parts - pass 1 and pass 2.

Pass 1 - retrieves all identifiers from the EXPRESS file and creates a singly linked list of POD structures holding the names.
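
To make the pass 1 idea concrete, here is a minimal C sketch of collecting identifiers into a singly linked list of plain structs. It is not the code from the branch: the names name_node, name_list, name_list_append, collect_identifiers and name_list_free are all hypothetical, and the scanner is deliberately naive. It treats every letter { letter | digit | '_' } run as an identifier, so keywords are collected too, and comments, string literals and numeric literals are not handled.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical POD node: one identifier name plus a link to the next node. */
struct name_node {
    char *name;
    struct name_node *next;
};

/* Hypothetical list head; the tail pointer keeps appends O(1). */
struct name_list {
    struct name_node *head;
    struct name_node *tail;
    size_t count;
};

/* Append a copy of text[0..len) to the list.
 * Returns 0 on success, -1 if an allocation fails. */
int name_list_append(struct name_list *list, const char *text, size_t len)
{
    struct name_node *node;

    assert(list != NULL && text != NULL);
    node = malloc(sizeof *node);
    if (node == NULL)
        return -1;
    node->name = malloc(len + 1);
    if (node->name == NULL) {
        free(node);
        return -1;
    }
    memcpy(node->name, text, len);
    node->name[len] = '\0';
    node->next = NULL;
    if (list->tail != NULL)
        list->tail->next = node;
    else
        list->head = node;
    list->tail = node;
    list->count++;
    return 0;
}

/* Naive scan (assumption, see lead-in): every run that starts with a
 * letter and continues with letters, digits or '_' is recorded. */
int collect_identifiers(const char *src, struct name_list *list)
{
    const char *p = src;

    while (*p != '\0') {
        if (isalpha((unsigned char)*p)) {
            const char *start = p;
            while (isalnum((unsigned char)*p) || *p == '_')
                p++;
            if (name_list_append(list, start, (size_t)(p - start)) != 0)
                return -1;
        } else {
            p++;
        }
    }
    return 0;
}

/* Release every node and reset the list to empty. */
void name_list_free(struct name_list *list)
{
    struct name_node *node = list->head;

    while (node != NULL) {
        struct name_node *next = node->next;
        free(node->name);
        free(node);
        node = next;
    }
    list->head = NULL;
    list->tail = NULL;
    list->count = 0;
}
```

A caller would zero-initialise a struct name_list, pass the schema text to collect_identifiers, walk the list from head via next, and finish with name_list_free.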

To test it I've also created a utility called expscan, similar to the schemaScanner that the build process uses to configure exp2cxx.

I switched back to Bison and Flex because I wanted the location tracking support, and there may be a need for a reentrant parser for use / reference resolution.  Plus, there's been a lot of work and licence / usage guidance since Lemon was introduced.  I'm also trying out argparse, which you'll find under auxlibs, where I will also put bstrlib when I'm ready.
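
For readers unfamiliar with location tracking, the following C sketch shows the kind of bookkeeping involved. It is an illustration only and does not use Bison or Flex: the struct mirrors the four fields of Bison's default YYLTYPE (first_line, first_column, last_line, last_column), but the struct name src_loc and the functions loc_init and loc_advance are hypothetical, and the 1-based line and column convention is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Same four fields as Bison's default YYLTYPE; the struct name is hypothetical. */
struct src_loc {
    int first_line;
    int first_column;
    int last_line;
    int last_column;
};

/* Start at line 1, column 1 (assumed 1-based convention). */
void loc_init(struct src_loc *loc)
{
    assert(loc != NULL);
    loc->first_line = 1;
    loc->first_column = 1;
    loc->last_line = 1;
    loc->last_column = 1;
}

/* Advance over one token's text: the previous end becomes the new start,
 * then the end moves past each character, resetting the column on newline. */
void loc_advance(struct src_loc *loc, const char *text, size_t len)
{
    size_t i;

    assert(loc != NULL && text != NULL);
    loc->first_line = loc->last_line;
    loc->first_column = loc->last_column;
    for (i = 0; i < len; i++) {
        if (text[i] == '\n') {
            loc->last_line++;
            loc->last_column = 1;
        } else {
            loc->last_column++;
        }
    }
}
```

A lexer would call loc_advance once per matched token, so that each token (and, by merging, each grammar rule) carries the span to report in an error message.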

Pass 2 - still working on the lexer and grammar.

(As an aside, disabling the build of exp2cxx doesn't appear to disable the build of schemaScanner.cc. This is annoying if you are making changes to the parser, because mistakes break the cmake configuration step. If anyone cares to fix this, it would be appreciated.)

Clifford Yapp

Jun 2, 2017, 11:03:59 PM
to scl...@googlegroups.com
On Fri, Jun 2, 2017 at 12:50 PM, 'Christopher Horler' via STEPcode -
Developers Mailing List <scl...@googlegroups.com> wrote:

> The new parser is in two parts - pass 1 and pass 2.
>
> Pass 1 - retrieves all identifiers from the express file and creates a
> single linked list of POD structure of names.
>
> In order to test it I've also created a utility similar to the build process
> schemaScanner used to configure exp2cxx, called expscan
>
> I switched back to Bison and Flex, because I wanted the location tracking
> support and there may be need for a reentrant parser for use / reference
> resolution. Plus, there's been a lot of work and licence / usage guidance
> since Lemon was introduced.

Can you elaborate on this a little bit? The rewrite to use re2c/lemon
was very specifically undertaken for portability reasons... also, what
licensing issues are you seeing? My understanding was/is that re2c
and lemon are both effectively in the public domain...

> I'm also making a test of argparse, you'll find
> that under auxlibs... where I will also put bstrlib when I'm ready.

> Pass 2 - still working on the lexer and grammar.
>
> (as an aside, disabling the build of exp2cxx doesn't appear to disable the
> build of schemaScanner.cc, which is annoying if you are making changes to
> the parser - because mistakes break the cmake configuration step, if anyone
> cares to fix this it would be appreciated).

Hmm... the original build logic didn't contemplate disabling exp2cxx
IIRC, which is probably why the rest of the logic assumes it.

Christopher Horler

Jun 3, 2017, 12:25:54 PM
to STEPcode - Developers Mailing List


On Saturday, 3 June 2017 04:03:59 UTC+1, C Y wrote:
> On Fri, Jun 2, 2017 at 12:50 PM, 'Christopher Horler' via STEPcode -
> Developers Mailing List <scl...@googlegroups.com> wrote:
>
>> The new parser is in two parts - pass 1 and pass 2.
>>
>> Pass 1 - retrieves all identifiers from the express file and creates a
>> single linked list of POD structure of names.
>>
>> In order to test it I've also created a utility similar to the build process
>> schemaScanner used to configure exp2cxx, called expscan
>>
>> I switched back to Bison and Flex, because I wanted the location tracking
>> support and there may be need for a reentrant parser for use / reference
>> resolution.  Plus, there's been a lot of work and licence / usage guidance
>> since Lemon was introduced.
>
> Can you elaborate on this a little bit?  The rewrite to use re2c/lemon
> was very specifically undertaken for portability reasons... also, what
> licensing issues are you seeing?  My understanding was/is that re2c
> and lemon are both effectively in the public domain...

Sorry, a little confusion there - I wasn't referring to lemon/re2c, I was referring to the code produced by Flex / Bison.
What portability issues? I'm compiling now in C99 mode without issue. I haven't had the opportunity to test with MSVC yet, but I will.
 

Clifford Yapp

Jun 3, 2017, 3:00:01 PM
to scl...@googlegroups.com
On Sat, Jun 3, 2017 at 12:25 PM, 'Christopher Horler' via STEPcode -
Developers Mailing List <scl...@googlegroups.com> wrote:
>
>> Can you elaborate on this a little bit? The rewrite to use re2c/lemon
>> was very specifically undertaken for portability reasons... also, what
>> licensing issues are you seeing? My understanding was/is that re2c
>> and lemon are both effectively in the public domain...
>
> sorry, a little confusion there - I wasn't referring to lemon/re2c, I was
> referring to the code produced by Flex / Bison.
> What portability issues? I'm compiling now in C99 mode without issue, I
> haven't had the opportunity to test with MSVC yet - but I will.

That's what I mean - stepcode replaced flex/bison with
perplex/re2c/lemon some time ago. The main repository shouldn't have
any remaining requirement for flex/bison, unless we missed something.

The issue is not with the code generated by flex/bison itself, it's
that those tools do not run natively on Windows without something like
cygwin/mingw. Under normal circumstances it's always better to avoid
checking in and relying on any form of automatically generated source
code in any repository - as soon as you do that, any build fixes to
the generated code divorce those sources from the actual source files
(the scanner/lexer inputs). It is occasionally necessary for
bootstrapping - the BRL-CAD build of re2c, for example, uses generated
bootstrapping sources to get an initial re2c executable working - but
it is something that should be kept to a minimum.

Christopher Horler

Jun 3, 2017, 6:16:11 PM
to STEPcode - Developers Mailing List

This seems to be an active project that provides a native Windows build of Flex/Bison:
https://sourceforge.net/projects/winflexbison/

I would never fix the generated sources - not my plan.
 

Mark

Aug 13, 2017, 8:12:54 PM
to STEPcode - Developers Mailing List
I'm rather late to the party, but there's a perplex/lemon/re2c repo on github that should work quite easily with stepcode's build system - https://github.com/stepcode/baffledCitrus
Once it's compiled, I'm pretty sure there's a single cmake variable to set in stepcode - then, cmake will use those executables whenever it's necessary to sync the generated code with the .y file.

Regards
Mark

--
You received this message because you are subscribed to the Google Groups "STEPcode - Developers Mailing List" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scl-dev+u...@googlegroups.com.
To post to this group, send email to scl...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/scl-dev/6e4a5022-dff4-4d6f-a880-b5c04a0def0f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Mark

Aug 13, 2017, 8:36:57 PM
to STEPcode - Developers Mailing List
I meant to mention... a parser rewrite would do two additional things that are of benefit:

a) make the parser more precisely match the spec (bnf can be found at doc/iso-10303-11--2004.bnf)
b) (probably) make American Fuzzy Lop work harder to find crashes

I ran afl a few months ago and it didn't take long for it to find >130 files that cause check-express to barf.

Regards
Mark

Chris

Aug 25, 2017, 11:14:18 AM
to scl...@googlegroups.com
I'm planning a focused effort on this next week. I'm now on my 3rd or 4th rewrite, after changing my opinion about how to do it three times... after running into three roadblocks. It's about my 20th or 30th parser implementation, but by far the most complex one to handle gracefully... which is what I'm trying to achieve.

And, yes - it should conform to spec / bnf.

Also, my rewrite of exp2py, which tries to output a much richer object model, also barfs at the moment. For me this is the logical and necessary course of action. I've only really found issues in the parser - nothing really bothers me about libexpress.

Mark

Aug 27, 2017, 10:13:06 PM
to scl...@googlegroups.com
Is this rewrite using perplex/lemon/re2c, or flex/bison? I hope the former, as BRL-CAD put quite a bit of effort into rewriting the parser to get away from flex/bison.

When you get to the stage of debugging, gdb pretty printers may be helpful. I have a few in a commit in the pretty_printers branch, https://github.com/stepcode/stepcode/commit/066d4d7809813de90d6dddc284fea505446d6418

Seems like 90% of the effort with pretty printers is getting something trivial to work, so hopefully these will save effort. Your ~/.gdbinit will need a line like add-auto-load-safe-path /path/to/stepcode

Regards
Mark


Chris

Aug 28, 2017, 5:00:52 AM
to scl...@googlegroups.com
Mark,

I looked at Perplex and Lemon - please see the previous email in this thread on the subject (2nd / 3rd June)

https://groups.google.com/forum/m/#!topic/scl-dev/KONzCpiEO7o

Chris

Sean

Aug 28, 2017, 12:41:49 PM
to STEPcode - Developers Mailing List

On Monday, August 28, 2017 at 5:00:52 AM UTC-4, Christopher Horler wrote:
> On 28 August 2017 03:12:55 BST, Mark wrote:
>> Is this rewrite using perplex/lemon/re2c, or flex/bison? I hope the former, as BRL-CAD put quite a bit of effort into rewriting the parser to get away from flex/bison.
>>
>> When you get to the stage of debugging, gdb pretty printers may be helpful. I have a few in a commit in the pretty_printers branch, https://github.com/stepcode/stepcode/commit/066d4d7809813de90d6dddc284fea505446d6418
>>
>> Seems like 90% of the effort with pretty printers is getting something trivial to work, so hopefully these will save effort. Your ~/.gdbinit will need a line like add-auto-load-safe-path /path/to/stepcode
>>
>> Regards
>> Mark
>>
>> On Fri, Aug 25, 2017 at 11:14 AM 'Chris' via STEPcode - Developers Mailing List <scl...@googlegroups.com> wrote:
>>> On 14 August 2017 01:36:46 BST, Mark wrote:
>>>> I meant to mention... a parser rewrite would do two additional things that are of benefit:
>>>>
>>>> a) make the parser more precisely match the spec (bnf can be found at doc/iso-10303-11--2004.bnf)
>>>> b) (probably) make American Fuzzy Lop work harder to find crashes
>>>>
>>>> I ran afl a few months ago and it didn't take long for it to find >130 files that cause check-express to barf.
>>>>
>>>> Regards
>>>> Mark
>>>
>>> I'm planning a focused effort on this next week, I'm now on my 3rd or 4th rewrite after changing my opinion about how to do it three times... after running into three roadblocks. It's about my 20th or 30th parser implementation, but by far the most complex to have a graceful approach... Which I'm trying to achieve.
>>>
>>> And, yes - it should conform to spec / bnf.
>>>
>>> Also, my rewrite of exp2py which tries to output a much richer object model... Also barfs at the moment. For me this is the logical and necessary cause of action. I've only really found issues in the parser - nothing really bothers me about libexpress.
>
> Mark,
>
> I looked at Perplex and Lemon - please see the previous email in this thread on the subject (2nd / 3rd June)
>
> https://groups.google.com/forum/m/#!topic/scl-dev/KONzCpiEO7o

As mentioned, a LOT of effort went into changing STEPcode to Perplex+Lemon for *portability* reasons above all else because that was the (sole?) failing of flex+bison.  You noted a possibility in the previous thread ( https://github.com/AaronNGray/winflexbison ) but it didn't sound like you've actually tried it.  On quick inspection, it would likely complicate things over the current setup because they're not using CMake for the build system.  That means integration would be manual.  It's currently fully automated.

Is this something that you can look into?  Someone would need to at least confirm that it works, figure out how complicated it is to integrate, what build system changes would be needed, etc.  Switching the parser to a complete rewrite is rather risky and has broad implications.

Cheers!
Sean

Chris

Aug 29, 2017, 4:10:29 AM
to scl...@googlegroups.com
Hi Sean,

Yes, there are vcproj / msbuild files for flex, M4 and bison. I reviewed each build file and saw no major issues. No, I didn't build it - but I have used msbuild frequently.

Before doing anything on Windows, I will complete the grammar of pass 1 and pass 2 parsers and lexers and then re-evaluate the gap of Perplex and Lemon. At this point none of the parser actions are tied to libexpress anyway.

Chris

Christopher Horler

Apr 15, 2018, 4:09:07 PM
to STEPcode - Developers Mailing List
I had a change of opinion on all of this -

I rewrote again in flex and bison, then I encountered a problem with flex!  After working around it, I came to appreciate the earlier discussion here a great deal more!!

I then decided to look at re2c again. I implemented the missing features and rewrote again in re2c and lemon.  The re2c author was also very helpful with a few issues I was facing.

A later version of re2c is required (as there are bugs and missing features in earlier versions).  I haven't tested all versions of lemon yet - initially I thought it would be good to upgrade, but I ran into problems there that required the "-c" switch to make it work correctly.

https://github.com/stepcode/stepcode/tree/part11_parser_lexer_rewrite

I'm still tinkering with it - so it doesn't really do anything other than read a file, lex it and parse it.  Further details are in PARSER_REWRITE.txt in that branch's express directory.