Re: c2ast.pl on Perl source?

Durand Jean-Damien

Dec 24, 2013, 10:08:46 AM
to marpa-...@googlegroups.com
Jeffrey,

Nice idea, I'll do so. My guess is that posting to blogs.perl.org could reach a better, and perhaps more appreciative, audience than posting directly to p5p or perlbug (?).

Thanks / JD.

2013/12/24 Jeffrey Kegler <jeffre...@jeffreykegler.com>
> [ Off-line from the group ] An exercise which might help the Perl community (and in the process bring attention to c2ast), would be to run c2ast.pl --check reservedNames on the Perl source, and submit the results to perl5-porters (or perlbug?).
>
> [...] Cleaning up the namespace will be hard -- the Perl source intrudes on the reserved namespace heavily.  And many people may not realize the reason to keep the namespace clean -- it'll seem like a lot of work to deal with something that is not an issue.
>
> I'm emailing you directly off-line because you're the obvious first choice to do this.  If you like the idea, reply back into the main list.  Otherwise, I may throw this open to the list as a "Target of Opportunity".
>
> -- jeffrey

Jeffrey Kegler

Dec 24, 2013, 1:35:14 PM
to marpa-...@googlegroups.com
A blog post sounds good.  More people should know about this issue -- even if you don't find fixing legacy code to be worth the bother, it *is* good to know enough not to write new code with reserved names.  And p5p, etc., will then be free to give the current issues in the Perl source whatever priority they see as appropriate.

As context, the C standards reserve certain names to the "implementation", which means the compiler implementation, including the C libraries.  Your own applications and libraries are not allowed to use reserved names.  These names were reserved back before namespace issues were well understood.  In many cases they are unnecessarily overbroad and can be called mistakes, but they are mistakes that we are stuck with.  I personally find the bans on E[A-Z0-9]*, is[a-z]* and to[a-z]* all to be real nuisances.  If you have a variable named "token", you are using reserved namespace, and an implementation upgrade could cause unspecified behavior as a result.  Or how about the "stream_state" variable?  Banned -- str[a-z]* is reserved for new string functions. The GNU docs summarize them nicely here.
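
To make the patterns above concrete, here is a minimal sketch in Python that flags identifiers falling into the name families mentioned in this post. It is only an illustration: the names `RESERVED_PATTERNS` and `reserved_reasons` are made up for this example, the regular expressions are a paraphrase of the rules quoted above, and none of this is how c2ast.pl itself performs the check.

```python
import re

# Patterns paraphrased from the post above. This is an illustrative
# subset, not the rule set used by c2ast.pl.
RESERVED_PATTERNS = [
    (re.compile(r"^E[0-9A-Z]"), "E + digit/uppercase: error code names"),
    (re.compile(r"^(is|to)[a-z]"), "is/to + lowercase: character functions"),
    (re.compile(r"^str[a-z]"), "str + lowercase: string functions"),
]


def reserved_reasons(identifier):
    """Return the list of reasons (possibly empty) an identifier looks reserved."""
    return [why for pattern, why in RESERVED_PATTERNS if pattern.search(identifier)]


for name in ("token", "stream_state", "E2BIG", "is_ready", "count"):
    print(name, reserved_reasons(name) or "ok")
```

Under these assumed patterns, "token" and "stream_state" are flagged, while "is_ready" is not, because the character after "is" is an underscore rather than a lowercase letter.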

The full list is hard to memorize, and C programmers, even at the highest skill level, often ignore it.  Before Jean-Damien created one at my request, there was (as far as I know) no tool to detect violations.  I was aware of these issues, and believed that I was writing Marpa to be fully compliant, but c2ast.pl found many issues I'd missed.

So a blog post would be a real service.

-- jeffrey

--
You received this message because you are subscribed to the Google Groups "marpa parser" group.
To unsubscribe from this group and stop receiving emails from it, send an email to marpa-parser...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Durand Jean-Damien

Dec 24, 2013, 5:59:29 PM
to marpa-...@googlegroups.com
Damn, c2ast fails on the gcc __extension__ keyword ... I will look into it tomorrow -; There will probably be another release of MarpaX::Languages::C::AST once a fix is found.

 else if ((!__extension__ ({ size_t __s1_len, ...
--------------------------^
Uncaught exception from user code:
         at /usr/local/share/perl/5.18.1/MarpaX/Languages/C/AST.pm line 109.
        MarpaX::Languages::C::AST::Util::logCroak('%s\x{a}Last position:\x{a}\x{a}%s%s', 'Error in SLIF parse: No lexemes accepted at line 18736, colum...', 'line:column 18736:27 (Unicode newline count) 18736:27 (\n cou...', '') called at /usr/local/share/perl/5.18.1/MarpaX/Languages/C/AST.pm line 109
        MarpaX::Languages::C::AST::parse('MarpaX::Languages::C::AST=HASH(0xb392654)', 'SCALAR(0xa702f78)') called at /usr/local/bin/c2ast.pl line 126

Jeffrey Kegler

Dec 24, 2013, 7:08:22 PM
to marpa-...@googlegroups.com
As I've said before, I think this C parser is a remarkable new tool, something that has been needed for decades.  Getting out the last few nits is annoying, but each of those last few steps adds a surprising amount to the way the user experiences the tool.

By the way, when I was using it (and I encountered no bugs when using it to deal with the Marpa source), I did note some things which I thought might make the tool more "cuddly" from the user's point of view.

1.) An option (--stdin?) to accept the pre-processed C source on standard input.  This makes getting things right with the options easier.  You first get the pre-processor options right, separately.  Then you can eyeball the pre-processed C to see it's right.  Finally, you work out the (now far fewer) options to c2ast.  And with a --stdin option, you can do other things like capture the pre-processed C in a file, for debugging.

2.) More error messages on the c2ast.pl options.  In particular, I had a lot of problems with it "failing" silently when I got the options wrong.  c2ast.pl was not really failing -- it's just that I did not have the options right.  But I think the option processing in c2ast could be more helpful.

3.) Factor the checking of reserved names into a separate tool, perhaps c_reserved.pl.  This would make the options even simpler.  Also, I think a separate c_reserved.pl could then act as an example of the use of MarpaX::Languages::C::AST for study.

Thanks, jeffrey

Durand Jean-Damien

Dec 25, 2013, 1:13:58 AM
to marpa-...@googlegroups.com
Thanks for this feedback. All of these points have been added to my todo list, and I will report back as I make progress.

I have now fixed the grammar (the gcc __extension__ keyword was misplaced) and released a new version of MarpaX::Languages::C::AST, which successfully parses the perl-5.18.1 sources. That raises the confidence level in c2ast.pl quite a bit -;

Thanks / JD.

Ruslan Shvedov

Dec 25, 2013, 2:48:53 AM
to marpa-...@googlegroups.com
On Wed, Dec 25, 2013 at 8:13 AM, Durand Jean-Damien <jeandami...@gmail.com> wrote:
> Thanks for this feedback. All of these points have been added to my todo list, and I will report back as I make progress.
>
> I have now fixed the grammar (the gcc __extension__ keyword was misplaced) and released a new version of MarpaX::Languages::C::AST, which successfully parses the perl-5.18.1 sources. That raises the confidence level in c2ast.pl quite a bit -;

Great news; congratulations. 

I just wondered whether it would be useful for MarpaX::Languages::C::AST to have a dump() method that reproduces the source C text.

Such a reproduced file could then be shown

1.) to be the same as the original C (no textual diff, sans whitespace perhaps), and
2.) to do the same as the original C file when compiled with the same compiler (no binary diff, perhaps?).
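
The first check (no textual diff, ignoring whitespace) could look roughly like the following Python sketch. The function names are hypothetical, and a dump() method is only a proposal at this point in the thread, so the "regenerated" string here is typed by hand. Note that dropping all whitespace is deliberately crude: it would also hide differences inside string literals and treat `int x` and `intx` as equal.

```python
def strip_whitespace(text):
    """Remove all whitespace so that only non-blank characters are compared."""
    return "".join(text.split())


def same_modulo_whitespace(original, regenerated):
    """True if the two texts differ only in whitespace."""
    return strip_whitespace(original) == strip_whitespace(regenerated)


# Hand-written stand-ins for an original file and a regenerated one.
original = "int  main(void)\n{\n    return 0;\n}\n"
regenerated = "int main ( void ) { return 0 ; }"
print(same_modulo_whitespace(original, regenerated))
```

A stricter version would compare token streams instead of characters, which is closer to what the second check (same compiled output) is after.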

Naive as this is, I hope it is helpful.

Jeffrey Kegler

Dec 25, 2013, 3:27:06 AM
to marpa-...@googlegroups.com
The point may be a bit pedantic, but the "A" in AST stands for "abstract", which suggests that some information is lost, and that a round-trip from text to text via an AST would not be possible.   On the other hand, in current usage, "AST" tends to mean simply "syntax tree".

-- jeffrey

Ruslan Shvedov

Dec 25, 2013, 4:02:13 AM
to marpa-...@googlegroups.com
Both valid points; round-tripping just might look good from the outside as a test/demo case, even if more demo than test.

Ruslan Shvedov

Dec 25, 2013, 5:04:01 AM
to marpa-...@googlegroups.com
Now that ASF traversal is at hand, and given that c2ast.pl produces unambiguous parses (as it should, I think), ASFs effectively become ASTs. Could MarpaX::Languages::C::AST::dump() then be done by simply visiting each glade and appending glade->literal() as needed, with suitable whitespace?

Or is it not that easy?
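
As a rough picture of the idea, here is a Python sketch of a depth-first walk that collects leaf literals and joins them with single spaces. Everything in it is hypothetical: the `Node` class and the symbol names are invented for the example, and it does not use Marpa's ASF or glade API, which is a Perl interface.

```python
class Node:
    """Hypothetical parse-tree node; leaves carry the matched source text."""

    def __init__(self, symbol, literal="", children=None):
        self.symbol = symbol
        self.literal = literal
        self.children = children or []


def leaf_literals(node):
    """Collect the literals of all leaves, left to right."""
    if not node.children:
        return [node.literal]
    collected = []
    for child in node.children:
        collected.extend(leaf_literals(child))
    return collected


def dump(node):
    """Rebuild source text by joining leaf literals with single spaces."""
    return " ".join(leaf_literals(node))


# A hand-built tree standing in for the parse of "int x = 1;".
tree = Node("declaration", children=[
    Node("typeSpecifier", "int"),
    Node("initDeclarator", children=[
        Node("IDENTIFIER", "x"),
        Node("EQUAL", "="),
        Node("I_CONSTANT", "1"),
    ]),
    Node("SEMICOLON", ";"),
])
print(dump(tree))
```

This only works when every node has exactly one derivation, which is why the question of unambiguous parses matters here.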


Durand Jean-Damien

Dec 25, 2013, 7:35:24 AM
to marpa-...@googlegroups.com
It does, and it would croak if that were not the case.
FYI, the raw c2ast output on the whole perl-5.18.1 source tree is attached.
I am thinking of using a Google Docs spreadsheet to produce some sexy output -;
Thanks / JD.
perl-5.18.1.txt

Ruslan Shvedov

Dec 25, 2013, 7:59:50 AM
to marpa-...@googlegroups.com
On Wed, Dec 25, 2013 at 2:35 PM, Durand Jean-Damien <jeandami...@gmail.com> wrote:
> It does, and it would croak if that were not the case.

Great to hear that. A round-tripping test (sort of) looks easily doable then with ASF.

> FYI, the raw c2ast output on the whole perl-5.18.1 source tree is attached.
> I am thinking of using a Google Docs spreadsheet to produce some sexy output -;

1316 messages, no less. I couldn't resist the curiosity and converted it to tab-separated values (file, line, id, msg), attached in case you find it useful.

> Thanks / JD.

-- rns

> On Wednesday, December 25, 2013 11:04:01 UTC+1, rns wrote:
> > Now with ASF's traversing at hand, given that c2ast.pl produces unambiguous parses (as it should be, I think), ASF's become effectively AST's and

perl-5.18.1.tsv

Ruslan Shvedov

Dec 25, 2013, 8:25:54 AM
to marpa-...@googlegroups.com
On Wed, Dec 25, 2013 at 12:04 PM, Ruslan Shvedov <Ruslan....@gmail.com> wrote:
> Now that ASF traversal is at hand, and given that c2ast.pl produces unambiguous parses (as it should, I think), ASFs effectively become ASTs. Could MarpaX::Languages::C::AST::dump() then be done by simply visiting each glade and appending glade->literal() as needed, with suitable whitespace?
>
> Or is it not that easy?

I just remembered that it's easy only for monotonic applications, which c2ast.pl is presumably not.

Durand Jean-Damien

Dec 25, 2013, 11:15:30 AM
to marpa-...@googlegroups.com
Interestingly, the 1316 messages all fall into 8 categories, with the top 4 accounting for over 98% of them. I think I will blog raw statistics like that -;

Top 10 messages (number of hits)
--------------------------------
 466  The header file sys/stat.h reserves names prefixed with 'st_' and 'S_'
 465  The header file fcntl.h reserves names prefixed with 'l_', 'F_', 'O_', and 'S_'
 238  Names that begin with either 'is' or 'to' followed by a lowercase letter may be used for additional character testing and conversion functions.
 133  Names beginning with 'str', 'mem', or 'wcs' followed by a lowercase letter are reserved for additional string and array functions
   9  Names that end with '_t' are reserved for additional type names
   2  Names beginning with a capital 'E' followed by a digit or uppercase letter may be used for additional error code names
   2  The header file limits.h reserves names suffixed with '_MAX'
   1  The header file dirent.h reserves names prefixed with 'd_'

JD.
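
The figures in the table above can be checked with a few lines of arithmetic. This Python sketch only re-adds the hit counts as posted; the variable names are arbitrary.

```python
# Hit counts copied from the table above, largest first.
hits = [466, 465, 238, 133, 9, 2, 2, 1]

total = sum(hits)
top4 = sum(hits[:4])
share = 100.0 * top4 / total

print(f"{len(hits)} categories, {total} messages")
print(f"top 4: {top4} messages, {share:.1f}% of the total")
```

The counts sum to the 1316 messages reported earlier in the thread, and the four largest categories together cover 1302 of them.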

Durand Jean-Damien

Dec 25, 2013, 11:56:30 AM
to marpa-...@googlegroups.com