Why are the C preprocessor grammar and the C parser grammar separate? Can they be merged into one?


song...@gmail.com

Mar 15, 2017, 10:46:14 PM
to antlr-discussion
I am new to ANTLR and am trying to use it to create a standard C parser that accepts a .c file and creates the AST of that file.
I downloaded several C grammars from http://www.antlr3.org/grammar/list.html and found that the C preprocessor and the C parser are separate.
Why are they separate?
If I want a complete C parser, should I merge these two grammars?

Thank you!

Jim Idle

Mar 15, 2017, 11:14:29 PM
to antlr-di...@googlegroups.com
The C pre-processor has always been a separate phase of the compiler tool chain. It would be difficult to do the pre-processing as you go.

I would just use an existing pre-processor (so that you don't need to write one) and then parse its output. You will need to track source file names etc. so that any reporting/tracking can follow the original source files. The output from the pre-processor shows you how to do that: run it on its own against a .c file and you will see how the output is annotated.
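
As a rough illustration of that tracking, here is a minimal Python sketch. It assumes the GCC-style annotation, where a line of the form `# <line> "<file>" <flags>` says which file and line the following text came from. The function name `map_to_original` and the sample text are hypothetical, made up for this example, and not real preprocessor output.

```python
import re

# GCC-style linemarker: '# <line> "<file>" <optional numeric flags>'
LINEMARKER = re.compile(r'^#\s+(\d+)\s+"([^"]*)"((?:\s+\d+)*)\s*$')

def map_to_original(preprocessed_text):
    """Return (file, line, text) for every non-marker line of the input."""
    current_file, current_line = "<unknown>", 1
    mapped = []
    for raw in preprocessed_text.splitlines():
        m = LINEMARKER.match(raw)
        if m:
            # A marker describes where the NEXT line originally came from.
            current_line, current_file = int(m.group(1)), m.group(2)
            continue
        mapped.append((current_file, current_line, raw))
        current_line += 1
    return mapped

# Hypothetical, heavily shortened stand-in for preprocessor output.
sample = '''# 1 "hello.c"
# 1 "/usr/include/stdio.h" 1 3 4
typedef int FILE_PLACEHOLDER;
# 2 "hello.c" 2
int main(void){
printf("hello world");
return 0;
}
'''

for file_name, line_no, text in map_to_original(sample):
    print(f"{file_name}:{line_no}: {text}")
```

A parser that reads the preprocessed text could keep a table like this so that its error messages point at the original file and line rather than at the preprocessed output.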

Jim

--
You received this message because you are subscribed to the Google Groups "antlr-discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to antlr-discussion+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

song...@gmail.com

Mar 16, 2017, 4:33:44 AM
to antlr-discussion
Thanks for your answer.
So what I should do is: .c file -> C preprocessor -> output file -> C parser (with the grammar written for the output file) -> AST.

Another silly question, but it really confuses me. I have already downloaded the C preprocessor grammar and the C parser grammar, but it seems each can accept only part of the C code. For example, take the "hello world" code:
#include <stdio.h>
int main(void){
    printf("hello world");
    return 0;
}
The C preprocessor grammar accepts only #include <stdio.h>, and the C parser grammar accepts only the rest, that is, everything except the first line.
(The rest of the code is displayed in red in the ANTLRWorks "parse tree" window.)
I checked the C preprocessor grammar and it has no rule for the main C code. Is that how it is supposed to be?
Should I just ignore this, take the output file created by the C preprocessor, and use it as the input to the C parser? Will I then get what I want?



Steve Coleman

Mar 16, 2017, 9:02:47 AM
to antlr-di...@googlegroups.com
I'm new to ANTLR, but I can probably address this question about the C
compiler phases. Historically the compiler has been broken into phases
chained together. Originally this was because small memory sizes were
predominant, but the "preprocessor" is also exactly what its name says. It
takes the source file, performs all the #define and #include operations,
and determines which parts of the source file actually go to the
compiler proper.

This way the code can be rewritten on the fly based on the target
platform architecture and the compiler tool environment doing the
compiling. This allows for cross-compiling for different OSes and CPUs
that have different physical constraints on registers or memory
layout (e.g. endianness, word size, etc.). When you pass the -D and -I
command switches to the compiler you are asking for particular
operations/edits on that source file, and the output of that step then
gets compiled.

Basically the C preprocessor grammar has rules only for the '#' operations
(macro definitions, includes, conditional compilation). All other lines in
the source file are not parsed as C at that stage; they are merely passed
on, with any macros they use expanded.
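
To make that division of labour concrete, here is a deliberately tiny Python sketch. It is a toy under stated assumptions, not how a real C preprocessor works: it only understands object-like `#define NAME value`, silently drops every other directive (including `#include`, which a real preprocessor would replace with the header's contents), and it would wrongly expand macro names inside string literals. The name `toy_preprocess` is hypothetical.

```python
import re

IDENTIFIER = re.compile(r"[A-Za-z_]\w*")

def toy_preprocess(source):
    """Consume '#' lines, pass every other line on with macros expanded."""
    macros = {}
    output = []
    for line in source.splitlines():
        stripped = line.lstrip()
        if stripped.startswith("#"):
            # Directive line: this is all the "preprocessor grammar" handles.
            parts = stripped[1:].split(None, 2)
            if len(parts) >= 2 and parts[0] == "define":
                macros[parts[1]] = parts[2] if len(parts) == 3 else ""
            continue  # directives never reach the C parser
        # Ordinary line: not parsed as C here, only macro names are replaced.
        expanded = IDENTIFIER.sub(
            lambda m: macros.get(m.group(0), m.group(0)), line)
        output.append(expanded)
    return "\n".join(output)

source = '''#include <stdio.h>
#define GREETING "hello world"
int main(void){
printf(GREETING);
return 0;
}'''

print(toy_preprocess(source))
```

The text this prints contains no '#' lines at all, which is why a C parser grammar does not need rules for directives when it is fed preprocessed input.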

Hope that helps.

Steve



song...@gmail.com

Mar 16, 2017, 10:10:11 PM
to antlr-discussion
Thanks, Steve. I am trying to learn about compilers by building one with ANTLR, and I feel I had confused some concepts.
Based on your answer, I drew the process as follows; please correct me if I am wrong:

[Image: the poster's diagram of the process; not preserved]

What is the file format of the output file? A .i file?
If I use the -i option in gcc and then use that .i file
as the input to my C parser, has gcc
already done the preprocessing for me, so that I do not need my own C preprocessor?
Would I get the same result?



Steve Coleman

Mar 17, 2017, 12:43:26 PM
to antlr-di...@googlegroups.com
The gcc -E command flag will cause the preprocessor to perform all its
includes and macro expansions, and that output can then be fed into the
compiler proper. You then pass it as a preprocessed *.i file, or give
the gcc compiler the -fpreprocessed command flag, to continue from that
point in the compilation process. But isn't that the step your own ANTLR
C parser would be doing?
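
For anyone scripting this step, a minimal Python sketch of producing the .i file might look like the following. It assumes gcc is installed and on the PATH; the helper names `preprocess_command` and `run_preprocessor` and the file names are hypothetical. The flags used are -E (stop after preprocessing), -P (suppress the linemarker annotations), -D, -I and -o.

```python
import subprocess

def preprocess_command(c_file, out_file, defines=(), include_dirs=(),
                       linemarkers=True):
    """Build the argument list for running only the preprocessing stage."""
    cmd = ["gcc", "-E"]
    if not linemarkers:
        cmd.append("-P")  # drop the '# <line> "<file>"' annotations
    cmd += ["-D" + d for d in defines]
    cmd += ["-I" + d for d in include_dirs]
    cmd += [c_file, "-o", out_file]
    return cmd

def run_preprocessor(c_file, out_file, **options):
    """Run gcc -E; raises CalledProcessError if gcc reports a failure."""
    subprocess.run(preprocess_command(c_file, out_file, **options), check=True)
    return out_file

# Only prints the command; call run_preprocessor(...) to actually execute it.
print(" ".join(preprocess_command("hello.c", "hello.i", defines=["DEBUG=1"])))
```

The resulting hello.i would then be the input to the ANTLR-generated C parser. Keeping the linemarkers makes it possible to report positions in the original files, at the cost of having to skip or interpret those lines in the lexer.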

The png image you supplied indicates your "target" is actually the AST
graph. If so, that can supposedly be done by just running gcc like:

> gcc -fdump-tree-original-raw ./test.c

and then processing the output into whatever form you need, thus
skipping the parsing of the C code altogether. You can see an example of
this here:

http://digitocero.com/en/blog/exporting-and-visualizing-gccs-abstract-syntax-tree-ast

I'm going to be playing with this AST stuff for another project, but that
is unrelated to my interest in ANTLR.


Steve



Nandha Kumar

Feb 1, 2018, 4:56:21 AM
to antlr-discussion
this SO link might help.