Access parsed fields in Antlr for Cpp


Harsha Sharma

May 30, 2019, 11:44:14 PM
to antlr-discussion
Hello everyone,

I recently started working on a C++ project that uses ANTLR to parse input files.
I want to access all parsed fields in one place, so I want to use the standard visitor generated by ANTLR rather than build my own.
I would like to build a parse tree and traverse it to get my identifier values.
I tried this:
CompilerParser parser(&tokens);
CompilerParser::ProgramContext *nf = parser.program();
cout << nf->IDENT()->getText() << endl; // I can access one identifier like this.

But I cannot do the same for a rule like this:
expression : item #atom
                 | op expression #single
;

CompilerParser::ExpressionContext *expr = parser.expression(); // I cannot do this: parsing a second time causes a segfault.
CompilerParser::AtomContext atom(expr); // I thought this would construct an AtomContext so I could access its members, but it doesn't work.

Also, I can't find how to build a parse tree from the parser.
I searched Stack Overflow, the docs, and the source code, but couldn't find anything, so any help would be really appreciated.
Thank you so much for all your great work and time.

Regards,
Harsha Sharma

Mike Lischke

May 31, 2019, 4:23:22 AM
to antlr-discussion
Hi Harsha,


> I recently started working on a C++ project that uses ANTLR to parse input files.
> I want to access all parsed fields in one place.
> I would like to build a parse tree and traverse it to get my identifier values.

I'm not sure I understand correctly: traversing the tree means *not* accessing all fields in one place. You walk the tree and take the values from the individual subcontexts.

> I tried this:
> CompilerParser parser(&tokens);
> CompilerParser::ProgramContext *nf = parser.program();
> cout << nf->IDENT()->getText() << endl; // I can access one identifier like this.
>
> But I cannot do the same for a rule like this:
> expression : item #atom
>            | op expression #single
> ;
>
> CompilerParser::ExpressionContext *expr = parser.expression(); // I cannot do this: parsing a second time causes a segfault.

What do you mean by "parse twice"? Do you trigger two parse runs, and if so, why? It would also be interesting to see what causes the segfault.

> CompilerParser::AtomContext atom(expr); // I thought this would construct an AtomContext so I could access its members, but it doesn't work.

You are not supposed to create the contexts yourself. They are created while building the parse tree.


> Also, I can't find how to build a parse tree from the parser.

You don't build the parse tree yourself either. The parser creates it automatically (if enabled).


Harsha Sharma

Jun 2, 2019, 3:41:50 AM
to antlr-di...@googlegroups.com
Hi,
Thanks for your reply.

On Fri, May 31, 2019 at 4:23 PM 'Mike Lischke' via antlr-discussion <antlr-di...@googlegroups.com> wrote:

> What do you mean by "parse twice"? Do you trigger two parse runs, and if so, why? It would also be interesting to see what causes the segfault.

Yes, I was triggering two parse runs on the same parser. I understand now why that is not the correct way.

> > CompilerParser::AtomContext atom(expr); // I thought this would construct an AtomContext so I could access its members, but it doesn't work.
>
> You are not supposed to create the contexts yourself. They are created while building the parse tree.
>
> > Also, I can't find how to build a parse tree from the parser.
>
> You don't build the parse tree yourself either. The parser creates it automatically (if enabled).


OK, I understand this.
But I still have a question.
I have this input:
rule ALLOW = sip:192.168.120.1;

My grammar for parsing this rule is:

declare_entry :
                    type IDENT (LB granu=expression RB)? (ASSIGN value=expression)? SEMICOLON
;
expression : item #atom
                    | op expression #single
;
item: function_call  #func
         | flow_or_rule_entry  #rule_flow
         | constant
;

flow_or_rule_entry: fields COLON constant;
 
In my visitor file, inside the antlrcpp::Any visitDeclare_entry function, I can access the rule name through Declare_entryContext->IDENT()->getText(). But how do I access the field values of its subcontexts, e.g. the constant field of Flow_or_rule_entryContext?
I want to access the subcontext fields from the base context because, to add this symbol to my symbol table, I need its name as well as its constant value.
Earlier I tried to access all these subcontexts in my main file, but couldn't get it working, as I mentioned in my previous mail. Even if I access these fields in each subcontext's visitor function, I still need to access fields of a child subcontext from its parent subcontext. Is there a way to do so from the antlrcpp::Any object returned by the visitor functions?
If yes, how?

Thank you so much.

Regards,
Harsha Sharma

Mike Lischke

Jun 2, 2019, 5:32:42 AM
to antlr-discussion

> In my visitor file, inside the antlrcpp::Any visitDeclare_entry function, I can access the rule name through Declare_entryContext->IDENT()->getText(). But how do I access the field values of its subcontexts, e.g. the constant field of Flow_or_rule_entryContext?

The rule contexts for matched subrules are also part of this context. For instance, you can write Declare_entryContext->expression()->item()->flow_or_rule_entry()->getText() to access the text matched by this subrule. Don't forget to check that each subcontext exists, since they are optional.
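
As a rough illustration of that null-checked navigation, here is a minimal sketch. It does not use the ANTLR runtime: the context structs are hand-written stand-ins that only mimic the shape of generated code, and constantOf is a hypothetical helper name. Two assumptions are worth flagging. First, the `value=expression` label is assumed to be exposed as a public `value` member on the context. Second, because `expression` and `item` use labeled alternatives (#atom, #rule_flow), the accessors are, as far as I know, generated on the labeled subclasses (AtomContext, Rule_flowContext), so a dynamic_cast may be needed between the steps of the chain.

```cpp
#include <cassert>
#include <string>

// Hand-written stand-ins for the classes ANTLR would generate from the grammar
// in this thread. They are NOT real generated code; they only mimic its shape
// so the navigation pattern can be shown without the ANTLR runtime.
struct ConstantContext {
    std::string text;
    std::string getText() const { return text; }
};

struct Flow_or_rule_entryContext {
    ConstantContext *constantCtx = nullptr;
    ConstantContext *constant() { return constantCtx; }
};

// Base class of the `item` rule; one subclass per labeled alternative.
struct ItemContext { virtual ~ItemContext() = default; };
struct Rule_flowContext : ItemContext {          // alternative labeled #rule_flow
    Flow_or_rule_entryContext *entry = nullptr;
    Flow_or_rule_entryContext *flow_or_rule_entry() { return entry; }
};

// Base class of the `expression` rule; AtomContext models the #atom alternative.
struct ExpressionContext { virtual ~ExpressionContext() = default; };
struct AtomContext : ExpressionContext {
    ItemContext *itemCtx = nullptr;
    ItemContext *item() { return itemCtx; }
};

struct Declare_entryContext {
    std::string ident;
    ExpressionContext *value = nullptr;  // assumed member for `value=expression`
};

// Hypothetical helper: returns the constant's text, or "" when any optional
// part of the chain is missing or a different alternative was matched.
std::string constantOf(Declare_entryContext *decl) {
    assert(decl != nullptr);
    if (decl->value == nullptr) return std::string();

    auto *atom = dynamic_cast<AtomContext *>(decl->value);
    if (atom == nullptr || atom->item() == nullptr) return std::string();

    auto *ruleFlow = dynamic_cast<Rule_flowContext *>(atom->item());
    if (ruleFlow == nullptr || ruleFlow->flow_or_rule_entry() == nullptr)
        return std::string();

    ConstantContext *c = ruleFlow->flow_or_rule_entry()->constant();
    return c != nullptr ? c->getText() : std::string();
}
```

With the real generated classes, the same checks would be applied to the contexts the parser returns from a single parse run; nothing is constructed by hand as it is in this mock-up.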

> I want to access the subcontext fields from the base context because, to add this symbol to my symbol table, I need its name as well as its constant value.

You can also do that by implementing the enterFlow_or_rule_entry() function and using the passed-in context to access its subfields. For an example of how to collect symbols, see the DetailsListener implementation in my ANTLR4 extension for Visual Studio Code: https://github.com/mike-lischke/vscode-antlr4/blob/master/src/backend/DetailsListener.ts. This is written in TypeScript, however. For a C++ example (though a more complex one), see how I retrieve MySQL symbols (tables, indexes, columns etc.) from MySQL code in MySQL Workbench: https://github.com/mysql/mysql-workbench/blob/8.0/modules/db.mysql.parser/src/ObjectListeners.cpp. Note that I use listeners, not visitors.
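
The listener idea can be sketched without the ANTLR runtime as follows. Everything here is a simplified stand-in: Ctx, Listener, walk and SymbolListener are hypothetical names that imitate the generated enter/exit callbacks and the runtime's tree walker, and the contexts carry plain strings instead of real token accessors. The point is only the pattern: remember the declaration name on entering declare_entry, then record the constant when the nested flow_or_rule_entry is entered.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

struct Listener;

// Simplified stand-in for a rule context node (not the ANTLR runtime class).
struct Ctx {
    std::vector<Ctx *> children;
    virtual ~Ctx() = default;
    virtual void enterRule(Listener &) {}
    virtual void exitRule(Listener &) {}
};

struct Declare_entryContext;
struct Flow_or_rule_entryContext;

// Imitates a generated listener: one enter/exit callback per rule.
struct Listener {
    virtual ~Listener() = default;
    virtual void enterDeclare_entry(Declare_entryContext *) {}
    virtual void exitDeclare_entry(Declare_entryContext *) {}
    virtual void enterFlow_or_rule_entry(Flow_or_rule_entryContext *) {}
};

struct Declare_entryContext : Ctx {
    std::string ident;  // stands in for IDENT()->getText()
    void enterRule(Listener &l) override { l.enterDeclare_entry(this); }
    void exitRule(Listener &l) override { l.exitDeclare_entry(this); }
};

struct Flow_or_rule_entryContext : Ctx {
    std::string fields;    // stands in for fields()->getText()
    std::string constant;  // stands in for constant()->getText()
    void enterRule(Listener &l) override { l.enterFlow_or_rule_entry(this); }
};

// Depth-first walk: enter callback, then children, then exit callback.
void walk(Listener &l, Ctx &node) {
    node.enterRule(l);
    for (Ctx *child : node.children) walk(l, *child);
    node.exitRule(l);
}

// Collects name -> constant pairs into a symbol table while walking.
struct SymbolListener : Listener {
    std::map<std::string, std::string> symbols;
    std::string currentName;  // non-empty only while inside a declare_entry

    void enterDeclare_entry(Declare_entryContext *ctx) override {
        assert(ctx != nullptr);
        currentName = ctx->ident;
    }
    void exitDeclare_entry(Declare_entryContext *) override {
        currentName.clear();
    }
    void enterFlow_or_rule_entry(Flow_or_rule_entryContext *ctx) override {
        assert(ctx != nullptr);
        if (!currentName.empty()) symbols[currentName] = ctx->constant;
    }
};
```

Keeping the current declaration name as listener state is one simple way to connect a parent rule with a deeply nested child without passing return values back up the tree.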

> Earlier I tried to access all these subcontexts in my main file, but couldn't get it working, as I mentioned in my previous mail. Even if I access these fields in each subcontext's visitor function, I still need to access fields of a child subcontext from its parent subcontext. Is there a way to do so from the antlrcpp::Any object returned by the visitor functions?

The visitor result is something you have to construct. It's not built automatically. Also, a visitor is meant either to determine a single return value (say, the result of an expression evaluation) or to act on the given contexts in some way. IMO listeners are better suited to tasks like symbol processing.
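
To make "you have to construct the result" concrete, here is a small sketch under simplifying assumptions. The structs are stand-ins rather than generated classes, the expression and item levels between declare_entry and flow_or_rule_entry are left out, and the visit functions return plain C++ types instead of antlrcpp::Any. How a value is unpacked from antlrcpp::Any depends on the runtime version, so that part is deliberately omitted. SymbolVisitor and Symbol are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Stand-ins for generated contexts; the intermediate expression/item levels
// of the grammar are omitted to keep the sketch short.
struct Flow_or_rule_entryContext {
    std::string fields;
    std::string constant;
};

struct Declare_entryContext {
    std::string ident;
    Flow_or_rule_entryContext *value = nullptr;  // optional in the grammar
};

using Symbol = std::pair<std::string, std::string>;  // name, constant

struct SymbolVisitor {
    // Child visit: a value comes back only because this function returns one.
    std::string visitFlow_or_rule_entry(Flow_or_rule_entryContext *ctx) {
        assert(ctx != nullptr);
        return ctx->constant;
    }

    // Parent visit: explicitly visits the child and combines its result
    // with the parent's own data.
    Symbol visitDeclare_entry(Declare_entryContext *ctx) {
        assert(ctx != nullptr);
        std::string constant;
        if (ctx->value != nullptr) constant = visitFlow_or_rule_entry(ctx->value);
        return Symbol(ctx->ident, constant);
    }
};
```

The parent only sees what the child chose to return, which is why a visitor fits single-result computations well, while collecting many symbols is often simpler with a listener.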

