Help needed: stop parsing a section of a file and put it into an accessible buffer (string)


Stjepan Vukadin

Apr 20, 2018, 7:19:07 AM
to antlr-discussion

Hi guys and gals,

I need help solving this problem so I hope you can help :)

I would like to have a grammar whose structure is strictly defined, but part of the input should not be parsed by my grammar. Instead it should be put into some sort of buffer (string) for later use.

My grammar looks like this:

grammar RSL;

rsl: sectionStructs? sectionProgram;

sectionProgram: 'section' 'program' '{' '}';

sectionStructs: 'section' 'structs' '{' structDef+ '}';
sectionName: ID;

structDef: 'struct' ID '{' varDef+ '}' ';';

varDef: ID ID ';';

ID: [a-zA-Z_][a-zA-Z_\-0-9]*;

WS : [ \t\r\n\u000C]+ -> skip
;

COMMENT
: '/*' .*? '*/' -> skip
;

LINE_COMMENT
: '//' ~[\r\n]* -> skip
;


And my wish is to have this sort of parsing going on:

section structs {
    struct TestStruct {
        int var1;
        float var2;
        ...
    };
    
    struct Struct2 {
        int var1;
        ...
    };
}

section program {
    // Do not parse anything that would be in this section
    // just store it in a buffer for later use.
}


So the entire contents of section program should be stored in a string for later use, and no grammar rules should apply to them.

What is the best way of approaching this problem?

Thanks!

Mike Lischke

Apr 20, 2018, 8:51:13 AM
to azrdev via antlr-discussion
Hi,

I would like to have a grammar whose structure is strictly defined, but part of the input should not be parsed by my grammar. Instead it should be put into some sort of buffer (string) for later use.



That’s tricky and, to be honest, I believe it’s not possible. The reason is that you have no dedicated delimiters for that catch-all block. The braces around it are also used for normal blocks, so you can use neither lexer modes nor the more traditional approach used to define multi-line comments (a very similar problem, but there you have /* and */ as dedicated delimiters). I rewrote your grammar as a test:

grammar RSL;

rsl: sectionStructs? sectionProgram;

sectionProgram:
SECTION_SYMBOL PROGRAM_SYMBOL PROGRAM_CONTENT
;

sectionStructs:
SECTION_SYMBOL STRUCTS_SYMBOL OPEN_BRACE structDef+ CLOSE_BRACE
;
sectionName: ID;

structDef:
STRUCT_SYMBOL ID OPEN_BRACE varDef+ CLOSE_BRACE SEMICOLON
;

varDef: ID ID ';';

SECTION_SYMBOL: 'section';
PROGRAM_SYMBOL: 'program';
STRUCT_SYMBOL: 'struct';
STRUCTS_SYMBOL: 'structs';

OPEN_BRACE: '{';
CLOSE_BRACE: '}';
SEMICOLON: ';';

PROGRAM_CONTENT: OPEN_BRACE .*? CLOSE_BRACE;

ID: [a-zA-Z_][a-zA-Z_\-0-9]*;
WS: [ \t\r\n\u000C]+ -> channel(HIDDEN);
COMMENT: '/*' .*? '*/' -> skip;
LINE_COMMENT: '//' ~[\r\n]* -> skip;



And I get this parse tree:

[parse tree image not preserved]

For this input:

section structs {
    struct TestStruct {
        int var1;
        float var2;
    };

    struct Struct2 {
        int var1;
    };
}

section program {
    Anything you always wanted to know
    but never dared to ask.
}

As you can see, the catch-all block works. However, because the ANTLR4 lexer prefers the longest match, PROGRAM_CONTENT will match every incoming open brace/close brace block, not only the program one, and that breaks the other block rules (sectionStructs and structDef).

I’m afraid you will have to define a full grammar, also for your program content, right from the beginning.
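
The effect described above can be illustrated outside ANTLR. The following is only a rough analogy, assuming that a non-greedy regular expression in Python's re module behaves closely enough to the PROGRAM_CONTENT lexer rule for this input. It is not ANTLR itself, and the names SAMPLE, CATCH_ALL and catch_all_matches are made up for the illustration:

```python
import re

# The sample input from this thread.
SAMPLE = """section structs {
    struct TestStruct {
        int var1;
        float var2;
    };

    struct Struct2 {
        int var1;
    };
}

section program {
    Anything you always wanted to know
    but never dared to ask.
}
"""

# Rough stand-in for: PROGRAM_CONTENT: OPEN_BRACE .*? CLOSE_BRACE;
# DOTALL lets '.' cross line breaks, as '.' does in an ANTLR lexer rule.
CATCH_ALL = re.compile(r"\{.*?\}", re.DOTALL)


def catch_all_matches(text):
    """Return every span the catch-all pattern grabs, scanning left to right."""
    return [m.group(0) for m in CATCH_ALL.finditer(text)]


if __name__ == "__main__":
    for span in catch_all_matches(SAMPLE):
        print(repr(span))
```

The pattern does pick up the program block, but it also grabs the opening of the structs section and the Struct2 block, so no plain open brace is left for sectionStructs and structDef to match.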

Stjepan Vukadin

Apr 20, 2018, 10:09:54 AM
to antlr-discussion
Thanks for the advice and the rewrite, Mike!

I think I'll have to pre-parse the file: find the start of the program section, count opening and closing brackets until I have pulled the content out, and then let ANTLR process the rest of the file. The syntax check should still work while ANTLR is doing its thing, because brackets that are not matched properly will still show up as errors.

If you have any other advice it would be appreciated :)
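
The pre-parse idea above could be sketched roughly as follows. This is a minimal sketch in Python, not a tested solution: the function name extract_program_section is hypothetical, and it assumes that the program body contains no unbalanced braces inside strings or comments and that the first "section program {" in the file is the real header, not one inside a comment.

```python
import re
from typing import Tuple


def extract_program_section(source: str) -> Tuple[str, str]:
    """Split the input into (source with an emptied program block, program body).

    Assumption: braces inside the program body are balanced, including any
    that appear in strings or comments, because they are counted blindly.
    """
    header = re.search(r"\bsection\s+program\s*\{", source)
    if header is None:
        raise ValueError("no 'section program {' header found")

    body_start = header.end()  # first character after the opening brace
    depth = 1
    for pos in range(body_start, len(source)):
        ch = source[pos]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                body = source[body_start:pos]
                # Keep the braces so the grammar still sees 'section program {}'.
                stripped = source[:body_start] + source[pos:]
                return stripped, body

    raise ValueError("unbalanced braces in program section")
```

The emptied text still contains section program {}, which is what the original sectionProgram rule expects, so it can be handed to the ANTLR parser unchanged while the body string is kept for later use.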