ANTLR4: Two Different Ways of Designing a Rule (Detecting Tokens Like ##<Text>##)

Fabian Deitelhoff

Jun 2, 2016, 4:57:28 PM

to antlr-discussion

Hello there,

I'm designing a grammar for a Markdown-based language, but without the context awareness.

For example, I want to detect tokens like ## <text> ##.

I found two different ways of designing rules for that, and I'm not quite sure which one is the better approach.

The first way: Defining more complex tokens and a simple rule.

    fragment
    HEAD
        : '#'
        ;
    
    fragment
    HEADING_TEXT
        : (~[#]|'\\#')+?
        ;
    
    SUBHEADLINE
        : HEAD HEAD HEADING_TEXT HEAD HEAD
        ;

---

    subheadline
        : SUBHEADLINE
        ;

Because HEAD and HEADING_TEXT are fragments, they never reach the parser. I'm prototyping within IntelliJ and the parsing works well. The error messages show something like "missing SUBHEADLINE", which is great for the main application (I think I can easily turn those errors into human-readable ones).

The second approach: Much simpler tokens and more complex rules for the parser.

    HEAD
        : '#'
        ;
    
    HEADING_TEXT
        : (~[#]|'\\#')+?
        ;

---

    subheadline
        : HEAD HEAD HEADING_TEXT HEAD HEAD
        ;

This works fine, too. The errors are more specific, which may make them harder to transform into human-readable ones.

But overall I'm not sure which approach I should follow, and why. The more complex tokens are easier to write in this case because there won't be any complex rules like the ones normal programming languages contain. But it doesn't feel like this is the correct way of doing it.

Maybe someone can provide some hints. :)

Thanks

Fabian

Eric Vergnaud

Jun 2, 2016, 10:21:09 PM

to antlr-discussion

Fabian,

not sure there is just one 'correct' way of doing things with ANTLR.

It really depends on what is most important for you: speed, clarity, reuse, maintenance...

You have already found 2 ways, and one seems better than the other.

Sounds great! Just keep thinking like that.

Eric

n.b. in your grammars, I believe +? can be replaced by * (unless I'm missing something)
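
One way to get a feel for the +? versus * question outside ANTLR is a small regex sketch in Python. This is only an approximation: Python's `re` engine backtracks, while the ANTLR lexer decides differently, and the pattern and function names below are made up for illustration. Under those assumptions, the two quantifiers capture the same text for non-empty headings, and the visible difference is that * also accepts an empty heading such as `####`.

```python
import re

# Rough regex analogues of the HEADING_TEXT fragment inside SUBHEADLINE.
# Assumption: Python's backtracking engine only approximates ANTLR's lexer.
LAZY_PLUS = re.compile(r"##((?:[^#]|\\#)+?)##")  # analogue of (...)+?
STAR = re.compile(r"##((?:[^#]|\\#)*)##")        # analogue of (...)*


def heading_text(pattern, line):
    """Return the captured heading text, or None if the whole line does not match."""
    match = pattern.fullmatch(line)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Plain heading, heading with an escaped '#', and an empty heading.
    for sample in ["## Title ##", r"##a\#b##", "####"]:
        print(repr(sample),
              repr(heading_text(LAZY_PLUS, sample)),
              repr(heading_text(STAR, sample)))
```

If empty headings should be rejected, that is an argument for keeping a quantifier that requires at least one character.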

John B. Brodie

Jun 2, 2016, 11:44:49 PM

to antlr-di...@googlegroups.com

Greetings!

Note that with the second version, any Skipped and/or Hidden tokens will be accepted between the two HEAD tokens.

Maybe that does not matter in your use case, but just to be aware....
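
The effect can be simulated with a toy tokenizer in Python. This is a sketch, not ANTLR: the WS rule, the simplified TEXT rule, and all function names are assumptions that do not appear in the grammars above. It shows that a parser-style check over visible tokens accepts `# #Title# #`, while a single contiguous token pattern does not.

```python
import re

# Hypothetical token rules: HEAD and TEXT loosely mirror the second grammar;
# WS is an assumed extra rule whose tokens are hidden from the parser.
TOKEN_RE = re.compile(r"(?P<HEAD>#)|(?P<WS>[ \t]+)|(?P<TEXT>[^# \t]+)")


def tokenize(line):
    """Split a line into (type, text, hidden) triples."""
    tokens = []
    for match in TOKEN_RE.finditer(line):
        kind = match.lastgroup
        tokens.append((kind, match.group(), kind == "WS"))
    return tokens


def parses_as_subheadline(line):
    """Parser-rule style: check HEAD HEAD TEXT HEAD HEAD on visible tokens only."""
    visible = [kind for kind, _, hidden in tokenize(line) if not hidden]
    return visible == ["HEAD", "HEAD", "TEXT", "HEAD", "HEAD"]


def lexes_as_subheadline(line):
    """Single-token style: the whole line must match one contiguous pattern."""
    return re.fullmatch(r"##[^#]+##", line) is not None


if __name__ == "__main__":
    for sample in ["##Title##", "# #Title# #"]:
        print(repr(sample), parses_as_subheadline(sample), lexes_as_subheadline(sample))
```

Whether the looser behavior is acceptable depends on whether `# #` should count as a heading marker in the language.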

Hope this helps...

-jbb
