HTML/Markdown style grammar for ANTLR4

Fabian Deitelhoff

May 28, 2016, 8:44:29 AM
to antlr-discussion

Hi there,

I want to define an HTML/Markdown-like grammar for a document that gets transformed into an AST. I'm aware that ANTLR4 is not the best tool for Markdown, but my format is much closer to HTML. At least I think it is. :)

Here's my lexer definition:

lexer grammar dnpMDLexer;

NL
    : [\r\n]
    ;

LISTING_TAG
    : '`'
    ;

ITALIC_TAG
    : '*'
    ;

IMAGE_PATH_TAG_OPEN
    : '[['
    ;

IMAGE_PATH_TAG_CLOSE
    : ']]'
    ;

HEAD_TAG
    : '#'
    ;

HEADING_TEXT
    : ('\\#'|'\\*'|~[*#`\r\n])+
    ;

RUNNING_TEXT
    : ('\\#'|'\\`'|'\\*'|~[#*`])+
    ;

And here's my parser definition:

parser grammar dnpMDParser;

options { tokenVocab=dnpMDLexer; }

dnpMD
    : subheadline headline lead bodyElements*
    ;

subheadline
    : HEAD_TAG HEAD_TAG HEADING_TEXT HEAD_TAG HEAD_TAG NL
    ;

headline
    : HEAD_TAG HEADING_TEXT HEAD_TAG NL
    ;

lead
    : HEAD_TAG HEAD_TAG HEAD_TAG HEADING_TEXT HEAD_TAG HEAD_TAG HEAD_TAG
    ;

bodyElements
    : text bodyElement
    ;

text
    : RUNNING_TEXT
    ;

bodyElement
    : subheading
    | imageheading
    | listing
    | italic
    | EOF
    ;

subheading
    : HEAD_TAG HEAD_TAG HEAD_TAG HEAD_TAG HEADING_TEXT HEAD_TAG HEAD_TAG HEAD_TAG HEAD_TAG
    ;

imageheading
    //: HEAD_TAG HEAD_TAG HEAD_TAG HEAD_TAG HEAD_TAG IMAGE_PATH_TAG_OPEN HEADING_TEXT IMAGE_PATH_TAG_CLOSE HEADING_TEXT HEAD_TAG HEAD_TAG HEAD_TAG HEAD_TAG HEAD_TAG
    : HEAD_TAG HEAD_TAG HEAD_TAG HEAD_TAG HEAD_TAG HEADING_TEXT HEAD_TAG HEAD_TAG HEAD_TAG HEAD_TAG HEAD_TAG
    ;

listing
    : LISTING_TAG LISTING_TAG LISTING_TAG LISTING_TAG .+? LISTING_TAG LISTING_TAG LISTING_TAG LISTING_TAG
    ;

italic
    : ITALIC_TAG .+? ITALIC_TAG
    ;

I tried these grammars in ANTLRWorks 2 and in IntelliJ with the ANTLR4 plugin.

I have serious problems with the listing and italic rules: they match far too much in some cases and nothing at all in others. The attached image shows that the grammar works to some degree.

Am I heading in the right direction? I tried to use the HTML grammar as a template. I'm not sure whether ANTLR4 lexer modes could help me distinguish between the outer text and the inner text of tags.
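
To make the question about modes concrete, here is a rough, untested sketch of what I have in mind for the listing case. It is only an assumption about how modes could be used, loosely following the way I understand the HTML grammar to handle the contents of special tags. The grammar name and the token names LISTING_OPEN, OUTER_TEXT and LISTING_BODY are placeholders I made up; they are not part of the grammars above.

```antlr
lexer grammar ListingModeSketch;   // hypothetical name, not part of dnpMDLexer

// Four backticks open a listing and switch the lexer into LISTING mode.
LISTING_OPEN
    : '````' -> pushMode(LISTING)
    ;

// Outer text: everything up to the next backtick.
// (A stray single backtick is not handled in this sketch.)
OUTER_TEXT
    : ~[`]+
    ;

mode LISTING;

// Inner text: everything up to and including the closing four backticks,
// matched non-greedily; afterwards the lexer returns to the default mode.
LISTING_BODY
    : .*? '````' -> popMode
    ;
```

With tokens like these, the parser rule would shrink to something like listing : LISTING_OPEN LISTING_BODY ; and would no longer need .+? at the parser level. I don't know yet whether the same idea carries over to the italic and heading rules.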

Maybe someone has some useful hints. I'm thankful for any hint I can get, because I'm not 100% sure that my current approach is leading in the right direction.

I think the .+? wildcards in the listing and italic rules consume far too much while parsing, but I haven't found another solution yet. :( The imageheading rule isn't working as expected either. I would love to add a [[Path]] field at the beginning of the heading, but the current rules match too much.

Thanks,
Fabian
dnpMD2.png