How to parse text containing multiple levels of {}

Christian LeMoussel

Nov 9, 2013, 5:00:57 AM

to antlr-di...@googlegroups.com

I am looking for help parsing text that contains multiple levels of nested {} in order to build a graph (well, a sort of tree) of every part of the string inside the outermost {}.

For example, with this text: {a {b|c} d e | f}

I want to obtain the tokens a, b, c, d, e, f. The graph would be:

         -> b -
        /      \
  -> a -        -> d -> e
 /      \      /
/        -> c -
\
 \
  -> f

I think I need a parser for a non-regular grammar to turn the text into a sequence of tokens, because regular expressions cannot match braces nested to an arbitrary depth.

Can you help me do this with ANTLR? How should I define the grammar and parser rules?

I am writing a C# application to parse the text and display the graph.
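
A minimal sketch of why nesting is the hard part: the current depth has to be tracked with a counter (or a stack), which a grammar-based parser does implicitly through recursion. The Python below is only an illustration under simple assumptions; `brace_depths` is a hypothetical helper, not part of ANTLR, and it treats every non-space character as a one-letter token, as in the example.

```python
def brace_depths(text):
    """Return (leaf, depth) pairs, where depth is the number of enclosing braces."""
    depth = 0
    leaves = []
    for ch in text:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced '}'")
        elif ch != "|" and not ch.isspace():
            # Any other visible character is treated as a one-letter leaf.
            leaves.append((ch, depth))
    if depth != 0:
        raise ValueError("missing '}'")
    return leaves


print(brace_depths("{a {b|c} d e | f}"))
```

For the sample text this places b and c at depth 2 and the other leaves at depth 1.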

BR.

Christian.

P.S. Please excuse my poor English; I'm French.

Greg D

Nov 12, 2013, 7:45:18 PM

Is this correct?

ANTLR 4:

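grammar L ;

// Parser rules (debut = start, niveau = level, branche = branch)
debut    : ( niveau )? EOF ;
niveau   : ( Feuille | branche ) ( niveau )? ;  // a leaf or a branch, optionally followed by more
branche  : '{' niveau  '|' niveau '}' ;         // exactly two alternatives

// Lexer rules (Feuille = leaf, AttacheG/AttacheD = left/right brace, Barre = bar, Espaces = whitespace)
Feuille  : ~[{}| \t\r\n]+ ;
AttacheG : '{' ;
AttacheD : '}' ;
Barre    : '|' ;
Espaces  : [ \t\r\n]+ -> skip;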

Bash:

$ antlr4 L.g4
$ javac *.java
$ grun L debut -tokens -tree -gui << END
> {a {b|c} d e | f}
> END
[@0,0:0='{',<2>,1:0]
[@1,1:1='a',<1>,1:1]
[@2,3:3='{',<2>,1:3]
[@3,4:4='b',<1>,1:4]
[@4,5:5='|',<4>,1:5]
[@5,6:6='c',<1>,1:6]
[@6,7:7='}',<3>,1:7]
[@7,9:9='d',<1>,1:9]
[@8,11:11='e',<1>,1:11]
[@9,13:13='|',<4>,1:13]
[@10,15:15='f',<1>,1:15]
[@11,16:16='}',<3>,1:16]
[@12,18:17='<EOF>',<-1>,2:0]
(debut (niveau (branche { (niveau a (niveau (branche { (niveau b) | (niveau c) }) (niveau d (niveau e)))) | (niveau f) })) <EOF>)
$
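
For readers who want to see what the three parser rules do without installing ANTLR, here is a hand-written recursive-descent sketch in Python that mirrors grammar L rule by rule. It is not the code ANTLR generates: the names `Parser` and `parse_debut` are made up for illustration, and the tokenizer is a regular-expression approximation of the lexer rules. It builds the tree in the same parenthesized form as the `-tree` output above.

```python
import re

# Approximates the lexer: Feuille, or one of '{', '}', '|'; whitespace is skipped.
TOKEN = re.compile(r"[^{}| \t\r\n]+|[{}|]")
PUNCT = {"{", "}", "|"}


class Parser:
    def __init__(self, text):
        self.toks = TOKEN.findall(text)
        self.pos = 0

    def peek(self):
        return self.toks[self.pos] if self.pos < len(self.toks) else None

    def expect(self, tok):
        if self.peek() != tok:
            raise SyntaxError(f"expected {tok!r}, got {self.peek()!r}")
        self.pos += 1
        return tok

    def debut(self):
        # debut : ( niveau )? EOF ;
        parts = ["debut"]
        if self.peek() is not None:
            parts.append(self.niveau())
        if self.peek() is not None:
            raise SyntaxError(f"unexpected {self.peek()!r}")
        parts.append("<EOF>")
        return "(" + " ".join(parts) + ")"

    def niveau(self):
        # niveau : ( Feuille | branche ) ( niveau )? ;
        parts = ["niveau"]
        tok = self.peek()
        if tok == "{":
            parts.append(self.branche())
        elif tok is not None and tok not in PUNCT:
            parts.append(tok)  # a Feuille token
            self.pos += 1
        else:
            raise SyntaxError(f"unexpected {tok!r}")
        nxt = self.peek()
        if nxt is not None and nxt not in ("}", "|"):
            parts.append(self.niveau())  # the optional trailing niveau
        return "(" + " ".join(parts) + ")"

    def branche(self):
        # branche : '{' niveau '|' niveau '}' ;
        parts = ["branche", self.expect("{"), self.niveau(),
                 self.expect("|"), self.niveau(), self.expect("}")]
        return "(" + " ".join(parts) + ")"


def parse_debut(text):
    return Parser(text).debut()


print(parse_debut("{a {b|c} d e | f}"))
```

Because `niveau` is right-recursive, consecutive leaves such as d and e end up nested inside each other rather than listed side by side, which is visible in the tree.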

Greg D

Nov 13, 2013, 3:49:21 PM

to antlr-di...@googlegroups.com

Christian,

The code below is an alternative, based on the examples you sent me privately.

grammar MSpin ;

/* the readable version
debut    : ( corps )? EOF ;
corps    : ( Texte | choix | niveau )+ ;
choix    : '{' corps  ( '|' corps )+ '}' ;
niveau   : '{' corps '}' ;
*/

// the inlined version, which produces a simpler tree
debut    : (   ( Texte | choix | niveau )+   )? EOF ;
choix    : '{' ( Texte | choix | niveau )+  ( '|' ( Texte | choix | niveau )+ )+ '}' ;
niveau   : '{' ( Texte | choix | niveau )+ '}' ;

Texte    : ~[{}| \t\r\n]+ ;
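
One possible way to get from the readable rules (`corps`, `choix`, `niveau`) to the graph in the original question is sketched below in Python, as a hand-written recursive-descent parser rather than ANTLR-generated code. The names `parse` and `paths` are hypothetical, `choix` and `niveau` are handled by one helper because they differ only in the number of alternatives, and the sketch assumes whitespace is skipped as in grammar L.

```python
import re
from itertools import product

# Texte, or one of '{', '}', '|'; whitespace is skipped (an assumption).
TOKEN = re.compile(r"[^{}| \t\r\n]+|[{}|]")


def parse(text):
    """Return a list of items: a str for Texte, ("alt", [corps, ...]) for a group."""
    tokens = TOKEN.findall(text)
    pos = 0

    def corps():
        # corps : ( Texte | choix | niveau )+ ;
        nonlocal pos
        items = []
        while pos < len(tokens) and tokens[pos] not in ("|", "}"):
            if tokens[pos] == "{":
                items.append(group())
            else:
                items.append(tokens[pos])
                pos += 1
        if not items:
            raise SyntaxError("expected text or '{'")
        return items

    def group():
        # choix  : '{' corps ( '|' corps )+ '}' ;  (two or more alternatives)
        # niveau : '{' corps '}' ;                 (a single alternative)
        nonlocal pos
        pos += 1  # consume '{'
        alternatives = [corps()]
        while pos < len(tokens) and tokens[pos] == "|":
            pos += 1
            alternatives.append(corps())
        if pos >= len(tokens) or tokens[pos] != "}":
            raise SyntaxError("expected '}'")
        pos += 1
        return ("alt", alternatives)

    tree = corps() if tokens else []
    if pos != len(tokens):
        raise SyntaxError(f"unexpected {tokens[pos]!r}")
    return tree


def paths(seq):
    """Yield every left-to-right path through a parsed sequence as a list of leaves."""
    per_item = []
    for item in seq:
        if isinstance(item, tuple):
            # A group contributes the paths of each of its alternatives.
            per_item.append([p for alt in item[1] for p in paths(alt)])
        else:
            per_item.append([[item]])
    for combo in product(*per_item):
        yield [leaf for part in combo for leaf in part]


for path in paths(parse("{a {b|c} d e | f}")):
    print(" -> ".join(path))
```

Each printed line corresponds to one route through the graph drawn in the question; a real application would more likely walk the ANTLR parse tree with a listener or visitor to build the same structure.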

Christian LeMoussel

Nov 16, 2013, 9:12:08 AM

to antlr-di...@googlegroups.com

Greg, thank you.

I managed to integrate it into C# with the following adaptations:

grammar MSpin;

// Lexer
TEXTE : ~[{}|]+ ;
LBRACE : '{' ;
RBRACE : '}' ;
PIPE : '|' ;
WS : [ \t\r\n]+ -> skip ;

// Parser
expr : ((TEXTE | choix | niveau )+)? ;
choix : LBRACE ( TEXTE | choix | niveau )+ ( PIPE ( TEXTE | choix | niveau )+ )+ RBRACE ;
niveau : LBRACE ( TEXTE | choix | niveau )+ RBRACE ;
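
One thing worth checking in this adaptation is how the lexer splits the input. ANTLR 4 lexers choose the rule that matches the longest text and, on a tie, the rule defined first; since `TEXTE` here also accepts whitespace, it should always win over `WS`, so a run such as " d e " becomes a single token. The sketch below emulates that rule selection with Python regular expressions and compares the result with the lexer rules of grammar L. The `lex` helper and the two rule tables are illustrative only, not ANTLR APIs.

```python
import re


def lex(rules, text, skip=()):
    """Emulate longest-match lexing; on a tie the rule listed first wins."""
    compiled = [(name, re.compile(pattern)) for name, pattern in rules]
    pos, tokens = 0, []
    while pos < len(text):
        best = None  # (rule name, end offset) of the best match so far
        for name, rx in compiled:
            m = rx.match(text, pos)
            # Strict '>' keeps the earlier rule when two matches have equal length.
            if m and m.end() > pos and (best is None or m.end() > best[1]):
                best = (name, m.end())
        if best is None:
            raise ValueError(f"no rule matches at offset {pos}")
        name, end = best
        if name not in skip:  # rules in 'skip' behave like '-> skip'
            tokens.append((name, text[pos:end]))
        pos = end
    return tokens


MSPIN_CSHARP = [
    ("TEXTE", r"[^{}|]+"), ("LBRACE", r"\{"), ("RBRACE", r"\}"),
    ("PIPE", r"\|"), ("WS", r"[ \t\r\n]+"),
]
GRAMMAR_L = [
    ("Feuille", r"[^{}| \t\r\n]+"), ("AttacheG", r"\{"), ("AttacheD", r"\}"),
    ("Barre", r"\|"), ("Espaces", r"[ \t\r\n]+"),
]

sample = "{a {b|c} d e | f}"
print([t for _, t in lex(MSPIN_CSHARP, sample, skip={"WS"})])
print([t for _, t in lex(GRAMMAR_L, sample, skip={"Espaces"})])
```

If spaces inside the text are meant to be preserved, this behaviour is what you want. If not, one option would be to exclude whitespace from `TEXTE` again, as in Greg's `Texte` rule.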
