Support for hierarchical grammar in SableCC

258 views
Skip to first unread message

v.kum...@gmail.com

unread,
May 23, 2013, 1:53:24 AM5/23/13
to sab...@googlegroups.com

Given root grammar R and imported grammars A and B, ANTLR generates classes R, R_A, and R_B. Class R has two delegate pointers to instances of type R_A and R_B so that rules in R can access the imported rules. The delegate classes have delegator pointers back to the root so that any rule can get to any other rule. There is no need to copy any rules from the delegate grammars into the root grammar. Every import of a delegate grammar results in the generation of a delegate parser not just the root parser. So if two root grammars import A, ANTLR will generate two different classes from A.

ANTLR must do this because a rule overridden in the root grammar often changes the prediction lookahead sets of rules in the imported grammar. Additionally, all token types must be consistent across rules from all delegate grammars. There is little choice but to regenerate classes for delegate grammars.

This requirement raises the issue of multiple code regeneration for top node classes in our environment. This problem will get compounded because the node depth of the tree in our case is at least 4. Is there a way SableCC provides to stop this regeneration of classes and support multilevel grammar files. If yes, please provide me with detailed steps.

Etienne Gagnon

unread,
May 23, 2013, 10:33:02 AM5/23/13
to sab...@googlegroups.com
Hi Vivek,

One solution I would suggest, using SableCC, would be to use a virtual distinguishing start token.

Here is an example. Assume that you have two grammars A and B that share some productions. You could write the following grammar:
...
Tokens

  // normal tokens
  if = 'if';
  while = 'while';
  identifier = letter (letter | digit)*;
  ...

  // virtual token (not matched by the generated lexer)
  a_start = ;
  b_start = ;

Productions

  start =
    {a} a_start a |
    {b} b_start b;

// grammar A
  a = x y;
  y = while identifier ...;

// grammar B
  b = z x;
  z = identifier ...;

// shared productions
  x = if identifier ... ;
In order to make it work, you'll need to write a custom lexer class that extends the generated lexer and automatically returns a leading TAStart or TBStart token to the parser, to select the grammar you wish to parse.

I made many assumptions about what your problem actually is; I hope I wasn't too far off and that this solution will help you. Please let us know if this helps.

Have fun!

Etienne
Etienne Gagnon, Ph.D.
http://sablecc.org
On 2013-05-23 01:53, v.kum...@gmail.com wrote:

Given root grammar R and imported grammars A and B, ANTLR generates classes R, R_A, and R_B. Class R has two delegate pointers to instances of type R_A and R_B so that rules in R can access the imported rules. The delegate classes have delegator pointers back to the root so that any rule can get to any other rule. There is no need to copy any rules from the delegate grammars into the root grammar. Every import of a delegate grammar results in the generation of a delegate parser not just the root parser. So if two root grammars import A, ANTLR will generate two different classes from A.

ANTLR must do this because a rule overridden in the root grammar often changes the prediction lookahead sets of rules in the imported grammar. Additionally, all token types must be consistent across rules from all delegate grammars. There is little choice but to regenerate classes for delegate grammars.

This requirement raises the issue of multiple code regeneration for top node classes in our environment. This problem will get compounded because the node depth of the tree in our case is at 1least 4. Is there a way SableCC provides to stop this regeneration of classes and support multilevel grammar files. If yes, please provide me with detailed steps. --

v.kum...@gmail.com

unread,
May 24, 2013, 1:23:19 AM5/24/13
to sab...@googlegroups.com

Hi Etienne,

Thanks for quick response. You answer explains a large part of my query but still let me explain my problem in more contrasting manner.

Assume that I am designing a product which has different grammar for different OS and DB. Let us say I have a grammar file R as my root grammar which is common among all OS and DB. Next i decide to write a grammar OS_A which is a different file and which inherits grammar of R. Also there is OS_B another grammar file and inherits grammar of R but not OS_A.. In a more simpler way R is the root of the tree and OS_A is its left child and OS_B is its Right child..  Let me add some more nodes to this Tree.. Say DB_A be another grammar file which inherits grammar of OS_A.. So in a way it also inherits grammar of R as R is inherited by OS_A . In the tree structure DB_A is left child of OS_A.

So in ANTLR when i tried this and recompiled the leaf node then it started regenerating all the classes. Like if a make change to Class of DB_A then all the classes which DB_A has inherited gets regenerated with different file name... Why it regenerates has been explained in my previous post.. This regeneration causes me great difficulty as file name of newly generated class is changed( i.e., class name changed) and if i am using this class in suppose 100 different methods or different Java files, i need to make this change manually to other files...

So if you can show me an example with different grammar files engaged in a hierarchical Tree structure as i explained using SableCC then it solve my big problem. Also if you can show using IF ELSE statement i can decide which grammar to take then also may be i can solve the issue. Hope i have explained it enough for you to get the core of my problem..   

Cheers,
Vivek

Etienne Gagnon

unread,
May 24, 2013, 9:11:27 AM5/24/13
to sab...@googlegroups.com
Hi Vivek,

You'll find examples in http://downloads.sourceforge.net/sablecc/sablecc-grammars-2.0.0.zip . In particular, you might want to look at the MiniBasic example for your "if/else".

SableCC does not use a grammar inheritance scheme. Instead, I have shown one possible solution to you (using a virtual start token).

Another solution is to write different grammar files, but use identical token/production/alternative names, when appropriate, and indicate the same "Package" declaration. Then you can use symbolic links and or make use of branches in a code repository (Git, Subversion, etc.) to achieve your various versions using the same core.

Two key properties of SableCC that should simplify things for you:
  1. There is no user code embedded in the grammar file. All user code is written is independent subclasses. Nothing prevents you from developing different subclasses (functionalities) for the same grammar.
  2. SableCC's generated code a has very stable public interface: class and method names do not change when you modify the grammar, unless you explicitly change the names (token, production, alternative).
Have fun!

Etienne
Etienne Gagnon, Ph.D.
http://sablecc.org
--
-- You received this message because you are subscribed to the SableCC group. To post to this group, send email to sab...@googlegroups.com. To unsubscribe from this group, send email to sablecc+u...@googlegroups.com. For more options, visit this group at https://groups.google.com/d/forum/sablecc?hl=en
---
You received this message because you are subscribed to the Google Groups "SableCC" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sablecc+u...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
 
 

v.kum...@gmail.com

unread,
Jun 3, 2013, 8:25:56 AM6/3/13
to sab...@googlegroups.com

Hi Etienne,

There is one more thing which i wanna ask. I'm quite new in SableCC. Lets refer that mini basic example only. Suppose i want to write custom java code in it, Like i wanna Print my name in the console whenever ENDIF is encountered. And it should be such that if after compiling and running it. Now i change in grammar file and then compile the grammar all over again but I should not require to make any change or write again any where to run that printing my name after encountering ENDIF.

Can you please show me thru a JAVA file or attach a Java file in which you write custom code as tutorial. Also printing my name is very simple, solution should be such that i should be able write many more custom code... Also it should not be like it is only for Java as i need parser for both C and Java Language.

Please guide me this.
Cheers,
Vivek

Etienne Gagnon

unread,
Jun 3, 2013, 2:27:11 PM6/3/13
to sab...@googlegroups.com
Hi Vivek,

I am not sure I understand your question. Do you want to build a
compiler or an interpreter?

Etienne

Etienne Gagnon, Ph.D.
http://sablecc.org

v.kum...@gmail.com

unread,
Jun 4, 2013, 1:38:20 AM6/4/13
to sab...@googlegroups.com
I actually just want to add custom action when ever ENDIF is encountered when i pass my input file to SableCC. For Example "Say i want to print my name on the console".
Suppose my INPUT file for minibasic.jar is like this
-------------------------------------------------------------------------------
PRINT "What is your number"
READ I
PRINT I
IF I < 10 THEN
PRINT " * 3 = "
PRINT I * 3
ELSE
PRINT " + 3 = "
PRINT I + 3
ENDIF
PRINTLN
-----------------------------------------------------------------------------

Now want to define a custom action such as i want to print ENDIF encountered in the console whenever ENDIF is encounter while parsing the INPUT file(or printing my name instead of ENDIF encountered). So i define a Java class where i write this custom code for printing on the console....

But it should not be such that if suppose i change my grammar file like instead of PRINT i write DISPLAY.. this should not make me compile my custom java class again. Will i able to compile just the grammar file and pass my custom code to it work??

Also it would very helpful for you to demonstrate it thru a tutorial of how to write custom java class to define action on grammar

Thanks,
Vivek

Etienne Gagnon

unread,
Jun 4, 2013, 11:06:43 AM6/4/13
to sab...@googlegroups.com
Hi Vivek,

The tutorial, in Chapter 3 of my M.Sc. thesis is what you are looking for:
http://sablecc.sourceforge.net/thesis/thesis.html#PAGE21

For a detailed explanation, you want to read Chapter 5:
http://sablecc.sourceforge.net/thesis/thesis.html#PAGE52

As for the specific solution to printing your name on each ENDIF, here is one solution:
Lexer lexer = new Lexer(new PushbackReader(new BufferedReader(new FileReader(arguments[0])), 1024));
Parser parser = new Parser(lexer);
Node ast = parser.parse();

// create an instance of a subclass of DepthFirstAdapter
// to print "Vivek" on each ENDIF
// and apply it to the AST
ast.apply(new DepthFirstAdapter() {
  @Override
  public void outAIfStatement(AIfStatement node) {
    System.out.println("Vivek");
  }
});

Have fun!


Etienne
Etienne Gagnon, Ph.D.
http://sablecc.org
Le 2013-06-04 01:38, v.kum...@gmail.com a écrit :
I actually just want to add custom action when ever ENDIF is encountered when i pass my input file to SableCC. For Example "Say i want to print my name on the console".
Suppose my INPUT file for minibasic.jar is like this
[...]

v.kum...@gmail.com

unread,
Jun 5, 2013, 2:26:16 AM6/5/13
to sab...@googlegroups.com

Hi Etienne,

Thanks for the help... I saw the example link you sent few days back and i was getting some error so i asked you how to write it may be i was doing something wrong.. But still m getting the error so m sending you my java file.

PFA my java class to print my name on the console.
Here is what i did

D:\Parser\sablecc-grammars-2.0.0\minibasic-1.0\src>javac test.java

D:\Parser\sablecc-grammars-2.0.0\minibasic-1.0\src>java test test.basic
Exception in thread "main" java.lang.ClassCastException: org.sablecc.minibasic.n
ode.ADivExpression cannot be cast to org.sablecc.minibasic.node.TIdentifier
        at org.sablecc.minibasic.parser.Parser.new21(Parser.java:477)
        at org.sablecc.minibasic.parser.Parser.parse(Parser.java:266)
        at test.main(test.java:18)

M doing something wrong over here.

Also when i tried running it from an IDE( netbeans) i passed the test.basic file as input it ran successfully but didnt show any output. Is there a particular way to run it thru an IDE.

Thanks,
Vivek
test.java

Etienne Gagnon

unread,
Jun 5, 2013, 4:27:13 PM6/5/13
to sab...@googlegroups.com
Hi Vivek,

I suggest that you try deleting (a) all *.class files and (b) the
generated \node, \lexer, \parser, and \analysis folders. Then, you can
regenerate the code using SableCC 3.7, compile it, and run it.
Sometimes, the java compiler omits to recompile old class files causing
such problems. Also, some bugs have been resolved in SableCC 3.7.

I will let others help you with the NetBeans environment.

Have fun!

Etienne

Etienne Gagnon, Ph.D.
http://sablecc.org

On 2013-06-05 02:26, v.kum...@gmail.com wrote:
> [...] Thanks for the help... I saw the example link you sent few days
> back and i was getting some error so i asked you how to write it may
> be i was doing something wrong.. But still m getting the error so m
> sending you my java file.
> [...]

Hong Phuc Bui

unread,
Jun 5, 2013, 9:16:18 PM6/5/13
to sab...@googlegroups.com
Hi Vivek,

If you use the sablecc plugin in Netbeans, you can call SableCC from
within the Netbean by
right-clicking on the SableCC file and chose SableCC in ContextMenu. It
will call SableCC 3.7
so, that the generated java files are saved in the folder, where your
sablecc file is.
So if you save your SableCC in src/grammar.scc, the generated files will
be in
src/org/yourname/.... and you must delete all the generated files by
hand. Just using "clean and build" in
Netbeans will not delete these files.

Best Wish
HongPhuc

v.kum...@gmail.com

unread,
Jun 6, 2013, 4:09:28 AM6/6/13
to sab...@googlegroups.com

Thanks Etienne and Phuc,

After recompilation of all java files it worked... Also i have downloaded SableCC 3.7 to run it. NetBeans i didn't check. But i think it will do just fine

I think the thread has gone little longer. But now i want to give you a use case what I actually adhere to do. This use case is related to my Title i.e., Support for hierarchical grammar.

I'll keep it very simple. I want to write a grammar file. say test.sablecc, which is described as below

  Helpers
  letter = ['A'..'Z'];
  digit = ['0'..'9'];

Tokens
count = 'COUNT';
max = 'MAX';

Production
statements = {list}  statement statement | {empty};
{count} count digit+ | max


I m not sure whether i have written the grammar correct..  but what i want to do is in a Input file say test.txt will contain

count 100
count max

So when i pass this to SableCC to parse it should print successful or not Successful based on whether is the input file is adhering to grammar or not...
Catch here is:

I will write a custom test.java file like i wrote for ENDIF thing in previous post... Now what that java file should do is. It will receive two inputs as arg. 1) input file test.txt... 2) DB name

So based on this DB name the parsing should be done.... Like---- count 100 will be common grammar for all and what ever the DB name be, it will be parsed successfully... But --- count max is only specific to MySQL that is if DB name is MySQL then it should be parsed successfully else it should display an error... in the case DB name is not MySQL .. that is say DB2, Informix, Oracle, sybase etc

I hope i was clear in explaining my dilemma... Can you show me how the grammar for this case will look like and also the custom Java class thru which DB name and Input file will be passed...

This i was not able to do for long time with different parsers... Please i'm stuck because of this.. and thru SableCC if you can help me out to achieve this may be i can move forward..

Sorry for the trouble

Thanks,
Vivek   

v.kum...@gmail.com

unread,
Jun 6, 2013, 5:31:50 AM6/6/13
to sab...@googlegroups.com
One more thing which i forgot to mention is that.. in that example of printing my name whenever ENDIF happens...
But it not asking me

What is your number? as it was asking before over riding the method...
Can it do for this input to minibasic example

PRINT "What is your number"
READ I
PRINT I
IF I < 10 THEN
PRINT " * 3 = "
PRINT I * 3
ELSE
PRINT " + 3 = "
PRINT I + 3
ENDIF
PRINTLN


Ask me what is your number and performs corresponding operation and then parse ENDIF to print my name...

Please do answer my last post.. That is very much important..

Regards,
Vivek

v.kum...@gmail.com

unread,
Jun 10, 2013, 1:13:24 AM6/10/13
to sab...@googlegroups.com

Hi Etienne,

Is there a way to the solve the use case i have given here using sableCC.

Please reply as soon as possible...

Thanks,

Vivek


On Thursday, June 6, 2013 1:39:28 PM UTC+5:30, v.kum...@gmail.com wrote:

Etienne Gagnon

unread,
Jun 10, 2013, 12:42:22 PM6/10/13
to sab...@googlegroups.com
Hi Vivek,

I already outlined one solution for you in a previous message ( https://groups.google.com/d/msg/sablecc/Lfemf2Xx6uk/ej3XhJQLPR8J ). Of course, you'll have to actually write the code of a custom lexer to feed the distinguishing token as the initial token returned by the next() method.

Unfortunately, I don't have the time to write the complete example for you. I think that, with the information I provided, it should be fairly easy for any good Java programmer to do it in approximately 30 minutes (to write and debug).

Have fun!

Etienne

Etienne Gagnon

unread,
Jun 10, 2013, 12:53:18 PM6/10/13
to sab...@googlegroups.com
I forgot to write: In my M.Sc. thesis, you'll find an example of how to
customize a lexer: http://sablecc.sourceforge.net/thesis/thesis.html#PAGE38

v.kum...@gmail.com

unread,
Jun 11, 2013, 1:25:19 AM6/11/13
to sab...@googlegroups.com
Yeah i do agree...

I forgot about the virtual tokens was trying to do it thru DFA, I wrote the code and it was working fine...

Also i didn't try it but can i have a Grammar C which shares property of grammar A and also Grammar D which shares property of Grammar A and B but not C...

like

//grammar C
c= a x t;
t= for identifier ...

//grammar D
d= a b x u;
u = switch identifier ...


Thought i didnt try it but as per understanding you gave it should work i think.. Just confirming this..


Thanks,
Vivek

Etienne Gagnon

unread,
Jun 11, 2013, 7:03:11 AM6/11/13
to sab...@googlegroups.com
Hi Vivek,

Yes, it should work.

Have fun!

Etienne

Etienne Gagnon, Ph.D.
http://sablecc.org

On 2013-06-11 01:25, v.kum...@gmail.com wrote:
> [...]
> Also i didn't try it but can i have a Grammar C which shares property
> of grammar A and also Grammar D which shares property of Grammar A and
> B but not C...
> [...]
> Thought i didnt try it but as per understanding you gave it should
> work i think.. Just confirming this..
> [...]

v.kum...@gmail.com

unread,
Jun 13, 2013, 3:34:36 AM6/13/13
to sab...@googlegroups.com
One more thing...

If i have a rule defined in shared productions and same rule i have written in grammar A which inherits shared production as well..
so which one does it SableCC take.. shared rule or Grammar A one..

Like
a=z x;

i have same rule in z and x as well where x is shared production... so which one will it take rule to be followed

Also can i decide which one to take... Cos in some cases i would require Shared to be overridden in some cases not be overridden.



Thank,
Vivek

Etienne Gagnon

unread,
Jun 13, 2013, 10:09:28 PM6/13/13
to sab...@googlegroups.com
Hi Vivek,

SableCC does not support grammar inheritance, but it has CST-to-AST
(concrete to abstract tree) transformations.

So, if you have similar but not identical rules, they cannot share the
same name in the concrete (i.e. parser) syntax. But, through tree
transformations, you could merge them into the same abstract (i.e. tree)
rule.

For example, if you had 3 relatively similar x1, x2, and x3 productions,
you could merge them into a single x tree production :

Productions // parser syntax
...
x1 = a b;
x2 = a c;
x3 = b c;

Abstract Syntax Tree // tree syntax (as returned by parser.parse())
...
x = a? b? c?;

But, you have to tell SableCC how to do this transformation. So, the
actual grammar has to look like:

Productions
...
x1 {-> x} = a b {-> new x(a, b, Null)};
x2 {-> x} = a c {-> new x(a, Null, c)};
x3 {-> x} = b c {-> new x(Null, a, b)};

Abstract Syntax Tree
...
x = a? b? c?;

This way, SableCC returns a tree with the simplified syntax (a single
production for the 3 cases).

Have fun!

Etienne

Etienne Gagnon, Ph.D.
http://sablecc.org

Reply all
Reply to author
Forward
0 new messages