Advice on conversion from CST to AST

123 views
Skip to first unread message

Tabiul Mahmood

unread,
Sep 11, 2013, 6:24:10 AM9/11/13
to sab...@googlegroups.com
I have below grammar

Helper
 digit
= ['0' .. '9'];
 character
= ['a' .. 'z']|['A' .. 'Z'];
 
Token
  number
= digit+;
 
string = character+;  
  keywordA
= 'keywordA';
  keywordB
= 'keywordB';
  keywordC
= 'keywordC';
 
Production
   lang
= options*;
   option
= {A}   [key]:keywordA [value]:number
           
|{B}   [key]:keywordB [value]:string
           
|{str} [key]:string [value]:string


I want to have the AST tree to be something like below

        <lang>
        /  |  \      
       /   |   \
      /    |    \
  <option> ......*
    /  \
   /    \ 
  <key>  <value> 

This means at the root is the node "Lang", under that there are multiple children of type option. For each option node, the left node is the Key as defined in the production
and the right hand side is the value as defined in the production

Any idea how I can do this

Etienne Gagnon

unread,
Sep 12, 2013, 3:28:50 PM9/12/13
to sab...@googlegroups.com
Hi Tabiul,

From your grammar, SableCC already produces the desired AST. Please read Chapters 3 to 6 of the following document: http://sablecc.sourceforge.net/thesis/thesis.html#PAGE21

Have fun!

Etienne
Etienne Gagnon, Ph.D.
http://sablecc.org
On 2013-09-11 06:24, Tabiul Mahmood wrote:
I have below grammar

[...] 
Production
   lang
= options*;
   option
= {A}   [key]:keywordA [value]:number
           
|{B}   [key]:keywordB [value]:string
           
|{str} [key]:string [value]:string

Pomeroy, Roger C

unread,
Sep 27, 2013, 7:02:43 PM9/27/13
to sab...@googlegroups.com

I am trying to write a grammar that allows a list of items to be input.  For example, suppose I have tokens for either “number” or “letter”.  I can see how to write a grammar for a list of those:

 

list =

  {one} list_member |

  {more} list comma list_member;

 

list_member =

   number |

   letter;

 

But now suppose what I need is to write a grammar that allows you to have a list that contains all numbers, or a mixture  of numbers and letters, but cannot contain just letters.   How do I write that grammar ... I keep getting errors.  Is this just because that is not something I can parse with LALR(1)?

 

Thanks

 

Roger

.

Etienne Gagnon

unread,
Oct 6, 2013, 8:47:04 PM10/6/13
to sab...@googlegroups.com
Hi Roger,

As you can guess, it is simpler to do this during semantic analysis (after parsing), instead of inside the grammar. But, actually, there are other reasons that can make this preferable.

Here is an LALR(1) grammar for it:

list =
  {all_numbers} number+ |
  {at_least_one_letter} number* letter element*;
element =
  {letter} letter |
  {number} number;

This grammar forces you to handle 2 list alternatives, and 3 distinct elements in the second alternative (in your compiler/interpreter). A lot of work.

Personally, I find the following more elegant and simpler to work with (in the compiler/interpreter):

list =
  element+;
element =
  {letter} letter |
  {number} number;

After parsing, you just need to visit the list and make sure one of the elements is a number; if not, report an error.

** MAIN ARGUMENT

Now, imagine I type the list: "a b c d ;"

If I use the first grammar, I will get a "syntax error" on the ";" telling me that "the parser is expecting a TNumber".

If, instead, I use the second grammar, the semantic verifier will tell me that "the 'a b c d' list is missing at least one number" (or some similar, human readable and semantically-appropriate message).

Which one do you prefer?


Have fun!

Etienne

Etienne Gagnon, Ph.D.
http://sablecc.org

Reply all
Reply to author
Forward
0 new messages