SableCC parsing gives wrong result


psaravan...@isa.ae

Jul 21, 2015, 6:52:34 AM
to sab...@googlegroups.com

I tried to parse valid messages using SableCC. There are three types of valid message format:

  1. aaa; (three consecutive alpha characters followed by a semicolon; production: {messageid} messageid semi)
  2. mm; (two consecutive alpha or numeric characters followed by a semicolon; production: {flightnum} carriercode semi)
  3. -amm; (a hyphen, one alpha character, then two consecutive alpha or numeric characters, followed by a semicolon; production: {load} hypene co semi)

When I input valid strings to the program, it did not work.

input:

abc; //type 1

ZZ; //type 2

ZZ; //type 2

-ab2; //type3

SableCC grammar:

Helpers
    /* Our helpers */
    fa = ['0' .. '9'] ;
    a = [['a' .. 'z'] + ['A' .. 'Z']] ;
    m = [a + fa] ;
    sp = ' ' ;
    cr = 13 ; // carriage return
    lf = 10 ; // line feed
    tab = 9 ; // tab char
    bl = sp | cr | lf | tab;


Tokens
    /* Our simple token definition(s). */
    semi = ';' bl*;
    co = (a)(m)(m);
    messageid = (a)(a)(a) ;
    carriercode = (m)(m);
    hypene ='-';

Productions
    program = {single} statement |
              {sequence} program statement;
    statement = {messageid} messageid semi |
                {flightnum} carriercode semi |
                {load} hypene co semi ;

Compilation succeeds, but when I run the Java code it throws a parser exception:

simpleAdders.parser.ParserException: [1,1] expecting: messageid, carriercode, '-'

even though the first string is valid.

Etienne Gagnon

Jul 21, 2015, 5:22:46 PM
to sab...@googlegroups.com
Hi,

The problem is in your lexer specification.

It is helpful to use a debugging lexer to identify such problems:

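// Needs java.io.FileReader, java.io.IOException and java.io.PushbackReader, plus the
// Lexer, LexerException and Parser classes generated by SableCC.
PushbackReader input = new PushbackReader(new FileReader(filename), 1024);
Lexer lexer = new Lexer(input) {

  // filter() is called for each scanned token; print its class and text.
  @Override
  protected void filter()
      throws LexerException, IOException {

    System.out.println(this.token.getClass().getSimpleName() + ": [" + this.token.getText() + "]");
  }
};
Parser parser = new Parser(lexer);
return parser.parse();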

Feeding your example input to this lexer/parser reveals:

TCo: [abc]
[1,1]: expecting: messageid, carriercode, '-'.

The "abc" token is scanned as a "co", not as a "messageid". This is expected, as "abc" corresponds to both definitions of "co" and "messageid", but "co" has precedence because it appears first in the Tokens section.
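
The precedence rule described above can be imitated in a few lines of plain Java. This is only a rough sketch, not SableCC's generated lexer: the class name TokenPriorityDemo, the REORDERED variant, and the regular expressions standing in for the helper definitions are all assumptions made for illustration. It picks the longest match at the start of the input and, on a tie, keeps the definition that comes first.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokenPriorityDemo {
    // Token definitions in the same order as the Tokens section of the grammar
    // (regexes are hand-translated stand-ins for the SableCC helpers).
    static final Map<String, String> ORIGINAL = ordered(
        "semi", ";[ \\r\\n\\t]*",
        "co", "[a-zA-Z][a-zA-Z0-9]{2}",
        "messageid", "[a-zA-Z]{3}",
        "carriercode", "[a-zA-Z0-9]{2}",
        "hypene", "-");

    // Hypothetical variant with messageid moved ahead of co.
    static final Map<String, String> REORDERED = ordered(
        "semi", ";[ \\r\\n\\t]*",
        "messageid", "[a-zA-Z]{3}",
        "co", "[a-zA-Z][a-zA-Z0-9]{2}",
        "carriercode", "[a-zA-Z0-9]{2}",
        "hypene", "-");

    // Builds an insertion-ordered map from name/regex pairs.
    static Map<String, String> ordered(String... pairs) {
        Map<String, String> map = new LinkedHashMap<>();
        for (int i = 0; i < pairs.length; i += 2) {
            map.put(pairs[i], pairs[i + 1]);
        }
        return map;
    }

    // Returns the name of the token chosen at the start of the input:
    // the longest match wins, and the strict '>' keeps the earlier
    // definition when two tokens match the same number of characters.
    static String firstToken(Map<String, String> defs, String input) {
        String best = null;
        int bestLen = 0;
        for (Map.Entry<String, String> def : defs.entrySet()) {
            Matcher m = Pattern.compile(def.getValue()).matcher(input);
            if (m.lookingAt() && m.end() > bestLen) {
                best = def.getKey();
                bestLen = m.end();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(firstToken(ORIGINAL, "abc;"));  // co
        System.out.println(firstToken(REORDERED, "abc;")); // messageid
        System.out.println(firstToken(ORIGINAL, "ZZ;"));   // carriercode
    }
}
```

Swapping the two definitions makes "abc" come out as messageid, but the same three letters after a hyphen would then also be scanned as messageid, so a load statement such as "-abc;" would presumably fail instead. Reordering moves the problem rather than removing it.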

Have fun!

Etienne

Etienne Gagnon, Ph.D.
http://sablecc.org

psaravan...@isa.ae

Jul 22, 2015, 3:01:32 AM
to SableCC
Hi Etienne,
Thanks for the great help. Normally, if one statement alternative (load) fails, shouldn't the parser try the other alternative (messageid) rather than throwing an exception? Is there any way to check sequentially? For example, if the first character is alpha, the statement may be messageid or flightnum; if it is a hyphen, it must be a load statement; then the second character is checked likewise, and so on.

Thanks :)

Etienne Gagnon

Jul 22, 2015, 10:00:52 AM
to sab...@googlegroups.com
Hi,

(What's your first name?)

Lexers are not very good at delivering different tokens depending on context. There are some advanced tricks one could play with lexer states, but it is usually much simpler to let the parser do such things.

If you defined "a" and "m" as tokens, you could define "co", "messageid", and "carriercode" as productions, and your problem would be solved.
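
To illustrate the idea of deciding at the statement level rather than in the lexer, here is a small hand-written Java sketch. It is not SableCC-generated code and not a grammar: the class StatementClassifierDemo and its methods are hypothetical, and the sketch simply assumes statements are separated by ";". Each character is looked at only as "alpha" or "alphanumeric", and the statement kind follows from where those characters sit in the statement.

```java
import java.util.ArrayList;
import java.util.List;

class StatementClassifierDemo {
    static boolean isAlpha(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    static boolean isAlnum(char c) {
        return isAlpha(c) || (c >= '0' && c <= '9');
    }

    // Classifies one statement body (the text before ';') by its shape,
    // so the same letters can mean different things in different positions.
    static String classify(String body) {
        if (body.length() == 4 && body.charAt(0) == '-'
                && isAlpha(body.charAt(1))
                && isAlnum(body.charAt(2)) && isAlnum(body.charAt(3))) {
            return "load";
        }
        if (body.length() == 3 && isAlpha(body.charAt(0))
                && isAlpha(body.charAt(1)) && isAlpha(body.charAt(2))) {
            return "messageid";
        }
        if (body.length() == 2 && isAlnum(body.charAt(0)) && isAlnum(body.charAt(1))) {
            return "flightnum";
        }
        throw new IllegalArgumentException("unrecognised statement: " + body);
    }

    // Splits the input on ';', trims surrounding whitespace and classifies each statement.
    static List<String> parse(String input) {
        List<String> kinds = new ArrayList<>();
        for (String part : input.split(";")) {
            String body = part.trim();
            if (!body.isEmpty()) {
                kinds.add(classify(body));
            }
        }
        return kinds;
    }

    public static void main(String[] args) {
        // prints [messageid, flightnum, flightnum, load]
        System.out.println(parse("abc;\nZZ;\nZZ;\n-ab2;\n"));
    }
}
```

With this approach "abc" on its own is a messageid, while "-abc" is a load statement, because the decision is made from the whole statement instead of from a three-character token in isolation.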


Have fun!

Etienne

Etienne Gagnon, Ph.D.
http://sablecc.org

psaravan...@isa.ae

Jul 23, 2015, 12:19:04 AM
to SableCC, ega...@j-meg.com
Hi Etienne,

My first name is Parane.

Thanks again for your great help and great tool.

Yesterday I could not access your thesis because the SourceForge server was down. I have now read it and understand how SableCC works, which cleared up my doubts.

In my project I have some overlapping tokens (e.g. amm --> code1, mmm --> code2). I need both code1 and code2 as tokens, and code2 only appears in a particular message format. If I define code1 before code2, that part of the message is read as code1. If, as you suggested, I define code1 and code2 as productions, I cannot get the code1 and code2 details separately.

Let's play with SableCC :)

Thanks

Best Regards,
Parane.