How to distinguish similar ascii sequences into separate tokens?

Corey Cason

Jan 13, 2016, 8:18:00 AM
to SableCC
With SableCC 3, how would I define tokens so that free-form ASCII strings (using characters 32-127) can be distinguished from ASCII operator sequences, so that a production like the following works?
    expr = [left]:ascii_text* ascii_op_seq [right]:ascii_text* new_line;

The problem I'm having is that the operator sequence is not a single distinguishable character. The closest I've gotten is a grammar where the parser attributes the entire line to the left ascii_text token.

Consider ascii strings 'foo', 'bar' and operator 'ob'.  I would like to parse the following into AST fields with left="foo", right="bar" and operator="ob":
  foo ob bar
  fooobbar
  "foo" ob "bar"
  'foo' ob 'bar'

Operator sequences inside quoted strings should be ignored, so that:
  "foob " ob "obar" yields left="foob", right="obar" and operator="ob"

Notionally, I want the lexer to scan left to right until the first non-quoted instance of the ASCII operator sequence, and to yield the left and right sides of the operator as separate tokens. Is this feasible?
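
To make the intended behaviour concrete, here is a minimal sketch in plain Java of the left-to-right scan described above. It is not SableCC-generated code and does not use any SableCC API; the class name OperatorSplitter, the Parts holder and the split method are hypothetical names chosen for illustration. It also assumes that surrounding quotes are stripped and that whitespace is trimmed from both sides, which is what the examples above suggest.

```java
// Hypothetical helper, not part of SableCC: splits a line at the first
// occurrence of the operator that lies outside single or double quotes.
final class OperatorSplitter {

    /** Holds the three pieces of a line such as: foo ob bar */
    static final class Parts {
        final String left;
        final String operator;
        final String right;

        Parts(String left, String operator, String right) {
            this.left = left;
            this.operator = operator;
            this.right = right;
        }
    }

    /**
     * Scans left to right and returns the text on either side of the
     * first unquoted occurrence of op, or null if there is none.
     */
    static Parts split(String line, String op) {
        char quote = 0; // 0 means "not inside a quoted string"
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (quote != 0) {
                if (c == quote) {
                    quote = 0; // closing quote reached
                }
            } else if (c == '"' || c == '\'') {
                quote = c; // opening quote: skip everything up to its match
            } else if (line.startsWith(op, i)) {
                String left = clean(line.substring(0, i));
                String right = clean(line.substring(i + op.length()));
                return new Parts(left, op, right);
            }
        }
        return null;
    }

    /**
     * Trims whitespace and removes one pair of matching outer quotes.
     * Assumption: whitespace inside the quotes is trimmed as well, since
     * the example above maps "foob " to left="foob".
     */
    private static String clean(String s) {
        s = s.trim();
        int n = s.length();
        if (n >= 2 && (s.charAt(0) == '"' || s.charAt(0) == '\'')
                && s.charAt(n - 1) == s.charAt(0)) {
            s = s.substring(1, n - 1).trim();
        }
        return s;
    }
}
```

With this sketch, OperatorSplitter.split("fooobbar", "ob") gives left "foo" and right "bar", and the quoted "foob " case is not split inside the quotes. Note that a first-match rule like this also splits unquoted text that merely contains the operator characters (for example a word such as "robot"), which a longest-match lexer alone cannot resolve either.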

Thanks


Michael B. Mast

Jan 13, 2016, 11:38:15 AM
to sab...@googlegroups.com

I suspect from your description that you have not defined the operator tokens before the grammar.  It would help if you would post your grammar.

Michael B. Mast

Jan 13, 2016, 11:46:53 AM
to sab...@googlegroups.com

Also I suggest reviewing some grammars that produce simple calculators.  There may be one in the old SableCC grammar archive file or you can just Google one.  Here’s one:  http://improve.dk/writing-a-calculator-in-csharp-using-sablecc/

This grammar file is a little more complex than needed to just illustrate how to build a calculator, but the added complexity shouldn’t be too distracting.

