Parser Difficulty for G-Code


Kent Bowling

Jan 1, 2015, 3:15:04 PM
to antlr-di...@googlegroups.com
I'm attempting to write a Python program that, among other things, parses G-Code (the language used to control CNC machines) using ANTLR4. Since I'm targeting Python, I'd like to avoid embedding code in the grammar.

The grammar of G-Code basically consists of blocks (which are just lines in the G-Code implementation I'm dealing with). A block can contain multiple instructions, but not duplicates of certain types; for instance, there are "M" codes, and there can be only one per block. In the grammar I have something like:

block : (lineNumber)? WS* ((comment | instruction) WS*)* EOB;

instruction : gcode
            | mcode
            | axis
            | tcode
            | scode
            | fcode
            | pcode
            | hcode
            | rcode
            ;
So a legal block for instance could be:

G0 X1.0 Y2.0 Z3.0 M03

But an illegal one could be for instance:

G0 X1.0 M03 Y2.0 Z3.0 M03

The fact that it has two M codes (in this case, two of the same M code) makes it illegal. It would also be illegal if it had two X, Y, or Z movements, or another G code from the same "group" (which is a whole other can of worms).

Currently I'm parsing the G-Code into a tree, then using a walker over the tree as a "second" syntax checker to detect these cases. (FWIW, the listener I created uses a "composite pattern" to call multiple listeners for the various rules, and doing this in Python with getattr/setattr was pretty cool.)
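A minimal sketch of that composite dispatch in plain Python (not tied to the ANTLR runtime; the class and method names here are made up for illustration):

```python
class CompositeListener:
    """Fans each enter*/exit* call out to a list of child listeners.

    __getattr__ only fires for names this class doesn't define, so any
    enter*/exit* rule method is forwarded dynamically; child listeners
    implement only the rules they care about.
    """

    def __init__(self, listeners):
        self._listeners = listeners

    def __getattr__(self, name):
        if name.startswith(("enter", "exit")):
            def dispatch(*args, **kwargs):
                for listener in self._listeners:
                    method = getattr(listener, name, None)
                    if method is not None:
                        method(*args, **kwargs)
            return dispatch
        raise AttributeError(name)


class MCodeCounter:
    """Hypothetical child listener that only watches one rule."""

    def __init__(self):
        self.count = 0

    def enterMcode(self, ctx):
        self.count += 1


counter = MCodeCounter()
composite = CompositeListener([counter])
composite.enterMcode(None)   # forwarded to MCodeCounter
composite.enterGcode(None)   # no child handles it; silently ignored
print(counter.count)         # -> 1
```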

Is there any way to put these checks in the original grammar, or is my approach the only way to keep the grammar language-agnostic (at the cost of its being an incomplete grammar)?

Sam Harwell

Jan 1, 2015, 6:17:33 PM
to antlr-di...@googlegroups.com

Hi Kent,


It is possible to put these rules in the original grammar, but it is not recommended for several reasons. Placing these types of semantic rules in the post-parse stage improves the ability to report understandable error messages, and greatly improves the ability of the parser to recover from syntax errors. In many cases it also improves the performance of the parser and reduces the size of both the generated code and the dynamic cache that is created at runtime.
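As a concrete sketch of such a post-parse check (plain Python; representing a block as (letter, value) pairs is just an assumption for illustration, e.g. as collected by a tree listener), the duplicate check and its readable error message become trivial:

```python
from collections import Counter

# Letters that may appear at most once per block (illustrative subset).
AT_MOST_ONCE = {"M", "X", "Y", "Z"}


def check_block(words):
    """Return a list of readable error messages for one block.

    `words` is the block's instruction words as (letter, value) pairs.
    """
    counts = Counter(letter for letter, _ in words)
    return [
        f"block has {n} {letter} words but only one is allowed"
        for letter, n in sorted(counts.items())
        if letter in AT_MOST_ONCE and n > 1
    ]


# Legal block: G0 X1.0 Y2.0 Z3.0 M03
print(check_block([("G", "0"), ("X", "1.0"), ("Y", "2.0"),
                   ("Z", "3.0"), ("M", "03")]))       # -> []

# Illegal block: G0 X1.0 M03 Y2.0 Z3.0 M03
print(check_block([("G", "0"), ("X", "1.0"), ("M", "03"),
                   ("Y", "2.0"), ("Z", "3.0"), ("M", "03")]))
# -> ['block has 2 M words but only one is allowed']
```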


Sam


Kent Bowling

Jan 2, 2015, 12:30:01 AM
to antlr-di...@googlegroups.com
Sam,

Thanks for the reply. I can certainly see what you're saying about having better control over error messages. It's easier to interpret "I saw 2 of X" than "I saw an X but was expecting an A, B, C, ... or Z" :). I don't feel too bad about doing it the way I'm doing it now.

One thing that would be nice, though, is to have a grammar that others could use for other language targets. If someone wanted to write a Java program using my grammar, they'd have to re-implement the rules I've added in my "2nd syntax check". I'm also going to be checking things that would be impossible to check in a grammar in any event (for instance, arcs in G-Code use I, J, and K numbers, but those numbers have to make mathematical sense).

You mention it's possible to check in the grammar that there's only one of each command type. If I were to do it in rules, would I have to essentially list every permutation of tokens (which is a massive list of permutations in my case), or is there a better way? For instance, if I had a language where every block could contain zero or one A, B, or C (followed by some EOL), would my rule have to look like this?

block : A B? C? EOL
      | A C? B? EOL
      | B A? C? EOL
      | B C? A? EOL
      | C A? B? EOL
      | C B? A? EOL
      | EOL // Empty block, which is valid
      ;
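For a sense of scale (plain arithmetic, nothing ANTLR-specific): listing one alternative per ordering means n! alternatives for n at-most-once symbols, so this pattern blows up quickly:

```python
import math

# Alternatives needed if each ordering gets its own grammar line:
# the A/B/C example above needs 3! = 6 (plus the empty block).
for n in (3, 5, 9):
    print(n, math.factorial(n))
# 3 -> 6, 5 -> 120, 9 -> 362880 (nine instruction types)
```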

Jim Idle

Jan 2, 2015, 1:10:17 AM
to antlr-di...@googlegroups.com
You are on the right path. In fact, it is usually better to make your parser rules accept anything that might make sense; the parser will stay simple.

I say this about twice a month, but pushing error detection as far down the tool chain as you can almost always gives better feedback to the user of your tool chain. Don't check for valid characters in the lexer spec; accept anything unambiguous and check later. Trivially, if B is not a valid character in an ID, accept it anyway and later report "identifier ABC is an invalid name because B is not allowed". And so on down the chain. Another advantage is being able to report many errors in one pass instead of giving up at the first one.
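A sketch of that lenient-then-check style in Python (the lexing rule and the "B is invalid" rule are made up; the point is that the check happens after tokenizing, with the whole identifier in hand):

```python
import re

# Permissive lexer: any run of letters/digits is one ID token.
TOKEN = re.compile(r"[A-Za-z0-9]+")


def check_identifier(ident):
    """Semantic rule applied after lexing: pretend 'B' is not allowed."""
    if "B" in ident:
        return (f"identifier {ident} is an invalid name "
                f"because B is not allowed")
    return None


tokens = TOKEN.findall("ABC XYZ")
errors = [msg for t in tokens if (msg := check_identifier(t))]
print(errors)
# -> ['identifier ABC is an invalid name because B is not allowed']
```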

Jim
