Empty token sequence

Cesare Zecca

Aug 30, 2007, 6:31:20 AM

(I tried to continue Aravinda's thread "empty token sequence error", but that discussion is closed and no longer accepts replies.)

There is very little documentation around (in this group, too) about the "empty token sequence" issue, so let's pick up Aravinda's thread again.

Well, after defining the "canonical" ID lexical rule, I added the lexical rule for group identifiers to Gpl.jj (the grammar was mentioned previously in the thread

Avoid SKIPping while producing another terminal / Unit tests for JavaCC tokens and productions
http://groups.google.com/group/comp.compilers.tools.javacc/browse_thread/thread/e33401b7c7c81b66
).

A group identifier is an identifier followed by zero or more further identifiers, separated by a given string (currently a single dot character):

[I] <GROUP_ID : <ID> ( <GROUP_ID_SEPARATOR> <ID> )* >

[II] <GROUP_ID : <ID> <GROUP_ID_SEPARATOR> <ID> ( <GROUP_ID_SEPARATOR> <ID> )* >
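To see how the two definitions interact with ID, here is a small plain-Java sketch of the token-selection rule from the JavaCC FAQ entry cited later in this post: the longest match wins, and on a tie the rule declared first wins. It uses java.util.regex as a stand-in for JavaCC's generated token manager (an assumption made only for illustration; for these simple patterns greedy matching yields the longest match), and the class name MaximalMunch and the "KIND:image" output format are hypothetical. Only identifiers, dots and whitespace are handled.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of "longest match; on a tie, the earlier rule wins",
// with java.util.regex standing in for JavaCC's generated token manager.
public class MaximalMunch {
    static final String ID = "[A-Za-z_][A-Za-z_0-9]*";
    static final String SEP = "\\.";

    // Rules in declaration order, as in Gpl.jj: ID first, then GROUP_ID.
    static Map<String, Pattern> rules(boolean versionII) {
        String groupId = versionII
            ? ID + SEP + ID + "(?:" + SEP + ID + ")*"   // [II]: at least one separator
            : ID + "(?:" + SEP + ID + ")*";              // [I]: zero or more separators
        Map<String, Pattern> rules = new LinkedHashMap<>();
        rules.put("ID", Pattern.compile(ID));
        rules.put("GROUP_ID", Pattern.compile(groupId));
        return rules;
    }

    // Returns "KIND:image" strings; whitespace is skipped, like the SKIP rule.
    static List<String> tokenize(String input, boolean versionII) {
        Map<String, Pattern> rules = rules(versionII);
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            if (Character.isWhitespace(input.charAt(pos))) {
                pos++;
                continue;
            }
            String bestKind = null;
            int bestEnd = pos;
            for (Map.Entry<String, Pattern> rule : rules.entrySet()) {
                Matcher m = rule.getValue().matcher(input);
                m.region(pos, input.length());
                // Only a strictly longer match replaces the current best,
                // so on a tie the rule declared earlier keeps the token.
                if (m.lookingAt() && m.end() > bestEnd) {
                    bestKind = rule.getKey();
                    bestEnd = m.end();
                }
            }
            if (bestKind == null) {
                throw new IllegalArgumentException("Lexical error at offset " + pos);
            }
            tokens.add(bestKind + ":" + input.substring(pos, bestEnd));
            pos = bestEnd;
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("a b.c", false)); // [ID:a, GROUP_ID:b.c]
        System.out.println(tokenize("a b.c", true));  // [ID:a, GROUP_ID:b.c]
    }
}
```

Under either definition a lone identifier such as `a` comes out as ID and `b.c` comes out as GROUP_ID, so the token stream is the same. That is consistent with the observation that switching between [I] and [II] does not change the warning, and it is why GroupId() needs an <ID> alternative if a single identifier is to be accepted there.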

The second version, [II], was an attempt to get rid of the warning described below.
Here is Gpl.jj:

------------------------
options
{
STATIC = true;
UNICODE_INPUT = true;
}

PARSER_BEGIN(Gpl)

package it.finmatica.gpj.ec.istruzioni.gpl;

public class Gpl
{
public static
void
main( String args[] ) throws ParseException
{
Gpl lParser = new Gpl( System.in );
lParser.ExpressionList();
}
} // end class Gpl

PARSER_END(Gpl)

SKIP :
{ " "
| "\t"
| "\n"
| "\r"
}

TOKEN : // Identifiers
{ <ID : ["a"-"z", "A"-"Z","_"] ( ["a"-"z", "A"-"Z", "_", "0"-"9"] )* >
}

TOKEN : // Group identifiers
{ <GROUP_ID : <ID> ( <GROUP_ID_SEPARATOR> <ID> )* >
| < #GROUP_ID_SEPARATOR: "." >
}

/***
TOKEN : // Group identifiers
{ <GROUP_ID : <ID> <GROUP_ID_SEPARATOR> <ID> ( <GROUP_ID_SEPARATOR> <ID> )* >
| < #GROUP_ID_SEPARATOR: "." >
}
***/

void
ExpressionList() :
{
String lString;
}
{
{
System.out.println( "Type in an expression followed by a \";\" or ^D to quit:" );
System.out.println( "" );
}

( lString = Expression() ";"
{
System.out.println( lString );
System.out.println("");
System.out.println("Type in another expression followed by a \";\" or ^D to quit:" );
System.out.println("");
}
)*
<EOF>
} // end Gpl.ExpressionList()

String
Expression() :
{
java.util.Vector<String> lTermImage = new java.util.Vector<String>();
String lResult;
}
{
lResult = Term()
{
lTermImage.addElement( lResult );
}

( "+" lResult = Term()
{
lTermImage.addElement( lResult );
}
)*

{
if ( lTermImage.size() == 1 )
{
lResult = lTermImage.elementAt(0);
return lResult;
}
else
{
lResult = "the sum of " + lTermImage.elementAt(0);
for ( int i = 1; i < lTermImage.size() - 1; i++ )
{
lResult += ", " + lTermImage.elementAt(i);
}
if ( lTermImage.size() > 2 )
lResult += ",";
lResult += " and " + lTermImage.elementAt( lTermImage.size() - 1 );
return lResult;
}
}
} // end Gpl.Expression()

String
Term() :
{
java.util.Vector<String> lFactorImage = new java.util.Vector<String>();
String lResult;
}
{
lResult = Factor()
{
lFactorImage.addElement( lResult );
}
(
"*" lResult = Factor()
{
lFactorImage.addElement( lResult );
}
)*
{
if ( lFactorImage.size() == 1 )
{
lResult = lFactorImage.elementAt(0);
return lResult;
}
else
{
lResult = "the product of " + lFactorImage.elementAt(0);
for ( int i = 1; i < lFactorImage.size() - 1; i++ )
{
lResult += ", " + lFactorImage.elementAt(i);
}
if ( lFactorImage.size() > 2 )
lResult += ",";
lResult += " and " + lFactorImage.elementAt( lFactorImage.size() - 1 );
return lResult;
}
}
} // end Gpl.Term()

String
Factor() :
{
Token lToken;
String lResult;
}
{
{
lResult = Id();
return lResult;
}
| {
lResult = GroupId();
return lResult;
}
} // end Gpl.Factor()

String
Id() :
{
Token lToken;
String lResult;
}
{
lToken = <ID>
{
lResult = lToken.image;
return lResult;
}
} // end Gpl.Id()

String
GroupId() :
{
Token lToken;
String lResult;
}
{
lToken = <GROUP_ID>
{
lResult = lToken.image;
return lResult;
}
|
lToken = <ID>
{
lResult = lToken.image;
return lResult;
}
} // end Gpl.GroupId()

// Eof(): in unit tests, requiring <EOF> lets you test a complete input
// rather than just a prefix of it (see also:
// http://groups.google.com/group/comp.compilers.tools.javacc/browse_thread/thread/e33401b7c7c81b66/cede39f37111bc73#cede39f37111bc73)
String
Eof() :
{
Token lToken;
String lResult;
}
{
lToken = <EOF>
{
lResult = lToken.image;
return lResult;
}
} // end Gpl.Eof()
------------------------
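The semantic actions of Expression() and Term() repeat the same joining logic. The following plain-Java sketch reproduces that logic outside JavaCC so it can be checked on its own; the class Verbalize and the method describe are hypothetical names, not part of Gpl.jj, and the method assumes at least one operand, which the grammar guarantees.

```java
import java.util.List;

// Hypothetical helper reproducing the string-building in Expression() ("sum")
// and Term() ("product"). Assumes operands is non-empty.
public class Verbalize {
    static String describe(String operation, List<String> operands) {
        if (operands.size() == 1) {
            return operands.get(0); // a single operand is returned unchanged
        }
        StringBuilder sb = new StringBuilder("the " + operation + " of " + operands.get(0));
        for (int i = 1; i < operands.size() - 1; i++) {
            sb.append(", ").append(operands.get(i));
        }
        if (operands.size() > 2) {
            sb.append(","); // serial comma before "and" with three or more operands
        }
        sb.append(" and ").append(operands.get(operands.size() - 1));
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe("sum", List.of("x")));           // x
        System.out.println(describe("product", List.of("a", "b")));  // the product of a and b
        System.out.println(describe("sum", List.of("a", "b", "c"))); // the sum of a, b, and c
    }
}
```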

<ID> and <GROUP_ID> can both match a prefix of the same input.
Following the suggestion in

3.3. What if more than one regular expression matches a prefix of the remaining input?
http://www.engr.mun.ca/~theo/JavaCC-FAQ/

I wrote the nonterminal GroupId() (see above).
Nevertheless, I got the following warning (switching between definitions [I] and [II] does not clear it):

Warning: Line 153, Column 9: This choice can expand to the empty
token sequence and will therefore always be taken in favor of the
choices appearing later.

JavaCC complains that, within the Factor() nonterminal, Id() might be chosen to the detriment of the following choice (GroupId()).

What does it mean? What's wrong, if anything?
Excuse me if these questions are naive; I've found only a brief mention of the empty token sequence issue (http://www.engr.mun.ca/~theo/JavaCC-Tutorial/, p. 25, "Maximal munch").

Is there any documentation about the empty token sequence issue?

Thanks in advance for any clarification.

ciao
Cesare

AC

Aug 30, 2007, 8:52:16 AM

Cesare Zecca wrote:

> Warning: Line 153, Column 9: This choice can expand to the empty
> token sequence and will therefore always be taken in favor of the
> choices appearing later.
>
>
> javacc complains that within Factro() non terminal the Id() might be
> choosen to the detriment of the following choice (GroupId())

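In this case the problem is in Factor(): both calls are erroneously inside Java code sections, so to JavaCC it looks as if Factor() calls no productions and matches no tokens. Each alternative therefore expands to the empty token sequence, and the first one is always taken.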
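The effect can be illustrated with a simplified model of a one-token-lookahead choice. This is a hypothetical sketch, not the code JavaCC generates: each alternative is described by the tokens it can start with and by whether it can match without consuming any token. An alternative of the latter kind never needs to look at the next token, so it is always taken in favor of later choices, which is what the warning says. The class EmptyChoice and its fields are illustrative names only.

```java
import java.util.List;
import java.util.Set;

// Hypothetical model of a one-token-lookahead choice point
// (NOT the code JavaCC generates).
public class EmptyChoice {
    static final class Alternative {
        final String name;
        final Set<String> first;  // tokens this alternative can start with
        final boolean nullable;   // true if it can match without consuming a token

        Alternative(String name, Set<String> first, boolean nullable) {
            this.name = name;
            this.first = first;
            this.nullable = nullable;
        }
    }

    // Returns the name of the first alternative that applies to nextToken,
    // or null if none applies (a parse error).
    static String choose(List<Alternative> alternatives, String nextToken) {
        for (Alternative alt : alternatives) {
            // A nullable alternative does not depend on nextToken at all,
            // so it is taken before any later alternative is examined.
            if (alt.nullable || alt.first.contains(nextToken)) {
                return alt.name;
            }
        }
        return null;
    }

    // Original Factor(): both calls sit inside Java blocks, so to the
    // grammar each alternative consumes no tokens.
    static final List<Alternative> BUGGY_FACTOR = List.of(
        new Alternative("Id", Set.of(), true),
        new Alternative("GroupId", Set.of(), true));

    // Fixed Factor(): Id() starts with <ID>; GroupId() with <GROUP_ID> or <ID>.
    static final List<Alternative> FIXED_FACTOR = List.of(
        new Alternative("Id", Set.of("ID"), false),
        new Alternative("GroupId", Set.of("GROUP_ID", "ID"), false));

    public static void main(String[] args) {
        System.out.println(choose(BUGGY_FACTOR, "GROUP_ID")); // Id (GroupId is unreachable)
        System.out.println(choose(FIXED_FACTOR, "GROUP_ID")); // GroupId
        System.out.println(choose(FIXED_FACTOR, "ID"));       // Id
    }
}
```

In the fixed model, an <ID> can still start both alternatives and the first one is picked; that overlap is the lookahead issue that comes up later in this thread.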

----------

String
Factor() :
{
Token lToken;
String lResult;
}

{
{
lResult = Id(); // bug: inside {javacode}
return lResult;
}
| {
lResult = GroupId(); // bug: inside {javacode}
return lResult;
}

} // end Gpl.Factor()

----------

String
Factor() :
{
Token lToken;
String lResult;
}

{
lResult = Id() // fix: outside {javacode}
{ return lResult; }
|
lResult = GroupId() // fix: outside {javacode}
{ return lResult; }

} // end Gpl.Factor()

----------

Hope this helps!

Cesare Zecca

Aug 31, 2007, 11:26:08 AM

On 30 Aug, 14:52, AC <u...@domain.invalid> wrote:

> Hope this helps!

Thanks, AC.
It works fine. The warning about the empty token sequence has disappeared.

Now I have to solve a lookahead warning: even though I followed, within Factor(), the suggestion

3.3. What if more than one regular expression matches a prefix of the remaining input?

ID and GROUP_ID share a common prefix and cause a lookahead problem.
But that is another story, and I have started studying the lookahead tutorial
https://javacc.dev.java.net/doc/lookahead.html

Once again, many thanks. :)
Have a nice weekend!
Cesare

AC

Sep 1, 2007, 8:06:17 AM

Cesare Zecca wrote:
> ID and GROUP_ID share a common prefix and cause a lookahead problem.

Actually, the problem is that there are two ways to parse an <ID>,
either
Factor() -> Id() -> <ID>
or
Factor() -> GroupId() -> <ID>
With a single token of lookahead, the parser can't tell which one to choose.

For the example grammar this redundancy is unnecessary, so the fix is to eliminate it.

Approach 1: Eliminate Id() and GroupId() and replace the body of
Factor() with the body of GroupId(). This results in the simplest
grammar. Then <ID> would be parsed via
Factor() -> <ID>

Approach 2: Remove <ID> from GroupId(), since it is already covered by
the Id() call in Factor(). Then <ID> would be parsed via
Factor() -> Id() -> <ID>
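Both approaches remove the overlap between the alternatives' FIRST sets (the tokens each alternative can start with). A small, hypothetical Java check makes this concrete: it reports every token that can start more than one alternative of a choice, which is the situation a single token of lookahead cannot resolve. The FIRST sets are written out by hand from the grammar above rather than computed, and the class name FirstSetConflicts is illustrative only.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical check for one-token-lookahead conflicts at a choice point.
public class FirstSetConflicts {
    // Returns every token that appears in the FIRST set of more than one alternative.
    static Set<String> conflicts(List<Set<String>> firstSets) {
        Set<String> seen = new HashSet<>();
        Set<String> conflicting = new HashSet<>();
        for (Set<String> first : firstSets) {
            for (String token : first) {
                if (!seen.add(token)) { // already seen in an earlier alternative
                    conflicting.add(token);
                }
            }
        }
        return conflicting;
    }

    public static void main(String[] args) {
        // Fixed Factor(): Id() | GroupId(), with GroupId() -> <GROUP_ID> | <ID>
        System.out.println(conflicts(List.of(Set.of("ID"), Set.of("GROUP_ID", "ID")))); // [ID]
        // Approach 1: Factor() -> <GROUP_ID> | <ID>
        System.out.println(conflicts(List.of(Set.of("GROUP_ID"), Set.of("ID"))));       // []
        // Approach 2: Factor() -> Id() | GroupId(), with GroupId() -> <GROUP_ID>
        System.out.println(conflicts(List.of(Set.of("ID"), Set.of("GROUP_ID"))));       // []
    }
}
```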

Hope this helps!