Computation of average in lexer and parser

Phuc Luoi

unread,

Jul 25, 2012, 5:45:28 PM7/25/12

to sab...@googlegroups.com

Hi Etienne,

Today I let the tool FindBug (http://findbugs.sourceforge.net/) inspect my project and the generated
parser. It tells me somethink like:

Computation of average could overflow
The code computes the average of two integers using either division or signed right shift, and then uses the result as the index of an array. If the values being averaged are very large, this can overflow (resulting in the computation of a negative average). Assuming that the result is intended to be nonnegative, you can use an unsigned right shift instead. In other words, rather that using (low+high)/2, use (low+high) >>> 1

This bug exists in many earlier implementations of binary search and merge sort. Martin Buchholz found and fixed it in the JDK libraries, and Joshua Bloch widely publicized the bug pattern.

These codes are defined in lexer.txt (line 141) and parser.txt (line 104).

Can the lexer's table and the shift/reduce table so big that it makes overflow?

Best
Hong Phuc

Etienne Gagnon

unread,

Jul 25, 2012, 6:32:13 PM7/25/12

to sab...@googlegroups.com

Hi Hong Phuc,

Wow! Interesting catch.

To answer your question: You're likely to hit out-of-memory errors way before this bug. I'm guessing that SableCC itself won't be able to compute the tables in the first place if they are so huge (once compressed!).

I don't think that you'll be able to get such huge tables using human-written normal grammars. Of course, if you intently write grammars that cause exponential automaton explosion or if you generate humongous grammars automatically, you might be able to reach such limits.

In theory, you could also play with some SableCC 3 command-line inlining directives and some nearly ambiguous grammar to allow exponential grammar growth through inlining but, even then, I think that automaton computation time will be the problem that will first hit you. :)

Yet, in the binary search code of SableCC 3.x, it is worth doing the proposed change because dividing using an unsigned-shift-right is likely to be faster on modern processors... I think that I'll adopt it.

Thanks,

Etienne

Etienne Gagnon, Ph.D.
http://sablecc.org

On 2012-07-25 17:45, Phuc Luoi wrote:

Hi Etienne,

Today I let the tool FindBug (http://findbugs.sourceforge.net/) inspect my project and the generated
parser. It tells me somethink like:

[...]

In other words, rather that using (low+high)/2, use (low+high) >>> 1

[...]

Etienne Gagnon

unread,

Jul 26, 2012, 9:03:15 AM7/26/12

to sab...@googlegroups.com

Actually, the overflow is even less likely than I thought; SableCC uses
binary search for character intervals in the lexer, and for tokens in
the parser. So unless you end up with more than 2 billion tokens or 2
billion intervals in your grammar, you won't experience an integer
overflow in the binary search.

Phuc Luoi

unread,

Jul 26, 2012, 9:12:08 AM7/26/12

to sab...@googlegroups.com

Hi Etienne,

I also think, that the chance that my grammar too big is, is approximated null. Practice it will never reach 2 billion tokens.

Hong Phuc

--
-- You received this message because you are subscribed to the SableCC group. To post to this group, send email to sab...@googlegroups.com. To unsubscribe from this group, send email to sablecc+unsubscribe@googlegroups.com. For more options, visit this group at https://groups.google.com/d/forum/sablecc?hl=en

Niklas Matthies

unread,

Jul 26, 2012, 9:16:16 AM7/26/12

to sab...@googlegroups.com

On Thu 2012-07-26 at 09:03h, you wrote:
> Actually, the overflow is even less likely than I thought; SableCC
> uses binary search for character intervals in the lexer, and for
> tokens in the parser. So unless you end up with more than 2 billion
> tokens or 2 billion intervals in your grammar, you won't experience
> an integer overflow in the binary search.

I thought SableCC is using BigInteger for tokens?

-- Niklas Matthies

Etienne Gagnon

unread,

Jul 27, 2012, 6:42:29 AM7/27/12

to sab...@googlegroups.com

Hi Niklas,

It"s SableCC 4 that uses big integers, not SableCC 3.

Have fun!

Etienne

Etienne Gagnon, Ph.D.
http://sablecc.org

Etienne Gagnon

unread,

Aug 7, 2012, 7:57:06 PM8/7/12

to sab...@googlegroups.com

Hi Hong Phuc,

I have released SableCC 3.5 that uses unsigned shift right for the division.

Have fun!

Etienne

On Wednesday, July 25, 2012 5:45:28 PM UTC-4, Phuc Luoi wrote:

[...]Assuming that the result is intended to be nonnegative, you can use an unsigned right shift instead. In other words, rather that using (low+high)/2, use (low+high) >>> 1

Phuc Luoi

unread,

Aug 8, 2012, 7:37:26 AM8/8/12

to sab...@googlegroups.com

Hi Etienne,

I'm pleased to see your email. I have seen the post in Facebook too.

Hong Phuc

--
-- You received this message because you are subscribed to the SableCC group. To post to this group, send email to sab...@googlegroups.com. To unsubscribe from this group, send email to sablecc+u...@googlegroups.com. For more options, visit this group at https://groups.google.com/d/forum/sablecc?hl=en

Reply all

Reply to author

Forward