read_numeral never handles negative numbers

Summpot

Feb 12, 2024, 5:46:21 AM
to lua-l
diff --git a/llex.c b/llex.c
index 9f20d3c8..4d218588 100644
--- a/llex.c
+++ b/llex.c
@@ -4,6 +4,7 @@
 ** See Copyright Notice in lua.h
 */
 
+#include <stdio.h>
 #define llex_c
 #define LUA_CORE
 
@@ -246,11 +247,13 @@ static int read_numeral (LexState *ls, SemInfo *seminfo) {
     lexerror(ls, "malformed number", TK_FLT);
   if (ttisinteger(&obj)) {
     seminfo->i = ivalue(&obj);
+    printf("%lld\n",seminfo->i);
     return TK_INT;
   }
   else {
     lua_assert(ttisfloat(&obj));
     seminfo->r = fltvalue(&obj);
+    printf("%lf\n",seminfo->r);
     return TK_FLT;
   }
 }

The output of the above diff is as follows.
$ echo "print(-9223372036854775807)" | ./lua
9223372036854775807
-9223372036854775807
$ echo "print(-9223372036854775808)" | ./lua
9223372036854775808.000000
-9.2233720368548e+18

As you can see, the isneg check in l_str2int has no effect here, and the overflow detection is so aggressive that the smallest int64 value cannot be stored as an integer. It is also unclear how the negative value ends up being printed correctly (perhaps through the unary minus operator).

bil til

Feb 12, 2024, 6:03:03 AM
to lu...@googlegroups.com
But your "miracle number" is 0x7FFF FFFF FFFF FFFF ... maybe you
should mention this (so 0x7FF... on a 64-bit system; I assume you
use a 64-bit system).

With this number and its neighbour 0x8000 0000 0000 0000, better be
VERY cautious if you handle integers (see the explanation in the
Wikipedia article "Signed number representations", in particular
0x80 and "Offset binary").

If you reach such large integers without understanding what is going
on, better do NOT use integers.

Float is NO ideal world either: you will get other "strange counting
errors" if you use float for counting above 2^24 for a 32-bit float,
or above 2^53 for a 64-bit float (see the Wikipedia articles on
32-bit / 64-bit floats).
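
The exact thresholds follow from the significand width, assuming IEEE 754 formats: 24 bits for a 32-bit float and 53 bits for a 64-bit double. A small C sketch (the helper names are made up for illustration) checks where adding 1 stops changing the value:

```c
/* Sketch only: assumes float and double are IEEE 754 binary32/binary64. */

/* Returns 1 if x + 1 is distinguishable from x in 32-bit float. */
static int float_can_count_past (float x) {
  volatile float y = x + 1.0f;  /* volatile forces rounding to float */
  return y != x;
}

/* Returns 1 if x + 1 is distinguishable from x in 64-bit double. */
static int double_can_count_past (double x) {
  volatile double y = x + 1.0;  /* volatile forces rounding to double */
  return y != x;
}
```

Under those assumptions, float_can_count_past(16777216.0f) (that is, 2^24) and double_can_count_past(9007199254740992.0) (that is, 2^53) both return 0, while the values one below each threshold still return 1.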

Summpot

Feb 12, 2024, 9:13:42 AM
to lua-l
The problem is that 0x8000 0000 0000 0000 can be stored as an integer whereas -9223372036854775808 can't, even though they are obviously the same number.
The cause of this problem is in the following code.
lua/lobject.c at master · lua/lua (github.com)
static const char *l_str2int (const char *s, lua_Integer *result) {
  lua_Unsigned a = 0;
  int empty = 1;
  int neg;
  while (lisspace(cast_uchar(*s))) s++;  /* skip initial spaces */
  neg = isneg(&s);
  if (s[0] == '0' &&
      (s[1] == 'x' || s[1] == 'X')) {  /* hex? */
    s += 2;  /* skip '0x' */
    for (; lisxdigit(cast_uchar(*s)); s++) {
      a = a * 16 + luaO_hexavalue(*s);
      empty = 0;
    }
  }
  else {  /* decimal */
    for (; lisdigit(cast_uchar(*s)); s++) {
      int d = *s - '0';
      if (a >= MAXBY10 && (a > MAXBY10 || d > MAXLASTD + neg))  /* overflow? */
        return NULL;  /* do not accept it (as integer) */
      a = a * 10 + d;
      empty = 0;
    }
  }
  while (lisspace(cast_uchar(*s))) s++;  /* skip trailing spaces */
  if (empty || *s != '\0') return NULL;  /* something wrong in the numeral */
  else {
    *result = l_castU2S((neg) ? 0u - a : a);
    return s;
  }
}
It uses the isneg function to determine whether the number is negative, and that result decides whether the value is checked against the int64 maximum 9223372036854775807 or the int64 minimum -9223372036854775808.
But isneg clearly cannot work when called from the lexer, because the argument s never contains a negative sign.
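
To make the role of neg concrete, here is a minimal C sketch of the decimal branch only. It assumes a 64-bit lua_Integer; str2int_sketch and the fixed-width MAXBY10/MAXLASTD definitions are illustrative stand-ins, not the actual Lua source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for Lua's MAXBY10 and MAXLASTD, fixed to 64 bits. */
#define MAXBY10   ((uint64_t)(INT64_MAX / 10))
#define MAXLASTD  ((int)(INT64_MAX % 10))

/* Simplified decimal-only parser. Returns 1 and stores the value on success,
   0 if the numeral is empty, malformed, or does not fit in int64_t. */
static int str2int_sketch (const char *s, int64_t *result) {
  uint64_t a = 0;
  int neg = 0, empty = 1;
  assert(s != NULL && result != NULL);
  if (*s == '-') { neg = 1; s++; }
  else if (*s == '+') s++;
  for (; *s >= '0' && *s <= '9'; s++) {
    int d = *s - '0';
    /* a negative numeral may end in a digit one larger than MAXLASTD */
    if (a >= MAXBY10 && (a > MAXBY10 || d > MAXLASTD + neg))
      return 0;  /* overflow: not representable as an integer */
    a = a * 10 + (uint64_t)d;
    empty = 0;
  }
  if (empty || *s != '\0') return 0;
  /* unsigned negation wraps; converting 2^63 back to signed is
     implementation-defined in C but gives INT64_MIN on common compilers */
  *result = (int64_t)(neg ? 0u - a : a);
  return 1;
}
```

With this sketch, "-9223372036854775808" is accepted, while "9223372036854775808" without the sign (which is what the lexer passes after splitting off the "-") is rejected.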

bil til

Feb 12, 2024, 9:34:05 AM
to lu...@googlegroups.com
Trying to write a "general function" strtoint which works at
0x8000... is generally a misleading idea that leads straight into a
minefield.

You should better concentrate on a function calculating the difference
between two numbers: if the 0x80... sign bit is set, cast both numbers
to signed int, forcing the compiler to use signed subtraction; if it
is not set, cast both numbers to unsigned int, forcing the compiler to
use unsigned subtraction.

And if you insist on dreaming of two functions "sub_signed" and
"sub_unsigned", then do a careful analysis of the sign bit range and
return a reasonable result ... but the user has to know that there IS
such an ambiguity for int numbers around 0x80..., otherwise you cannot
really help such a user.

(Such a subtract function will allow nice subtracting for all time
numbers or um axis mechanical encoder numbers. If you need such a
function for other applications, please describe very exactly the
application where you dream of using such ambiguous 0x80... int
numbers. Every "carefully programmed" application must avoid this
number range around 0x80000... for int.)

Roberto Ierusalimschy

Feb 12, 2024, 9:47:42 AM
to lu...@googlegroups.com
I am not sure what you are trying to say. The subject is correct:
"read_numeral never handles negative numbers". This is by design,
and it is not exclusive to Lua.

Consider the expression x-1. If the lexer handled negative numbers,
the parser would see two tokens: "x" and "-1", and it could not figure
out that this was a binary operation. So Lua, like Java and many other
languages, reads that as three tokens: "x", "-", and "1". Because the
lexer has no context, anywhere it sees "-1" it will read that as two
tokens, "-" and "1". So, read_numeral never handles negative numbers,
as you correctly pointed out.
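
A toy lexer shows the effect. This is only a sketch: tokenize_sketch and TOK_MAXLEN are hypothetical names, and real lexers (including Lua's) are considerably more involved.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define TOK_MAXLEN 32

/* Toy lexer: splits src into names, unsigned numerals, and one-character
   operators, without any context. Returns the number of tokens stored. */
static int tokenize_sketch (const char *src, char out[][TOK_MAXLEN], int max) {
  int n = 0;
  assert(src != NULL && out != NULL);
  while (*src != '\0' && n < max) {
    size_t len = 0, copy;
    if (isspace((unsigned char)*src)) { src++; continue; }
    if (isdigit((unsigned char)*src)) {  /* numeral: digits only, no sign */
      while (isdigit((unsigned char)src[len])) len++;
    }
    else if (isalpha((unsigned char)*src) || *src == '_') {  /* name */
      while (isalnum((unsigned char)src[len]) || src[len] == '_') len++;
    }
    else len = 1;  /* anything else is a one-character operator */
    copy = (len < TOK_MAXLEN) ? len : TOK_MAXLEN - 1;  /* truncate if long */
    memcpy(out[n], src, copy);
    out[n][copy] = '\0';
    n++;
    src += len;
  }
  return n;
}
```

Given "x-1", this yields the three tokens "x", "-", "1"; given "-9223372036854775808", it yields "-" followed by the unsigned numeral "9223372036854775808".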

Java has a special provision for the case of the minimum integer:

The Java® Language Specification
Java SE 8 Edition
Section 3.10 (§15.15.4)
The decimal literal 9223372036854775808L may appear
only as the operand of the unary minus operator

Note that the literal is not "-9223372036854775808L". The minus is
seen as a unary minus operator, not part of the literal. Lua does
the same, but it does not have that special provision, so it sees
9223372036854775808 as an overflow.

-- Roberto

Summpot

Feb 12, 2024, 10:00:23 AM
to lua-l
But someone used isneg in this code, and it needs a negative sign in the token to work properly, which matters for the overflow check. Maybe this case needs special handling in the lexer?

Roberto Ierusalimschy

Feb 12, 2024, 10:19:44 AM
to lu...@googlegroups.com
> But someone used isneg in this code, which requires a negative sign in the
> token to work properly, which relates to overflow checking. Maybe it needs
> special handling in lexer?

"this code" is used by other parts of Lua, not only the lexer.

$ lua
> tonumber("-9223372036854775808")
-9223372036854775808
> tonumber("-9223372036854775809")
-9.2233720368548e+18

It is not difficult to find that out. ('grep' is your friend.)

-- Roberto

Summpot

Feb 12, 2024, 10:30:13 AM
to lua-l
But is it really good that only numeric literals don't work as expected? Don't we need to be consistent?

bil til

Feb 12, 2024, 10:55:50 AM
to lu...@googlegroups.com
I am quite sure that if you write "signed int i = 0x80000000..." in
your C code and then test "i < 0", the result can depend on the C
compiler (either 0 or 1).

This is a "general ambiguous int 0x8000... value problem", not a Lua
problem, and as C is a very speed-concerned language it does not
specify such details too tediously.

(In fact this might very well depend on your CPU, so e.g. ARM might
give a different result compared to the Intel Pentium world for such a
comparison; ARM might even give different results depending on the ARM
version used.)