Fwd: Patch 5.1

10 views
Skip to first unread message

Leo Razoumov

unread,
Jan 17, 2011, 3:58:25 PM1/17/11
to lua-table...@googlegroups.com
Henning suggested that the attached email might be of some interest to
this group.
Here it is, complete with the patch to lua5.1-test suite.

--Leo--


---------- Forwarded message ----------
From: Leo Razoumov <slon...@gmail.com>
Date: Mon, Jan 17, 2011 at 08:22
Subject: Re: Patch 5.1
To: Henning Diedrich <hdie...@eonblast.com>


Hi Henning,
I fixed the lua5.1-tests official test suite by modifying some tests
in lua5.1-tests/main.lua. Since Lua-5.1.4 with libreadline6 under
Linux always echos the input in interactive mode '-i' I changed
expected results to reflect that. I attached the lua5.1-tests patch
below. I also had to disable one test in lua5.1-tests/code.lua (see
the patch attached).

Regarding your lua-count-patch. Performance penalty is negligible.
Here are the results on Dell Latitide E6500 laptop with Intel Core2
Duo P8600  @ 2.40GHz running Ubuntu-10.04 x86 (32 bit) with 4GB of
memory:

baseline lua5.1-tests (average of 10): elapsed wall-clock time t=0.903 seconds
W/ patch-lua-5.1.4-2-tru-count-0.1 (avg of 10): elapsed wall-clock
time t=0.905 seconds

Since the tests are really fast and variability between the runs
exceeds meager 2 milliseconds difference I tend to conclude that for
the use cases in the test suite your patch causes no apparent penalty.

--Leo--

lua5.1-tests%LR1.diff

Henning Diedrich

unread,
Jan 19, 2011, 10:26:53 AM1/19/11
to lua-table...@googlegroups.com, Leo Razoumov
I am trying to find out how much speed the 'tru count' patch consumes.

So far, it looks like around 2% for any table operation. Not the entire program, but the actual inserting or removing of values into a table.

It does not look as if table creation is slowed.

I am exploring if there could be a gain by using the length in the rehash.

On the way I found an interesting animal: the following bit from the table re-hashing parts can be stripped off one arithmetic calculation, 'i-1':

static int numusearray (const Table *t, int *nums) {
  int lg;
  int ttlg;  /* 2^lg */
  int ause = 0;  /* summation of `nums' */
  int i = 1;  /* count to traverse all array keys */
  for (lg=0, ttlg=1; lg<=MAXBITS; lg++, ttlg*=2) {  /* for each slice */
    int lc = 0;  /* counter */
    int lim = ttlg;
    if (lim > t->sizearray) {
      lim = t->sizearray;  /* adjust upper limit */
      if (i > lim)
        break;  /* no more elements to count */
    }
    /* count elements in range (2^(lg-1), 2^lg] */
    for (; i <= lim; i++) {
      if (!ttisnil(&t->array[
i-1]))
        lc++;
    }
    nums[lg] += lc;
    ause += lc;
  }
  return ause;
}
to become this:

static int numusearray (const Table *t, int *nums) {
  int lg;
  int ttlg;  /* 2^lg */
  int ause = 0;  /* summation of `nums' */
  int i = 0;  /* count to traverse all array keys */
  for (lg=0, ttlg=1; lg<=MAXBITS; lg++, ttlg*=2) {  /* for each slice */
    int lc = 0;  /* counter */
    int lim = ttlg;
    if (lim > t->sizearray) {
      lim = t->sizearray;  /* adjust upper limit */
      if (i >= lim)
        break;  /* no more elements to count */
    }
    /* count elements in range (2^(lg-1), 2^lg] */
    for (; i < lim; i++) {
      if (!ttisnil(&t->array[i]))
        lc++;
    }
    nums[lg] += lc;
    ause += lc;
  }
  return ause;
}


Now look at the results --- some things get faster, some slower. Is that for the < vs <= operators?

Lua 5.1.4-2 tru count 0.2
1000000x t[1] =1             201ns per operation
1000000x t[i] =1             224ns per operation
1000000x t[-i]=1             608ns per operation
1000000x t[i] =i             209ns per operation
1000000x t[i] =-i            228ns per operation
1000000x t[1]='a'            200ns per operation
1000000x t[i]='a'            206ns per operation
1000000x t[-i]='a'           253ns per operation
1000000x t[1]={}             490ns per operation
1000000x t[i]={}             659ns per operation
1000000x t[-i]={}            579ns per operation
1000000x t[1M-i]=1           235ns per operation

Lua 5.1.4-2 tru count 0.2 WITH i=0
1000000x t[1] =1             218ns per operation
1000000x t[i] =1             232ns per operation

1000000x t[-i]=1             583ns per operation
1000000x t[i] =i             217ns per operation
1000000x t[i] =-i            215ns per operation
1000000x t[1]='a'            216ns per operation
1000000x t[i]='a'            213ns per operation
1000000x t[-i]='a'           259ns per operation
1000000x t[1]={}             475ns per operation
1000000x t[i]={}             641ns per operation
1000000x t[-i]={}            584ns per operation
1000000x t[1M-i]=1           220ns per operation


Below is the benchmark script. Maybe one of you has an idea why this happens?

I would have expected only gains. Or is my rewrite flawed?


function printf(...)
  io.write(string.format(...))
end

function nanosecs_per_cycle(time, cycles)
    return time * 1000000000 / cycles
end

local t = {}

local function measure(prompt, run)

  local clock = os.clock

  local cycles = 1000000
  local i = 0
  local tm = clock()
 
  while(i < cycles) do run(i); i = i + 1 end
 
  tm = clock() - tm

 
  printf("%dx %-12s %10.0fns per operation\n", cycles, prompt, nanosecs_per_cycle(tm, cycles))

end

if(_PATCH) then print(_PATCH) else print(_VERSION .. ' official') end
measure("t[1] =1", function(x) t[1] = 1 end)
measure("t[i] =1", function(i) t[i] = 1 end)
measure("t[-i]=1", function(i) t[-i] = 1 end)
measure("t[i] =i", function(i) t[i] = i end)
measure("t[i] =-i", function(i) t[i] = -i end)
measure("t[1]='a'", function(i) t[1] = 'a' end)
measure("t[i]='a'", function(i) t[i] = 'a' end)
measure("t[-i]='a'", function(i) t[-i] = 'a' end)
measure("t[1]={}", function(i) t[1] = {} end)
measure("t[i]={}", function(i) t[i] = {} end)
measure("t[-i]={}", function(i) t[-i] = {} end)
measure("t[1M-i]=1", function(i) t[1000000-i] = 1 end)
Thanks,
Henning

Dirk Laurie

unread,
Jan 19, 2011, 11:05:18 AM1/19/11
to lua-table...@googlegroups.com
On Wed, Jan 19, 2011 at 05:26:53PM +0200, Henning Diedrich wrote:
> I am trying to find out how much speed the 'tru count' patch consumes.
...
Try the following: do a benchmark with only one of these operations,
repeated ten times. You should get an idea of the statistical
variation of the timing routine.

Way back when, I had a colleague who had memorized the full instruction
set of the computer we all used and the number of CPU cycles for each
instruction. He kept telling us that the fastest way to set a register
to zero was not to store the constant 0 in it (that requires a memory
access) or to subtract it from itself (that invokes a machine instruction
that needs to know about carrying even though there are no carries, so
two cycles) but to XOR it with itself. One CPU cycle, the minimum
possible. He also had no good words for the operating system's way of
measuring CPU time. The only way, he said, was to disassemble the code
and add up the machine cycles. He could do it in his head ...

Dirk

Henning Diedrich

unread,
Jan 19, 2011, 8:04:26 PM1/19/11
to lua table semantics
On Jan 19, 5:05 pm, Dirk Laurie <d...@sun.ac.za> wrote:
> On Wed, Jan 19, 2011 at 05:26:53PM +0200, Henning Diedrich wrote:

> Try the following: do a benchmark with only one of these operations,
> repeated ten times.  

It's what I did manually, and actually over time, to try to gauge when
it get's stable, and thus less affected ostensibly by background
tasks.

So it's not mere random what is listed above, the pattern is stable.

> fastest way to set a register to zero was not to store the constant 0 in it
> ... but to XOR it with itself.  

I could need that guy. And I am sure Roberto or Luiz know the answer
to the above, too.

I am still curious! Did you look at the code?

Henning Diedrich

unread,
Jan 20, 2011, 1:08:54 AM1/20/11
to lua table semantics
On Jan 19, 5:05 pm, Dirk Laurie <d...@sun.ac.za> wrote:
> Try the following: do a benchmark with only one of these operations,
> repeated ten times.

I have followed your advice literally and here is the result.

You could try yourself, to see what I mean, but THIS is 'more off'
than the manual start-and-start-again I did before, probably due to
the garbage collector kicking in at some point. In other words, when I
sart only one iteration of 100000, but again and again, it results in
more stable results. What's your take?

As visibly below, what really makes laps oscillating is: creating
tables (in this case, as table value content).

So a fair benchmark would probably ask for starting Lua fresh for
every benchmark run, looping in a bash script, instead of inside Lua.
Or what's your proposal?

Still, the outcome in my eyes is that the time needed for the actual
table mechanics of adding values is quite stable.

Thanks,
Henning

Lua 5.1.4-2 official
1: 1000000x t[1] =1 197ns per operation
2: 1000000x t[1] =1 195ns per operation
3: 1000000x t[1] =1 196ns per operation
4: 1000000x t[1] =1 195ns per operation
5: 1000000x t[1] =1 194ns per operation
6: 1000000x t[1] =1 194ns per operation
7: 1000000x t[1] =1 194ns per operation
8: 1000000x t[1] =1 195ns per operation
9: 1000000x t[1] =1 195ns per operation
10: 1000000x t[1] =1 194ns per operation

1: 1000000x t[i] =1 227ns per operation
2: 1000000x t[i] =1 218ns per operation
3: 1000000x t[i] =1 207ns per operation
4: 1000000x t[i] =1 207ns per operation
5: 1000000x t[i] =1 206ns per operation
6: 1000000x t[i] =1 207ns per operation
7: 1000000x t[i] =1 207ns per operation
8: 1000000x t[i] =1 206ns per operation
9: 1000000x t[i] =1 207ns per operation
10: 1000000x t[i] =1 207ns per operation

1: 1000000x t[-i]=1 595ns per operation
2: 1000000x t[-i]=1 262ns per operation
3: 1000000x t[-i]=1 262ns per operation
4: 1000000x t[-i]=1 262ns per operation
5: 1000000x t[-i]=1 262ns per operation
6: 1000000x t[-i]=1 262ns per operation
7: 1000000x t[-i]=1 262ns per operation
8: 1000000x t[-i]=1 262ns per operation
9: 1000000x t[-i]=1 262ns per operation
10: 1000000x t[-i]=1 262ns per operation

1: 1000000x t[i] =i 202ns per operation
2: 1000000x t[i] =i 202ns per operation
3: 1000000x t[i] =i 202ns per operation
4: 1000000x t[i] =i 202ns per operation
5: 1000000x t[i] =i 202ns per operation
6: 1000000x t[i] =i 202ns per operation
7: 1000000x t[i] =i 202ns per operation
8: 1000000x t[i] =i 202ns per operation
9: 1000000x t[i] =i 202ns per operation
10: 1000000x t[i] =i 202ns per operation

1: 1000000x t[i] =-i 221ns per operation
2: 1000000x t[i] =-i 221ns per operation
3: 1000000x t[i] =-i 221ns per operation
4: 1000000x t[i] =-i 221ns per operation
5: 1000000x t[i] =-i 221ns per operation
6: 1000000x t[i] =-i 221ns per operation
7: 1000000x t[i] =-i 221ns per operation
8: 1000000x t[i] =-i 221ns per operation
9: 1000000x t[i] =-i 221ns per operation
10: 1000000x t[i] =-i 221ns per operation

1: 1000000x t[1]='a' 196ns per operation
2: 1000000x t[1]='a' 196ns per operation
3: 1000000x t[1]='a' 197ns per operation
4: 1000000x t[1]='a' 196ns per operation
5: 1000000x t[1]='a' 196ns per operation
6: 1000000x t[1]='a' 196ns per operation
7: 1000000x t[1]='a' 196ns per operation
8: 1000000x t[1]='a' 197ns per operation
9: 1000000x t[1]='a' 197ns per operation
10: 1000000x t[1]='a' 197ns per operation

1: 1000000x t[i]='a' 216ns per operation
2: 1000000x t[i]='a' 216ns per operation
3: 1000000x t[i]='a' 216ns per operation
4: 1000000x t[i]='a' 217ns per operation
5: 1000000x t[i]='a' 217ns per operation
6: 1000000x t[i]='a' 217ns per operation
7: 1000000x t[i]='a' 217ns per operation
8: 1000000x t[i]='a' 217ns per operation
9: 1000000x t[i]='a' 217ns per operation
10: 1000000x t[i]='a' 217ns per operation

1: 1000000x t[-i]='a' 268ns per operation
2: 1000000x t[-i]='a' 268ns per operation
3: 1000000x t[-i]='a' 268ns per operation
4: 1000000x t[-i]='a' 268ns per operation
5: 1000000x t[-i]='a' 268ns per operation
6: 1000000x t[-i]='a' 268ns per operation
7: 1000000x t[-i]='a' 268ns per operation
8: 1000000x t[-i]='a' 268ns per operation
9: 1000000x t[-i]='a' 269ns per operation
10: 1000000x t[-i]='a' 271ns per operation

1: 1000000x t[1]={} 448ns per operation
2: 1000000x t[1]={} 655ns per operation
3: 1000000x t[1]={} 653ns per operation
4: 1000000x t[1]={} 505ns per operation
5: 1000000x t[1]={} 558ns per operation
6: 1000000x t[1]={} 650ns per operation
7: 1000000x t[1]={} 650ns per operation
8: 1000000x t[1]={} 573ns per operation
9: 1000000x t[1]={} 498ns per operation
10: 1000000x t[1]={} 655ns per operation

1: 1000000x t[i]={} 516ns per operation
2: 1000000x t[i]={} 582ns per operation
3: 1000000x t[i]={} 459ns per operation
4: 1000000x t[i]={} 513ns per operation
5: 1000000x t[i]={} 882ns per operation
6: 1000000x t[i]={} 427ns per operation
7: 1000000x t[i]={} 436ns per operation
8: 1000000x t[i]={} 506ns per operation
9: 1000000x t[i]={} 867ns per operation
10: 1000000x t[i]={} 751ns per operation

1: 1000000x t[-i]={} 477ns per operation
2: 1000000x t[-i]={} 552ns per operation
3: 1000000x t[-i]={} 526ns per operation
4: 1000000x t[-i]={} 815ns per operation
5: 1000000x t[-i]={} 961ns per operation
6: 1000000x t[-i]={} 483ns per operation
7: 1000000x t[-i]={} 479ns per operation
8: 1000000x t[-i]={} 552ns per operation
9: 1000000x t[-i]={} 560ns per operation
10: 1000000x t[-i]={} 763ns per operation

1: 1000000x t[1M-i]=1 226ns per operation
2: 1000000x t[1M-i]=1 225ns per operation
3: 1000000x t[1M-i]=1 225ns per operation
4: 1000000x t[1M-i]=1 225ns per operation
5: 1000000x t[1M-i]=1 225ns per operation
6: 1000000x t[1M-i]=1 225ns per operation
7: 1000000x t[1M-i]=1 225ns per operation
8: 1000000x t[1M-i]=1 226ns per operation
9: 1000000x t[1M-i]=1 226ns per operation
10: 1000000x t[1M-i]=1 225ns per operation


Here is the source:

function printf(...)
io.write(string.format(...))
end

function nanosecs_per_cycle(time, cycles)
return time * 1000000000 / cycles
end

local t = {}

local function measure1(prompt, run, iter)

local clock = os.clock

local cycles = 1000000
local i = 0
local tm = clock()

while(i < cycles) do run(i); i = i + 1 end

tm = clock() - tm


printf("%2d: %dx %-12s %10.0fns per operation\n", iter, cycles,
prompt, nanosecs_per_cycle(tm, cycles))

end

local function measure(prompt, run)

local i = 1
while(i<=10) do measure1(prompt, run, i); i = i + 1 end

Henning Diedrich

unread,
Jan 28, 2011, 12:31:58 PM1/28/11
to lua table semantics
Hi list,

I inspected the Lua core a little deeper, looking for ways that the
'tru count' could be used to create time savings for ordinary cases.
Rehashes are not that rare and during every rehash the actual true
count of elements in the table is counted, in the core, by traversing
all elements. But on the way, the number of actual integer keys is
also counted - you will remember that some of them also end up stored
in the hash part, instead of in the array.

If I get it right, the algorithm used to allocate space for the array
part, with space to grow, can become quite courageous. A lot of bytes
can get allocated, never to be used, once the table reaches a certain
size. I am not sure Lua can claim to be that focussed on a little foot
print and memory savings there ...

I have inserted a simple printf() to check if the counting that the
rehash does is in fact, as it looks, counting of all present, non-nil
elements. Looks like the answer is, Yes it is. So this means that the
'tru count' counter could make part of that traversal unnecessary and
by that, save some time. Probably not without an additional counter
though, that counts the 'real integer keys' used. And that may not
even be enough, they need to be counted separate as for wether they
are in the right range to fit into the array part or have to reside in
the hash part. So that's two additional counters, with overhead
anytime a key is added (which is different from 'tru count', which has
overhead everytime a value is added, deleted or changed).

Below I am listing an intermediate step on that way, maybe it's of
interest for one of you, to see the re-hash in action. (Just ignore
for now that 'tru count' is always one less than 'totalcount', because
the latter is pro-actively counted up by 1 already at the point when I
print it.)

Note how the hash and array parts grow, respectively. Depending on the
index keys. Also note how the hash part often has only one element ---
that's [0]. A flaw of the test Lua source, if you want, which tests
[0..99999] rather than well behaved [1..100000].

If you run the test yourself, which you could with the sources below,
and the patch, note that the initial lines you will be seeing are Lua
start up activity filling up of the global and other tables. Below, I
[snip]ped it.

Best,
Henning

[snip]
Lua 5.1.4-2 tru count 0.2
-----------------------------------------------------
1000000x t[1] =1
-----------------------------------------------------
Rehash: nasize 0, numusehash 0, totaluse 1, tru count 0
1000000x t[1] =1 219ns per operation
-----------------------------------------------------
-----------------------------------------------------
1000000x t[i] =1 [Below, 'numusehash 1' is due to i=0!]
-----------------------------------------------------
Rehash: nasize 0, numusehash 0, totaluse 1, tru count 0
Rehash: nasize 0, numusehash 1, totaluse 2, tru count 1
Rehash: nasize 1, numusehash 1, totaluse 3, tru count 2
Rehash: nasize 2, numusehash 1, totaluse 4, tru count 3
Rehash: nasize 4, numusehash 1, totaluse 6, tru count 5
Rehash: nasize 8, numusehash 1, totaluse 10, tru count 9
Rehash: nasize 16, numusehash 1, totaluse 18, tru count 17
Rehash: nasize 32, numusehash 1, totaluse 34, tru count 33
Rehash: nasize 64, numusehash 1, totaluse 66, tru count 65
Rehash: nasize 128, numusehash 1, totaluse 130, tru count 129
Rehash: nasize 256, numusehash 1, totaluse 258, tru count 257
Rehash: nasize 512, numusehash 1, totaluse 514, tru count 513
Rehash: nasize 1024, numusehash 1, totaluse 1026, tru count 1025
Rehash: nasize 2048, numusehash 1, totaluse 2050, tru count 2049
Rehash: nasize 4096, numusehash 1, totaluse 4098, tru count 4097
Rehash: nasize 8192, numusehash 1, totaluse 8194, tru count 8193
Rehash: nasize 16384, numusehash 1, totaluse 16386, tru count 16385
Rehash: nasize 32768, numusehash 1, totaluse 32770, tru count 32769
Rehash: nasize 65536, numusehash 1, totaluse 65538, tru count 65537
Rehash: nasize 131072, numusehash 1, totaluse 131074, tru count 131073
Rehash: nasize 262144, numusehash 1, totaluse 262146, tru count 262145
Rehash: nasize 524288, numusehash 1, totaluse 524290, tru count 524289
1000000x t[i] =1 226ns per operation
-----------------------------------------------------
-----------------------------------------------------
1000000x t[-i]=1
-----------------------------------------------------
Rehash: nasize 0, numusehash 0, totaluse 1, tru count 0
Rehash: nasize 0, numusehash 1, totaluse 2, tru count 1
Rehash: nasize 0, numusehash 2, totaluse 3, tru count 2
Rehash: nasize 0, numusehash 4, totaluse 5, tru count 4
Rehash: nasize 0, numusehash 8, totaluse 9, tru count 8
Rehash: nasize 0, numusehash 16, totaluse 17, tru count 16
Rehash: nasize 0, numusehash 32, totaluse 33, tru count 32
Rehash: nasize 0, numusehash 64, totaluse 65, tru count 64
Rehash: nasize 0, numusehash 128, totaluse 129, tru count 128
Rehash: nasize 0, numusehash 256, totaluse 257, tru count 256
Rehash: nasize 0, numusehash 512, totaluse 513, tru count 512
Rehash: nasize 0, numusehash 1024, totaluse 1025, tru count 1024
Rehash: nasize 0, numusehash 2048, totaluse 2049, tru count 2048
Rehash: nasize 0, numusehash 4096, totaluse 4097, tru count 4096
Rehash: nasize 0, numusehash 8192, totaluse 8193, tru count 8192
Rehash: nasize 0, numusehash 16384, totaluse 16385, tru count 16384
Rehash: nasize 0, numusehash 32768, totaluse 32769, tru count 32768
Rehash: nasize 0, numusehash 65536, totaluse 65537, tru count 65536
Rehash: nasize 0, numusehash 131072, totaluse 131073, tru count 131072
Rehash: nasize 0, numusehash 262144, totaluse 262145, tru count 262144
Rehash: nasize 0, numusehash 524288, totaluse 524289, tru count 524288
1000000x t[-i]=1 541ns per operation
-----------------------------------------------------
-----------------------------------------------------
1000000x t[i] =i
-----------------------------------------------------
Rehash: nasize 0, numusehash 0, totaluse 1, tru count 0
Rehash: nasize 0, numusehash 1, totaluse 2, tru count 1
Rehash: nasize 1, numusehash 1, totaluse 3, tru count 2
Rehash: nasize 2, numusehash 1, totaluse 4, tru count 3
Rehash: nasize 4, numusehash 1, totaluse 6, tru count 5
Rehash: nasize 8, numusehash 1, totaluse 10, tru count 9
Rehash: nasize 16, numusehash 1, totaluse 18, tru count 17
Rehash: nasize 32, numusehash 1, totaluse 34, tru count 33
Rehash: nasize 64, numusehash 1, totaluse 66, tru count 65
Rehash: nasize 128, numusehash 1, totaluse 130, tru count 129
Rehash: nasize 256, numusehash 1, totaluse 258, tru count 257
Rehash: nasize 512, numusehash 1, totaluse 514, tru count 513
Rehash: nasize 1024, numusehash 1, totaluse 1026, tru count 1025
Rehash: nasize 2048, numusehash 1, totaluse 2050, tru count 2049
Rehash: nasize 4096, numusehash 1, totaluse 4098, tru count 4097
Rehash: nasize 8192, numusehash 1, totaluse 8194, tru count 8193
Rehash: nasize 16384, numusehash 1, totaluse 16386, tru count 16385
Rehash: nasize 32768, numusehash 1, totaluse 32770, tru count 32769
Rehash: nasize 65536, numusehash 1, totaluse 65538, tru count 65537
Rehash: nasize 131072, numusehash 1, totaluse 131074, tru count 131073
Rehash: nasize 262144, numusehash 1, totaluse 262146, tru count 262145
Rehash: nasize 524288, numusehash 1, totaluse 524290, tru count 524289
1000000x t[i] =i 228ns per operation
-----------------------------------------------------
-----------------------------------------------------
1000000x t[i] =-i
-----------------------------------------------------
Rehash: nasize 0, numusehash 0, totaluse 1, tru count 0
Rehash: nasize 0, numusehash 1, totaluse 2, tru count 1
Rehash: nasize 1, numusehash 1, totaluse 3, tru count 2
Rehash: nasize 2, numusehash 1, totaluse 4, tru count 3
Rehash: nasize 4, numusehash 1, totaluse 6, tru count 5
Rehash: nasize 8, numusehash 1, totaluse 10, tru count 9
Rehash: nasize 16, numusehash 1, totaluse 18, tru count 17
Rehash: nasize 32, numusehash 1, totaluse 34, tru count 33
Rehash: nasize 64, numusehash 1, totaluse 66, tru count 65
Rehash: nasize 128, numusehash 1, totaluse 130, tru count 129
Rehash: nasize 256, numusehash 1, totaluse 258, tru count 257
Rehash: nasize 512, numusehash 1, totaluse 514, tru count 513
Rehash: nasize 1024, numusehash 1, totaluse 1026, tru count 1025
Rehash: nasize 2048, numusehash 1, totaluse 2050, tru count 2049
Rehash: nasize 4096, numusehash 1, totaluse 4098, tru count 4097
Rehash: nasize 8192, numusehash 1, totaluse 8194, tru count 8193
Rehash: nasize 16384, numusehash 1, totaluse 16386, tru count 16385
Rehash: nasize 32768, numusehash 1, totaluse 32770, tru count 32769
Rehash: nasize 65536, numusehash 1, totaluse 65538, tru count 65537
Rehash: nasize 131072, numusehash 1, totaluse 131074, tru count 131073
Rehash: nasize 262144, numusehash 1, totaluse 262146, tru count 262145
Rehash: nasize 524288, numusehash 1, totaluse 524290, tru count 524289
1000000x t[i] =-i 231ns per operation
-----------------------------------------------------
-----------------------------------------------------
1000000x t[1]='a'
-----------------------------------------------------
Rehash: nasize 0, numusehash 0, totaluse 1, tru count 0
1000000x t[1]='a' 221ns per operation
-----------------------------------------------------
-----------------------------------------------------
1000000x t[i]='a'
-----------------------------------------------------
Rehash: nasize 0, numusehash 0, totaluse 1, tru count 0
Rehash: nasize 0, numusehash 1, totaluse 2, tru count 1
Rehash: nasize 1, numusehash 1, totaluse 3, tru count 2
Rehash: nasize 2, numusehash 1, totaluse 4, tru count 3
Rehash: nasize 4, numusehash 1, totaluse 6, tru count 5
Rehash: nasize 8, numusehash 1, totaluse 10, tru count 9
Rehash: nasize 16, numusehash 1, totaluse 18, tru count 17
Rehash: nasize 32, numusehash 1, totaluse 34, tru count 33
Rehash: nasize 64, numusehash 1, totaluse 66, tru count 65
Rehash: nasize 128, numusehash 1, totaluse 130, tru count 129
Rehash: nasize 256, numusehash 1, totaluse 258, tru count 257
Rehash: nasize 512, numusehash 1, totaluse 514, tru count 513
Rehash: nasize 1024, numusehash 1, totaluse 1026, tru count 1025
Rehash: nasize 2048, numusehash 1, totaluse 2050, tru count 2049
Rehash: nasize 4096, numusehash 1, totaluse 4098, tru count 4097
Rehash: nasize 8192, numusehash 1, totaluse 8194, tru count 8193
Rehash: nasize 16384, numusehash 1, totaluse 16386, tru count 16385
Rehash: nasize 32768, numusehash 1, totaluse 32770, tru count 32769
Rehash: nasize 65536, numusehash 1, totaluse 65538, tru count 65537
Rehash: nasize 131072, numusehash 1, totaluse 131074, tru count 131073
Rehash: nasize 262144, numusehash 1, totaluse 262146, tru count 262145
Rehash: nasize 524288, numusehash 1, totaluse 524290, tru count 524289
1000000x t[i]='a' 230ns per operation
-----------------------------------------------------
-----------------------------------------------------
1000000x t[-i]='a'
-----------------------------------------------------
Rehash: nasize 0, numusehash 0, totaluse 1, tru count 0
Rehash: nasize 0, numusehash 1, totaluse 2, tru count 1
Rehash: nasize 0, numusehash 2, totaluse 3, tru count 2
Rehash: nasize 0, numusehash 4, totaluse 5, tru count 4
Rehash: nasize 0, numusehash 8, totaluse 9, tru count 8
Rehash: nasize 0, numusehash 16, totaluse 17, tru count 16
Rehash: nasize 0, numusehash 32, totaluse 33, tru count 32
Rehash: nasize 0, numusehash 64, totaluse 65, tru count 64
Rehash: nasize 0, numusehash 128, totaluse 129, tru count 128
Rehash: nasize 0, numusehash 256, totaluse 257, tru count 256
Rehash: nasize 0, numusehash 512, totaluse 513, tru count 512
Rehash: nasize 0, numusehash 1024, totaluse 1025, tru count 1024
Rehash: nasize 0, numusehash 2048, totaluse 2049, tru count 2048
Rehash: nasize 0, numusehash 4096, totaluse 4097, tru count 4096
Rehash: nasize 0, numusehash 8192, totaluse 8193, tru count 8192
Rehash: nasize 0, numusehash 16384, totaluse 16385, tru count 16384
Rehash: nasize 0, numusehash 32768, totaluse 32769, tru count 32768
Rehash: nasize 0, numusehash 65536, totaluse 65537, tru count 65536
Rehash: nasize 0, numusehash 131072, totaluse 131073, tru count 131072
Rehash: nasize 0, numusehash 262144, totaluse 262145, tru count 262144
Rehash: nasize 0, numusehash 524288, totaluse 524289, tru count 524288
1000000x t[-i]='a' 546ns per operation
-----------------------------------------------------
-----------------------------------------------------
1000000x t[1]={}
-----------------------------------------------------
Rehash: nasize 0, numusehash 0, totaluse 1, tru count 0
1000000x t[1]={} 632ns per operation
-----------------------------------------------------
-----------------------------------------------------
1000000x t[i]={}
-----------------------------------------------------
Rehash: nasize 0, numusehash 0, totaluse 1, tru count 0
Rehash: nasize 0, numusehash 1, totaluse 2, tru count 1
Rehash: nasize 1, numusehash 1, totaluse 3, tru count 2
Rehash: nasize 2, numusehash 1, totaluse 4, tru count 3
Rehash: nasize 4, numusehash 1, totaluse 6, tru count 5
Rehash: nasize 8, numusehash 1, totaluse 10, tru count 9
Rehash: nasize 16, numusehash 1, totaluse 18, tru count 17
Rehash: nasize 32, numusehash 1, totaluse 34, tru count 33
Rehash: nasize 64, numusehash 1, totaluse 66, tru count 65
Rehash: nasize 128, numusehash 1, totaluse 130, tru count 129
Rehash: nasize 256, numusehash 1, totaluse 258, tru count 257
Rehash: nasize 512, numusehash 1, totaluse 514, tru count 513
Rehash: nasize 1024, numusehash 1, totaluse 1026, tru count 1025
Rehash: nasize 2048, numusehash 1, totaluse 2050, tru count 2049
Rehash: nasize 4096, numusehash 1, totaluse 4098, tru count 4097
Rehash: nasize 8192, numusehash 1, totaluse 8194, tru count 8193
Rehash: nasize 16384, numusehash 1, totaluse 16386, tru count 16385
Rehash: nasize 32768, numusehash 1, totaluse 32770, tru count 32769
Rehash: nasize 65536, numusehash 1, totaluse 65538, tru count 65537
Rehash: nasize 131072, numusehash 1, totaluse 131074, tru count 131073
Rehash: nasize 262144, numusehash 1, totaluse 262146, tru count 262145
Rehash: nasize 524288, numusehash 1, totaluse 524290, tru count 524289
1000000x t[i]={} 529ns per operation
-----------------------------------------------------
-----------------------------------------------------
1000000x t[-i]={}
-----------------------------------------------------
Rehash: nasize 0, numusehash 0, totaluse 1, tru count 0
Rehash: nasize 0, numusehash 1, totaluse 2, tru count 1
Rehash: nasize 0, numusehash 2, totaluse 3, tru count 2
Rehash: nasize 0, numusehash 4, totaluse 5, tru count 4
Rehash: nasize 0, numusehash 8, totaluse 9, tru count 8
Rehash: nasize 0, numusehash 16, totaluse 17, tru count 16
Rehash: nasize 0, numusehash 32, totaluse 33, tru count 32
Rehash: nasize 0, numusehash 64, totaluse 65, tru count 64
Rehash: nasize 0, numusehash 128, totaluse 129, tru count 128
Rehash: nasize 0, numusehash 256, totaluse 257, tru count 256
Rehash: nasize 0, numusehash 512, totaluse 513, tru count 512
Rehash: nasize 0, numusehash 1024, totaluse 1025, tru count 1024
Rehash: nasize 0, numusehash 2048, totaluse 2049, tru count 2048
Rehash: nasize 0, numusehash 4096, totaluse 4097, tru count 4096
Rehash: nasize 0, numusehash 8192, totaluse 8193, tru count 8192
Rehash: nasize 0, numusehash 16384, totaluse 16385, tru count 16384
Rehash: nasize 0, numusehash 32768, totaluse 32769, tru count 32768
Rehash: nasize 0, numusehash 65536, totaluse 65537, tru count 65536
Rehash: nasize 0, numusehash 131072, totaluse 131073, tru count 131072
Rehash: nasize 0, numusehash 262144, totaluse 262145, tru count 262144
Rehash: nasize 0, numusehash 524288, totaluse 524289, tru count 524288
1000000x t[-i]={} 891ns per operation
-----------------------------------------------------
-----------------------------------------------------
1000000x t[1M-i]=1
-----------------------------------------------------
Rehash: nasize 0, numusehash 0, totaluse 1, tru count 0
Rehash: nasize 0, numusehash 0, totaluse 2, tru count 1
Rehash: nasize 0, numusehash 0, totaluse 3, tru count 2
Rehash: nasize 0, numusehash 0, totaluse 5, tru count 4
Rehash: nasize 0, numusehash 0, totaluse 9, tru count 8
Rehash: nasize 0, numusehash 0, totaluse 17, tru count 16
Rehash: nasize 0, numusehash 0, totaluse 33, tru count 32
Rehash: nasize 0, numusehash 0, totaluse 65, tru count 64
Rehash: nasize 0, numusehash 0, totaluse 129, tru count 128
Rehash: nasize 0, numusehash 0, totaluse 257, tru count 256
Rehash: nasize 0, numusehash 0, totaluse 513, tru count 512
Rehash: nasize 0, numusehash 0, totaluse 1025, tru count 1024
Rehash: nasize 0, numusehash 0, totaluse 2049, tru count 2048
Rehash: nasize 0, numusehash 0, totaluse 4097, tru count 4096
Rehash: nasize 0, numusehash 0, totaluse 8193, tru count 8192
Rehash: nasize 0, numusehash 0, totaluse 16385, tru count 16384
Rehash: nasize 0, numusehash 0, totaluse 32769, tru count 32768
Rehash: nasize 0, numusehash 0, totaluse 65537, tru count 65536
Rehash: nasize 0, numusehash 0, totaluse 131073, tru count 131072
Rehash: nasize 0, numusehash 0, totaluse 262145, tru count 262144
Rehash: nasize 0, numusehash 0, totaluse 524289, tru count 524288
1000000x t[1M-i]=1 399ns per operation
-----------------------------------------------------


Here is the source of the benchmark:

function printf(...)
io.write(string.format(...))
end

function nanosecs_per_cycle(time, cycles)
return time * 1000000000 / cycles
end

local t = {}

local function measure(prompt, run)

t = {}
local clock = os.clock
local cycles = 1000000

print("-----------------------------------------------------")
printf("%dx %-12s\n", cycles, prompt)
print("-----------------------------------------------------")

local i = 0
local tm = clock()

while(i < cycles) do run(i); i = i + 1 end

tm = clock() - tm

printf("%dx %-12s %10.0fns per operation\n", cycles, prompt,
nanosecs_per_cycle(tm, cycles))
print("-----------------------------------------------------")

end

if(_PATCH) then print(_PATCH) else print(_VERSION .. ' official') end
measure("t[1] =1", function(x) t[1] = 1 end)
measure("t[i] =1", function(i) t[i] = 1 end)
measure("t[-i]=1", function(i) t[-i] = 1 end)
measure("t[i] =i", function(i) t[i] = i end)
measure("t[i] =-i", function(i) t[i] = -i end)
measure("t[1]='a'", function(i) t[1] = 'a' end)
measure("t[i]='a'", function(i) t[i] = 'a' end)
measure("t[-i]='a'", function(i) t[-i] = 'a' end)
measure("t[1]={}", function(i) t[1] = {} end)
measure("t[i]={}", function(i) t[i] = {} end)
measure("t[-i]={}", function(i) t[-i] = {} end)
measure("t[1M-i]=1", function(i) t[1000000-i] = 1 end)



And this is the rehash function that the printf() where inserted into,
which of course slows things down:

ltable.c:334 (Lua 5.1.4-2, tru count patched)

static void rehash (lua_State *L, Table *t, const TValue *ek) {
int nasize, na;
int nums[MAXBITS+1]; /* nums[i] = number of keys between 2^(i-1)
and 2^i */
int i;
int totaluse;
for (i=0; i<=MAXBITS; i++) nums[i] = 0; /* reset counts */
nasize = numusearray(t, nums); /* count keys in array part */
printf("Rehash: nasize %d, ", nasize);
totaluse = na /* !!! */ = nasize; /* all those keys are integer
keys */
totaluse += numusehash(t, nums, &nasize); /* count keys in hash
part */
printf("numusehash %d, ", totaluse - na); /* nasize changed by
numusehash sometimes */
/* count extra key */
nasize += countint(ek, nums);
totaluse++;
printf("totaluse %d, ", totaluse);
printf("tru count %ld\n", t->count);
/* compute new size for array part */
na = computesizes(nums, &nasize);
/* resize the table to new computed sizes */
resize(L, t, nasize, totaluse - na);
}

************************************************************************

************************************************************************

It's also interesting to see how things change when I behave better
and start counting the i in [i] at 1, not 0 as above:

Basically 'the hash stays clean':

Lua 5.1.4-2 tru count 0.2
-----------------------------------------------------
1000000x t[1] =1
-----------------------------------------------------
Rehash: nasize 0, numusehash 0, totaluse 1, tru count 0
1000000x t[1] =1 213ns per operation
-----------------------------------------------------
-----------------------------------------------------
1000000x t[i] =1
-----------------------------------------------------
Rehash: nasize 0, numusehash 0, totaluse 1, tru count 0
Rehash: nasize 1, numusehash 0, totaluse 2, tru count 1
Rehash: nasize 2, numusehash 0, totaluse 3, tru count 2
Rehash: nasize 4, numusehash 0, totaluse 5, tru count 4
Rehash: nasize 8, numusehash 0, totaluse 9, tru count 8
Rehash: nasize 16, numusehash 0, totaluse 17, tru count 16
Rehash: nasize 32, numusehash 0, totaluse 33, tru count 32
Rehash: nasize 64, numusehash 0, totaluse 65, tru count 64
Rehash: nasize 128, numusehash 0, totaluse 129, tru count 128
Rehash: nasize 256, numusehash 0, totaluse 257, tru count 256
Rehash: nasize 512, numusehash 0, totaluse 513, tru count 512
Rehash: nasize 1024, numusehash 0, totaluse 1025, tru count 1024
Rehash: nasize 2048, numusehash 0, totaluse 2049, tru count 2048
Rehash: nasize 4096, numusehash 0, totaluse 4097, tru count 4096
Rehash: nasize 8192, numusehash 0, totaluse 8193, tru count 8192
Rehash: nasize 16384, numusehash 0, totaluse 16385, tru count 16384
Rehash: nasize 32768, numusehash 0, totaluse 32769, tru count 32768
Rehash: nasize 65536, numusehash 0, totaluse 65537, tru count 65536
Rehash: nasize 131072, numusehash 0, totaluse 131073, tru count 131072
Rehash: nasize 262144, numusehash 0, totaluse 262145, tru count 262144
Rehash: nasize 524288, numusehash 0, totaluse 524289, tru count 524288
1000000x t[i] =1 217ns per operation
-----------------------------------------------------
-----------------------------------------------------
1000000x t[-i]=1
-----------------------------------------------------
Rehash: nasize 0, numusehash 0, totaluse 1, tru count 0
Rehash: nasize 0, numusehash 1, totaluse 2, tru count 1
Rehash: nasize 0, numusehash 2, totaluse 3, tru count 2
Rehash: nasize 0, numusehash 4, totaluse 5, tru count 4
Rehash: nasize 0, numusehash 8, totaluse 9, tru count 8
Rehash: nasize 0, numusehash 16, totaluse 17, tru count 16
Rehash: nasize 0, numusehash 32, totaluse 33, tru count 32
Rehash: nasize 0, numusehash 64, totaluse 65, tru count 64
Rehash: nasize 0, numusehash 128, totaluse 129, tru count 128
Rehash: nasize 0, numusehash 256, totaluse 257, tru count 256
Rehash: nasize 0, numusehash 512, totaluse 513, tru count 512
Rehash: nasize 0, numusehash 1024, totaluse 1025, tru count 1024
Rehash: nasize 0, numusehash 2048, totaluse 2049, tru count 2048
Rehash: nasize 0, numusehash 4096, totaluse 4097, tru count 4096
Rehash: nasize 0, numusehash 8192, totaluse 8193, tru count 8192
Rehash: nasize 0, numusehash 16384, totaluse 16385, tru count 16384
Rehash: nasize 0, numusehash 32768, totaluse 32769, tru count 32768
Rehash: nasize 0, numusehash 65536, totaluse 65537, tru count 65536
Rehash: nasize 0, numusehash 131072, totaluse 131073, tru count 131072
Rehash: nasize 0, numusehash 262144, totaluse 262145, tru count 262144
Rehash: nasize 0, numusehash 524288, totaluse 524289, tru count 524288
1000000x t[-i]=1 528ns per operation
-----------------------------------------------------
-----------------------------------------------------
1000000x t[i] =i
-----------------------------------------------------
Rehash: nasize 0, numusehash 0, totaluse 1, tru count 0
Rehash: nasize 1, numusehash 0, totaluse 2, tru count 1
Rehash: nasize 2, numusehash 0, totaluse 3, tru count 2
Rehash: nasize 4, numusehash 0, totaluse 5, tru count 4
Rehash: nasize 8, numusehash 0, totaluse 9, tru count 8
Rehash: nasize 16, numusehash 0, totaluse 17, tru count 16
Rehash: nasize 32, numusehash 0, totaluse 33, tru count 32
Rehash: nasize 64, numusehash 0, totaluse 65, tru count 64
Rehash: nasize 128, numusehash 0, totaluse 129, tru count 128
Rehash: nasize 256, numusehash 0, totaluse 257, tru count 256
Rehash: nasize 512, numusehash 0, totaluse 513, tru count 512
Rehash: nasize 1024, numusehash 0, totaluse 1025, tru count 1024
Rehash: nasize 2048, numusehash 0, totaluse 2049, tru count 2048
Rehash: nasize 4096, numusehash 0, totaluse 4097, tru count 4096
Rehash: nasize 8192, numusehash 0, totaluse 8193, tru count 8192
Rehash: nasize 16384, numusehash 0, totaluse 16385, tru count 16384
Rehash: nasize 32768, numusehash 0, totaluse 32769, tru count 32768
Rehash: nasize 65536, numusehash 0, totaluse 65537, tru count 65536
Rehash: nasize 131072, numusehash 0, totaluse 131073, tru count 131072
Rehash: nasize 262144, numusehash 0, totaluse 262145, tru count 262144
Rehash: nasize 524288, numusehash 0, totaluse 524289, tru count 524288
1000000x t[i] =i 218ns per operation
-----------------------------------------------------
-----------------------------------------------------
1000000x t[i] =-i
-----------------------------------------------------
Rehash: nasize 0, numusehash 0, totaluse 1, tru count 0
Rehash: nasize 1, numusehash 0, totaluse 2, tru count 1
Rehash: nasize 2, numusehash 0, totaluse 3, tru count 2
Rehash: nasize 4, numusehash 0, totaluse 5, tru count 4
Rehash: nasize 8, numusehash 0, totaluse 9, tru count 8
Rehash: nasize 16, numusehash 0, totaluse 17, tru count 16
Rehash: nasize 32, numusehash 0, totaluse 33, tru count 32
Rehash: nasize 64, numusehash 0, totaluse 65, tru count 64
Rehash: nasize 128, numusehash 0, totaluse 129, tru count 128
Rehash: nasize 256, numusehash 0, totaluse 257, tru count 256
Rehash: nasize 512, numusehash 0, totaluse 513, tru count 512
Rehash: nasize 1024, numusehash 0, totaluse 1025, tru count 1024
Rehash: nasize 2048, numusehash 0, totaluse 2049, tru count 2048
Rehash: nasize 4096, numusehash 0, totaluse 4097, tru count 4096
Rehash: nasize 8192, numusehash 0, totaluse 8193, tru count 8192
Rehash: nasize 16384, numusehash 0, totaluse 16385, tru count 16384
Rehash: nasize 32768, numusehash 0, totaluse 32769, tru count 32768
Rehash: nasize 65536, numusehash 0, totaluse 65537, tru count 65536
Rehash: nasize 131072, numusehash 0, totaluse 131073, tru count 131072
Rehash: nasize 262144, numusehash 0, totaluse 262145, tru count 262144
Rehash: nasize 524288, numusehash 0, totaluse 524289, tru count 524288
1000000x t[i] =-i 229ns per operation
-----------------------------------------------------
-----------------------------------------------------
1000000x t[1]='a'
-----------------------------------------------------
Rehash: nasize 0, numusehash 0, totaluse 1, tru count 0
1000000x t[1]='a' 215ns per operation
-----------------------------------------------------
-----------------------------------------------------
1000000x t[i]='a'
-----------------------------------------------------
Rehash: nasize 0, numusehash 0, totaluse 1, tru count 0
Rehash: nasize 1, numusehash 0, totaluse 2, tru count 1
Rehash: nasize 2, numusehash 0, totaluse 3, tru count 2
Rehash: nasize 4, numusehash 0, totaluse 5, tru count 4
Rehash: nasize 8, numusehash 0, totaluse 9, tru count 8
Rehash: nasize 16, numusehash 0, totaluse 17, tru count 16
Rehash: nasize 32, numusehash 0, totaluse 33, tru count 32
Rehash: nasize 64, numusehash 0, totaluse 65, tru count 64
Rehash: nasize 128, numusehash 0, totaluse 129, tru count 128
Rehash: nasize 256, numusehash 0, totaluse 257, tru count 256
Rehash: nasize 512, numusehash 0, totaluse 513, tru count 512
Rehash: nasize 1024, numusehash 0, totaluse 1025, tru count 1024
Rehash: nasize 2048, numusehash 0, totaluse 2049, tru count 2048
Rehash: nasize 4096, numusehash 0, totaluse 4097, tru count 4096
Rehash: nasize 8192, numusehash 0, totaluse 8193, tru count 8192
Rehash: nasize 16384, numusehash 0, totaluse 16385, tru count 16384
Rehash: nasize 32768, numusehash 0, totaluse 32769, tru count 32768
Rehash: nasize 65536, numusehash 0, totaluse 65537, tru count 65536
Rehash: nasize 131072, numusehash 0, totaluse 131073, tru count 131072
Rehash: nasize 262144, numusehash 0, totaluse 262145, tru count 262144
Rehash: nasize 524288, numusehash 0, totaluse 524289, tru count 524288
1000000x t[i]='a' 223ns per operation
-----------------------------------------------------
-----------------------------------------------------
1000000x t[-i]='a'
-----------------------------------------------------
Rehash: nasize 0, numusehash 0, totaluse 1, tru count 0
Rehash: nasize 0, numusehash 1, totaluse 2, tru count 1
Rehash: nasize 0, numusehash 2, totaluse 3, tru count 2
Rehash: nasize 0, numusehash 4, totaluse 5, tru count 4
Rehash: nasize 0, numusehash 8, totaluse 9, tru count 8
Rehash: nasize 0, numusehash 16, totaluse 17, tru count 16
Rehash: nasize 0, numusehash 32, totaluse 33, tru count 32
Rehash: nasize 0, numusehash 64, totaluse 65, tru count 64
Rehash: nasize 0, numusehash 128, totaluse 129, tru count 128
Rehash: nasize 0, numusehash 256, totaluse 257, tru count 256
Rehash: nasize 0, numusehash 512, totaluse 513, tru count 512
Rehash: nasize 0, numusehash 1024, totaluse 1025, tru count 1024
Rehash: nasize 0, numusehash 2048, totaluse 2049, tru count 2048
Rehash: nasize 0, numusehash 4096, totaluse 4097, tru count 4096
Rehash: nasize 0, numusehash 8192, totaluse 8193, tru count 8192
Rehash: nasize 0, numusehash 16384, totaluse 16385, tru count 16384
Rehash: nasize 0, numusehash 32768, totaluse 32769, tru count 32768
Rehash: nasize 0, numusehash 65536, totaluse 65537, tru count 65536
Rehash: nasize 0, numusehash 131072, totaluse 131073, tru count 131072
Rehash: nasize 0, numusehash 262144, totaluse 262145, tru count 262144
Rehash: nasize 0, numusehash 524288, totaluse 524289, tru count 524288
1000000x t[-i]='a' 528ns per operation
-----------------------------------------------------
-----------------------------------------------------
1000000x t[1]={}
-----------------------------------------------------
Rehash: nasize 0, numusehash 0, totaluse 1, tru count 0
1000000x t[1]={} 635ns per operation
-----------------------------------------------------
-----------------------------------------------------
1000000x t[i]={}
-----------------------------------------------------
Rehash: nasize 0, numusehash 0, totaluse 1, tru count 0
Rehash: nasize 1, numusehash 0, totaluse 2, tru count 1
Rehash: nasize 2, numusehash 0, totaluse 3, tru count 2
Rehash: nasize 4, numusehash 0, totaluse 5, tru count 4
Rehash: nasize 8, numusehash 0, totaluse 9, tru count 8
Rehash: nasize 16, numusehash 0, totaluse 17, tru count 16
Rehash: nasize 32, numusehash 0, totaluse 33, tru count 32
Rehash: nasize 64, numusehash 0, totaluse 65, tru count 64
Rehash: nasize 128, numusehash 0, totaluse 129, tru count 128
Rehash: nasize 256, numusehash 0, totaluse 257, tru count 256
Rehash: nasize 512, numusehash 0, totaluse 513, tru count 512
Rehash: nasize 1024, numusehash 0, totaluse 1025, tru count 1024
Rehash: nasize 2048, numusehash 0, totaluse 2049, tru count 2048
Rehash: nasize 4096, numusehash 0, totaluse 4097, tru count 4096
Rehash: nasize 8192, numusehash 0, totaluse 8193, tru count 8192
Rehash: nasize 16384, numusehash 0, totaluse 16385, tru count 16384
Rehash: nasize 32768, numusehash 0, totaluse 32769, tru count 32768
Rehash: nasize 65536, numusehash 0, totaluse 65537, tru count 65536
Rehash: nasize 131072, numusehash 0, totaluse 131073, tru count 131072
Rehash: nasize 262144, numusehash 0, totaluse 262145, tru count 262144
Rehash: nasize 524288, numusehash 0, totaluse 524289, tru count 524288
1000000x t[i]={} 521ns per operation
-----------------------------------------------------
-----------------------------------------------------
1000000x t[-i]={}
-----------------------------------------------------
Rehash: nasize 0, numusehash 0, totaluse 1, tru count 0
Rehash: nasize 0, numusehash 1, totaluse 2, tru count 1
Rehash: nasize 0, numusehash 2, totaluse 3, tru count 2
Rehash: nasize 0, numusehash 4, totaluse 5, tru count 4
Rehash: nasize 0, numusehash 8, totaluse 9, tru count 8
Rehash: nasize 0, numusehash 16, totaluse 17, tru count 16
Rehash: nasize 0, numusehash 32, totaluse 33, tru count 32
Rehash: nasize 0, numusehash 64, totaluse 65, tru count 64
Rehash: nasize 0, numusehash 128, totaluse 129, tru count 128
Rehash: nasize 0, numusehash 256, totaluse 257, tru count 256
Rehash: nasize 0, numusehash 512, totaluse 513, tru count 512
Rehash: nasize 0, numusehash 1024, totaluse 1025, tru count 1024
Rehash: nasize 0, numusehash 2048, totaluse 2049, tru count 2048
Rehash: nasize 0, numusehash 4096, totaluse 4097, tru count 4096
Rehash: nasize 0, numusehash 8192, totaluse 8193, tru count 8192
Rehash: nasize 0, numusehash 16384, totaluse 16385, tru count 16384
Rehash: nasize 0, numusehash 32768, totaluse 32769, tru count 32768
Rehash: nasize 0, numusehash 65536, totaluse 65537, tru count 65536
Rehash: nasize 0, numusehash 131072, totaluse 131073, tru count 131072
Rehash: nasize 0, numusehash 262144, totaluse 262145, tru count 262144
Rehash: nasize 0, numusehash 524288, totaluse 524289, tru count 524288
1000000x t[-i]={} 881ns per operation
-----------------------------------------------------
-----------------------------------------------------
1000000x t[1M-i]=1
-----------------------------------------------------
Rehash: nasize 0, numusehash 0, totaluse 1, tru count 0
Rehash: nasize 0, numusehash 0, totaluse 2, tru count 1
Rehash: nasize 0, numusehash 0, totaluse 3, tru count 2
Rehash: nasize 0, numusehash 0, totaluse 5, tru count 4
Rehash: nasize 0, numusehash 0, totaluse 9, tru count 8
Rehash: nasize 0, numusehash 0, totaluse 17, tru count 16
Rehash: nasize 0, numusehash 0, totaluse 33, tru count 32
Rehash: nasize 0, numusehash 0, totaluse 65, tru count 64
Rehash: nasize 0, numusehash 0, totaluse 129, tru count 128
Rehash: nasize 0, numusehash 0, totaluse 257, tru count 256
Rehash: nasize 0, numusehash 0, totaluse 513, tru count 512
Rehash: nasize 0, numusehash 0, totaluse 1025, tru count 1024
Rehash: nasize 0, numusehash 0, totaluse 2049, tru count 2048
Rehash: nasize 0, numusehash 0, totaluse 4097, tru count 4096
Rehash: nasize 0, numusehash 0, totaluse 8193, tru count 8192
Rehash: nasize 0, numusehash 0, totaluse 16385, tru count 16384
Rehash: nasize 0, numusehash 0, totaluse 32769, tru count 32768
Rehash: nasize 0, numusehash 0, totaluse 65537, tru count 65536
Rehash: nasize 0, numusehash 0, totaluse 131073, tru count 131072
Rehash: nasize 0, numusehash 0, totaluse 262145, tru count 262144
Rehash: nasize 0, numusehash 0, totaluse 524289, tru count 524288
Rehash: nasize 999999, numusehash 0, totaluse 1000000, tru count
999999
1000000x t[1M-i]=1 389ns per operation
-----------------------------------------------------

Leo Razoumov

unread,
Jan 29, 2011, 2:48:22 PM1/29/11
to lua-table...@googlegroups.com
On Fri, Jan 28, 2011 at 12:31, Henning Diedrich <hd2...@eonblast.com> wrote:
> Hi list,
>
> I inspected the Lua core a little deeper, looking for ways that the
> 'tru count' could be used to create time savings for ordinary cases.
> Rehashes are not that rare and during every rehash the actual true
> count of elements in the table is counted, in the core, by traversing
> all elements. But on the way, the number of actual integer keys is
> also counted - you will remember that some of them also end up stored
> in the hash part, instead of in the array.
>
> If I get it right, the algorithm used to allocate space for the array
> part, with space to grow, can become quite courageous. A lot of bytes
> can get allocated, never to be used, once the table reaches a certain
> size. I am not sure Lua can claim to be that focussed on a little foot
> print and memory savings there ...
>

Most modern dynamic languages (Lua, Python, Ruby) are taking
hysteresis approach to grow/shrink memory used by arrays in order to
avoid excessive alloc/free cycles when new elements get added (grow)
then removed (shrink) then added again ...
So it is customary to grow a dynamic array by factor of two beyond its
new size in anticipation that it will grow further. Similarly,
shrinking of an dynamic array does not happen right away. It waits
until, let say, half the slots are empty and only then shrinks.
Also, to minimize collisions, a hash table should be sparsely
populated. Normally you have 2 to 4 more slots in a hash table that
its actual size (load factor between 1/4 and 1/2 are considered
normal). Such allocation/deallocation policy allows for O(log) number
of alloc/free calls and wastes not more than a constant fraction of
minimal memory needed. So far these algorithms work just fine most of
the time.

The real problem I have with Lua API is that it does not give me
tools to manage grow/shrin table memory manually.
I often know ahead of time how big my tables will be. I can have
arrays with couple of million indexes or hash tables with a million
keys (x86_64 really helps here). I would like to preallocate tables
with the exact size. Uncontrollable memory allocation overhead can
exhaust all the remaining physical memory I have. For this very
purpose tru count is a very welcome addition.

--Leo--

Henning Diedrich

unread,
Mar 3, 2011, 6:33:18 PM3/3/11
to lua table semantics
> The real problem I have with  Lua API is that it does not give me
> tools to manage grow/shrin table memory manually.
> --Leo--

Hi Leo,
it should be very possible to enhance the API in that way ... I am not
immediately positive, there might be some implications or implicit
garbage collection related issues.

Henning
Reply all
Reply to author
Forward
0 new messages