[Python-Dev] More compact dictionaries with faster iteration


Raymond Hettinger

Dec 9, 2012, 8:44:30 PM12/9/12
to Python Dev
The current memory layout for dictionaries is
unnecessarily inefficient. It has a sparse table of
24-byte entries containing the hash value, key pointer,
and value pointer.

Instead, the 24-byte entries should be stored in a
dense table referenced by a sparse table of indices.

For example, the dictionary:

d = {'timmy': 'red', 'barry': 'green', 'guido': 'blue'}

is currently stored as:

entries = [['--', '--', '--'],
           [-8522787127447073495, 'barry', 'green'],
           ['--', '--', '--'],
           ['--', '--', '--'],
           ['--', '--', '--'],
           [-9092791511155847987, 'timmy', 'red'],
           ['--', '--', '--'],
           [-6480567542315338377, 'guido', 'blue']]

Instead, the data should be organized as follows:

indices = [None, 1, None, None, None, 0, None, 2]
entries = [[-9092791511155847987, 'timmy', 'red'],
           [-8522787127447073495, 'barry', 'green'],
           [-6480567542315338377, 'guido', 'blue']]

Only the data layout needs to change. The hash table
algorithms would stay the same. All of the current
optimizations would be kept, including key-sharing
dicts and custom lookup functions for string-only
dicts. There is no change to the hash functions, the
table search order, or collision statistics.
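
For concreteness, a minimal sketch of a lookup under the new
layout (linear probing shown for brevity; the real perturbed
probe sequence is unchanged, and DUMMY-slot handling for
deletions is omitted):

    def lookup(indices, entries, key, hashvalue):
        # Probe the sparse index table exactly as the current
        # dict probes its sparse entry table.
        n = len(indices)
        i = hashvalue % n
        while True:
            idx = indices[i]
            if idx is None:              # free slot: key is absent
                raise KeyError(key)
            h, k, v = entries[idx]       # the one extra indirection
            if h == hashvalue and k == key:
                return v
            i = (i + 1) % n              # next probe slot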

The memory savings are significant (from 30% to 95%
compression depending on how full the table is).
Small dicts (size 0, 1, or 2) get the most benefit.

For a sparse table of size t with n entries, the sizes are:

curr_size = 24 * t
new_size = 24 * n + sizeof(index) * t

In the above timmy/barry/guido example, the current
size is 192 bytes (eight 24-byte entries) and the new
size is 80 bytes (three 24-byte entries plus eight
1-byte indices). That gives 58% compression.

Note, sizeof(index) can be as small as a single
byte for small dicts, two bytes for bigger dicts,
and up to sizeof(Py_ssize_t) for huge dicts.
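
For illustration, a sketch of picking the index width with the
array module (the thresholds leave room for negative FREE/DUMMY
sentinels, so the arrays are signed; this is a sketch, not the
recipe's exact code):

    from array import array

    FREE, DUMMY = -1, -2

    def make_index(n):
        # Smallest signed type that can hold indices 0..n-1
        # plus the two sentinels.
        if n <= 2**7:
            typecode = 'b'    # 1-byte indices
        elif n <= 2**15:
            typecode = 'h'    # 2-byte indices
        elif n <= 2**31:
            typecode = 'l'    # 4-byte (at least) indices
        else:
            typecode = 'q'    # 8-byte indices
        return array(typecode, [FREE]) * n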

In addition to space savings, the new memory layout
makes iteration faster. Currently, keys(), values(), and
items() loop over the sparse table, skipping over free
slots in the hash table. Now, keys/values/items can
loop directly over the dense table, using fewer memory
accesses.

Another benefit is that resizing is faster and
touches fewer pieces of memory. Currently, every
hash/key/value entry is moved or copied during a
resize. In the new layout, only the indices are
updated. For the most part, the hash/key/value entries
never move (except for an occasional swap to fill a
hole left by a deletion).
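
A sketch of that resize step, reusing the make_index() helper
sketched above (only the sparse table is rebuilt; the dense
entries list is untouched):

    def resize(self, new_n):
        self.indices = make_index(new_n)
        for entry_idx, (h, key, value) in enumerate(self.entries):
            i = h % new_n
            # Find an open slot; the entry itself never moves.
            while self.indices[i] != FREE:
                i = (i + 1) % new_n
            self.indices[i] = entry_idx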

With the reduced memory footprint, we can also expect
better cache utilization.

For those wanting to experiment with the design,
there is a pure Python proof-of-concept here:

http://code.activestate.com/recipes/578375

YMMV: Keep in mind that the above size statistics assume a
build with 64-bit Py_ssize_t and 64-bit pointers. The
space savings percentages are a bit different on other
builds. Also, note that in many applications, the size
of the data dominates the size of the container (i.e.
the weight of a bucket of water is mostly the water,
not the bucket).


Raymond

MRAB

Dec 9, 2012, 10:03:37 PM12/9/12
to python-dev
What happens when a key is deleted? Will that leave an empty slot in
the entry array? If so, will the array be compacted if the proportion
of empty slots grows beyond a certain limit?

Adding a key would involve simply appending to the entry array (filling
the last empty slot), but if there could be empty slots in other
places, would it look for the first?

Actually, as I think about it, you could form a linked list (using the
'hash' field) of those empty slots that aren't part of the final
contiguous block and fill those preferentially.
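
A minimal sketch of that free-list idea (the hashlist, used, and
first_free names are hypothetical; first_free starts as None):

    def _free_slot(self, index):
        # Reuse the vacated hash field as a 'next free slot' link.
        self.hashlist[index] = self.first_free
        self.keylist[index] = UNUSED
        self.valuelist[index] = UNUSED
        self.first_free = index

    def _next_open_index(self):
        if self.first_free is not None:
            index = self.first_free
            self.first_free = self.hashlist[index]   # pop the free list
            return index
        return self.used        # otherwise append at the dense end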

Raymond Hettinger

Dec 9, 2012, 10:30:22 PM12/9/12
to python-dev
On Dec 9, 2012, at 10:03 PM, MRAB <pyt...@mrabarnett.plus.com> wrote:

What happens when a key is deleted? Will that leave an empty slot in
the entry array?

Yes.  See the __delitem__() method in the pure python implementation.

If so, will the array be compacted if the proportion
of entries grows beyond a certain limit?

Yes.  Compaction happens during resizing, which triggers when
the dict reaches two-thirds full (including dummy entries).
See the __setitem__() method in the pure python implementation.

Adding a key would involve simply appending to the entry array (filling
the last empty slot), but if there could be empty slots in other
places, would it look for the first?

Yes.  See the _next_open_index() helper method in the pure python implementation.

Actually, as I think about it, you could form a linked list (using the
'hash' field) of those empty slots that aren't part of the final
contiguous block and fill those preferentially.

That's the plan.  See the comment above the keylist.index(UNUSED)
line in the _next_open_index() method in the pure python implementation.


Raymond

Gregory P. Smith

Dec 10, 2012, 1:38:19 AM12/10/12
to Raymond Hettinger, Python Dev
+1  I like it.  The plethora of 64-bit NULL values we cart around in our internals bothers me; this gets rid of some.

The worst case of ~30% less memory consumed for the bucket (ignoring the water) is still decent.  For a program with a lot of instances of small to medium sized objects I'd expect a measurable win.

I'm interested in seeing performance and memory footprint numbers from a C implementation of this.

-gps

Christian Heimes

Dec 10, 2012, 2:48:02 AM12/10/12
to Raymond Hettinger, Python Dev
Hello Raymond

On 10.12.2012 02:44, Raymond Hettinger wrote:
> Another benefit is that resizing is faster and
> touches fewer pieces of memory. Currently, every
> hash/key/value entry is moved or copied during a
> resize. In the new layout, only the indices are
> updated. For the most part, the hash/key/value entries
> never move (except for an occasional swap to fill a
> hole left by a deletion).
>
> With the reduced memory footprint, we can also expect
> better cache utilization.

On the other hand, every lookup and collision check needs an additional
multiplication, addition, and pointer dereference:

entry = entries_baseaddr + sizeof(PyDictKeyEntry) * idx

It's hard to predict how the extra CPU instructions are going to affect
the performance on different CPUs and scenarios. My guts tell me that
your proposal is going to perform better on modern CPUs with lots of L1
and L2 cache and in an average-case scenario. A worst-case scenario with
lots of collisions might be measurably slower for large dicts and a small
CPU cache.

But this is pure speculation and my guts could be terribly wrong. :) I'm
+1 on giving it a shot.

Good luck!
Christian

Serhiy Storchaka

Dec 10, 2012, 4:02:26 AM12/10/12
to pytho...@python.org
On 10.12.12 05:30, Raymond Hettinger wrote:
> On Dec 9, 2012, at 10:03 PM, MRAB <pyt...@mrabarnett.plus.com
> <mailto:pyt...@mrabarnett.plus.com>> wrote:
>> What happens when a key is deleted? Will that leave an empty slot in
>> the entry array?
> Yes. See the __delitem__() method in the pure python implementation
> at http://code.activestate.com/recipes/578375

You can move the last entry into the freed place. This preserves the
compactness of the entries array, and simplifies and speeds up iteration
and some other operations.

def __delitem__(self, key, hashvalue=None):
    if hashvalue is None:
        hashvalue = hash(key)
    found, i = self._lookup(key, hashvalue)
    if found:
        index = self.indices[i]
        self.indices[i] = DUMMY
        self.size -= 1
        if index != self.size:
            # Move the last entry into the freed slot to keep
            # the entry arrays dense.
            lasthash = self.hashlist[self.size]
            lastkey = self.keylist[self.size]
            found, j = self._lookup(lastkey, lasthash)
            assert found
            assert i != j
            self.indices[j] = index
            self.hashlist[index] = lasthash
            self.keylist[index] = lastkey
            self.valuelist[index] = self.valuelist[self.size]
            index = self.size
        self.hashlist[index] = UNUSED
        self.keylist[index] = UNUSED
        self.valuelist[index] = UNUSED
    else:
        raise KeyError(key)

Serhiy Storchaka

Dec 10, 2012, 4:12:27 AM12/10/12
to pytho...@python.org
On 10.12.12 09:48, Christian Heimes wrote:
> On the other hand, every lookup and collision check needs an additional
> multiplication, addition, and pointer dereference:
>
> entry = entries_baseaddr + sizeof(PyDictKeyEntry) * idx

I think the main slowdown will be in getting the index out of an array
with a variable element size. It requires a conditional jump, which is
not as cheap as an extra addition or shift.

switch (self->index_size) {
case 1: idx = ((uint8_t *)self->indices)[i]; break;
case 2: idx = ((uint16_t *)self->indices)[i]; break;
...
}

I don't think variable-size indices are worth the effort.

Mark Shannon

Dec 10, 2012, 4:06:51 AM12/10/12
to pytho...@python.org
On 10/12/12 01:44, Raymond Hettinger wrote:
> The current memory layout for dictionaries is
> unnecessarily inefficient. It has a sparse table of
> 24-byte entries containing the hash value, key pointer,
> and value pointer.
>
> Instead, the 24-byte entries should be stored in a
> dense table referenced by a sparse table of indices.

What minimum size and resizing factor do you propose for the entries array?

Cheers,
Mark.

Armin Rigo

Dec 10, 2012, 4:40:30 AM12/10/12
to Raymond Hettinger, Python Dev
Hi Raymond,

On Mon, Dec 10, 2012 at 2:44 AM, Raymond Hettinger
<raymond....@gmail.com> wrote:
> Instead, the data should be organized as follows:
>
> indices = [None, 1, None, None, None, 0, None, 2]
> entries = [[-9092791511155847987, 'timmy', 'red'],
>            [-8522787127447073495, 'barry', 'green'],
>            [-6480567542315338377, 'guido', 'blue']]

As a side note, your suggestion also enables order-preserving
dictionaries: iter() would automatically yield items in the order they
were inserted, as long as there was no deletion. People will
immediately start relying on this "feature"... and be confused by the
behavior of deletion. :-/


A bientôt,

Armin.

Alexey Kachayev

Dec 10, 2012, 5:01:53 AM12/10/12
to Armin Rigo, Raymond Hettinger, Python Dev
Hi!

2012/12/10 Armin Rigo <ar...@tunes.org>

Hi Raymond,

On Mon, Dec 10, 2012 at 2:44 AM, Raymond Hettinger
<raymond....@gmail.com> wrote:
> Instead, the data should be organized as follows:
>
>     indices =  [None, 1, None, None, None, 0, None, 2]
>     entries =  [[-9092791511155847987, 'timmy', 'red'],
>                 [-8522787127447073495, 'barry', 'green'],
>                 [-6480567542315338377, 'guido', 'blue']]

As a side note, your suggestion also enables order-preserving
dictionaries: iter() would automatically yield items in the order they
were inserted, as long as there was no deletion.  People will
immediately start relying on this "feature"...  and be confused by the
behavior of deletion. :-/

I'm not sure about "relying" cause currently Python supports this feature only with OrderedDict object. So it's common for python developers do not rely on inserting ordering when using generic dict.
 

A bientôt,

Armin.



--
Kind regards,
Alexey S. Kachayev, 
CTO at Kitapps Inc.
----------
http://codemehanika.org
Skype: kachayev
Tel: +380-996692092

Barry Warsaw

Dec 10, 2012, 10:28:28 AM12/10/12
to pytho...@python.org
On Dec 10, 2012, at 08:48 AM, Christian Heimes wrote:

>It's hard to predict how the extra CPU instructions are going to affect
>the performance on different CPUs and scenarios. My guts tell me that
>your proposal is going to perform better on modern CPUs with lots of L1
>and L2 cache and in an average-case scenario. A worst-case scenario with
>lots of collisions might be measurably slower for large dicts and a small
>CPU cache.

I'd be interested to see what effect this has on memory-constrained platforms
such as many current ARM applications (most likely 32-bit for now). Python's
memory consumption is an oft-heard complaint from folks developing for those
platforms.

Cheers,
-Barry

Antoine Pitrou

Dec 10, 2012, 10:48:45 AM12/10/12
to pytho...@python.org
On Mon, 10 Dec 2012 10:40:30 +0100,
Armin Rigo <ar...@tunes.org> wrote:

> Hi Raymond,
>
> On Mon, Dec 10, 2012 at 2:44 AM, Raymond Hettinger
> <raymond....@gmail.com> wrote:
> > Instead, the data should be organized as follows:
> >
> > indices = [None, 1, None, None, None, 0, None, 2]
> > entries = [[-9092791511155847987, 'timmy', 'red'],
> >            [-8522787127447073495, 'barry', 'green'],
> >            [-6480567542315338377, 'guido', 'blue']]
>
> As a side note, your suggestion also enables order-preserving
> dictionaries: iter() would automatically yield items in the order they
> were inserted, as long as there was no deletion. People will
> immediately start relying on this "feature"... and be confused by the
> behavior of deletion. :-/

If that's really an issue, we can deliberately scramble the iteration
order a bit :-) (of course it might negatively impact HW prefetching)

Regards

Antoine.

Steven D'Aprano

Dec 10, 2012, 11:15:11 AM12/10/12
to pytho...@python.org
On 10/12/12 20:40, Armin Rigo wrote:

> As a side note, your suggestion also enables order-preserving
> dictionaries: iter() would automatically yield items in the order they
> were inserted, as long as there was no deletion. People will
> immediately start relying on this "feature"... and be confused by the
> behavior of deletion. :-/


If we want to avoid the attractive nuisance of iteration order being
almost, but not quite, order-preserving, there is a simple fix: when
iterating over the dict, instead of starting at the start of the table,
start at some arbitrary point in the middle and wrap around. That will
increase the cost of iteration slightly, but avoid misleading behaviour.

I think all we need do is change the existing __iter__ method from this:

def __iter__(self):
    for key in self.keylist:
        if key is not UNUSED:
            yield key


to this:

# Untested!
def __iter__(self):
    # Choose an arbitrary starting point.
    # 103 is chosen because it is prime.
    n = (103 % len(self.keylist)) if self.keylist else 0
    for key in self.keylist[n:] + self.keylist[:n]:
        # I presume the C version will not duplicate the keylist.
        if key is not UNUSED:
            yield key

This mixes the order of iteration up somewhat, just enough to avoid
misleading people into thinking that this is order-preserving. The
order will depend on the size of the dict, not the history. For
example, with keys a, b, c, ... added in that order, the output is:

len=1 key=a
len=2 key=ba
len=3 key=bca
len=4 key=dabc
len=5 key=deabc
len=6 key=bcdefa
len=7 key=fgabcde
len=8 key=habcdefg
len=9 key=efghiabcd

which I expect is mixed up enough to discourage programmers from
jumping to conclusions about dicts having guaranteed order.



--
Steven

PJ Eby

Dec 10, 2012, 11:16:49 AM12/10/12
to Antoine Pitrou, pytho...@python.org
On Mon, Dec 10, 2012 at 10:48 AM, Antoine Pitrou <soli...@pitrou.net> wrote:
> On Mon, 10 Dec 2012 10:40:30 +0100,
> Armin Rigo <ar...@tunes.org> wrote:
>> Hi Raymond,
>>
>> On Mon, Dec 10, 2012 at 2:44 AM, Raymond Hettinger
>> <raymond....@gmail.com> wrote:
>> > Instead, the data should be organized as follows:
>> >
>> > indices = [None, 1, None, None, None, 0, None, 2]
>> > entries = [[-9092791511155847987, 'timmy', 'red'],
>> >            [-8522787127447073495, 'barry', 'green'],
>> >            [-6480567542315338377, 'guido', 'blue']]
>>
>> As a side note, your suggestion also enables order-preserving
>> dictionaries: iter() would automatically yield items in the order they
>> were inserted, as long as there was no deletion. People will
>> immediately start relying on this "feature"... and be confused by the
>> behavior of deletion. :-/
>
> If that's really an issue, we can deliberately scramble the iteration
> order a bit :-) (of course it might negatively impact HW prefetching)

On the other hand, this would also make a fast ordered dictionary
subclass possible, just by not using the free list for additions,
combined with periodic compaction before adds or after deletes.
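
A sketch of such a subclass, assuming a hypothetical CompactDict
base with the layout proposed in this thread (the _holes counter
and the _del_leaving_hole()/_compact() helpers are made up for
illustration):

    class InsertionOrderedDict(CompactDict):
        # Never reuse freed slots, so the dense entries array
        # stays in insertion order; deletions just leave holes.
        def _next_open_index(self):
            return len(self.entries)

        def __delitem__(self, key):
            self._del_leaving_hole(key)   # no swap-in of the last entry
            if self._holes * 2 > len(self.entries):
                self._compact()           # periodic repacking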

Antoine Pitrou

Dec 10, 2012, 11:22:52 AM12/10/12
to pytho...@python.org
On Mon, 10 Dec 2012 11:16:49 -0500,
PJ Eby <p...@telecommunity.com> wrote:

> On Mon, Dec 10, 2012 at 10:48 AM, Antoine Pitrou
> <soli...@pitrou.net> wrote:
> > On Mon, 10 Dec 2012 10:40:30 +0100,
> > Armin Rigo <ar...@tunes.org> wrote:
> >> Hi Raymond,
> >>
> >> On Mon, Dec 10, 2012 at 2:44 AM, Raymond Hettinger
> >> <raymond....@gmail.com> wrote:
> >> > Instead, the data should be organized as follows:
> >> >
> >> > indices = [None, 1, None, None, None, 0, None, 2]
> >> > entries = [[-9092791511155847987, 'timmy', 'red'],
> >> >            [-8522787127447073495, 'barry', 'green'],
> >> >            [-6480567542315338377, 'guido', 'blue']]
> >>
> >> As a side note, your suggestion also enables order-preserving
> >> dictionaries: iter() would automatically yield items in the order
> >> they were inserted, as long as there was no deletion. People will
> >> immediately start relying on this "feature"... and be confused by
> >> the behavior of deletion. :-/
> >
> > If that's really an issue, we can deliberately scramble the
> > iteration order a bit :-) (of course it might negatively impact HW
> > prefetching)
>
> On the other hand, this would also make a fast ordered dictionary
> subclass possible, just by not using the free list for additions,
> combined with periodic compaction before adds or after deletes.

I suspect that's what Raymond has in mind :-)

Regards

Antoine.

Brett Cannon

Dec 10, 2012, 11:27:45 AM12/10/12
to Barry Warsaw, Kristján Valur Jónsson, python-dev
On Mon, Dec 10, 2012 at 10:28 AM, Barry Warsaw <ba...@python.org> wrote:
On Dec 10, 2012, at 08:48 AM, Christian Heimes wrote:

>It's hard to predict how the extra CPU instructions are going to affect
>the performance on different CPUs and scenarios. My guts tell me that
>your proposal is going to perform better on modern CPUs with lots of L1
>and L2 cache and in an average-case scenario. A worst-case scenario with
>lots of collisions might be measurably slower for large dicts and a small
>CPU cache.

I'd be interested to see what effect this has on memory-constrained platforms
such as many current ARM applications (most likely 32-bit for now).  Python's
memory consumption is an oft-heard complaint from folks developing for those
platforms.

Maybe Kristjan can shed some light on how this would have helped him on the PS3 work he has been doing for Dust 514, as he has had memory issues there.

Christian Heimes

Dec 10, 2012, 11:39:04 AM12/10/12
to Barry Warsaw, pytho...@python.org
On 10.12.2012 16:28, Barry Warsaw wrote:
> I'd be interested to see what effect this has on memory-constrained platforms
> such as many current ARM applications (most likely 32-bit for now). Python's
> memory consumption is an oft-heard complaint from folks developing for those
> platforms.

I had ARM platforms in mind, too. Unfortunately we don't have any ARM
platforms for testing and performance measurement. Even Trent's
Snakebite doesn't have ARM boxes. I've a first generation Raspberry Pi,
that's all.

Perhaps somebody (PSF ?) is willing to donate a couple of ARM boards to
Snakebite. I'm thinking of Raspberry Pi (ARMv6), Pandaboard (ARMv7
Cortex-A9) and similar.

Christian

Kristján Valur Jónsson

Dec 10, 2012, 11:06:23 AM12/10/12
to Barry Warsaw, pytho...@python.org
Indeed, I had to change the dict tuning parameters to make dicts behave reasonably on the PS3.
Interesting stuff indeed.

Barry Warsaw

Dec 10, 2012, 11:53:17 AM12/10/12
to Christian Heimes, pytho...@python.org
On Dec 10, 2012, at 05:39 PM, Christian Heimes wrote:

>I had ARM platforms in mind, too. Unfortunately we don't have any ARM
>platforms for testing and performance measurement. Even Trent's
>Snakebite doesn't have ARM boxes. I've a first generation Raspberry Pi,
>that's all.

I have a few ARM machines that I can use for testing, though I can't provide
external access to them.

* http://buildbot.python.org/all/buildslaves/warsaw-ubuntu-arm
(Which oops, I see is down. Why did I not get notifications about that?)

* I have a Nexus 7 flashed with Ubuntu 12.10 (soon to be 13.04).

* Pandaboard still sitting in its box. ;)

>Perhaps somebody (PSF ?) is willing to donate a couple of ARM boards to
>Snakebite. I'm thinking of Raspberry Pi (ARMv6), Pandaboard (ARMv7
>Cortex-A9) and similar.

Suitable ARM boards can be had cheap, probably overwhelmed by labor costs of
getting them up and running. I am not offering for *that*. ;)

Cheers,
-Barry

Armin Rigo

Dec 10, 2012, 1:01:51 PM12/10/12
to PJ Eby, Antoine Pitrou, pytho...@python.org
Hi Philip,

On Mon, Dec 10, 2012 at 5:16 PM, PJ Eby <p...@telecommunity.com> wrote:
> On the other hand, this would also make a fast ordered dictionary
> subclass possible, just by not using the free list for additions,
> combined with periodic compaction before adds or after deletes.

Technically, I could see Python switching to ordered dictionaries
everywhere. Raymond's insight suddenly makes it easy for CPython and
PyPy, and at least Jython could use the LinkedHashMap class (although
this would need checking with Jython guys). I'd vaguely argue that
dictionary orders are one of the few non-reproducible factors in a
Python program, so it might be a good thing. But only vaguely ---
maybe I got far too used to random orders over time...


A bientôt,

Armin.

R. David Murray

Dec 10, 2012, 12:56:38 PM12/10/12
to pytho...@python.org
On Mon, 10 Dec 2012 16:06:23 +0000, kris...@ccpgames.com wrote:
> Indeed, I had to change the dict tuning parameters to make dicts
> behave reasonably on the PS3.
>
> Interesting stuff indeed.

Out of both curiosity and a possible application of my own for the
information, what values did you end up with?

--David

PJ Eby

Dec 10, 2012, 1:38:52 PM12/10/12
to Armin Rigo, Antoine Pitrou, pytho...@python.org
On Mon, Dec 10, 2012 at 1:01 PM, Armin Rigo <ar...@tunes.org> wrote:
> On Mon, Dec 10, 2012 at 5:16 PM, PJ Eby <p...@telecommunity.com> wrote:
>> On the other hand, this would also make a fast ordered dictionary
>> subclass possible, just by not using the free list for additions,
>> combined with periodic compaction before adds or after deletes.
>
> Technically, I could see Python switching to ordered dictionaries
> everywhere. Raymond's insight suddenly makes it easy for CPython and
> PyPy, and at least Jython could use the LinkedHashMap class (although
> this would need checking with Jython guys).

What about IronPython?

Also, note that using ordered dictionaries carries a performance cost
for dictionaries whose keys change a lot. This probably wouldn't
affect most dictionaries in most programs, because module and object
dictionaries generally don't delete and re-add a lot of keys very
often. But in cases where a dictionary is used as a queue or stack or
something of that sort, the packing costs could add up. Under the
current scheme, as long as collisions were minimal, the contents
wouldn't be repacked very often.

Without numbers it's hard to say for certain, but the advantage to
keeping ordered dictionaries a distinct type is that the standard
dictionary type can then get that extra bit of speed in exchange for
dropping the ordering requirement.

Antoine Pitrou

Dec 10, 2012, 1:47:37 PM12/10/12
to pytho...@python.org
On Mon, 10 Dec 2012 11:53:17 -0500
Barry Warsaw <ba...@python.org> wrote:
> On Dec 10, 2012, at 05:39 PM, Christian Heimes wrote:
>
> >I had ARM platforms in mind, too. Unfortunately we don't have any ARM
> >platforms for testing and performance measurement. Even Trent's
> >Snakebite doesn't have ARM boxes. I've a first generation Raspberry Pi,
> >that's all.
>
> I have a few ARM machines that I can use for testing, though I can't provide
> external access to them.
>
> * http://buildbot.python.org/all/buildslaves/warsaw-ubuntu-arm
> (Which oops, I see is down. Why did I not get notifications about that?)

Because buildbot.python.org is waiting for someone (mail.python.org,
perhaps) to accept its SMTP requests. Feel free to ping the necessary
people ;-)

Regards

Antoine.

Terry Reedy

Dec 10, 2012, 3:50:15 PM12/10/12
to pytho...@python.org
On 12/10/2012 1:38 PM, PJ Eby wrote:
> On Mon, Dec 10, 2012 at 1:01 PM, Armin Rigo <ar...@tunes.org> wrote:
>> On Mon, Dec 10, 2012 at 5:16 PM, PJ Eby <p...@telecommunity.com> wrote:
>>> On the other hand, this would also make a fast ordered dictionary
>>> subclass possible, just by not using the free list for additions,
>>> combined with periodic compaction before adds or after deletes.
>>
>> Technically, I could see Python switching to ordered dictionaries
>> everywhere. Raymond's insight suddenly makes it easy for CPython and
>> PyPy, and at least Jython could use the LinkedHashMap class (although
>> this would need checking with Jython guys).
>
> What about IronPython?
>
> Also, note that using ordered dictionaries carries a performance cost
> for dictionaries whose keys change a lot. This probably wouldn't
> affect most dictionaries in most programs, because module and object
> dictionaries generally don't delete and re-add a lot of keys very
> often. But in cases where a dictionary is used as a queue or stack or
> something of that sort, the packing costs could add up. Under the
> current scheme, as long as collisions were minimal, the contents
> wouldn't be repacked very often.
>
> Without numbers it's hard to say for certain, but the advantage to
> keeping ordered dictionaries a distinct type is that the standard
> dictionary type can then get that extra bit of speed in exchange for
> dropping the ordering requirement.

I think that there should be a separate OrderedDict class (or subclass)
and that the dict docs should warn (if they do not already) that while
iterating a dict *may*, in some circumstances, give items in insertion
*or* sort order, it *will not* in all cases. Example of the latter:

>>> d = {8:0, 9:0, 15:0, 13:0, 14:0}
>>> for k in d: print(k)

8
9
13
14
15

If one entered the keys in sorted order, as might be natural, one might
mistakenly think that insertion order was being reproduced.

--
Terry Jan Reedy

Tim Delaney

Dec 10, 2012, 4:29:50 PM12/10/12
to Armin Rigo, PJ Eby, Antoine Pitrou, Python-Dev
On 11 December 2012 05:01, Armin Rigo <ar...@tunes.org> wrote:
Hi Philip,

On Mon, Dec 10, 2012 at 5:16 PM, PJ Eby <p...@telecommunity.com> wrote:
> On the other hand, this would also make a fast ordered dictionary
> subclass possible, just by not using the free list for additions,
> combined with periodic compaction before adds or after deletes.

Technically, I could see Python switching to ordered dictionaries
everywhere.  Raymond's insight suddenly makes it easy for CPython and
PyPy, and at least Jython could use the LinkedHashMap class (although
this would need checking with Jython guys).  I'd vaguely argue that
dictionary orders are one of the few non-reproducible factors in a
Python program, so it might be a good thing.  But only vaguely ---
maybe I got far too used to random orders over time...

Whilst I think Python should not move to ordered dictionaries everywhere, I would say there is an argument (no pun intended) for making **kwargs a dictionary that maintains insertion order *if there are no deletions*. It sounds like we could get that for free with this implementation, although from another post IronPython might not have something suitable.

I think there are real advantages to doing so - a trivial one being the ability to easily initialise an ordered dictionary from another ordered dictionary.

I could also see an argument for having this property for all dicts. There are many dictionaries that are never deleted from (probably most dict literals) and it would be nice to have an iteration order for them that matched the source code.

However if deletions occur all bets would be off. If you need to maintain insertion order in the face of deletions, use an explicit ordereddict.

Tim Delaney

Andrew Svetlov

Dec 10, 2012, 5:14:10 PM12/10/12
to Tim Delaney, PJ Eby, Antoine Pitrou, Python-Dev
On Mon, Dec 10, 2012 at 11:29 PM, Tim Delaney <tim.d...@aptare.com> wrote:
> On 11 December 2012 05:01, Armin Rigo <ar...@tunes.org> wrote:
>>
>> Hi Philip,
>>
>> On Mon, Dec 10, 2012 at 5:16 PM, PJ Eby <p...@telecommunity.com> wrote:
>> > On the other hand, this would also make a fast ordered dictionary
>> > subclass possible, just by not using the free list for additions,
>> > combined with periodic compaction before adds or after deletes.
>>
>> Technically, I could see Python switching to ordered dictionaries
>> everywhere. Raymond's insight suddenly makes it easy for CPython and
>> PyPy, and at least Jython could use the LinkedHashMap class (although
>> this would need checking with Jython guys). I'd vaguely argue that
>> dictionary orders are one of the few non-reproducible factors in a
>> Python program, so it might be a good thing. But only vaguely ---
>> maybe I got far too used to random orders over time...
>
>
> Whilst I think Python should not move to ordered dictionaries everywhere, I
> would say there is an argument (no pun intended) for making **kwargs a
> dictionary that maintains insertion order *if there are no deletions*. It
> sounds like we could get that for free with this implementation, although
> from another post IronPython might not have something suitable.
>
Please, no. dict and kwargs should be unordered without any assumptions.

> I think there are real advantages to doing so - a trivial one being the
> ability to easily initialise an ordered dictionary from another ordered
> dictionary.
>
That can be done by adding a short-circuit to the OrderedDict class to
accept another OrderedDict instance.

> I could also see an argument for having this property for all dicts. There
> are many dictionaries that are never deleted from (probably most dict
> literals) and it would be nice to have an iteration order for them that
> matched the source code.
>
> However if deletions occur all bets would be off. If you need to maintain
> insertion order in the face of deletions, use an explicit ordereddict.
>
> Tim Delaney
>



--
Thanks,
Andrew Svetlov

PJ Eby

Dec 10, 2012, 6:05:04 PM12/10/12
to Tim Delaney, Antoine Pitrou, Python-Dev
On Mon, Dec 10, 2012 at 4:29 PM, Tim Delaney <tim.d...@aptare.com> wrote:
> Whilst I think Python should not move to ordered dictionaries everywhere, I
> would say there is an argument (no pun intended) for making **kwargs a
> dictionary that maintains insertion order *if there are no deletions*.

Oooh. Me likey. There have been many times where I've wished kwargs
were ordered when designing an API.

(Oddly, I don't remember any one of the APIs specifically, so don't
ask me for a good example. I just remember a bunch of different
physical locations where I was when I thought, "Ooh, what if I
could... no, that's not going to work.")

One other useful place for ordered dictionaries is class definitions
processed by class decorators: no need to write a metaclass just to
know what order stuff was defined in.

> It sounds like we could get that for free with this implementation, although
> from another post IronPython might not have something suitable.

Actually, IronPython may already have ordered dictionaries by default; see:

http://mail.python.org/pipermail/ironpython-users/2006-May/002319.html

It's described as an implementation detail that may change, perhaps
that could be changed to being unchanging. ;-)


> I think there are real advantages to doing so - a trivial one being the ability
> to easily initialise an ordered dictionary from another ordered dictionary.

Or to merge two of them together, either at creation or .update().

I'm really starting to wonder if it might not be worth paying the
compaction overhead to just make all dictionaries ordered, all the
time. The problem is that if you are always adding new keys and
deleting old ones (as might be done in a LRU cache, a queue, or other
things like that) you'll probably see a lot of compaction overhead
compared to today's dicts.

OTOH... with a good algorithm for deciding *when* to compact, we can
actually make the amortized cost O(1) or so, so maybe that's not a big
deal. The cost to do a compaction is at worst, the current size of
the table. So if you wait until a table has twice as many entries
(more precisely, until the index of the last entry is twice what it
was at last compaction), you will amortize the compaction cost down to
one entry move per add, or O(1).

That would handle the case of a cache or queue, but I'm not sure how
it would work with supersized dictionaries that are then deleted down
to a fraction of their original size. I suppose if you delete your
way down to half the entries being populated, then you end up with two
moves per delete, or O(2). (Yes, I know that's not a valid O number.)

So, offhand, it seems pretty doable, and unlikely to significantly
change worst-case performance even for pathological use cases. (Like
using a dict when you should be using a deque.)
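
A sketch of that amortization rule (helper names invented):

    def _maybe_compact(self):
        # Compact only once the dense array has doubled since the
        # last compaction, amortizing the cost to O(1) per add.
        if len(self.entries) >= 2 * self._last_compaction_size:
            self.entries = [e for e in self.entries if e is not None]
            self._rebuild_indices()
            self._last_compaction_size = max(len(self.entries), 1)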

PJ Eby

Dec 10, 2012, 6:06:43 PM12/10/12
to Andrew Svetlov, Python-Dev
On Mon, Dec 10, 2012 at 5:14 PM, Andrew Svetlov
<andrew....@gmail.com> wrote:
> Please, no. dict and kwargs should be unordered without any assumptions.

Why?

fwier...@gmail.com

Dec 10, 2012, 6:13:14 PM12/10/12
to Armin Rigo, PJ Eby, Antoine Pitrou, python-dev List
On Mon, Dec 10, 2012 at 10:01 AM, Armin Rigo <ar...@tunes.org> wrote:
> Technically, I could see Python switching to ordered dictionaries
> everywhere. Raymond's insight suddenly makes it easy for CPython and
> PyPy, and at least Jython could use the LinkedHashMap class (although
> this would need checking with Jython guys).
I honestly hope this doesn't happen - we use ConcurrentHashMap for our
dictionaries (which lack ordering) and I'm sure getting it to preserve
insertion order would cost us.

-Frank

Raymond Hettinger

Dec 10, 2012, 6:17:57 PM12/10/12
to Christian Heimes, Python Dev

On Dec 10, 2012, at 2:48 AM, Christian Heimes <chri...@python.org> wrote:

On the other hand, every lookup and collision check needs an additional
multiplication, addition, and pointer dereference:

 entry = entries_baseaddr + sizeof(PyDictKeyEntry) * idx

Currently, the dict implementation allows alternative lookup
functions based on whether the keys are all strings.
The choice of lookup function is stored in a function pointer.
That lets each lookup use the currently active lookup
function without having to make any computations or branches.

Likewise, the lookup functions could be swapped between
char, short, long, and ulong index sizes during the resize step.
IOW, the selection only needs to be made once per resize,
not once per lookup.
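
In the pure-Python terms used in this thread, a sketch of that
once-per-resize selection (method names invented):

    def _choose_lookup(self):
        # Swapped only when the table is resized, so individual
        # lookups pay no branch on the index width.
        if len(self.indices) <= 2**7:
            self._lookup = self._lookup_int8
        elif len(self.indices) <= 2**15:
            self._lookup = self._lookup_int16
        else:
            self._lookup = self._lookup_ssize_t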


Raymond

fwier...@gmail.com

Dec 10, 2012, 6:21:37 PM12/10/12
to Armin Rigo, PJ Eby, Antoine Pitrou, python-dev List
On Mon, Dec 10, 2012 at 3:13 PM, fwier...@gmail.com
<fwier...@gmail.com> wrote:
> On Mon, Dec 10, 2012 at 10:01 AM, Armin Rigo <ar...@tunes.org> wrote:
>> Technically, I could see Python switching to ordered dictionaries
>> everywhere. Raymond's insight suddenly makes it easy for CPython and
>> PyPy, and at least Jython could use the LinkedHashMap class (although
>> this would need checking with Jython guys).
> I honestly hope this doesn't happen - we use ConcurrentHashMap for our
> dictionaries (which lack ordering) and I'm sure getting it to preserve
> insertion order would cost us.
I just found this http://code.google.com/p/concurrentlinkedhashmap/ so
maybe it wouldn't be all that bad. I still personally like the idea of
leaving basic dict order undetermined (there is already an OrderedDict
if you need it, right?). But if ConcurrentLinkedHashMap is as good as is
suggested on that page then Jython doesn't need to be the thing that
blocks the argument.

Raymond Hettinger

Dec 10, 2012, 6:39:22 PM12/10/12
to Mark Shannon, pytho...@python.org

On Dec 10, 2012, at 4:06 AM, Mark Shannon <ma...@hotpy.org> wrote:

Instead, the 24-byte entries should be stored in a
dense table referenced by a sparse table of indices.

What minimum size and resizing factor do you propose for the entries array?

There are many ways to do this.  I don't know which is best.
The proof-of-concept code uses the existing list resizing logic.
Another approach is to pre-allocate the two-thirds maximum
(This is simple and fast but gives the smallest space savings.)
A hybrid approach is to allocate in two steps (1/3 and then 2/3
if needed).  

Since hash tables aren't a new problem, there may
already be published research on the best way to handle
the entries array. 


Raymond



Raymond Hettinger

Dec 10, 2012, 6:44:49 PM12/10/12
to PJ Eby, Antoine Pitrou, pytho...@python.org

On Dec 10, 2012, at 1:38 PM, PJ Eby <p...@telecommunity.com> wrote:

Without numbers it's hard to say for certain, but the advantage to
keeping ordered dictionaries a distinct type is that the standard
dictionary type can then get that extra bit of speed in exchange for
dropping the ordering requirement.

I expect that dicts and OrderedDicts will remain separate 
for reasons of speed, space, and respect for people's mental models.


Raymond

Mark Shannon

Dec 10, 2012, 7:04:14 PM12/10/12
to Raymond Hettinger, pytho...@python.org


On 10/12/12 23:39, Raymond Hettinger wrote:
>
> On Dec 10, 2012, at 4:06 AM, Mark Shannon <ma...@hotpy.org
> <mailto:ma...@hotpy.org>> wrote:
>
>>> Instead, the 24-byte entries should be stored in a
>>> dense table referenced by a sparse table of indices.
>>
>> What minimum size and resizing factor do you propose for the entries
>> array?
>
> There are many ways to do this. I don't know which is best.
> The proof-of-concept code uses the existing list resizing logic.
> Another approach is to pre-allocate the two-thirds maximum

What do you mean by maximum?

> (This is simple and fast but gives the smallest space savings.)
> A hybrid approach is to allocate in two steps (1/3 and then 2/3
> if needed).

I think you need to do some more analysis on this. It is possible that
there is some improvement to be had from your approach, but I don't
think the improvements will be as large as you have claimed.

I suspect that you may be merely trading performance for reduced memory
use, which can be done much more easily by reducing the minimum size
and increasing the load factor.

Cheers,
Mark.

Raymond Hettinger

Dec 10, 2012, 10:45:07 PM12/10/12
to Mark Shannon, pytho...@python.org
On Dec 10, 2012, at 7:04 PM, Mark Shannon <ma...@hotpy.org> wrote:

Another approach is to pre-allocate the two-thirds maximum
(This is simple and fast but gives the smallest space savings.)

What do you mean by maximum?

A dict with an index table size of 8 gets resized when it is two-thirds full,
so the maximum number of entries is 5.  If you pre-allocate five entries
for the initial dict, you've spent 5 * 24 bytes + 8 bytes for the indices
for a total of 128 bytes.  This compares to the current table of 8 * 24 bytes
totaling 192 bytes.   

Many other strategies are possible.  The proof-of-concept code 
uses the one employed by regular python lists. 
Their growth pattern is: 0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ....
This produces nice memory savings for entry lists.
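
For reference, a sketch of that over-allocation rule (after
CPython's Objects/listobject.c), which yields the sequence above:

    def grow(newsize):
        # About 12.5% headroom plus a small constant:
        # 1 -> 4, 5 -> 8, 9 -> 16, 17 -> 25, 26 -> 35, ...
        return newsize + (newsize >> 3) + (3 if newsize < 9 else 6)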

If you have a suggested allocation pattern or other 
constructive suggestion, it would be welcome.
Further sniping and unsubstantiated FUD would not.


Raymond
 

Trent Nelson

Dec 10, 2012, 11:20:55 PM12/10/12
to Barry Warsaw, Christian Heimes, pytho...@python.org
On Mon, Dec 10, 2012 at 08:53:17AM -0800, Barry Warsaw wrote:
> On Dec 10, 2012, at 05:39 PM, Christian Heimes wrote:
>
> >I had ARM platforms in mind, too. Unfortunately we don't have any ARM
> >platforms for testing and performance measurement. Even Trent's
> >Snakebite doesn't have ARM boxes. I've a first generation Raspberry Pi,
> >that's all.

> >Perhaps somebody (PSF ?) is willing to donate a couple of ARM boards to
> >Snakebite. I'm thinking of Raspberry Pi (ARMv6), Pandaboard (ARMv7
> >Cortex-A9) and similar.
>
> Suitable ARM boards can be had cheap, probably overwhelmed by labor costs of
> getting them up and running. I am not offering for *that*. ;)

If someone donates the hardware, I'll take care of everything else.

Trent.

Raymond Hettinger

Dec 10, 2012, 11:59:31 PM12/10/12
to Serhiy Storchaka, pytho...@python.org

On Dec 10, 2012, at 4:02 AM, Serhiy Storchaka <stor...@gmail.com> wrote:

On 10.12.12 05:30, Raymond Hettinger wrote:
On Dec 9, 2012, at 10:03 PM, MRAB <pyt...@mrabarnett.plus.com
<mailto:pyt...@mrabarnett.plus.com>> wrote:
What happens when a key is deleted? Will that leave an empty slot in
the entry array?
Yes.  See the __delitem__() method in the pure python implementation
at http://code.activestate.com/recipes/578375

You can move the last entry into the freed place. This preserves the compactness of the entries array, and simplifies and speeds up iteration and some other operations.

def __delitem__(self, key, hashvalue=None):
    if hashvalue is None:
        hashvalue = hash(key)
    found, i = self._lookup(key, hashvalue)
    if found:
        index = self.indices[i]
        self.indices[i] = DUMMY
        self.size -= 1
        if index != self.size:
            # Move the last entry into the freed slot to keep
            # the entry arrays dense.
            lasthash = self.hashlist[self.size]
            lastkey = self.keylist[self.size]
            found, j = self._lookup(lastkey, lasthash)
            assert found
            assert i != j
            self.indices[j] = index
            self.hashlist[index] = lasthash
            self.keylist[index] = lastkey
            self.valuelist[index] = self.valuelist[self.size]
            index = self.size
        self.hashlist[index] = UNUSED
        self.keylist[index] = UNUSED
        self.valuelist[index] = UNUSED
    else:
        raise KeyError(key)

That is a clever improvement.   Thank you.

Using your idea (plus some tweaks) cleans-up the code a bit
(simplifying iteration, simplifying the resizing logic, and eliminating the UNUSED constant).
I'm updating the posted code to reflect your suggestion.

Thanks again,


Raymond

Mark Shannon

Dec 11, 2012, 3:41:32 AM12/11/12
to Raymond Hettinger, pytho...@python.org
On 11/12/12 03:45, Raymond Hettinger wrote:
>
> On Dec 10, 2012, at 7:04 PM, Mark Shannon <ma...@hotpy.org
> <mailto:ma...@hotpy.org>> wrote:
>
>>> Another approach is to pre-allocate the two-thirds maximum
>>> (This is simple and fast but gives the smallest space savings.)
>>
>> What do you mean by maximum?
>
> A dict with an index table size of 8 gets resized when it is two-thirds
> full,
> so the maximum number of entries is 5. If you pre-allocate five entries
> for the initial dict, you've spent 5 * 24 bytes + 8 bytes for the indices
> for a total of 128 bytes. This compares to the current table of 8 * 24
> bytes
> totaling 192 bytes.
>
> Many other strategies are possible. The proof-of-concept code
> uses the one employed by regular python lists.
> Their growth pattern is: 0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ....
> This produces nice memory savings for entry lists.
>
> If you have a suggested allocation pattern or other
> constructive suggestion, it would be welcome.
It seems like a reasonable starting point.
Trying to avoid resizing the index array and the entries array at the
same time is probably a good idea.

> Further sniping and unsubstantiated FUD would not.

Is it unreasonable to ask that you back up your claims with
some analysis?

When you make claims such as
"""
The memory savings are significant (from 30% to 95%
compression depending on how full the table is).
Small dicts (size 0, 1, or 2) get the most benefit.
"""
is it a surprise that I am sceptical?

I like your idea. I just don't want everyone getting their
hopes up for what may turn out to be a fairly minor improvement.

Don't forget Unladen Swallow :)

Antoine Pitrou

Dec 11, 2012, 4:13:31 AM12/11/12
to pytho...@python.org
On Mon, 10 Dec 2012 18:17:57 -0500,
Raymond Hettinger <raymond....@gmail.com> wrote:

>
> On Dec 10, 2012, at 2:48 AM, Christian Heimes <chri...@python.org>
> wrote:
>
> > On the other hand, every lookup and collision check needs an
> > additional multiplication, addition, and pointer dereference:
> >
> > entry = entries_baseaddr + sizeof(PyDictKeyEntry) * idx
>
>
> Currently, the dict implementation allows alternative lookup
> functions based on whether the keys are all strings.
> The choice of lookup function is stored in a function pointer.
> That lets each lookup use the currently active lookup
> function without having to make any computations or branches.

An indirect function call is technically a branch, as seen from the CPU
(and not necessarily a very predictable one, although recent Intel
CPUs are said to be quite good at that).

Regards

Antoine.

Antoine Pitrou

Dec 11, 2012, 4:15:38 AM12/11/12
to pytho...@python.org
On Tue, 11 Dec 2012 08:41:32 +0000,
Mark Shannon <ma...@hotpy.org> wrote:

> >
> > If you have a suggested allocation pattern or other
> > constructive suggestion, it would be welcome.
> It seems like a reasonable starting point.
> Trying to avoid resizing the index array and the entries array at the
> same time is probably a good idea.

Why would you want to avoid that?
If we want to allocate the dict's data as a single memory block (which
saves a bit in memory consumption and also makes dict allocations
faster), we need to resize both arrays at the same time.

Regards

Antoine.

Serhiy Storchaka

Dec 11, 2012, 6:12:10 AM12/11/12
to pytho...@python.org
Yet some comments about your Python implementation.

1. Don't use "is FREE" and "is DUMMY", as an array doesn't preserve identity.

2. Wrong limits are used in _make_index(): 128 overflows 'b', 65536
overflows 'h', and 'l' can be too small for ssize_t.

3. round_upto_powtwo() can be implemented as 1 << n.bit_length() (see
the sketch after this list).

4. i * 5 is faster than (i << 2) + i in Python.

5. You can get rid of the "size" attribute and use len(self.keylist) instead.
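On point 3, for example (note it rounds strictly up, which is
what a growth step wants):

    def round_upto_powtwo(n):
        # Next power of two strictly greater than n,
        # e.g. 5 -> 8 and 8 -> 16.
        return 1 << n.bit_length()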

Dino Viehland

Dec 11, 2012, 2:37:13 PM12/11/12
to PJ Eby, Tim Delaney, Antoine Pitrou, Python-Dev
PJ wrote:
> Actually, IronPython may already have ordered dictionaries by default; see:
>
> http://mail.python.org/pipermail/ironpython-users/2006-May/002319.html
>
> It's described as an implementation detail that may change, perhaps that
> could be changed to being unchanging. ;-)
>

I think this has changed since 2006. IronPython was originally using the .NET
dictionary class and just locking while using it, but it now has a custom dictionary
which is thread-safe for multiple readers and allows one writer. But it doesn't do
anything to preserve insertion order.

OTOH changing certain dictionaries in IronPython (such as keyword args) to be
ordered would certainly be possible. Personally I just wouldn't want to see it
be the default as that seems like unnecessary overhead when the specialized
class exists.

Nick Coghlan

Dec 11, 2012, 7:15:34 PM12/11/12
to Dino Viehland, PJ Eby, Antoine Pitrou, Tim Delaney, Python-Dev
On Wed, Dec 12, 2012 at 5:37 AM, Dino Viehland <di...@microsoft.com> wrote:
OTOH changing certain dictionaries in IronPython (such as keyword args) to be
ordered would certainly be possible.  Personally I just wouldn't want to see it
be the default as that seems like unnecessary overhead when the specialized
class exists.

Which reminds me, I was going to note that one of the main gains with ordered keyword arguments is their use in the construction of string-keyed objects where you want to be able to control the order of iteration (e.g. for serialisation or display purposes). Currently you have to go the path of something like namedtuple, where you define the order of iteration in one operation and set the values in another.

Initialising an ordered dict itself is one obvious use case, but anything else where you want to control the iteration order *and* set field names and values in a single call will potentially benefit.

Independently of that, I'll note that this change would make it possible to add a .sort() method to dictionaries. Any subsequent mutation of the dictionary would require re-sorting, though (which isn't really all that different from maintaining a sorted list). The performance impact definitely needs to be benchmarked, though, as the need to read two memory locations rather than one for a dictionary read could have weird caching effects. (Fortunately, many of the benchmarks run on Python 3.3 now, so it should be possible to get that data fairly easily.)
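
A sketch of such a method under the proposed layout (entries are
taken to be (hash, key, value) triples, and _rebuild_indices() is
a hypothetical helper):

    def sort(self):
        # Reorder the dense entries array by key, then rebuild
        # the sparse index table to match.
        self.entries.sort(key=lambda entry: entry[1])
        self._rebuild_indices()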

Cheers,
Nick.

--
Nick Coghlan   |   ncog...@gmail.com   |   Brisbane, Australia

Dag Sverre Seljebotn

Dec 12, 2012, 3:37:08 PM12/12/12
to Nick Coghlan, PJ Eby, Antoine Pitrou, Tim Delaney, Python-Dev
On 12/12/2012 01:15 AM, Nick Coghlan wrote:
> On Wed, Dec 12, 2012 at 5:37 AM, Dino Viehland <di...@microsoft.com
> <mailto:di...@microsoft.com>> wrote:
>
> OTOH changing certain dictionaries in IronPython (such as keyword
> args) to be
> ordered would certainly be possible. Personally I just wouldn't
> want to see it
> be the default as that seems like unnecessary overhead when the
> specialized
> class exists.
>
>
> Which reminds me, I was going to note that one of the main gains with
> ordered keyword arguments, is their use in the construction of
> string-keyed objects where you want to be able to control the order of
> iteration (e.g. for serialisation or display purposes). Currently you
> have to go the path of something like namedtuple where you define the
> order of iteration in one operation, and set the values in another.

So here's a brand new argument against ordered dicts: The existence of
perfect hashing schemes. They fundamentally conflict with ordered dicts.

I played with using them for vtable dispatches in Cython this summer,
and they can perform really, really well for branch-predicted lookups in
hot loops, because you always/nearly always eliminate linear probing and
so there's no branch misses or extra comparisons. (The overhead of a
perfect hash table lookup over a traditional vtable lookup was only a
couple of cycles in my highly artificial fully branch-predicted
micro-benchmark.)

There's some overhead in setup; IIRC, ~20 microseconds for 64 elements,
2 GHz CPU, though that was a first prototype implementation and both
algorithmic improvements and tuning should be possible.

So it's not useful for everything, but perhaps for things like module
dictionaries and classes an "optionally perfect" dict can make sense.

Note: I'm NOT suggesting the use of perfect hashing, just making sure
its existence is mentioned, and that one is aware that if always-ordered
dicts become the language standard it precludes this option far off in
the future.

(Something like a sort() method could still work and make the dict
"unperfect"; one could also have a pack() method that made the dict
perfect again.)

That concludes the on-topic parts of my post.

--
Dag Sverre Seljebotn

APPENDIX

Going off-topic for those who are interested, here are the longwinded and
ugly details. My code [1] is based on the paper [2] (pseudo-code in
Appendix A), but I adapted it a bit to be useful for tens/hundreds of
elements rather than billions.

The ingredients:

1) You need the hash to be 32 bits (or 64) of good entropy (md5 or
murmurhash or similar). (Yes, that's a tall order for CPython, I'm just
describing the scheme.) (If the hash collides on all bits you *will*
collide, so some fallback is still necessary, just unlikely.)

2) To look up, the idea is (pseudo-code!)

typedef struct {
    int m_f, m_g, r, k;
    int16_t d[k]; /* "small" int, like current proposal */
} table_header_t;

And then one computes the index of an element with hash "h" using the function

((h >> tab->r) & tab->m_f) ^ tab->d[h & tab->m_g]

rather than the usual "h % n". While more arithmetic, arithmetic is
cheap and branch misses are not.
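
Or, as a one-line Python equivalent of that index computation:

    def perfect_index(h, r, m_f, m_g, d):
        # Two masks, a shift, an xor, and one small-table load;
        # no probe loop at all.
        return ((h >> r) & m_f) ^ d[h & m_g]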

3) To set up/repack a table one needs to find the parameters. The
general idea is:

a) Partition the hashes into k bins by using "h & m_g". There will be
collisions, but the number of bins with many collisions will be very
small; most bins will have 2 or 1 or 0 elements.

b) Starting with the largest bin, distribute the elements according to
the hash function. If a bin collides with the existing contents, try
another value for d[binindex] until it doesn't.

The r parameter lets you try again 32 (or 64) times to find a solution.
In my test cases there was a ~0.1% chance of not finding a solution (that
is, exhausting possible choices of r) with 64-bit hashes with 4 or 8
elements and no empty table elements. For any other number of elements,
or with some empty elements, the chance of failure was much lower.


[1] It's not exactly a great demo, but it contains the algorithm. If
there's much interest I should clean it up and make a proper benchmark
demo out of it:

https://github.com/dagss/pyextensibletype/blob/perfecthash/include/perfecthash.h


[2] Pagh (1999)

http://www.brics.dk/RS/99/13/BRICS-RS-99-13.ps.gz

PJ Eby

Dec 12, 2012, 4:31:01 PM12/12/12
to Dag Sverre Seljebotn, Python-Dev, Nick Coghlan, Tim Delaney, Antoine Pitrou
On Wed, Dec 12, 2012 at 3:37 PM, Dag Sverre Seljebotn
<d.s.se...@astro.uio.no> wrote:
> On 12/12/2012 01:15 AM, Nick Coghlan wrote:
>>
>> On Wed, Dec 12, 2012 at 5:37 AM, Dino Viehland <di...@microsoft.com
>> <mailto:di...@microsoft.com>> wrote:
>>
>> OTOH changing certain dictionaries in IronPython (such as keyword
>> args) to be
>> ordered would certainly be possible. Personally I just wouldn't
>> want to see it
>> be the default as that seems like unnecessary overhead when the
>> specialized
>> class exists.
>>
>>
>> Which reminds me, I was going to note that one of the main gains with
>> ordered keyword arguments, is their use in the construction of
>> string-keyed objects where you want to be able to control the order of
>> iteration (e.g. for serialisation or display purposes). Currently you
>> have to go the path of something like namedtuple where you define the
>> order of iteration in one operation, and set the values in another.
>
>
> So here's a brand new argument against ordered dicts: The existence of
> perfect hashing schemes. They fundamentally conflict with ordered dicts.

If I understand your explanation, then they don't conflict with the
type of ordering described in this thread. Raymond's optimization
separates the "hash table" part from the "contents" part of a
dictionary, and there is no requirement that these two parts have the
same size or the same order.

Indeed, Raymond's split design lets you re-parameterize the hashing
all you want, without perturbing the iteration order at all. You
would in fact be able to take a dictionary at any moment, and say,
"optimize the 'hash table' part to a non-colliding state based on the
current contents", without touching the 'contents' part at all.

(One could do this at class creation time on a class dictionary, and
just after importing on a module dictionary, for example.)

Dag Sverre Seljebotn

Dec 12, 2012, 5:06:18 PM12/12/12
to PJ Eby, Python-Dev, Nick Coghlan, Tim Delaney, Antoine Pitrou
I don't fully agree.

Perfect hashing already separates "hash table" from "contents" (sort
of), and saves the memory in much the same way.

Yes, you can repeat the trick and have 2 levels of indirection, but that
then requires an additional table of small ints which is pure overhead
present for the sorting; in short, it's no longer an optimization but
just overhead for the sortability.

Dag Sverre

Dag Sverre Seljebotn

Dec 12, 2012, 5:09:37 PM12/12/12
to PJ Eby, Python-Dev, Nick Coghlan, Tim Delaney, Antoine Pitrou
This was a bit inaccurate, but the point is: The perfect hash function
doesn't need any fill-in to avoid collisions, you can (except in
exceptional circumstances) fill the table 100%, so the memory is already
saved.

Dag Sverre

PJ Eby

Dec 13, 2012, 12:11:23 AM12/13/12
to Dag Sverre Seljebotn, Python-Dev, Nick Coghlan, Tim Delaney, Antoine Pitrou
On Wed, Dec 12, 2012 at 5:06 PM, Dag Sverre Seljebotn
<d.s.se...@astro.uio.no> wrote:
> Perfect hashing already separates "hash table" from "contents" (sort of),
> and saves the memory in much the same way.
>
> Yes, you can repeat the trick and have 2 levels of indirection, but that
> then requires an additional table of small ints which is pure overhead,
> present only to preserve the ordering; in short, it's no longer an
> optimization but just overhead for the sortability.

I'm confused. I understood your algorithm to require repacking,
rather than it being a suitable algorithm for incremental change to an
existing dictionary. ISTM that that would mean you still pay some
sort of overhead (either in time or space) while the dictionary is
still being mutated.

Also, I'm not sure how 2 levels of indirection come into it. The
algorithm you describe has, as I understand it, only up to 12
perturbation values ("bins"), so it's a constant space overhead, not a
variable one. What's more, you can possibly avoid the extra memory
access by using a different perfect hashing algorithm, at the cost of
a slower optimization step or using a little more memory.

> Note: I'm NOT suggesting the use of perfect hashing, just making
> sure its existence is mentioned and that one is aware that if
> always-ordered dicts become the language standard it precludes
> this option far off in the future.

Not really. It means that some forms of perfect hashing might require
adding a few more ints worth of overhead for the dictionaries that use
it. If it's really a performance benefit for very-frequently-used
dictionaries, that might still be worthwhile.

Dag Sverre Seljebotn

unread,
Dec 13, 2012, 2:26:47 AM12/13/12
to PJ Eby, Python-Dev, Nick Coghlan, Tim Delaney, Antoine Pitrou
On 12/13/2012 06:11 AM, PJ Eby wrote:
> On Wed, Dec 12, 2012 at 5:06 PM, Dag Sverre Seljebotn
> <d.s.se...@astro.uio.no> wrote:
>> Perfect hashing already separates "hash table" from "contents" (sort of),
>> and saves the memory in much the same way.
>>
>> Yes, you can repeat the trick and have 2 levels of indirection, but that
>> then requires an additional table of small ints which is pure overhead,
>> present only to preserve the ordering; in short, it's no longer an
>> optimization but just overhead for the sortability.
>
> I'm confused. I understood your algorithm to require repacking,
> rather than it being a suitable algorithm for incremental change to an
> existing dictionary. ISTM that that would mean you still pay some
> sort of overhead (either in time or space) while the dictionary is
> still being mutated.

As-is, the algorithm just assumes all key-value pairs are available at
creation time.

So yes, if you don't reallocate when making the dict perfect then it
could make sense to combine it with the scheme discussed in this thread.

If one does leave some free slots open there's some probability of an
insertion working without complete repacking, but the probability is
smaller than with a normal dict. Hybrid schemes and trade-offs in this
direction could be possible.

>
> Also, I'm not sure how 2 levels of indirection come into it. The
> algorithm you describe has, as I understand it, only up to 12
> perturbation values ("bins"), so it's a constant space overhead, not a
> variable one. What's more, you can possibly avoid the extra memory
> access by using a different perfect hashing algorithm, at the cost of
> a slower optimization step or using a little more memory.

I said there are k perturbation values; you need an additional array

some_int_t d[k]

where some_int_t is large enough to hold n (the number of entries). Just
like what's proposed in this thread.

The paper recommends k > 2*n, but in my experiments I could get away
with k = n in 99.9% of the cases (given perfect entropy in the
hashes...). So the overhead is roughly the same as what's proposed here.
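
As a toy illustration of the general hash-and-displace idea (the bit
mixing here is invented for the example; the real parameters come out
of the construction step in Pagh's paper):

    def perfect_lookup(table, d, key):
        # d holds the k perturbation values; table is a dense list of
        # (key, value) pairs filled to (nearly) 100%.  The top hash
        # bits pick a perturbation value, which displaces the slot.
        h = hash(key)
        bucket = (h >> 16) % len(d)
        slot = (h ^ d[bucket]) % len(table)
        k2, v = table[slot]
        if k2 == key:
            return v
        raise KeyError(key)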

I think the most promising thing would be to always have a single
integer table and either use it for indirection (the usual case) or for
perfect hash function parameters (say, after a pack() method has been
called and before new insertions).

>> Note: I'm NOT suggesting the use of perfect hashing, just making
>> sure its existence is mentioned and that one is aware that if
>> always-ordered dicts become the language standard it precludes
>> this option far off in the future.
>
> Not really. It means that some forms of perfect hashing might require
> adding a few more ints worth of overhead for the dictionaries that use
> it. If it's really a performance benefit for very-frequently-used
> dictionaries, that might still be worthwhile.
>

As mentioned above the overhead is larger.

I think the main challenge is to switch to a hashing scheme with more
entropy for strings, like murmurhash3. Having lots of zero bits in the
hashes of short strings will kill the scheme, since it then needs several
attempts to succeed (the "r" parameter). So string hashing is slowed
down a bit (given the caching I don't know how important this is).

Ideally one should make sure the hashes are 64-bit on 64-bit platforms
too (IIUC long is 32-bit on Windows but I don't know Windows well).

But the main reason I say I'm not proposing it is that I don't have
time to code it up for demonstration, and people like to have something
to look at when they get proposals :-)

Dag Sverre

Raymond Hettinger

unread,
Dec 18, 2012, 4:35:40 PM12/18/12
to Antoine Pitrou, pytho...@python.org

On Dec 11, 2012, at 1:13 AM, Antoine Pitrou <soli...@pitrou.net> wrote:


On Dec 10, 2012, at 2:48 AM, Christian Heimes <chri...@python.org>
wrote:

On the other hand every lookup and collision check needs an
additional multiplication, addition and pointer dereferencing:

entry = entries_baseaddr + sizeof(PyDictKeyEntry) * idx


Currently, the dict implementation allows alternative lookup
functions based on whether the keys are all strings.
The choice of lookup function is stored in a function pointer.
That lets each lookup use the currently active lookup
function without having to make any computations or branches.

An indirect function call is technically a branch, as seen from the CPU
(and not necessarily a very predictable one, although recent Intel
CPUs are said to be quite good at that).

FWIW, we already have an indirection to the lookup function.
I would piggyback on that, so no new indirections are required.

My plan now is to apply the space compaction idea to sets.
That code is less complex than dicts, and set operations 
stand to benefit the most from improved iteration speed.

The steps would be:

* Create a good set of benchmarks for set operations
   for both size and speed.

* Start with the simplest version of the idea:  separate the
   entries table from the hash table.  Keep the hash table at
   Py_ssize_t, and pre-allocate the entry array to two-thirds the size
   of the hash table.  This should give about a 25% space savings
   and speed up iteration for all the set-to-set operations.

* If that all works out, I want to trim the entry table for frozensets
   so that the entry table has no over-allocations.   This should
   give a much larger space savings.

* Next, I want to experiment with the idea of using char/short/long
   sizes for the hash table.  Since there is already an existing
   lookup indirection, this can be done with no additional overhead.
   Small sets will get the most benefit for the space savings, and
   the cache performance for hash lookups should improve nicely
   (a hash table of size 32 could fit in a single cache line); see
   the sketch after this list.
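
Here is what that size selection might look like in pure Python; this
is an illustrative sketch in the spirit of the proof-of-concept recipe,
not the planned C code:

    from array import array

    FREE = -1    # sentinel for an empty slot

    def make_index_table(n):
        # Pick the smallest array typecode that can hold indices up to
        # n, so small tables pay only one byte per slot.
        if n <= 2**7:
            return array('b', [FREE]) * n    # signed char
        if n <= 2**15:
            return array('h', [FREE]) * n    # signed short
        if n <= 2**31:
            return array('l', [FREE]) * n    # signed long
        return [FREE] * n                    # plain Python ints for huge tables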

At each step, I'll run the benchmarks to make sure the expected
speed and space benefits are being achieved.

As a side-effect, sets without deletions will retain their insertion
order.  If this is of concern, it would be no problem to implement
Antoine's idea of scrambling the entries during iteration.


Raymond


P.S.  I've gotten a lot of suggestions for improvements to the
proof-of-concept code.  Thank you for that.  In the latest version
of that code, entries are stored in regular Python lists
and inherit their over-allocation characteristics (about
12.5% overallocated for large lists).  There are many
other possible allocation strategies with their own
respective speed/space trade-offs.

Andrew Svetlov

unread,
Dec 18, 2012, 5:40:49 PM12/18/12
to Raymond Hettinger, Antoine Pitrou, Python Development List
Good plan!

Maciej Fijalkowski

unread,
Jan 3, 2013, 5:22:27 AM1/3/13
to Andrew Svetlov, Antoine Pitrou, Raymond Hettinger, Python Development List
Hello everyone.

Thanks raymond for writing down a pure python version ;-)

I did an initial port to RPython for experiments. The results (on
large dicts only) are inconclusive - it's either a bit faster or a bit
slower, depending on what exactly you do. There is a possibility I messed
something up too (there is a branch rdict-experiments in PyPy, in a
very sorry state though).

One effect we did not think about is that besides extra indirection,
there is an issue with data locality - having to look in two different
large lists seems to be a hit. Again, while I tried, the approach is
not scientific at all, but unless someone points out a clear flaw in the
code (it's in pypy/rlib/dict.py in rdict-experiments branch), I'm
probably abandoning this for now.

Cheers,
fijal

Antoine Pitrou

unread,
Jan 4, 2013, 6:34:40 PM1/4/13
to pytho...@python.org
On Thu, 3 Jan 2013 12:22:27 +0200
Maciej Fijalkowski <fij...@gmail.com> wrote:
> Hello everyone.
>
> Thanks raymond for writing down a pure python version ;-)
>
> I did an initial port to RPython for experiments. The results (on
> large dicts only) are inconclusive - it's either a bit faster or a bit
> slower, depending on what exactly you do. There is a possibility I messed
> something up too (there is a branch rdict-experiments in PyPy, in a
> very sorry state though).

But what about the memory consumption? This seems to be the main point
of Raymond's proposal.

Regards

Antoine.

Maciej Fijalkowski

unread,
Jan 5, 2013, 4:03:42 PM1/5/13
to Antoine Pitrou, <python-dev@python.org>
On Sat, Jan 5, 2013 at 1:34 AM, Antoine Pitrou <soli...@pitrou.net> wrote:
> On Thu, 3 Jan 2013 12:22:27 +0200
> Maciej Fijalkowski <fij...@gmail.com> wrote:
>> Hello everyone.
>>
>> Thanks raymond for writing down a pure python version ;-)
>>
>> I did an initial port to RPython for experiments. The results (on
>> large dicts only) are inconclusive - it's either a bit faster or a bit
>> slower, depending on what exactly you do. There is a possibility I messed
>> something up too (there is a branch rdict-experiments in PyPy, in a
>> very sorry state though).
>
> But what about the memory consumption? This seems to be the main point
> of Raymond's proposal.
>

Er. The memory consumption can be measured with pen and paper; you
don't actually need to run it to see, right?

After a lot more experimentation I came up with something that behaves
better for large dictionaries. This was for a long time a weak point
of PyPy, because of some GC details. So I guess I'll try to implement
it fully and see how it goes. Stay tuned, I'll keep you posted.

PS. PyPy does not have lots of small dictionaries because of maps (so
you don't have a dict per instance), hence their performance is not
all that interesting for PyPy.

Cheers,
fijal

Kristján Valur Jónsson

unread,
Jan 6, 2013, 4:40:08 PM1/6/13
to Maciej Fijalkowski, Antoine Pitrou, <python-dev@python.org>
The memory part is also why I am interested in this approach.
Another thing has been bothering me. This is the fact that with the default implementation, the small table is only ever populated up to a certain percentage, I can't recall, perhaps 50%. Since the small table is small by definition, I think it is worth investigating whether we can allow it to fill to 100% before growing, even if it costs some collisions. A linear lookup in a handful of slots can't be that much of a bother; it is only with larger numbers of entries that the O(1) property starts to matter.
K

________________________________________
Frá: Python-Dev [python-dev-bounces+kristjan=ccpgam...@python.org] fyrir h&#246;nd Maciej Fijalkowski [fij...@gmail.com]
Sent: 5. janúar 2013 21:03
To: Antoine Pitrou
Cc: <pytho...@python.org>
Efni: Re: [Python-Dev] More compact dictionaries with faster iteration
Unsubscribe: http://mail.python.org/mailman/options/python-dev/kristjan%40ccpgames.com

Raymond Hettinger

unread,
Jan 6, 2013, 6:02:50 PM1/6/13
to Maciej Fijalkowski, Antoine Pitrou, Python Development List
On Jan 3, 2013, at 2:22 AM, Maciej Fijalkowski <fij...@gmail.com> wrote:

Hello everyone.

Thanks raymond for writing down a pure python version ;-)

Thanks for running with it.


I did an initial port to RPython for experiments. The results (on
large dicts only) are inconclusive - it's either a bit faster or a bit
slower, depending on what exactly you do. There is a possibility I messed
something up too (there is a branch rdict-experiments in PyPy, in a
very sorry state though).

One effect we did not think about is that besides extra indirection,
there is an issue with data locality - having to look in two different
large lists seems to be a hit.

In pure python, I didn't see a way to bring the hash/key/value entries
side-by-side as they currently are in CPython.   How does PyPy currently
handle this issue?  Is there a change I could make to the recipe that
would restore data locality?


Raymond

Raymond Hettinger

unread,
Jan 6, 2013, 7:55:52 PM1/6/13
to Kristján Valur Jónsson, <python-dev@python.org>, Antoine Pitrou
On Jan 6, 2013, at 1:40 PM, Kristján Valur Jónsson <kris...@ccpgames.com> wrote:

The memory part is also why I am interested in this approach.
Another thing has been bothering me.  This is the fact that with the default implementation, the small table is only ever populated up to a certain percentage, I can't recall, perhaps 50%.

Dicts resize when the table is more than two-thirds full.   Small dicts (which start with an 8-slot table) contain zero to five keys.

Since the small table is small by definition, I think it is worth investigating whether we can allow it to fill to 100% before growing, even if it costs some collisions.  A linear lookup in a handful of slots can't be that much of a bother; it is only with larger numbers of entries that the O(1) property starts to matter.

I looked at this a few years ago and found that it hurt performance considerably.   Uncle Timmy (and others) had done a darned fine job of tuning dictionaries. 


Raymond

Andrea Griffini

unread,
Jan 7, 2013, 3:05:42 AM1/7/13
to Kristján Valur Jónsson, <python-dev@python.org>, Antoine Pitrou
On Sun, Jan 6, 2013 at 10:40 PM, Kristján Valur Jónsson
<kris...@ccpgames.com> wrote:

> A linear lookup in a handful of slots can't be that much of a bother; it is only with larger numbers of entries that the O(1) property starts to matter.

Something I was thinking about is whether, for small tables, it makes
sense to do the hashing at all... a small table could be kept as just
key/value pairs (also saving the space for the hashes), with a linear
scan first for identity and then a second scan for equality.
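
A toy sketch of that lookup (illustrative only; pairs is assumed to be
a flat list of (key, value) tuples):

    def small_lookup(pairs, key):
        # No hashing at all for tiny tables: scan once for identity,
        # then once more for equality.
        for k, v in pairs:
            if k is key:
                return v
        for k, v in pairs:
            if k == key:
                return v
        raise KeyError(key)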

Given the level of optimization of dicts, however, I'm 99% sure this
has already been tried before.

Maciej Fijalkowski

unread,
Jan 7, 2013, 5:55:39 AM1/7/13
to Kristján Valur Jónsson, Antoine Pitrou, <python-dev@python.org>
On Sun, Jan 6, 2013 at 11:40 PM, Kristján Valur Jónsson
<kris...@ccpgames.com> wrote:
> The memory part is also why I am interested in this approach.
> Another thing has been bothering me. This is the fact that with the default implementation, the small table is only ever populated up to a certain percentage, I can't recall, perhaps 50%. Since the small table is small by definition, I think it is worth investigating whether we can allow it to fill to 100% before growing, even if it costs some collisions. A linear lookup in a handful of slots can't be that much of a bother; it is only with larger numbers of entries that the O(1) property starts to matter.
> K

If you're very interested in the memory footprint you should do what
PyPy does. It gives you no speed benefits without the JIT, but it
makes all the instances behave as if they had slots. The trick
is to keep a dictionary name -> index into a list on the class (you go
through a hierarchy of such dicts as you add or remove attributes) and a
list of attributes on the instance.

We call it maps, V8 calls it hidden classes, it's well documented on
the pypy blog.
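
A stripped-down sketch of the trick (illustrative, not PyPy's actual
code):

    class Map:
        # Class-side structure: attribute name -> index into the
        # instance's flat storage list, plus transitions to the maps
        # reached by adding one more attribute.
        def __init__(self, attrs=()):
            self.indexes = {name: i for i, name in enumerate(attrs)}
            self.transitions = {}

        def with_attr(self, name):
            if name not in self.transitions:
                self.transitions[name] = Map(tuple(self.indexes) + (name,))
            return self.transitions[name]

    class Instance:
        def __init__(self, map):
            self.map = map
            self.storage = []

        def set(self, name, value):
            if name in self.map.indexes:
                self.storage[self.map.indexes[name]] = value
            else:
                self.map = self.map.with_attr(name)
                self.storage.append(value)

        def get(self, name):
            return self.storage[self.map.indexes[name]]

Instances that acquire their attributes in the same order end up
sharing the same Map objects, so each instance carries only a flat
list of values.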

Cheers,
fijal

Mark Shannon

unread,
Jan 7, 2013, 6:08:36 AM1/7/13
to pytho...@python.org

On 07/01/13 10:55, Maciej Fijalkowski wrote:
> On Sun, Jan 6, 2013 at 11:40 PM, Kristján Valur Jónsson
> <kris...@ccpgames.com> wrote:
>> The memory part is also why I am interested in this approach.
>> Another thing has been bothering me. This is the fact that with the default implementation, the small table is only ever populated up to a certain percentage, I can't recall, perhaps 50%. Since the small table is small by definition, I think it is worth investigating whether we can allow it to fill to 100% before growing, even if it costs some collisions. A linear lookup in a handful of slots can't be that much of a bother; it is only with larger numbers of entries that the O(1) property starts to matter.
>> K

In 3.3 dicts no longer have a small table.


>
> If you're very interested in the memory footprint you should do what
> PyPy does. It gives you no speed benefits without the JIT, but it
> makes all the instances behave as if they had slots. The trick
> is to keep a dictionary name -> index into a list on the class (you go
> through a hierarchy of such dicts as you add or remove attributes) and a
> list of attributes on the instance.
>
> We call it maps, V8 calls it hidden classes, it's well documented on
> the pypy blog.

They are called shared keys in CPython 3.3 :)

>
> Cheers,
> fijal

Maciej Fijalkowski

unread,
Jan 7, 2013, 9:02:30 AM1/7/13
to Mark Shannon, <python-dev@python.org>
On Mon, Jan 7, 2013 at 1:08 PM, Mark Shannon <ma...@hotpy.org> wrote:
>
>
> On 07/01/13 10:55, Maciej Fijalkowski wrote:
>>
>> On Sun, Jan 6, 2013 at 11:40 PM, Kristján Valur Jónsson
>> <kris...@ccpgames.com> wrote:
>>>
>>> [...]
>
>
> In 3.3 dicts no longer have a small table.
>
>>
>> If you're very interested in the memory footprint you should do what
>> PyPy does. It gives you no speed benefits without the JIT, but it
>> makes all the instances behave as if they had slots. The trick
>> is to keep a dictionary name -> index into a list on the class (you go
>> through a hierarchy of such dicts as you add or remove attributes) and a
>> list of attributes on the instance.
>>
>> We call it maps, V8 calls it hidden classes, it's well documented on
>> the pypy blog.
>
>
> They are called shared keys in CPython 3.3 :)

shared keys don't go to the same lengths, do they?


Maciej Fijalkowski

unread,
Jan 8, 2013, 9:49:52 AM1/8/13
to Raymond Hettinger, Python Dev
On Mon, Dec 10, 2012 at 3:44 AM, Raymond Hettinger
<raymond....@gmail.com> wrote:
> The current memory layout for dictionaries is
> unnecessarily inefficient. It has a sparse table of
> 24-byte entries containing the hash value, key pointer,
> and value pointer.
>
> Instead, the 24-byte entries should be stored in a
> dense table referenced by a sparse table of indices.
>
> [...]

One question, Raymond.

The compression ratios stay true provided you don't overallocate the
entry list. If you do overallocate, you don't really gain that much (it
all depends vastly on details), or you even lose in some cases. What do
you think the strategy should be?

Christian Tismer

unread,
May 15, 2013, 7:32:39 AM5/15/13
to Raymond Hettinger, Python Dev
Hi Raymond,
What is the current status of this discussion?
I'd like to know whether it is being considered as an alternative implementation.

There is also a discussion in python-ideas right now where this
alternative is mentioned, and I think especially for small dicts
as **kwargs, it could be a cheap way to introduce order.

Is this going on, somewhere? I'm quite interested on that.

cheers - chris

--
Christian Tismer :^) <mailto:tis...@stackless.com>
Software Consulting : Have a break! Take a ride on Python's
Karl-Liebknecht-Str. 121 : *Starship* http://starship.python.net/
14482 Potsdam : PGP key -> http://pgp.uni-mainz.de
phone +49 173 24 18 776 fax +49 (30) 700143-0023
PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04
whom do you want to sponsor today? http://www.stackless.com/

Stefan Drees

unread,
May 15, 2013, 8:01:31 AM5/15/13
to Christian Tismer, pytho...@python.org
Hi Chris,

On 15.05.13 13:32 Christian Tismer wrote:
> Hi Raymond,
>
> On 08.01.13 15:49, Maciej Fijalkowski wrote:
>> On Mon, Dec 10, 2012 at 3:44 AM, Raymond Hettinger
>> <raymond....@gmail.com> wrote:
>>> The current memory layout for dictionaries is
>>> unnecessarily inefficient. It has a sparse table of
>>> 24-byte entries containing the hash value, key pointer,
>>> and value pointer.
>>>
>>> ...
>>
>
> What is the current status of this discussion?
> I'd like to know whether it is being considered as an alternative implementation.
>
> There is also a discussion in python-ideas right now where this
> alternative is mentioned, and I think especially for small dicts
> as **kwargs, it could be a cheap way to introduce order.
>
> Is this going on, somewhere? I'm quite interested on that.

+1. I am also interested in the status. Many people seem to have copied
the recipe from the ActiveState site (was it?), but I wonder whether it
was too cool to make it into "the field", or whether it was simply an
understandable lack of resources?

All the best,
Stefan

Christian Tismer

unread,
May 15, 2013, 8:36:35 AM5/15/13
to ste...@drees.name, pytho...@python.org
On 15.05.13 14:01, Stefan Drees wrote:
> Hi Chris,
>
> On 15.05.13 13:32 Christian Tismer wrote:
>> Hi Raymond,
>>
>> On 08.01.13 15:49, Maciej Fijalkowski wrote:
>>> On Mon, Dec 10, 2012 at 3:44 AM, Raymond Hettinger
>>> <raymond....@gmail.com> wrote:
>>>> The current memory layout for dictionaries is
>>>> unnecessarily inefficient. It has a sparse table of
>>>> 24-byte entries containing the hash value, key pointer,
>>>> and value pointer.
>>>>
>>>> ...
>>>
>>
>> What is the current status of this discussion?
>> I'd like to know whether it is being considered as an alternative implementation.
>>
>> There is also a discussion in python-ideas right now where this
>> alternative is mentioned, and I think especially for small dicts
>> as **kwargs, it could be a cheap way to introduce order.
>>
>> Is this going on, somewhere? I'm quite interested on that.
>
> +1. I am also interested in the status. Many people seem to have
> copied the recipe from the ActiveState site (was it?), but I wonder
> whether it was too cool to make it into "the field", or whether it was
> simply an understandable lack of resources?
>

Right, found the references:
http://mail.python.org/pipermail/python-dev/2012-December/123028.html
http://stackoverflow.com/questions/14664620/python-dictionary-details
http://code.activestate.com/recipes/578375-proof-of-concept-for-a-more-space-efficient-faster/?in=user-178123

cheers - chris

--
Christian Tismer :^) <mailto:tis...@stackless.com>
Software Consulting : Have a break! Take a ride on Python's
Karl-Liebknecht-Str. 121 : *Starship* http://starship.python.net/
14482 Potsdam : PGP key -> http://pgp.uni-mainz.de
phone +49 173 24 18 776 fax +49 (30) 700143-0023
PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04
whom do you want to sponsor today? http://www.stackless.com/

Maciej Fijalkowski

unread,
May 15, 2013, 10:31:40 AM5/15/13
to Christian Tismer, <python-dev@python.org>

I implemented one for pypy btw (it's parked on a branch for messiness reasons)

Cheers,
fijal

Raymond Hettinger

unread,
May 19, 2013, 1:27:36 AM5/19/13
to Christian Tismer, Python Dev
On May 15, 2013, at 4:32 AM, Christian Tismer <tis...@stackless.com> wrote:

What is the current status of this discussion?
I'd like to know whether it is a considered alternative implementation.

As far as I can tell, I'm the only one working on it (and a bit slowly at that).
My plan is to implement it for frozensets to see how it works out.

Frozensets are a nice first experiment for several reasons:
* The current implementation is cleaner than dictionaries
   (which have become more complicated due to key-sharing).
* It will be easy to benchmark (by racing sets vs frozen sets)
   for an apples-to-apples comparison.
* There is no need to have a list-like over-allocation scheme
   since frozensets can't grow after they are created. 
   That will guarantee a significant space savings and
   it will simplify the coding.
* I wrote the code for setobject.c so I know all the ins-and-outs. 



There is also a discussion in python-ideas right now where this
alternative is mentioned, and I think especially for small dicts
as **kwargs, it could be a cheap way to introduce order.

The compaction of keys and values into a dense array was
intended to save space, improve cache performance, and
improve iteration speed.  The ordering was just a side-effect
and one that is easily disturbed if keys ever get deleted.

So a compacted dict might be a cheap way to introduce order
for kwargs, but it would need special handling if the user decided
to delete keys.

BTW, I'm +1 on the idea for ordering keyword-args.  It makes
it easier to debug if the arguments show up in the order they
were created.  AFAICT, no purpose is served by scrambling them
(which is exacerbated by the new randomized hashing security feature).


Raymond

Maciej Fijalkowski

unread,
May 19, 2013, 10:09:01 AM5/19/13
to Raymond Hettinger, Christian Tismer, Python Dev
The completely ordered dict is easy to get too - you mark deleted
entries instead of removing them (then all the keys stay in order) and
every now and then you just compact the whole thing by removing all
the deleted entries, presumably on resize or so.
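
In sketch form (the helper names here are assumed for illustration;
this is not real CPython or PyPy code):

    _DELETED = object()    # tombstone sentinel

    def delete_key(indices, entries, key):
        # Mark the entry with a tombstone instead of removing it, so
        # the remaining live entries keep their insertion order.
        slot = probe(indices, key)    # hypothetical probe function
        entries[indices[slot]] = _DELETED
        indices[slot] = None

    def compact(entries):
        # Every now and then (e.g. on resize), squeeze the tombstones
        # out; the index table would be rebuilt at the same time.
        return [e for e in entries if e is not _DELETED]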

Cheers,
fijal

Serhiy Storchaka

unread,
Dec 31, 2014, 8:14:06 AM12/31/14
to pytho...@python.org
On 10.12.12 03:44, Raymond Hettinger wrote:
> The current memory layout for dictionaries is
> unnecessarily inefficient. It has a sparse table of
> 24-byte entries containing the hash value, key pointer,
> and value pointer.
>
> Instead, the 24-byte entries should be stored in a
> dense table referenced by a sparse table of indices.

FYI PHP 7 will use this technique [1]. In conjunction with other
optimizations this will decrease memory consumption of PHP hashtables by
up to 4 times.

[1] http://nikic.github.io/2014/12/22/PHPs-new-hashtable-implementation.html


Maciej Fijalkowski

unread,
Jan 1, 2015, 8:54:18 AM1/1/15
to Serhiy Storchaka, <python-dev@python.org>
On Wed, Dec 31, 2014 at 3:12 PM, Serhiy Storchaka <stor...@gmail.com> wrote:
> On 10.12.12 03:44, Raymond Hettinger wrote:
>>
>> The current memory layout for dictionaries is
>> unnecessarily inefficient. It has a sparse table of
>> 24-byte entries containing the hash value, key pointer,
>> and value pointer.
>>
>> Instead, the 24-byte entries should be stored in a
>> dense table referenced by a sparse table of indices.
>
>
> FYI PHP 7 will use this technique [1]. In conjunction with other
> optimizations this will decrease memory consumption of PHP hashtables up to
> 4 times.

"up to 4 times" is a bit of a stretch, given that most of their
savings come from:

* saving on the keeping of ordering
* other optimizations in zval

None of it applies to python

PHP does not implement differing sizes of ints in key dict, which
makes memory saving php-only (if we did the same thing as PHP, we
would save more or less nothing, depending on how greedy you are with
the list overallocation)

We implemented the same strategy in PyPy as of last year, testing it
to become a default "dict" and "OrderedDict" for PyPy in the next
release.

Cheers,
fijal

PS. I wonder who came up with the idea first, PHP or rhettinger, and
who implemented it first (I'm pretty sure it was used in hippy before
it was used in Zend PHP).

Armin Rigo

unread,
Jan 3, 2015, 11:46:18 AM1/3/15
to Maciej Fijalkowski, Serhiy Storchaka, <python-dev@python.org>
Hi all,

On 1 January 2015 at 14:52, Maciej Fijalkowski <fij...@gmail.com> wrote:
> PS. I wonder who came up with the idea first, PHP or rhettinger and
> who implemented it first (I'm pretty sure it was used in hippy before
> it was used in Zend PHP)

We'd need to look at that question in more detail, but a quick look
turned up this Java code from 2012:

https://code.google.com/r/wassermanlouis-guava/source/browse/guava/src/com/google/common/collect/CompactHashMap.java?name=refs/remotes/gcode-clone/compact-maps

which implements almost exactly the original idea of Raymond. (It has
a twist because Java doesn't support arrays of (int, int, Object,
Object), and instead encodes it as one array of long and one array of
Objects. It also uses a chain of buckets instead of open addressing.)


A bientôt,

Armin.
