Self reordering list in Python

Laszlo Zsolt Nagy

unread,

Sep 15, 2005, 9:14:09 AM9/15/05

to pytho...@python.org

Hello,

Do you know how to implement a really efficient self reordering list in
Python? (List with a maximum length. When an item is processed, it
becomes the first element in the list.) I would like to use this for
caching of rendered images. Of course I could implement this in pure
Python, I just wonder if there is a faster implementation that uses some
cool feature of the standard library. (Maybe a C implementation could be
added to the collections module?)

Les

Thomas Guettler

unread,

Sep 15, 2005, 11:00:39 AM9/15/05

to

Hi,

Maybe the bisect module is what you need:

"This module provides support for maintaining a list in sorted order
without having to sort the list after each insertion."

HTH,
Thomas

--
Thomas Güttler, http://www.thomas-guettler.de/
E-Mail: guettli (*) thomas-guettler + de
Spam Catcher: niemand....@thomas-guettler.de

Fredrik Lundh

unread,

Sep 15, 2005, 11:21:51 AM9/15/05

to pytho...@python.org

Laszlo Zsolt Nagy wrote:

> Do you know how to implement a really efficient self reordering list in
> Python? (List with a maximum length. When an item is processed, it
> becomes the first element in the list.) I would like to use this for
> caching of rendered images.

did you check the cheeseshop?

http://www.python.org/pypi/lrucache/0.2

</F>

Jules Dubois

unread,

Sep 16, 2005, 1:36:26 AM9/16/05

to

On Thursday 15 September 2005 07:14, Laszlo Zsolt Nagy
<gan...@geochemsource.com>
(<mailman.421.1126790...@python.org>) wrote:

> Do you know how to implement a really efficient self reordering list in
> Python?

Yes.

> (List with a maximum length. When an item is processed, it
> becomes the first element in the list.)

Search for "move to front list". If you want a much bigger improvement in
speed -- O(N * log N) instead of O(N**2) -- search for "splay tree"; of
course, the algorithm is more complex and the optimized algorithm is even
more complex.

> Of course I could implement this in pure Python, I just wonder if there is
> a faster implementation that uses some cool feature of the standard
> library. (Maybe a C implementation could be added to the collections
> module?)

Yes, you could write a C-language extension for more speed.

Kay Schluehr

unread,

Sep 16, 2005, 2:38:41 AM9/16/05

to

Laszlo Zsolt Nagy wrote:
> Hello,
>
> Do you know how to implement a really efficient self reordering list in
> Python? (List with a maximum length. When an item is processed, it
> becomes the first element in the list.) I would like to use this for
> caching of rendered images.

I wonder why you don't use a dictionary? The only time I used a
move-front algorithm I stored algorithms and had to evaluate a
condition to select the correct algo. That means no constant search key
was available for accessing the correct one. In case of an image list I
would implement a self-ordering list if I have to do some pattern
recognition in order to select the correct image. Holding a reference
as a search key a Python hash-table will always have a better average
time complexity no matter which language is used to implement
move-front. In either way I'd use Python as an implementation language.

Kay

Laszlo Zsolt Nagy

unread,

Sep 16, 2005, 7:03:33 AM9/16/05

to Kay Schluehr, pytho...@python.org

>I wonder why you don't use a dictionary? The only time I used a
>move-front algorithm I stored algorithms and had to evaluate a
>condition to select the correct algo. That means no constant search key
>was available for accessing the correct one. In case of an image list I
>would implement a self-ordering list if I have to do some pattern
>recognition in order to select the correct image. Holding a reference
>as a search key a Python hash-table will always have a better average
>time complexity no matter which language is used to implement
>move-front.
>

You are right in that holding a reference will have a better time
complexity. But holding a reference makes it impossible to free the
object. :-) As I said, my list has a maximum length. I just can't store
any number of images in memory. I need to keep only the most frequently
used ones. I do not know which ones will be used the most frequently,
this is why I need a self reordering list. Accessing an element makes it
"more imporant" than the others. I already implemented this in Python
and it was ten times faster compared to the original version. (200
frames per sec instead of 20 fps)

Probably my problem was a "no-problem". I realized that it does not
matter how fast is my list. The most time consuming part is still the
rendering of the images that are not in the cache. I need to speed up
rendering and have more RAM, of course. :-)

Les

Terry Hancock

unread,

Sep 16, 2005, 11:58:48 AM9/16/05

to pytho...@python.org

On Friday 16 September 2005 06:03 am, Laszlo Zsolt Nagy wrote:
> You are right in that holding a reference will have a better time
> complexity. But holding a reference makes it impossible to free the
> object. :-) As I said, my list has a maximum length. I just can't store
> any number of images in memory. I need to keep only the most frequently
> used ones. I do not know which ones will be used the most frequently,
> this is why I need a self reordering list. Accessing an element makes it
> "more imporant" than the others. I already implemented this in Python
> and it was ten times faster compared to the original version. (200
> frames per sec instead of 20 fps)

This is actually the use-case for an "LRU cache"="least recently used
cache", which is probably already implemented in Python somewhere (or
even as a fast extension). I'd do a google search for it. It almost
certainly would use a dictionary interface within Python, ISTM.

Cheers,
Terry

--
Terry Hancock ( hancock at anansispaceworks.com )
Anansi Spaceworks http://www.anansispaceworks.com

Fredrik Lundh

unread,

Sep 16, 2005, 12:38:58 PM9/16/05

to pytho...@python.org

Terry Hancock wrote:

> This is actually the use-case for an "LRU cache"="least recently used
> cache", which is probably already implemented in Python somewhere (or
> even as a fast extension). I'd do a google search for it.

reposted, in case there are more people who cannot be bothered
to read other replies before posting:

http://www.python.org/pypi/lrucache/0.2

(it's a straight-forward and rather flexible implementation, which uses
a dictionary to quickly find existing objects, and a heap to keep track
of access times. if you have more control over the stuff that goes into
the cache, you can strip things down to 10-20 lines of code and code
it in about the same time it took me to write this post)

</F>

ABO

unread,

Sep 27, 2005, 4:41:33 AM9/27/05

to

LRU caches are nice and simple, but if you want something fancier, with
support for squid-like expiry models (ie, using mtime and atime to
estimate a "stale time", and IMS fetches), you can have a look at my
GCache;

http://minkirri.apana.org.au/~abo/projects/GCache

Even if you don't want something that fancy, it uses a PQueue priority
queue to achieve exactly what you want. I provide a pure-python
implementation using bisect, but recommend a C extension module with
the same name by Andrew Snare which uses a fibonacci heap
(python-pqueue is the Debian Package).

Note that python 2.3 introduced a heapq module that does for queue
lists what bisect does for heap lists. I am planning to modify my
PQueue to use it instead of bisect.

ABO

unread,

Sep 27, 2005, 1:57:59 PM9/27/05

to

Actually, after posting this I did some more work on the PQueue modules
I had, implementing both bisect and heapq versions. It turns out the
bisect version is heaps faster if you ever delete or re-set values in
the queue.

The problem is heapq is O(N) for finding a particular entry in the
Queue, and any time you change or remove something from it you need to
heapify it again, which is also O(N).

Andew Snare has a C PQueue extension module that is heaps faster from
all angles. It uses a fibonacci heap and gets O(lg N) deletes and
re-sets. I think it does this by using the dict to speed finding
entries it in the heap, and uses some properties of the heap to "assign
lower" efficiently.

The queue used in the lrucache will also suffer from the O(N) problem
when deleting or reseting values in the cache.

zooko

unread,

Sep 30, 2005, 5:54:08 AM9/30/05

to

I've implemented such an LRU Cache in Python. My technique was to
weave a doubly-linked list into the dict, so that it is O(dict) for all
LRU operations. I benchmarked it against someone's Python-list-based
implementation from the ActiveState cookbook and noted that on my
machine the better constant factors of the Python list win out when the
list is cache contains fewer than about 16000 elements. Of course,
once you exceed that cross-over point, the asymptotically worse
behavior of the list-based implementation becomes a big factor. If you
have more than 16000 or so elements then you really oughtn't use a
list-based LRU cache.

http://zooko.com/repos/pyutil/pyutil/pyutil/cache.py

I haven't benchmarked it against Evan Podromou's heap implementation
yet, but obviously inserting and removing things from a heapq heap is
O(N).

You can find unit tests and benchmarking tools in the pyutil/test
directory.

Regards,

Zooko

P.S. I read this list sporadically, so if you want me to read your
response, please Cc: zo...@zooko.com. Thanks.

Paul Rubin

unread,

Sep 30, 2005, 6:09:33 AM9/30/05

to

"zooko" <zo...@zooko.com> writes:
> I haven't benchmarked it against Evan Podromou's heap implementation
> yet, but obviously inserting and removing things from a heapq heap is
> O(N).

Good heavens, I should hope not. The whole point of heaps is that
those operations are O(log(N)).

zooko

unread,

Sep 30, 2005, 12:17:49 PM9/30/05

to

Oh, you are right -- Python's heapq implementation does not naively do
list.pop(0).

How silly of me to think that Kevin O'Connor, Tim Peters, and Raymond
Hettinger would be so silly.

Regards,

Zooko