Do you know how to implement a really efficient self reordering list in
Python? (List with a maximum length. When an item is processed, it
becomes the first element in the list.) I would like to use this for
caching of rendered images. Of course I could implement this in pure
Python, I just wonder if there is a faster implementation that uses some
cool feature of the standard library. (Maybe a C implementation could be
added to the collections module?)
Les
Hi,
Maybe the bisect module is what you need:
"This module provides support for maintaining a list in sorted order
without having to sort the list after each insertion."
HTH,
Thomas
--
Thomas Güttler, http://www.thomas-guettler.de/
E-Mail: guettli (*) thomas-guettler + de
Spam Catcher: niemand....@thomas-guettler.de
> Do you know how to implement a really efficient self reordering list in
> Python? (List with a maximum length. When an item is processed, it
> becomes the first element in the list.) I would like to use this for
> caching of rendered images.
did you check the cheeseshop?
http://www.python.org/pypi/lrucache/0.2
</F>
> Do you know how to implement a really efficient self reordering list in
> Python?
Yes.
> (List with a maximum length. When an item is processed, it
> becomes the first element in the list.)
Search for "move to front list". If you want a much bigger improvement in
speed -- O(N * log N) instead of O(N**2) -- search for "splay tree"; of
course, the algorithm is more complex and the optimized algorithm is even
more complex.
> Of course I could implement this in pure Python, I just wonder if there is
> a faster implementation that uses some cool feature of the standard
> library. (Maybe a C implementation could be added to the collections
> module?)
Yes, you could write a C-language extension for more speed.
I wonder why you don't use a dictionary? The only time I used a
move-front algorithm I stored algorithms and had to evaluate a
condition to select the correct algo. That means no constant search key
was available for accessing the correct one. In case of an image list I
would implement a self-ordering list if I have to do some pattern
recognition in order to select the correct image. Holding a reference
as a search key a Python hash-table will always have a better average
time complexity no matter which language is used to implement
move-front. In either way I'd use Python as an implementation language.
Kay
Probably my problem was a "no-problem". I realized that it does not
matter how fast is my list. The most time consuming part is still the
rendering of the images that are not in the cache. I need to speed up
rendering and have more RAM, of course. :-)
Les
This is actually the use-case for an "LRU cache"="least recently used
cache", which is probably already implemented in Python somewhere (or
even as a fast extension). I'd do a google search for it. It almost
certainly would use a dictionary interface within Python, ISTM.
Cheers,
Terry
--
Terry Hancock ( hancock at anansispaceworks.com )
Anansi Spaceworks http://www.anansispaceworks.com
> This is actually the use-case for an "LRU cache"="least recently used
> cache", which is probably already implemented in Python somewhere (or
> even as a fast extension). I'd do a google search for it.
reposted, in case there are more people who cannot be bothered
to read other replies before posting:
http://www.python.org/pypi/lrucache/0.2
(it's a straight-forward and rather flexible implementation, which uses
a dictionary to quickly find existing objects, and a heap to keep track
of access times. if you have more control over the stuff that goes into
the cache, you can strip things down to 10-20 lines of code and code
it in about the same time it took me to write this post)
</F>
http://minkirri.apana.org.au/~abo/projects/GCache
Even if you don't want something that fancy, it uses a PQueue priority
queue to achieve exactly what you want. I provide a pure-python
implementation using bisect, but recommend a C extension module with
the same name by Andrew Snare which uses a fibonacci heap
(python-pqueue is the Debian Package).
Note that python 2.3 introduced a heapq module that does for queue
lists what bisect does for heap lists. I am planning to modify my
PQueue to use it instead of bisect.
The problem is heapq is O(N) for finding a particular entry in the
Queue, and any time you change or remove something from it you need to
heapify it again, which is also O(N).
Andew Snare has a C PQueue extension module that is heaps faster from
all angles. It uses a fibonacci heap and gets O(lg N) deletes and
re-sets. I think it does this by using the dict to speed finding
entries it in the heap, and uses some properties of the heap to "assign
lower" efficiently.
The queue used in the lrucache will also suffer from the O(N) problem
when deleting or reseting values in the cache.
http://zooko.com/repos/pyutil/pyutil/pyutil/cache.py
I haven't benchmarked it against Evan Podromou's heap implementation
yet, but obviously inserting and removing things from a heapq heap is
O(N).
You can find unit tests and benchmarking tools in the pyutil/test
directory.
Regards,
Zooko
P.S. I read this list sporadically, so if you want me to read your
response, please Cc: zo...@zooko.com. Thanks.
Good heavens, I should hope not. The whole point of heaps is that
those operations are O(log(N)).
Oh, you are right -- Python's heapq implementation does not naively do
list.pop(0).
How silly of me to think that Kevin O'Connor, Tim Peters, and Raymond
Hettinger would be so silly.
Regards,
Zooko