which data structure to use?

Robert Voigtländer

unread,

Jan 21, 2014, 6:17:43 AM1/21/14

to

Hi,

which would be the best data structure to use for the following case?

I have objects like this:

class Node(object):
def __init__(self, pos, parent, g , h):
self.pos = pos
self.parent = parent
self.g = g
self.h = h
self.f = g+h

I need to build a "list" of these objects. The amount is unknown.
On this list I need to regularly

1. check if a specific item - identified by Node.pos - is in the list.
2. find the object with the lowest Node.f attribute and update or remove it

What would be a good approach. Using a list I always need to traverse the whole list to do one of the above actions.

Thanks
Robert

Chris Angelico

unread,

Jan 21, 2014, 6:27:53 AM1/21/14

to pytho...@python.org

On Tue, Jan 21, 2014 at 10:17 PM, Robert Voigtländer
<r.voigt...@gmail.com> wrote:
> 1. check if a specific item - identified by Node.pos - is in the list.
> 2. find the object with the lowest Node.f attribute and update or remove it

Are both those values constant once the Node is added? If so, the
easiest way would be to maintain a dictionary mapping pos to the Node
(or to a list of Nodes, if you can have multiple with the same pos),
and probably heapq for the f values. But if they change, you'll have
to update both data structures. If they change often, it's probably
not worth maintaining index structures - just search for what you
want, when you want it. And if you're working with a small list (say,
less than a thousand items), performance isn't going to matter at all,
so just do whatever looks cleanest in your code - probably you have
that already.

ChrisA

Ben Finney

unread,

Jan 21, 2014, 6:34:08 AM1/21/14

to pytho...@python.org

Robert Voigtländer <r.voigt...@gmail.com> writes:

> which would be the best data structure to use for the following case?

First up, I want to compliment you on asking exactly the right question.
Getting the data structure right or wrong can often shape the solution
dramatically.

> I have objects like this:
>
> class Node(object):
> def __init__(self, pos, parent, g , h):
> self.pos = pos
> self.parent = parent
> self.g = g
> self.h = h
> self.f = g+h

It's important to note a distinction: The class ‘Node’ doesn't have
attributes ‘pos’, ‘f’, ‘g’, ‘h’, etc. Those attributes are assigned to
each instance when the instance is initialised.

> I need to build a "list" of these objects. The amount is unknown.

Any built-in Python collection type seems good so far.

> On this list I need to regularly
>
> 1. check if a specific item - identified by Node.pos - is in the list.
> 2. find the object with the lowest Node.f attribute and update or
> remove it

So, in neither of those cases is it correct to talk of ‘Node.pos’ or
‘Node.f’, since those are attributes of the class, and the attribute
names don't exist (unless there is other code which you're not showing
us). They'll be attributes of a specific instance of the class in each
case.

> What would be a good approach. Using a list I always need to traverse
> the whole list to do one of the above actions.

If the ‘pos’ attribute is unique for all Node instances – as implied by
your “indentified by” clause above – then you could maintain a mapping
from ‘pos’ values to the Node instances:

nodes = dict()

root_node = Node(pos='root', parent=None, g=object(), h=object())
nodes[root_node.pos] = root_node

foo_node = Node(pos='foo', parent=root_node, g=object(), h=object())
nodes[foo_node.pos] = foo_node

As for finding the node with the lowest value of an attribute, I'd
recommend you just use brute force until you get concrete measurements
showing that that specific operation is occupying too much time. Write
the code correct and readable first, before trying to optimise what you
don't need to yet.

--
\ “Value your freedom or you will lose it, teaches history. |
`\ “Don't bother us with politics,” respond those who don't want |
_o__) to learn.” —Richard Stallman, 2002 |
Ben Finney

Oscar Benjamin

unread,

Jan 21, 2014, 6:49:53 AM1/21/14

to pytho...@python.org

Is the order of the items in the list significant?

If not you might try using a modification of this sorted dict recipe:
http://code.activestate.com/recipes/576998-sorted-dictionary/

You would want to use node.pos as the key and node as the value but modify the
_sorted_list so that it sorts keys according to Node.f.

Strictly speaking the sorted dict above has an O(N) overhead for insertion and
removal and O(NlogN) for creation. However these particular big-O's are
handled quite efficiently by the sort(), list.insert() and list.remove()
functions so it depends how big the list is.

If that's not okay then you may want the sorteddict from the blist package on
PyPI:
http://stutzbachenterprises.com/blist/sorteddict.html

That would give you O(logN) insertion/removal. The problem is that the sort
key() function only gets to operate on the dict key not the value so you'd
have to do something pretty funky to make it work. Perhaps:

from blist import sorteddict

def my_sorteddict(*args, **kwargs):
# There's a good chance this doesn't work...
def keyfunc(dictkey):
return d[dictkey].f
d = sorteddict(keyfunc, *args, **kwargs)
return d

Oscar

Robert Voigtländer

unread,

Jan 21, 2014, 8:38:34 AM1/21/14

to

> On Tue, Jan 21, 2014 at 03:17:43AM -0800, Robert Voigtlï¿½nder wrote:
>

> > I have objects like this:
>
> >
>
> > class Node(object):
>
> > def __init__(self, pos, parent, g , h):
>
> > self.pos = pos
>
> > self.parent = parent
>
> > self.g = g
>
> > self.h = h
>
> > self.f = g+h
>
> >
>
> >
>
> > I need to build a "list" of these objects. The amount is unknown.
>
> > On this list I need to regularly
>
> >
>
> > 1. check if a specific item - identified by Node.pos - is in the list.
>
> > 2. find the object with the lowest Node.f attribute and update or remove it
>

First thanks to all who responded. Although I only partially understand your answers as I am a Python starter.
What's clear to me is the difference between object and instance of an object. Just didn't explain it well.

Maybe I give some more info on what I need / want to do. I will also provide a working code example. Should have done this before.

I would very much appreciate a working example of what you mean. Then I have a chance to understand it. :-)

I would like to implement a class for a A* pathfinding algorithm. (there are ready libraries out there but I would like to learn it myself) This requires to maintain a list of nodes to be processed and nodes already processed. For new nodes I need to check if they are on one of the lists. I also need to regularly pick the node with the lowest value f from the list.

Here some working code. For one function I sill need a solution. Any better solution is welcome.

Thanks
Robert

---------------
class Node:

def __init__(self, pos, parent, g , h):
self.pos = pos
self.parent = parent
self.g = g
self.h = h
self.f = g+h

def isinlist(nodeToSeatch):
for item in openlist:
if item.pos == nodeToSeatch: return True
return False

def lowestF():
lowestF = ''
for item in openlist:
if item.f < lowestF: lowestF = item
return lowestF

def deleteItemWithPos(pos):
## need this function or different approach
pass

openlist=[]
openlist.append(Node((1,1),None,1,5))
openlist.append(Node((1,2),(1,1),4,6))
openlist.append(Node((1,3),(1,2),9,10))

for item in openlist: print item.pos, item.f

print isinlist((1,1))
print isinlist((1,5))

nextNode = lowestF()
print nextNode.pos, nextNode.f

Robert Voigtländer

unread,

Jan 21, 2014, 8:43:31 AM1/21/14

to

Am Dienstag, 21. Januar 2014 14:38:34 UTC+1 schrieb Robert Voigtländer:
> > On Tue, Jan 21, 2014 at 03:17:43AM -0800, Robert Voigtlï¿½nder wrote:
>
> >
>
>
>
> > > I have objects like this:
>
> >
>
> > >
>
> >
>
> > > class Node(object):
>
> >
>
> > > def __init__(self, pos, parent, g , h):
>
> >
>
> > > self.pos = pos
>
> >
>
> > > self.parent = parent
>
> >
>
> > > self.g = g
>
> >
>
> > > self.h = h
>
> >
>
> > > self.f = g+h
>
> >
>
> > >
>
> >
>
> > >
>
> >
>
> > > I need to build a "list" of these objects. The amount is unknown.
>
> >
>
> > > On this list I need to regularly
>
> >
>
> > >
>
> >
>
> > > 1. check if a specific item - identified by Node.pos - is in the list.
>
> >
>
> > > 2. find the object with the lowest Node.f attribute and update or remove it
>
> >
>
>
>
>
>
> First thanks to all who responded. Although I only partially understand your answers as I am a Python starter.
>
> What's clear to me is the difference between object and instance of an object. Just didn't explain it well.
>
>
>
> Maybe I give some more info on what I need / want to do. I will also provide a working code example. Should have done this before.
>
>
>
> I would very much appreciate a working example of what you mean. Then I have a chance to understand it. :-)
>
>
>
> I would like to implement a class for a A* pathfinding algorithm. (there are ready libraries out there but I would like to learn it myself) This requires to maintain a list of nodes to be processed and nodes already processed. For new nodes I need to check if they are on one of the lists. I also need to regularly pick the node with the lowest value f from the list.
>
>
>
> Here some working code. For one function I sill need a solution. Any better solution is welcome.
>
>
>
> Thanks
>
> Robert

Sorry - found a bug right after my post.
Here the corrected version.

class Node:
def __init__(self, pos, parent, g , h):
self.pos = pos
self.parent = parent
self.g = g
self.h = h
self.f = g+h

def isinlist(nodeToSeatch):
for item in openlist:
if item.pos == nodeToSeatch: return True
return False

def lowestF():
lowestF = ''
for item in openlist:
if item.f < lowestF:

lowestF = item.f
lowestFItem = item
return lowestFItem

Oscar Benjamin

unread,

Jan 21, 2014, 8:59:06 AM1/21/14

to pytho...@python.org

The function above is incorrect. I think it should be:

def lowestF():
lowestF = ''
for item in openlist:
if item.f < lowestF:

lowestF = item.f # Note the .f here

return lowestF

>
> def deleteItemWithPos(pos):
> ## need this function or different approach
> pass

def deleteItemWithPos(pos):
for n, item in enumerate(openlist):
if item.pos == pos:
del openlist[n]
return
else:
raise RuntimeError('Not in the list!')

>
> openlist=[]
> openlist.append(Node((1,1),None,1,5))
> openlist.append(Node((1,2),(1,1),4,6))
> openlist.append(Node((1,3),(1,2),9,10))
>
> for item in openlist: print item.pos, item.f
>
> print isinlist((1,1))
> print isinlist((1,5))
>
> nextNode = lowestF()
> print nextNode.pos, nextNode.f

My first suggestion would be to use a dict instead of a list:

class Node:
def __init__(self, pos, parent, g , h):
self.pos = pos
self.parent = parent
self.g = g
self.h = h
self.f = g+h

def isinlist(pos):
return pos in opendict

def lowestF():
return min(opendict.values(), key=lambda x: x.f)

def deleteItemWithPos(pos):
del opendict[pos]

nodes = [
Node((1,1),None,1,5),
Node((1,2),(1,1),4,6),
Node((1,3),(1,2),9,10),
]

opendict = {}
for node in nodes:
opendict[node.pos] = node

for item in opendict.values():

print item.pos, item.f

print isinlist((1,1))
print isinlist((1,5))

nextNode = lowestF()
print nextNode.pos, nextNode.f

The above is more efficient and simpler. It is still O(N) for the lowestF()
function. Changing data structure could make that more efficient.

Oscar

Peter Otten

unread,

Jan 21, 2014, 9:09:31 AM1/21/14

to pytho...@python.org

def lowestF():
return min(openlist, key=operator.attrgetter("f"))

> def deleteItemWithPos(pos):
> ## need this function or different approach
> pass
>
> openlist=[]
> openlist.append(Node((1,1),None,1,5))
> openlist.append(Node((1,2),(1,1),4,6))
> openlist.append(Node((1,3),(1,2),9,10))
>
> for item in openlist: print item.pos, item.f
>
> print isinlist((1,1))
> print isinlist((1,5))
>
> nextNode = lowestF()
> print nextNode.pos, nextNode.f

Here is an OO implementation of Chris Angelico's suggestion:

import heapq

class Node:
def __init__(self, pos, parent, g , h):
self.pos = pos
self.parent = parent
self.g = g
self.h = h
self.f = g+h

def __str__(self):
return "Node(pos={!r}, f={!r})".format(self.pos, self.f)

class Nodes():
def __init__(self):
self.lookup = {}
self.heap = []
def add(self, node):
self.lookup[node.pos] = node
heapq.heappush(self.heap, (node.f, node))
def __iter__(self):
return iter(self.lookup.values())
def __contains__(self, pos):
return pos in self.lookup
def lowest(self):
return self.heap[0][1]
def pop(self):
f, node = heapq.heappop()
del lookup[node.pos]
return node

nodes = Nodes()
nodes.add(Node((1,1), None, 1, 5))
nodes.add(Node((1,2), (1,1), 4, 6))
nodes.add(Node((1,3), (1,2), 9, 10))

for node in nodes:
print(node)

print((1,1) in nodes)
print((1,5) in nodes)

print(nodes.lowest())

Peter Otten

unread,

Jan 21, 2014, 9:19:54 AM1/21/14

to pytho...@python.org

Peter Otten wrote:

> def pop(self):
> f, node = heapq.heappop()
> del lookup[node.pos]
> return node

That should be

def pop(self):
f, node = heapq.heappop(self.heap)
del self.lookup[node.pos]
return node

Robert Voigtländer

unread,

Jan 21, 2014, 10:33:20 AM1/21/14

to

Hi Peter,

this works great. I will try to find some info about the functionality you used and to understnad what you did.
Maybe you can add a function to remove a node?

Thanks for your support and all tthe swift answers.

Robert

Robert Voigtländer

unread,

Jan 21, 2014, 10:37:47 AM1/21/14

to

> > > def pop(self):
> > > f, node = heapq.heappop()
> > > del lookup[node.pos]
> > > return node

> > That should be
>
> > def pop(self):
>
> > f, node = heapq.heappop(self.heap)
> > del self.lookup[node.pos]
> > return node
>
> Hi Peter,
> this works great. I will try to find some info about the functionality you used and to understnad what you did.
> Maybe you can add a function to remove a node?
> Thanks for your support and all tthe swift answers.
>
> Robert

Just realized what the pop function is for. I changed it from:
def pop(self):
to
def pop(self,pos):

and it works.

Now I go on with trying to understand it.

Mark Lawrence

unread,

Jan 21, 2014, 1:03:18 PM1/21/14

to pytho...@python.org

On 21/01/2014 13:43, Robert Voigtländer wrote:

[double spaced google disease snipped]

I'm pleased to see the regular contributors helping out as usual. In
response would you please be kind enough to read and action this
https://wiki.python.org/moin/GoogleGroupsPython to prevent us seeing the
double line spacing that google inserts, thanks.

--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence

Peter Otten

unread,

Jan 21, 2014, 1:39:33 PM1/21/14

to pytho...@python.org

Unlikely. Are you sure that .heap and .lookup contents are still in sync
with your modification?

> Now I go on with trying to understand it.

The pop() method as posted can only remove the "lowest" node. If contrary to
your initial spec

> 2. find the object with the lowest Node.f attribute and update or remove
it

you want to remove arbitrary nodes a heap may not be the appropriate data
structure. The modified

import operator

#...

class Nodes():
def __init__(self):
self.lookup = {}

def add(self, node):
self.lookup[node.pos] = node

def __iter__(self):
return iter(self.lookup.itervalues())

def __contains__(self, pos):
return pos in self.lookup
def lowest(self):

return min(self, key=operator.attrgetter("f"))
def __delitem__(self, pos):
del self.lookup[pos]

nodes = Nodes()

nodes.add(Node((1,1), None, 1, 5))
nodes.add(Node((1,2), (1,1), 4, 6))
nodes.add(Node((1,3), (1,2), 9, 10))

del nodes[1, 2]

allows you to delete random nodes, but the lowest() method will slow down as
it has to iterate over all dict values.

One important caveat: both Nodes classes lead to data corruption if you
modify a .pos attribute while the node is in a Nodes instance. The same goes
for the first implementation and the .f attribute.

Robert Voigtländer

unread,

Jan 22, 2014, 4:38:37 AM1/22/14

to

> Unlikely. Are you sure that .heap and .lookup contents are still in sync
> with your modification?

No it's not. Atfer having read about heapq it's clear why.
Thanks for the hint.

> allows you to delete random nodes, but the lowest() method will slow down as
> it has to iterate over all dict values.

Thanks a lot for the alternative class. I will go with two lists, each using one of the aproaches. In one list I always need to have only hte object with lowest .f.
With the other list I may have to delete random nodes.

So I can make perfect use of both classes.

> One important caveat: both Nodes classes lead to data corruption if you
> modify a .pos attribute while the node is in a Nodes instance. The same goes
> for the first implementation and the .f attribute.

That's fine. pos is static and identifies the node.

Thanks again to all who responded. I learned a lot.