Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Doing set operation on non-hashable objects

13 views
Skip to first unread message

5lvq...@sneakemail.com

unread,
Dec 24, 2008, 2:16:59 PM12/24/08
to
Hi,

I'm writing an application which is structured roughly as follows:

"db" is a dict, where the values are also dicts.
A function searches through db and returns a list of values, each of
which is a dict as described above.
I need to perform set operations on these lists (intersection and
union)
However the objects themselves are not hashable, and therefore can't
be in a set, because they are dicts.
I'm not sure how large each set will be, but the point is I'm trying
to avoid anything looking like an O(n^2) algorithm, so I can't just
use naive double-looping to check for intersection/union on a pair of
lists.

The only way I can think of to do this right is to hash the dicts by
freezing them, turning them all into tuples, which can then be
hashed. But this is a problem because more dicts might be nested
inside. At any rate, I'd need to thaw them out when I'm done and turn
them back into dicts, which seems complicated and annoying, and all
this leads me to believe there is a better way.

What I really need is a set of pointers, so at the end of the
operation I have the actual objects pointed to. Can I somehow use the
object IDs as set elements, then recreate a list with the actual
objects they point to?

How do you get the object back from an ID in python?

thanks
Michael

Scott David Daniels

unread,
Dec 24, 2008, 3:22:07 PM12/24/08
to
5lvq...@sneakemail.com wrote:
> ... "db" is a dict, where the values are also dicts.

> A function searches through db and returns a list of values, each of
> which is a dict as described above.
> I need to perform set operations on these lists (intersection and
> union)
> However the objects themselves are not hashable, and therefore can't
> be in a set, because they are dicts.
> I'm not sure how large each set will be, but the point is I'm trying
> to avoid anything looking like an O(n^2) algorithm, so I can't just
> use naive double-looping to check for intersection/union on a pair of
> lists.

Well, if the lists are ordered, you can do intersection and union
in O(n) time. If they are orderable, you can sort each first, giving
O(n log n). Note you can do lst.sort(key=id) which will sort on ids.

> What I really need is a set of pointers, so at the end of the
> operation I have the actual objects pointed to. Can I somehow use the
> object IDs as set elements, then recreate a list with the actual
> objects they point to?
> How do you get the object back from an ID in python?

You don't; you remember the mapping in a dictionary and look it up.
If there were a way to go from id to object, the whole idea of garbage
collection and reference counts would fly out the window, leading to
nasty crashes (or you might get to an object that is the re-used id of
an older object).

--Scott David Daniels
Scott....@Acm.Org

Gabriel Genellina

unread,
Dec 24, 2008, 3:21:12 PM12/24/08
to pytho...@python.org
En Wed, 24 Dec 2008 17:16:59 -0200, <5lvq...@sneakemail.com> escribió:

> I'm writing an application which is structured roughly as follows:
>
> "db" is a dict, where the values are also dicts.
> A function searches through db and returns a list of values, each of
> which is a dict as described above.
> I need to perform set operations on these lists (intersection and
> union)
> However the objects themselves are not hashable, and therefore can't
> be in a set, because they are dicts.

See this thread from last week:
http://groups.google.com/group/comp.lang.python/t/d6818ff713bd4421

> What I really need is a set of pointers, so at the end of the
> operation I have the actual objects pointed to. Can I somehow use the
> object IDs as set elements, then recreate a list with the actual
> objects they point to?

If you *only* care about object identity, you might use a dictionary that
only compares by identity to anyone else:

class nc_dict(dict):
"A hashable dictionary that isn't equal to anyone else"

def __eq__(self, other):
return cmp(id(self),id(other))==0

def __hash__(self):
return hash(id(self))

d1 = nc_dict(a=1, b=2, c=3)
d2 = nc_dict(b=2, c=0, d=4)
d3 = nc_dict(a=1, c=3, e=5)
dd1 = nc_dict(x=d1, y=d2)
dd2 = nc_dict(x=d1, y=d3)
dd3 = nc_dict(y=d2, z=d3, w=d1)
l1 = [dd1, dd2]
l2 = [dd2, dd3]
s1 = set(l1)
s2 = set(l2)
print s1-s2
print s2-s1
print s1&s2

# instances of nc_dict with the same contents are NOT equal
d4 = nc_dict(a=1, b=2, c=3)
print d1, d4, d1==d4 # output: False

# but we can use this function to compare contents
def cmp_contents(d1, d2):
try: return cmp(sorted(d1.items()), sorted(d2.items()))
except Exception: return 1 # assume they're unequal

print cmp_contents(d1,d4)==0 # output: True

> How do you get the object back from an ID in python?

You don't :)

--
Gabriel Genellina

Carl Banks

unread,
Dec 24, 2008, 3:52:02 PM12/24/08
to

Quick and dirty way is to reference the dicts with a lightweight
hashable object that uses the dict's identity. For instance:


class Identity(object):
def __init__(self,obj):
self.obj = obj
def __hash__(self):
return hash(id(self.obj))
def __eq__(self,other):
return self.obj is other.obj


Then, instead of "s.add(d)" use "s.add(Identity(d))".

Instead of "d = s.pop()" use "d = s.pop().obj".

Instead of "d in s" use "Identity(d) in s".

And so on.


To do it a bit better, you could create an IdentitySet class that
subclasses set and wraps the methods so that they automatically apply
wrap and unwrap the arguments on their way in and out (I'd bet there's
already a cookbook recipe to do that).


Carl Banks

5lvq...@sneakemail.com

unread,
Dec 26, 2008, 8:28:10 PM12/26/08
to

Thank you, I like this idea a lot. It's close to what I've been
thinking but a bit cleaner.

Michael

5lvq...@sneakemail.com

unread,
Dec 28, 2008, 3:44:51 AM12/28/08
to
> > ... "db" is a dict, where the values are also dicts.
> > A function searches through db and returns a list of values, each of
> > which is a dict as described above.
> > I need to perform set operations on these lists (intersection and
> > union)
> > However the objects themselves are not hashable, and therefore can't
> > be in a set, because they are dicts.
> > I'm not sure how large each set will be, but the point is I'm trying
> > to avoid anything looking like an O(n^2) algorithm, so I can't just
> > use naive double-looping to check for intersection/union on a pair of
> > lists.
>
> Well, if the lists are ordered, you can do intersection and union
> in O(n) time.  If they are orderable, you can sort each first, giving
> O(n log n).  Note you can do lst.sort(key=id) which will sort on ids.

The lists are not ordered, since the elements of the list could be
arbitrarily complex things consisting of resistors, capacitors, sub-
circuits, etc. I'm trying do a little EDA program, taking the best
parts from the different EDA/cad tools I've used. I finally came up
with an idea using dictionaries in a certain way, so I'd like to be
able to do set operators on arbitrary things that may themselves not
be hashable.

> > How do you get the object back from an ID in python?
>
> You don't; you remember the mapping in a dictionary and look it up.

Exactly. A mapping between the ID and the thing itself.

> If there were a way to go from id to object, the whole idea of garbage
> collection and reference counts would fly out the window, leading to
> nasty crashes (or you might get to an object that is the re-used id of
> an older object).

Yup, this makes perfect sense. It was rattling around in my head for
a bit afer I wrote the original post, then this makes sense.

>

5lvq...@sneakemail.com

unread,
Dec 28, 2008, 3:54:51 AM12/28/08
to
On Dec 24, 12:21 pm, "Gabriel Genellina" <gagsl-...@yahoo.com.ar>
wrote:

> En Wed, 24 Dec 2008 17:16:59 -0200, <5lvqbw...@sneakemail.com> escribió:
>
> > I'm writing an application which is structured roughly as follows:
>
> > "db" is a dict, where the values are also dicts.
> > A function searches through db and returns a list of values, each of
> > which is a dict as described above.
> > I need to perform set operations on these lists (intersection and
> > union)
> > However the objects themselves are not hashable, and therefore can't
> > be in a set, because they are dicts.
>
>

Good idea. I'll try this out. I don't think it's likely that I'll
have a case where the dicts have identical values, but different
identities. And if they do I probably don't care too much.

Thanks


0 new messages