Friday, August 14, 2009

TypeError: dict objects are unhashable

My mom told me to update my blog. Hi mom.

I've been wanting to write this one for awhile anyway.

In retrospect it was rather naive - but I wonder who hasn't at one time tried to create a set of dictionaries:

|
>>> parts = [
|... {'id':1, 'desc':'widget', 'detail':'rear widget'},
|... {'id':1, 'desc':'widget', 'detail':'front widget'},
|... {'id':2, 'desc':'gear', 'size':4},
|... {'id':3, 'desc':'cog', 'type':'green'},
|... ]
|>>> myset = set(parts)
|Traceback (most recent call last):
| File "", line 1, in ?
|TypeError: dict objects are unhashable


What did I expect it to do? I think the first time I tried this it seemed more reasonable. I think my list of dictionaries actually contained exactly the same keys, with some exact duplicates and I needed to uniquify the list. I acctually ended up doing something like the example at the bottom...

Dict objects are not hashable, read about hash tables and sets if it isn't obvious why it's important that objects in a set support a __hash__ method.

A hash function is really simple idea:
two equal objects MUST return the same hash
two un-equal objects should RARELY return the same key

But there's no really obvious reasonable way to implement hash on a dictionary of arbitrary keys and values.

Here's a couple dumb ideas for adding a hash function to a dict:


|def __hash__(self):
| key = 0

|def __hash__(self):
| id = 0
| for pair in sorted(self.items()):
| id += hash(pair)
| return hash(id)


And they "work" - at least in the sense that they remove the error:

|
>>> class Part(dict):
|... def __hash__(self):
|... return self['id']
|...
|>>> myset = set([Part(x) for x in parts])
|>>> for part in myset:
|... print part
|...
|{'desc': 'widget', 'detail': 'rear widget', 'id': 1}
|{'desc': 'gear', 'id': 2, 'size': 4}
|{'type': 'green', 'id': 3, 'desc': 'cog'}
|{'desc': 'widget', 'detail': 'front widget', 'id': 1}


And that's great, if my parts list contained EXACTLY equal dictionaries they would be removed - I could turn it back into a list an continue on with everything uniquified! It might be worth nothing that the two parts with 'id' = 1 were not considered equal just because they returned the same hash. When there is a hash collision, the inhereited __eq__() method recognized that 'rear widget' != 'left widget' and that the two parts were distinct.

But what's really interesting what I've done by making a mutable object hashable...

It can be used as a dictionary key with surprisingly bad results:

|>>> part.__class__ # part is an instance of my Part class
|&ltclass '__main__.Part'>
|>>> part # a mutable object with a hash function
|{'desc': 'widget', 'detail': 'front widget', 'id': 1}
|>>> mydict = {} # mydict is a plain dictionary
|>>> mydict[part] = 1 # i can use part as a key!
|>>> part['id'] = 2 # I then modify the key
|>>> mydict[part] # there is no value assigned to this new "modified" key
|Traceback (most recent call last):
|File "", line 1, in ?
| KeyError: {'desc': 'widget', 'detail': 'front widget', 'id': 2}
|>>> mydict # or is there?
|{{'desc': 'widget', 'detail': 'front widget', 'id': 2}: 1}

Here's a fairly reasonable attempt at making a custom class that is mostly mutable dictionary, but has a safe and reasonable hash function. I'll also over-ridden __eq__ to ignore minor differences in the 'detail' between objects.


|>>> class Part(dict):
|... def __init__(self, part_dict):
|... if 'id' not in part_dict:
|... raise TypeError("Parts must have an id")
|... dict.__init__(self, part_dict)
|... def __setitem__(self, key, value):
|... if key == 'id':
|... raise ValueError("Part id's can't change - create a new part")
|... return dict.__setitem__(self, key, value)
|... def __hash__(self):
|... return self['id']
|... def __eq__(self, other):
|... a = self.copy()
|... del a['detail']
|... b = other.copy()
|... del b['detail']
|... return a == b
|...
|>>> for part in set([Part(x) for x in parts]):
|... print part
|...
|{'desc': 'widget', 'detail': 'rear widget', 'id': 1}
|{'desc': 'gear', 'id': 2, 'size': 4}
|{'type': 'green', 'id': 3, 'desc': 'cog'}

So if parts is a list of build materials, and you wanted to know how many distinct parts it takes to build this thing... the above Part class might be the right track. Aside from the ambiguity of added by over-riding "==" like that... dose anyone see any other problems with this?

3 comments:

John said...

If you think you must use dicts, check out http://johnandkaren.com/deep_update.txt and see if it works for you in this case.

Otherwise, I think it would make a lot more sense to just write your own Part class, inherited from object instead of dict.

clayg said...

If your Parts have a small set of a predefined object attributes - there may not be need to use a dictionary. Granted, the "parts" example given is pretty trivial.

But why would re-implementing the behavior of dict that you want to keep (__setitem__, __getitem__, keys, __repr__, items, etc.) make make "more sense" than subclassing dict and overriding the behaviors you want to change?

Either way, you'd end up re-implement your own equality check and hash method.

I think what was bothering me was the equality over-ride. __eq__ should be explicit and also invertible. I should have done it like this:

|def __eq__(self, other):
| if isinstance(other, self.__class__):
| return self['id'] == other['id']
| else:
| return False
|def __ne__(self, other):
| return not self == other

Thanks for the feedback!

Unknown said...

thank you, it helped me