Reference equality for dictionaries in Python

Implementing a unit of work in Python can be an interesting challenge. Consider the following code:

class Holder(object):
    def __init__(self):
        self.items = dict()
    def try_set(self, key, name):
        if key in self.items:
            return
        self.items[key] = name
    def try_get(self, key):
        if key in self.items:
            return self.items[key]
        return None

This is about as simple as code gets for associating a tag with an object, right?

However, this code will fail for the following scenario:

from dataclasses import dataclass
@dataclass
class Item:
    name: str
holder = Holder()
cup = Item(name="Cup")
holder.try_set(cup, "cups/1")

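You’ll get a lovely “TypeError: unhashable type: 'Item'” when you try this. This is because data classes in Python have a complicated relationship with __hash__(): by default, @dataclass generates __eq__() and, since the class is not frozen, sets __hash__ to None.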
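
To make that relationship concrete, here is a small sketch of how the @dataclass parameters affect hashing. The class names PlainItem, FrozenItem, and IdentityItem and the is_hashable helper are made up for illustration; they are not part of the code in this post.

```python
from dataclasses import dataclass

@dataclass  # defaults (eq=True, frozen=False): __hash__ is set to None
class PlainItem:
    name: str

@dataclass(frozen=True)  # eq=True and frozen=True: __hash__ is generated from the fields
class FrozenItem:
    name: str

@dataclass(eq=False)  # no __eq__ is generated: object's identity-based __hash__ is kept
class IdentityItem:
    name: str

def is_hashable(obj):
    """Hypothetical helper: report whether hash(obj) succeeds."""
    try:
        hash(obj)
        return True
    except TypeError:
        return False

plain_ok = is_hashable(PlainItem("Cup"))        # the failing case from the post
frozen_ok = is_hashable(FrozenItem("Cup"))
identity_ok = is_hashable(IdentityItem("Cup"))

# Two distinct frozen instances with the same fields hash (and compare) alike.
frozen_same_hash = hash(FrozenItem("Cup")) == hash(FrozenItem("Cup"))
# With eq=False, two instances with the same fields are not equal.
identity_equal = IdentityItem("Cup") == IdentityItem("Cup")
```

Neither variant is a drop-in fix here. A frozen data class hashes by value, so two different cups with the same name would be treated as the same key, and eq=False requires changing the tracked class itself, which may not be an option.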

An obvious solution to the problem is to use:

class Holder(object):
    def __init__(self):
        self.items = dict()
    # this is bad
    def try_set(self, key, name):
        if id(key) in self.items:
            return
        self.items[id(key)] = name
    # this is bad
    def try_get(self, key):
        if id(key) in self.items:
            return self.items[id(key)]
        return None

However, the id() in Python is only guaranteed to be unique among objects that are alive at the same time. Once an object is freed, its id can be reused. Consider the following code:

cup = Item(name="Cup")
print(id(cup))
cup = None
cup = Item(name="Cup")
print(id(cup)) # different instance

On my machine, running this code gives me:

124597181219840
124597181219840

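In other words, the id has been reused. This makes sense, since in CPython the id is just the memory address of the object, and the first cup was freed before the second one was allocated. We can fix that by holding on to the object reference, which keeps the object alive and its id reserved, like so: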

class RefEq(object):
    def __init__(self, ref):
        self.ref = ref
    def __eq__(self, other):
        if id(self.ref) == id(other):
            return True
        if not isinstance(other, RefEq):
            return False
        return id(self.ref) == id(other.ref)
    def __hash__(self):
        return id(self.ref)
class Holder(object):
    def __init__(self):
        self.items = dict()
    def try_set(self, key, name):
        if RefEq(key) in self.items:
            return
        self.items[RefEq(key)] = name
    def try_get(self, key):
        if RefEq(key) in self.items:
            return self.items[RefEq(key)]
        return None

With this approach, we are able to implement proper reference equality and make sure that we aren’t mixing different values.
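
As a quick check of that claim, here is a minimal, self-contained sketch. It uses a condensed variant of the RefEq and Holder classes above (only wrapper-to-wrapper comparison via `is`, plus `setdefault` and `get` in place of the explicit membership tests), so treat it as an illustration of the idea rather than the exact code from this post.

```python
from dataclasses import dataclass

@dataclass
class Item:
    name: str

class RefEq:
    """Wraps an object so that equality and hashing use identity only."""
    def __init__(self, ref):
        # Strong reference: the wrapped object stays alive while the
        # wrapper is a dictionary key, so its id cannot be reused.
        self.ref = ref
    def __eq__(self, other):
        return isinstance(other, RefEq) and self.ref is other.ref
    def __hash__(self):
        return id(self.ref)

class Holder:
    def __init__(self):
        self.items = {}
    def try_set(self, key, name):
        # setdefault only stores the name if the key is not tracked yet
        self.items.setdefault(RefEq(key), name)
    def try_get(self, key):
        return self.items.get(RefEq(key))

holder = Holder()
first = Item(name="Cup")
second = Item(name="Cup")  # equal by value, but a different instance

holder.try_set(first, "cups/1")
holder.try_set(second, "cups/2")
holder.try_set(first, "cups/3")  # ignored: first is already tracked

tag_first = holder.try_get(first)
tag_second = holder.try_get(second)
tag_unknown = holder.try_get(Item(name="Cup"))  # never registered
```

Even though first == second is true for the data class, each instance keeps its own tag, and an instance that was never registered gets None back.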