Reference equality for dictionaries in Python
Implementing a unit of work in Python can be an interesting challenge. Consider the following code:
```python
class Holder(object):
    def __init__(self):
        self.items = dict()

    def try_set(self, key, name):
        if key in self.items:
            return
        self.items[key] = name

    def try_get(self, key):
        if key in self.items:
            return self.items[key]
        return None
```
This is about as simple as code gets for associating a tag with an object, right?
However, this code will fail for the following scenario:
```python
from dataclasses import dataclass


@dataclass
class Item:
    name: str


holder = Holder()
cup = Item(name="Cup")
holder.try_set(cup, "cups/1")
```
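You’ll get a lovely “TypeError: unhashable type: 'Item'” when you try this. This is because data classes in Python have a complicated relationship with __hash__(): with the default eq=True and frozen=False, @dataclass generates an __eq__() and sets __hash__ to None, so instances cannot be used as dictionary keys.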
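A minimal sketch of that behavior, using only the standard dataclasses module. The class names FrozenItem and IdentityItem are hypothetical, for illustration only. It shows that the default Item has __hash__ set to None, that frozen=True generates a hash from the field values (so two distinct cups with the same name collapse into a single dictionary key), and that eq=False keeps the identity-based __eq__ and __hash__ inherited from object.

```python
from dataclasses import dataclass


@dataclass
class Item:
    name: str


# Default eq=True, frozen=False: the dataclass sets __hash__ to None.
assert Item.__hash__ is None


# Hypothetical variant: frozen=True generates a __hash__ from the field values.
@dataclass(frozen=True)
class FrozenItem:
    name: str


# Hypothetical variant: eq=False keeps object's identity-based __eq__/__hash__.
@dataclass(eq=False)
class IdentityItem:
    name: str


a, b = FrozenItem(name="Cup"), FrozenItem(name="Cup")
print(a == b, hash(a) == hash(b))  # True True: two objects, one dict key

c, d = IdentityItem(name="Cup"), IdentityItem(name="Cup")
print(c == d)  # False: compared by identity
```

Both options require changing the tracked class itself, which a unit of work generally cannot assume about the objects it is handed.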
An obvious solution to the problem is to key the dictionary on id() instead:
```python
class Holder(object):
    def __init__(self):
        self.items = dict()

    # this is bad
    def try_set(self, key, name):
        if id(key) in self.items:
            return
        self.items[id(key)] = name

    # this is bad
    def try_get(self, key):
        if id(key) in self.items:
            return self.items[id(key)]
        return None
```
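However, id() in Python is not guaranteed to be unique over time: it is only unique among objects whose lifetimes overlap, and once an object is freed, its id can be handed to a new one. Consider the following code: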
```python
cup = Item(name="Cup")
print(id(cup))
cup = None
cup = Item(name="Cup")
print(id(cup))  # different instance
```
On my machine, running this code gives me:
124597181219840
124597181219840
In other words, the id has been reused. This makes sense: in CPython, id() is just the object’s memory address, so once the first Cup is freed, the second one can be allocated at the same spot. We can fix that by holding on to the object reference, like so:
```python
class RefEq(object):
    def __init__(self, ref):
        # Holding a strong reference keeps the object (and its id) alive.
        self.ref = ref

    def __eq__(self, other):
        if id(self.ref) == id(other):
            return True
        if not isinstance(other, RefEq):
            return False
        return id(self.ref) == id(other.ref)

    def __hash__(self):
        return id(self.ref)


class Holder(object):
    def __init__(self):
        self.items = dict()

    def try_set(self, key, name):
        if RefEq(key) in self.items:
            return
        self.items[RefEq(key)] = name

    def try_get(self, key):
        if RefEq(key) in self.items:
            return self.items[RefEq(key)]
        return None
```
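With this approach, we get proper reference equality and make sure that we aren’t mixing up different values: each RefEq key holds a strong reference to its object, so the object stays alive while it is in the dictionary and its id cannot be reused by another instance.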
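A short usage sketch of the RefEq-based Holder, with the classes repeated so the snippet runs on its own. It checks that the same instance gets its tag back, that a different instance with equal field values does not, and (using the standard weakref module) that the tracked object stays alive after we drop our own reference to it.

```python
import gc
import weakref
from dataclasses import dataclass


@dataclass
class Item:
    name: str


class RefEq(object):
    def __init__(self, ref):
        self.ref = ref  # strong reference: keeps the object and its id alive

    def __eq__(self, other):
        if id(self.ref) == id(other):
            return True
        if not isinstance(other, RefEq):
            return False
        return id(self.ref) == id(other.ref)

    def __hash__(self):
        return id(self.ref)


class Holder(object):
    def __init__(self):
        self.items = dict()

    def try_set(self, key, name):
        if RefEq(key) in self.items:
            return
        self.items[RefEq(key)] = name

    def try_get(self, key):
        if RefEq(key) in self.items:
            return self.items[RefEq(key)]
        return None


holder = Holder()
cup = Item(name="Cup")
holder.try_set(cup, "cups/1")

other_cup = Item(name="Cup")      # equal by value, different instance
print(holder.try_get(cup))        # cups/1
print(holder.try_get(other_cup))  # None
print(cup == other_cup)           # True: dataclass value equality

# Drop our own reference; the RefEq key inside the holder still holds one.
probe = weakref.ref(cup)
del cup
gc.collect()
print(probe() is not None)        # True: the object was not collected
```

The trade-off of holding a strong reference is that every tracked object stays in memory until the holder itself is cleared or discarded.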