Will this series on using Distributed Hash Tables (DHT) ever end?
The previous ones are:
One of the problems of using a DHT is that the only access strategy that you have is the key. And while key based access is fast, it is also fairly limited. We can't perform queries on anything but the key. And that sucks. As it turn out, we can provide our own indexes by defining groups keyed by values.
Let us say that we need to find all the users which have not verified their email address. In SQL, we will write:
Using a DHT, it is slightly more complex than that. What we need to do is to create (and maintain, sadly) the index ourselves. In other words, here is the process of writing a User to the DHT:
PUT "User #15", user ADD_TO_LIST "User_EmailVerified_" + user.EmailVerified, "User #15"
Note that I added a new operation, add_to_list, which is usually does not exists in DHT, but will significantly simply the code, checkout the post about groups to see how we can implement that ourselves.
Note how we structure the name of the list. We will have two such lists in the system, "User_EmailVerified_True" and "User_EmailVerified_False". We can get the list of all the users that match that property by simply getting the list of values from the list and resolving each of the keys.
This is extremely simple approach, but it does come at a cost for managing that, and remembering that we need to manage that. One problem with this approach is that this is a highly specific one. We can lookup by range, only by specific value.
On my next post, I'll cover range queries.