Here is another post in my series on using Distributed Hash Tables (DHT).
The previous ones are:
Now I want to talk about locality, and why it is important. First, the idea of locality is very simple. Put related items together, so getting them from the DHT can be done in a single call.
Let us say that we have the following objects in our application: User, Shopping Cart and Session. We can just put them in the DHT and let them land wherever they want, but that is not the best way to treat them. A DHT call is cheap, but it is still a remote call, and we really want to minimize those. Assuming that we understand the data access patterns in the application, we can do a bit better than making three remote calls to the DHT.
Just about any DHT will support the idea of multi key get, so we can ask the following:
GET 'User #1', 'User #1: Session', 'User #1: Shopping Cart'
For this to work, both writes and reads have to select the node using only the part of the key that comes before the colon. That way 'User #1', 'User #1: Session' and 'User #1: Shopping Cart' all land on the same node.
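To make the prefix rule concrete, here is a minimal sketch in Python. The names `routing_key` and `select_node` are made up for illustration, and the plain hash-modulo node selection is an assumption (a real DHT would more likely use consistent hashing); the only point is that the hash sees the part before the colon and nothing else.

```python
import hashlib


def routing_key(key: str) -> str:
    """Return the part of the key before the first colon (the whole key if there is none)."""
    return key.split(":", 1)[0]


def select_node(key: str, node_count: int) -> int:
    """Pick a node index by hashing only the routing part of the key.

    Hash-modulo is an illustrative assumption, not how any particular DHT does it.
    """
    digest = hashlib.sha1(routing_key(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count


keys = ["User #1", "User #1: Session", "User #1: Shopping Cart"]

# All three keys share the routing key 'User #1', so they map to the same node.
nodes = {select_node(key, node_count=4) for key in keys}
```

Because the same function is used for reads and writes, `nodes` ends up holding a single node index no matter how many nodes there are.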
Now reading all three items is a single remote call, and that has significant performance implications. Don't lean on this too heavily, though: if too many keys share the same prefix, all your data will end up on a single node. It is a good model to use in many cases, as long as you don't go overboard with it.
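Here is a rough sketch of what the read side could look like, using a toy in-memory stand-in for the DHT. `ToyDht`, `multi_get` and the `remote_calls` counter are all hypothetical names made up for this example, and the hash-modulo node selection is again an assumption. The client groups the requested keys by node and makes one call per node, so keys that share a prefix cost a single call.

```python
import hashlib
from collections import defaultdict


class ToyDht:
    """In-memory stand-in for a DHT client; each dict plays the role of one node."""

    def __init__(self, node_count: int):
        self.nodes = [dict() for _ in range(node_count)]
        self.remote_calls = 0  # counts simulated round trips

    def _node_for(self, key: str) -> int:
        # Only the part before the colon takes part in node selection.
        prefix = key.split(":", 1)[0]
        digest = hashlib.sha1(prefix.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.nodes)

    def put(self, key: str, value) -> None:
        self.remote_calls += 1
        self.nodes[self._node_for(key)][key] = value

    def multi_get(self, *keys: str) -> dict:
        # Group keys by owning node, then make one simulated call per node.
        by_node = defaultdict(list)
        for key in keys:
            by_node[self._node_for(key)].append(key)
        result = {}
        for node_index, node_keys in by_node.items():
            self.remote_calls += 1
            for key in node_keys:
                result[key] = self.nodes[node_index].get(key)
        return result


dht = ToyDht(node_count=4)
dht.put("User #1", {"name": "example"})
dht.put("User #1: Session", {"logged_in": True})
dht.put("User #1: Shopping Cart", ["item"])

dht.remote_calls = 0  # only count the read below
found = dht.multi_get("User #1", "User #1: Session", "User #1: Shopping Cart")
calls_for_read = dht.remote_calls
```

The same grouping also shows the downside: every key that starts with 'User #1' lives in the same dict, so a prefix that covers too much data turns that one node into a hot spot.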