Here is the original post, and now let us get down to solving it…
The key part of solving this issue is knowing that if you wait for the actual cluster change, there is very little that you can do. Technically speaking, if you have the keys, you can try to figure out all the ranges that fallunder the new nodes in the cluster, and then query on all those ranges. It will work, but anyone who answers that is going to be hit when there are multiple concurrent additions to the cluster. (Adding 1 node, then 3, then 5, etc). That is something that is incredibly common when you start going up, and if you having moved all the data yet, you don’t want to have to wait until you do that before you start responding to the current workload. There there are things like “what happens if we reboot midway through”, etc?
A much simpler alternative is to move some of the work to write time for each data item. We already need to compute which node a particular item is going to reside on, after all. What we are also going to compute is when that is going to change. And we are going to record that. When the cluster size grows above that size, we can simple query for all the data items that are going to be moved when the cluster size is higher. This way, we gain a couple of interesting properties.
We don’t need to worry about adding multiple nodes concurrently, just doing “WHERE NeedToMoveWhenSizeIsGreaterThan < :ClusterSize” is going to be enough to find all the data items that needs to be moved, it is resilient to restarts / errors, and can gracefully handle the multiple moves scenario.
Oh, and how to find the next cluster size where this particular data item is going to have to move? Well, I got to let the future candidates with googling skillz something to actually handle.