Let’s talk about Gary, and Gary’s Shoes. Gary runs a chain of shoe stores across the nation. As part of refreshing their infrastructure, Gary wants to update all the software across the entire chain. The idea is to have unified billing, inventory, sales and time tracking for the entire chain.
Gary doesn’t spend a lot of time on this (after all, he has to sell shoes), so he just installed a sync service between all the stores and HQ to sync up all the data. Well, I call it a sync service. What it actually turns out to be is a set of Excel files in a shared Dropbox folder.
Feel free to go and wash your face, have a drink, take Xanax. I know this might be a shock to see something like this.
Surprisingly enough, this isn’t the topic of my post. Instead, I want to talk about data ownership here.
Imagine that one of Gary’s stores in Chicago sold a bunch of shoes, then issued an invoice to the customer. They dutifully recorded the order in the Orders.xlsx file with the status “Pending Payment”.
That customer, however, accidentally sent the check to the wrong store. No biggie, right? The clerk at the second store can just go ahead and update the order in the shared system, marking it as “Paid in full”.
As it turns out, this is a problem. And the easiest way to explain why is data ownership. The owner of this particular record is the original store. You might say that this doesn’t matter, after all, the change happened in the same system. But the problem is that the shared system is almost never the only place where that record matters.
In addition to the operational “system” that you can see on the right, there are other things. The store manager still has a Post-it note to call that customer and ask about the missing payment. The invoice that was generated needs to be closed, etc. Just updating it in the system isn’t going to cause all of that to happen.
The proper way to handle that is to call the owner of the data (the original store) and let them know that the check arrived at the wrong store. At this point, the data owner can decide how to handle that new information, apply whatever workflows need to be done, etc.
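To make that a bit more concrete, here is a minimal sketch of the idea in Python. None of this is Gary's actual system (that one is still a pile of Excel files). The `Store`, `Order` and `PaymentReceived` names, the `order-1` id and the in-memory sets standing in for the Post-it notes and open invoices are all made up for illustration. The only point is that the second store never touches the order. It sends a message to the owner, and the owner runs its own local workflows.

```python
from dataclasses import dataclass, field


@dataclass
class Order:
    order_id: str
    owner: str                      # name of the store that owns this record
    status: str = "Pending Payment"


@dataclass
class PaymentReceived:
    """Hypothetical message sent to the owning store; it changes no data itself."""
    order_id: str
    received_at: str                # store where the check actually arrived


@dataclass
class Store:
    name: str
    orders: dict = field(default_factory=dict)        # orders this store owns
    reminders: set = field(default_factory=set)       # "call the customer" notes
    open_invoices: set = field(default_factory=set)   # invoices not yet closed

    def create_order(self, order_id):
        # Creating the order also starts the local processes around it.
        self.orders[order_id] = Order(order_id, owner=self.name)
        self.open_invoices.add(order_id)
        self.reminders.add(order_id)

    def handle(self, message):
        # Only the owner changes the record, so its local workflows run too.
        order = self.orders[message.order_id]
        order.status = "Paid in full"
        self.open_invoices.discard(message.order_id)
        self.reminders.discard(message.order_id)

    def check_arrived(self, order_id, owner):
        # A non-owner never edits the order; it only notifies the owner.
        owner.handle(PaymentReceived(order_id, received_at=self.name))


chicago = Store("Chicago")
second_store = Store("Second store")

chicago.create_order("order-1")
second_store.check_arrived("order-1", owner=chicago)

print(chicago.orders["order-1"].status)   # prints "Paid in full"
```

In a real system the call to `owner.handle` would more likely be a phone call, an email or a message on a queue, and the owner might well decide to do something other than mark the order as paid. The shape stays the same: the non-owner reports what happened, and the owner decides what it means.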
I intentionally used what looks like a toy example, because it is easy to get bogged down in the details. But in any distributed system, there are local processes that happen which can be quite important. If you go ahead and update their information behind their back, you are guaranteed to break something. And I haven’t even begun to talk about the chance for conflicts… of course.