Inventory management in MongoDB: A design philosophy I find baffling
I’m reading MongoDB in Action right now. It is an interesting book and I wanted to learn more about the approach to using MongoDB, rather then just be familiar with the feature set and what it can do. But this post isn’t about the book, it is about something that I read, and as I was reading it I couldn’t help but put down the book and actually think it through.
More specifically, I’m talking about this little guy. This is a small Ruby class that was presented in the book as part of an inventory management system. In particular, this piece of code is supposed to allow you to sell limited inventory items and ensure that you won’t sell stuff that you don’t have. The example is that if you have 10 rakes in the stores, you can only sell 10 rakes. The approach that is taken is quite nice, by simulating the notion of having a document per each of the rakes in the store and allowing users to place them in their cart. In this manner, you prevent the possibility of a selling more than you actually have.
What I take strong issue with is the way this is implemented. MongoDB doesn’t have multi document transactions, but the solution presented requires it. Therefor, the approach outlined in the book is to try to build transactional semantics from the client side. I write databases for a living, and I find that concept utterly baffling. Clients shouldn’t try to do stuff like that, not only would they most likely get it wrong, but they’ll do that extremely inefficiently.
Let us consider the following tidbit of code:
The idea here is that the fetcher is supposed to be able to atomically add the products to the order. If there aren’t enough available products to be added, the entire thing is supposed to be rolled back. As a business operation, this make a lot of sense. The actual implementation, however, made me wince.
What it does, if it was SQL, is the following:
I intentionally used SQL here, both to simplify the issue for people who aren’t familiar with MongoDB and to explain the major dissonance that I have with this approach. That little add_to_cart call that we had earlier resulted in no less than eight network roundtrips. That is in the happy case. There is also the failure mode to consider, which involved resetting all the work done so far.
The thing that really bothers me is that I can’t believe that this is something that you’ll actually want to do except as an intellectual exercise. I mean, sure, how we can pretend to get transactions from non transactional store is interesting, but given the costs of doing this or the possibility of failure or the fact that this is a non atomic state transition or… you get my point, right?
In the case of this code, the whole process is non atomic. That means that outside observers can see the changes as they are happening. It also opens you up for a lot of Bad Stuff in terms of abusing the system. If the user is malicious, they can use the fact that this “transaction” is going to be running back and forth to the database (and thus taking a lot of time) and just open another tab to initiate an action while this is going on, resulting in operations on invalid state. In the example that the book give, we can use that to force purchases of invalid items.
If you think that this is unrealistic, consider this page, which talks about doing things like making money appear from thin air using just this sort of approaches.
Another thing that really bugged me about this code is that it has “error handling” I use that in quotes because it is like a security blanket for a 2 years old. Having it there might calm things up, but it doesn’t actually change anything. In particular, this kind of error handling looks right, but it is horribly broken if you consider what kind of actual errors can happen here. If the process running this code failed for any reason, the “transaction” is going to stay in an invalid state. It is possible that one of your rake will just disappear into thin air, as a result. It is supposed to be in someone’s cart, but it isn’t. The same can be the case if the server had an issue midway or just a regular network hiccup.
I’m aware that this is code that was written explicitly to be readable and easy to explain, rather then be able to withstand the vagaries of production, but still, this is a very dangerous thing to do.
As an aside, not quite related to the topic of this post, but one thing that really bugged me in the book so far is the number of remote requests that are commonly required to do things. Is there an assumption that the database in question is nearby or very cheap to access, because the entire design philosophy I use is to assume that going over the network is expensive, so let us give the users a lot of ways to reduce that cost. In contrast, at least in the book, there is a lot of stuff that is just making remote calls like there is a fire sale that will close in 5 minutes.
To be fair to the book, it notes that there is a possibility of failure here and explain how to handle one part of it (it missed the error conditions in the error handling) and call this out explicitly as something that should be done with consideration.