Large, interconnected, in-memory model


I got into an interesting discussion about Event Sourcing in the comments of a recent post, and it was worth breaking out into a post of its own.

Basically, Harry is suggesting (I’m paraphrasing, and maybe not too accurately) a potential solution: keep the model computed from all the events directly in memory. The idea is that you can pretty easily get machines with enough RAM to hold a stupendous amount of data, which gives you all the benefits of a rich domain model without any persistence constraints. It is also likely to be faster than any other solution.

And to a point, I agree. It is likely to be faster, but that isn’t enough to make it a good solution for most problems. Let me point out a few cases where it fails to be a good answer.

If the only way you have to build your model is to replay your events, then that is going to be a problem when the server restarts. Assuming a reasonably sized data model of 128GB or so, and assuming that we have enough events to build something like that, let’s say about 0.5 TB of raw events, we are going to be in a world of hurt. Even assuming no I/O bottlenecks, I believe it is fair to say that you can process the events at a rate of about 50 MB/sec. That gives us just under 3 hours to replay all the events from scratch. You can try to play games here: read in parallel, replay independent streams concurrently, etc. But it is still going to take time.
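To put a number on it, here is the back-of-the-envelope math (a plain Python sketch, using the figures above; the 50 MB/sec rate is an assumption, not a measurement):

```python
# Rough replay-time estimate: 0.5 TB of raw events processed at 50 MB/sec.
events_bytes = 0.5 * 1024 ** 4        # 0.5 TB in bytes
rate_bytes_per_sec = 50 * 1024 ** 2   # assumed sustained processing rate

seconds = events_bytes / rate_bytes_per_sec
print(f"Replay time: {seconds / 3600:.1f} hours")  # roughly 2.9 hours
```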

And that is enough time that this isn’t a good technology to run without a solid backup strategy, which means that you need at least a few of these machines and some failover between them. But even ignoring that, and assuming that you can indeed replay all your state from the event store, you are going to run into other problems with this kind of model.

Put simply, if you have a model that is tens or hundreds of GB in size, there are two options for its internal structure. On the one hand, you may have a model where each item stands on its own, with no relations to other items; or, if there are relations, they are well scoped to a particular root. Call it the Root Aggregate model, with no references between aggregates. You can make something like that work, because you have good isolation between the different items in memory, so you can access one of them without impacting another. If you need to modify one, you can lock it for the duration, etc.
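Here is a minimal sketch of what I mean (Python, with a hypothetical OrderAggregate type; the point is only that each aggregate carries its own lock and holds no references to other aggregates):

```python
import threading

class OrderAggregate:  # hypothetical aggregate type, for illustration only
    def __init__(self, order_id):
        self.order_id = order_id
        self.lines = []
        self._lock = threading.Lock()

    def add_line(self, product_id, quantity):
        # Mutations touch only this aggregate's data, so locking it is enough.
        with self._lock:
            self.lines.append((product_id, quantity))

# The in-memory model is just a collection of independent aggregates.
orders: dict[str, OrderAggregate] = {}
```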

However, if your model is interconnected, so you may traverse from one Root Aggregate to another, you are facing a much harder problem.

In particular, because there are no hard boundaries between the items in memory, you cannot safely or easily mutate a single item without worrying about concurrent access to it from somewhere else. You could make everything single-threaded, but that is obviously a waste of a lot of horsepower.
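As an illustration (a deliberately naive sketch, with hypothetical Customer and Invoice types), consider what happens when one object holds a direct reference to another:

```python
class Customer:
    def __init__(self, name):
        self.name = name
        self.balance = 0

class Invoice:
    def __init__(self, customer, amount):
        self.customer = customer   # direct in-memory reference, not an id
        self.amount = amount

    def mark_paid(self):
        # Locking this invoice does not protect the customer: another thread
        # traversing to the same Customer from a different invoice can race
        # on (or observe a partial) update of its balance.
        self.customer.balance -= self.amount
```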

Another problem with in-memory models is that they don’t do such a good job of letting you roll back operations. If your code mutates objects and hits an exception halfway through, what is the current state of your data?

You can resolve that. For example, you can decide to keep only immutable data in memory and replace it atomically. That… works, but it requires a lot of discipline and makes it complex to program against.
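A minimal sketch of that approach (Python, single writer, with a copy-on-write dictionary standing in for the snapshot; a real system would use persistent data structures rather than copying everything on each change):

```python
import threading

class Model:
    def __init__(self, snapshot=None):
        self._snapshot = snapshot or {}       # treated as read-only
        self._write_lock = threading.Lock()   # one writer at a time

    def read(self):
        # Readers always see a complete, consistent snapshot.
        return self._snapshot

    def apply(self, key, value):
        with self._write_lock:
            new_snapshot = dict(self._snapshot)  # copy, never mutate in place
            new_snapshot[key] = value
            # If anything above raises, the old snapshot is untouched,
            # which is what gives us rollback "for free".
            self._snapshot = new_snapshot        # atomic reference swap
```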

Off the top of my head, you are going to be facing problems around atomicity, consistency and isolation of operations. We aren’t worried about durability because this is a purely in-memory solution, but if we were to add that, we would have ACID, and that does ring a bell.

The in-memory solution sounds good, and it is usually very easy to start with, but it suffers from major issues when used in practice. To start with, how do you look at the data in production? That is something you do surprisingly often, to figure out what is going on “behind the scenes”, so you need some way to peek at the live state. If your data lives in memory only, and you haven’t thought about how to expose it to the outside, your only option is to attach a debugger, which is… unfortunate. Given the reluctance to restart the server (startup time is high), you’ll usually find that you have to provide some scripting that you can run in-process to inspect things, make changes, etc.
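One possible shape for that, sketched in Python under the assumption that a read-only, in-process HTTP endpoint is acceptable (the `orders` dictionary is a stand-in for whatever the real model is):

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
import json
import threading

orders = {"1234": {"status": "shipped", "lines": 3}}  # placeholder model

class InspectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # GET /<order-id> returns that item as JSON, for ad-hoc peeking only.
        key = self.path.lstrip("/")
        body = json.dumps(orders.get(key, {})).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

def start_inspection_endpoint(port=8089):
    # Bind to localhost only; this is an operator tool, not a public API.
    server = HTTPServer(("127.0.0.1", port), InspectHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
```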

Versioning is also a major player here. Sooner or later you’ll probably put the data inside a memory-mapped file to allow for (much) faster restarts, but then you have to worry about the structure of the data and how it changes over time.
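At a minimum, that means tagging the on-disk layout with a version so you can detect old snapshots on startup. A rough sketch (Python, with a hypothetical SNAPSHOT_VERSION constant; migration itself is left out):

```python
import struct

SNAPSHOT_VERSION = 3  # hypothetical current layout version

def write_snapshot(path, payload: bytes):
    # The layout version goes first, ahead of the actual data.
    with open(path, "wb") as f:
        f.write(struct.pack("<I", SNAPSHOT_VERSION))
        f.write(payload)

def read_snapshot(path):
    with open(path, "rb") as f:
        (version,) = struct.unpack("<I", f.read(4))
        if version != SNAPSHOT_VERSION:
            raise ValueError(f"snapshot layout v{version} needs migration")
        return f.read()
```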

None of the issues I have raised is super hard to figure out or fix, but in conjunction? They add up to a pretty big set of additional tasks that you have to complete just to be in the same place you were before you started putting everything in memory to make things easier.

In some cases, this is perfectly acceptable. For high frequency trading, for example, you would keep an in-memory model to make decisions on as fast as possible, as well as a persistent model to query on the side. But for most systems, that is usually out of scope. It is interesting to write such a system, though.