Time series feature designScale out / high availability
I expect (at a bare minimum), to be able to do about 25,000 sustained writes per second on a single machine. I actually expect to be able to do significantly more. But let us go with that amount for now as something that if it drops below that value, we are in trouble. Over a day, that is 2.16 billion records. I don’t think this likely, but that is a lot of data to work with.
That leads to interesting questions on scale out story. Do we need one? Well, probably not for performance reasons (said the person who haven’t written the code, much less benchmarked it) at least not if my hope for the actual system performance comes about.
However, just writing the data isn’t very interesting, we also need to read and work with it. One of the really nice thing about time series data is that it is very easily sharded, because pretty much all the operations you have are naturally accumulative. You have too much data for a single server, go ahead and split it. On query time, you can easily merge it all back up again and be done with it.
However, a much more interesting scenario for us is the high availability / replication story. This is where things gets… interesting. With RavenDB, we do async server side replication. With time series data, we can do that (and we have the big advantage of not having to worry about conflicts), but the question is how.
The underlying mechanism for all the replication in RavenDB is the notion of etags, of a way to track the order of changes to the database. In time series data, that means tracking the changes that happens in some form of a sane fashion.
It means having to track, per write, at least another 16 bytes. (8 bytes for a monotonically increasing etag number, another 8 for the time that was changed). And we haven’t yet spoken of things like replication of deletes, replication of series tag changes, etc. I can tell you that dealing with replication of deletes in RavenDB, for example, was no picnic.
A much simpler alternative would be to not handle this at the data level. One nice thing about Voron is that we can actually do log shipping. That would trivially solve pretty much the entire problem set of replication, because it would happen at the storage layer, and take care of all of it.
However, it does come with its own set of issues. In particular, it means that the secondary server has to be read only, it cannot accept any writes. And by that I mean that it would physically reject them. That leads to a great scale out story with read replicas, but it means that you have to carefully consider how you are dealing the case of the primary server going down for a duration, or any master/master story.
I am not sure that I have any good ideas about this at this point in time. I would love to hear suggestions.
More posts in "Time series feature design" series:
- (04 Mar 2014) Storage replication & the bee’s knees
- (28 Feb 2014) The Consensus has dRafted a decision
- (25 Feb 2014) Replication
- (20 Feb 2014) Querying over large data sets
- (19 Feb 2014) Scale out / high availability
- (18 Feb 2014) User interface
- (17 Feb 2014) Client API
- (14 Feb 2014) System behavior
- (13 Feb 2014) The wire format
- (12 Feb 2014) Storage