It is easy to forget that a database isn’t just about storing and retrieving data. A lot of work goes into the actual behavior of the system beyond the storage aspect.
In the case of a time series database, just storing the data isn’t enough, because we very rarely want to access all of the raw data. If we have a sensor that sends us a value once a minute, that comes to 43,200 data points in a 30-day month (60 × 24 × 30). There is very little that we actually want to do with that many individual points. Usually we want to work over some rollup of the data. For example, we might want to see the mean per day, or the standard deviation on a weekly basis, and so on.
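To make the rollup idea concrete, here is a minimal Python sketch. It assumes the points are held in memory as `(timestamp, value)` pairs and that weeks start on Monday. The helper names `day_bucket`, `week_bucket` and `rollup` are hypothetical and only illustrate the technique of grouping points into time buckets and aggregating each bucket. This is not how any particular database implements it.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean, pstdev


def day_bucket(ts: datetime) -> datetime:
    """Truncate a timestamp to midnight of the same day."""
    return ts.replace(hour=0, minute=0, second=0, microsecond=0)


def week_bucket(ts: datetime) -> datetime:
    """Truncate a timestamp to midnight of the Monday that starts its week."""
    day = day_bucket(ts)
    return day - timedelta(days=day.weekday())  # weekday(): Monday == 0


def rollup(points, bucket_fn, agg_fn):
    """Group (timestamp, value) pairs into buckets and aggregate each bucket."""
    groups = defaultdict(list)
    for ts, value in points:
        groups[bucket_fn(ts)].append(value)
    return {bucket: agg_fn(values) for bucket, values in sorted(groups.items())}


# One reading per minute for a 30-day month (April 2014): 60 * 24 * 30 points.
start = datetime(2014, 4, 1)
readings = [(start + timedelta(minutes=i), float(i % 60)) for i in range(60 * 24 * 30)]

daily_mean = rollup(readings, day_bucket, mean)        # one bucket per day
weekly_stdev = rollup(readings, week_bucket, pstdev)   # one bucket per Monday-based week
```

The 43,200 raw points collapse into 30 daily means and 5 weekly standard deviations (April 2014 touches five Monday-based weeks). Those small results are what a user typically wants to look at.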
We might also want to do some downsampling. By that I mean taking a series whose values are stored on a per-minute or per-second basis, keeping only the per-day totals, and deleting the old raw data to save space.
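The post doesn't describe how downsampling would be implemented. The following is a minimal Python sketch, assuming the raw series is held in memory as `(timestamp, value)` pairs. The `downsample` function and its `cutoff` parameter are hypothetical illustrations, not a real product API. Raw points older than the cutoff are summed into per-day totals and dropped from the raw series. Newer points are kept as-is.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def downsample(points, cutoff):
    """Collapse raw (timestamp, value) points older than `cutoff` into
    per-day totals; points at or after `cutoff` are kept as raw data.

    Returns (kept_raw_points, daily_totals). The caller replaces its stored
    raw series with kept_raw_points, which is what deletes the old
    fine-grained data to save space.
    """
    daily_totals = defaultdict(float)
    kept = []
    for ts, value in points:
        if ts < cutoff:
            day = ts.replace(hour=0, minute=0, second=0, microsecond=0)
            daily_totals[day] += value
        else:
            kept.append((ts, value))
    return kept, dict(sorted(daily_totals.items()))


# Three days of per-minute readings, each with value 1.0.
start = datetime(2014, 2, 12)
raw = [(start + timedelta(minutes=i), 1.0) for i in range(60 * 24 * 3)]

# Keep only the most recent day as raw data; older days become daily totals.
raw, totals = downsample(raw, cutoff=start + timedelta(days=2))
```

In this sketch the cutoff should fall on midnight. Otherwise one day would be split, with part of it stored as an incomplete total and the rest kept as raw points.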
The reason that I am using time series data for this series of blog posts is that, to be honest, there really isn’t all that much that you can do with time series data. You store it, aggregate over it, and… that is about it. Users might build derivations on top of that, but that is out of scope for a database product.
Can you think of any other behaviors that the system needs to provide?
More posts in "Time series feature design" series:
- (04 Mar 2014) Storage replication & the bee’s knees
- (28 Feb 2014) The Consensus has dRafted a decision
- (25 Feb 2014) Replication
- (20 Feb 2014) Querying over large data sets
- (19 Feb 2014) Scale out / high availability
- (18 Feb 2014) User interface
- (17 Feb 2014) Client API
- (14 Feb 2014) System behavior
- (13 Feb 2014) The wire format
- (12 Feb 2014) Storage