One of the things that Voron does very well is the ability to read a lot of data fast. One of the interesting scenarios we deal with is when we want to deal with time series data.
For example, let us say that we have a bunch of sensors reporting on the temperature metrics within an area (said while the heaviest storm in 5 decades is blowing outside). Every minute, we have some data coming in. For fun, we will make the following assumptions:
- We have do deal with late writes (a sensor sending us updates from 1 hour ago because of communication update).
- Dates aren’t unique.
- All queries will take into account the dates.
First, let me show you the full code for that, then we can talk about how it works:
In line 14, we create the data tree, which will hold the actual time series data, and the last-key, which I’ll explain in a bit.
The AddRange method in line 23 is probably the most interesting. We create a key that is composed of the date of the entry, and an incrementing number. Note that we use big endian encoding because that allow easy byte string sorting. The implications of this sort of key is that the values are actually sorted by the date, but if we have multiple values for the same millisecond, we don’t overwrite the data. Along with adding the actual data, we record the change in the incrementing counter ,so if we need to restart, we’ll continue from where we left off.
Finally, we have the actual ScanRange method. Here we basically start from the minimum value for the start date, and set the MaxKey as the stop condition for the maximum value for the end date. And then it is just getting the values out.
Pretty simple, I think.