RavenDB Time Series Webinar recording (and benchmark results)
The RavenDB Time Series webinar is now available, and as usual, I would love your feedback. This webinar had the most questions yet, including a few curve balls that I had to field midway through. It was fun.
I also gave some bad information during the webinar, and I want to apologize for that. I mentioned some benchmark results, and it appears that I didn't wait for the complete run to finish.
Using the time series benchmark set, RavenDB can get some really nice numbers. All the numbers were run on an i3.xlarge machine:
The first benchmark was for a single value, across 100 different time series. RavenDB is actually faster here than the other two contenders combined. This is the most common scenario that we envisioned, and it was heavily optimized.
For the other scenarios (100 time series with 10 measurements on each tick, and 4,000 time series with 10 measurements on each tick), RavenDB does very nicely. It is significantly faster than InfluxDB, but not as fast as TimescaleDB in these scenarios.
That said, when it comes to queries, we have a whole different ballgame. Here we have the following scenarios:
single-groupby-1-1-1 | Simple aggregate (MAX) on one metric for 1 host, every 5 mins for 1 hour |
single-groupby-1-1-12 | Simple aggregate (MAX) on one metric for 1 host, every 5 mins for 12 hours |
single-groupby-1-8-1 | Simple aggregate (MAX) on one metric for 8 hosts, every 5 mins for 1 hour |
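To make the shape of these queries concrete, here is a minimal sketch in plain Python of what "MAX on one metric for 1 host, every 5 mins for 1 hour" computes. It is only an illustration of the aggregation itself, not RavenDB's query API or the benchmark's code; the function name `max_per_bucket` and the sample data are made up for the example.

```python
from datetime import datetime, timedelta, timezone


def max_per_bucket(samples, start, end, bucket=timedelta(minutes=5)):
    """Return {bucket_start: max value} for samples with start <= ts < end.

    `samples` is an iterable of (timestamp, value) pairs for a single
    metric on a single host (hypothetical input shape for this sketch).
    """
    result = {}
    for ts, value in samples:
        if not (start <= ts < end):
            continue
        # Align the timestamp to the start of the bucket it falls in.
        index = (ts - start) // bucket
        bucket_start = start + index * bucket
        if bucket_start not in result or value > result[bucket_start]:
            result[bucket_start] = value
    return result


# Illustrative data: one reading every 10 seconds for one hour.
start = datetime(2020, 1, 1, tzinfo=timezone.utc)
end = start + timedelta(hours=1)
samples = [(start + timedelta(seconds=10 * i), float(i % 50)) for i in range(360)]

buckets = max_per_bucket(samples, start, end)
for bucket_start in sorted(buckets):
    print(bucket_start.isoformat(), buckets[bucket_start])
```

The 12-hour variant only widens the `start`/`end` window, and the 8-host variant runs the same aggregation once per host.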
Here we are running these queries on a small data set, with 100 time series with a single measurement each. You are reading that correctly: RavenDB is able to run this fast enough that we cannot measure the speed of the query.
When we have 10 measurements per time series, RavenDB has to put in a little more effort, but it is still either the fastest or very nearly so.
When we increase the size of the data to 4,000 time series each with 10 measurements per tick, we see that RavenDB’s performance is effectively constant.
This is because RavenDB is being smart about how it runs queries and is able to do a lot of work upfront, significantly improving query time.
And this is before we have gotten to the fact that you can run your indexes on time series data as well. Take a look at the webinar recording; I think you'll be impressed.
Comments
Those charts look nice, but numbers without units mean nothing. Please add units to the Y axis.
Jesus,
For ingest, the value is inserts / sec. Higher is better. For queries, the value is ms for the query to complete. Lower is better.
Regarding the "sparse time series" question, perhaps this was the scenario being referred to:
https://stackoverflow.com/questions/60504825/working-with-sparse-timeseries-data-in-influxdb
Peter,
If that is the case, then the answer is that you'll want to put the data in. If the data is the same as the last time, it is actually cheaper to record that rather than skipping a value.
That said, maybe you have sensors that record only when things change, and you want to artificially insert values at specific times?
Not sure what "cheaper" you mean exactly, but in our data warehouse (SQL), for certain tables the DBAs decided to only store a row when there is a change from the previous day. For example, daily snapshots of all customer stock holdings: a few million rows each day. I assume they did it to save disk space in the SAN. The outcome is that querying is much more complex and slow, especially for ad-hoc queries that don't have the optimum index.
Peter,
Cheaper as in, queries are faster and the storage cost is mostly negligible. Just for reference, let's say that you had a time series of a stock market, covering 100,000 symbols. You had a data feed giving you their values once every 5 seconds. Assuming that the data does not change often for any particular symbol, how much would it cost to store it in RavenDB time series? Storing 17,280 samples (one day's worth at one sample every 5 sec) would cost you ~8KB.
As you mentioned, figuring out the values after the fact will probably be harder. You can do that by specifying a repeating period. Here is a blog post that shows the technique: https://ayende.com/blog/190818-C/dealing-with-sparse-values-in-ravendb-timeseries?key=71dbcade680f486f985a78b4b183d770