Ayende @ Rahien

My name is Ayende Rahien
Founder of Hibernating Rhinos LTD and RavenDB.

Time series feature design: Client API

We have gone over the system behavior, the wire protocol and how we actually store the data on disk. Now, let us talk about the actual client API. The entry point is going to be the TimeSeries class, which will have the following behavior:

Stateless operations:

  • Queries:
    • timeSeries.Query("sensor1.heat", "sensor1.flow")
         .Aggregation(AggregateBy.Max, AggregateBy.Min, AggregateBy.Mean);
    • timeSeries.SeriesBy("temp:C");
  • Operations:
    • timeSeries.Delete("sensor1.heat", start, end);
    • timeSeries.Tag("sensor1.heat", "temp:C");

Those types of operations have no state. They require nothing beyond knowing where the server is located, and can be executed immediately. The returned results aren’t tracked or managed by us in any way, so there is no need for a session.
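One way to picture why these calls need no session: each one can be reduced to a self-contained value that carries everything the server needs. The following is only a minimal sketch of that idea. `SeriesQuery`, its members and the `AggregateBy` enum are hypothetical stand-ins invented for illustration, not the actual client API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical aggregation kinds, mirroring the ones named in the post.
public enum AggregateBy { Max, Min, Mean }

// An immutable description of a query. Nothing is tracked once it is built,
// which is what makes the operation stateless from the client's point of view.
public sealed class SeriesQuery
{
    public IReadOnlyList<string> Series { get; }
    public IReadOnlyList<AggregateBy> Aggregations { get; }

    private SeriesQuery(IReadOnlyList<string> series, IReadOnlyList<AggregateBy> aggregations)
    {
        Series = series;
        Aggregations = aggregations;
    }

    // Starts a query over one or more series names.
    public static SeriesQuery For(params string[] series)
    {
        return new SeriesQuery(series.ToArray(), new AggregateBy[0]);
    }

    // Returns a new query value; the original is left untouched.
    public SeriesQuery Aggregation(params AggregateBy[] aggregations)
    {
        return new SeriesQuery(Series, aggregations.ToArray());
    }
}
```

Because every call returns a fresh value and nothing is shared, such a query could be built anywhere and sent to the server without any connection or session object behind it.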

Stateful operation - The only stateful operation we have (at least so far) is adding data to the database. We do that using the connection abstraction, which is very close to the actual on-the-wire representation, and that is always good. We have something like:

    using(var con = timeSeries.OpenConnection(waitForServerFlush: true))
    {
        using(var series = con.AddToSeries("sensor1.heat"))
        {
            for(var i = 0; i < 100; i++)
            {
                series.Add(time.AddMinutes(i), value + i);
            }
        }
    }

This is a bit of an awkward API, but it serves a purpose: it is very close to the on-wire format, and it is optimized for performance, not for being nice.
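To make the shape of that API concrete, here is a minimal in-memory sketch of the nested connection / series pattern. It is not the real client: `SketchConnection`, `SketchSeriesWriter` and the `Wire` list are hypothetical names, and the assumption that the series name is written once and followed only by timestamp / value pairs is mine, for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical in-memory stand-in for the connection.
// A real one would write to the network instead of a list.
public sealed class SketchConnection : IDisposable
{
    // Everything "sent" so far, in order, so the grouping is visible.
    public List<string> Wire { get; } = new List<string>();
    public bool Flushed { get; private set; }

    public SketchSeriesWriter AddToSeries(string name)
    {
        Wire.Add("series:" + name); // assumed: the series name is written once
        return new SketchSeriesWriter(this);
    }

    public void Dispose()
    {
        Flushed = true; // stand-in for waiting until the server has flushed
    }
}

public sealed class SketchSeriesWriter : IDisposable
{
    private readonly SketchConnection _connection;

    public SketchSeriesWriter(SketchConnection connection)
    {
        _connection = connection;
    }

    public void Add(DateTime time, double value)
    {
        // Only the timestamp and value are written; the series name is implied.
        _connection.Wire.Add(time.ToString("o") + "=" +
            value.ToString(CultureInfo.InvariantCulture));
    }

    public void Dispose()
    {
        _connection.Wire.Add("end"); // assumed marker closing the series block
    }
}
```

Used with the same nested `using` blocks as above, 100 calls to `Add` produce one series entry, 100 value entries and one end marker, which is the kind of grouping the awkward shape is meant to buy.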

We can also have:

con.Add("sensor1.heat", time, value);

But if you are mixing things up (add sensor1.heat, sensor1.flow and then sensor1.heat again, etc), it probably won’t be as efficient. It is important to be able to expose those optimizations all the way from the disk to the wire to the client API. Most times, they don’t matter, which is why we have the higher level API, but when they do, they really do.
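A rough way to see the cost of mixing things up is to count how often the stream has to switch series. The sketch below assumes, hypothetically, that the wire format needs a new series header every time the series name changes; `WireCost` and `CountSeriesHeaders` are illustrative names, not part of any real API.

```csharp
using System;
using System.Collections.Generic;

public static class WireCost
{
    // Counts how many series headers a stream of writes would need,
    // assuming a new header is written whenever the series name changes.
    public static int CountSeriesHeaders(IEnumerable<string> seriesPerWrite)
    {
        var headers = 0;
        string current = null;
        foreach (var name in seriesPerWrite)
        {
            if (name != current)
            {
                headers++;      // switching series costs one header
                current = name;
            }
        }
        return headers;
    }
}
```

Under that assumption, four writes grouped as heat, heat, flow, flow need 2 headers, while the same four writes interleaved as heat, flow, heat, flow need 4. The gap grows with the number of points, which is why the batch-per-series shape exists alongside the simpler call.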

And… this is pretty much it.

The API will probably be an async one, to keep up with the times, but those are pretty much the high level things that we have here.

More posts in "Time series feature design" series:

  1. (04 Mar 2014) Storage replication & the bee’s knees
  2. (28 Feb 2014) The Consensus has dRafted a decision
  3. (25 Feb 2014) Replication
  4. (20 Feb 2014) Querying over large data sets
  5. (19 Feb 2014) Scale out / high availability
  6. (18 Feb 2014) User interface
  7. (17 Feb 2014) Client API
  8. (14 Feb 2014) System behavior
  9. (13 Feb 2014) The wire format
  10. (12 Feb 2014) Storage


Khalid Abuhakmeh

The client API for tags doesn't make a lot of sense to me. You will rarely ever add tags on the fly, instead you are likely to just add all of them at the same time at creation time. "timeSeries.Tags.Add(key, value)" or "timeSeries.Tags.AddRange(Dictionary)". You said "This is a bit of an awkward API", well there is no need to be that awkward :)

Ayende Rahien

Khalid, You cannot assume that users will first create the series, then add values. Instead, we allow the series to be created implicitly, just by adding values to it.

The awkward client API I was referring to was the batch operation stuff.

Khalid Abuhakmeh

I see what you are saying, you want series to "come online" even if they were never explicitly created. I guess to me it would still be useful not to have the string "temp:C" as a tag, but instead have a key value pair of "temp" and "C".

You might also want a metadata dictionary on a time series that you could pull and use for processing. Things you might store in metadata include Coordinates, sensor Id (External Database Id), Owner, Etc... You might never group by them, but you might pull the metadata and do something with the data.

A scenario for metadata is "A series hit a threshold, now notify the customer that they hit it."

Overall I like the API and it looks promising. Interested to read the rest of the posts.

Ayende Rahien

Khalid, It is much easier to handle tags if they are just arbitrary strings that the user brings meaning to. With conventions of temp:C, temp:F, etc.

The problem with "metadata" is that the moment you start doing that, you are starting to talk about doing more and more complex things. In that case, put the actual series behavioral aspects in a RavenDB document, and just use the timeseries for the time series stuff.

Juan Lopes

Hi, Ayende.

I work at a Brazilian company called Intelie (http://www.intelie.com/en/), and most of my work is to write an event processing language that does exactly what you are doing by writing an internal DSL.

This language is written in Java and unfortunately is closed source, but the main idea is to chain processing steps. The syntax looks like:

type:sensor => avg(temp1), avg(temp2) by sensor every week

Basically: lucene-ish query [=> transformation or aggregation]*

It's just the basic syntax, it has many more useful constructions. But our language is specialized in dealing with realtime data, so we have some builtins to deal with output rate vs time window.

Also, the whole language is designed to be distributed, with the aggregations storing not only their results, but also the information needed to merge them with other nodes' results.

If you wish, I can provide you with more details.


Juan, be careful with announcing publicly that you're ready to give out company product details. Someone might overreact. Maybe it's not a secret, but you never know..

David Cuccia

Typo: aggregate/aggregation

Juan Lopes

Rafal, it's not a secret. In fact, talking about it is one of the company's objectives. I've been giving talks about it for the last year (all in Portuguese, unfortunately). If you're going to CeBIT this year, some of my colleagues will be there also talking about it (we're one of the CODE_n finalists).

We really want to opensource it, but still struggling with proper documentation and licensing.


ok, looks like your company is not a corporation ;)


Are you guys planning to provide an OPC HDA driver for ravendb timeseries data?

Ayende Rahien

Hpcd, I don't understand what this is, so I don't know. Reading about it, this looks like a DCOM interface, which should be doable, but is probably complex / hard. I am not sure how worth it this would be.


Hi Ayende:

Well, I am reading that you are implementing a timeseries--historian type functionality in RavenDB--which is great!

In process industry (Oil/gas, wood, chemical, etc), a whole lot has been invested in the OPC standard for communicating data from devices. Typically, you would expose your ravendb as an OPC HDA/DA server, and then SCADA and other software can connect to it and expose the data or write data. Sometimes, the facility to execute C# client api might not be available:

Check http://www.opcfoundation.org/. Most mainstream historians have HDA server for exposing their historians. Makes it more competitive.

I know there is timeseries data for more business data, etc; but having an OPC HDA interface to your ravendb historian would be appealing to process industry.


I don't know what was the inspiration for the ravendb timeseries db, but I assumed it to be process historian data--since I came from that background.

Anyways, it's perhaps a whole other topic. A typical thing some OPC HDA servers will allow you to do is create virtual tags that generate historical computed data, as if it were coming from the device. So, as your "incoming", realtime device temperature and pressure change, it triggers a computation on another virtual tag to store some relevant computed value--that is reported as if it were a tag from the device. These extra bits may not be part of the OPC HDA standard, but vendors add extra functionality to one up each other...

Ayende Rahien

hpcd, I am just talking about design principles at the moment, I am not really getting down to implementing this.

Besides, this looks expensive: http://www.advosol.com/pc-17-4-opc-hda-net-server-toolkit.aspx

At any rate, if/when we get around to doing this, that will be something to consider.

Ayende Rahien

Hpcd, How do you define/ create the computation?


The computation is defined in your OPC HDA server--by the user/engineer.

This OPC server can connect to your ravendb historian to get the current tags. It will need to have some UI to configure simple math relations.

It is not uncommon to have some scripting support with predefined functions, attached to your virtual tag.

Leaving OPC servers, etc aside. A useful feature for your api, would be to allow

a. Raw query
b. Delta query
c. Filter options for data quality.

For a., if a sensor produces 100 deg C value for 10000 samples, and then it changes to 102 deg C for one value, you will read 10001 values.

For b, you only read 2 values time1 and 100 deg, time2 and 102 deg.

Matt Johnson

Hi Oren. In regards to Rollup.Weekly, keep in mind that not everyone defines their start of week by the same criteria. For that matter, not everyone will agree on when the day begins and ends, not just because of time zones, but because a business day might roll over into a different calendar date depending on what kind of business you're operating. How will you tackle these issues in the time series API? Hopefully in some way that is highly configurable and/or extensible?

Ayende Rahien

Matt, Yes... you are quite correct. For that matter, it would be crazy to do this in a daily fashion as well. Considering things like daylight savings, etc.

Maybe I'll just do that on an hourly basis, you can define 24 hours, 168 hours, etc.

To my knowledge, we don't have a lot of issues with different definitions of what an hour is. And I don't care about Martian time.
