Time series feature design: Client API


We have gone over the system behavior, the wire protocol and how we actually store the data on disk. Now, let us talk about the actual client API. The entry point is going to be the TimeSeries class, which will have the following behavior:

Stateless operations:

  • Queries:
    • timeSeries.Query("sensor1.heat", "sensor1.flow")
         .Range(start, end)
         .Rollup(Rollup.Weekly)
         .Aggregation(AggregateBy.Max, AggregateBy.Min, AggregateBy.Mean);
    • timeSeries.SeriesBy("temp:C");
  • Operations:
    • timeSeries.Delete("sensor1.heat", start, end);
    • timeSeries.Tag("sensor1.heat", "temp:C");

Those types of operations have no state. They require nothing beyond knowing where the server is located, and they can be executed immediately. The returned results aren't tracked or managed by us in any way, so there is no need for a session.
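To make the "no session" point concrete, here is a minimal sketch of how such a fluent query could be modeled. Everything in it is an assumption made for illustration: the SeriesQuery class, its For entry point, the property names and the members of the two enums are not part of the actual design. The only idea it shows is that each call returns a new, immutable description of the query, so nothing needs to be tracked between calls.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical enums; the real member lists are not specified in this post.
public enum Rollup { Daily, Weekly, Monthly }
public enum AggregateBy { Min, Max, Mean }

// Hypothetical immutable query description. Every fluent call returns a new
// instance, so no state is shared or tracked between calls.
public sealed class SeriesQuery
{
    public IReadOnlyList<string> Series { get; }
    public DateTime? Start { get; }
    public DateTime? End { get; }
    public Rollup? RollupBy { get; }
    public IReadOnlyList<AggregateBy> Aggregations { get; }

    private SeriesQuery(IReadOnlyList<string> series, DateTime? start, DateTime? end,
                        Rollup? rollupBy, IReadOnlyList<AggregateBy> aggregations)
    {
        Series = series;
        Start = start;
        End = end;
        RollupBy = rollupBy;
        Aggregations = aggregations;
    }

    // Entry point: names the series to query, with no range or rollup yet.
    public static SeriesQuery For(params string[] series) =>
        new SeriesQuery(series.ToArray(), null, null, null, new AggregateBy[0]);

    public SeriesQuery Range(DateTime start, DateTime end)
    {
        if (end < start)
            throw new ArgumentException("end must not be earlier than start");
        return new SeriesQuery(Series, start, end, RollupBy, Aggregations);
    }

    public SeriesQuery Rollup(Rollup rollup) =>
        new SeriesQuery(Series, Start, End, rollup, Aggregations);

    public SeriesQuery Aggregation(params AggregateBy[] aggregations) =>
        new SeriesQuery(Series, Start, End, RollupBy, aggregations.ToArray());
}
```

Because the description is immutable, a partially built query can be reused as the base for several different ranges or rollups without one caller affecting another.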

Stateful operations: the only stateful operation we have (at least so far) is adding data to the database. We do that using the connection abstraction. This is very close to the actual on-the-wire representation, which is always good. We have something like:

    using(var con = timeSeries.OpenConnection(waitForServerFlush: true))
    {
        using(var series = con.AddToSeries("sensor1.heat"))
        {
            for(var i = 0; i < 100; i++)
            {
                series.Add(time.AddMinutes(i), value + i);
            }
        }
    }

This is a bit of an awkward API, but it serves a purpose: it is very close to the on-wire format, and it is optimized for performance, not for being nice.
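One way to picture why the nested scopes map well to the wire is the sketch below. It is not the real client: SketchConnection, SeriesWriter, the in-memory Frames list and the "name:count" frame text are all made up for illustration, and the real framing is whatever the wire format post defines. The point is only that the series name is stated once per scope, and the points collected inside it go out together when the scope is disposed.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the real connection. It records "frames" in memory
// instead of writing to a socket, so the shape of the API can be seen in isolation.
public sealed class SketchConnection : IDisposable
{
    public List<string> Frames { get; } = new List<string>();
    public bool Flushed { get; private set; }

    public SeriesWriter AddToSeries(string name) => new SeriesWriter(this, name);

    // A real client would wait here for the server flush, if it was requested.
    public void Dispose() => Flushed = true;

    public sealed class SeriesWriter : IDisposable
    {
        private readonly SketchConnection _owner;
        private readonly string _name;
        private readonly List<(DateTime Time, double Value)> _points =
            new List<(DateTime Time, double Value)>();
        private bool _disposed;

        internal SeriesWriter(SketchConnection owner, string name)
        {
            _owner = owner;
            _name = name;
        }

        // Points are only buffered here; nothing is emitted until Dispose.
        public void Add(DateTime time, double value) => _points.Add((time, value));

        // One frame per scope: the series name once, then a summary of its points.
        public void Dispose()
        {
            if (_disposed) return;
            _disposed = true;
            _owner.Frames.Add($"{_name}:{_points.Count}");
        }
    }
}
```

In this sketch, adding 100 points inside a single AddToSeries scope produces exactly one frame, which is the behavior the awkward-looking nesting is there to enable.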

We can also have:

con.Add("sensor1.heat", time, value);

But if you are mixing things up (add sensor1.heat, sensor1.flow and then sensor1.heat again, etc.), it probably won't be as efficient. It is important to be able to expose those optimizations all the way from the disk to the wire to the client API. Most times, they don't matter, which is why we have the higher level API, but when they do, they really do.
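A rough way to see the cost of mixing series is to count how often the writer would have to switch series. The sketch below assumes, purely for illustration, that the wire format names a series once and then streams its points, so each switch costs one extra header. WireCost and CountSeriesHeaders are hypothetical names, not part of the client.

```csharp
using System;
using System.Collections.Generic;

public static class WireCost
{
    // Counts how many series headers would be written for a sequence of points,
    // under the assumption that a header is needed every time the series changes.
    public static int CountSeriesHeaders(IEnumerable<string> seriesPerPoint)
    {
        var headers = 0;
        string current = null;
        foreach (var series in seriesPerPoint)
        {
            if (series != current)
            {
                headers++;
                current = series;
            }
        }
        return headers;
    }
}
```

Under that assumption, writing all the sensor1.heat points and then all the sensor1.flow points costs two headers no matter how many points there are, while strictly alternating between the two costs one header per point.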

And… this is pretty much it.

The API will probably be an async one, to keep up with the times, but those are pretty much the high level things that we have here.

More posts in "Time series feature design" series:

  1. (04 Mar 2014) Storage replication & the bee’s knees
  2. (28 Feb 2014) The Consensus has dRafted a decision
  3. (25 Feb 2014) Replication
  4. (20 Feb 2014) Querying over large data sets
  5. (19 Feb 2014) Scale out / high availability
  6. (18 Feb 2014) User interface
  7. (17 Feb 2014) Client API
  8. (14 Feb 2014) System behavior
  9. (13 Feb 2014) The wire format
  10. (12 Feb 2014) Storage