Ayende @ Rahien

Hi!
My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.
You can reach me by phone or email:

ayende@ayende.com

+972 52-548-6969

, @ Q c

Posts: 18 | Comments: 79

filter by tags archive

Time series feature designThe wire format

time to read 10 min | 1835 words

Now that we actually have the storage interface nailed down, let us talk about how we are going to actually expose this over the wire. The storage interface we defined is really low level and not something that I would really care to try working with if I was writing user level code.

Because we are in the business of creating databases, we want to allow this to run as a server, not just as a library. So we need to think about how we are going to expose this over the wire.

There are several options here. We can expose this as a ZeroMQ service, sending messages around. This looks really interesting, and would certainly be a great research to do, but it is a foreign concept for most people, and it would have to be exposed as such through any API we use, so I am afraid we won’t be doing that.

We can write a custom TCP protocol, but that has its own issues. Chief among which, just like ZeroMQ, we would have to deal, from scratch with things like:

  • Authentication
  • Tracing
  • Debugging
  • Hosting
  • Monitoring
  • Easily “hackable” format
  • Replayability

And probably a whole lot more on top of that.

I like building the wire interface over some form of HTTP interface (I intentionally don’t use the overly laden REST word). We get a lot of things for free, in particular, we get easy ability to do things like pipe stuff through Fiddler so we can easily debug them. That also influence decisions such as how to design the wire format itself (hint, human readable is good).

I want to be able to easily generate requests and see them in the browser. I want to be able to read the fiddler output and figure out what is going on

We actually have several different requirements that we need to do.

  • Enter data
  • Query data
  • Do operations

Entering data is probably the most interesting aspect. The reason is that we need to handle inserting many entries in as efficient a manner as possible. That means reduce the number of server round trips. At the same time, we want to keep other familiar aspects, such as the ability to easily predict when the data is “safe” and the storage transaction has been committed to disk.

Querying data and handling one off operations is actually much easier. We can handle it via a simple REST interface:

  • /timeseries/query?start=2013-01-01&end=2014-01-01&id=sensor1.heat&id=sensor1.flow&aggregation=avg&period=weekly
  • /timeseries/delete?id=sensor1.heat&id=sensor1.flow&start=2013-01-01&end=2014-01-01
  • /timeseries/tag?id=sensor1.heat&tag=temp:C

Nothing much to it, so far. We expose it as JSON endpoints, and that is… it.

For entering data, we need something a bit more elaborate. I chose to use websockets for this, mostly because I want two way communication and that pretty much force my hand. The reason that I want to use two way communication mechanism is that I want to enable the following story:

  • The client send the data to the server as fast as it is able to.
  • The server will aggregate the data and will decide, base on its own consideration, when to actually commit the data to disk.
  • The client need some way of knowing when the data it just sent is actually flushed so it can rely on it.
  • the connection should stay open (potentially for a long period of time) so we can keep sending new stuff in without having to create a new connection.

As far as the server is concerned, by the way, it will decide to flush the pending changes to disk whenever:

  • over 50 ms passed waiting for the next value to arrive, or
  • it has been at least 500 ms from the last flush, or
  • there are over 16,384 items pending, or
  • the client indicated that it wants to close the connection, or
  • the moon is full on a Tuesday

However, how does the server communicate that to the client?

The actual format we’ll use is something like this (binary, little endian format):

   1: enter-data-stream =  
   2:    data-packet+
   3:  
   4: data-packet = 
   5:    series-name-len [4 bytes]
   6:    series-name [series-name-len, utf8]
   7:    entries
   8:    no-entries
   9:  
  10: entries =
  11:    count [2 bytes]
  12:    entry[count]
  13:  
  14: no-entries =
  15:    count [2 bytes] = 0
  16:  
  17: entry =
  18:    date [8 bytes]   
  19:    value [8 bytes]

The basic idea is that this format should be pretty good in conveying just enough information to send the data we need, and at the same time be easy to parse and work with. I am not sure if this is a really good idea, because this might be premature optimization and we can just send the data as json strings. That would certainly be more human readable. I’ll probably go with that and leave the “optimized binary approach” for when/if we actually see a real need for it. The general idea is that computer time is much cheaper than human time, so it is worth the time to make things human readable.

The server keep tracks of the number of entries that were sent to it for each connection, and whenever it is flushing the buffered data to disk, it will send notification to that effect to the client. The client can then decide “okay, I can notify my caller that everything that I have sent has been properly saved to disk”. Or it can just ignore it (usual case if they have a long running connection).

And that is about it as far as the wire protocol is concerned. Next, we will take a look at some of the operations that we need.

More posts in "Time series feature design" series:

  1. (04 Mar 2014) Storage replication & the bee’s knees
  2. (28 Feb 2014) The Consensus has dRafted a decision
  3. (25 Feb 2014) Replication
  4. (20 Feb 2014) Querying over large data sets
  5. (19 Feb 2014) Scale out / high availability
  6. (18 Feb 2014) User interface
  7. (17 Feb 2014) Client API
  8. (14 Feb 2014) System behavior
  9. (13 Feb 2014) The wire format
  10. (12 Feb 2014) Storage

Comments

Comment preview

Comments have been closed on this topic.

FUTURE POSTS

  1. Production postmortem: The industry at large - 9 minutes from now
  2. The insidious cost of allocations - about one day from now
  3. Buffer allocation strategies: A possible solution - 4 days from now
  4. Buffer allocation strategies: Explaining the solution - 5 days from now
  5. Buffer allocation strategies: Bad usage patterns - 6 days from now

And 2 more posts are pending...

There are posts all the way to Sep 11, 2015

RECENT SERIES

  1. Find the bug (5):
    20 Apr 2011 - Why do I get a Null Reference Exception?
  2. Production postmortem (10):
    01 Sep 2015 - The case of the lying configuration file
  3. What is new in RavenDB 3.5 (7):
    12 Aug 2015 - Monitoring support
  4. Career planning (6):
    24 Jul 2015 - The immortal choices aren't
View all series

Syndication

Main feed Feed Stats
Comments feed   Comments Feed Stats