Ayende @ Rahien

My name is Ayende Rahien
Founder of Hibernating Rhinos LTD and RavenDB.

Time series feature design: The wire format

Now that we actually have the storage interface nailed down, let us talk about how we are going to actually expose this over the wire. The storage interface we defined is really low level and not something that I would really care to try working with if I was writing user level code.

Because we are in the business of creating databases, we want to allow this to run as a server, not just as a library. So we need to think about how we are going to expose this over the wire.

There are several options here. We could expose this as a ZeroMQ service, sending messages around. That looks really interesting, and it would certainly be great research to do, but it is a foreign concept for most people, and it would have to be exposed as such through any API we use, so I am afraid we won’t be doing that.

We can write a custom TCP protocol, but that has its own issues. Chief among them, just like with ZeroMQ, we would have to deal, from scratch, with things like:

  • Authentication
  • Tracing
  • Debugging
  • Hosting
  • Monitoring
  • Easily “hackable” format
  • Replayability

And probably a whole lot more on top of that.

I like building the wire interface over some form of HTTP interface (I intentionally don’t use the overly laden REST word). We get a lot of things for free; in particular, we get the easy ability to do things like pipe stuff through Fiddler so we can easily debug it. That also influences decisions such as how to design the wire format itself (hint: human readable is good).

I want to be able to easily generate requests and see them in the browser. I want to be able to read the Fiddler output and figure out what is going on.

We actually have several different requirements that we need to support:

  • Enter data
  • Query data
  • Do operations

Entering data is probably the most interesting aspect. The reason is that we need to handle inserting many entries in as efficient a manner as possible. That means reducing the number of server round trips. At the same time, we want to keep other familiar aspects, such as the ability to easily predict when the data is “safe” and the storage transaction has been committed to disk.

Querying data and handling one off operations is actually much easier. We can handle it via a simple REST interface:

  • /timeseries/query?start=2013-01-01&end=2014-01-01&id=sensor1.heat&id=sensor1.flow&aggregation=avg&period=weekly
  • /timeseries/delete?id=sensor1.heat&id=sensor1.flow&start=2013-01-01&end=2014-01-01
  • /timeseries/tag?id=sensor1.heat&tag=temp:C

Nothing much to it, so far. We expose it as JSON endpoints, and that is… it.
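For illustration, here is how a client might assemble one of those query URLs. The endpoint path and parameter names come from the examples above; the `build_query_url` helper and the base address are hypothetical:

```python
from urllib.parse import urlencode

def build_query_url(base, start, end, ids, aggregation=None, period=None):
    """Assemble a /timeseries/query URL. Repeating the 'id' parameter
    selects multiple series, as in the examples in the post."""
    params = [("start", start), ("end", end)]
    params += [("id", series_id) for series_id in ids]
    if aggregation:
        params += [("aggregation", aggregation), ("period", period)]
    return base + "/timeseries/query?" + urlencode(params)

url = build_query_url("http://localhost:8080", "2013-01-01", "2014-01-01",
                      ["sensor1.heat", "sensor1.flow"], "avg", "weekly")
```

Passing a list of tuples to `urlencode` (rather than a dict) is what allows the repeated `id` parameter.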

For entering data, we need something a bit more elaborate. I chose to use websockets for this, mostly because I want two way communication, and that pretty much forces my hand. The reason that I want a two way communication mechanism is that I want to enable the following story:

  • The client sends the data to the server as fast as it is able to.
  • The server will aggregate the data and will decide, based on its own considerations, when to actually commit the data to disk.
  • The client needs some way of knowing when the data it just sent has actually been flushed, so it can rely on it.
  • The connection should stay open (potentially for a long period of time) so we can keep sending new stuff in without having to create a new connection.

As far as the server is concerned, by the way, it will decide to flush the pending changes to disk whenever:

  • over 50 ms passed waiting for the next value to arrive, or
  • it has been at least 500 ms from the last flush, or
  • there are over 16,384 items pending, or
  • the client indicated that it wants to close the connection, or
  • the moon is full on a Tuesday
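Leaving the lunar calendar aside, the first four conditions amount to a simple predicate. This is a sketch only; the function name and constants layout are mine, but the thresholds are the ones from the list above:

```python
# Flush thresholds quoted in the post (full-moon Tuesdays not included).
MAX_IDLE_MS = 50          # time spent waiting for the next value
MAX_SINCE_FLUSH_MS = 500  # time since the last flush
MAX_PENDING = 16_384      # buffered entries

def should_flush(now_ms, last_value_ms, last_flush_ms, pending, closing):
    """True when any of the server's flush conditions holds."""
    return (now_ms - last_value_ms > MAX_IDLE_MS
            or now_ms - last_flush_ms >= MAX_SINCE_FLUSH_MS
            or pending > MAX_PENDING
            or closing)
```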

However, how does the server communicate that to the client?

The actual format we’ll use is something like this (binary, little endian format):

   enter-data-stream =
      data-packet+

   data-packet =
      series-name-len [4 bytes]
      series-name [series-name-len, utf8]
      entries
      no-entries

   entries =
      count [2 bytes]
      entry[count]

   no-entries =
      count [2 bytes] = 0

   entry =
      date [8 bytes]
      value [8 bytes]
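As a sanity check, here is a sketch of encoding one data-packet according to that grammar. The `int64` date and `float64` value are assumptions on my part; the post only says each field is 8 bytes, little endian:

```python
import struct

def encode_packet(series_name, entries):
    """Encode one data-packet per the grammar above, little endian:
    a 4-byte name length, the UTF-8 name, a 2-byte entry count, then
    one (date, value) pair of 8 bytes each per entry."""
    name = series_name.encode("utf-8")
    packet = struct.pack("<I", len(name)) + name      # series-name-len, series-name
    packet += struct.pack("<H", len(entries))         # count
    for date_ticks, value in entries:
        packet += struct.pack("<qd", date_ticks, value)  # entry: date, value
    return packet

pkt = encode_packet("sensor1.heat", [(635268096000000000, 21.5)])
```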

The basic idea is that this format should be pretty good at conveying just enough information to send the data we need, while being easy to parse and work with. I am not sure if this is a really good idea, because it might be premature optimization; we could just send the data as JSON strings. That would certainly be more human readable. I’ll probably go with that and leave the “optimized binary approach” for when/if we actually see a real need for it. The general idea is that computer time is much cheaper than human time, so it is worth the cost to make things human readable.

The server keeps track of the number of entries that were sent to it on each connection, and whenever it flushes the buffered data to disk, it sends a notification to that effect to the client. The client can then decide “okay, I can notify my caller that everything I have sent has been properly saved to disk”. Or it can just ignore it (the usual case for a long running connection).
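That bookkeeping can be sketched as a tiny per-connection state object. The names and the notification payload are hypothetical; the post does not pin them down:

```python
class ConnectionState:
    """Per-connection bookkeeping: count entries as they arrive, and on
    flush report how many are now durable."""
    def __init__(self):
        self.received = 0   # entries seen on this connection so far
        self.acked = 0      # entries confirmed flushed to disk

    def on_entries(self, count):
        self.received += count

    def on_flush(self):
        # Everything received so far has hit disk; tell the client.
        self.acked = self.received
        return {"flushed-up-to": self.acked}
```

The client only needs to compare the reported count against how many entries it has sent to know which of its writes are safe.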

And that is about it as far as the wire protocol is concerned. Next, we will take a look at some of the operations that we need.

More posts in "Time series feature design" series:

  1. (04 Mar 2014) Storage replication & the bee’s knees
  2. (28 Feb 2014) The Consensus has dRafted a decision
  3. (25 Feb 2014) Replication
  4. (20 Feb 2014) Querying over large data sets
  5. (19 Feb 2014) Scale out / high availability
  6. (18 Feb 2014) User interface
  7. (17 Feb 2014) Client API
  8. (14 Feb 2014) System behavior
  9. (13 Feb 2014) The wire format
  10. (12 Feb 2014) Storage



Rafal

If you want to avoid creating a connection, why are you creating it? I mean, is a connection necessary when you only want to send 16 bytes of information?

Ayende Rahien

Rafal, A large part of the time is spent in just establishing the connection. In TCP, you have to do the 3 way dance, in HTTP, you have to do auth from scratch, etc. If you keep the stream open, you can bypass a lot of those costs.


Remi

I read everywhere that a connection should be opened as late as possible and closed ASAP. How will you manage when you have 500 clients connected to your server? And is “human readable” really an advantage here? Ticks are not human readable anyway.

Ayende Rahien

Remi, Human readable is a huge advantage in things like debugging, devops, troubleshooting, etc. The actual on disk format is rarely of interest, but keeping the wire format human readable is essential.

And 500 clients what? In this case, we are actually talking about persistent connections per server. It is unlikely that you'll have direct connections from 500 machines. You usually go through a web server for that. And even if you do, so what? It isn't hard to hold that many connections.


Rafal

My hint was to use a connectionless protocol instead of TCP and its handshake overhead.

Ayende Rahien

Rafal, Connectionless protocol. You mean UDP? Or Raw sockets?

In both cases, you need to have some way of getting replies back, to ensure that the server got your values. It also means that you need to handle retries, etc.

It is also far harder to support & debug, and far more likely to have issues with firewalls, which results in even more to support & debug.

Jahmai Lay

What about XMPP?




@Ayende, @Rafal, not to mention network congestion.

Ayende Rahien

Jahmai, That is over TCP, so there isn't really any additional benefit here that I can see.


Tyler

I understand the ease of debugging via the wire with a human readable format, intercepting the message as it moves across the HTTP wire. I have two problems with this approach. First, you sacrifice performance by serializing and deserializing binary data to readable text at the client and then the server, respectively. Sometimes this is a huge cost. Second, my personal opinion is that the wire is the wrong place to debug. You're not in the code on the client or the server, so you really cannot diagnose the problem. The only thing you can do is determine if your serialization is working and whether your message is correct. And I would argue that this is better done on the client or the server, rather than in the middle. Just my two cents.



Patrick

Just to make things interesting… what would be the conditions where you would want to use raw TCP to expose a server capability?

Kijana Woodard

A thought.

What if the wire format is binary, and a Fiddler add-on is used to decode it to a human readable format?

Ayende Rahien

Tyler, Sure, there is some cost associated with debuggability. But being able to observe requests & response tells you a LOT about the behavior of the system. It is also very easily done.

You can always add an optimized binary protocol later, but starting with text is the way to go.

For that matter, this web page was served using a human readable protocol.

Ayende Rahien

Patrick, Proven performance optimization would do that. If it was significant enough.

Kijana, A Fiddler add-on would do it, but there would have to be a good reason. No need to take on supporting something else when we have something that already works.


Karhgath

@Tyler, the wire is absolutely one of the most important places to debug; it's an integration point, and thus will need the bulk of the debugging facilities there.

Integration testing often costs up to 80% of all testing/debugging time over the full lifecycle of a product, and for good reason.

In contrast, UT and UAT are often fire and forget: you invest a lot of initial time on them, but after that it's pretty much done (all green/user approval).


Tyler

Ayende and Karhgath, I guess we'll have to agree to disagree on this one. For me, observing requests and responses is more properly done at one end or the other with logging. Debugging, at least for me, connotes being able to observe execution flow, including the use of breakpoints and watching specific values. Observing data across the wire in human readable format, when I have proper logging enabled and switchable, seems redundant and of little value, while constraining your choices as to wire format. And with respect to tacking on an optimized binary protocol later, it has been my experience that such intentions are rarely realized.


Ohadr

You should go with ZeroMQ.

  1. Performance is crazy.
  2. Separating the data stream from the acks stream is a piece of cake; you just need to create more sockets.
  3. Almost all buffering is handled for you (e.g., if you write 1M messages from the client, they are sent as batches on the wire).
  4. The support for message parts allows you to create a binary format which is very fast and very simple to work with for data insertion.
  5. For the query interface, you could use JSON over the sockets.
  6. It will be easy to scale to multiple machines.
  7. It has support for UDP connections.

Ayende Rahien

Ohadr, Does it have a Fiddler like interface, so I can see the message exchanges?



Ohadr

For sending messages, I simply use Python. For diagnostics, there is currently no simple solution, although a Wireshark plugin does come to mind.

Anyway, since you are going to be using a binary format (for the data, at least), Fiddler will not suit the use case either…

Also, they added auth & encryption in the latest version.

Ayende Rahien

Ohadr, You'll be surprised just how much you can figure out from the request URI alone when you are debugging. If there is no simple solution, there really isn't an option for me here. I consider this to be pretty much a top priority requirement.
