Ayende @ Rahien

Refunds available at head office

Time series feature design: The wire format

Now that we actually have the storage interface nailed down, let us talk about how we are going to expose this over the wire. The storage interface we defined is really low level, and not something that I would care to work with if I were writing user level code.

Because we are in the business of creating databases, we want to allow this to run as a server, not just as a library. So we need to think about how we are going to expose this over the wire.

There are several options here. We can expose this as a ZeroMQ service, sending messages around. This looks really interesting, and would certainly make for great research, but it is a foreign concept for most people, and it would have to be exposed as such through any API we use, so I am afraid we won’t be doing that.

We can write a custom TCP protocol, but that has its own issues. Chief among them: just like with ZeroMQ, we would have to deal from scratch with things like:

  • Authentication
  • Tracing
  • Debugging
  • Hosting
  • Monitoring
  • Easily “hackable” format
  • Replayability

And probably a whole lot more on top of that.

I like building the wire interface over some form of HTTP interface (I intentionally don’t use the overly laden REST word). We get a lot of things for free; in particular, we get the ability to easily pipe stuff through Fiddler so we can debug it. That also influences decisions such as how to design the wire format itself (hint: human readable is good).

I want to be able to easily generate requests and see them in the browser. I want to be able to read the Fiddler output and figure out what is going on.

We actually have several different requirements that we need to meet:

  • Enter data
  • Query data
  • Do operations

Entering data is probably the most interesting aspect. The reason is that we need to handle inserting many entries in as efficient a manner as possible. That means reducing the number of server round trips. At the same time, we want to keep other familiar aspects, such as the ability to easily know when the data is “safe” and the storage transaction has been committed to disk.

Querying data and handling one off operations is actually much easier. We can handle it via a simple REST interface:

  • /timeseries/query?start=2013-01-01&end=2014-01-01&id=sensor1.heat&id=sensor1.flow&aggregation=avg&period=weekly
  • /timeseries/delete?id=sensor1.heat&id=sensor1.flow&start=2013-01-01&end=2014-01-01
  • /timeseries/tag?id=sensor1.heat&tag=temp:C

Nothing much to it, so far. We expose it as JSON endpoints, and that is… it.
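Those query URLs are easy to build up programmatically as well. Here is a minimal sketch in Python; the `build_query_url` helper, the base URL, and the parameter ordering are my own, but the endpoint and parameter names come straight from the examples above:

```python
from urllib.parse import urlencode

def build_query_url(base, start, end, ids, aggregation=None, period=None):
    """Build a /timeseries/query URL like the ones in the post.
    Repeating the id parameter selects multiple series at once."""
    params = [("start", start), ("end", end)]
    params += [("id", series_id) for series_id in ids]
    if aggregation:
        params.append(("aggregation", aggregation))
    if period:
        params.append(("period", period))
    return base + "/timeseries/query?" + urlencode(params)

url = build_query_url("http://localhost:8080", "2013-01-01", "2014-01-01",
                      ["sensor1.heat", "sensor1.flow"],
                      aggregation="avg", period="weekly")
# url is now:
# http://localhost:8080/timeseries/query?start=2013-01-01&end=2014-01-01&id=sensor1.heat&id=sensor1.flow&aggregation=avg&period=weekly
```

Because the whole request is in the URI, it can be pasted into a browser or read directly off a Fiddler trace, which is exactly the point.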

For entering data, we need something a bit more elaborate. I chose to use websockets for this, mostly because I want two way communication, and that pretty much forces my hand. The reason that I want a two way communication mechanism is that I want to enable the following story:

  • The client sends the data to the server as fast as it is able to.
  • The server will aggregate the data and decide, based on its own considerations, when to actually commit the data to disk.
  • The client needs some way of knowing when the data it just sent has actually been flushed, so it can rely on it.
  • The connection should stay open (potentially for a long period of time) so we can keep sending new stuff in without having to create a new connection.

As far as the server is concerned, by the way, it will decide to flush the pending changes to disk whenever:

  • over 50 ms have passed waiting for the next value to arrive, or
  • it has been at least 500 ms since the last flush, or
  • there are over 16,384 items pending, or
  • the client indicated that it wants to close the connection, or
  • the moon is full on a Tuesday
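The flush decision above (minus the lunar calendar) boils down to a simple predicate. The function shape and parameter names here are my own; the thresholds are the ones from the list:

```python
def should_flush(ms_since_last_value, ms_since_last_flush,
                 pending_items, client_closing):
    """Decide whether the server should flush pending changes to disk.
    Thresholds taken from the flush conditions listed above."""
    return (ms_since_last_value > 50       # waited too long for the next value
            or ms_since_last_flush >= 500  # too long since the last flush
            or pending_items > 16_384      # too many items buffered
            or client_closing)             # client is closing the connection
```

The nice property of a policy like this is that a fast client gets big batched flushes, while a slow or closing client never waits more than about half a second for durability.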

However, how does the server communicate that to the client?

The actual format we’ll use is something like this (binary, little endian format):

   enter-data-stream =
      data-packet+

   data-packet =
      series-name-len [4 bytes]
      series-name [series-name-len, utf8]
      entries
      no-entries

   entries =
      count [2 bytes]
      entry[count]

   no-entries =
      count [2 bytes] = 0

   entry =
      date [8 bytes]
      value [8 bytes]
The basic idea is that this format should be pretty good in conveying just enough information to send the data we need, and at the same time be easy to parse and work with. I am not sure if this is a really good idea, because this might be premature optimization and we can just send the data as JSON strings. That would certainly be more human readable. I’ll probably go with that and leave the “optimized binary approach” for when/if we actually see a real need for it. The general idea is that computer time is much cheaper than human time, so it is worth the time to make things human readable.

The server keeps track of the number of entries that were sent to it on each connection, and whenever it flushes the buffered data to disk, it will send a notification to that effect to the client. The client can then decide, “okay, I can notify my caller that everything I have sent has been properly saved to disk.” Or it can just ignore it (the usual case for long running connections).
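On the client side, that story is just two counters. This sketch assumes the flush notification carries the total number of entries flushed so far on the connection; the post only says the server notifies on flush, so the exact message shape is my invention:

```python
class InsertConnection:
    """Client-side bookkeeping for flush notifications on one
    long-lived insert connection."""

    def __init__(self):
        self.sent = 0     # entries we have written to the socket
        self.flushed = 0  # entries the server has confirmed on disk

    def on_sent(self, count=1):
        self.sent += count

    def on_flush_notification(self, flushed_total):
        # Assumed message shape: a running total of flushed entries.
        self.flushed = flushed_total

    def all_durable(self):
        """True once everything we sent has been committed to disk."""
        return self.flushed >= self.sent
```

A caller that needs durability waits for `all_durable()` to become true after sending; a fire-and-forget caller simply never checks it.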

And that is about it as far as the wire protocol is concerned. Next, we will take a look at some of the operations that we need.

Comments

02/13/2014 10:30 AM by Rafal

If you want to avoid creating a connection, why are you creating it? I mean, is connection necessary when you only want to send 16 bytes of information?

02/13/2014 10:48 AM by Ayende Rahien

Rafal, A large part of the time is spent in just establishing the connection. In TCP, you have to do the 3 way dance, in HTTP, you have to do auth from scratch, etc. If you keep the stream open, you can bypass a lot of those costs.

02/13/2014 11:00 AM by Rémi BOURGAREL

I read everywhere that connections should be opened as late as possible and closed ASAP. How will you manage when you have 500 clients connected to your server? And is "human readable" really an advantage here? Ticks are not human readable anyway.

02/13/2014 11:11 AM by Ayende Rahien

Remi, Human readable is a huge advantage in things like debugging, devops, troubleshooting, etc. The actual on-disk format is rarely of interest, but having the wire format be human readable is essential.

And 500 clients what? In this case, we are actually talking about persistent connections per server. It is unlikely that you'll have direct connections from 500 machines. You usually go through a web server for that. And even if you do, so what? It isn't hard to hold that many connections.

02/13/2014 11:24 AM by Rafal

My hint was to use a connectionless protocol instead of TCP and its handshake overhead.

02/13/2014 11:28 AM by Ayende Rahien

Rafal, Connectionless protocol. You mean UDP? Or Raw sockets?

In both cases, you need to have some way of getting replies back, to ensure that the server got your values. It also means that you need to handle retries, etc.

It is also far harder to support & debug, and far more likely to have issues with firewalls, resulting in even more support & debugging work.

02/13/2014 11:45 AM by Jahmai Lay

What about XMPP?

02/13/2014 12:24 PM by Rafal

UDP

02/13/2014 12:40 PM by Scooletz

@Ayende, @Rafal, not to mention network congestion.

02/13/2014 12:52 PM by Ayende Rahien

Jahmai, That is over TCP, so there isn't really any additional benefit here that I can see.

02/13/2014 05:02 PM by Tyler

I understand the ease of debugging via the wire with a human readable format, intercepting the message as it moves across the HTTP wire. I have two problems with this approach. First, you sacrifice performance by serializing and deserializing binary data to readable text at client and then server, respectively. Sometimes this is a huge cost. Second, my personal opinion is that debugging on the wire is the wrong place to debug. You're not in the code on the client or the server, so you really cannot diagnose the problem. The only thing you can do is determine if your serialization is working and whether your message is correct. And I would argue that this is better done on the client or the server, rather than in the middle. Just my two cents.

02/13/2014 05:53 PM by Patrick

@Ayende

Just to make things interesting... what would be the conditions where you want to use raw TCP to expose a server capability?

02/13/2014 07:31 PM by Kijana Woodard

A thought.

What if the wire format is binary, and a fiddler add-on is used to decode it to a human readable format.

02/14/2014 02:20 PM by Ayende Rahien

Tyler, Sure, there is some cost associated with debuggability. But being able to observe requests & response tells you a LOT about the behavior of the system. It is also very easily done.

You can always add an optimized binary protocol later, but starting with text is the way to go.

For that matter, this web page was served using a human readable protocol.

02/14/2014 02:21 PM by Ayende Rahien

Patrick, Proven performance optimization would do that. If it was significant enough.

Kijana, A Fiddler add-on would do it, but there would have to be a good reason. No need to tack on something else to support when we have something that already works.

02/14/2014 03:08 PM by Karhgath

@Tyler, the wire is absolutely one of the most important places to debug; it's an integration point, and thus will need the bulk of the debugging facilities there.

Integration testing often costs up to 80% of all testing/debugging time over the full lifecycle of a product, and for good reason.

In contrast, UT and UAT are often fire and forget: you invest a lot of initial time in them, but after that they're pretty much done (all green/user approval).

02/14/2014 04:19 PM by Tyler

Ayende and Karhgath, I guess we'll have to agree to disagree on this one. For me, it seems that observing requests and responses is more properly done at one end or the other with logging. Debugging, at least for me, connotes being able to observe execution flow, including the use of break points and watching specific values. The value of observing data across the wire in human readable format, when I have proper logging enabled and switchable, seems redundant and of little value, while constraining your choices as to wire format. And with respect to tacking on an optimized binary protocol later, it has been my experience that such intentions are rarely ever realized.

02/15/2014 07:20 AM by ohadr

You should go with ZeroMQ. 1. Performance is crazy. 2. Separating the data stream from the acks stream is a piece of cake; you just need to create more sockets. 3. Almost all buffering is handled for you (e.g., if you write 1M messages from the client, they are sent as batches on the wire). 4. The support for message parts allows you to create a binary format which is very fast and very simple to work with for data insertion. 5. For the query interface, you could use JSON over the sockets. 6. It will be easy to scale to multiple machines. 7. It has support for UDP connections.

02/15/2014 10:11 AM by Ayende Rahien

Ohadr, Does it have a Fiddler like interface, so I can see the message exchanges?

02/15/2014 11:41 AM by ohadr

Ayende,

For sending messages, I simply use Python. For diagnostics, there is currently no simple solution, although a Wireshark plugin does come to mind.

Anyway, since you are going to be using a binary format (for the data, at least), Fiddler will not suit the use case either...

Also, they added auth & encryption in the latest version.

02/16/2014 05:22 AM by Ayende Rahien

Ohadr, You'd be surprised just how much you can figure out from the request URI alone when you are debugging. If there is no simple solution, then there really isn't an option for me here. I consider this to be pretty much a top priority requirement.
