The Guts n’ Glory of Database InternalsThe communication protocol
With all the focus on writing to disk and ensuring the high speed of the database, I almost missed a very important aspect of databases, how do we actually communicate with the outside world.
In many cases, there is no need for that. Many databases just run either embedded only (LevelDB, LMDB, etc) or just assume that this is someone else’s problem, pretty much. But as it turn out, if you aren’t an embedded database, there is quite a bit that you need to get right when you are working on the network code. For the rest of this post, I’m going to assume that we are talking about a database that clients connect to through the network. It can be either locally (shared memory / pipes / UNIX sockets / TCP) or remotely (pretty much only TCP at this point).
The actual protocol format don’t really matter. I’m going to assume that there is some sort of a streaming connection (think TCP socket) and leave it at that. But the details of how the database communicate are quite important.
In particular, what kind of communication do we have between client and server. Is each request independent of one another, are there conversations, what kind of guarantees do we give for the lifetime of the connection, etc?
Relational databases use a session protocol, in which the client will have a conversation with the server, composed of multiple separate request / replies. Sometimes you can tell the server “just process the requests as fast as you can, and send the replies as well, but don’t stop for sending replies”, which helps, but this is probably the worst possible way to handle this. The problem is that you need a connection per client, and those conversation can be pretty long, so you hold up to the connection for a fairly long duration, which means that you have to wait for the next part of the conversation to know what to do next.
A better approach is to dictate that all operations are strictly one way, so once the server has finished processing a request, it can send the reply to the client and move on to process the next request (from the same client or the other one) immediately. In this case, we have several options, we can pipeline multiple requests on the same channel (allowing out of order replies for requests / replies), we can use the serial nature of TCP connections and have a pool of connections that we can share among clients.
An even better method is if you can combine multiple requests from the server to a single reply from the server. This can be useful if you know that the shape of all the commands are going to be same (for example, if you are asked in insert a million records, you can process them as soon as possible, and only confirm that you accepted every ten thousands or so). It may sound silly, but those kind of things really add up, because the client can just pour the data in and the server will process it as soon as it can.
Other considerations that you have to take into account in the communication protocol is security and authentication. If you are allowing remote clients, then they need some way to authenticate. And that can be expensive (typically authentication required multiple round trips between server & client to verify the credentials, and potentially and out of server check in something like LDAP directory), so you want to authenticate once and then keep on to the same connection for as long as you can.
A more problematic issues comes when talking about the content of your conversation with the database, you typically don’t want to send everything over plain text, so you need to encrypt the channel, which has its own costs. For fun, this means dealing with things like certificates, SSL and pretty complex stuff that very few properly understand. If you can get away with letting something else do that for you, see if you can. Setting up a secured connection that cannot be easily bypassed (man in the middle, fake certs, just weak encryption, etc) is none trivial, and should be left for experts in their field.
Personally, I like to just go with HTTPS for pretty much all of that, because it take a lot of the complexity away, and it means that all the right choices has probably already been made for me. HTTPS, of course, is a request/reply, but things like connection pooling, keep-alive and pipelining can do quite a lot to mitigate that in the protocol level. And something like Web Sockets allows to have most of the benefits of TCP connections, with a lot of the complexity handled for you.
More posts in "The Guts n’ Glory of Database Internals" series:
- (08 Aug 2016) Early lock release
- (05 Aug 2016) Merging transactions
- (03 Aug 2016) Log shipping and point in time recovery
- (02 Aug 2016) What goes inside the transaction journal
- (18 Jul 2016) What the disk can do for you
- (15 Jul 2016) The curse of old age…
- (14 Jul 2016) Backup, restore and the environment…
- (11 Jul 2016) The communication protocol
- (08 Jul 2016) The enemy of thy database is…
- (07 Jul 2016) Writing to a data file
- (06 Jul 2016) Getting durable, faster
- (01 Jul 2016) Durability in the real world
- (30 Jun 2016) Understanding durability with hard disks
- (29 Jun 2016) Managing concurrency
- (28 Jun 2016) Managing records
- (16 Jun 2016) Seeing the forest for the trees
- (14 Jun 2016) B+Tree
- (09 Jun 2016) The LSM option
- (08 Jun 2016) Searching information and file format
- (07 Jun 2016) Persisting information
I always liked SCTP because it is what everyone really wants. Nobody wants TCP semantics, they all want multiple streams. The streams are cheap, as opposed to TCP connections. Each message can be a stream. Almost always do people want to send messages. Nobody wants a single stream of bytes.
SCTP does not work because it is not supported by a lot of things. My idea always was to emulate SCTP on top of TCP. Is it maybe possible to do the same thing on top of HTTPS? Websockets manage to turn the connection into a bidirectional channel so it seems possible.
Tobi, Web Sockets are just TCP + some overhead. And the problem is that in order to do multiple streams properly (with timeouts, error handling, etc), you need to do quite a lot to support it.
HTTP/2 is actually much closer to that, I think.
Just some ideas :-)
You also need to consider the firewall. I have seen the firewall break the connection once every day and restart was needed.
Bandwidth/latency on the network: If everything is located on a Hypervisor (e.g. ESX) you have lower latency compared to several servers across the world.
Moreover considering a 3-tier architecture instead of just client/server: https://en.wikipedia.org/wiki/Multitier_architecture
Carsten, Something that happens once a day isn't really relevant for the protocol. It is just known issue that you have to deal with, like any network condition.
On the database side, you have to consider the protocol, and on the client side, you have to consider how to implement it. For example, client side connection pooling sometimes require server side support. That doesn't really impact the protocol as per the post, though
I forgot about HTTP 2! That seems nice as well. What do you think about that? Would that not map perfectly to the request/response nature of Ravens communication?
Tobi, HTTP/2 is very specific for web browsing. I don't think that it has good match here. Using a single TCP connection, you can multiplex multiple requests / replies on the same channel, quite efficiently (that is what HTTP 2 does, after all), and it just require that your protocol understand that you have some sort of request id that responses carry.