Over the wire protocol design
I’ve had a couple of good days in which I was able to really sit down and code, and I got a lot of cool stuff done. One of the minor requirements we had was to send some data over the wire. That always leads to a lot of interesting decisions.
How to actually send it? One way? RPC? TCP? HTTP?
I like using HTTP for this sort of thing. There are a lot of reasons for that: it is easy to work with and debug, it is easy to secure, and it doesn’t require special permissions or run into firewall issues.
Mostly, though, it is because of one overriding priority: it is easy to manage in production, and there are good logs for it. Because that is my requirement, it also shapes the kind of HTTP requests that I’m making.
Here is one such example that I’m going to need to send:
public class RequestVoteRequest : BaseMessage
{
    public long Term { get; set; }
    public string CandidateId { get; set; }
    public long LastLogIndex { get; set; }
    public long LastLogTerm { get; set; }
    public bool TrialOnly { get; set; }
    public bool ForcedElection { get; set; }
}
A very simple way to handle that would be to POST the data as JSON, and you are done. That would work, but it has a pretty big issue: all requests would look the same over the wire. If we are looking at the logs, we’ll only be able to see that something happened, but we won’t be able to tell what happened.
So instead of that, I’m going with a better model, in which we will make requests like this one:
http://localhost:8080/raft/RequestVoteRequest?&Term=19&CandidateId=oren&LastLogIndex=2&LastLogTerm=8&TrialOnly=False&ForcedElection=False&From=eini
That means that if I’m looking at the HTTP logs, it will be obvious what I’m doing. We will also reply using HTTP status codes, so we should be able to figure out the interaction between servers just by looking at the logs, or better yet, by having Fiddler capture the traffic.
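To make the idea concrete, here is a minimal sketch of how a message could be turned into a URL like the one above. It is not the actual code: `MessageUrl` is a hypothetical helper, and the shape of `BaseMessage` (a single `From` property) is only an assumption based on the `From=eini` parameter in the sample URL.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Reflection;

// Assumed shape: the post never shows BaseMessage; From is inferred from the sample URL.
public abstract class BaseMessage
{
    public string From { get; set; }
}

public class RequestVoteRequest : BaseMessage
{
    public long Term { get; set; }
    public string CandidateId { get; set; }
    public long LastLogIndex { get; set; }
    public long LastLogTerm { get; set; }
    public bool TrialOnly { get; set; }
    public bool ForcedElection { get; set; }
}

// Hypothetical helper, not part of the original code.
public static class MessageUrl
{
    public static string For(string baseUrl, BaseMessage message)
    {
        Type type = message.GetType();
        var pairs = new List<string>();

        // Every public instance property becomes one name=value pair.
        foreach (PropertyInfo property in type.GetProperties(BindingFlags.Public | BindingFlags.Instance))
        {
            object value = property.GetValue(message, null);
            if (value == null)
                continue; // leave unset values out of the URL

            // Invariant culture keeps numbers and booleans stable across machines.
            string text = Convert.ToString(value, CultureInfo.InvariantCulture);
            pairs.Add(property.Name + "=" + Uri.EscapeDataString(text));
        }

        // The message type name is the last path segment, so it shows up in the HTTP logs.
        return baseUrl.TrimEnd('/') + "/" + type.Name + "?" + string.Join("&", pairs);
    }
}
```

Calling `MessageUrl.For("http://localhost:8080/raft", request)` with the values from the sample would put `RequestVoteRequest` in the path and each property in the query string. Reflection does not guarantee property order, so the parameters may not come out in exactly the order shown above.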
Comments
Why not just use a log aggregation tool like Kibana, which gives you greater querying capabilities over your logs? You can then post JSON and stick to your ideal wire format.
Steve, because that doesn't allow me to replay scenarios.
Of course, putting everything in the URL means anybody can read the entire message and HTTPS is useless...
You mentioned "easy to secure" so I'm assuming you've considered this. How would you secure such a service?
Chris, that isn't the way HTTPS works. See: http://stackoverflow.com/questions/323200/is-an-https-query-string-secure
You're right, as long as you're controlling everything.
I'm always worried about potential leakage: someone shares the URL, or a referrer gets sent (even though it's not supposed to be), or the logs get accessed. Many times people treat logs with much less care and security than "real data".
Chris, That is a _service_, it isn't a web app.
That will teach me to comment before the first cup of coffee ;-)
I worry about stuff showing up in logs, even with services. Perhaps I'm being too conservative here, but I've seen too many badly configured systems that end up inadvertently leaking data through logs or error recording.
Chris, there isn't any actual data showing up here. The data we send is configuration for the server, and it will allow us to reconstruct the sequence of operations, which can be very helpful. When capturing via Fiddler, we get full traces, which is very useful.
Hard to tell looking at the code, but if the call has side effects then using a GET over a POST just for the sake of logging sounds insane.
@Johnny B
As far as I know POST requests with query string and empty body are legal.
I'd be vaguely concerned with query string length; IIRC IIS has a query string length limit of 16,384 (although I believe this is configurable, and hell the limit may have been raised, my info may be out of date).
It also seems a bit short-sighted to consider that the only logs of interest would be logs pertaining to payload (i.e. the only error condition is a malformed or incorrect payload). Having stack traces / logs of the component processing the event on the receiving end in combination with message payload information would be pretty helpful, and once you have that implemented you are a very short step away from logging the payload itself.
Plus, using a wire format that's more widely accessible makes client domain model serialization to your payload format much more approachable and less error-prone (i.e. lots of stuff can take a POCO and spit out JSON, and you wind up with a more expressive format to boot.)
But who knows what you are doing with this, maybe that's all future-proofing claptrap that doesn't matter a fig.
Bryan
Bryan, the idea is that you are sending some core stuff through the query string; you aren't sending full data or anything like that. Note that in practice, the query string length limit is roughly 2KB, and we have seen it as small as 1KB in some systems.