The bare minimum a distributed system developer should know aboutnetworking
As a developer, it is easy to operate at the level of message passing between systems, utilizing sophisticated infrastructure to communicate between nodes and applications. Everything works, everything in its place and we can focus on the Big Picture.
That works great, because while everything is humming, you don’t need to know any of the crap that lies beneath the surface. Unfortunately, if you are developing distributed systems, you kinda of really need to know these things. As in, you can’t do proper job with them if you can’t.
We’ll start from the basics. Any networked service needs to listen on the network, which means that need to provide the service with the details on what to listen on.
Now, typically you’ll write something like http://my-awesome-service and leave it as that, not thinking about this any further, but let break this apart for a bit. When you hand the tcp listener an address such as this one, it doesn’t really know what it can do with this. At the TCP level, such a thing is meaningless.
At the TCP level, we deal with IP and ports. For simplicity’s sake, I’m going to ignore IPv4 vs. IPv6, because they don’t matter that much at this level (except that they do, but we’ll get to that later). This means that we need to have some way to translate “my-awesome-service” into an IP address. At this point, you are probably recalling about such a thing as DNS (Domain Name System), which is exactly the solution for that. And you would be right, almost.
The problem is that we aren’t dealing with a flat network view. In other words, we can’t assume that the IP address that “my-awesome-service” is mapped to is actually available on the end machine our server is running on. But how can that be? The whole point is that I can just point my client there and the server is there.
The following is a non exhaustive list (but certainly exhausting) of… things that are in the way.
- NAT (Network Address Tables)
- Load balancers
- Monitoring tools
- Security filtering
In the simplest scenario, imagine that you have pointed my-awesome-service to IP address 220.127.116.11. However, instead of your service being there, there is a proxy that will forward any connections on port 80 to an internal address at 10.0.12.11 at port 8080. That server is actually doing on traffic analysis, metering and billing for the cloud provider you are using, after which it will pass on the connection to the actual machine you are using, at 10.0.15.23 on port 8888. You might think that the journey is over, but you are actually forgot that you are running your service as a container. So the host machine needs to forward that to the guest container on 127.0.0.3 on port 38880. And believe it or not, this is probably skipping half a dozen steps that actually occur in production.
If you want to look at the network route, you can do that using “traceroute” of “tracert”. Here is what a portion of the output looks like from my home to ayende.com. Note that this is how long it takes me to get to the IP address that the DNS says is hosting ayende.com, not actually routing from that server to the actual code that runs this blog.
This stuff is complex, but usually, you don’t need to dive that deeply into this. At certain point, you are going to call to the network engineers and let them figure it out. We are focused on the developer aspect of understanding distributed systems.
Now, here is the question, to test if you are paying attention. What did your service bind to, to be able to listen to over the network?
If you just gave it “http://my-awesome-service” as the configuration value, there isn’t much it can do. It cannot bind to 18.104.22.168, since that URL does not exist on the container. So the very first thing that we need to understand as distributed systems developers is that “what do I bind to” can be very different from “what do I type to get to the service”.
This is the first problem, and it is usually a major hurdle to grok. Mostly because when we are developing internally, you’ll typically use either machine names or IP addresses and you’ll typically consider a flat network view, not the segmented one that is actually in place. Docker and containers actually make you think about some of that a lot more, but even so most people don’t consider this too much.
I’m actually skipping on a bunch of details here. For example, a server may want to listen to multiple IPs (internal & external), maybe with different behaviors on each. A server may actually have multiple network cards and want to listen to both (for more bandwidth). It is also quite common to have a dedicate operations network, so the server will listen to the public network to respond to general queries, but things like SNMP or management interface is only exposed to the (completely separated) ops network. And sometimes things are even weirder, with crazy topologies that look Escher paintings.
This post has gone on long enough, but I’ll have at least another post in this series.