In my previous post, I explained an API design that give the user the option to perform an immediate operation, use the default fire and forget or use an explicit bulk mechanism. The idea is that most operations are small, and that the cost of actually going over the network is going to dominate the cost of the entire operation. In this case, we want to give the user the option of selecting the “I want to know the operation completed” or “I just want to try the best, I’m fine if there is a failure” modes.
If I understand Eli properly, the idea is that we’ll just expose an API like this:
This is certainly an option, but in my considered opinion, it is a pretty bad one. It has nothing to do with the distributed architecture, it has to do with the burden we put on the user. The actual semantics of “go all the way to the server and confirm the operation” vs “let us do a bulk insert kind of thing” are pretty easy. Each of them has a pretty well defined behavior.
But what happens when you want to do an operation per page view? From the point of view of your code, you are making a single operation (incrementing the counters for a particular entity). From the point of view of the system as a whole, you are generating a whole lot of individual requests that would be much better off as a single bulk request.
Having a scripting endpoint gives the user the option of doing that, sure, but then they need to handle:
- Error recovery
- Multi threading
- Flushing on batch size / time
And many more that I’m probably forgetting. By providing the users with the option of making an informed choice about speed vs. safety, we avoid putting the onus of the actual implementation on them.
More posts in "API Design" series:
- (04 Dec 2017) The lack of a method was intentional forethought
- (27 Jul 2016) robust error handling and recovery
- (20 Jul 2015) We’ll let the users sort it out
- (17 Jul 2015) Small modifications over a network
- (01 Jun 2012) Sharding Status for failure scenarios–Solving at the right granularity
- (31 May 2012) Sharding Status for failure scenarios–explicit failure management doesn’t work
- (30 May 2012) Sharding Status for failure scenarios–explicit failure management
- (29 May 2012) Sharding Status for failure scenarios–ignore and move on
- (28 May 2012) Sharding Status for failure scenarios