API DesignWe’ll let the users sort it out

time to read 3 min | 452 words

In my previous post, I explained an API design that give the user the option to perform an immediate operation, use the default fire and forget or use an explicit bulk mechanism. The idea is that most operations are small, and that the cost of actually going over the network is going to dominate the cost of the entire operation. In this case, we want to give the user the option of selecting the “I want to know the operation completed” or “I just want to try the best, I’m fine if there is a failure” modes.

Eli asks:

Trying to understand why this isn't just a scripting case. In your example you loop over CSV and propose an increment call per part which gets batched up and committed outside the user's control. Why not define a JavaScript on the server that takes an array ("a batch") of parts sent over the wire and iterates over it on the server instead? This way the user gets fine grained control over the batch. I'm guessing the answer has something to do with your distributed counter architecture...

If I understand Eli properly, the idea is that we’ll just expose an API like this:

Increment(“users/1/visits”, 1);

And provide an endpoint where a user can POST a JavaScript code that will call this. The idea is that the user will be able to decide whatever to call this once, or send a whole batch of updates in one go.

This is certainly an option, but in my considered opinion, it is a pretty bad one. It has nothing to do with the distributed architecture, it has to do with the burden we put on the user. The actual semantics of “go all the way to the server and confirm the operation” vs “let us do a bulk insert kind of thing” are pretty easy. Each of them has a pretty well defined behavior.

But what happens when you want to do an operation per page view? From the point of view of your code, you are making a single operation (incrementing the counters for a particular entity). From the point of view of the system as a whole, you are generating a whole lot of individual requests that would be much better off as a single bulk request.

Having a scripting endpoint gives the user the option of doing that, sure, but then they need to handle:

  • Error recovery
  • Multi threading
  • Flushing on batch size / time
  • Failover

And many more that I’m probably forgetting. By providing the users with the option of making an informed choice about speed vs. safety, we avoid putting the onus of the actual implementation on them.

More posts in "API Design" series:

  1. (04 Dec 2017) The lack of a method was intentional forethought
  2. (27 Jul 2016) robust error handling and recovery
  3. (20 Jul 2015) We’ll let the users sort it out
  4. (17 Jul 2015) Small modifications over a network
  5. (01 Jun 2012) Sharding Status for failure scenarios–Solving at the right granularity
  6. (31 May 2012) Sharding Status for failure scenarios–explicit failure management doesn’t work
  7. (30 May 2012) Sharding Status for failure scenarios–explicit failure management
  8. (29 May 2012) Sharding Status for failure scenarios–ignore and move on
  9. (28 May 2012) Sharding Status for failure scenarios