Putting JSON in a blockchain? First decide what your JSON is…

RavenDB is a document database, and as such, it stores data in JSON format. We have had a few users who wanted to use RavenDB as the backend of various blockchains. I’m not going to touch on their reasoning. I think that a blockchain is a beautiful construct, but one that is still searching for the right niche.

The reason for this post, however, is one of the key problems you have to deal with when building a blockchain: how to compute the signature of a JSON document. You need that in order to build a Merkle tree, which is at the root of all blockchains.
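The Merkle tree itself is simple to sketch. The Python version below is a minimal illustration, not any particular blockchain’s scheme. It assumes each leaf is the already-serialized bytes of one document, and it follows one common convention of duplicating the last node when a level has an odd number of entries. `merkle_root` is a hypothetical name. Notice that the root depends on the exact bytes of every leaf, which is why the form of the JSON matters so much.

```python
import hashlib
from typing import Sequence


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves: Sequence[bytes]) -> bytes:
    """Compute a Merkle root over serialized documents (simplified sketch)."""
    if not leaves:
        raise ValueError("cannot build a Merkle tree with no leaves")
    level = [_h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # assumption: pair an odd last node with itself
        # Hash each adjacent pair to produce the next level up.
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


# Changing a single whitespace byte in one document changes the root.
docs = [b'{"Id":"items/1"}', b'{"Id":"items/2"}', b'{"Id":"items/3"}']
print(merkle_root(docs).hex())
```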

There are standards such as JWS (part of the JOSE family of specifications) to handle that, of course, and rolling your own signature scheme is not advisable. However, I want to talk about a potentially important aspect of signing JSON: there isn’t really a proper canonical form of JSON. For example, consider the following documents:

All of those documents are semantically identical. Admittedly, you could argue about the one using multiple Rating properties, but in general, they represent the same data. At the byte level, however, they are very far from identical.
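The Python sketch below makes the byte-level difference concrete. It uses hypothetical documents, not the ones from this post. Every variant parses to the same object, yet hashing the raw text produces a different SHA-256 digest each time.

```python
import hashlib
import json

# Hypothetical variants of the same object, each encoded differently.
variants = [
    '{"Name":"Coffee","Rating":4.5}',
    '{ "Name": "Coffee", "Rating": 4.5 }',        # extra whitespace
    '{"Rating":4.5,"Name":"Coffee"}',             # different property order
    '{"Name":"\\u0043offee","Rating":4.5}',       # escaped character
    '{"Name":"Coffee","Rating":45e-1}',           # different number notation
    '{"Name":"Coffee","Rating":1,"Rating":4.5}',  # duplicate key
]

parsed = [json.loads(v) for v in variants]
digests = [hashlib.sha256(v.encode("utf-8")).hexdigest() for v in variants]

# Python's json module keeps the last value for a duplicate key,
# so every variant parses to an equal dict...
print(all(p == parsed[0] for p in parsed))  # True
# ...yet the raw bytes of every variant hash differently.
print(len(set(digests)))  # 6
```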

A proper way to sign such messages would require us to:

  • Minify the output to remove any extra whitespace.
  • Raise an error on multiple properties with the same key. That isn’t strictly required, but it makes everything easier.
  • Output properties in sorted key order.
  • Normalize the string encoding to a single format.
  • Normalize numeric encoding (for example, decide whether you support only double-precision floats or arbitrarily sized numbers).
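Those steps can be sketched in Python on top of the standard `json` module. This is a minimal illustration, not a standard. `canonicalize` and `DuplicateKeyError` are hypothetical names, and the sketch assumes that all numbers are doubles, that keys sort by Unicode code point, and that strings are emitted as UTF-8, escaping only quotes, backslashes and control characters.

```python
import json


class DuplicateKeyError(ValueError):
    pass


def _reject_duplicates(pairs):
    # object_pairs_hook sees every key/value pair, including repeated keys.
    obj = {}
    for key, value in pairs:
        if key in obj:
            raise DuplicateKeyError(f"duplicate property: {key!r}")
        obj[key] = value
    return obj


def _reject_constant(name):
    # NaN and Infinity are not valid JSON, but Python accepts them by default.
    raise ValueError(f"non-standard number: {name}")


def canonicalize(text: str) -> bytes:
    """Return a single canonical byte form for a JSON document (sketch)."""
    value = json.loads(
        text,
        object_pairs_hook=_reject_duplicates,  # error on repeated keys
        parse_int=float,                       # assumption: every number is a double
        parse_constant=_reject_constant,
    )
    out = json.dumps(
        value,
        sort_keys=True,         # sorted property order (by code point)
        separators=(",", ":"),  # minified: no insignificant whitespace
        ensure_ascii=False,     # emit characters directly rather than \uXXXX
        allow_nan=False,
    )
    return out.encode("utf-8")


print(canonicalize('{ "Rating": 45e-1, "Name": "\\u0043offee" }'))
# b'{"Name":"Coffee","Rating":4.5}'
```

Treating every number as a double means integers above 2^53 lose precision. That is exactly the kind of decision the last bullet asks you to make explicitly. If you need interoperability with other platforms, RFC 8785 (the JSON Canonicalization Scheme) standardizes a different set of choices. For example, it sorts keys by UTF-16 code units and serializes numbers the way ECMAScript does, so its output will not match this sketch.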

Only then can you perform the actual signature on the resulting bytes. That also means that you can’t just pipe the data to sha256() and call it a day.

The alternative is to ignore all of that and decide that the only thing we care about is the raw bytes of the JSON document. In other words, we validate the data as raw binary, without caring about semantic equivalence. In this case, each of the documents above will produce a different signature.
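Here is a minimal sketch of the raw-bytes approach in Python, with hypothetical helper names `seal` and `verify`. Whoever stores or forwards the document must keep the original bytes, because any round trip through a parser breaks verification even though the data itself is unchanged.

```python
import hashlib
import json


def seal(raw: bytes) -> dict:
    """Hash the document exactly as received and keep those bytes with the hash."""
    return {"raw": raw, "sha256": hashlib.sha256(raw).hexdigest()}


def verify(record: dict) -> bool:
    """The hash vouches only for these exact bytes, whatever they mean."""
    return hashlib.sha256(record["raw"]).hexdigest() == record["sha256"]


original = b'{ "Name": "Coffee", "Rating": 4.5 }'
record = seal(original)
print(verify(record))  # True

# Parsing and re-serializing yields equivalent JSON but different bytes,
# so the stored hash no longer matches.
reserialized = json.dumps(json.loads(original)).encode("utf-8")
print(verify({"raw": reserialized, "sha256": record["sha256"]}))  # False
```

This approach is trivial to implement and needs no agreement on a canonical form. The price is that every component in the pipeline has to treat the document as an opaque blob.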

Here is a simple example of cleaning up a JSON object to return a stable hash:
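One possible approach, sketched in Python, avoids depending on any serializer’s text output. It walks the parsed document and feeds a type-tagged, length-prefixed encoding of every value into SHA-256, visiting object keys in sorted order. The helper names (`stable_hash`, `_feed`, `_no_duplicates`), the one-byte tags, and the decision to treat every number as an IEEE 754 double are all assumptions made for this sketch.

```python
import hashlib
import json
import math
import struct


def _no_duplicates(pairs):
    # Reject repeated property names instead of silently keeping the last one.
    keys = [key for key, _ in pairs]
    if len(keys) != len(set(keys)):
        raise ValueError("duplicate property name in JSON object")
    return dict(pairs)


def _feed(h, value) -> None:
    """Feed a type-tagged, length-prefixed encoding of a parsed JSON value into h."""
    if value is None:
        h.update(b"n")
    elif value is True:  # check bools before numbers: bool is a subclass of int
        h.update(b"t")
    elif value is False:
        h.update(b"f")
    elif isinstance(value, (int, float)):
        number = float(value)  # assumption: all numbers are IEEE 754 doubles
        if not math.isfinite(number):
            raise ValueError("NaN and Infinity are not valid JSON numbers")
        if number == 0:
            number = 0.0  # treat -0 and 0 as the same number
        h.update(b"d" + struct.pack(">d", number))
    elif isinstance(value, str):
        data = value.encode("utf-8")
        h.update(b"s" + struct.pack(">Q", len(data)) + data)
    elif isinstance(value, list):
        h.update(b"a" + struct.pack(">Q", len(value)))
        for item in value:
            _feed(h, item)
    elif isinstance(value, dict):
        h.update(b"o" + struct.pack(">Q", len(value)))
        for key in sorted(value):  # property order no longer matters
            _feed(h, key)
            _feed(h, value[key])
    else:
        raise TypeError(f"not a JSON value: {type(value).__name__}")


def stable_hash(text: str) -> str:
    """SHA-256 over the structure of a JSON document rather than its bytes."""
    h = hashlib.sha256()
    _feed(h, json.loads(text, object_pairs_hook=_no_duplicates))
    return h.hexdigest()


a = stable_hash('{"Name":"Coffee","Rating":4.5}')
b = stable_hash('{ "Rating": 45e-1, "Name": "\\u0043offee" }')
print(a == b)  # True
```

The type tags keep the string "1" distinct from the number 1. The length prefixes keep `["ab","c"]` distinct from `["a","bc"]`, so two different documents cannot feed identical byte streams into the hash.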

That answers the criteria above and is pretty simple to run and work with, including from other platforms and environments.