RavenDB 4.0 Unsung Heroes: Indexing related data


RavenDB is a non-relational database, which means that you typically don’t model documents as having strong relations. A core design principle for modeling documents is that they should be independent, isolated and coherent, or more specifically:

  • Independent – meaning a document should have its own separate existence from any other documents.
  • Isolated – meaning a document can change independently from other documents.
  • Coherent – meaning a document should be legible on its own, without referencing other documents.

That said, even when following proper modeling procedures, there are still cases when you want to search for documents by their relationships. For example, you might want to search for all the employees whose manager’s name is John, and for some reason you don’t care whether this is John Doe or John Smith.

RavenDB allows you to handle this scenario by using LoadDocument during the index phase. That creates a relationship between the two documents and ensures that whenever the referenced document is updated, the referencing documents will be re-indexed to catch up to the new details. It is quite an elegant feature, even if I say so myself, and I’m really proud of it.
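To make the mechanism concrete, here is a minimal in-memory sketch. It is not RavenDB’s API: `ToyIndex`, `load_document`, `on_updated` and the sample documents are all hypothetical. It only illustrates the idea described above: loading another document while mapping records a reference, and an update to the referenced document triggers re-mapping of every document that loaded it.

```python
from collections import defaultdict

# Hypothetical toy data: employees reference their manager by document id.
docs = {
    "employees/1": {"Name": "Alice Smith", "Manager": "employees/3"},
    "employees/2": {"Name": "Bob Brown", "Manager": "employees/4"},
    "employees/3": {"Name": "John Doe", "Manager": None},
    "employees/4": {"Name": "Jane Roe", "Manager": None},
}


class ToyIndex:
    """Indexes employees by their manager's first name, recording LoadDocument-style references."""

    def __init__(self, store):
        self.store = store
        self.entries = {}                      # employee id -> manager's first name
        self.references = defaultdict(set)     # source id -> ids it loaded
        self.referenced_by = defaultdict(set)  # loaded id -> source ids that loaded it

    def load_document(self, source_id, ref_id):
        # Stand-in for LoadDocument: fetch the referenced doc and record the relationship.
        self.references[source_id].add(ref_id)
        self.referenced_by[ref_id].add(source_id)
        return self.store.get(ref_id)

    def map(self, doc_id):
        # Forget the references from this document's previous mapping before re-mapping it.
        for old_ref in self.references.pop(doc_id, set()):
            self.referenced_by[old_ref].discard(doc_id)
        manager_id = self.store[doc_id]["Manager"]
        if manager_id is None:
            self.entries.pop(doc_id, None)
            return
        manager = self.load_document(doc_id, manager_id)
        self.entries[doc_id] = manager["Name"].split()[0] if manager else None

    def index_all(self):
        for doc_id in self.store:
            self.map(doc_id)

    def on_updated(self, doc_id):
        # Re-map the changed document and every document that loaded it.
        self.map(doc_id)
        for source_id in sorted(self.referenced_by[doc_id]):
            self.map(source_id)

    def query(self, first_name):
        return sorted(d for d, name in self.entries.items() if name == first_name)


index = ToyIndex(docs)
index.index_all()
print(index.query("John"))  # ['employees/1']
docs["employees/4"]["Name"] = "John Smith"
index.on_updated("employees/4")
print(index.query("John"))  # ['employees/1', 'employees/2']
```

Note that nobody re-saved `employees/2`; it shows up in the second query only because the index knew it had loaded `employees/4`.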

It is also the source of much abuse in the wild. If you don’t model properly, it is often easy to paper over the problem by using LoadDocument in the indexes.

The problem is that in RavenDB 3.x, an update to a document that was referenced using LoadDocument also required touching all of the referencing documents. This slowed down writes, which is something that we generally try really hard to avoid, and could also cause availability issues if there were enough referencing documents (as in, all of them, which happened more frequently than you might think).
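Here is a toy model of the 3.x write path described above, assuming a single, database-wide map from each referenced document to the documents that loaded it. `TouchingStore`, `put` and `referenced_by` are hypothetical names and this is not the actual 3.x code; the sketch only shows why the cost of a single write grows with the number of referencing documents.

```python
from collections import defaultdict


class TouchingStore:
    """Hypothetical toy store: updating a referenced document also touches
    (assigns a new etag to) every document that references it."""

    def __init__(self):
        self.docs = {}
        self.etags = {}
        self.last_etag = 0
        # Referenced id -> ids of documents that loaded it (global, across all indexes).
        self.referenced_by = defaultdict(set)

    def _touch(self, doc_id):
        self.last_etag += 1
        self.etags[doc_id] = self.last_etag

    def put(self, doc_id, body):
        """Write a document; return how many documents this single put had to write."""
        self.docs[doc_id] = body
        self._touch(doc_id)
        referencing = self.referenced_by.get(doc_id, set())
        for source_id in referencing:
            self._touch(source_id)  # paid for on the write path, inside the same put
        return 1 + len(referencing)


store = TouchingStore()
store.put("managers/1", {"Name": "Jane Roe"})
for i in range(10_000):
    store.put(f"employees/{i}", {"Manager": "managers/1"})
    store.referenced_by["managers/1"].add(f"employees/{i}")  # as an index's LoadDocument would record
print(store.put("managers/1", {"Name": "John Smith"}))  # 10001
```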

With RavenDB 4.0, we knew that we had to do better. We did this by completely changing how we handle LoadDocument tracking. Instead of having to re-index all the relevant values globally, we now track them on a per-index basis. In each index, we track the relevant references on a per-collection basis, and as part of indexing we check whether any of the documents that we have referenced have been updated. If we do have a document that has a lot of referencing documents, it will still take some time to re-index all of them, but that cost is now limited to just the index in question.
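A rough sketch of this per-index approach, with the same caveats: it is a hypothetical in-memory model, not RavenDB’s storage engine or API. `Store`, `ManagerNameIndex`, `run_batch` and the `Employees`/`Managers` collections are illustrative assumptions, and etags are modeled as a single increasing counter. A write touches only the one document; each index keeps its own reverse references plus the last etag it has processed per collection, and on its next batch re-maps only the documents that referenced something that changed.

```python
from collections import defaultdict


class Store:
    """Toy document store: every put is a single write that assigns a new, increasing etag."""

    def __init__(self):
        self.docs = {}  # id -> {"collection": ..., "body": ..., "etag": ...}
        self.last_etag = 0

    def put(self, doc_id, collection, body):
        # No referencing documents are touched here, so write cost stays constant.
        self.last_etag += 1
        self.docs[doc_id] = {"collection": collection, "body": body, "etag": self.last_etag}

    def changed_since(self, collection, etag):
        """(etag, id) pairs for documents in `collection` written after `etag`, oldest first."""
        return sorted((d["etag"], doc_id) for doc_id, d in self.docs.items()
                      if d["collection"] == collection and d["etag"] > etag)


class ManagerNameIndex:
    """Toy index over Employees that loads each employee's manager from Managers."""

    def __init__(self, store):
        self.store = store
        self.entries = {}                      # employee id -> manager's first name
        self.referenced_by = defaultdict(set)  # manager id -> employee ids, for THIS index only
        self.last_etag = defaultdict(int)      # collection -> last etag this index has processed

    def _map(self, emp_id):
        mgr_id = self.store.docs[emp_id]["body"]["Manager"]
        # LoadDocument-style reference; stale ones are not pruned in this sketch,
        # which only causes extra (harmless) re-maps.
        self.referenced_by[mgr_id].add(emp_id)
        mgr = self.store.docs.get(mgr_id)
        self.entries[emp_id] = mgr["body"]["Name"].split()[0] if mgr else None

    def run_batch(self):
        """One indexing pass; returns how many employees were (re-)mapped."""
        mapped = 0
        # 1. Referenced collection: re-map employees whose manager changed since our last pass.
        for etag, mgr_id in self.store.changed_since("Managers", self.last_etag["Managers"]):
            for emp_id in sorted(self.referenced_by[mgr_id]):
                self._map(emp_id)
                mapped += 1
            self.last_etag["Managers"] = etag
        # 2. Source collection: map new or modified employees.
        for etag, emp_id in self.store.changed_since("Employees", self.last_etag["Employees"]):
            self._map(emp_id)
            mapped += 1
            self.last_etag["Employees"] = etag
        return mapped

    def query(self, first_name):
        return sorted(e for e, name in self.entries.items() if name == first_name)


store = Store()
store.put("managers/1", "Managers", {"Name": "John Doe"})
store.put("managers/2", "Managers", {"Name": "Jane Roe"})
for i in range(1, 4):
    store.put(f"employees/{i}", "Employees", {"Manager": "managers/2"})

index = ManagerNameIndex(store)
index.run_batch()
print(index.query("John"))   # []
store.put("managers/2", "Managers", {"Name": "John Smith"})  # one write, nothing else touched
print(index.run_batch())     # 3
print(index.query("John"))   # ['employees/1', 'employees/2', 'employees/3']
```

In this model only indexes that actually load from `Managers` pay for the re-mapping, and `Store.put` does the same amount of work no matter how many employees point at a given manager.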

You can still create an index and slow it down in this manner, but the pay-to-play model is much nicer: there is no effect on the write speed for documents and no general impact on the whole database, which is pretty sweet. The only way you would ever notice this feature is if you ran into this problem in 3.x and worked to avoid it, which is no longer necessary (although the same modeling concerns still apply).

More posts in "RavenDB 4.0 Unsung Heroes" series:

  1. (30 Oct 2017) Automatic conflict resolution
  2. (05 Oct 2017) The design of the security error flow
  3. (03 Oct 2017) The indexing threads
  4. (02 Oct 2017) Indexing related data
  5. (29 Sep 2017) Map/reduce
  6. (22 Sep 2017) Field compression