re: Why You Should Never Use MongoDB


I was pointed at this blog post, and I thought that I would comment, from a RavenDB perspective.

TL;DR summary:

If you don’t know how to tie your shoes, don’t run.

The actual details in the post are fascinating; I’d never heard of this Diaspora project. But to be perfectly honest, the problems that they ran into have nothing to do with MongoDB or its features. They have a lot to do with a fundamental lack of understanding of how to model using a document database.

In particular, I actually winced in sympathetic pain when the author explained how they modeled a TV show.

They store the entire thing as a single document:

Hell, the author even talks about General Hospital, a show that has 50+ seasons and 12,000 episodes, in the blog post. And at no point did they stop to think that this might not be such a good idea?

A much better model would be something like this:

  • Actors
    • [episode ids]
  • TV shows
    • [season ids]
  • Seasons
    • [review ids]
    • [episode ids]
  • Episodes
    • [actor ids]
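
To make the outline above a bit more concrete, here is a minimal sketch in Python. The class names, the field names, the id format and the sample data are all assumptions made up for illustration; the plain dictionaries stand in for whatever collections your database gives you. The point is only that each aggregate is its own small document that holds ids, not copies, of the others.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Actor:
    id: str
    name: str
    episode_ids: List[str] = field(default_factory=list)


@dataclass
class TvShow:
    id: str
    title: str
    season_ids: List[str] = field(default_factory=list)


@dataclass
class Season:
    id: str
    number: int
    review_ids: List[str] = field(default_factory=list)
    episode_ids: List[str] = field(default_factory=list)


@dataclass
class Episode:
    id: str
    title: str
    actor_ids: List[str] = field(default_factory=list)


# Hypothetical sample data; each dict stands in for one collection.
episodes: Dict[str, Episode] = {
    "episodes/1": Episode("episodes/1", "Pilot", ["actors/1", "actors/2"]),
    "episodes/2": Episode("episodes/2", "Second", ["actors/1"]),
}
actors: Dict[str, Actor] = {
    "actors/1": Actor("actors/1", "Alice Example", ["episodes/1", "episodes/2"]),
    "actors/2": Actor("actors/2", "Bob Example", ["episodes/1"]),
}


def episodes_by_actor(actor_id: str) -> List[str]:
    """Follow the ids stored on the actor document to the episode titles."""
    actor = actors[actor_id]
    return [episodes[episode_id].title for episode_id in actor.episode_ids]
```

With ids in place, "all episodes by actor" is just a lookup on the actor document followed by loading the referenced episodes; no giant show document is involved.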

Now, I am not settled in my own mind whether it would be better to have a single document per season, containing all episodes, reviews, etc., or to have separate episode documents.

What I do know is that having a single document that large is something that I want to avoid. And yes, this is a somewhat relational model. That is because what you are looking at is a list of independent aggregates that have different reasons to change.

Later on in the post the author talks about the problem when they wanted to show “all episodes by actor”, and they had to move to a relational database to do that. But that is like saying that if you stored all your actors’ information as plain names (without any ids), you would have a hard time handling them in a relational database.

Well, duh!

Now, as for the Diaspora project issues. They appear to have been doing some really silly things there. In particular, they store all the information multiple times:

You can see for example all the user information being duplicated all over the place.  Now, this is a distributed social network, but that doesn’t call for it to be stupid about it.  They should have used references instead of copying the data.

Now, to be fair, there are very good reasons why you’ll want to duplicate the data, such as when you want a point-in-time view of it. For example, if I commented on a post, I probably want my name in that post to remain frozen. It would certainly make things much easier all around. But if changing my email now requires that we run some sort of a huge update operation on the entire database… well, it ain’t the db’s fault. You are doing it wrong.
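
A small sketch of that distinction, in Python, under made-up assumptions: the `users` dictionary, the field names and the sample user are hypothetical, and the dictionary stands in for a users collection. The comment keeps a frozen copy of the name (the point-in-time part) next to a reference to the user, while the email lives only on the user document.

```python
# Hypothetical in-memory stand-in for a users collection.
users = {
    "users/1": {"name": "Alice Example", "email": "alice@old.example"},
}


def add_comment(post, user_id, text):
    """Store a reference to the author plus a snapshot of the display name."""
    user = users[user_id]
    post["comments"].append({
        "author_id": user_id,           # reference: always resolves to current data
        "author_name": user["name"],    # snapshot: frozen at the time of writing
        "text": text,
    })


def change_email(user_id, new_email):
    """Touches exactly one document, because nothing else copied the email."""
    users[user_id]["email"] = new_email


def rename(user_id, new_name):
    """Existing comments keep the old name; new comments pick up the new one."""
    users[user_id]["name"] = new_name


post = {"id": "posts/1", "comments": []}
add_comment(post, "users/1", "Nice post")
change_email("users/1", "alice@new.example")
rename("users/1", "Alice Renamed")
add_comment(post, "users/1", "Still nice")
```

The duplication here is a deliberate choice about one field, not a side effect of the storage model.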

Now, when you store references to other documents, you have a few options. If you are using RavenDB, you have Include support, so you can get the associated documents easily enough. If you are using MongoDB, you have an additional step in that you have to issue a second query using $in over the ids, but that is about it.
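
Roughly, the extra step looks like the sketch below. It is written against plain Python dictionaries so it runs on its own; `find_by_ids` is a hypothetical helper standing in for the single batched query, and the document shapes are assumptions. Against MongoDB the equivalent would be one filter of the form `{"_id": {"$in": ids}}`.

```python
# Hypothetical in-memory stand-ins for two collections.
users = {
    "users/1": {"name": "Alice Example"},
    "users/2": {"name": "Bob Example"},
    "users/3": {"name": "Carol Example"},
}
posts = {
    "posts/1": {
        "author_id": "users/1",
        "text": "Hello",
        "comments": [
            {"author_id": "users/2", "text": "Hi"},
            {"author_id": "users/1", "text": "Thanks"},
        ],
    },
}


def find_by_ids(collection, ids):
    """Stand-in for one batched lookup, e.g. a {"_id": {"$in": ids}} filter."""
    wanted = set(ids)
    return {doc_id: doc for doc_id, doc in collection.items() if doc_id in wanted}


def load_post_with_authors(post_id):
    """Load the post, collect every referenced user id, resolve them in one go."""
    post = posts[post_id]
    author_ids = {post["author_id"]}
    author_ids.update(comment["author_id"] for comment in post["comments"])
    return post, find_by_ids(users, author_ids)


post, authors = load_post_with_authors("posts/1")
```

That is two round trips instead of one, regardless of how many comments the post has.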

I am sorry, but this is the dancer blaming the floor.
