I was pointed at this blog post, and I thought that I would comment, from a RavenDB perspective.
If you don’t know how to tie your shoes, don’t run.
The actual details in the post are fascinating; I had never heard about this Diaspora project. But to be perfectly honest, the problems that they ran into have nothing to do with MongoDB or its features. They have a lot to do with a fundamental lack of understanding of how to model data using a document database.
In particular, I actually winced in sympathetic pain when the author explained how they modeled a TV show.
They store the entire thing as a single document:
Hell, the author even talks in the blog post about General Hospital, a show that has 50+ seasons and 12,000 episodes. And at no point did they stop to think that this might not be such a good idea?
A much better model would be something like this:
- Actors
  - [episode ids]
- TV shows
  - [season ids]
- Seasons
  - [review ids]
  - [episode ids]
- Episodes
  - [actor ids]
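To make that concrete, here is a rough sketch of how I read that layout, assuming separate documents for actors, TV shows, seasons and episodes, each holding only the ids of the documents it relates to. The class names, field names and sample values are all illustrative assumptions; they do not come from the original post or from any database API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Actor:
    id: str
    name: str
    episode_ids: List[str] = field(default_factory=list)


@dataclass
class TvShow:
    id: str
    title: str
    season_ids: List[str] = field(default_factory=list)


@dataclass
class Season:
    id: str
    number: int
    review_ids: List[str] = field(default_factory=list)
    episode_ids: List[str] = field(default_factory=list)


@dataclass
class Episode:
    id: str
    title: str
    actor_ids: List[str] = field(default_factory=list)


# Hypothetical sample data: the show document only points at its seasons,
# so it stays small no matter how many episodes exist.
show = TvShow(id="shows/1", title="General Hospital", season_ids=["seasons/1"])
season = Season(id="seasons/1", number=1, episode_ids=["episodes/1", "episodes/2"])
episode = Episode(id="episodes/1", title="Episode 1", actor_ids=["actors/1"])
```

Each of these documents can now change for its own reasons without rewriting the others.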
Now, I am not settled in my own mind whether it would be better to have a single document per season, containing all episodes, reviews, etc., or to have separate episode documents.
What I do know is that having a single document that large is something that I want to avoid. And yes, this is a somewhat relational model. That is because what you are looking at is a list of independent aggregates that have different reasons to change.
Later on in the post the author talks about the problem they hit when they wanted to show “all episodes by actor”, and how they had to move to a relational database to do that. But that is like saying that if you stored all your actor information as plain names (without any ids), you would have a hard time handling it in a relational database.
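Once episodes reference actors by id, “all episodes by actor” is just a filter over the episode documents. Here is a minimal in-memory sketch, assuming each episode stores a list of actor ids; the dicts and the `episodes_by_actor` helper are hypothetical stand-ins for a real collection and a real query.

```python
# Hypothetical episode documents, each referencing its actors by id.
episodes = [
    {"_id": "episodes/1", "title": "Pilot", "actor_ids": ["actors/1", "actors/2"]},
    {"_id": "episodes/2", "title": "Second", "actor_ids": ["actors/2"]},
    {"_id": "episodes/3", "title": "Third", "actor_ids": ["actors/3"]},
]


def episodes_by_actor(actor_id, episodes):
    """Return the episodes whose actor_ids list contains actor_id."""
    return [e for e in episodes if actor_id in e["actor_ids"]]


titles = [e["title"] for e in episodes_by_actor("actors/2", episodes)]
print(titles)
```

In a real document database this would be an indexed query on the `actor_ids` field rather than a scan in application code, but the shape of the question is the same.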
Now, as for the Diaspora project issues. They appear to have been doing some really silly things there. In particular, let us store all the information multiple times:
You can see for example all the user information being duplicated all over the place. Now, this is a distributed social network, but that doesn’t call for it to be stupid about it. They should have used references instead of copying the data.
Now, to be fair, there are very good reasons why you’ll want to duplicate data: when you want a point in time view of it. For example, if I commented on a post, I probably want my name in that post to remain frozen. It would certainly make things much easier all around. But if changing my email now requires that we run some sort of a huge update operation on the entire database… well, it ain’t the db’s fault. You are doing it wrong.
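A small sketch of that distinction, assuming a comment keeps a frozen copy of the author's display name but only a reference for everything else. The document shapes, the `add_comment` helper and the sample user are all made up for illustration.

```python
# Hypothetical user collection, keyed by document id.
users = {"users/1": {"name": "Alice", "email": "old@example.com"}}


def add_comment(post, user_id, text, users):
    """Append a comment that references the user and freezes the display name."""
    user = users[user_id]
    post["comments"].append({
        "user_id": user_id,           # reference: current details are looked up when needed
        "author_name": user["name"],  # point in time copy, intentionally frozen
        "text": text,
    })


post = {"_id": "posts/1", "comments": []}
add_comment(post, "users/1", "Nice post", users)

# Changing the email touches exactly one document, not every post.
users["users/1"]["email"] = "new@example.com"
```

The only data copied into the post is the data that is supposed to stay as it was when the comment was written.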
Now, when you store references to other documents, you have a few options. If you are using RavenDB, you have Include support, so you can get the associated documents easily enough. If you are using MongoDB, you have an additional step in that you have to call $in(the ids), but that is about it.
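The extra step looks roughly like this. It is an in-memory simulation, not driver code: `load_many` is a hypothetical helper standing in for a single batched lookup, and the documents are invented for the example.

```python
def load_many(collection, ids):
    """Fetch every document whose id is in ids, in one batched lookup."""
    wanted = set(ids)
    return {doc_id: doc for doc_id, doc in collection.items() if doc_id in wanted}


# Hypothetical collections.
users = {
    "users/1": {"name": "Alice"},
    "users/2": {"name": "Bob"},
    "users/3": {"name": "Carol"},
}
post = {
    "_id": "posts/1",
    "comments": [
        {"user_id": "users/1", "text": "First"},
        {"user_id": "users/2", "text": "Second"},
    ],
}

# Step 1: collect the referenced ids from the document we already loaded.
user_ids = [c["user_id"] for c in post["comments"]]

# Step 2: resolve all of them at once instead of one lookup per comment.
authors = load_many(users, user_ids)

lines = [f'{authors[c["user_id"]]["name"]}: {c["text"]}' for c in post["comments"]]
print(lines)
```

Against an actual MongoDB server, step 2 would be a query with a filter along the lines of `{"_id": {"$in": user_ids}}`; the point is that it is one extra round trip, not one per reference.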
I am sorry, but this is the dancer blaming the floor.