You’ll pry transactions from my dead, cold, broken hands
“We tried using NoSQL, but we are moving to Relational Databases because they are easier…”
That was the gist of a conversation that I had with a client. I wasn’t quite sure what was going on there, so I invited myself to their offices and took a peek at the code. Their actual scenario is classified, so we will use the standard blog model to show a similar example. In this case, we have three entities: the BlogPost, the User and the Comment. What they wanted to ensure is that when a user comments on a blog post, it will update the comments count on the blog post, update the posted comments count on the user and insert the new comment.
The catch was that they wanted the entire thing to be atomic, to either happen completely or not at all. The other catch was that they were using MongoDB. The code looked something like this:
public ActionResult AddComment(string postId, string userId, Comment comment)
{
    int state = 0;
    var blogPost = database.GetCollection<BlogPost>("BlogPosts").FindOneById(postId);
    var user = database.GetCollection<User>("Users").FindOneById(userId);
    try
    {
        database.GetCollection<Comment>("Comments").Save(comment);
        state = 1;

        blogPost.CommentsCount++;
        database.GetCollection<BlogPost>("BlogPosts").Save(blogPost);
        state = 2;

        user.PostedCommentsCount++;
        database.GetCollection<User>("Users").Save(user);
        state = 3;

        return Json(new { CommentAdded = true });
    }
    catch (Exception)
    {
        // state == 0: nothing happened yet, no compensation needed
        if (state >= 1)
        {
            database.GetCollection<Comment>("Comments")
                .Remove(Query.EQ("_id", comment.Id), RemoveFlags.Single);
        }
        if (state >= 2)
        {
            blogPost.CommentsCount--;
            database.GetCollection<BlogPost>("BlogPosts").Save(blogPost);
        }
        if (state >= 3)
        {
            user.PostedCommentsCount--;
            database.GetCollection<User>("Users").Save(user);
        }
        throw;
    }
}
Take a moment or two to go over the code and figure out what was going on in there. It took me a while to really figure that one out.
Important: before I continue with this post, I feel that I need to explain what the problem is and why it is there. Put simply, MongoDB doesn’t support multi document transactions. The reason it doesn’t is that with the way MongoDB auto sharding works, different documents may be on different shards, therefore requiring synchronization between different machines, which no one has managed to make scalable and efficient. MongoDB chose, for reasons of scalability and performance, not to implement this feature. This is a documented and well known part of the product.
It makes absolute sense, except that it leads to code like the one above, when users really do want to have atomic multi document writes. Just to be certain that the point has been hammered home: the code above still does not ensure atomic multi document writes. For example, if the server shuts down immediately after setting state to 2, there is nothing that the code can do to revert the previous writes (after all, it can’t contact the server to tell it to revert them).
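For contrast, the one atomic building block MongoDB does give you is the single document update. Here is a minimal sketch of what that looks like with the same legacy C# driver as above (illustrative only, not the client’s code):

    // Atomic within the single BlogPost document: $inc needs no
    // read-modify-write cycle, so there is nothing to compensate for.
    database.GetCollection<BlogPost>("BlogPosts").Update(
        Query.EQ("_id", postId),
        Update.Inc("CommentsCount", 1));

That is fine for one counter in isolation, but it does nothing to make the three writes above atomic as a unit.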
And there are other problems with this approach: the code is ugly, and it is extremely brittle. It is very easy to update one part and not the other… but at this point I think that I am busy explaining why horse excrement isn’t suitable for gourmet food.
The major problem with this code is that it is trying to do something that the underlying database doesn’t support. I sat down with the customer and talked about the advantages and disadvantages of staying with a document database vs. moving to a relational database. A relational database would handle atomic multi row writes easily, but would require many reads and many joins to show a single page.
That was the point where I put the disclaimer “I am speaking about my own product, and I am likely biased, be aware of that”.
The same code in RavenDB would be:
public ActionResult AddComment(string postId, string userId, Comment comment)
{
    using (var session = documentStore.OpenSession())
    {
        session.Store(comment);
        session.Load<BlogPost>(postId).CommentsCount++;
        session.Load<User>(userId).PostedCommentsCount++;
        session.SaveChanges(); // Atomic, either all are saved or none are
    }
    return Json(new { CommentAdded = true });
}
There are a couple of things to note here:
- RavenDB supports atomic multi document writes out of the box, with nothing extra required.
- This isn’t the best RavenDB code; ideally the session would be created in the infrastructure rather than here, but you get the point.
We also support change tracking for loaded entities, so we didn’t even need to tell it to save the loaded instances. All in all, I also think that the code is prettier, easier to follow and would produce correct results in the case of an error.
Comments
Isn't this solution useless if two people post a comment at the same time?
I would have thought the best solution would be to create a postwithcomments view to provide the count.
Bob, This is really something that depends on your usage scenarios. In RavenDB, you can tell it to fail the transaction because of a concurrency conflict, or you can do patching (still within the same transaction), etc. You’ve got options, and a lot of them are really good ones.
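For example, opting into optimistic concurrency is one line on the session. A minimal sketch, assuming the same documentStore as in the post:

    using (var session = documentStore.OpenSession())
    {
        session.Advanced.UseOptimisticConcurrency = true;
        session.Load<BlogPost>(postId).CommentsCount++;
        // Throws a ConcurrencyException if someone else changed the post
        // meanwhile, failing the whole transaction instead of losing a write.
        session.SaveChanges();
    }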
Ayende, you should explain what makes this possible in RavenDB, and why transactions are possible with multiple documents in a sharded setup.
Teleo, There are several things involved here:
a) For a single server, we support atomic multi document writes natively. (Note that this isn't the case for Mongo even for a single server.)
b) For multiple servers, we strongly recommend that your sharding strategy localize documents, meaning that the actual update only happens on a single server.
c) For multi server, multi document atomic updates, we rely on distributed transactions.
The last is not really recommended for common use, because it has known scalability issues.
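To give a rough idea of what (b) looks like in practice, you shard on an id that keeps related documents together, so a whole write stays on one server. The exact API varies between RavenDB versions, so treat the names below as a sketch; the PostId property on Comment is assumed for illustration:

    // Illustrative sketch: two shards, with posts and their comments colocated.
    var shards = new Dictionary<string, IDocumentStore>
    {
        { "shard1", new DocumentStore { Url = "http://server1:8080" } },
        { "shard2", new DocumentStore { Url = "http://server2:8080" } },
    };

    var shardStrategy = new ShardStrategy(shards)
        .ShardingOn<BlogPost>(post => post.Id)
        // Hypothetical PostId property keeps a comment on its post's shard.
        .ShardingOn<Comment>(comment => comment.PostId);

    var documentStore = new ShardedDocumentStore(shardStrategy).Initialize();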
This is a great article, but you could still go one step further. I mean, why even increment the counts on any other documents? You could have blown your client's mind by showing them your ability to project counts through indexes. By doing so, it reduces the ultimate solution down to 1 or 2 lines.
RavenDB just keeps getting better!
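Here is a sketch of what Khalid is describing: a map/reduce index that maintains the count, so the documents never store it. This assumes a PostId property on Comment and is illustrative rather than production code:

    using System.Linq;
    using Raven.Client.Indexes;

    public class Comments_CountByPost : AbstractIndexCreationTask<Comment, Comments_CountByPost.Result>
    {
        public class Result
        {
            public string PostId { get; set; }
            public int Count { get; set; }
        }

        public Comments_CountByPost()
        {
            // Map: emit one entry per comment.
            Map = comments => from comment in comments
                              select new { comment.PostId, Count = 1 };

            // Reduce: RavenDB keeps the per-post sum up to date
            // as comments come and go.
            Reduce = results => from result in results
                                group result by result.PostId into g
                                select new { PostId = g.Key, Count = g.Sum(x => x.Count) };
        }
    }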
Khalid, Yes, that is not a good solution in terms of RavenDB, but the sample is mostly focused on demonstrating a very specific feature.
How is it possible that your client chose MongoDB knowing that they would need transactional processing? This is not a shortcoming of MongoDB, it's by design - mongo authors dropped transaction support in favor of performance. MongoDB wasn't too difficult for your client - designing an application was. Probably sticking to good old SQL was the only good decision to make in their case; I don't think using Raven instead of Mongo would improve their chances of success.
Rafal, They ran into this requirement about a year after they started working on the system; it wasn't something that they initially had to worry about. It was a requirement that came out of new features popping up that weren't foreseen.
Well, it had to be 'the straw that broke the camel's back' if they decided to throw away the underlying database in order to handle a new requirement. Wonder how many problems do they have now and how it will affect their ability to deliver anything working anytime soon?
Ayende, you are always good at choosing which features you support and which you don't. Transactions were a good choice.
I do not know how anyone could actually use MongoDB in production without transactions. That must bite you all the time. Basically, every bug in your app that causes a request to crash mid-way has the potential to corrupt data. I consider this to be completely unacceptable for most types of application.
Nice, that comment that I spent half an hour typing out was rolled back because some stupid counts could not be updated :)
I'd blame your client here, load and save User, Post to update some count?? Mongo $inc anyone?
And by the way, I was surprised to read that Mongo has a global reader/writer lock across collections! Choose wisely, but I guess if it's good enough for Foursquare it is good enough for the rest of us...
Ajai
Ajai, As I said, that is really an issue of how you want to deal with things. In their scenario, it absolutely made sense to have it happen in this fashion. The blog model is a very simple one, one that is very easy to work with and explain, but it is not something where you can say: "NEVER lose this data". The actual scenario did require them to have all or nothing semantics, so please do not try to read too much into the sample; it is intentionally simplified to make it easy to understand.
You forgot to pass 'userId' into 'session.Load<User>()'.
Mike, Thanks, minor detail, but I fixed it.
So the general idea here is MongoDB does not support multiple document transactions, RavenDB does. However, as you mention, if you have to shard and don't/can't localize your documents, you have to use distributed transactions, which you seem to recommend against.
If somebody reached that point with RavenDB, many shards, non local documents, what would you recommend? You'd have to change the model right? Or if possible just use map-reduces for counts and the like.
In other words, if your data gets big enough, you'll probably run against this issue anyway, Mongo or Raven?
Peter,
Note that RavenDB allows you to grow from a single server (itself able to serve a lot of data) to multiple servers. That growth means changes to your application, certainly. But I think that it is better than saying "this feature is hard to implement using shards, we won't allow it ever"
Ayende, Thanks for the response. I was genuinely curious, not trying to score a point on either side. I think you're right on, and in practice it is much better to support multi-doc transactions in those scenarios. From a purely theoretical modeling standpoint, do you think it's fair to say that if you are using a document database and have a lot of multi doc transactions, that's probably a warning sign?
Peter, That really depends. Usually we are talking about modeling documents as aggregates, but there are a lot of associations between those aggregates. In most scenarios, you probably are wrong to require multi doc transactions, because in most cases it is okay to do this without them. Most of the time it is an indication of bad aggregate boundaries, but there are good reasons to want to have multiple documents (for example, different reasons for updating something in the same aggregate mean that it is split into two documents) that you then need to modify in tandem. This is usually the case of practical reasons causing the splitting of a single aggregate.
Greatest. Comment. Ever. :D
“We tried using NoSQL, but we are moving to Relational Databases because they are easier…”
I think this article should probably make mention that there are other NoSQL databases that support transactions since it's a little misleading as-is:
http://nosql.mypopescu.com/post/6732339201/multi-document-transactions-in-ravendb-vs-other-nosql
In the NoSQL space, there are a couple of other solutions that support transactions (see the link above).
I'm not particularly well versed in either NoSQL or DDD/CQRS but would this scenario be a candidate for event sourcing?
It seems as if storing a PostAdded document could offload the state management and transactional logic to some other process. If interrupted, said process could simply pick up where it left off. Not truly transactional I know, but could remove some of the issues with original code snippet.
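Purely as a sketch of Neil's idea (the event shape and names here are invented for illustration): the web request does a single atomic insert of an event document, and a separate worker applies the writes, retrying after a crash:

    // Hypothetical event document; one atomic insert per request.
    public class CommentAddedEvent
    {
        public ObjectId Id { get; set; }
        public string PostId { get; set; }
        public string UserId { get; set; }
        public Comment Comment { get; set; }
        public bool Applied { get; set; }
    }

    // In the request: a single document write, which MongoDB does make atomic.
    database.GetCollection<CommentAddedEvent>("Events").Save(new CommentAddedEvent
    {
        PostId = postId, UserId = userId, Comment = comment, Applied = false
    });

    // In a background worker: pick up unapplied events and process them.
    // A crash midway leaves the event unapplied, so it is retried; each write
    // must therefore be safe to repeat, which is exactly the hard part.
    foreach (var pending in database.GetCollection<CommentAddedEvent>("Events")
                                    .Find(Query.EQ("Applied", false)))
    {
        ApplyCommentWrites(pending); // hypothetical helper doing the three updates
        pending.Applied = true;
        database.GetCollection<CommentAddedEvent>("Events").Save(pending);
    }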
@Neil +1 I was thinking the same exact thing... why not "event source" it and be done? And while it is not in a nice "transaction" it could still be err... transactional. It almost sounds like they were trying to get a "consistent read" out of it...
Demis, You are correct that there are some other NoSQL dbs out there that offer transactions, but most often, one of the laments against NoSQL is that there are no transactions.
Neil, The example is intentionally oversimplified, to make a point. Yes, there are better ways of doing that. As for "not truly transactional", that is a scary concept. Having transactions is like being pregnant; you can't be half & half.
Ayende,
I get the point that Raven is probably the best document db, but what do you mean by "This isn’t the best RavenDB code, ideally I wouldn’t have to create the session here"? I would like to know how I could write the same code even better / shorter.
Many thanks in advance!
Daniel, Take a look at the RaccoonBlog sample app (which also powers this blog); it is an example of what I consider to be a well designed RavenDB application. The basic idea is that you don't really need to worry about session life cycle and calling save changes in the controller.
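The gist of that pattern, sketched from memory (the actual RaccoonBlog code differs in its details): a base controller owns the session, and SaveChanges runs once per successful action, so individual actions never touch the session life cycle:

    public abstract class RavenController : Controller
    {
        public static IDocumentStore DocumentStore { get; set; }

        protected IDocumentSession RavenSession { get; private set; }

        protected override void OnActionExecuting(ActionExecutingContext filterContext)
        {
            // One session per request; sessions are cheap to create.
            RavenSession = DocumentStore.OpenSession();
        }

        protected override void OnActionExecuted(ActionExecutedContext filterContext)
        {
            using (RavenSession)
            {
                // Still a single atomic SaveChanges per request.
                if (filterContext.Exception == null)
                    RavenSession.SaveChanges();
            }
        }
    }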
Ayende, Understood that it's a simplification of the real problem. As you say, the original snippet is ugly and brittle and I agree that trying to approximate transactions is a not an ideal solution. How then, would you propose working around the issue if changing the entire persistence mechanism is not a justifiable option?
Neil, I am not. You can't simulate transactions if the db doesn't support them. You are left with either:
- Avoid requiring transactions (which can be hard, but is possible)
- Choose a db that supports them
Thanks Ayende. Makes sense really, choose your db based on your essential requirements.
You can definitely write transaction support on top of a non-transactional data store. It just takes far too much time to be useful for most people whose product is not a transactional data store.
Chris, Well, yes, but while it is also possible to walk from Los Angeles to Chicago, you don't see people do that very often. In fact, by most people's perception, "you can't walk from Los Angeles to Chicago" is a true statement.
Hi Ayende, I usually don't post very often, but this time I must say that I am absolutely shocked that any serious software developer could produce such "horse excrement" (as you put it) in a production environment. Has the above code really been implemented in a production environment???
Marcel, Yes, this has been implemented in production. To be fair, it is a stopgap measure while they research a better alternative.
Refer to http://www.mongodb.org/display/DOCS/two-phase+commit for 10gen's suggested way to handle multi-doc transaction with MongoDB.
AJ, You are kidding, right? This still doesn't solve the problem of crashing midway; consider the case of a failure in the middle of step 2.
More to the point, this is a LOT of code, it is VERY complicated, it has tons of failure scenarios, hard to detect bugs, etc.
Sorry, the fact that you can hop on one leg from New York to Las Vegas doesn't imply that this is a viable means of transportation.
Step 2 is idempotent. A failure of it midway can be simply restarted. It only pushes if not already pushed. So a repeat of step 2 will not harm (by duplication).
All the steps are either atomic or idempotent. Crash is handled either through restart or rollback at each step (detailed in the documentation). Sure it is not pretty or easy. But for rare transactional need on a non-transactional database, one can give it a serious thought.
I am not comparing to RavenDB (or other RDBMS) true transactional feature. Just that it was not tried out well enough in MongoDB by your client.
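For readers following the recipe, its basic building block looks roughly like this (a fragment, not the full protocol): each state transition is itself one atomic document operation, which is what the retry-after-crash argument relies on:

    // Illustrative fragment of the two-phase commit recipe (legacy C# driver):
    // atomically claim a transaction document and move it from "initial" to "pending".
    var result = database.GetCollection("Transactions").FindAndModify(
        Query.EQ("state", "initial"),
        SortBy.Null,
        Update.Set("state", "pending"));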
AJ, Who is going to restart this step? Where is the transaction coordinator? Where is the information about the tx itself stored?
The documentation says 'These "repair" jobs should be run at application startup and possibly at regular interval to catch any unfinished transaction.'
Repair jobs will repair if something is in failed state. If not, no action is taken by that repair step. I will expand on it later.
AJ, Let us assume that you have just crashed in the middle of step 2. But let us also assume that you have more than a single server running. That means that you can't just "get the list of pending or applied txs", because there are other processes that are going to be actually processing them.
What it comes down to is that because MongoDB doesn't have transactions, you have to build your own distributed transaction coordinator with the basic building blocks of atomic swap. I am sorry, but I see no point at which it makes sense to do something like that for real software. I am willing to bet that most people's attempts to write a transaction manager are going to be riddled with holes for a variety of edge cases, and that is even before we include the fact that they actually recommend adding business logic to the transaction handler part.