Set based operations with RavenDB
One of the most common problems that people run into with NoSQL databases is the inability to perform set based operations. A typical example of that would be:
DELETE FROM Users WHERE LastLogin < '2009-01-01' UPDATE Users SET IsActive = 0 WHERE LastLogin < '2010-01-01'
The problem is that most NoSQL solutions really have no notion about set based operations, but they are something that:
- Users really like
- Can significantly simplify users’ code
- Drastically reduce the number of remote calls
With all of that, I decided that I really want RavenDB to support SET based operations. And now it does :-)
The secret for set based operations is that you need two things:
- What should you execute the operation on? (defining the set that you are working on). This is the WHERE clause.
- What is the operation?
We will start with DELETE, because that is that operation is very simple to understand. So we only need to figure out what the set we operate on is. Well, Raven already have a way to express WHERE clauses, using its Indexing system. There is no reason why we can’t use the same mechanism for that.
Here is how we would execute the first statement using Raven. First, we need to define the appropriate index:
// Users/LastLogin from user in docs.Users select new { user.LastLogin }
And now we can issue an HTTP DELETE call:
DELETE /bulk_docs/Users/LastLogin?query=LastLogin:[00010101000000000 TO 20090101000000000]&allowStale=True
This asks Raven do delete all the documents where the LastLogin is between 01-01-0001 and 01-01-2009.
There is one important thing to note here, Raven’s indexes allow stale reads. But SET based operations will not work on stale indexes by default.
In this case, I am pretty sure that the index is capable of handling requests that are that far in the past, so I can safely add the allowStale=True. If we remove that, or specify allowStale=False, the operation will only succeed if the index is up to date. We can also specify a cutoff date for the stale check. This is similar to how this works everywhere with Raven.
So far, this has been pretty easy, what about the next step? Updating a set of documents? That presented some challenges, how do you define the operation?
Luckily, Raven has support for the PATCH command, which means that you can specify an update command very easily. Using the same index, we can specify:
PATCH /bulk_docs/Users/LastLogin?query=LastLogin:[00010101000000000 TO 20100101000000000]&allowStale=True
[ { "Type": "Set", "Name": "IsActive", "Value": false } ]
Which will set the IsActive property to false on all the documents whose LastLogin is before 2010.
And yes, you can execute those operations using the Client API as well.
Comments
Very nice, that will definitely be useful. But, those are trivial examples. What about more complex ones, e.g. where there is a join involved or "dynamic" values.
e.g. a update operation like
update x set version = version + 1, deleted_date = getutcdate() where id in (select xid from y)
typo in your docs:
"As you can see though, the IsStale flag is still false, because in the mean time somebody else has added yet another blog entry."
that should be "still TRUE"
Dennis,
There is no such things as joins for Raven.
For version = version +1, you can use Increment
For id in (...), you just pass the list of ids
Fixed, thanks
Offtopic, but do you have any plans to talk about your experiences with MEF? I was surprised when I saw you were using it in RavenDB... When I dig into the docs I feel like its just another IoC container except with different naming semantics. well and the docs feel mostly like promotional materials too much fluff. It would be good to get your take on it.
I think one of the primary reasons that most of the NoSQL providers dont provide set operations is because they're actually NoACID providers. What I mean here is that because for distributed scenarios we'll have varying levels of consistency, and as a result, an operation like "UPDATE a WHERE b" is bound to run into issues. You deal with this to some degree with the fact that we can have stale indicies, but for a multi-server scenario, you'd need to do one of the following:
1) always have allowStale = TRUE
2) implement some form of consistent reads. To do this you'd have to have some form of lock server (BigTable has a distributed lock server, i'm assuming SimpleDB does the same)
Have you thought about the multi-server scenario much Ayende? It seems like you've focused RavenDB more on being a document DB than a distributed DB.
Is there a chance that RavenDB works in MS Azure? Or do you have planned to support MS Azure as a plattform for RavenDB?
schorsch,
Yes, it should
Comment preview