Raven & Event Sourcing

time to read 8 min | 1454 words

I keep trying to work on the replication bundle for Raven, but I keep getting distracted with more interesting stuff.

In this case, I kept coming back to several discussions that I had with people who want to use Raven for storing events, and were thinking about how to go from a stream of events to a complete aggregate. I kept thinking that Raven should already be able to handle that. And indeed it can, quite easily, it turns out.

Raven is already capable of running operations over a stream of document to produce a value, to go from there to event stream producing an aggregate is easy. The only problem was that we needed to support external views. That was easy enough to do, so let me show what we have now.

Let us assume that we have the events shown on the right stored in Raven, as you can see, this is a stream of events for a shopping cart. What we want to have is to go from there to an actual shopping cart.

We define the following view:

I am showing the class diagram here to show you all the types that are involved here. Note that ShoppingCart has AddToCart and RemoveFromCart method, which has the typical implementation.

Now, let us look at the actual view code:

    [DisplayName("Aggregates/ShoppingCart")]
    public class ShoppingCartEventsToShopingCart : AbstractViewGenerator
    {
        public ShoppingCartEventsToShopingCart()
        {
            MapDefinition = docs => docs.Where(document => document.For == "ShoppingCart");
            GroupByExtraction = source => source.ShoppingCartId;
            ReduceDefinition = Reduce;

            Indexes.Add("Id", FieldIndexing.NotAnalyzed);
            Indexes.Add("Aggregate", FieldIndexing.No);
        }

        private static IEnumerable<object> Reduce(IEnumerable<dynamic> source)
        {
            foreach (var events in source
                .GroupBy(@event => @event.ShoppingCartId))
            {
                var cart = new ShoppingCart { Id = events.Key };
                foreach (var @event in events.OrderBy(x => x.Timestamp))
                {
                    switch ((string)@event.Type)
                    {
                        case "Create":
                            cart.Customer = new ShoppingCartCustomer
                            {
                                Id = @event.CustomerId,
                                Name = @event.CustomerName
                            };
                            break;
                        case "Add":
                            cart.AddToCart(@event.ProductId, @event.ProductName, (decimal)@event.Price);
                            break;
                        case "Remove":
                            cart.RemoveFromCart(@event.ProductId);
                            break;
                    }
                }
                yield return new
                {
                    cart.Id,
                    Aggregate = JObject.FromObject(cart)
                };
            }
        }
}

We are doing several interesting things happening in the constructor:

The display name is the name of the index.
In the constructor, we define the map part as filtering for events for the shopping cart.
We will create a shopping cart per shopping cart id, so we specify the group by extraction method. Raven will use that to optimize updates.
Note the indexes definition, we want to id to be stored as as a primary key, and the aggregate data to be stored, not analyzed for searching.

Now, let us talk about the interesting bits, the Reduce function.

That function should be pretty easy to follow, I think. We are getting a stream of events, grouping them by their shopping cart id. Then, for each shopping cart, we sort the events by date, and proceed to build the aggregate.

Finally, we return the data so Raven will store it in the index.

The result, by the way, is this:

I think this is cool.

Using this approach, Raven will automatically keep the aggregate definition up to date with the event streams coming on. Furthermore, that aggregate will only be computed when a change happen, so accessing it is very cheap.

Finally, if we have a storm of events on a particular shopping cart, we can choose whatever to wait and see it in its most version, or get a potentially stale view of it really fast.

Tweet Share Share 22 comments

Tags:

Raven

Comments

30 May 2010
09:37 AM

Uriel Katz

this is something like couchdb materialized views?

30 May 2010
10:37 AM

Markus Zywitza

This is really cool and tips my balances in favor of RavenDB against NH again.

I have one remark though: You should mention that you violate OCP voluntarily to keep the code simple and that in production the switch statement should be replaced by an extensible approach, i.e visitor or strategy patterns.

30 May 2010
11:02 AM

Ayende Rahien

Uriel,

All of Raven's indexes are similar to materialized views

30 May 2010
11:03 AM

Ayende Rahien

Markus,

Yeah, pretty much. There is a much more extensive example that does it in a manner that is much nicer that is currently working up.

This is just to show how things are working

30 May 2010
11:13 AM

Uriel Katz

So what is the difference between this and a regular raven index?

30 May 2010
12:33 PM

Ayende Rahien

Uriel,

It isn't, really. It is just a different way to build the index, that is all.

30 May 2010
13:19 PM

tobi

I'd be interested in what amount of information in the materialized view/index will be recalculated if the underlying "table" changes. Say, we were to use raven for logging all of our http requests and we had an index "count of requests per day". Would the index be updated incrementally?

30 May 2010
14:54 PM

Thomas Krause

In which process does this view-generator run? Since raven needs to keep the index up to date, I'm assuming it has to run in raven's process?

If so, do you have something like a plugin/extension system where you drop some dll somewhere or how does it work?

How would you define this index using ravens webinterface?

I have to add, that I never tried raven or looked at the code and so I don't have a lot of knowledge of its internal structures besides the post of yours. I'm just curious ;-)

30 May 2010
15:15 PM

Rob

Beautiful.

30 May 2010
16:38 PM

zihotki

Yes, this code is beautiful, really beautiful. And my sql oriented brain almost blew up when I tried to understand the process.

30 May 2010
17:27 PM

Ayende Rahien

Tobi,

Yes, the index update on every new commit, and it is updated in an efficient incremental manner.

In your case, it would only update the current day.

30 May 2010
17:29 PM

Ayende Rahien

Thomas,

This runs in Raven's process.

Yes, Raven has a plugin mechanism, drop a dll to Plugins/, and everything works.

Defining indexes via the web interface:

www.ravendb.net/documentation/docs-http-indexes

30 May 2010
18:08 PM

tobi

Could you comment a bit more on why you prefer using raven in a document oriented way instead of in a relational way? It seems to me that it would result in less code if you were to keep everything normalized and dry. I know that performance might not be that good but disregarding that I think it would be a superior approach. You would just define indexes for all access path' to the data that you have so it still would be lightning fast. (you might notice that development speed is my major concern because I do not develop apps with huge data sizes. This is certainly a common scenario that might justify a dedicated post... ;-) ).

30 May 2010
18:10 PM

tobi

And I want to add that this series is really starting to get my attention! This project is certainly having a pragmatic lead.

30 May 2010
18:11 PM

Ayende Rahien

Tobi,

Try writing something like this on a relational backend.

The performance difference would be pretty big.

Moreover, the amount of effort that you'll have to go through to get things done is extremely non trivial.

Then try to add three new events, and see what happens.

30 May 2010
18:36 PM

tobi

Ok, I get it for event processing, but I am more interested in the day to day work of users, orders, products... Some of us still have to get their hand dirty with boring stuff to pay the rent ;-)

30 May 2010
18:42 PM

Ayende Rahien

Tobi,

Take a look at my other posts on the subject.

Look at the NoSQL category

30 May 2010
19:47 PM

Demis Bellot

That's a pretty sweet and elegant example, hard to get it done with any less effort.

Though I'm not sure about modifying the Raven Process with custom application logic .dll's - seems like a pretty fragile/state-full approach that would cause some pain during deployment, versioning, backing up, upgrading etc.

Just trying to work out how efficient this solution is since I can't see the optimal code-path, i.e. when a user adds an Item to an existing cart does it rebuild the entire aggregate for all users shopping carts or just the one modified? or is that what the 'GroupByExtraction' lambda is for?

30 May 2010
19:54 PM

Ayende Rahien

Demis

Versioning is actually pretty easy, you just maintain the old index and deploy a new dll with the new index under a different name.

The code needs to be written to be able to handle all users, but in practice, what we are doing is only re-executing the documents for the current user, not for all users.

The reason that the code is written this way is that wew want to allow ourselves the option to do major optimizations down the road, where we can just shove data through this without having to do sorting up front.

14 Jun 2010
21:08 PM

Greg Young

Oren,

Looks good but generally I would run far far away from using TimeStamps to order and would instead use version numbers.

Cheers,

Greg

14 Jun 2010
22:09 PM

Greg Young

to be clear the problem is actually shown in your sample events. you have two with the same timestamp (create/add) and providing deterministic ordering in such a case is problematic.

16 Jun 2010
06:14 AM

Ayende Rahien

Greg,

Agreed. Version numbering with Raven is a bit interesting, basically you need a separate document to do the versioning, but it is pretty easy all around.

Comment preview

Comments have been closed on this topic.

Markdown turns plain text formatting into fancy HTML formatting.

Phrase Emphasis

*italic*   **bold**
_italic_   __bold__

Links

Inline:

An [example](http://url.com/ "Title")

Reference-style labels (titles are optional):

An [example][id]. Then, anywhere
else in the doc, define the link:
  [id]: http://example.com/  "Title"

Images

Inline (titles are optional):

![alt text](/path/img.jpg "Title")

Reference-style:

![alt text][id]
[id]: /url/to/img.jpg "Title"

Headers

Setext-style:

Header 1
========
Header 2
--------

atx-style (closing #'s are optional):

# Header 1 #
## Header 2 ##
###### Header 6

Lists

Ordered, without paragraphs:

1.  Foo
2.  Bar

Unordered, with paragraphs:

*   A list item.
    With multiple paragraphs.
*   Bar

You can nest them:

*   Abacus
    * answer
*   Bubbles
    1.  bunk
    2.  bupkis
        * BELITTLER
    3. burper
*   Cunning

Blockquotes

> Email-style angle brackets
> are used for blockquotes.
> > And, they can be nested.
> #### Headers in blockquotes
> 
> * You can quote a list.
> * Etc.

Horizontal Rules

Three or more dashes or asterisks:

---
* * *
- - - -

Manual Line Breaks

End a line with two or more spaces:

Roses are red,   
Violets are blue.

Fenced Code Blocks

Code blocks delimited by 3 or more backticks or tildas:

```
This is a preformatted
code block
```

Header IDs

Set the id of headings with {#<id>} at end of heading line:

## My Heading {#myheading}

Tables

Fruit    |Color
---------|----------
Apples   |Red
Pears	 |Green
Bananas  |Yellow

Definition Lists

Term 1
: Definition 1
Term 2
: Definition 2

Footnotes

Body text with a footnote [^1]
[^1]: Footnote text here

Abbreviations

MDD <- will have title
*[MDD]: MarkdownDeep

Oren Eini

Oren Eini

CEO of RavenDB