Optimizing event processing
During the RavenDB Days conference, I got a lot of questions from customers. Here is one of them.
There is a migration process that deals with an event sourcing system. We have 10,000,000 commits with 5 – 50 events per commit, and each event results in a property update to an entity.
That gives us roughly 300,000,000 events to process. The trivial way to solve this would be:
foreach(var commit in YieldAllCommits())
{
    // One session, and one SaveChanges call, per commit
    using(var session = docStore.OpenSession())
    {
        foreach(var evnt in commit.Events)
        {
            var entity = session.Load<Customer>(evnt.EntityId);
            evnt.Apply(entity);
        }
        session.SaveChanges();
    }
}
That works, but it tends to be slow. The worst case here would result in 310,000,000 requests to the server: one load per event, plus one SaveChanges call per commit.
Note that this has the nice property that all the changes in a commit are saved in a single transaction. We're going to relax that behavior and use something better here.
We’ll take the implementation of this LRU cache and add an event that fires when an item is dropped from the cache, as well as the ability to iterate over its contents.
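Roughly, the shape we need looks like this. This is just a minimal sketch, not the linked implementation; the OnEvict callback and the enumeration support are the additions that the code below relies on:

using System;
using System.Collections;
using System.Collections.Generic;

// Minimal sketch of an LRU cache with an eviction callback and enumeration
public class LeastRecentlyUsedCache<TKey, TValue> : IEnumerable<KeyValuePair<TKey, TValue>>
{
    private readonly int capacity;
    private readonly Dictionary<TKey, LinkedListNode<KeyValuePair<TKey, TValue>>> entries;
    private readonly LinkedList<KeyValuePair<TKey, TValue>> order =
        new LinkedList<KeyValuePair<TKey, TValue>>();

    // Called with the evicted value whenever an item is pushed out to make room
    public Action<TValue> OnEvict = delegate { };

    public LeastRecentlyUsedCache(int capacity)
    {
        this.capacity = capacity;
        entries = new Dictionary<TKey, LinkedListNode<KeyValuePair<TKey, TValue>>>(capacity);
    }

    public bool TryGetValue(TKey key, out TValue value)
    {
        LinkedListNode<KeyValuePair<TKey, TValue>> node;
        if (entries.TryGetValue(key, out node) == false)
        {
            value = default(TValue);
            return false;
        }
        // Move the accessed item to the front, so it is evicted last
        order.Remove(node);
        order.AddFirst(node);
        value = node.Value.Value;
        return true;
    }

    public void Set(TKey key, TValue value)
    {
        LinkedListNode<KeyValuePair<TKey, TValue>> node;
        if (entries.TryGetValue(key, out node))
        {
            order.Remove(node);
            entries.Remove(key);
        }
        else if (entries.Count == capacity)
        {
            // Evict the least recently used item and hand it to the caller
            var last = order.Last;
            order.RemoveLast();
            entries.Remove(last.Value.Key);
            OnEvict(last.Value.Value);
        }
        node = new LinkedListNode<KeyValuePair<TKey, TValue>>(
            new KeyValuePair<TKey, TValue>(key, value));
        order.AddFirst(node);
        entries[key] = node;
    }

    public IEnumerator<KeyValuePair<TKey, TValue>> GetEnumerator()
    {
        return order.GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}

With something like that in place, the migration code becomes: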
using(var bulk = docStore.BulkInsert(allowUpdates: true))
{
    var cache = new LeastRecentlyUsedCache<string, Customer>(capacity: 10 * 1000);
    // When an entity is evicted from the cache, flush it to the server via the bulk insert
    cache.OnEvict = c => bulk.Store(c);
    foreach(var commit in YieldAllCommits())
    {
        foreach(var evnt in commit.Events)
        {
            Customer entity;
            if(cache.TryGetValue(evnt.EntityId, out entity) == false)
            {
                // Cache miss: load the entity once and keep it around for later events
                using(var session = docStore.OpenSession())
                {
                    entity = session.Load<Customer>(evnt.EntityId);
                    cache.Set(evnt.EntityId, entity);
                }
            }
            evnt.Apply(entity);
        }
    }
    // Flush whatever is still in the cache at the end of the run
    foreach(var kvp in cache)
    {
        bulk.Store(kvp.Value);
    }
}
Here we are using a cache of 10,000 items, with the assumption that events for an entity tend to cluster, so a lot of changes to an entity will happen at roughly the same time. We take advantage of that to try to load each document only once, and we use bulk insert to flush those changes to the server as needed. This code also handles the case where we flushed a document out of the cache and then get more events for it (we just load it again), but the assumption is that this scenario is much rarer.
Comments
I'm curious, what happens if the following situation takes place: A - a commit is applied and entity E is updated in the cache; B - someone updates E on the server, but the cache has no idea; C - you finally apply the changes from the cache? Would the changes from step B be lost?
I wonder which LRU implementation you've used. The one linked doesn't contain OnEvict.
Or is this supposed to be pseudo-code?
Anyway, it is a very nice idea!
Gleb,
I believe this example assumes that the current process is the only one writing to the server. Think about rebuilding a projection in a CQRS system.
If someone was modifying the Customer, it wouldn't happen in RavenDB; another event would be added to the event source and would show up later in YieldAllCommits().
Jakub
Ayende, what do you think about this idea:
On cache eviction, we could bulk update the bottom x% of the cache in order to avoid a lot of single updates once the cache is full.
Jakub
Gleb, The idea is that this is the only process that does this.
Jakub, The one I linked to, yes. As I said, you need to add the enumeration and the OnEvict event. Both should be very trivial to add.
Jakub, You don't need to do the batching yourself, the bulk insert will take care of that for you. RavenDB is designed to be obvious and efficient, and that is one of the ways it does so.
For the eviction code sample, you can add the following code to the linked LRU implementation to get the OnEvict event. The argument class, NodeEvictedArgs, is this:
public class NodeEvictedArgs<TKey, TValue> : EventArgs
{
    public NodeEvictedArgs(TKey key, TValue value)
    {
        this.Key = key;
        this.Value = value;
    }

    public TKey Key { get; private set; }
    public TValue Value { get; private set; }
}
The additional event declaration might look something like this (a sketch; the OnEvicted helper matches the call in the Set() method below):
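// Sketch only: the event that consumers subscribe to, and the helper that
// Set() calls when a node is evicted. The exact names are assumptions.
public event EventHandler<NodeEvictedArgs<TKey, TValue>> NodeEvicted = delegate { };

private void OnEvicted(Node node)
{
    NodeEvicted(this, new NodeEvictedArgs<TKey, TValue>(node.Key, node.Value));
}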
And change the Set() method in the LRU to include firing the event:
public void Set(TKey key, TValue value)
{
    Node entry;
    if (!entries.TryGetValue(key, out entry))
    {
        entry = new Node { Key = key, Value = value };
        if (entries.Count == capacity)
        {
            entries.Remove(tail.Key);
            // The new line: raise the event for the node we are about to drop
            OnEvicted(tail);
            // ... the rest of the eviction logic is unchanged from the linked implementation ...
        }
        // ... add the new entry and update the linked list, as in the linked implementation ...
    }
    // ...
}
I would load the entity and apply the events only in OnEvict. It helps keep the entity associated with a session, and the session lifetime will be short. It doesn't matter for RavenDB, but it may be required if the entity store is some RDBMS.
Dmitry, That would mean a LOT of connections to the server. On the other hand, with Bulk Insert, you have just one.
Ayende, imho the number of open connections to the server will be equal to the number of concurrent OnEvict calls.
cache.OnEvict = item =>
{
    var id = item.Id;
    var events = item.Events;
    // ... open a session, load the entity by id, apply the buffered events, and save ...
};
Dmitry, There are no concurrent OnEvict calls. Your code will require a lot of back & forth to the server.
Setting up & tearing down http connections isn't cheap.
Ayende, maybe I should take a look at the OnEvict usage... I expected it is called every time the LRU needs some space... Another pro of your idea is that you apply the event immediately to the entity, while I need to keep a list of events, which may consume a lot more memory. But setting up the HTTP connection shouldn't be an issue when "keep-alive" does its work.
Dmitry, OnEvict is called when the LRU needs some space, yes. It evicts the entity (loaded once, maybe modified many times) and stores it in the bulk insert connection. The bulk insert has a single connection, and is very optimized for bulk ops.
You can't get faster than that.
Hope I'm not getting OT, but 1 event per property change on an event sounds extreme to me. If the datastore can handle it, I guess it doesn't make a difference how granular the events are, but still, would this be considered a typical practice in event driven systems?
Whoops, I meant "1 event per property change on an entity".
Dan, An event might be something like "order created", which is handled by business logic that runs against the customer (maybe upgrading the customer to Gold status). It usually isn't something like "set status = gold"; there is BL involved.
There is a downside to this as well, in that there is no way to reliably checkpoint the stream that you are reading from. What this means in practice is that you have to do the entire replay in one go and can't pause in the middle for any reason. This is a reasonable constraint in some conditions and not reasonable in others. A slightly modified version of this that either periodically flushed the LRU, or tracked the lowest applied value and only supported idempotent handlers, could fix this relatively easily if needed.
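For illustration, the periodic flush variant might look roughly like this. This is only a sketch: ReplayCheckpoint, commit.Sequence, and the chunk size are assumptions, and the event handlers are assumed to be idempotent so that resuming from the recorded checkpoint can safely re-apply a few commits:

public class ReplayCheckpoint
{
    public long LastCommitApplied { get; set; }
}

const int ChunkSize = 100 * 1000;
var bulk = docStore.BulkInsert(allowUpdates: true);
var cache = new LeastRecentlyUsedCache<string, Customer>(capacity: 10 * 1000);
cache.OnEvict = c => bulk.Store(c); // the closure always sees the current bulk insert

long commitsInChunk = 0;
foreach (var commit in YieldAllCommits())
{
    // ... apply the commit's events through the cache, exactly as in the post ...

    if (++commitsInChunk < ChunkSize)
        continue;

    // Flush every cached entity, complete the batch, and only then record how far we got
    foreach (var kvp in cache)
        bulk.Store(kvp.Value);
    bulk.Dispose();
    using (var session = docStore.OpenSession())
    {
        session.Store(new ReplayCheckpoint { LastCommitApplied = commit.Sequence }, "replay/checkpoint");
        session.SaveChanges();
    }
    bulk = docStore.BulkInsert(allowUpdates: true); // fresh bulk insert for the next chunk
    commitsInChunk = 0;
}

// Final flush at the end of the run
foreach (var kvp in cache)
    bulk.Store(kvp.Value);
bulk.Dispose();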
"Hope I'm not getting OT, but 1 event per property change on an event sounds extreme to me. If the datastore can handle, I guess it doesn't make a difference how granular the events are, but still, would this be considered a typical practice in event driven systems?"
It is extreme, and there are lots of other downsides to such granular events, such as losing the context of the original operation. Knowing that a field changed is often quite useless.