Data modeling with indexes: Event sourcing–Part II
In the previous post I talked about how to use a map reduce index to aggregate events into a final model. You can see this on the right. This is an interesting use case of indexing, and it can consolidate a lot of complexity into a single place, at which point you can utilize additional tooling available inside of RavenDB.
As a reminder, you can get the dump of the database that you can import into your own copy of RavenDB (or our live demo instance) if you want to follow along with this post.
Starting from the previous index, all we need to do is edit the index definition and set the Output Collection, like so:
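(The index definition itself was shown as a screenshot in the original post. As a rough C# sketch, assuming the RavenDB 4.x client API and illustrative event/cart shapes rather than the exact ones from the series, it might look something like this:)

```csharp
using System.Linq;
using Raven.Client.Documents.Indexes;

// Illustrative shapes for the event documents and the aggregated cart state.
// Field, class and collection names here are assumptions for this sketch.
public class CartEvent
{
    public string CartId { get; set; }
    public string Type { get; set; }       // e.g. "ItemAdded", "Paid"
    public string ProductId { get; set; }
    public int Quantity { get; set; }
    public decimal Price { get; set; }
}

public class CartItem
{
    public string ProductId { get; set; }
    public int Quantity { get; set; }
    public decimal Price { get; set; }
}

public class ShoppingCart
{
    public string CartId { get; set; }
    public CartItem[] Items { get; set; }
    public bool Paid { get; set; }
}

// Map/reduce index over the cart events. Setting OutputReduceToCollection tells
// RavenDB to also write each reduce result out as a document in "ShoppingCarts".
public class ShoppingCarts_ByEvents : AbstractIndexCreationTask<CartEvent, ShoppingCart>
{
    public ShoppingCarts_ByEvents()
    {
        Map = events => from e in events
                        select new ShoppingCart
                        {
                            CartId = e.CartId,
                            Items = e.Type == "ItemAdded"
                                ? new[] { new CartItem { ProductId = e.ProductId, Quantity = e.Quantity, Price = e.Price } }
                                : new CartItem[0],
                            Paid = e.Type == "Paid"
                        };

        Reduce = results => from r in results
                            group r by r.CartId into g
                            select new ShoppingCart
                            {
                                CartId = g.Key,
                                Items = g.SelectMany(x => x.Items).ToArray(),
                                Paid = g.Sum(x => x.Paid ? 1 : 0) > 0
                            };

        // The new part: materialize the reduce output as artificial documents.
        OutputReduceToCollection = "ShoppingCarts";
    }
}
```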
What does this do? This tells RavenDB that in addition to indexing the data, it should also take the output of the index and create new documents from it in the ShoppingCarts collection. Here is what these documents look like:
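(The screenshot of the generated document isn't reproduced here. Based on the illustrative cart shape from the sketch above, an artificial document of this kind would look roughly like the following; the document id itself is a hash of the reduce key, and the metadata flags mark it as the artificial output of an index:)

```json
{
    "CartId": "carts/294-A",
    "Items": [
        { "ProductId": "products/31-A", "Quantity": 2, "Price": 15.0 }
    ],
    "Paid": true,
    "@metadata": {
        "@collection": "ShoppingCarts",
        "@flags": "Artificial, FromIndex"
    }
}
```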
You can see at the bottom that this document is flagged as artificial and coming from an index. The document id is a hash of the reduce key, so changes to the same cart will always go to this document.
What is important about this feature is that once the result of the index is a document, we can operate on it using all the usual tools we have for documents. For example, we might want to create another index on top of the shopping carts, like the following example:
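(This index was also shown as a screenshot in the original post; a rough C# sketch of such a second-level index, reusing the illustrative ShoppingCart shape from above, could be:)

```csharp
using System.Linq;
using Raven.Client.Documents.Indexes;

// Sketch of a second-level aggregation over the artificial ShoppingCarts documents:
// total sales per product, computed only from carts that were paid for.
public class Products_Sales : AbstractIndexCreationTask<ShoppingCart, Products_Sales.Result>
{
    public class Result
    {
        public string ProductId { get; set; }
        public int QuantitySold { get; set; }
        public decimal Total { get; set; }
    }

    public Products_Sales()
    {
        Map = carts => from cart in carts
                       where cart.Paid
                       from item in cart.Items
                       select new Result
                       {
                           ProductId = item.ProductId,
                           QuantitySold = item.Quantity,
                           Total = item.Quantity * item.Price
                       };

        Reduce = results => from r in results
                            group r by r.ProductId into g
                            select new Result
                            {
                                ProductId = g.Key,
                                QuantitySold = g.Sum(x => x.QuantitySold),
                                Total = g.Sum(x => x.Total)
                            };
    }
}
```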
In this case, we are building another aggregation: taking all the paid shopping carts and computing the total sales per product from them. Note that we are now operating on top of our event streams, but we are able to extract second-level aggregations from the data.
Of course, normal indexes on top of the artificial ShoppingCarts collection allow you to answer queries like: “Show me my previous orders”. In essence, you use the events for your writes, define the aggregation to the final model in an index, and RavenDB takes care of the read model.
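(As a small sketch of that, assuming `store` is an open IDocumentStore and using the illustrative cart shape from above: you query the artificial collection like any other, and RavenDB creates an auto index for the query as needed. In a real model the cart would also carry something like a customer id so that “my previous orders” can be scoped to the current user; that field is omitted here.)

```csharp
using System.Linq;

// Querying the artificial ShoppingCarts documents like any other collection;
// RavenDB will build an auto index over the Paid field behind the scenes.
using (var session = store.OpenSession())
{
    var paidCarts = session.Query<ShoppingCart>()
        .Where(c => c.Paid)
        .ToList();
}
```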
Another option to pay attention to is not doing the read model and the full work on the same database instance as your events. Instead, you can output the documents to a collection and then use RavenDB’s native ETL capabilities to push them to another database (which can be another RavenDB instance or a relational database) for further processing.
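(A rough sketch of what that could look like with the RavenDB 4.x client API, pushing the artificial ShoppingCarts documents to another RavenDB database; the connection string name, URLs and target database are illustrative:)

```csharp
using System.Collections.Generic;
using Raven.Client.Documents.Operations.ConnectionStrings;
using Raven.Client.Documents.Operations.ETL;

// Define a connection string to the other database...
store.Maintenance.Send(new PutConnectionStringOperation<RavenConnectionString>(
    new RavenConnectionString
    {
        Name = "reporting",
        Database = "Reporting",
        TopologyDiscoveryUrls = new[] { "http://reporting-server:8080" }
    }));

// ...and an ETL task that ships the ShoppingCarts collection to it.
store.Maintenance.Send(new AddEtlOperation<RavenConnectionString>(
    new RavenEtlConfiguration
    {
        Name = "ShoppingCarts to reporting",
        ConnectionStringName = "reporting",
        Transforms = new List<Transformation>
        {
            new Transformation
            {
                Name = "Carts",
                Collections = new List<string> { "ShoppingCarts" }
            }
        }
    }));
```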
The end result is a system that is built on a dynamic data flow. Add an event to the system, and the index will go through it, aggregate it with other events on the same root, and output it to a document, at which point more indexes will pick it up and do further work, ETL will push it to other databases, subscriptions can start operating on it, etc.
More posts in "Data modeling with indexes" series:
- (22 Feb 2019) Event sourcing–Part III–time sensitive data
- (11 Feb 2019) Event sourcing–Part II
- (30 Jan 2019) Event sourcing–Part I
- (14 Jan 2019) Predicting the future
- (10 Jan 2019) Business rules
- (08 Jan 2019) Introduction
Comments
1) How does Raven handle hash collisions?
2) How would I (efficiently) find a certain cart in this secondary collection? I mean, the secondary document id is some unrelated hash that my application has no knowledge about. My agg-root id would still be the cart name (i.e. 'carts/294-A'). But wouldn't querying the ShoppingCarts collection by cart name trigger a full search of the collection, instead of a much more efficient lookup by doc-id? Adding a new index to it would solve the problem, but I have a gut feeling that it is a bit 'over-engineered' to add an index just to perform a basic lookup by doc-id.
Maybe using some predictable concatenation/aggregation/user-specified-formula/whatever of the reduce key would make this scenario easier to work with, at the price of putting the burden of guaranteeing uniqueness on the user.
Kurbien,
1) This is explicitly handled. In the rare case that you have a hash collision, we will have document ids that look like carts/some-hash/1, carts/some-hash/2, etc., instead of just carts/some-hash, which is the normal behavior. We had to explicitly override the hash generation to be able to test this, since we are also using a high quality hash function.
2) You issue a query on this, and RavenDB will build the index for it. We considered using the raw value or letting the user control it, but that led to a lot of complexity. Note that the actual hash is completely predictable, so you can go from the well known value to the generated document id easily enough. But usually querying will be easiest, and RavenDB will take care of optimizing access behind the scenes.
This post mentions that the database dump can be imported into the live instance at http://live-test.ravendb.net
However, when creating a new database from backup, there is no option to upload a backup, only to select a local one.
Thanks for this series. I see some interesting possibilities for offloading rather simple event sourcing operations to ravendb indices completely, i.e. without having to adopt more heavyweight infrastructure/libs or frameworks. What’s missing from the picture for me, though, is how I’d be able to do ordered event processing, e.g. based on timestamp.
I always assumed that one of the core benefits of map reduce is that it scales horizontally because ordering is not important. So what options do I have to do ordered event processing with ravendb? I have thought of a separate index to build sorted aggregate docs and then work on those (con: doc size), or pushing it to the application using the changes API. Would love to see your take and expand on this in a future post.
Can you elaborate on this? Do you mean querying the index, or the artificial document?
I’m in the process of using ETL to transform events into a read model right now, and wondering about the latency of ETL vs data subscriptions. It looks like they’re both implemented as server tasks, but does one method have a higher update rate than the other?
Dejan, Use the Settings > Import option for this.
Ryan,
Querying the artificial documents directly, at which point RavenDB will optimize these queries with an index.
In both cases, the latency from document modification to the subscription / ETL being triggered is pretty much nil. The question is more about the processing time here. ETL typically runs entirely on the server and can push things out faster. However, it operates on a delete/create model, and you don't control the generated ids. Each time the source document is updated, the matching destination document id changes.
A subscription will get the documents that were changed and can then act on them however it wants, which gives you more freedom and flexibility. If you want to write the results back to the server, you can do that, but it will require another round trip. In most cases, I don't think it matters at all.
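(For reference, a minimal sketch of the subscription side, assuming the RavenDB 4.x client API and the illustrative ShoppingCart shape from the post; the worker receives changed documents in batches and can do whatever it likes with them:)

```csharp
using System;
using System.Threading.Tasks;
using Raven.Client.Documents.Subscriptions;

// Create a data subscription over the ShoppingCarts documents and process them.
string subscriptionName = store.Subscriptions.Create<ShoppingCart>();

SubscriptionWorker<ShoppingCart> worker =
    store.Subscriptions.GetSubscriptionWorker<ShoppingCart>(subscriptionName);

Task processing = worker.Run(batch =>
{
    foreach (var item in batch.Items)
    {
        // item.Id is the (artificial) document id, item.Result is the cart itself.
        Console.WriteLine($"Cart changed: {item.Id}, paid: {item.Result.Paid}");
    }
});
```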
Johannes, Time is pretty hard to handle in a distributed system. And ordered event processing isn't a good idea when you do it in a distributed environment. The main issue is that you may get updates out of order. Now, you can try to sort them by time, but when a new event comes in, you'll have to revert to the previous state before that event, and then replay everything from that point forward.
In most cases, this isn't actually required, because the aggregation doesn't care about the ordering itself. For example, aggregating transactions on an account to get the final tally. Let's take a case where it does matter: paying a mortgage. If you pay a mortgage late, it works very differently than paying on time. However, if you paid on time and the event didn't go through (which happens a lot), you need to reverse all the state changes caused by the apparent lateness. I would deal with that in two stages. First, we define an index that operates over the events and uses (loan-id, month) as the key. That computes the state of the loan in a particular month and outputs it to an artificial collection. Then you have another index that operates on those documents and aggregates the overall state of the loan. This way, a missed payment will show up (and I don't care about the order), and if the money was paid, the first index will be updated and then the second one, resulting in the behavior we want.
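(A very rough C# sketch of that first stage, with illustrative names and event shapes rather than anything from the post:)

```csharp
using System.Linq;
using Raven.Client.Documents.Indexes;

// Illustrative event and per-month shapes for the mortgage example.
public class PaymentEvent
{
    public string LoanId { get; set; }
    public string Month { get; set; }      // e.g. "2019-02"
    public decimal Amount { get; set; }
    public bool PaidLate { get; set; }     // in practice derived from due date vs. payment date
}

public class LoanMonth
{
    public string LoanId { get; set; }
    public string Month { get; set; }
    public decimal AmountPaid { get; set; }
    public bool PaidLate { get; set; }
}

// Stage one: reduce the payment events by (loan id, month) and materialize the
// monthly state of each loan as documents in the "LoanMonths" collection.
public class LoanMonths_ByEvents : AbstractIndexCreationTask<PaymentEvent, LoanMonth>
{
    public LoanMonths_ByEvents()
    {
        Map = events => from e in events
                        select new LoanMonth
                        {
                            LoanId = e.LoanId,
                            Month = e.Month,
                            AmountPaid = e.Amount,
                            PaidLate = e.PaidLate
                        };

        Reduce = results => from r in results
                            group r by new { r.LoanId, r.Month } into g
                            select new LoanMonth
                            {
                                LoanId = g.Key.LoanId,
                                Month = g.Key.Month,
                                AmountPaid = g.Sum(x => x.AmountPaid),
                                PaidLate = g.Sum(x => x.PaidLate ? 1 : 0) > 0
                            };

        OutputReduceToCollection = "LoanMonths";
    }
}

// Stage two would be another map/reduce index over the LoanMonths collection,
// grouping by LoanId to aggregate the overall state of each loan, in the same
// way the product sales index works on top of the ShoppingCarts documents.
```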
I'll write a blog post on this, since this is an interesting topic and requires a proper example.