Ayende @ Rahien

Hi!
My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.
You can reach me by phone or email:

ayende@ayende.com

+972 52-548-6969

Raven Streams: Aggregations–from the system point of view

Previously we introduced the following index (not sure if that would be a good name, maybe aggregation or projection?):

    // Map: each message becomes a partial result for its sender, with Count = 1
    from msg in messages
    select new
    {
        Customer = msg.From,
        Count = 1
    }

    // Reduce: "results" holds the mapped entries; group them by customer and sum the counts
    from result in results
    group result by result.Customer
    into g
    select new
    {
        Customer = g.Key,
        Count = g.Sum(x => x.Count)
    }
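
For readers who want to try the two expressions outside the server, here is a hedged sketch that runs the same map and reduce with plain LINQ to Objects in a C# top-level program (C# 9 or later). It is only an illustration under assumptions: the message shape (an anonymous object with a single `From` property) and the sample senders are made up, and it evaluates a fixed array in one batch rather than event by event the way the system would.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample stream: the post does not show the message shape,
// so each message here is assumed to carry only a From property.
var messages = new[]
{
    new { From = "alice" },
    new { From = "bob" },
    new { From = "alice" },
};

// Map: each message becomes a partial result for its sender, with Count = 1.
var results =
    from msg in messages
    select new
    {
        Customer = msg.From,
        Count = 1
    };

// Reduce: group the partial results by customer and sum their counts.
var totals =
    (from result in results
     group result by result.Customer
     into g
     select new
     {
         Customer = g.Key,
         Count = g.Sum(x => x.Count)
     }).ToList();

foreach (var t in totals)
    Console.WriteLine($"{t.Customer}: {t.Count}");
```

Running it prints `alice: 2` followed by `bob: 1`, since grouping in LINQ to Objects yields groups in the order their keys first appear.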

Let us consider how it works from the point of view of the system. The easy case is when we have just one event to work on. We are going to run it through the map and then through the reduce, and that would be about it.

What about the next step? Well, on the second event, all we actually need to do is run it through the map, then run the mapped result together with the previous result through the reduce. The only thing we need to remember is the final result. There is no need to remember all the pesky details in the middle, because there are no updates or deletes that would require them.
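
The incremental step described above can be sketched in a few lines of C# (a top-level program, C# 9 or later). This is a hypothetical illustration, not how Raven Streams is actually implemented: events are reduced to the sender's name, `Map`, `Reduce` and `OnEvent` are names made up for the example, and the per-customer state lives in an in-memory `Dictionary`. The trick it relies on is visible in the index itself: the reduce emits the same shape it consumes (`Customer`, `Count`), so the previous result can be fed back into it alongside the newly mapped event.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory store: the only state kept is the latest
// reduced result per customer.
var state = new Dictionary<string, (string Customer, int Count)>();

// Map: same logic as the index's map, one event -> one partial result.
(string Customer, int Count) Map(string sender) => (sender, 1);

// Reduce: same logic as the index's reduce, for partial results that share
// a customer. Its output has the same shape as its input.
(string Customer, int Count) Reduce(IEnumerable<(string Customer, int Count)> items)
{
    var list = items.ToList();
    return (list[0].Customer, list.Sum(x => x.Count));
}

// Process one event: map it, then reduce it together with the previous
// result (if any). The mapped entry is not kept afterwards.
void OnEvent(string sender)
{
    var mapped = Map(sender);
    state[mapped.Customer] = state.TryGetValue(mapped.Customer, out var previous)
        ? Reduce(new[] { previous, mapped })
        : Reduce(new[] { mapped });
}

foreach (var name in new[] { "alice", "bob", "alice" })
    OnEvent(name);

foreach (var (customer, count) in state.Values)
    Console.WriteLine($"{customer}: {count}");
```

After each call only `state[customer]` survives; the mapped entry is thrown away, so memory grows with the number of distinct customers rather than with the number of events. Supporting updates or deletes would likely mean keeping those mapped entries around so they could be re-reduced, which is exactly the bookkeeping this model avoids.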

This makes the entire process so simple it is ridiculous. I am actually looking forward to doing this, if only because I had to dig through a lot of complexity to get RavenDB’s map/reduce indexes to where they are now.

More posts in "Raven Streams" series:

  1. (06 Jun 2013) What to do with the data?
  2. (05 Jun 2013) Aggregations–from the system point of view
  3. (04 Jun 2013) aggregations–how the user sees them

Comments

Brian Vallelunga

Will there be a time component to this system? What about creating an index that shows data for the last 5 minutes? Obviously that index would have to be updated automatically on the server. If something like this could be done, that might be very useful for metric tracking.

I believe this is what StreamInsight is used for and am just beginning to research the possibilities.

nuodb

Ayende, what is your opinion about http://www.nuodb.com/? It's not specifically related to the things you discuss in this series of posts, but they say it supports ACID, while RavenDB has eventual consistency...

Ayende Rahien

Brian, time-sensitive stuff is pretty important. As I mentioned, I have absolutely no idea whether this will go forward, but I would like to do it like that. If you just want an aggregation over the last N units of time, that should be pretty easy to do, I guess.

Ayende Rahien

RavenDB is ACID. I haven't looked deeply into NuoDB.

nuodb

I know that RavenDB is ACID too. What I'm trying to say is that NuoDB is ACID and consistent. What happens with consistency? I found this post interesting: http://lostechies.com/jimmybogard/2013/05/15/eventual-consistency-in-rest-apis/

Ayende Rahien

I think you are missing something. Even from a brief, cursory read of the docs, it was quite apparent that using NuoDB is not going to produce consistent results in failure modes, and good luck doing aggregation on a distributed network, or joins across it. There is a reason that model is just not going to work.

nuodb

I'm sure it's possible that I'm missing something ;) but, quoting from their site: "And of course, you can rely on NuoDB to provide 100% Atomic, Consistent, Isolated, and Durable (ACID) transactions." Doesn't this contradict the impression you got from the docs?

Ayende Rahien

That is pretty much marketing only. ACID transactions and consistency are two very different things. For example, you can't get distributed consistent reads across a cluster in the presence of a failure.
