Ayende @ Rahien

Raven Streams: Aggregations–from the system point of view

Previously we introduced the following index (not sure if that would be a good name, maybe aggregation or projection?):

    // Map: emit one entry per message
    from msg in messages
    select new
    {
        Customer = msg.From,
        Count = 1
    }

    // Reduce: sum the counts per customer
    from result in results
    group result by result.Customer
    into g
    select new
    {
        Customer = g.Key,
        Count = g.Sum(x => x.Count)
    }

Let us consider how it works from the point of view of the system. The easy case is when we have just one event to work on. We are going to run it through the map and then through the reduce, and that would be about it.

What about the next step? Well, on the second event, all we actually need to do is run it through the map, then run it and the previous result through the reduce. The only thing we need to remember is the final result. No need to remember all the pesky details in the middle, because we don’t have the notion of updates / deletes to require them.
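To make that concrete, here is a minimal sketch of the idea in plain C#. This is not Raven Streams code. The names `Map`, `Reduce`, `OnEvent` and `results`, as well as the sample senders, are assumptions made up for illustration. The only state it keeps is the latest reduce output.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical state: only the latest reduce output is kept, never the raw events.
var results = new List<(string Customer, int Count)>();

// Map: one message becomes one (Customer, Count = 1) entry.
(string Customer, int Count) Map(string sender) => (sender, 1);

// Reduce: the same grouping and summing as in the index definition.
List<(string Customer, int Count)> Reduce(IEnumerable<(string Customer, int Count)> entries) =>
    entries.GroupBy(e => e.Customer)
           .Select(g => (Customer: g.Key, Count: g.Sum(x => x.Count)))
           .ToList();

// For each new event: map it, then reduce it together with the previous result.
void OnEvent(string sender) =>
    results = Reduce(results.Append(Map(sender)));

foreach (var sender in new[] { "alice", "bob", "alice" })
    OnEvent(sender);

foreach (var r in results)
    Console.WriteLine($"{r.Customer}: {r.Count}");
```

After the three sample events, `results` holds two entries: a count of 2 for "alice" and 1 for "bob". The sketch re-reduces every stored entry on each event to stay short. A real implementation would presumably touch only the entry for the affected customer.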

This makes the entire process so simple it is ridiculous. I am actually looking forward to doing this, if only because I had to dig through a lot of complexity to get RavenDB’s map/reduce indexes to where they are now.

Posted By: Ayende Rahien

Comments

06/05/2013 02:57 PM by Brian Vallelunga

Will there be a time component to this system? What about creating an index that shows data for the last 5 minutes? Obviously that index would have to be updated automatically on the server. If something like this could be done, that might be very useful for metric tracking.

I believe this is what StreamInsight is used for and am just beginning to research the possibilities.

06/06/2013 08:00 AM by nuodb

Ayende, what is your opinion of http://www.nuodb.com/? It's not specifically related to the things you discuss in this series of posts, but they say that it supports ACID, and RavenDB has eventual consistency...

06/06/2013 12:07 PM by Ayende Rahien

Brian, time-sensitive stuff is pretty important. As I mentioned, I have absolutely no idea if / whether this will go forward, but I would like to do it like that. If you just want to get an aggregation over the last N units of time, that should be pretty easy to do, I guess.

06/06/2013 12:19 PM by Ayende Rahien

RavenDB is ACID. I haven't looked deeply into NuoDB.

06/07/2013 07:04 AM by nuodb

I know that RavenDB is ACID too. I was trying to say that NuoDB is ACID and consistent. What happens with consistency? I found this post interesting: http://lostechies.com/jimmybogard/2013/05/15/eventual-consistency-in-rest-apis/

06/07/2013 07:15 AM by Ayende Rahien

I think you are missing something. Even from a brief, cursory read of the docs, it was quite apparent that using NuoDB is not going to produce consistent results in failure modes, and good luck with doing aggregation on a distributed network, or joins across it. There is a reason that the model is just not going to work.

06/07/2013 09:14 AM by nuodb

I'm sure it is possible that I'm missing something ;) but, quoting from their site: "And of course, you can rely on NuoDB to provide 100% Atomic, Consistent, Isolated, and Durable (ACID) transactions." Doesn't this contradict the impression you got from the docs?

06/07/2013 06:35 PM by Ayende Rahien

That is pretty much marketing only. ACID transactions and consistency are two very different things. For example, you can't get distributed consistent reads across a cluster in the presence of a failure.
