Ayende @ Rahien

My name is Ayende Rahien
Founder of Hibernating Rhinos LTD and RavenDB.
You can reach me by phone or email:


+972 52-548-6969


Posts: 5,947 | Comments: 44,541


Raven Xyz: Trying out some ideas

One of the things that we are planning for Raven 3.0 is the introduction of additional products. In addition to RavenDB, we will also have RavenFS, a replicated file system with an eye toward very large files. But that isn’t what I want to talk about today. Today I would like to talk about something that is currently just in my head. I don’t even have a proper name for it yet.

Here is the deal: RavenDB is very good for data that you care about individually. Orders, customers, etc. You track, modify and work with each document independently. If you are writing a lot of data that isn’t really relevant on its own, but only as an aggregate, that is probably not a good use case for RavenDB.

Examples of such things include logs, click streams, event tracking, etc. The trivial example would be any reality show, where you have a lot of users sending messages to vote for a particular candidate, and you don’t really care about the individual data points, only the aggregate. You might also want to track how many items were sold in a particular period, broken down by region, etc.

The API that I had in mind would be something like:

   foo.Write(new PurchaseMade { Region = "Asia", Product = "products/1", Amount = 23 });
   foo.Write(new PurchaseMade { Region = "Europe", Product = "products/3", Amount = 3 });

And then you can write map/reduce statements on them like this:

   // map
   from purchase in purchases
   select new
   {
       purchase.Region,
       purchase.Product,
       purchase.Amount
   }

   // reduce
   from result in results
   group result by new { result.Region, result.Product }
   into g
   select new
   {
       g.Key.Region,
       g.Key.Product,
       Amount = g.Sum(x => x.Amount)
   }

Yes, this looks pretty much like you would have in RavenDB, but there are important distinctions:

  • We don’t allow modifying or deleting writes.
  • Most of the operations are assumed to be made on the result of the map/reduce statements.
  • The assumption is that you don’t really care for each data point.
  • There is going to be a lot of those data points, and they are likely to be coming in at a relatively high rate.
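For completeness, the read side might look something like this. To be clear, nothing here is final; the foo store, the index name PurchasesByRegionAndProduct, the PurchaseSummary type and the query API are all placeholders, since none of this has actually been designed yet:

   // Read side (hypothetical): queries run against the map/reduce
   // output, never against the individual data points.
   var results = foo.Query<PurchaseSummary>("PurchasesByRegionAndProduct")
       .Where(x => x.Region == "Asia")
       .ToList();

   foreach (var summary in results)
   {
       // Each result is an aggregate, not a raw write
       Console.WriteLine("{0} / {1}: {2}",
           summary.Region, summary.Product, summary.Amount);
   }

The point of the sketch is the shape of the API: you write raw data points, but you only ever query the aggregated results.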



Chris Marisic

I think this is an excellent idea.

I don't think I would use "Write" as the verb; that could easily confuse users (unless this isn't hanging off session anyway).

Maybe HighFrequencyWrite? I don't know, I'm struggling for terms here.

Ayende Rahien

Chris, As I said, that isn't something that has been decided; it is all a pretty nebulous concept right now.

Nic Wise

I very much like the idea, and I'd most likely use it right now if it was available (we do a limited amount of this in Raven already)

Not sure if it would be a feature of Raven, or a new product.... either way, tho...

Graeme Christie

Sounds like what you are describing is a raven event store... (As in the store for event sourcing patterns such as cqrs/es) Which I think would be a great idea... Having a raven event store that could project to a raven db for the domain model/ read side ... Using ravens own publish/subscribe model for consistency sounds really interesting ...

Piers Lawson

Nice idea.

Rather than "Write" what about "Record", and I think the XYZ you record objects in would be a "Log". Logs are well understood as a fast, write-only (i.e. no update), analyse-later concept.


What about something along the lines of a materialised view? Every write triggers a function that updates the view?


This is really exciting! something I missed using RavenDB and would use it right away to do analytics. To do these aggregations or queries over large datasets in the past I’ve been importing data into column databases or running Rhino-ETL jobs to aggregate data, very tedious. I could actually see a use for drilling down to see what data points an aggregate is built on.


This is a great idea, RavenES (Event Store? RavenStream?) where you can write and read streams of data related to one Id (ContextId? StreamId? - Log file, Aggregate Root, GPS coordinates, etc.) and aggregate the values in map/reduce. Each item related to an Id has a Revision/Sequence and it is a read-only, forward-only stream you can access. You could also access substreams (let's say log entries for a specific day, or an aggregate root's events up to a specific revision) but always in order.

What would be cool is if you could easily do an IEnumerable.Aggregate on a stream and it would run server side (for example, rebuild an Aggregate Root from an event stream), or even better run an aggregation and write the result to RavenDB as a document, something like CreateDocumentFromStream? For logs it would be building a stats document, for GPS location maybe an itinerary, etc.


I think this is a fantastic idea! This use case is exactly why we ended up not using RavenDB in our application. We need to log lots of information quickly, and then perform off-line ad-hoc queries against that data for statistical data regarding production runs.

Khalid Abuhakmeh

I like the idea, but inevitably people are going to be curious as to how they got a certain result. This means they'll want to dive into smaller subsections of the overall stream. The smallest subset would obviously be one document / item.

The idea is solid, but the execution will be more important.

Ayende Rahien

Nic, We are thinking about making this a separate product.

Ayende Rahien

Piers, Yes, we have had a bunch of discussions about this, and I think that might very well be what we end up calling this. Raven Log, and the method would be Append, or something like that.

Ayende Rahien

Matt, That is why I had the map/reduce there. Note that I dislike doing things on the write; better to do that in an async manner.

Ayende Rahien

Karhgath, That is pretty much what we had in mind there, yes. The aggregation is meant to be done in the map/reduce.

Ayende Rahien

Dave, I am not sure about ad hoc queries, that is something that is generally expensive :-)

Ayende Rahien

Khalid, You could get to the individual item, sure. But the question is why / what you would do with them.


"Ad-Hoc" might be a little too liberal of a term. We have well defined "types" of statistics that we need extracted, but the time-date range is what can shift (i.e. I need a report for last month, last week, last shift, etc)

Khalid Abuhakmeh

I guess the better question is what this product will allow you to do?

  1. will it let you see an evolution to the final result? You could do this if you had another mechanism for snapshots based on a frequency set in the map/reduce definition. This gives the developer the ability to set up some form of historical context to their data.

ex. On Monday we saw that we were up 20% from Tuesday. (Graphs).

  2. Will it let you see only the final result?

You could do this if you implemented it with snapshots, or without. Implementing it without snapshots would mean you would only ever know the final result.

Time is the context here, and you can either choose to say all results are in the present or embrace time into the architecture. You could do snapshots for the user, or let the user query and save snapshots into another system (RavenDB?) based on their own approaches: scripting, C# client, Ruby, etc.

This system would be perfect for the MarkedUp team (markedup.com). Maybe you should reach out to them and get their thoughts.


This sounds similar to EventStore. Rob Ashton did a recent blog series on using it. http://codeofrob.com/entries/playing-with-the-eventstore.html

How would this new product differ from EventStore?


I've been dying for something like this. Would happily buy it yesterday.


Did you just invent a bloated rrdtool?

Matt Johnson

This reminds me of my sensors sample. https://github.com/mj1856/RavenSensors. I'll echo the others by saying that time is of the essence. One thing Raven isn't good at is querying data over an arbitrary time range. You have to predetermine the granularity of the buckets. If you can improve on this in any way, it would be a big deal.

Ayende Rahien

Matt, Arbitrary time ranges are problematic, mostly because they mean that you have to process the entire date range to get something done.
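The usual way around that is to bake the bucket into the map itself, so the reduce pre-aggregates per fixed time slice. A sketch of what that might look like here (assuming each purchase carries an At timestamp, which isn't in the example above, and that the design would allow time fields in the map at all):

   // map - bucket each data point by hour (hypothetical 'At' field)
   from purchase in purchases
   select new
   {
       purchase.Region,
       purchase.Product,
       purchase.Amount,
       Hour = new DateTime(purchase.At.Year, purchase.At.Month,
                           purchase.At.Day, purchase.At.Hour, 0, 0)
   }

   // reduce - group on the hour as well, so a query for "last shift"
   // sums a handful of hourly buckets instead of scanning raw points
   from result in results
   group result by new { result.Region, result.Product, result.Hour }
   into g
   select new
   {
       g.Key.Region,
       g.Key.Product,
       g.Key.Hour,
       Amount = g.Sum(x => x.Amount)
   }

You still have to pick the granularity up front, which is exactly the limitation Matt is pointing at.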


Any guesstimate of when the product will be available?

I actually like the "write" verb, as the record is being written.

Ayende Rahien

Quinton, This is probably going to be in Raven 3.0


Any idea when Raven 3.0 will be available? Even as a beta version?

Ayende Rahien

Alex, RavenDB 3.0 is scheduled for Q1 2014

Comments have been closed on this topic.