Raven Xyz: Trying out some ideas
One of the things that we are planning for Raven 3.0 is the introduction of additional options. In addition to RavenDB, we will also have RavenFS, a replicated file system with an eye toward very large files. But that isn’t what I want to talk about today. Today I would like to talk about something that is currently just in my head; I don’t even have a proper name for it yet.
Here is the deal: RavenDB is very good for data that you care about individually. Orders, customers, etc. You track, modify and work with each document independently. If you are writing a lot of data that isn’t really relevant on its own, but only as an aggregate, that is probably not a good use case for RavenDB.
Examples of such things include logs, click streams, event tracking, etc. The trivial example would be any reality show, where you have a lot of users sending messages to vote for a particular candidate, and you don’t really care about the individual data points, only the aggregate. Another example would be tracking how many items were sold in a particular period, broken down by region.
The API that I had in mind would be something like:
foo.Write(new PurchaseMade { Region = "Asia", Product = "products/1", Amount = 23 });
foo.Write(new PurchaseMade { Region = "Europe", Product = "products/3", Amount = 3 });
And then you can write map/reduce statements on them like this:
// map
from purchase in purchases
select new
{
    purchase.Region,
    purchase.Product,
    purchase.Amount
}

// reduce
from result in results
group result by new { result.Region, result.Product } into g
select new
{
    g.Key.Region,
    g.Key.Product,
    Amount = g.Sum(x => x.Amount)
}
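For illustration only, the semantics of that reduce step are a plain group-by-and-sum. Here is a minimal Python sketch of what the server would compute; the field names follow the PurchaseMade example above, and nothing here is an actual Raven API:

```python
from collections import defaultdict

def reduce_purchases(mapped):
    # Group each mapped entry by the composite (Region, Product) key
    # and sum the amounts, mirroring the LINQ group/Sum pair.
    totals = defaultdict(int)
    for entry in mapped:
        key = (entry["Region"], entry["Product"])
        totals[key] += entry["Amount"]
    return [
        {"Region": region, "Product": product, "Amount": amount}
        for (region, product), amount in sorted(totals.items())
    ]

purchases = [
    {"Region": "Asia", "Product": "products/1", "Amount": 23},
    {"Region": "Europe", "Product": "products/3", "Amount": 3},
    {"Region": "Asia", "Product": "products/1", "Amount": 7},
]
print(reduce_purchases(purchases))
```

The point of the sketch is that only the grouped totals survive; the individual data points are not meant to be read back.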
Yes, this looks pretty much like you would have in RavenDB, but there are important distinctions:
- We don’t allow modifying or deleting writes.
- Most of the operations are assumed to be made on the result of the map/reduce statements.
- The assumption is that you don’t really care for each data point.
- There are going to be a lot of these data points, and they are likely to come in at a relatively high rate.
Thoughts?
Comments
I think this is an excellent idea.
I don't think I would use "Write" as the verb; that could easily confuse users (unless this isn't hanging off the session anyway).
Maybe HighFrequencyWrite? I don't know, I'm struggling for terms here.
Chris, As I said, that isn't something that has been decided; it is all a pretty nebulous concept right now.
I very much like the idea, and I'd most likely use it right now if it were available (we do a limited amount of this in Raven already)
Not sure if it would be a feature of Raven, or a new product.... either way, tho...
Sounds like what you are describing is a raven event store... (As in the store for event sourcing patterns such as cqrs/es) Which I think would be a great idea... Having a raven event store that could project to a raven db for the domain model/ read side ... Using ravens own publish/subscribe model for consistency sounds really interesting ...
Nice idea.
Rather than "Write", what about "Record"? And I think the XYZ you record objects in would be a "Log". Logs are well understood as a fast, write-only (i.e. no update), analyse-later concept.
What about something along the lines of a materialised view? Every write triggers a function that updates the view?
This is really exciting! something I missed using RavenDB and would use it right away to do analytics. To do these aggregations or queries over large datasets in the past I’ve been importing data into column databases or running Rhino-ETL jobs to aggregate data, very tedious. I could actually see a use for drilling down to see what data points an aggregate is built on.
This is a great idea, RavenES (Event Store? RavenStream?) where you can write and read streams of data related to one Id (ContextId? StreamId? - Log file, Aggregate Root, GPS coordinates, etc.) and aggregate the values in map/reduce. Each item related to an Id has a Revision/Sequence, and it is a read-only, forward-only stream you can access. You could also access substreams (let's say log entries for a specific day, or an aggregate root's events up to a specific revision) but always in order.
What would be cool is if you could easily do an IEnumerable.Aggregate on a stream and it would run server side (for example, rebuild an Aggregate Root from an event stream), or even better run an aggregation and write the result to RavenDB as a document, something like CreateDocumentFromStream? For logs it would be building a stats document, for GPS location maybe an itinerary, etc.
I think this is a fantastic idea! This use case is exactly why we ended up not using RavenDB in our application. We need to log lots of information quickly, and then perform off-line ad-hoc queries against that data for statistical data regarding production runs.
I like the idea, but inevitably people are going to be curious as to how they got a certain result. This means they'll want to dive into smaller subsections of the overall stream. The smallest subset would obviously be one document / item.
The idea is solid, but the execution will be more important.
Nic, We are thinking about making this a separate product.
Piers, Yes, we have had a bunch of discussions about this, and I think that might very well be what we end up calling this. Raven Log, and the method would be Append, or something like that.
Matt, That is why I had the map/reduce there. Note that I dislike doing things on the write, better to do that in an async manner.
Karhgath, That is pretty much what we had in mind there, yes. The aggregation is meant to be done in the map/reduce.
Dave, I am not sure about ad hoc queries, that is something that is generally expensive :-)
Khalid, You could get to the individual item, sure. But the question is why / what you would do with them.
"Ad-Hoc" might be a little too liberal of a term. We have well defined "types" of statistics that we need extracted, but the time-date range is what can shift (i.e. I need a report for last month, last week, last shift, etc)
I guess the better question is what this product will allow you to do?
ex. On Monday we saw that we were up 20% from Tuesday. (Graphs).
You could do this if you implemented it with snapshots, or without. Implementing it without snapshots would mean you would only ever know the final result.
Time is the context here, and you can either choose to say all results are in the present or embrace time into the architecture. You could do snapshots for the user, or let the user query and save snapshots into another system (RavenDB?) based on their own approaches: scripting, C# client, Ruby, etc.
This system would be perfect for the MarkedUp team (markedup.com). Maybe you should reach out to them and get their thoughts.
This sounds similar to EventStore. Rob Ashton did a recent blog series on using it. http://codeofrob.com/entries/playing-with-the-eventstore.html
How would this new product differ from EventStore?
I've been dying for something like this. Would happily buy it yesterday.
Did you just invent a bloated rrdtool?
This reminds me of my sensors sample. https://github.com/mj1856/RavenSensors. I'll echo the others by saying that time is of the essence. One thing Raven isn't good at is querying data over an arbitrary time range. You have to predetermine the granularity of the buckets. If you can improve on this in any way, it would be a big deal.
Matt, Arbitrary time ranges are problematic, mostly because they mean that you have to process the entire date range to get something done.
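To make the bucketing constraint in this exchange concrete, here is a hedged Python sketch (not a Raven API; all names are illustrative) of aggregating events into buckets whose granularity is fixed up front. A range query can then only be answered at bucket resolution, which is exactly why arbitrary ranges are problematic:

```python
from collections import defaultdict
from datetime import datetime

def bucket_key(ts: datetime) -> datetime:
    # Truncate the timestamp to the start of its hourly bucket.
    # The granularity is fixed when the aggregation is defined.
    return ts.replace(minute=0, second=0, microsecond=0)

def aggregate(events):
    # events: iterable of (timestamp, amount) pairs -> per-bucket totals.
    buckets = defaultdict(int)
    for ts, amount in events:
        buckets[bucket_key(ts)] += amount
    return dict(buckets)

def range_total(buckets, start, end):
    # Sums whole buckets; a range that cuts through the middle of a
    # bucket cannot be answered exactly from the aggregate alone.
    return sum(v for k, v in buckets.items() if start <= k < end)
```

Anything finer than the chosen bucket size would require reprocessing the raw data points, which is the expensive path the post is trying to avoid.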
Any guesstimate of when the product will be available?
I actually like the "write" verb, as the record is being written.
Quinton, This is probably going to be in Raven 3.0
Any idea when Raven 3.0 will be available? Even as a beta version?
Alex, RavenDB 3.0 is scheduled for Q1 2014