Ayende @ Rahien

Hi!
My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.

The next BIG Thing in RavenDB 1.2

Is something that you probably wouldn’t even notice, to be perfectly honest. We are going to work on our map/reduce implementation.

This is freakishly complex, because we need to do updatable, persistent map/reduce. It got so complex that I decided I couldn’t really implement this on my own in the RavenDB solution, and moved to spiking the solution in isolation.

If you care, you can look at the spike itself (the link is at the end of this post). There would be additional optimizations to worry about in RavenDB, but it is pretty much all there, in fewer than 400 lines of code.

I couldn’t find anything out there which was nearly as useful. Most of the map/reduce implementations are about distributing the work load and scheduling it. None of them really deal with the notion of updatable map/reduce results.

Note that the Storage layer there does two jobs: it exists mainly to show that we can persist and restart from any point, but it also contains some critical behavior (for example, scheduling).
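To make "updatable map/reduce results" a bit more concrete, here is a minimal sketch of the idea in Python. It is not taken from the spike (which is C#) and makes no claim about how the spike or RavenDB does it. It assumes the simplest approach: remember what each document mapped to, and when a document changes, re-reduce only the keys it touched. The class name, the `put`/`delete` methods, the `reduce_calls` counter and the sample order documents are all hypothetical. Everything lives in memory, so persistence and scheduling are left out entirely.

```python
from collections import defaultdict


class UpdatableMapReduce:
    """In-memory sketch: re-reduces only the keys touched by a change."""

    def __init__(self, map_fn, reduce_fn):
        self.map_fn = map_fn              # doc -> iterable of (key, value)
        self.reduce_fn = reduce_fn        # list of values -> reduced value
        self.mapped = {}                  # doc_id -> list of (key, value)
        self.by_key = defaultdict(dict)   # key -> {doc_id: [values]}
        self.results = {}                 # key -> reduced value
        self.reduce_calls = 0             # illustration only: counts re-reductions

    def put(self, doc_id, doc):
        """Insert or update a document, then refresh only the affected keys."""
        dirty = self._remove_mapped(doc_id)
        pairs = list(self.map_fn(doc))
        self.mapped[doc_id] = pairs
        for key, value in pairs:
            self.by_key[key].setdefault(doc_id, []).append(value)
            dirty.add(key)
        self._reduce(dirty)

    def delete(self, doc_id):
        """Remove a document and refresh the keys it used to contribute to."""
        self._reduce(self._remove_mapped(doc_id))

    def _remove_mapped(self, doc_id):
        # Drop the old mapped entries of this document and report their keys.
        dirty = set()
        for key, _ in self.mapped.pop(doc_id, []):
            self.by_key[key].pop(doc_id, None)
            dirty.add(key)
        return dirty

    def _reduce(self, keys):
        # Recompute the result for each dirty key; drop keys with no values left.
        for key in keys:
            values = [v for vs in self.by_key[key].values() for v in vs]
            if values:
                self.results[key] = self.reduce_fn(values)
                self.reduce_calls += 1
            else:
                self.results.pop(key, None)
                self.by_key.pop(key, None)


def map_order(order):
    # Hypothetical map function: total order value per country.
    yield order["country"], order["total"]


index = UpdatableMapReduce(map_order, sum)
index.put("orders/1", {"country": "IL", "total": 10})
index.put("orders/2", {"country": "US", "total": 5})
index.put("orders/3", {"country": "IL", "total": 7})
index.put("orders/3", {"country": "US", "total": 7})  # update: touches IL and US only
index.delete("orders/1")                               # IL has no values left
print(index.results)
```

Note that this sketch re-reduces every value under a dirty key, which gets expensive when a single key has many entries. It also loses all state on restart, which is exactly the part the Storage layer in the spike is there to demonstrate.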

I’ll probably do a set of posts about this, but for now, here is the source, have fun poking at it: https://github.com/ayende/updatable-persistent-map-reduce


Comments

Brian

Really?

static void Main()
{
    foreach (var directory in Directory.GetDirectories("."))
    {
        try
        {
            Directory.Delete(directory, true);
        }
        catch (Exception)
        {
        }
        <snip...>
}

Nicolas

Take a look at this: https://github.com/nathanmarz/storm/

Maybe it is not what you are looking for, or in the same language for that matter, but maybe it can give you some ideas... Or maybe not!

Daniel Lang

One shouldn't run this exe from C:/ with admin privileges... lol

Ayende Rahien

Brian, Yes... ? This is there to make sure that we clear old results from the previous run.

Starfish

I trust there are no "breaking" changes?

Ayende Rahien

Starfish, Not outward facing, not from this.

Brian

Oren, then might I suggest you at least put some sanity check around that block. Even though this is not production code, that's just leaving a loaded gun lying around (it's irresponsible coding and if you were reviewing any code that did this I'll bet you'd have blasted the author, too).

Ayende Rahien

Brian, That code is there to make my life easier. This project is there as a POC. I am not going to worry about it.

Matt Warren

"I couldn’t find anything out there which was nearly as useful. Most of the map/reduce implementations are about distributing the work load and scheduling it. None of them really deal with the notion of updatable map/reduce results."

You might want to take a look at Percolator, it's Google's enhancement to Map/Reduce to solve this exact issue. I.e. so it doesn't have to re-create its index from scratch every 2 weeks, it can do incremental updates.

See http://www.theregister.co.uk/2010/09/24/googlepercolator/ and http://static.googleusercontent.com/externalcontent/untrusted_dlcp/research.google.com/en//pubs/archive/36726.pdf

It might have some ideas that could be used in RavenDB

Ayende Rahien

Matt, I was mostly interested in going over source code, to see how they split & merge the work.
