﻿<?xml version="1.0" encoding="utf-8"?><rss version="2.0"><channel><title>Ayende @ Rahien</title><link>http://ayende.com</link><description>Ayende @ Rahien</description><copyright>Copyright (C) Ayende Rahien  2004 - 2021 (c) 2026</copyright><ttl>60</ttl><item><title>Ayende Rahien commented on Improving Map/Reduce performance in RavenDB</title><description>ppatterson,
Yes, we are persisting the intermediate results.
We use the term bucket for one of the sections that the data for a single key is split into. Batch &amp; bucket are probably the same thing.
A bucket is local to a key.
We have 100 buckets for CA at level 1, for example, each containing some data.
Then we have 10 buckets for CA at level 2, each containing the reduced data from level 1.
Then we have the final result for CA.
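
The layout above can be sketched roughly as follows. This is an illustrative Python sketch, not RavenDB's actual code: the helper names (split_into_buckets, reduce_bucket, multi_level_reduce), the round-robin bucket assignment, the plain-sum reduce, and the sample data are all assumptions. Only the bucket counts (100 at level 1, 10 at level 2) come from the example above.

```python
# Hypothetical sketch of a three-level reduce for a single key ("CA").
# The bucket counts (100 and 10) mirror the example above; everything else
# (names, round-robin assignment, sum as the reduce) is an assumption.

def split_into_buckets(items, bucket_count):
    """Assign items to bucket_count buckets by position (round-robin)."""
    buckets = [[] for _ in range(bucket_count)]
    for index, item in enumerate(items):
        buckets[index % bucket_count].append(item)
    return buckets


def reduce_bucket(values):
    """The reduce step here is a plain sum; any associative reduce would do."""
    return sum(values)


def multi_level_reduce(mapped_values):
    # Level 1: 100 buckets, each holding some of the mapped data for the key.
    level1 = [reduce_bucket(b) for b in split_into_buckets(mapped_values, 100)]
    # Level 2: 10 buckets, each holding reduced results from level 1.
    level2 = [reduce_bucket(b) for b in split_into_buckets(level1, 10)]
    # Final result for the key.
    return level1, level2, reduce_bucket(level2)


mapped_values_for_ca = [1] * 5000  # assumed sample: one mapped entry per person
level1, level2, total = multi_level_reduce(mapped_values_for_ca)
print(len(level1), len(level2), total)
```

In a layout like this, if the intermediate results are kept around, a change that lands in one level 1 bucket should only require re-reducing that bucket, the level 2 bucket it feeds, and the final result.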
</description><link>http://ayende.com/157889/improving-map-reduce-performance-in-ravendb#comment4</link><guid>http://ayende.com/157889/improving-map-reduce-performance-in-ravendb#comment4</guid><pubDate>Wed, 22 Aug 2012 09:00:08 GMT</pubDate></item><item><title>Ayende Rahien commented on Improving Map/Reduce performance in RavenDB</title><description>avolkov,
This is really interesting, but not very useful for our needs. The way CouchDB stores the results internally is _quite_ different from the way we do.</description><link>http://ayende.com/157889/improving-map-reduce-performance-in-ravendb#comment3</link><guid>http://ayende.com/157889/improving-map-reduce-performance-in-ravendb#comment3</guid><pubDate>Wed, 22 Aug 2012 08:57:37 GMT</pubDate></item><item><title>ppatterson commented on Improving Map/Reduce performance in RavenDB</title><description>So basically this is persisting the intermediate steps of the map/reduce process? For the California example there are 37 million input elements scattered throughout the census data, which has around 300 million entries (assuming most people filled out a census). As this is processed, the reduce phase should have separated these out into individual pieces with intermediate sums. Going by your original post from 2010, one of the batches might have 15 million, another 100K, etc., depending on which states happened to be sent to whichever batch.

So is the concept of a bucket then to be seen as the equivalent of one of the batches created in map/reduce (before the final batch when all the keys are unique and reducing is completed)?

Or is it that a bucket holds specific keys, so bucket 1 may contain all the California data and bucket 2 all the Texas data, for example?

I've read this post several times over and I can't wrap my head around what is going on at all. </description><link>http://ayende.com/157889/improving-map-reduce-performance-in-ravendb#comment2</link><guid>http://ayende.com/157889/improving-map-reduce-performance-in-ravendb#comment2</guid><pubDate>Tue, 21 Aug 2012 19:55:46 GMT</pubDate></item><item><title>avolkov commented on Improving Map/Reduce performance in RavenDB</title><description>Re: updates, did you see how CouchDB uses B-tree indexes to persist their intermediate reduce results? http://guide.couchdb.org/draft/views.html#reduce ?</description><link>http://ayende.com/157889/improving-map-reduce-performance-in-ravendb#comment1</link><guid>http://ayende.com/157889/improving-map-reduce-performance-in-ravendb#comment1</guid><pubDate>Tue, 21 Aug 2012 19:21:51 GMT</pubDate></item></channel></rss>