Ayende @ Rahien

http://ayende.com
Copyright (C) Ayende Rahien 2004 - 2021 (c) 2026

Ayende Rahien commented on Designing a document database: Aggregation Recalculating
Sun, 15 Mar 2009 14:35:04 GMT
http://ayende.com/3908/designing-a-document-database-aggregation-recalculating#comment11

Chris, right now I am not yet considering how to actually get the entire document / view space distributed; it looks much easier to simply replicate things with a sharding algorithm.

Mr_Simple commented on Designing a document database: Aggregation Recalculating
Sun, 15 Mar 2009 13:42:14 GMT
http://ayende.com/3908/designing-a-document-database-aggregation-recalculating#comment10

@Ayende "Yes, the implementation _would_ be challenging, but not complex, just hard." I agree with complex but not hard. I often tell my clients exactly that. Programming should never be measured in simple or hard. The variable of time is much more useful, and as time holds all solutions, programmers simply have time to solve an issue or not. Length of time determines cost and whether the solution can afford to be found.

Chris Wright commented on Designing a document database: Aggregation Recalculating
Sun, 15 Mar 2009 13:30:34 GMT
http://ayende.com/3908/designing-a-document-database-aggregation-recalculating#comment9

You also have duplication. If you want to be able to read duplicated data for added efficiency rather than just keeping it as backups, you might decide not to record which copy of the data is considered real; maybe all copies are the real copy. But when you want to do this kind of map/reduce thing, you need to know whether to include a given entry in the results, and duplicates should be excluded. This means, though, that when a node goes down, you have to discover that fact and select another node that contains a copy of its non-duplicate data to replace it.

The alternative is to write your queries in such a way that duplicates can be resolved by the client, but that really isn't the client's concern, and it's inefficient.

Ayende Rahien commented on Designing a document database: Aggregation Recalculating
Sun, 15 Mar 2009 04:28:44 GMT
http://ayende.com/3908/designing-a-document-database-aggregation-recalculating#comment8

configurator, that is a great question. I don't really know.

josh commented on Designing a document database: Aggregation Recalculating
Sun, 15 Mar 2009 03:45:31 GMT
http://ayende.com/3908/designing-a-document-database-aggregation-recalculating#comment7

For those like me who aren't as familiar with MapReduce:
[http://labs.google.com/papers/mapreduce.html](http://labs.google.com/papers/mapreduce.html)
[http://en.wikipedia.org/wiki/MapReduce](http://en.wikipedia.org/wiki/MapReduce)
I only read the Google page because it made sense to me after that, but the wiki page looks like it covers a little more detail.

configurator commented on Designing a document database: Aggregation Recalculating
Sat, 14 Mar 2009 21:49:22 GMT
http://ayende.com/3908/designing-a-document-database-aggregation-recalculating#comment6

Is this the map-reduce algorithm used by Google? What data would you return to the user while aggregation is being done?

Ayende Rahien commented on Designing a document database: Aggregation Recalculating
Sat, 14 Mar 2009 21:10:43 GMT
http://ayende.com/3908/designing-a-document-database-aggregation-recalculating#comment5

Rafal, that is why I specified that the aggregation is done as part of a background process. That way, you can still serve requests while maintaining the perf of the server.

Evgeny, yes, the implementation _would_ be challenging, but not complex, just hard.

Evgeny Kobzev commented on Designing a document database: Aggregation Recalculating
Sat, 14 Mar 2009 20:22:15 GMT
http://ayende.com/3908/designing-a-document-database-aggregation-recalculating#comment4

"...a challenging task to build, but from design perspective, it looks pretty straightforward." I think implementation is the main problem here: failure of a reducing node during calculation, and so on. But the idea looks good, thank you for the post :) We had an interesting discussion about it on Friday :)

Rafal commented on Designing a document database: Aggregation Recalculating
Sat, 14 Mar 2009 17:45:56 GMT
http://ayende.com/3908/designing-a-document-database-aggregation-recalculating#comment3

You're right, RDBMS-based systems usually have problems with data aggregation; that's why we're using separate report databases for larger applications. Aggregations done in a transactional system are too heavy for the database server. Such systems also usually don't cache query results or partial results, and they perform the aggregation each time data is requested. So map-reduce with automatic caching of partial results would help in such cases. Example: a task management system where each user and group of users has its own 'inbox' for keeping a todo list, and each user has their own dashboard with statistics. If you want to calculate statistics for each logged-in user based on raw transactional data, you'll probably kill the database server.

Ayende Rahien commented on Designing a document database: Aggregation Recalculating
Sat, 14 Mar 2009 16:47:14 GMT
http://ayende.com/3908/designing-a-document-database-aggregation-recalculating#comment2

Rafal, reporting scenarios are a major consideration, certainly. But it is not just that; there are numerous reasons to want to be able to do aggregation in most systems. Look at the right side of the blog: you see the category list and the monthly list? Those are aggregations. In many scenarios, it is important to be able to do so as efficiently as possible. Leaving that aside, a good reporting story is pretty important, don't you think? I have a possible scenario of having to handle _lots_ of small databases, mostly with reports on them.

Rafal commented on Designing a document database: Aggregation Recalculating
Sat, 14 Mar 2009 16:35:41 GMT
http://ayende.com/3908/designing-a-document-database-aggregation-recalculating#comment1

Okay, map-reduce is very spectacular and appealing, but can you please describe some real-world problem solved using map-reduce on documents? In typical business applications you usually perform operations on single entities and don't aggregate them. Aggregation is usually done when reporting and involves a separate report database or OLAP system. I think map-reduce can be used for indexing document data. Is that the main reason why you are writing about it?
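
For readers following the thread, here is a minimal Python sketch of the idea Rafal raises: map-reduce with cached partial results, applied to the per-category post count Ayende mentions. It is not the design from the post. The names `CachedAggregation`, `update_batch`, and `posts_by_category`, the batch identifiers, and the document shape are all assumptions made up for illustration.

```python
from collections import defaultdict


def posts_by_category(doc):
    """Hypothetical map function: emit (category, 1) for each post."""
    yield doc["category"], 1


class CachedAggregation:
    """Sums mapped values per key, caching one partial result per batch.

    Only the batch whose documents changed is re-mapped; the other
    cached partials are reused when the final result is reduced.
    """

    def __init__(self, map_fn):
        self.map_fn = map_fn
        self.partials = {}  # batch_id -> {key: partial sum}

    def update_batch(self, batch_id, docs):
        # Recompute and cache the partial result for a single batch.
        counts = defaultdict(int)
        for doc in docs:
            for key, value in self.map_fn(doc):
                counts[key] += value
        self.partials[batch_id] = dict(counts)

    def result(self):
        # Reduce step: combine the cached partials into the final answer.
        total = defaultdict(int)
        for partial in self.partials.values():
            for key, value in partial.items():
                total[key] += value
        return dict(total)


agg = CachedAggregation(posts_by_category)
agg.update_batch("batch-1", [{"category": "databases"}, {"category": "design"}])
agg.update_batch("batch-2", [{"category": "databases"}])
print(agg.result())
```

In a real server, `update_batch` would presumably run in the background process Ayende describes, so that requests keep being served from the last reduced result while a changed batch is being recomputed.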