﻿<?xml version="1.0" encoding="utf-8"?><rss version="2.0"><channel><title>Ayende @ Rahien</title><link>http://ayende.com</link><description>Ayende @ Rahien</description><copyright>Copyright (C) Ayende Rahien  2004 - 2021 (c) 2026</copyright><ttl>60</ttl><item><title>Stan commented on NHibernate Shards: Progress Report</title><description>how do i help to port Hyberrnate Shards to  with NH Shards?
  
who i i contact?
</description><link>http://ayende.com/4252/nhibernate-shards-progress-report#comment20</link><guid>http://ayende.com/4252/nhibernate-shards-progress-report#comment20</guid><pubDate>Wed, 16 Dec 2009 20:31:52 GMT</pubDate></item><item><title>Agile Jedi commented on NHibernate Shards: Progress Report</title><description>If I want the first 10 records from a sorted query on a table with 10,000,000 records it would be much more performant to sort on the database.
  
  
This is a common operation when paging results.
  
  
I'd say that DB sorting is quicker....now if you do the sort and pull only primary keys...then grab the top 10 records using primary key lookup...you may have a better result with in memory sort.
</description><link>http://ayende.com/4252/nhibernate-shards-progress-report#comment19</link><guid>http://ayende.com/4252/nhibernate-shards-progress-report#comment19</guid><pubDate>Tue, 03 Nov 2009 23:30:57 GMT</pubDate></item><item><title>Evgeny Shapiro commented on NHibernate Shards: Progress Report</title><description>Nadav,
  
  
If you want to export much data NHibernate is probably the wrong tool to go with. ETL is the right answer.
  
  
The constraints of the applicability of NHibernate make inmemory sorting as good as merged sort (that is beside extreme cases).
</description><link>http://ayende.com/4252/nhibernate-shards-progress-report#comment18</link><guid>http://ayende.com/4252/nhibernate-shards-progress-report#comment18</guid><pubDate>Sun, 25 Oct 2009 11:04:41 GMT</pubDate></item><item><title>Nadav commented on NHibernate Shards: Progress Report</title><description>Actually, I think that oren's version will probably be faster too for most cases (and the priority queue solution might be nice in theory, but the added complexity makes it not worth it :) ).
  
  
I think the real issue with the in memory version is with scalabilty.
  
  
I mean, what if the user wants to export the last 5 years worth of history to a text file? That can be millions of records. I don't want to keep all that in memory so I can sort it.
  
  
Nadav
</description><link>http://ayende.com/4252/nhibernate-shards-progress-report#comment17</link><guid>http://ayende.com/4252/nhibernate-shards-progress-report#comment17</guid><pubDate>Tue, 20 Oct 2009 06:36:37 GMT</pubDate></item><item><title>Jeremy Gray commented on NHibernate Shards: Progress Report</title><description>So, how about this: The merge fans acknowledge that Oren's version is simpler, which has value, Oren acknowledges that the merge version is faster, which has value, and the merge fans submit a patch when all is said and done. :)
</description><link>http://ayende.com/4252/nhibernate-shards-progress-report#comment16</link><guid>http://ayende.com/4252/nhibernate-shards-progress-report#comment16</guid><pubDate>Mon, 19 Oct 2009 14:21:35 GMT</pubDate></item><item><title>Ayende Rahien commented on NHibernate Shards: Progress Report</title><description>Nadav,
  
Which ends up being a lot of brain power for just getting the data, sorting it using the builtin methods, and moving on to actually producing value :-)
</description><link>http://ayende.com/4252/nhibernate-shards-progress-report#comment15</link><guid>http://ayende.com/4252/nhibernate-shards-progress-report#comment15</guid><pubDate>Mon, 19 Oct 2009 13:28:12 GMT</pubDate></item><item><title>Nadav commented on NHibernate Shards: Progress Report</title><description>Merging sorted is O(m*n) when m is the number of shards.
  
Sorting in memory is O(n*log2(n)).
  
So if m
&lt;log2(n)&gt;
So if we have 8 shards and 10000 records then 
  
m=8, log2(n)=13(more or less).
  
  
The other good thing about the merge algorithm is that 
  
you don't need to read all the data into memory to start returning results.
  
If you implement the MergeSortedList to return an IEnumerable then you just need to get the first item from each shard to return the first item.
  
  
BTW, I don't think you need to sort the shards, each time you just need the shard with the smallest first Item, so you can use a priority queue,
  
which , if I remember correctly, is O(log(m)) for updating one item,
  
So if you put all the enumerators in a priority queue (ordered on the value of the Current Item) and then just remove the first item from the priority queue, return the Current item, Execue MoveNext() and return the enumerator to the queue if it is not empty, You should have an algorithm that works with O(log(m)*n) which should be alot better than O(log(n)*n).
</description><link>http://ayende.com/4252/nhibernate-shards-progress-report#comment14</link><guid>http://ayende.com/4252/nhibernate-shards-progress-report#comment14</guid><pubDate>Mon, 19 Oct 2009 12:45:19 GMT</pubDate></item><item><title>configurator commented on NHibernate Shards: Progress Report</title><description>Of course you won't pull millions of items. It's just impossible to see the difference with less data - and if we're talking about a server that servers millions of requests, each with 100 items (which, like you said, is practically no data), it would matter.
</description><link>http://ayende.com/4252/nhibernate-shards-progress-report#comment13</link><guid>http://ayende.com/4252/nhibernate-shards-progress-report#comment13</guid><pubDate>Sun, 18 Oct 2009 20:19:49 GMT</pubDate></item><item><title>Ayende Rahien commented on NHibernate Shards: Progress Report</title><description>You are never going to pull millions of items.
</description><link>http://ayende.com/4252/nhibernate-shards-progress-report#comment12</link><guid>http://ayende.com/4252/nhibernate-shards-progress-report#comment12</guid><pubDate>Sun, 18 Oct 2009 20:02:07 GMT</pubDate></item><item><title>configurator commented on NHibernate Shards: Progress Report</title><description>With a slight change (chaging SmallerThan to comparer.Compare and adding a Comparer
&lt;t comparer = Comparer
&lt;t.Default line) the merge's times were better than List
&lt;t.Sort. My tests don't include the database sorting time, and I haven't really bothered with optimising this code, but my times were:
  
Merge time: 3562
  
Sort time: 6236
  
This was done with four lists of 10,000,000 integers each.
&gt;</description><link>http://ayende.com/4252/nhibernate-shards-progress-report#comment11</link><guid>http://ayende.com/4252/nhibernate-shards-progress-report#comment11</guid><pubDate>Sun, 18 Oct 2009 17:38:11 GMT</pubDate></item><item><title>Ayende Rahien commented on NHibernate Shards: Progress Report</title><description>Configurator,
  
I am willing to lay odds that you wouldn't be able to get this to perform faster than.
  
Array.Sort or List.Sort that are already in the BCL.
  
</description><link>http://ayende.com/4252/nhibernate-shards-progress-report#comment10</link><guid>http://ayende.com/4252/nhibernate-shards-progress-report#comment10</guid><pubDate>Sun, 18 Oct 2009 16:28:58 GMT</pubDate></item><item><title>configurator commented on NHibernate Shards: Progress Report</title><description>This data is not mostly sorted. It's several sorted list - and for that you have a simple algorithm that scales well over any amount of data as long as there aren't too many shards - it's approximately an O(n*m) algorithm where n is the total data and m is the number of shards.
  
  
		List
&lt;t MergeSortedLists
&lt;t(params IEnumerable
&lt;t[] enumerables) {
  
			// How this works:
  
			// Stores a list of the enumerators
  
			// For each item, finds the minimum 'Current' of all enumerators, then preforms MoveNext() on the enumerator whose item has been selected
  
			// Once an enumerator is finished, it is removed from the enumerator list
  
			// Once the list is empty, we're done.
  
  
			// First, create a list of enumerators, with one MoveNext() to get to the first item
  
			List
&lt;ienumerator&lt;t&gt; enumerators = new List
&lt;ienumerator&lt;t&gt;();
  
			foreach (IEnumerable
&lt;t enumerable in enumerables) {
  
				IEnumerator
&lt;t e = enumerable.GetEnumerator();
  
				if (e.MoveNext())
  
					enumerators.Add(e);
  
				else
  
					// If there is no first item, we don't bother adding the enumerator to the list
  
					e.Dispose();
  
			}
  
  
  
			List
&lt;t result = new List
&lt;t();
  
			while (enumerators.Count &gt; 0) {
  
				// compare current item in each enumerator and choose the first one
  
				IEnumerator
&lt;t next = enumerators[0];
  
				for (int i = 1; i &lt; enumerators.Count; i++) {
  
					if (SmallerThan(enumerators[i].Current, next.Current))
  
						next = enumerators[i];
  
				}
  
  
				// current item on selected enumerator is added to the result
  
				result.Add(next.Current);
  
  
				// move next on that enumerator and get rid of it if we're done
  
				if (!next.MoveNext()) {
  
					enumerators.Remove(next);
  
					next.Dispose();
  
				}
  
			}
  
  
			return result;
  
		}
  
&gt;</description><link>http://ayende.com/4252/nhibernate-shards-progress-report#comment9</link><guid>http://ayende.com/4252/nhibernate-shards-progress-report#comment9</guid><pubDate>Sun, 18 Oct 2009 16:24:52 GMT</pubDate></item><item><title>Ayende Rahien commented on NHibernate Shards: Progress Report</title><description>Configurator,
  
[stackoverflow.com/.../which-sort-algorithm-work...](http://stackoverflow.com/questions/220044/which-sort-algorithm-works-best-on-mostly-sorted-data)  
  
There are some algorithms that performs horrible on nearly sorted data.
  
Quicksort, in particular, may get on O(n^2) on sorted data.
  
Since we aren't dealing with large amounts of data, it is quick to sort them in memory, it is a step we are going to take anyway, so why let the DB do it?
  
120 is no data, practically
</description><link>http://ayende.com/4252/nhibernate-shards-progress-report#comment8</link><guid>http://ayende.com/4252/nhibernate-shards-progress-report#comment8</guid><pubDate>Sun, 18 Oct 2009 16:01:11 GMT</pubDate></item><item><title>configurator commented on NHibernate Shards: Progress Report</title><description>But database sorts are cheaper, because they are indexed... Am I missing something here?
  
  
Suppose we have a large amount of data - say 120 records, sharded into 3 shards and we want it sorted by an indexed field. Now we get a bunch of records and sort them ourselves. But we could use the (fast) database indexed sort, and then combine the lists ourself quite quickly.
</description><link>http://ayende.com/4252/nhibernate-shards-progress-report#comment7</link><guid>http://ayende.com/4252/nhibernate-shards-progress-report#comment7</guid><pubDate>Sun, 18 Oct 2009 15:53:07 GMT</pubDate></item><item><title>Ayende Rahien commented on NHibernate Shards: Progress Report</title><description>Fabio,
  
You are absolutely right!
  
I updated the post
</description><link>http://ayende.com/4252/nhibernate-shards-progress-report#comment6</link><guid>http://ayende.com/4252/nhibernate-shards-progress-report#comment6</guid><pubDate>Sun, 18 Oct 2009 15:51:25 GMT</pubDate></item><item><title>Ayende Rahien commented on NHibernate Shards: Progress Report</title><description>configurator,
  
Because you are going to have to sort them anyway. At that point, it is cheaper to let the DB just stream it to us and we will sort them in memory.
  
</description><link>http://ayende.com/4252/nhibernate-shards-progress-report#comment5</link><guid>http://ayende.com/4252/nhibernate-shards-progress-report#comment5</guid><pubDate>Sun, 18 Oct 2009 15:49:09 GMT</pubDate></item><item><title>Fabio Maulo commented on NHibernate Shards: Progress Report</title><description>Thanks to Dario Quintana to put a lot of effort developing NHibernate.Shards and thanks to the others developers have sent patch in the last few weeks.
</description><link>http://ayende.com/4252/nhibernate-shards-progress-report#comment4</link><guid>http://ayende.com/4252/nhibernate-shards-progress-report#comment4</guid><pubDate>Sun, 18 Oct 2009 15:48:15 GMT</pubDate></item><item><title>configurator commented on NHibernate Shards: Progress Report</title><description>How come it's cheaper to do the sorting in memory than to allow the database to do it for each shard? Combining sorted list is an O(n) operation.
</description><link>http://ayende.com/4252/nhibernate-shards-progress-report#comment3</link><guid>http://ayende.com/4252/nhibernate-shards-progress-report#comment3</guid><pubDate>Sun, 18 Oct 2009 15:43:23 GMT</pubDate></item><item><title>Ayende Rahien commented on NHibernate Shards: Progress Report</title><description>Brian,
  
Since with DDD you only ever access stuff from the root, only aggregates are required to have shared id.
  
BTW, ShardUUIDGenerator isn't the only one that you can use, you can use other sharding aware id generators.
  
An aggregates all reside on a single shard.
  
</description><link>http://ayende.com/4252/nhibernate-shards-progress-report#comment2</link><guid>http://ayende.com/4252/nhibernate-shards-progress-report#comment2</guid><pubDate>Sun, 18 Oct 2009 14:57:52 GMT</pubDate></item><item><title>Brian Hartsock commented on NHibernate Shards: Progress Report</title><description>  
First off, awesome.
  
  
Second, I was wondering about the ShardResolutionStrategy.  How is this going to work with associations and aggregate roots?  I would think only the aggregate root needs the ShardUUIDGenerator, but it seems as though that assumption is wrong.  Would aggregates have their own generator and be spread out between multiple databases for the same root?
  
</description><link>http://ayende.com/4252/nhibernate-shards-progress-report#comment1</link><guid>http://ayende.com/4252/nhibernate-shards-progress-report#comment1</guid><pubDate>Sun, 18 Oct 2009 14:54:45 GMT</pubDate></item></channel></rss>