Ayende @ Rahien

Stan commented on NHibernate Shards: Progress Report

Wed, 16 Dec 2009 20:31:52 GMT

How do I help port Hibernate Shards to NH Shards? Who do I contact?

Agile Jedi commented on NHibernate Shards: Progress Report

Tue, 03 Nov 2009 23:30:57 GMT

If I want the first 10 records from a sorted query on a table with 10,000,000 records, it would be much more performant to sort in the database. This is a common operation when paging results. I'd say that DB sorting is quicker... Now, if you do the sort and pull only primary keys, then grab the top 10 records using a primary key lookup, you may get a better result with an in-memory sort.

Evgeny Shapiro commented on NHibernate Shards: Progress Report

Sun, 25 Oct 2009 11:04:41 GMT

Nadav, if you want to export a lot of data, NHibernate is probably the wrong tool to go with. ETL is the right answer. Within the constraints where NHibernate is applicable, in-memory sorting is as good as a merge sort (extreme cases aside).

Nadav commented on NHibernate Shards: Progress Report

Tue, 20 Oct 2009 06:36:37 GMT

Actually, I think that Oren's version will probably be faster too for most cases (and the priority queue solution might be nice in theory, but the added complexity makes it not worth it :) ). I think the real issue with the in-memory version is scalability. I mean, what if the user wants to export the last 5 years' worth of history to a text file? That can be millions of records. I don't want to keep all that in memory so I can sort it. Nadav

Jeremy Gray commented on NHibernate Shards: Progress Report

Mon, 19 Oct 2009 14:21:35 GMT

So, how about this: The merge fans acknowledge that Oren's version is simpler, which has value, Oren acknowledges that the merge version is faster, which has value, and the merge fans submit a patch when all is said and done. :)

Ayende Rahien commented on NHibernate Shards: Progress Report

Mon, 19 Oct 2009 13:28:12 GMT

Nadav, Which ends up being a lot of brain power for just getting the data, sorting it using the builtin methods, and moving on to actually producing value :-)

Nadav commented on NHibernate Shards: Progress Report

Mon, 19 Oct 2009 12:45:19 GMT

Merging sorted lists is O(m*n), where m is the number of shards. Sorting in memory is O(n*log2(n)). So if m < log2(n), merging is faster. So if we have 8 shards and 10000 records, then m=8 and log2(n)=13 (more or less). The other good thing about the merge algorithm is that you don't need to read all the data into memory to start returning results. If you implement the MergeSortedList to return an IEnumerable, then you just need to get the first item from each shard to return the first item. BTW, I don't think you need to sort the shards. Each time, you just need the shard with the smallest first item, so you can use a priority queue, which, if I remember correctly, is O(log(m)) for updating one item. So if you put all the enumerators in a priority queue (ordered on the value of the Current item), then just remove the first item from the priority queue, return the Current item, execute MoveNext(), and return the enumerator to the queue if it is not empty, you should have an algorithm that works in O(log(m)*n), which should be a lot better than O(log(n)*n).
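
The priority-queue merge described above can be sketched roughly as follows. This is a minimal illustration, not code from NHibernate.Shards: the names `ShardMerge` and `MergeSorted` are hypothetical, and it assumes .NET 6 or later for `PriorityQueue<TElement, TPriority>`, which did not exist when this thread was written. Each shard's sequence is assumed to be already sorted by the same comparer.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, for illustration only.
public static class ShardMerge
{
    // Lazily merges already-sorted sequences (one per shard) into a single
    // sorted sequence. Every step touches the queue in O(log m) for m shards,
    // so producing n items costs O(n * log m).
    public static IEnumerable<T> MergeSorted<T>(
        IEnumerable<IEnumerable<T>> sortedShards, IComparer<T> comparer)
    {
        // Enumerators are queued by the value of their Current item.
        var queue = new PriorityQueue<IEnumerator<T>, T>(comparer);
        var opened = new List<IEnumerator<T>>();
        try
        {
            foreach (var shard in sortedShards)
            {
                var enumerator = shard.GetEnumerator();
                opened.Add(enumerator);
                // Only the first item of each shard is needed to start.
                if (enumerator.MoveNext())
                    queue.Enqueue(enumerator, enumerator.Current);
            }

            while (queue.Count > 0)
            {
                // The enumerator holding the smallest Current item.
                var smallest = queue.Dequeue();
                yield return smallest.Current;

                // Advance it and put it back unless the shard is exhausted.
                if (smallest.MoveNext())
                    queue.Enqueue(smallest, smallest.Current);
            }
        }
        finally
        {
            // Runs even if the caller stops enumerating early.
            foreach (var enumerator in opened)
                enumerator.Dispose();
        }
    }
}
```

For example, `ShardMerge.MergeSorted<int>(new IEnumerable<int>[] { new[] { 1, 4, 9 }, new[] { 2, 3, 10 }, new[] { 5, 6, 7, 8 } }, Comparer<int>.Default)` yields the numbers 1 through 10 in order, reading from each shard only as far as it needs to.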

configurator commented on NHibernate Shards: Progress Report

Sun, 18 Oct 2009 20:19:49 GMT

Of course you won't pull millions of items. It's just impossible to see the difference with less data - and if we're talking about a server that serves millions of requests, each with 100 items (which, like you said, is practically no data), it would matter.

Ayende Rahien commented on NHibernate Shards: Progress Report

Sun, 18 Oct 2009 20:02:07 GMT

You are never going to pull millions of items.

configurator commented on NHibernate Shards: Progress Report

Sun, 18 Oct 2009 17:38:11 GMT

With a slight change (changing SmallerThan to comparer.Compare and adding a Comparer

Ayende Rahien commented on NHibernate Shards: Progress Report

Sun, 18 Oct 2009 16:28:58 GMT

Configurator, I am willing to lay odds that you wouldn't be able to get this to perform faster than Array.Sort or List.Sort, which are already in the BCL.

configurator commented on NHibernate Shards: Progress Report

Sun, 18 Oct 2009 16:24:52 GMT

This data is not mostly sorted. It's several sorted lists, and for that you have a simple algorithm that scales well over any amount of data as long as there aren't too many shards. It's approximately an O(n*m) algorithm, where n is the total data and m is the number of shards. List enumerators = new List (); foreach (IEnumerable 0) { // compare current item in each enumerator and choose the first one IEnumerator
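
The code in the comment above was truncated, so the following is only a hedged sketch of the kind of O(n*m) merge it describes, not the commenter's original. The names `LinearShardMerge` and `Merge` are hypothetical; the `enumerators` list and the "choose the first one" step follow the surviving fragment, and each input sequence is assumed to be already sorted.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, for illustration only.
public static class LinearShardMerge
{
    // Merges already-sorted sequences by scanning the current item of every
    // live enumerator on each step: O(m) per item, O(n * m) overall.
    public static IEnumerable<T> Merge<T>(
        IEnumerable<IEnumerable<T>> sortedShards, IComparer<T> comparer)
    {
        var opened = new List<IEnumerator<T>>();       // everything to dispose
        var enumerators = new List<IEnumerator<T>>();  // shards with items left
        try
        {
            foreach (var shard in sortedShards)
            {
                var enumerator = shard.GetEnumerator();
                opened.Add(enumerator);
                if (enumerator.MoveNext())
                    enumerators.Add(enumerator);
            }

            while (enumerators.Count > 0)
            {
                // Compare the current item in each enumerator and choose
                // the one that sorts first.
                int first = 0;
                for (int i = 1; i < enumerators.Count; i++)
                {
                    if (comparer.Compare(enumerators[i].Current,
                                         enumerators[first].Current) < 0)
                        first = i;
                }

                yield return enumerators[first].Current;

                // Drop the shard once it has no more items.
                if (!enumerators[first].MoveNext())
                    enumerators.RemoveAt(first);
            }
        }
        finally
        {
            foreach (var enumerator in opened)
                enumerator.Dispose();
        }
    }
}
```

With only a handful of shards the linear scan is cheap and needs no extra data structure, which is the trade-off being argued in this thread.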

Ayende Rahien commented on NHibernate Shards: Progress Report

Sun, 18 Oct 2009 16:01:11 GMT

Configurator, [stackoverflow.com/.../which-sort-algorithm-work...](http://stackoverflow.com/questions/220044/which-sort-algorithm-works-best-on-mostly-sorted-data) There are some algorithms that perform horribly on nearly sorted data. Quicksort, in particular, may degrade to O(n^2) on sorted data. Since we aren't dealing with large amounts of data, it is quick to sort them in memory. It is a step we are going to take anyway, so why let the DB do it? 120 is no data, practically.

configurator commented on NHibernate Shards: Progress Report

Sun, 18 Oct 2009 15:53:07 GMT

But database sorts are cheaper, because they are indexed... Am I missing something here? Suppose we have a large amount of data - say 120 records, sharded into 3 shards - and we want it sorted by an indexed field. Now we get a bunch of records and sort them ourselves. But we could use the (fast) database indexed sort, and then combine the lists ourselves quite quickly.

Ayende Rahien commented on NHibernate Shards: Progress Report

Sun, 18 Oct 2009 15:51:25 GMT

Fabio, you are absolutely right! I updated the post.

Ayende Rahien commented on NHibernate Shards: Progress Report

Sun, 18 Oct 2009 15:49:09 GMT

configurator, Because you are going to have to sort them anyway. At that point, it is cheaper to let the DB just stream it to us and we will sort them in memory.

Fabio Maulo commented on NHibernate Shards: Progress Report

Sun, 18 Oct 2009 15:48:15 GMT

Thanks to Dario Quintana for putting a lot of effort into developing NHibernate.Shards, and thanks to the other developers who have sent patches in the last few weeks.

configurator commented on NHibernate Shards: Progress Report

Sun, 18 Oct 2009 15:43:23 GMT

How come it's cheaper to do the sorting in memory than to allow the database to do it for each shard? Combining sorted lists is an O(n) operation.

Ayende Rahien commented on NHibernate Shards: Progress Report

Sun, 18 Oct 2009 14:57:52 GMT

Brian, since with DDD you only ever access stuff from the root, only aggregates are required to have a shared id. BTW, ShardUUIDGenerator isn't the only one that you can use; you can use other sharding-aware id generators. An aggregate resides entirely on a single shard.

Brian Hartsock commented on NHibernate Shards: Progress Report

Sun, 18 Oct 2009 14:54:45 GMT

First off, awesome. Second, I was wondering about the ShardResolutionStrategy. How is this going to work with associations and aggregate roots? I would think only the aggregate root needs the ShardUUIDGenerator, but it seems as though that assumption is wrong. Would aggregates have their own generator and be spread out between multiple databases for the same root?