Ayende @ Rahien

Ayende Rahien commented on RavenDB Sharding–Map/Reduce in a cluster

Mon, 09 Apr 2012 11:33:28 GMT

Vlad, I wrote a five-part series about this exact topic:
http://ayende.com/blog/155809/api-design-sharding-status-for-failure-scenarios?key=167619a5bdec4055a66651904916ffb4
http://ayende.com/blog/155841/api-design-sharding-status-for-failure-scenariosndash-ignore-and-move-on?key=fe08107267224788b42ce633469418ec
http://ayende.com/blog/155873/api-design-sharding-status-for-failure-scenariosndash-explicit-failure-management?key=341dc08c7fde407a83dc2b7ad81c3bd0
http://ayende.com/blog/155905/api-design-sharding-status-for-failure-scenariosndash-explicit-failure-management-doesnrsquo-t-work?key=c9a5a273ba8d4029bd2f111e3a7293c8
http://ayende.com/blog/155937/api-design-sharding-status-for-failure-scenariosndash-solving-at-the-right-granularity?key=f3e596cb1e5342c193aa617553adea5d
I would like your opinion about this.

Vlad commented on RavenDB Sharding–Map/Reduce in a cluster

Mon, 09 Apr 2012 11:29:48 GMT

Sure thing, Oren. The user should be notified whenever the system suppresses an exception that would otherwise go unhandled. Basically, I am only thinking about the Load(id) operation, which returns a single value. In this case we can: 1. return the found document (if one exists); 2. return null; 3. throw an exception when nothing was found and a connection is broken. Yes, you are right - the system must guarantee that the user sees all requested data _or_ is notified about the problem, like "You cannot view the requested data because...", or in very rare situations - show part of the requested data _and_ notify about the problem, like "You are not seeing all of the requested data because..."
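The three outcomes listed above can be sketched roughly as follows. This is a plain Python illustration, not RavenDB code: `load`, `ShardUnavailable`, `IncompleteLoad`, and the shard callables are all hypothetical names invented for the example. The sketch also returns the names of the shards that failed, so a connection problem never goes unnoticed even when the document is found.

```python
class ShardUnavailable(Exception):
    """Raised by a (hypothetical) shard callable when the shard cannot be reached."""


class IncompleteLoad(Exception):
    """Nothing was found, but at least one shard could not be asked."""

    def __init__(self, failed):
        super().__init__("unreachable shards: " + ", ".join(failed))
        self.failed = failed


def load(shards, doc_id):
    """Ask every shard for doc_id.

    `shards` maps a shard name to a callable returning a document or None.
    Returns (document_or_None, failed_shard_names).
    """
    failed = []
    found = None
    for name, fetch in shards.items():
        try:
            doc = fetch(doc_id)
        except ShardUnavailable:
            failed.append(name)  # remember the failure instead of hiding it
            continue
        if doc is not None and found is None:
            found = doc
    if found is None and failed:
        # Case 3: we cannot tell "does not exist" apart from "lives on a dead shard".
        raise IncompleteLoad(failed)
    # Case 1 (document found) or case 2 (None, and every shard answered).
    return found, failed


def offline(doc_id):
    raise ShardUnavailable()


# Shard1 is online and holds the document, Shard2 is down.
shards = {"Shard1": {"users/1": {"id": "users/1"}}.get, "Shard2": offline}
doc, failed = load(shards, "users/1")
```

Here the caller gets the document from Shard1 together with the information that Shard2 could not be reached, and can decide whether to show a warning.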

Ayende Rahien commented on RavenDB Sharding–Map/Reduce in a cluster

Mon, 09 Apr 2012 06:21:51 GMT

Vlad, Note that one thing that you have to worry about in this scenario is actually _knowing_ that you have a connection problem. The absolute worst thing you can have is to have a connection problem and for some of the data to just go away without you actually noticing.

Vlad commented on RavenDB Sharding–Map/Reduce in a cluster

Sun, 08 Apr 2012 23:04:44 GMT

Oren, here is an example of the case. One of the system's customers wants to store their data in a separate db instance, in their own network/datacenter. The database is sharded by Customer. The customer also wants to store user accounts at their site (a very paranoid policy). I want to avoid a situation where that customer's connection problems affect the whole system and other users lose access to it. It would be great to have sharding that helps avoid this problem. I've investigated the code of the existing access strategies (Sequential and Parallel) and, as far as I can see, I can write the same thing but with appropriate exception handling. Thank you and your team for readable code!

Ayende Rahien commented on RavenDB Sharding–Map/Reduce in a cluster

Sun, 08 Apr 2012 10:14:37 GMT

Vlad, Right now, we don't handle this scenario, and that is on purpose. I don't know what the right behavior to implement here would be. But we do provide a hook that lets you handle it yourself. The hook is the IShardAccessStrategy.

Vlad commented on RavenDB Sharding–Map/Reduce in a cluster

Fri, 06 Apr 2012 22:10:16 GMT

Oren, can I retrieve results even if nodes which don't contain the relevant data are down (offline)? The shards are located in different datacentres. Simple example: 1. I do Load(ID) from ShardedDocumentStore [Shard1, Shard2]. 2. Shard1 is online. Shard2 is offline (down). 3. Shard1 contains the required document. I get an exception in this case. How can I set up a session to retrieve the available data? Does Raven have this possibility?

Ayende Rahien commented on RavenDB Sharding–Map/Reduce in a cluster

Tue, 03 Apr 2012 00:05:10 GMT

Sonic, No, you can do it easily with RavenDB. You need to provide two values to the ShardingOn method: the first extracts the value from the entity, and the second converts that value to the shard id. That way, you can do things like arbitrary or date-based sharding.
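As a rough illustration of that two-step idea (extract a value from the entity, then convert it to a shard id), here is a Python sketch for month-based sharding. It does not use the RavenDB API: the `Invoice` type and the `extract_value`, `to_shard_id`, and `shard_for` functions are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Invoice:  # hypothetical entity, for illustration only
    id: str
    issued_on: date


def extract_value(invoice):
    """Step 1: pull the value we shard on out of the entity."""
    return invoice.issued_on


def to_shard_id(value):
    """Step 2: convert that value to a shard id, here one shard per month."""
    return f"{value.year:04d}-{value.month:02d}"


def shard_for(invoice):
    """Compose the two steps to decide where an entity lives."""
    return to_shard_id(extract_value(invoice))


shard = shard_for(Invoice("invoices/1", date(2012, 4, 2)))
```

Because the second step is an arbitrary function, the shard ids do not have to be enumerated up front; a new month simply maps to a new id.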

Sonic commented on RavenDB Sharding–Map/Reduce in a cluster

Mon, 02 Apr 2012 17:44:11 GMT

Is there a way to "partition/shard" the data by a dynamic/arbitrary sequential value like date? In other words, could I tell RavenDB to partition a DocumentStore by month or do I have to explicitly define the shard like in your example of region?

Ayende Rahien commented on RavenDB Sharding–Map/Reduce in a cluster

Sun, 01 Apr 2012 08:34:41 GMT

Petar, Sure, take all the data from the sharded instances and put it in one box. Then use the standard DocumentStore

petar commented on RavenDB Sharding–Map/Reduce in a cluster

Sat, 31 Mar 2012 02:35:43 GMT

Is there a path that one can take to reverse the sharded DB back into a single instance? Thanks.

Ayende Rahien commented on RavenDB Sharding–Map/Reduce in a cluster

Fri, 23 Mar 2012 10:59:16 GMT

Morcs, We provide an extension point for you to inject your own behavior, so I am not sure if an exception would be a good idea here.

morcs commented on RavenDB Sharding–Map/Reduce in a cluster

Fri, 23 Mar 2012 09:17:44 GMT

Understood :) I've not had to use anything like sharding before but it would worry me that: session.Query(...).OrderBy(...).Skip(x) would return different, possibly confusing results in a sharded setup, but I do understand why. Would the "safe by default" principle suggest that we should get an exception when trying to use Skip on an IOrderedQueryable in a sharded setup?

Thomas Krause commented on RavenDB Sharding–Map/Reduce in a cluster

Thu, 22 Mar 2012 21:14:49 GMT

@morcs and Jonty: Doing sorting and paging at the same time, as described, is hard. One solution I can think of is using the same attribute for sharding and sorting. So in the example above you could sort first by region and then by name. This would get the first pages from server A, the next pages from server B and so on. I'm not sure if this is already supported by RavenDB, but it shouldn't be too complicated. In case you reach the end of one shard and only get a partial page, you would need to do another request to the next shard to fill the page.

Ayende Rahien commented on RavenDB Sharding–Map/Reduce in a cluster

Thu, 22 Mar 2012 11:54:18 GMT

Morcs, It would get the 10 - 20 results from each server, sort them and give you 10 results

morcs commented on RavenDB Sharding–Map/Reduce in a cluster

Thu, 22 Mar 2012 11:08:57 GMT

I guess what I really should do is download RavenDB and try it for myself, since Ayende's put so much effort into making that as simple as possible to do :)

morcs commented on RavenDB Sharding–Map/Reduce in a cluster

Thu, 22 Mar 2012 11:07:47 GMT

So would the result of (hope the formatting works): session.Query...() .Skip(10) .Take(10) .ToList() return unexpected results under sharding? Or would it get 20 results from each server and do the Skip and Take client-side?

Ayende Rahien commented on RavenDB Sharding–Map/Reduce in a cluster

Thu, 22 Mar 2012 09:56:43 GMT

Geert, Yes, you are correct. You _cannot_ get a globally paged sharded result without having access to all of the information. Now, we may provide feedback in the future that will allow you to fine tune those per shard, but this is _really_ hard to do, and places an undue burden on the client. More than that, the actual scenario doesn't look too good from a business point of view either. In a sharded env, you respect the sharding, and you don't try to go ahead and do things like this on the fly. You would modify your behavior so your queries respect the shard boundaries.

Jonty commented on RavenDB Sharding–Map/Reduce in a cluster

Wed, 21 Mar 2012 21:49:20 GMT

I'm with Geert here. Logically you'd have to bring back m results from each server, as the first record on any given server could be in the desired page. Unless you did some kind of round robin between servers before returning to the client.

Geert Baeyaert commented on RavenDB Sharding–Map/Reduce in a cluster

Wed, 21 Mar 2012 21:11:33 GMT

Ok, I must be missing something. Let's say we want documents 3 through 5, sorted by key. There are 2 servers in the cluster. Server A contains docs with keys A, B, C, E, H. Server B contains docs with keys D, F, G, I, J. The expected result without clustering is C, D, E. However, with clustering each server returns its own documents 3 through 5: Server A: C, E, H. Server B: G, I, J. After sorting and taking the first 3, that gives you C, E, G.
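The arithmetic in this example can be checked with a small Python sketch. It is not RavenDB code and makes no claim about what the client actually does; `page_per_shard` and `page_global` are made-up names. The first function applies the skip and take on each shard before merging, the second fetches the first skip + take keys from every shard and pages only after merging.

```python
from heapq import merge
from itertools import islice

# Shard contents from the example, each already sorted by key.
SERVER_A = ["A", "B", "C", "E", "H"]
SERVER_B = ["D", "F", "G", "I", "J"]


def page_per_shard(shards, skip, take):
    """Apply skip/take on every shard, then merge and keep `take` keys."""
    partial = [shard[skip:skip + take] for shard in shards]
    return list(islice(merge(*partial), take))


def page_global(shards, skip, take):
    """Fetch the first skip+take keys per shard, merge, then page client-side."""
    partial = [shard[:skip + take] for shard in shards]
    return list(islice(merge(*partial), skip, skip + take))


# Documents 3 through 5 means skip 2, take 3.
per_shard = page_per_shard([SERVER_A, SERVER_B], skip=2, take=3)
global_page = page_global([SERVER_A, SERVER_B], skip=2, take=3)
```

With these inputs `per_shard` is C, E, G and `global_page` is C, D, E. The second variant gives the globally correct page, but each shard has to return skip + take results, so the cost grows with the page number.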

Ayende Rahien commented on RavenDB Sharding–Map/Reduce in a cluster

Wed, 21 Mar 2012 19:08:05 GMT

Geert, No, it does not. We get the N-M from each server, then sort them, and give you the required page size from there.

morcs commented on RavenDB Sharding–Map/Reduce in a cluster

Wed, 21 Mar 2012 16:53:11 GMT

@Geert I guess that's the case, except it would be the reduce results that are being returned and thrown away, not whole documents!

Geert Baeyaert commented on RavenDB Sharding–Map/Reduce in a cluster

Wed, 21 Mar 2012 16:37:39 GMT

Ayende, is that also how it works for pages other than the first page? Let's say we want documents n through m. Are you saying that you get the first m documents from each server, and then on the client sort and throw away the unnecessary documents?

Ayende Rahien commented on RavenDB Sharding–Map/Reduce in a cluster

Wed, 21 Mar 2012 11:19:06 GMT

Jonty, Yes, that is what we are doing.

Jonty commented on RavenDB Sharding–Map/Reduce in a cluster

Wed, 21 Mar 2012 11:14:54 GMT

How does that work then? Presumably you'd have to sort on each server, return the number of results from each server equal to the page size and do a further sort on the client.

Ayende Rahien commented on RavenDB Sharding–Map/Reduce in a cluster

Wed, 21 Mar 2012 11:11:00 GMT

Jonty, Paging and sorting _are_ supported.

Jonty commented on RavenDB Sharding–Map/Reduce in a cluster

Wed, 21 Mar 2012 11:08:33 GMT

Nice. Presumably paging and sorting are not supported?