Ayende @ Rahien

http://ayende.com
Copyright (C) Ayende Rahien 2004 - 2021 (c) 2026

Ayende Rahien commented on RavenDB Auto Sharding Bundle Design–Early Thoughts
Simon, there is already a unique key involved: the document key. No need to get complicated. But it is actually much more common for you to want to control how you are doing this sort of thing, because you desire locality of reference.
http://ayende.com/4830/ravendb-auto-sharding-bundle-design-early-thoughts#comment12
Wed, 04 May 2011 09:20:44 GMT

Simon Hughes commented on RavenDB Auto Sharding Bundle Design–Early Thoughts
Ok, for auto-sharding with no key specified by the user, you would have to do it like this:
A)
* Have a master auto-key shard document that is replicated to all databases. Alternatively, provide a separate auto-key shard database.
* Auto-shard data evenly between shards, recording the ID in the master shard document.
B) Yes, you have to query all servers as no key is specified; that is the penalty the user has to pay for auto-sharding without a key. This is similar to a table scan in SQL Server.
C) Yes, again that is a penalty the user has to pay for auto-sharding without a key. That is why you would always recommend that the designer specify a key. The auto-key is simple to get running across your whole database instantly, but has a cost implication. However, auto-sharding with no key will still outperform a single database given a large enough dataset.
-----
Random thought) Got me thinking about the separate auto-key shard database. I wonder if you can create a parity database like RAID 5, and provide data redundancy in case a shard goes offline? Then auto-rebuild the shard when it comes back up.
http://ayende.com/4830/ravendb-auto-sharding-bundle-design-early-thoughts#comment11
Wed, 04 May 2011 09:17:43 GMT

Ayende Rahien commented on RavenDB Auto Sharding Bundle Design–Early Thoughts
Simon, the problem is that then you have:
a) a very hard time replicating data (how do you replicate with the key to decide what to replicate and where)
b) you have to query all servers, which means that load is still heavy on all servers.
c) as the number of servers grows, the cost of querying them gets very high.
http://ayende.com/4830/ravendb-auto-sharding-bundle-design-early-thoughts#comment10
Wed, 04 May 2011 09:01:27 GMT

Simon Hughes commented on RavenDB Auto Sharding Bundle Design–Early Thoughts
For auto-sharding, you won't need to specify a key. You distribute the data evenly between the specified number of shards, perhaps in a round-robin fashion. For retrieval, you send the same query to all databases, collate the data returned from all of them into a list, and return it to the user. Not specifying a key means you always need to query all shards for the data, but this happens in parallel, so the query will still be fast.
http://ayende.com/4830/ravendb-auto-sharding-bundle-design-early-thoughts#comment9
Wed, 04 May 2011 08:59:12 GMT

Ayende Rahien commented on RavenDB Auto Sharding Bundle Design–Early Thoughts
Simon, sure, that is just an issue on the client side. We actually already have client-side sharding, including the ability to run parallel queries.
http://ayende.com/4830/ravendb-auto-sharding-bundle-design-early-thoughts#comment8
Wed, 04 May 2011 08:56:28 GMT

Simon Hughes commented on RavenDB Auto Sharding Bundle Design–Early Thoughts
When sharded, will you be able to perform parallel queries on each shard? SQL Server is able to do this when it's partitioned, and it greatly improves performance.
http://ayende.com/4830/ravendb-auto-sharding-bundle-design-early-thoughts#comment7
Wed, 04 May 2011 08:51:08 GMT

Ayende Rahien commented on RavenDB Auto Sharding Bundle Design–Early Thoughts
Michael, just about all the configuration for RavenDB and RavenDB bundles is done using documents. That means that we have drastically simplified a lot of problems for ourselves, because we have a consistent medium. For example, change notifications are handled once, and easily, unlike if you were storing this anywhere else.
http://ayende.com/4830/ravendb-auto-sharding-bundle-design-early-thoughts#comment6
Wed, 04 May 2011 08:08:40 GMT

tobi commented on RavenDB Auto Sharding Bundle Design–Early Thoughts
We would have to worry about skewed data distributions with this scheme. We would probably not shard on a natural key (Customer Name), but mostly on a surrogate (ID). So those might be skewed (and identity columns would only grow at the end, making only one shard bigger). This might benefit from hash distribution (thereby losing range queries). Another solution to skew would be to lower the chunk size considerably, so we could keep range queries.
http://ayende.com/4830/ravendb-auto-sharding-bundle-design-early-thoughts#comment5
Tue, 03 May 2011 18:08:30 GMT

Michael L Perry commented on RavenDB Auto Sharding Bundle Design–Early Thoughts
I'm somewhat concerned by the idea of keeping the configuration itself in a document. I understand that it's a common pattern (for example, import a SQL database into Visio and you might see sysdiagrams), but I've seen cases where meta information interferes with application information. I can't identify a specific problem in this case, yet.
http://ayende.com/4830/ravendb-auto-sharding-bundle-design-early-thoughts#comment4
Tue, 03 May 2011 17:57:53 GMT

Ayende Rahien commented on RavenDB Auto Sharding Bundle Design–Early Thoughts
Jer0enH, the example is meant to demonstrate an idea, not be executable code.
http://ayende.com/4830/ravendb-auto-sharding-bundle-design-early-thoughts#comment3
Tue, 03 May 2011 10:07:11 GMT

Richard Slater commented on RavenDB Auto Sharding Bundle Design–Early Thoughts
Brings a tear to my eye, something so simple and so beautiful in comparison to other options in the .NET arena.
http://ayende.com/4830/ravendb-auto-sharding-bundle-design-early-thoughts#comment2
Tue, 03 May 2011 10:01:58 GMT

Jer0enH commented on RavenDB Auto Sharding Bundle Design–Early Thoughts
How to interpret the Range in your example? From (inclusive) to (exclusive), I presume, but what's up with the last 2 ranges (ppp-zzz and 000-999)?
http://ayende.com/4830/ravendb-auto-sharding-bundle-design-early-thoughts#comment1
Tue, 03 May 2011 09:17:27 GMT
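
Two points from the thread can be sketched in a few lines of Python: Jer0enH's reading of a range as from (inclusive) to (exclusive), and tobi's concern that surrogate IDs which only grow at the end skew a range-sharded layout. This is only an illustration under stated assumptions: the shard names, range boundaries, and function names are hypothetical and are not taken from the RavenDB bundle, and SHA-256 is just one possible choice of hash.

```python
import bisect
import hashlib
from collections import Counter

# Hypothetical layout: each shard owns the half-open range [start, next start).
RANGE_STARTS = ["000000", "250000", "500000", "750000"]
SHARDS = ["shard-0", "shard-1", "shard-2", "shard-3"]


def range_shard(key: str) -> str:
    """Route by range: the start is inclusive, the next shard's start is exclusive."""
    idx = bisect.bisect_right(RANGE_STARTS, key) - 1
    return SHARDS[max(idx, 0)]


def hash_shard(key: str) -> str:
    """Route by hash: spreads keys out, but neighbouring keys no longer share a shard."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]


def distribution(keys, route):
    """Count how many keys each shard receives under the given routing function."""
    return Counter(route(k) for k in keys)


# Surrogate IDs that only grow at the end (assumed zero-padded so they sort as strings).
new_ids = [f"{n:06d}" for n in range(900000, 901000)]

range_counts = distribution(new_ids, range_shard)
hash_counts = distribution(new_ids, hash_shard)

print("range:", dict(range_counts))
print("hash: ", dict(hash_counts))
```

Under these assumptions every new ID lands on the last range shard, while hashed routing spreads the same IDs across shards, at the cost that a range query can no longer be sent to a single shard.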