RavenDB Auto Sharding Bundle Design–Early Thoughts
Originally posted at 4/19/2011
RavenDB Auto Sharding is an implementation of sharding on the server. As the name implies, it aims to remove all sharding concerns from the user. At its core, the basic idea is simple. You have a RavenDB node with the sharding bundle installed. You just work with it normally.
At some point you realize that the data has grown too large for a single server, so you need to shard the data across multiple servers. You bring up another RavenDB server with the sharding bundle installed. You wait for the data to re-shard (during which time you can still read / write to the servers). You are done.
At least, that is the goal. In practice, there is one step that you would have to do: you would have to tell us how to shard your data. You do that by defining a sharding document, which looks like this:
{ // Raven/Sharding/ByUserName
  "Limits": [3],
  "Replica": 2,
  "Definitions": [
    { "EntityName": "Users", "Paths": ["Username"] },
    { "EntityName": "Posts", "Paths": ["AuthorName"] }
  ]
}
There are several things to note here. We define a sharding document that shards on just one key, and the shard key has a length of 3. We also define different ways to retrieve the sharding key from documents based on the entity name. This is important, since you want to be able to say that posts by the same user will sit on the same shard.
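As a purely illustrative sketch (not actual bundle code), this is roughly how a shard key could be derived from a document using the definitions above: pick the path configured for the entity and truncate the value to the configured key length. The function name and the lower-casing are my assumptions.

SHARDING_DEFINITION = {
    "Limits": [3],
    "Definitions": [
        {"EntityName": "Users", "Paths": ["Username"]},
        {"EntityName": "Posts", "Paths": ["AuthorName"]},
    ],
}

def shard_key_for(document, entity_name, definition=SHARDING_DEFINITION):
    """Pick the configured path for this entity and truncate to the key length."""
    limit = definition["Limits"][0]
    for entry in definition["Definitions"]:
        if entry["EntityName"] == entity_name:
            path = entry["Paths"][0]
            return str(document[path])[:limit].lower()
    raise KeyError("no sharding definition for entity " + entity_name)

# Both documents produce the shard key "aye", so they land in the same chunk.
print(shard_key_for({"Username": "ayende"}, "Users"))
print(shard_key_for({"AuthorName": "ayende", "Title": "..."}, "Posts"))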
Based on the shard keys, we generate the sharding metadata:
{ "Id": "chunks/1", "Shards": ["http://shard1:8080", "http://shard1-backup:8080"], "Name": "ByUserName", "Range": ["aaa", "ddd"] } { "Id": "chunks/2", "Shards": ["http://shard1:8080", "http://shard2-backup:8080"], "Name": "ByUserName", "Range": ["ddd", "ggg"] } { "Id": "chunks/3", "Shards": ["http://shard2:8080", "http://shard3-backup:8080"], "Name": "ByUserName", "Range": ["ggg", "lll"] } { "Id": "chunks/4", "Shards": ["http://shard2:8080", "http://shard1-backup:8080"], "Name": "ByUserName", "Range": ["lll", "ppp"] } { "Id": "chunks/5", "Shards": ["http://shard3:8080", "http://shard2-backup:8080"], "Name": "ByUserName", "Range": ["ppp", "zzz"] } { "Id": "chunks/6", "Shards": ["http://shard3:8080", "http://shard3-backup:8080"], "Name": "ByUserName", "Range": ["000", "999"] }
This information gives us a way to make queries that are either directed (against a specific node, assuming we include the shard key in the query) or global (against all shards).
Note that we split the data into chunks; each chunk is going to sit on two different servers (because of the Replica setting above). We can determine which shard holds which chunk by using the Range data.
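As a small sketch of how that routing could work (illustrative only; the exact lookup the bundle performs isn't described here): find the chunk whose range covers the shard key and talk only to the servers that hold it. Ranges are treated as from-inclusive / to-exclusive.

CHUNKS = [
    {"Id": "chunks/1", "Shards": ["http://shard1:8080", "http://shard1-backup:8080"], "Range": ("aaa", "ddd")},
    {"Id": "chunks/2", "Shards": ["http://shard1:8080", "http://shard2-backup:8080"], "Range": ("ddd", "ggg")},
    {"Id": "chunks/3", "Shards": ["http://shard2:8080", "http://shard3-backup:8080"], "Range": ("ggg", "lll")},
    {"Id": "chunks/4", "Shards": ["http://shard2:8080", "http://shard1-backup:8080"], "Range": ("lll", "ppp")},
    {"Id": "chunks/5", "Shards": ["http://shard3:8080", "http://shard2-backup:8080"], "Range": ("ppp", "zzz")},
    {"Id": "chunks/6", "Shards": ["http://shard3:8080", "http://shard3-backup:8080"], "Range": ("000", "999")},
]

def shards_for_key(shard_key):
    """Directed query: only the chunk covering this key needs to be asked."""
    for chunk in CHUNKS:
        low, high = chunk["Range"]
        if low <= shard_key < high:
            return chunk["Shards"]
    # No shard key (or no covering range): fall back to a global query.
    return sorted({url for chunk in CHUNKS for url in chunk["Shards"]})

print(shards_for_key("aye"))   # -> shard1 and its replica only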
Once a chunk grows too large (25,000 documents, by default), it will split, potentially moving to another server / servers.
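To make that concrete, here is a sketch of what a split might look like; the midpoint choice and the pick_servers callback (which decides where the new half lives) are assumptions on my part, since the post doesn't spell out the split or placement policy.

MAX_DOCS_PER_CHUNK = 25_000   # the default mentioned above

def maybe_split(chunk, doc_count, keys_in_chunk, pick_servers):
    """Return the chunk unchanged, or two chunks that together cover its range."""
    if doc_count <= MAX_DOCS_PER_CHUNK:
        return [chunk]
    keys = sorted(keys_in_chunk)
    midpoint = keys[len(keys) // 2]             # split where the data actually is
    low, high = chunk["Range"]
    left = {"Range": (low, midpoint), "Shards": chunk["Shards"]}
    right = {"Range": (midpoint, high), "Shards": pick_servers()}   # possibly new servers
    return [left, right]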
Thoughts?
Comments
How should I interpret the Range in your example?
From (inclusive) to (exclusive), I presume, but what's up with the last two ranges (ppp-zzz and 000-999)?
Brings a tear to my eye, something so simple and so beautiful in comparison to other options in the .NET arena.
Jer0enH,
The example is meant to demonstrate an idea, not be executable code.
I'm somewhat concerned by the idea of keeping the configuration itself in a document. I understand that it's a common pattern (for example, import a SQL database into Visio and you might see sysdiagrams), but I've seen cases where meta information interferes with application information. I can't identify a specific problem in this case, yet.
We would have to worry about skewed data distributions with this scheme. We would probably not shard on a natural key (Customer Name), but mostly on a surrogate (ID). Those might be skewed (and identity columns would only grow at the end, making only one shard bigger). This might benefit from hash distribution (losing range queries in the process).
A solution to skew would also be to lower the chunk size considerably, so we could keep range queries.
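Purely as an illustration of the trade-off Michael is describing (none of this is bundle code): hashing a surrogate ID spreads monotonically increasing keys evenly across shards, but a range scan over consecutive IDs then has to touch every shard, while range-based placement keeps ranges local at the cost of skew.

import hashlib

NUM_SHARDS = 3

def hash_shard(doc_id):
    # Even spread for monotonically increasing surrogate IDs, but a range
    # query such as "users/1000 .. users/2000" now has to touch every shard.
    digest = hashlib.md5(doc_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# With range-based placement these consecutive IDs would all land in the last
# (ever-growing) chunk; hashed, they scatter across the shards.
print([hash_shard(f"users/{i}") for i in range(1000, 1006)])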
Michael,
Just about all the configuration for RavenDB and RavenDB bundles is done using documents. That means that we have drastically simplified a lot of problems for ourselves, because we have a consistent medium.
For example, change notifications are handled once, and handled easily, which would not be the case if you were storing this anywhere else.
When sharded, will you be able to perform parallel queries on each shard?
SQL Server is able to do this when it's partitioned, and it greatly improves performance.
Simon,
Sure, that is just an issue at the client side.
We actually already have client side sharding, including the ability to run parallel queries.
For auto-sharding, you won't need to specify a key. You distribute the data evenly between the specified number of shards, perhaps in a round robin fashion.
For retrieval, you send the same query to all databases, and collate the data returned from all databases into a list and return back to the user.
By not specifying a key, it would mean you always need to query all shards for the data, but this happens in parallel so the query will still be fast.
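A sketch of the fan-out Simon describes: send the same query to every shard in parallel and collate the results. The shard URLs and the query_shard placeholder are made up for illustration; a real implementation would issue actual queries against each shard.

from concurrent.futures import ThreadPoolExecutor

SHARD_URLS = ["http://shard1:8080", "http://shard2:8080", "http://shard3:8080"]

def query_shard(url, query):
    # Stand-in for an HTTP call to one RavenDB shard; a real implementation
    # would run the query against `url` and return the matching documents.
    return [{"shard": url, "query": query}]

def query_all_shards(query):
    # Fan the same query out to every shard in parallel...
    with ThreadPoolExecutor(max_workers=len(SHARD_URLS)) as pool:
        futures = [pool.submit(query_shard, url, query) for url in SHARD_URLS]
        # ...then collate the partial results into one list for the caller.
        results = []
        for future in futures:
            results.extend(future.result())
    return results

print(len(query_all_shards("Username: ayende")))   # results from all 3 shards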
Simon,
The problem is that then you have:
a) a very hard time replicating data (how do you replicate without a key to decide what to replicate and where?)
b) you have to query all servers, which means the load is still heavy on all servers.
c) as the number of servers grows, the cost of querying them gets very high.
Ok, for auto-sharding with no key specified by the user, you would have to do it like this:
A)
Have a master auto-key shard document that is replicated to all databases. Or alternatively, provide a separate auto-key shard database.
Auto-shard data evenly between shards, recording the ID in the master shard document.
B) Yes, you have to query all servers as no key is specified; that is the penalty the user has to pay for auto-sharding without a key. This is similar to a table scan in SQL Server.
C) Yes, again that is a penalty the user has to pay for auto-sharding without a key. That is why you would always recommend that the designer specify a key. The auto-key is simple to get running across your whole database instantly, but has a cost implication. However, auto-sharding with no key will still out-perform a database running on a single server given a large enough dataset.
Random thought) Got me thinking about the separate auto-key shard database. I wonder if you can create a parity database like RAID 5, and provide data redundancy in case a shard goes off-line? Then auto-rebuild the shard when it comes back up.
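For what it's worth, a rough sketch of the keyless, round-robin placement described in (A) above; every name here is made up, and the "master auto-key shard document" is just an in-memory stand-in for illustration.

from itertools import cycle

SHARD_URLS = ["http://shard1:8080", "http://shard2:8080", "http://shard3:8080"]

class AutoKeyShardDocument:
    """Stand-in for the replicated 'master auto-key shard document'."""

    def __init__(self, shard_urls):
        self._next_shard = cycle(shard_urls)   # round-robin placement
        self.placements = {}                   # document id -> shard url

    def place(self, doc_id):
        shard = next(self._next_shard)
        self.placements[doc_id] = shard
        return shard

master = AutoKeyShardDocument(SHARD_URLS)
for doc_id in ["users/1", "users/2", "users/3", "users/4"]:
    print(doc_id, "->", master.place(doc_id))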
Simon,
There is already a unique key involved, the document key. No need to get complicated. But it is actually much more common for you to want to control how you are doing this sort of thing, because you desire locality of reference.