﻿<?xml version="1.0" encoding="utf-8"?><rss version="2.0"><channel><title>Ayende @ Rahien</title><link>http://ayende.com</link><description>Ayende @ Rahien</description><copyright>Copyright (C) Ayende Rahien  2004 - 2021 (c) 2026</copyright><ttl>60</ttl><item><title>Ayende Rahien commented on Sharding vs. Having multiple databases</title><description>Meisinger,
What you refer to as the "manager" is the sharding function. It can either be a true function (computed from the input alone) or a function that retrieves the mapping from a common source.
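Below is a minimal sketch of both flavors in Python. It assumes a made-up three-way split by the first letter of the last name (A-H, I-S, T-Z) and made-up shard names; none of this comes from a real system.

- `shard_for` computes the shard from the input alone.
- `split_range` shows that re-splitting a range (say A-H into A-D and E-H) is a change to the function itself.
- `shard_for_id` is the lookup flavor, with a dict standing in for the common source.

```python
import bisect

# Hypothetical range table (illustration only): A-H on shard-1,
# I-S on shard-2, T-Z on shard-3. Each start letter is inclusive,
# and a range ends where the next one begins.
STARTS = ["A", "I", "T"]
SHARDS = ["shard-1", "shard-2", "shard-3"]


def shard_for(last_name, starts=STARTS, shards=SHARDS):
    """A 'true' sharding function: the shard is computed from the input alone."""
    first = last_name[:1].upper()
    # bisect_right counts how many range starts sort at or before `first`;
    # the owner is the last of them. Anything sorting before "A" goes to the first shard.
    index = bisect.bisect_right(starts, first) - 1
    return shards[max(index, 0)]


def split_range(starts, shards, new_start, new_shard):
    """Return a new range table in which `new_start` opens a range on `new_shard`.

    Adding "E" turns A-H into A-D and E-H. This changes the sharding
    function itself; the existing E-H data still has to be moved.
    """
    pos = bisect.bisect_right(starts, new_start)
    new_starts = starts[:pos] + [new_start] + starts[pos:]
    new_shards = shards[:pos] + [new_shard] + shards[pos:]
    return new_starts, new_shards


# The other flavor: the "function" looks the answer up in shared data.
# A dict stands in for whatever common directory store would hold the mapping.
DIRECTORY = {"users/1": "shard-2", "users/2": "shard-1"}


def shard_for_id(user_id):
    """Lookup-based routing: ask the shared directory which shard owns the id."""
    return DIRECTORY[user_id]
```

Either way, every piece of code that needs a user's data calls the same routing function first, so that is the single place that "knows" where data lives.

- The computed flavor costs no extra round trip, but any re-split means a redeploy plus moving data.
- The lookup flavor can move individual keys by updating the directory, at the cost of consulting (or caching) it.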
Depending on your sharding strategy, the shard assignments may change, and the strategy needs to handle that. In most cases, you only change the sharding function when you deploy a new version to production, because it is such a big deal.</description><link>http://ayende.com/134145/sharding-vs-having-multiple-databases#comment36</link><guid>http://ayende.com/134145/sharding-vs-having-multiple-databases#comment36</guid><pubDate>Thu, 17 Nov 2011 18:31:24 GMT</pubDate></item><item><title>meisinger commented on Sharding vs. Having multiple databases</title><description>to Annie's point; i often struggle with this very topic.
i can easily reason about the first approach (where each kind of content is in a separate database), primarily because i can have configuration settings that tell me where to get "images", "comments" and "users". granted, performance degrades, but it feels like there are patterns and implementations out there to help with scaling and performance issues (e.g. caching).

when it comes to "true sharding", however, i don't see how my code and implementation reason about which database or data store to query. something has to tell my code that if the user has a last name between "A-D" it should connect to this database, etc...

does it really come down to having some "manager" (for lack of a better word) that knows/learns which database or data store to connect to? are we really talking about potentially connecting to a database and not finding a user or customer because the sharding rules have changed mid-stream (e.g. a database has changed from A-H to A-C)?

so while i love the approach and think that "true sharding" has many benefits, it is the difficulty of reasoning about how to implement it that causes many to fall back to a separate database for each "concern"</description><link>http://ayende.com/134145/sharding-vs-having-multiple-databases#comment35</link><guid>http://ayende.com/134145/sharding-vs-having-multiple-databases#comment35</guid><pubDate>Wed, 16 Nov 2011 07:39:29 GMT</pubDate></item><item><title>Annie Luxton commented on Sharding vs. Having multiple databases</title><description>@Oren

Well put. I totally agree with your last statement. Thanks for the discussion :)</description><link>http://ayende.com/134145/sharding-vs-having-multiple-databases#comment34</link><guid>http://ayende.com/134145/sharding-vs-having-multiple-databases#comment34</guid><pubDate>Tue, 15 Nov 2011 15:19:08 GMT</pubDate></item><item><title>Ayende Rahien commented on Sharding vs. Having multiple databases</title><description>Annie,
The only _reason_ to shard is that you have no other choice, usually for perf / scale reasons.
Sharding makes a LOT of things a LOT harder, and it is only worth going there if you actually need it.</description><link>http://ayende.com/134145/sharding-vs-having-multiple-databases#comment33</link><guid>http://ayende.com/134145/sharding-vs-having-multiple-databases#comment33</guid><pubDate>Tue, 15 Nov 2011 15:12:45 GMT</pubDate></item><item><title>Annie Luxton commented on Sharding vs. Having multiple databases</title><description>@Oren

That's precisely my point: I have no good alternative, which is why I'm not blogging about this as a connoisseur on my own blog and am instead asking on a forum where there are experts on the matter.

And regarding whether that's a big or small change, I disagree. I think it depends on the company you work for, how well organized its projects are, what sort of software you as a developer are used to working on and, in this particular case, whether your database is sharded or not. If it weren't sharded, something like what I described would be a very minor change. Sharding is what would make it a bigger / more complicated change.

And furthermore, this highlights the fact that 'sharding' is a complex beast whose advantages and disadvantages should be very carefully considered before implementing it. Unfortunately, many software companies looking for a performance improvement are turning to it as a 'silver bullet' without fully understanding what implications it might have for future development on the site. You may end up trading development speed and maintainability for site speed.</description><link>http://ayende.com/134145/sharding-vs-having-multiple-databases#comment32</link><guid>http://ayende.com/134145/sharding-vs-having-multiple-databases#comment32</guid><pubDate>Tue, 15 Nov 2011 14:32:47 GMT</pubDate></item><item><title>Ayende Rahien commented on Sharding vs. Having multiple databases</title><description>Annie,
And your alternative to that is...?

And what you describe is by no means a small change.</description><link>http://ayende.com/134145/sharding-vs-having-multiple-databases#comment31</link><guid>http://ayende.com/134145/sharding-vs-having-multiple-databases#comment31</guid><pubDate>Tue, 15 Nov 2011 13:59:12 GMT</pubDate></item><item><title>Annie Luxton commented on Sharding vs. Having multiple databases</title><description>@Oren

Yes, I realize that you can't afford to call multiple databases (ask me how I know). It ends up being dreadfully slow and requires all sorts of band-aids to work around (such as introducing a Redis layer on top)... not ideal!

So what you're saying is that with this type of sharding, even for what seems like a small change (e.g. "can you please show this user's followers' profile pic thumbnails and their associated taglines, which we didn't show before?"), you need database changes and data duplication / migration? This is precisely what turns me off sharding.</description><link>http://ayende.com/134145/sharding-vs-having-multiple-databases#comment30</link><guid>http://ayende.com/134145/sharding-vs-having-multiple-databases#comment30</guid><pubDate>Tue, 15 Nov 2011 13:57:05 GMT</pubDate></item><item><title>Ayende Rahien commented on Sharding vs. Having multiple databases</title><description>Annie,
When you have new requirements, you get the data that you need from the other locations and put it in the user's database, in a "foo" location for the "foo" feature. You literally _can't_ afford to call multiple databases in most scenarios.</description><link>http://ayende.com/134145/sharding-vs-having-multiple-databases#comment29</link><guid>http://ayende.com/134145/sharding-vs-having-multiple-databases#comment29</guid><pubDate>Tue, 15 Nov 2011 13:28:40 GMT</pubDate></item><item><title>Annie Luxton commented on Sharding vs. Having multiple databases</title><description>@Oren

Sure, that works if you know all of the data requirements for a site, or even a page, from the start. But again, in my experience, requirements change at the very last minute, and before you know it you're being asked to show some data on a user's page that wasn't planned for, and therefore isn't in that user's 'timeline' table.

If you've never been in this situation, you're a lucky man!!</description><link>http://ayende.com/134145/sharding-vs-having-multiple-databases#comment28</link><guid>http://ayende.com/134145/sharding-vs-having-multiple-databases#comment28</guid><pubDate>Tue, 15 Nov 2011 13:26:59 GMT</pubDate></item><item><title>Ayende Rahien commented on Sharding vs. Having multiple databases</title><description>Annie,
You don't query other shards. Each user has their own "timeline" table, with all the entries that they need to show their own page.
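Here is a minimal sketch of that per-user timeline model (fan-out on write). It assumes hypothetical in-memory shards and modulo placement of user ids, and it is an illustration, not Twitter's actual design. Posting copies the entry into every follower's timeline, so reads stay on one shard while writes grow with the follower count.

```python
from collections import defaultdict

# Hypothetical setup: each shard is modeled as a dict mapping a user id to that
# user's timeline. In a real system each shard would be a separate database server.
NUM_SHARDS = 3
shards = [defaultdict(list) for _ in range(NUM_SHARDS)]


def shard_of(user_id):
    """Modulo placement, assumed here purely for illustration."""
    return shards[user_id % NUM_SHARDS]


def post(author_id, text, followers):
    """Fan-out on write: store a copy of the new entry in the author's timeline
    and in every follower's timeline. Returns the number of writes performed,
    which grows with the follower count."""
    entry = {"author": author_id, "text": text}
    writes = 0
    for user_id in [author_id, *followers]:
        shard_of(user_id)[user_id].append(dict(entry))
        writes += 1
    return writes


def timeline(user_id):
    """Rendering a user's page reads from exactly one shard: their own."""
    return list(shard_of(user_id)[user_id])
```

Here `post(1, "hello", [2, 3, 4])` performs 4 writes. After that, `timeline(3)` returns the entry without touching any other shard.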
That is part of the reason that Twitter goes down a lot: a single tweet can cause a _massive_ amount of writes.</description><link>http://ayende.com/134145/sharding-vs-having-multiple-databases#comment27</link><guid>http://ayende.com/134145/sharding-vs-having-multiple-databases#comment27</guid><pubDate>Tue, 15 Nov 2011 13:13:45 GMT</pubDate></item><item><title>Annie Luxton commented on Sharding vs. Having multiple databases</title><description>Whilst I understand that sharding is great at increasing performance in some specific situations, and that there are many different ways of implementing sharding, I still don't understand how you'd get around the problem of having to query many different databases to display one page of, say, a social networking site.

Just as an example, take something like Twitter, where users have followers as well as users that they follow. Say you wanted to build a page that, for a particular user, would display all of the users they follow plus the users that follow them, and some information about each of those users. If you were to shard the user table as described in the 'proper sharding' way, you'd need to query many different databases, something which clearly isn't advisable and certainly won't increase system performance.
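To make the concern concrete, here is a small sketch. It assumes users are spread across four hypothetical shards by id modulo, and that each profile lives only on its owner's shard. Under those assumptions, rendering such a page means one query per distinct shard holding any of the listed users.

```python
# Hypothetical placement: user ids spread across shards by modulo.
NUM_SHARDS = 4


def shard_of(user_id):
    return user_id % NUM_SHARDS


def shards_touched(user_id, following, followers):
    """Distinct shards that must be queried to render one page showing a user,
    everyone they follow and everyone who follows them, assuming each user's
    profile lives only on that user's own shard (no duplicated data)."""
    ids = {user_id} | set(following) | set(followers)
    return sorted({shard_of(i) for i in ids})
```

For example, `shards_touched(1, [2, 3], [5, 8])` returns `[0, 1, 2, 3]`. A page with just four related users already hits every shard.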

From my own experience, many sites face this problem: any site that has relationships between entities that are split across different shards may run into it. With each iteration of a site, more and more information from different shards may end up needing to be displayed on one page. It really sucks to have to say 'sorry, can't do that' to non-techie co-workers, who end up looking at you like you don't know what you're doing because they're thinking 'it can't be that hard', and really, it shouldn't be. So aside from pushing back on change requests or solving this conundrum by duplicating data (which apparently Flickr does with user comments, and which I really feel is a step backwards in terms of maintainability), how can sharding work well for this sort of scenario?</description><link>http://ayende.com/134145/sharding-vs-having-multiple-databases#comment26</link><guid>http://ayende.com/134145/sharding-vs-having-multiple-databases#comment26</guid><pubDate>Tue, 15 Nov 2011 12:14:26 GMT</pubDate></item><item><title>Patrick Huizinga commented on Sharding vs. Having multiple databases</title><description>@Rafal

No, but it is useful to at least know these exist and roughly what they do. If you ever have a problem that they can solve, you will at least be able to consider them.

And don't start talking about them without reason. That way leads to the same destination as "let's always use XML".</description><link>http://ayende.com/134145/sharding-vs-having-multiple-databases#comment25</link><guid>http://ayende.com/134145/sharding-vs-having-multiple-databases#comment25</guid><pubDate>Tue, 15 Nov 2011 10:38:50 GMT</pubDate></item><item><title>Rafal commented on Sharding vs. Having multiple databases</title><description>I have never used sharding in my life. Nor have I ever had to deal with users, comments and images to the extent that I would have nightmares about not being able to fit them all in a single database. And I have never written a map-reduce function. Does that mean I'm old and useless now? Will I feel more cutting-edge if I start talking about sharding, scalability, nosql and map-reduce? I hope this is the right place to ask.</description><link>http://ayende.com/134145/sharding-vs-having-multiple-databases#comment24</link><guid>http://ayende.com/134145/sharding-vs-having-multiple-databases#comment24</guid><pubDate>Tue, 15 Nov 2011 08:31:33 GMT</pubDate></item><item><title>Mark W commented on Sharding vs. Having multiple databases</title><description>@Daniel Lang

&gt;Flickr does that by duplicating that information

Ummmm, I don't see you telling Ayende he doesn't know what Flickr's configuration is. What a suck-up.</description><link>http://ayende.com/134145/sharding-vs-having-multiple-databases#comment23</link><guid>http://ayende.com/134145/sharding-vs-having-multiple-databases#comment23</guid><pubDate>Tue, 15 Nov 2011 03:24:59 GMT</pubDate></item><item><title>Mark W commented on Sharding vs. Having multiple databases</title><description>@Daniel Lang

Bob's an idiot? It's obvious you're a Microsoft stooge who thinks Amazon runs the entire site on a single 2008 R2 server with a single SQL Server back end. Man, this site attracts a lot of fanbois.</description><link>http://ayende.com/134145/sharding-vs-having-multiple-databases#comment22</link><guid>http://ayende.com/134145/sharding-vs-having-multiple-databases#comment22</guid><pubDate>Tue, 15 Nov 2011 03:22:23 GMT</pubDate></item><item><title>Ayende Rahien commented on Sharding vs. Having multiple databases</title><description>Ruslan,
That was the choice that was presented, and I used it. I agree that sometimes it does make sense to do it that way, but that is far rarer. It would generally be easier to keep sharding down until you got to manageable pieces.</description><link>http://ayende.com/134145/sharding-vs-having-multiple-databases#comment21</link><guid>http://ayende.com/134145/sharding-vs-having-multiple-databases#comment21</guid><pubDate>Mon, 14 Nov 2011 22:04:05 GMT</pubDate></item><item><title>Ruslan Konviser commented on Sharding vs. Having multiple databases</title><description>Hm, why do you say "Sharding vs. Having multiple databases" here? My question is not about the naming convention but about the "vs". I see many examples where the architecture and design are done so that you have BOTH sharding AND multiple specific databases! Sometimes it makes sense, sometimes not. It really depends on your requirements! So my vote is -1 for "vs" :(

Just one example: you have so many users whose usernames start with A :D And each user may have uploaded so many images to the site that you can't actually use a single server any more for both users / comments and images, even if you switch to user ID ranges instead of usernames :D for the sharding ranges, etc.

So it makes complete sense to have SEPARATE databases sometimes. It gives you the ability to scale in addition to "classical" sharding.
It can also be required for security purposes or for high storage requirements (e.g. FB uses many databases, like HBase, MySQL, etc., for different functionality, and each one uses its own "sharding", "partitioning" or whatever-you-call-it scheme :D).
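Here is a sketch of combining the two, assuming a made-up topology. The concern ("users", "images") first selects a group of databases, and the user id is then hashed to pick a shard within that group, so each group can be sized independently.

```python
import zlib

# Hypothetical topology: each concern gets its own group of databases
# (functional partitioning), and each group is then sharded on its own.
TOPOLOGY = {
    "users": ["users-db-1", "users-db-2"],
    "images": ["images-db-1", "images-db-2", "images-db-3", "images-db-4"],
}


def database_for(concern, user_id):
    """Pick the concern's database group first, then a shard within that group.

    zlib.crc32 is used because it is stable across processes; Python's built-in
    hash() of a str is randomized per process by default.
    """
    group = TOPOLOGY[concern]
    return group[zlib.crc32(user_id.encode("utf-8")) % len(group)]
```

One trade-off of the hash-modulo choice here: changing the size of a group remaps most of the keys in it. Growing a group therefore means moving most of its data, unless a range-, directory- or consistent-hashing scheme is used instead.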

P.S. +1 to some people's comments regarding possible network efficiency, etc. It is all just too complicated to say "vs" here :)

</description><link>http://ayende.com/134145/sharding-vs-having-multiple-databases#comment20</link><guid>http://ayende.com/134145/sharding-vs-having-multiple-databases#comment20</guid><pubDate>Mon, 14 Nov 2011 17:43:24 GMT</pubDate></item><item><title>Chanan Braunstein commented on Sharding vs. Having multiple databases</title><description>@David

The second option is better for that. What do you do in the first option if the User database is full?

With the second option you just change A-H to A-D, E-H (or some other breaking point that makes sense data-wise).</description><link>http://ayende.com/134145/sharding-vs-having-multiple-databases#comment19</link><guid>http://ayende.com/134145/sharding-vs-having-multiple-databases#comment19</guid><pubDate>Mon, 14 Nov 2011 16:10:02 GMT</pubDate></item><item><title>Ayende Rahien commented on Sharding vs. Having multiple databases</title><description>Njy,
Why on earth would most pages contain data from multiple shards in the first place?
That pretty much defeats the idea of splitting the data so you only touch a single server.</description><link>http://ayende.com/134145/sharding-vs-having-multiple-databases#comment18</link><guid>http://ayende.com/134145/sharding-vs-having-multiple-databases#comment18</guid><pubDate>Mon, 14 Nov 2011 16:05:21 GMT</pubDate></item><item><title>njy commented on Sharding vs. Having multiple databases</title><description>@Oren: yeah, without any doubt. Probably it's just that, from my point of view, that would be unacceptable too (even considering that every page would probably contain data from more than one shard, which would make each page a mess anyway, if it could still be rendered at all), and usually that aspect is managed in other ways (dormant backup servers/vms or stuff like that).
All in all, i tend not to think of that as a requirement on which to base the data tier architecture, but it may very well be that my sense of security in that area comes from the fact that i'm not the sysadmin here, and i may take for granted some safe concepts that typically apply to rdbms scenarios but do not apply in the world of nosql solutions.</description><link>http://ayende.com/134145/sharding-vs-having-multiple-databases#comment17</link><guid>http://ayende.com/134145/sharding-vs-having-multiple-databases#comment17</guid><pubDate>Mon, 14 Nov 2011 16:00:15 GMT</pubDate></item><item><title>Ayende Rahien commented on Sharding vs. Having multiple databases</title><description>@Njy,
In your case, any server down would impact ALL users.
In my case, any server down would impact only the users on that server; all the others can continue operating.
That is a pretty important aspect.</description><link>http://ayende.com/134145/sharding-vs-having-multiple-databases#comment16</link><guid>http://ayende.com/134145/sharding-vs-having-multiple-databases#comment16</guid><pubDate>Mon, 14 Nov 2011 15:37:47 GMT</pubDate></item><item><title>njy commented on Sharding vs. Having multiple databases</title><description>@Oren: if one of the servers is down, the same thing happens as when you split data A-H / I-S / etc.: some data will be unavailable. I mean, i'm not against sharding, on the contrary! I'm just trying to understand the right reasoning behind it, and i don't think that "what if a server goes down" is a good motivation.

The other reasons make more sense to me.
Realistically, considering those details, we should start discussing the server-to-server network architecture too. For example, at one of our customers we have 3 different network cards in each server: 1 for outside communication and 2 for internal server-to-server communication, to make it faster to communicate locally in parallel (from the app server to 2 different concurrent db servers).

But then again... yeah, i think it has become all too complicated to talk about this in a blog post's comments :-)

Anyway, thanks for the discussion, helpful as always.</description><link>http://ayende.com/134145/sharding-vs-having-multiple-databases#comment15</link><guid>http://ayende.com/134145/sharding-vs-having-multiple-databases#comment15</guid><pubDate>Mon, 14 Nov 2011 15:33:02 GMT</pubDate></item><item><title>Ayende Rahien commented on Sharding vs. Having multiple databases</title><description>Njy,
What happens if one of the servers is down? What happens from a network traffic perspective? How do you maintain transactional integrity?
You have to consider a lot more factors than just a single request.</description><link>http://ayende.com/134145/sharding-vs-having-multiple-databases#comment14</link><guid>http://ayende.com/134145/sharding-vs-having-multiple-databases#comment14</guid><pubDate>Mon, 14 Nov 2011 15:19:15 GMT</pubDate></item><item><title>njy commented on Sharding vs. Having multiple databases</title><description>@Oren: oh, i see. Btw, as a general concept, i'm wondering if we're not going into berserk mode on this speed thing... i mean, duplicating a comment that may even be editable in the future (think about stack overflow, for example), and having to keep it in sync manually on updates... i mean, c'mon: DB access is already blazingly fast like that (compared to an old school rdbms), and on a high volume site you would probably use memcached/velocity/something-like-that anyway.
So between A) a couple more ms of read time to access 2 different stores and B) having some data duplicated, with the burden of keeping each copy in sync... i think the A) approach would be preferable.
Oh and, depending on the situation, it may even be possible to parallelize access to the 2 or more different stores (while i'm getting the comments from one store, i'm getting the user data from another)... and that starts to become pretty interesting...</description><link>http://ayende.com/134145/sharding-vs-having-multiple-databases#comment13</link><guid>http://ayende.com/134145/sharding-vs-having-multiple-databases#comment13</guid><pubDate>Mon, 14 Nov 2011 14:59:11 GMT</pubDate></item><item><title>Ayende Rahien commented on Sharding vs. Having multiple databases</title><description>Njy,
The most common example of shared data is one user commenting on another user's image.
Flickr does that by duplicating that information, storing the comment on both shards.
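Here is a minimal sketch of that duplicate-on-write idea, assuming two hypothetical in-memory shards and modulo placement. It only illustrates the approach described here and is not Flickr's actual implementation.

```python
from collections import defaultdict

# Hypothetical: two shards, each a dict mapping a user id to the comments stored
# with that user's data. Placement by modulo is assumed for illustration.
shards = [defaultdict(list), defaultdict(list)]


def shard_of(user_id):
    return shards[user_id % len(shards)]


def add_comment(commenter_id, image_owner_id, image_id, text):
    """Duplicate on write: store the comment on the image owner's shard (next to
    the image) and on the commenter's shard (for their own pages), so each
    user's pages can be rendered from a single shard."""
    comment = {"by": commenter_id, "owner": image_owner_id, "image": image_id, "text": text}
    shard_of(image_owner_id)[image_owner_id].append(dict(comment))
    if commenter_id != image_owner_id:
        # Second copy. These are two independent writes with no distributed
        # transaction, so a failure between them leaves the copies out of sync.
        shard_of(commenter_id)[commenter_id].append(dict(comment))
```

The cost shows up later: editing or deleting the comment has to update both copies, which is the keep-in-sync burden discussed in this thread.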
I agree that this is an approach that makes a lot of sense.</description><link>http://ayende.com/134145/sharding-vs-having-multiple-databases#comment12</link><guid>http://ayende.com/134145/sharding-vs-having-multiple-databases#comment12</guid><pubDate>Mon, 14 Nov 2011 14:29:50 GMT</pubDate></item><item><title>Ayende Rahien commented on Sharding vs. Having multiple databases</title><description>Phillip,
Actually, no. You either do denormalization, or you do the joins in your code.</description><link>http://ayende.com/134145/sharding-vs-having-multiple-databases#comment11</link><guid>http://ayende.com/134145/sharding-vs-having-multiple-databases#comment11</guid><pubDate>Mon, 14 Nov 2011 14:27:43 GMT</pubDate></item><item><title>njy commented on Sharding vs. Having multiple databases</title><description>@Oren: i can see the point in sharding; my doubt is how you organize your shards in the context of shared data. I mean, if each user has his own images, that would make sense: you group them together in a shard. But what about data related to multiple users, for example? Do you have any particular advice on scenarios like that?</description><link>http://ayende.com/134145/sharding-vs-having-multiple-databases#comment10</link><guid>http://ayende.com/134145/sharding-vs-having-multiple-databases#comment10</guid><pubDate>Mon, 14 Nov 2011 14:04:43 GMT</pubDate></item><item><title>Daniel Lang commented on Sharding vs. Having multiple databases</title><description>Bob, this is a silly argument. You don't know the details of Amazon's application and infrastructure, but I'm sure they're not excessively wasting resources. In the above example, chances are you will reduce the network load to a third. I think this is a good argument.</description><link>http://ayende.com/134145/sharding-vs-having-multiple-databases#comment9</link><guid>http://ayende.com/134145/sharding-vs-having-multiple-databases#comment9</guid><pubDate>Mon, 14 Nov 2011 13:34:55 GMT</pubDate></item><item><title>Bob commented on Sharding vs. Having multiple databases</title><description>&gt;The main problem with the first option is that in order to actually do something interesting, you have to go to three different servers.

Amazon does this to build each page view, and the performance seems to be fine.</description><link>http://ayende.com/134145/sharding-vs-having-multiple-databases#comment8</link><guid>http://ayende.com/134145/sharding-vs-having-multiple-databases#comment8</guid><pubDate>Mon, 14 Nov 2011 13:28:29 GMT</pubDate></item><item><title>Liran Zelkha commented on Sharding vs. Having multiple databases</title><description>I think you're right, and the second architecture is more "sharding" than the first. But if you're looking for ways to shard, or if, for instance, customers A-H don't fit in one database, look at &lt;a href="http://www.scalebase.com"&gt;ScaleBase&lt;/a&gt;. ScaleBase delivers a transparent database sharding solution, so you don't need to change your code to scale your database.</description><link>http://ayende.com/134145/sharding-vs-having-multiple-databases#comment7</link><guid>http://ayende.com/134145/sharding-vs-having-multiple-databases#comment7</guid><pubDate>Mon, 14 Nov 2011 12:42:33 GMT</pubDate></item></channel></rss>