Ayende @ Rahien

Ayende @ Rahienhttp://ayende.comAyende @ RahienCopyright (C) Ayende Rahien 2004 - 2021 (c) 202660fschwiet commented on Graph DB Sharding Strategies: Gravityalex has a point that your model doesnt really account for access frequency, it sort of presumes they're all accessed with similar frequency. I wonder, since you are building the indexes used to access the objects, if you should keep some statistics in the indexes. Consider tracking for each object instance how many times its accessed from each shard. Overall number of accesses will tell you if this item is hot and worth moving. Weight per shard will tell you where it belongs. There may be some 0.1% of objects whose access frequencies are high enough to justify replicating them across shards. This kind of weighting wouldn't detect entire groups that could be moved together, but it would work well for fine tuning. Maybe some grouping detector could run and move larger clumps around. http://ayende.com/4494/graph-db-sharding-strategies-gravity#comment11http://ayende.com/4494/graph-db-sharding-strategies-gravity#comment11Wed, 12 May 2010 04:08:36 GMTAlex commented on Graph DB Sharding Strategies: Gravityhi Ayende, nice post again! some interesting ideas there to evolve to the dynamic nature of twitter it would be interesting to not only "strengthen" active relationships but also introduce a mechanism that forces older relationships to "decay" for example, nobody gives a flying duck about what was tweeted 6 months ago... that data is basically forgotten by the users of twitter. I'm sure twitter themselves use it to mine for interesting patterns, but we as users do not to deal with this we could timestamp all relationships, and then your "gravity" algo could reduce (halve?) the weight of those relationships as soon they are over a certain age... relationships would have a "half life" the algorithm could consider relationship based on some probability, which is dictated by their weight... this may reduce computational complexity in a semantically meaningful way regarding the geo-locality assumption, I think this is a better method than random initialization, but as others have mentioned I think it could be greatly improved. twitter just isnt as "local" as something like facebook some other points (that only someone with better mathematical kung fu than I could answer): - how do we ensure shards have equal sizes? it may be that "gravity" converges to uneven shard sizes - how do we ensure the algorithm doesn't get trapped in a local minima? - IF it does produce useful results, how long does it take to converge to this state? - many more ???... :) Alex http://ayende.com/4494/graph-db-sharding-strategies-gravity#comment10http://ayende.com/4494/graph-db-sharding-strategies-gravity#comment10Tue, 11 May 2010 08:15:14 GMTRafal commented on Graph DB Sharding Strategies: GravityThere's another problem with this algorithm: when the accumulated node mass exceeds certain threshold everything will collapse under the gravity force and a black hole will be formed. http://ayende.com/4494/graph-db-sharding-strategies-gravity#comment9http://ayende.com/4494/graph-db-sharding-strategies-gravity#comment9Tue, 11 May 2010 07:38:46 GMTFrank Quednau commented on Graph DB Sharding Strategies: GravityIf you already use gravity as analogy I am wondering whether other fundamental forces could also be useful, e.g. repulsive forces between nodes without any relations between them (compare to electromagnetism with equal polarity - maybe introducing a non-relation with negative weigth between unrelated nodes?) or higher escape velocities for strongly related nodes (compare to strong nuclear force). Indeed, modeling that last force correctly would support the breaking up of clusters that grow too big (compare to radioactivity). No, I am not joking. Reality's force model is incredibly ingenious and I love relating it to other stuff... http://ayende.com/4494/graph-db-sharding-strategies-gravity#comment8http://ayende.com/4494/graph-db-sharding-strategies-gravity#comment8Mon, 10 May 2010 20:54:23 GMTAyende Rahien commented on Graph DB Sharding Strategies: Gravityfschwiet, Relations are stored along with the node. This scheme requires a master list of node id to shard id. http://ayende.com/4494/graph-db-sharding-strategies-gravity#comment7http://ayende.com/4494/graph-db-sharding-strategies-gravity#comment7Mon, 10 May 2010 16:46:08 GMTfschwiet commented on Graph DB Sharding Strategies: GravityWhat you've described is a means to adjust where objects are sharded. Where do you end up storing the relations? Also, when retrieving an object, how do you determine what shard it is in? Sorry not to familiar with object databases. I'm thinking in terms of memcached where any server can determine the server an object is on just by looking at its identity key. Maybe you do not care about knowing what server the object is on, if it can still be located via a map/reduce operation going to all servers. http://ayende.com/4494/graph-db-sharding-strategies-gravity#comment6http://ayende.com/4494/graph-db-sharding-strategies-gravity#comment6Mon, 10 May 2010 16:41:48 GMTTom Dietrich commented on Graph DB Sharding Strategies: GravityI love the term "Escape velocity" for this algorithm. Is this a constant? it seems as though you could calculate based on the average load of the clusters what the escape velocity should be- to balance the cost of moving a node versus the benefit of lowering your shard load when it is needed. Very interesting post though. http://ayende.com/4494/graph-db-sharding-strategies-gravity#comment5http://ayende.com/4494/graph-db-sharding-strategies-gravity#comment5Mon, 10 May 2010 12:12:27 GMTRamon Smits commented on Graph DB Sharding Strategies: GravitySorry for all those typos :-) (RFC: add some edit functionalty to your blog) http://ayende.com/4494/graph-db-sharding-strategies-gravity#comment4http://ayende.com/4494/graph-db-sharding-strategies-gravity#comment4Mon, 10 May 2010 10:55:51 GMTRamon Smits commented on Graph DB Sharding Strategies: GravityWell, maybe Scott was a bad example ;-) what I ment was your weighting seems off to me. Weighting on geographical data is interesting for *real* friends but I'm using twitter for following / reading tweets that are related to my developer interests. My data read pattern thus based on my relation to the people I follow. I think that that is the most frequent access pattern for most people at least for our profession. People like Phil Haacked also tweet personal stuff but that isn't the reason why they follow him. If you would be very actively related with your mom then that still doesn't weight much for all his followers. Mentions having a weight which result you and your mom ending up on the same shard would not at all benefit the followers which read data I/O wise would benefit from being all on the same shard. Still this could result in I/O bottlenecks so maybe it would even be better to only put the 'profile' data on the same shard because these accounts are weighted to be related but not the tweet data as that could result in better I/O is those are scaled out. With having a less weighted value when you follow more people. I think that doesn´t matter at all in the end when all follow associations would have that same weight strategy. Another issue is that this is more based on recent behavior as in maybe even today. Because you are actively related today (thus high escape velocity) does not mean it would benefit your followers in read round-trip times at all to relocated all that data as it can just as easily be that you don't even be in a 'conversation' in the future. However, this would strategy would benefit the 'active' users though. So if that would be the primary objective (having the best read/write performance between activity related users) then this would be a very good solution but we all know that it twitter is not about conversations its about eavesdropping conversations. Well, maybe not that secretly then ;-) but IMHO its all about listening to what others are doing/saying to learn and understand and that every 5 minutes all day long :-) http://ayende.com/4494/graph-db-sharding-strategies-gravity#comment3http://ayende.com/4494/graph-db-sharding-strategies-gravity#comment3Mon, 10 May 2010 10:53:02 GMTAyende Rahien commented on Graph DB Sharding Strategies: GravityRamon, Nope, not at all. Take a look at the actual numbers: Ayende's friends: 59, Intersection with Rob: 4, Intersection With Scott: 0 Rob's friends: 31, Intersection with Ayende: 4, Intersection With Scott: 0 Scott's friends: 100, Intersection with Ayende: 0, Intersection With Rob: 0 Total intersection 0 http://ayende.com/4494/graph-db-sharding-strategies-gravity#comment2http://ayende.com/4494/graph-db-sharding-strategies-gravity#comment2Mon, 10 May 2010 10:19:09 GMTRamon Smits commented on Graph DB Sharding Strategies: GravityI don't know if this is the correct optimization. For example, I'm not a real known developer but people like you, Rob Conery, Scott Hanselman are. To me it seems that these people will end up in the same shard but is that really what you would want? It would only be interesting if al their followers would have the exact same interests in all these people who are actively related which I seriously doubt. http://ayende.com/4494/graph-db-sharding-strategies-gravity#comment1http://ayende.com/4494/graph-db-sharding-strategies-gravity#comment1Mon, 10 May 2010 10:10:00 GMT