SQL Azure, Sharding and NHibernate: A call for volunteers
I was quite surprised to hear that SQL Azure has a 10 GB limit for each database. That drastically reduces my estimate of how much effort went into SQL Azure. My guess is that it is simply replicated database instances rather than real SQL in the cloud.
One of the nice premises of working in the cloud is that you get transparent scaling. A 10 GB limit is not transparent. The answer from Microsoft seems to be that you need to implement sharding, that is, spread your logical database over several physical databases.
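To make "spread your logical database over several physical databases" concrete, here is a minimal Java sketch of picking a physical database from a sharding key, such as a customer id. It assumes simple hash-modulo placement. `ShardRouter`, `shardIndexFor`, `databaseFor` and the database names are hypothetical; they are not part of the Hibernate Shards or NHibernate APIs.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.zip.CRC32;

/**
 * Hypothetical router: maps a sharding key to one of several physical databases.
 * Illustrative only, not the Hibernate Shards API.
 */
public class ShardRouter {
    private final List<String> databases;

    public ShardRouter(List<String> databases) {
        if (databases.isEmpty()) {
            throw new IllegalArgumentException("need at least one shard");
        }
        this.databases = List.copyOf(databases);
    }

    /** CRC32 of the key's UTF-8 bytes is deterministic, so a key always lands on the same shard. */
    public int shardIndexFor(String shardKey) {
        CRC32 crc = new CRC32();
        crc.update(shardKey.getBytes(StandardCharsets.UTF_8));
        // getValue() is non-negative, so the modulo is always a valid index.
        return (int) (crc.getValue() % databases.size());
    }

    /** Name of the physical database that owns this key. */
    public String databaseFor(String shardKey) {
        return databases.get(shardIndexFor(shardKey));
    }

    public static void main(String[] args) {
        // Hypothetical databases, each kept under the per-database size cap.
        ShardRouter router = new ShardRouter(List.of("orders_shard0", "orders_shard1", "orders_shard2"));
        for (String customer : List.of("customer-7", "customer-42", "customer-1001")) {
            System.out.println(customer + " -> " + router.databaseFor(customer));
        }
    }
}
```

Hash-modulo keeps routing stateless. The trade-off is that changing the number of shards moves most keys to a different database. Consistent hashing or a lookup directory reduces that churn.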
Usually sharding is done across physical database instances to speed up the application, because you can run queries in parallel. In this case, you would need it simply because each database is so small.
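Parallelizing a query over shards usually means sending the same query to every shard and merging the results. The minimal Java sketch below shows that fan-out. Each in-memory list is a hypothetical stand-in for one physical database, and `ShardFanOut` and `queryAllShards` are illustrative names, not a library API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;
import java.util.stream.Collectors;

public class ShardFanOut {
    /** Runs the same query against every shard in parallel and concatenates the results in shard order. */
    public static <S, R> List<R> queryAllShards(List<S> shards, Function<S, List<R>> query) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, shards.size()));
        try {
            // Start one asynchronous query per shard.
            List<CompletableFuture<List<R>>> futures = shards.stream()
                    .map(shard -> CompletableFuture.supplyAsync(() -> query.apply(shard), pool))
                    .collect(Collectors.toList());
            List<R> merged = new ArrayList<>();
            for (CompletableFuture<List<R>> future : futures) {
                merged.addAll(future.join()); // blocks until that shard answers
            }
            return merged;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Hypothetical shards holding order totals.
        List<List<Integer>> shards = List.of(List.of(5, 120, 40), List.of(300), List.of(80, 150));
        List<Integer> bigOrders = queryAllShards(shards,
                shard -> shard.stream().filter(total -> total > 100).collect(Collectors.toList()));
        System.out.println(bigOrders); // [120, 300, 150]
    }
}
```

Anything that depends on global ordering, such as ORDER BY or "top N", has to be reapplied after the merge, because each shard only sorts its own rows. Here `join()` rethrows a failing shard's exception, so a real implementation would also need a policy for partial failures.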
Sharding is a term that was invented by Google, and a few years ago several Google engineers decided that they wanted to use sharding with Hibernate. Thus the Hibernate Shards project was born, bringing transparent sharding support to Hibernate.
The equivalent project for NHibernate was started, but the port was never completed. This is a call for volunteers to help continue porting Hibernate Shards to NHibernate. Now there is a very clear reason to want it.
Having NHibernate Shards fully functional would mean that you get transparent scaling on SQL Azure. The fun part is that there isn’t a lot of thinking or design involved: the road has already been traveled, so the only effort is the port itself.
And, to give some incentive, I am willing to donate an NH Prof license to each of the major contributors who finish the Hibernate Shards port.
I never really understood why you would move to Azure and keep the relational model. If I'm going to move to Azure, it is for the benefit of building a scalable application, which would mean Azure Storage (tables). If I just want to take an existing app and 'cloud it up', EC2 is a better fit.
SQL Azure just seems like a compromise. They started with a different goal, but it didn't work (in the initial stages it was hard to differentiate it from Azure Table Storage, which led to confusion, and the later stages never came), so now it's just 'SQL in the cloud'.
Sounds great. I'll tweet you directly.
In many apps, you have a few tables that hold 99% of your raw data. Those are obvious candidates for Blob Storage or Table Storage under Azure. Then, the other 50 tables just aren't worth the migration because they don't hold that much data. This is more or less our situation at Lokad.com and the approach we have adopted while migrating to Azure.
That would be much better than my proposal :)
(see comments in ayende.com/.../...nate-on-the-cloud-sql-azure.aspx)
Would be happy to volunteer if I can contribute. How can I help?
I'm pretty busy with the Linq to NHibernate work at the moment, but if there's still a need for contributors once I've got that nailed, then I'd be up for it. Already ported a lot of Hibernate Java code over to NH, so it's something I'm pretty used to :)
I could use this!
How can I help?
Would be happy to help!
I am not very experienced, but I am eager to learn and would love to contribute. Where do I sign up?
I would have thought "shard" or "sharding" came from MMOGs like Ultima Online, where you logged onto different shards. These were deployed to manage player load and locality to the server (ping time): basically the same scaling problem, and from around the same time, 1996/1997.
I'd like to help make NHibernate Shards alive!
Hi Ayende, glad to read this post !
I began NH.Shards a long time ago and haven't been able to finish it yet.
I moved some commits to this SVN repository because a friend was helping me with some code...
But I will commit those changes to trunk/ on NH.Contrib if you like, so that somebody can continue, and I will be happy to help ;-)
Dario, I glanced at the nhshards repo and it seems to have quite a lot of the base work done. IMO it would be good to sync those changes to contrib.
I'm definitely game, where do I sign up?
@eyston There's a lot of info regarding SQL Azure. Long story short, they tried a schema-less approach and got a very vocal response from those who wanted "SQL in the cloud". In the announcement of the transition, Microsoft said that SQL Azure will support TDS (the SQL Server wire protocol). Reading between the lines, I imagine that they have an engine in front of the schema-less data storage that handles the translation for you.
It's not a compromise, it's the best of both worlds. My hope is that the schema-less API will be re-enabled at some point in the future.
Would definitely be interested in contributing...
How/Where do I get started?
Anything to help the NHibernate community. Have used NHibernate extensively over the last few years - it would be nice to give something back!
I'm indebted to NHibernate: it has saved me many hours of work.
If I can contribute in any way, please tell me where I can sign up.
Hi, I have worked with NHibernate for the last 3 years and I would really like to contribute. Where do we start?
The place to discuss this is the NHibernate Contrib mailing list.
I think this would be great, and it seems like a lot of people are willing to help
As for what SQL Azure is, it is most definitely not sitting in front of the schemaless storage. There is just no way it could work.
The 10 GB limit makes me think that they simply run some version of SQL Server and handle replication on the fly. That way they can still benefit from what SQL Server can do.
No, there is no translation, it is just SQL installed on their virtual instances.
The original SSDS (SDS) was schema-less and built to scale. When it was presented at PDC it wasn't fleshed out and didn't really do joins the way people wanted, but it scaled "automatically" (as long as you embraced the new way of thinking). Because it was schema-less and the relational bits weren't really there, people were confused about the difference between Azure Table Storage and SQL Data Services.
It looks like SDS wasn't going to be mature enough to meet the Azure schedule, and people weren't really interested in learning the new paradigm, so we got SQL Azure, which lacks the 'seamless scaling' capabilities that every other Azure service has. It kind of sticks out like a sore thumb to me. I'm sure more details or a roadmap will be unveiled at PDC, because I don't think SQL Azure is the long-term goal. Complete speculation, though.
It should be noted that Amazon's SimpleDB (a schema-less cloud database) also has a 10 GB limit per domain (database). You reach that limit pretty fast in a schema-less world. Sharding your fact data is a good idea for both scale and performance.
Any chance you could finish Shards this week?? :) We need this, like, now! We are creating the ultimate Azure killer app, no really! Lots of database partitions in the cloud, mostly for performance reasons: we need to sell lots of stuff in a very short period. I need to create an automatic partitioning and de-partitioning mechanism so we can move slices of the database in and out of the cloud as demand increases or decreases. I think sharding might help with this.
If you are serious, we can talk about sponsoring the Shards development.
Putting money into this is the way to make sure that it will happen fast.