Ayende @ Rahien

Ayende Rahien commented on That No SQL Thing

Thu, 08 Apr 2010 10:33:52 GMT

Justin, in general you don't use multiple keys to store a single value, so there is no need to handle multi-key transactions. In other words, you are bringing relational thinking to a non-relational world, and then complaining that things are broken.

Ayende Rahien commented on That No SQL Thing

Thu, 08 Apr 2010 10:32:37 GMT

Parag, I don't know object databases well enough to answer that.

Todd Price commented on That No SQL Thing

Fri, 02 Apr 2010 05:57:11 GMT

Like many other latest-and-greatest solutions, this one (NoSQL) solves a problem I do not have. If I did suddenly have terrible scaling problems and ran head-on into CAP, I would definitely check out NoSQL. So I am grateful for the explanation of the problem, Ayende. One day I hope to have that problem, when I invent the next YouTube and get Google to buy me for way too much money. Until then, I'll happily use my rusty little SQL database and continue to try to milk as much value as possible out of the technology castle I've built for my company these past 12 years.

Sony Mathew commented on That No SQL Thing

Tue, 30 Mar 2010 19:26:24 GMT

RDBMSs are being scaled using shared-cache architectures (e.g. Oracle RAC) and in-memory data grids (e.g. Oracle Coherence) with ACID intact. But these are expensive solutions in my opinion. A NoSQL approach fronted by data services providing "most" ACID qualities feels like it would give better performance throughput per $ spent. Additionally, domain models generally map better, as object models are more cleanly exposed via such data services.

Justin commented on That No SQL Thing

Tue, 30 Mar 2010 14:22:30 GMT

From your newer post: "Transactions – while it is possible to offer transaction guarantees in a key value store, those are usually only offered in the context of a single key put. It is possible to offer those on multiple keys, but that really doesn’t work when you start thinking about a distributed key value store, where different keys may reside on different machines." Transactions on a single key put would be like an RDBMS only offering transactions on a single row update at a time. This is LESS isolated than even "Read Uncommitted" in MSSQL, oh no, corrupt data!!! "Some data stores offer no transaction guarantees." I thought most (all?) offered transactions, so which is it?

Parag commented on That No SQL Thing

Tue, 30 Mar 2010 05:10:36 GMT

BTW, where is your Gravatar image? I find it difficult to track your replies!

Parag commented on That No SQL Thing

Tue, 30 Mar 2010 05:09:22 GMT

Ayende, how do we deal with revisions in object databases? For example, say I have a person object. After 6 months in production I add a new column, "ExternalID". At this point, how do I deal with the data that is already there? If I don't modify it, then how am I going to show the object in the view?

Ayende Rahien commented on That No SQL Thing

Mon, 29 Mar 2010 22:29:05 GMT

Justin, NoSQL doesn't mean no transactions. Most (all?) NoSQL solutions are transactional.

Justin commented on That No SQL Thing

Mon, 29 Mar 2010 16:52:33 GMT

Why aren't you worried about "corrupt" data, as you put it, with NoSQL databases? Giving up transactions is, let's say, worse than "Read Uncommitted" in MSSQL, and you went on and on about how that would lead to corrupt data. What gives? My point is that RDBMSs give you all those properties if you want them, and many will allow you to turn them off if you want performance. An RDBMS can scale just like a NoSQL solution if you are OK with eventual consistency and a lack of transactions and constraints, but it gives you the option to sacrifice scalability for correctness, where a NoSQL solution may not. BTW, Google AdWords runs on MySQL. Is that enough scalability for you?

Ayende Rahien commented on That No SQL Thing

Mon, 29 Mar 2010 16:46:52 GMT

Onur, it doesn't take a lot of data to move outside the realm of a single machine. If you are assuming read clones, the cost of a large amount of storage goes up very quickly.

Onur Gumus commented on That No SQL Thing

Mon, 29 Mar 2010 16:34:59 GMT

@Ayende I think space is not the concern here in terms of scalability. Will I be able to build the next Google? Hell no. (Yet Yahoo uses a highly modified PostgreSQL.) My point is that the statement "RDBMS does not scale" is wrong. @Demis please see my first comment on this post, which links to the PostgreSQL site. The options are mentioned there.

Ayende Rahien commented on That No SQL Thing

Mon, 29 Mar 2010 15:57:01 GMT

Onur, put simply, what happens when your data set outgrows what can be stored on a single machine? On 5 machines?

Andrew commented on That No SQL Thing

Mon, 29 Mar 2010 15:21:22 GMT

I'm not sure why everyone's so concerned with scalability, when the biggest benefit of an OODB is the fact that it is so easy to use. No longer being required to pollute 60-75% of our code base with hand-rolled SQL, or being forced to use an ORM just to communicate with a database, should be reason enough to at least look at using a NoSQL solution. And, as always, if you are doing reporting off your transaction database, you are Doing It Wrong (tm). Quite frankly, it amazes me that it even comes up as an argument.

Demis Bellot commented on That No SQL Thing

Mon, 29 Mar 2010 13:07:00 GMT

@Onur Gumus OK, I missed the 'Statement-Based Replication Middleware' solution you've quoted. This looks very much like Master-Master replication which, by the sounds of it, happens at the app/middle-tier level. Master-Master replication is a good option for small datasets but doesn't help much with scaling, as your writes are effectively multiplied amongst the available 'master servers' rather than divided among partitioned datastores, which is obviously the most efficient solution. Another factor that makes it a less than ideal candidate for 'horizontal scaling' is that the dataset is not partitioned, so each master effectively holds an entire copy of the dataset. Master-Master replication also adds extra complexity in maintaining consistency across all master servers. Are the writes asynchronous or blocking? How do you compensate for masters that are down? Again, this is all doable with an RDBMS, it just takes a lot more effort and resources.

Demis Bellot commented on That No SQL Thing

Mon, 29 Mar 2010 12:46:31 GMT

@Onur Gumus Replicated read slaves still take processing resources from the central master database server(s). When the master goes down, all system writes effectively cease. If you can partition your data, sharding can be a superior solution. In that case there is still a master table maintained on a central server(s) that normally serves as a lookup for which shard the partitioned data (e.g. User) lives on. When adding new users you still need writeable access to the master table, so again both cases rely on a central database server(s). In contrast, with NoSql datastores, if one server goes down, only the users that live on that store are affected. New users can still be stored in one of the available datastores. We've opted to go with a sharded PostgreSql solution for our persistence needs, though we still utilize intelligent in-memory tiered data views both for increased perf and to take processing off the db servers. Any read/write operation we can do on Redis we consider to be a no-op. Even at our small scale we are hitting RDBMS limits, as we have services that routinely perform 1000+ writes. On Redis this can easily be achieved in under 1 second, while on PostgreSql (even sharded) it takes a lot longer.
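To make the lookup-table style of sharding described in the comment above concrete, here is a minimal in-memory sketch. It is not mflow's actual design: `ShardDirectory`, its method names, and the "least loaded shard" placement rule are all assumptions made for illustration, and plain dictionaries stand in for the central master table and the shard databases.

```python
class ShardDirectory:
    """Toy model of sharding with a central lookup table (hypothetical)."""

    def __init__(self, shard_names):
        # Each shard is modelled as a dict of user_id -> record.
        self.shards = {name: {} for name in shard_names}
        # The central "master table": user_id -> name of the shard holding it.
        self.lookup = {}

    def add_user(self, user_id, record):
        # Assumed placement rule: put the new user on the least loaded shard.
        shard = min(self.shards, key=lambda name: len(self.shards[name]))
        self.lookup[user_id] = shard          # write to the central table
        self.shards[shard][user_id] = record  # write to the chosen shard
        return shard

    def get_user(self, user_id):
        # Every read first consults the central table to find the shard.
        shard = self.lookup[user_id]
        return self.shards[shard][user_id]


directory = ShardDirectory(["shard-1", "shard-2"])
directory.add_user(1, {"name": "Onur"})
directory.add_user(2, {"name": "Demis"})
```

Note that both `add_user` and `get_user` touch `lookup`, which is the dependency on a central server that the comment points out.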

Onur Gumus commented on That No SQL Thing

Mon, 29 Mar 2010 12:21:05 GMT

@Demis Bellot "In short all RDBMS scaling options require access to a central server (which given a large enough load becomes the bottleneck)" I fail to see how the above is correct. Please read my earlier post carefully. I don't see that a central server is obligatory (at least for read-only queries).

Demis Bellot commented on That No SQL Thing

Mon, 29 Mar 2010 10:58:24 GMT

@Onur Gumus In short all RDBMS scaling options require access to a central server (which given a large enough load becomes the bottleneck). They are also more costly in hardware and maintenance costs, here's a good article explaining it in a bit more detail: [stu.mp/.../...l-vs-rdbms-let-the-flames-begin.html](http://stu.mp/2010/03/nosql-vs-rdbms-let-the-flames-begin.html)

Demis Bellot commented on That No SQL Thing

Mon, 29 Mar 2010 10:50:55 GMT

@Jonathan Allen If you're storing XML as text blobs in the database you may want to check out my Open Source C# POCO serializer: [code.google.com/p/servicestack/wiki/TypeSerializer](http://code.google.com/p/servicestack/wiki/TypeSerializer) It's 3.5x faster and 2.6x more compact than .NET's XML DataContract serializer. It's also cleaner and more resilient to schema changes, supports inheritance, late-bound object properties, etc. and can work with any C# POCO type, not just DTOs. Effectively it was made for blobbing data in a fast, clean text format.

Onur Gumus commented on That No SQL Thing

Mon, 29 Mar 2010 10:33:00 GMT

How does load balancing differ from scalability? Also, I see that the solutions there do involve multiple servers.

Master-Slave Replication: A master-slave replication setup sends all data modification queries to the master server. The master server asynchronously sends data changes to the slave server. The slave can answer read-only queries while the master server is running. The slave server is ideal for data warehouse queries.

In the above, read queries are balanced, which gives us a degree of scalability.

Statement-Based Replication Middleware: With statement-based replication middleware, a program intercepts every SQL query and sends it to one or all servers. Each server operates independently. Read-write queries are sent to all servers, while read-only queries can be sent to just one server, allowing the read workload to be distributed.

The same goes for the above. I assume you read all of these. If you are after something like map-reduce, I would say map-reduce isn't the only way to scale. Load balancing, even if the query is executed on a single server, means scaling in my book. Am I wrong?
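The statement-based routing quoted in this comment can be sketched in a few lines. This is only a toy model and not how any real PostgreSQL middleware is implemented: `StatementRouter` is a hypothetical name, servers are plain strings, and the "starts with SELECT" check is a deliberately naive assumption about what counts as read-only.

```python
import itertools


class StatementRouter:
    """Toy statement-based replication middleware (illustrative only)."""

    def __init__(self, servers):
        self.servers = list(servers)
        # Round-robin iterator used to spread read-only queries.
        self._next_reader = itertools.cycle(self.servers)
        # Per-server log of the statements each one would have executed.
        self.log = {name: [] for name in self.servers}

    @staticmethod
    def is_read_only(sql):
        # Naive assumption: anything that starts with SELECT is read-only.
        return sql.lstrip().upper().startswith("SELECT")

    def route(self, sql):
        if self.is_read_only(sql):
            targets = [next(self._next_reader)]  # one server handles the read
        else:
            targets = list(self.servers)         # every server applies the write
        for name in targets:
            self.log[name].append(sql)
        return targets


router = StatementRouter(["db1", "db2", "db3"])
router.route("INSERT INTO users VALUES (1, 'onur')")  # goes to all servers
router.route("SELECT * FROM users")                   # goes to one server
```

The sketch shows why reads scale with the number of servers under this scheme while writes do not: every write lands in every server's log.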

Demis Bellot commented on That No SQL Thing

Mon, 29 Mar 2010 10:17:16 GMT

I'm glad you're weighing in on this topic as well, Oren, as there has been a flamefest brewing on the Internet of late, mostly from people who haven't used NoSql databases before and think their 10 years of RDBMS experience gives them enough qualifications to comment on it - it really doesn't. NoSql datastores have solved a lot of problems that have typically been hard to solve with an RDBMS. Note that NoSql is not a replacement for RDBMS; they actually complement each other quite well. It's still all about choosing the best tool for the job. Quite simply, an RDBMS is good at (and is still the best at) storing relational, tabular data. There is no disputing that, and that statement still holds true. It's not, however, so good for storing deep hierarchical data or for storing alternate data structures, i.e. Message Queues, etc. (it can still be done, but like any hammer it's not a good fit). OK, I've noticed a couple of one-line comments that indicate that RDBMS (or even their particular brand of RDBMS) can actually scale. Scaling for all intents and purposes means 'horizontal scaling', i.e. you can throw an extra commodity server in the cluster and you can handle 1/n more load than it did before. Usually this means that there are no single bottlenecks (i.e. central servers) that each request goes through, which allows you to distribute your requests evenly over your app servers and data stores. Most NoSql database clients include consistent hashing algorithms which allow you to do this. In the RDBMS world the way we typically scale is to use either a master with replicated read slaves or to partition the data in a sharded architecture. These are still good approaches to scaling an RDBMS. They are, however, more complex to configure and typically cost more to run and maintain than their NoSql equivalents. There are other good reasons to try the NoSql route, namely speed (e.g. Redis can perform 110,000 write operations on an entry level linux box) and schema-less designs.
Both these topics are too big to cover in a single comment so I'll try to cover them in my own blog posts when I can find the time. Now, most of the time we're developing enterprise applications for internal use, so we're lucky enough to never hit the scaling limits of RDBMSs. In these cases it's safe to ignore NoSql datastores for your own use (although there are still other benefits). Unfortunately, as an architect of a social media service (mflow), performance and scalability considerations are mandatory requirements that must be factored into our design, which basically consists of in-memory 'cached data views' mirroring our persisted data, which resides on multiple sharded postgresql databases. For those who are interested in trying out NoSql datastores, I recommend looking at Redis, for which I maintain a rich C# redis client (and windows server builds) at: [code.google.com/.../ServiceStackRedis](http://code.google.com/p/servicestack/wiki/ServiceStackRedis)
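For readers unfamiliar with the consistent hashing mentioned in this comment, the following is a minimal sketch of the general technique. It does not reproduce the algorithm used by any particular NoSql client: the `HashRing` class, the choice of SHA-256, and the default of 100 virtual points per node are all assumptions chosen for illustration.

```python
import bisect
import hashlib


class HashRing:
    """Minimal consistent-hash ring with virtual nodes (illustrative sketch)."""

    def __init__(self, nodes, replicas=100):
        self.replicas = replicas  # virtual points per node (assumed value)
        self._ring = []           # sorted list of (hash, node) points
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(value):
        return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)

    def add(self, node):
        # Place several points for the node around the ring.
        for i in range(self.replicas):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove(self, node):
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def node_for(self, key):
        if not self._ring:
            raise ValueError("ring is empty")
        # First point clockwise from the key's hash, wrapping at the end.
        idx = bisect.bisect_right(self._ring, (self._hash(key), ""))
        return self._ring[idx % len(self._ring)][1]


ring = HashRing(["cache-1", "cache-2", "cache-3"])
owner = ring.node_for("user:42")  # any client computes the same owner locally
```

Because every client can compute `node_for` on its own, no central server sits in the request path, and adding a node only reassigns the keys that now fall on the new node's points.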

Ayende Rahien commented on That No SQL Thing

Mon, 29 Mar 2010 10:08:56 GMT

Onur, look at the 4th paragraph in the link that you provided, which lays out the problem in easy to understand terms. All of the solutions provided in that link are ways of dealing with information on a single server and just load balancing the load. The only one that provides multi-server query execution is the paragraph titled "Multi-Server Parallel Query Execution". And that doesn't seem to provide solutions for failover, relies on a single server, and is going to hit a perf cap very quickly (with 100 servers, your query time is going to be consumed with managing the query, not actually querying).

Ayende Rahien commented on That No SQL Thing

Mon, 29 Mar 2010 10:03:09 GMT

Onur, I am familiar with this (well, I am familiar with this on Oracle's RAC side). The problem is that those solutions run into CAP head on. If one of your servers goes down, Bad Things happen.

Ayende Rahien commented on That No SQL Thing

Mon, 29 Mar 2010 09:54:28 GMT

Jonathan, You aren't working with a single database, you are working with copies of it. That isn't a single RDBMS spanning multiple machines and allowing you to query on all the data in the DB. "dozens of databases serving different roles" - agreed, that IS my point.

Ayende Rahien commented on That No SQL Thing

Mon, 29 Mar 2010 09:52:03 GMT

Onur, nope, it doesn't matter what the DB product is. Relational DBs break down at scale not because of their implementation, but because of their very nature.

Onur Gumus commented on That No SQL Thing

Mon, 29 Mar 2010 06:17:17 GMT

I think you are wrong. PostgreSQL can scale.

Jonathan Allen commented on That No SQL Thing

Sun, 28 Mar 2010 22:02:02 GMT

> The problem is inherent in the basic requirements of the relational database system, it must be consistent, to handle things like foreign keys, maintain relations over the entire dataset, etc. Where did you get that crazy notion? People use replicated databases all the time. By their very nature they don't have foreign keys, nor are they necessarily consistent with the main database at any given time. (Where I work each table is anywhere from a couple of seconds to 2 hours delayed, depending on how important it is and how often it changes.) And then there are true reporting databases. These are denormalized versions of the data that are specifically designed to be fast to query. They may even be reduced all the way down to key-value pairs. (Again, where I work we have XML blobs that were distilled from a dozen tables.) You seem to be making the same mistake a lot of novice database developers make, which is thinking that the game begins and ends with a single, highly normalized, transactional database. But in a real system you could have dozens of databases serving different roles, often using different schemas to share the same data.

Harry Steinhilber commented on That No SQL Thing

Sun, 28 Mar 2010 15:50:48 GMT

@Frans, I haven't noticed the data in document DBs to be rigid at all. I have made many changes to the layout of my data in my current application with almost no need to think about it. At least no more thought than I would have needed using a typical instance of SQL Server. I think what you may be thinking of are object databases that serialize objects to persist them. Then you do run into versioning issues. However, most modern document DBs do their best to make the requested data fit into whatever object type you specify (and it is even easier if you're in a dynamic language, where it can always make it fit).
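As a rough illustration of fitting stored documents into whatever object type is requested, here is a small sketch using only the standard library. It is not taken from any specific document database: the `Person` type, its `external_id` field, and the `load` helper are hypothetical, and the sketch assumes documents are stored as JSON.

```python
import json
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Person:
    name: str
    # Field added after older documents were already stored; the default
    # lets documents written before the change still load.
    external_id: Optional[str] = None


def load(cls, raw_json):
    """Build cls from a JSON document, ignoring keys the type doesn't know."""
    data = json.loads(raw_json)
    known = {f.name for f in fields(cls)}
    return cls(**{key: value for key, value in data.items() if key in known})


old_doc = '{"name": "Parag"}'  # written before external_id existed
new_doc = '{"name": "Harry", "external_id": "X-42", "legacy_flag": true}'

old_person = load(Person, old_doc)  # external_id falls back to None
new_person = load(Person, new_doc)  # the unknown legacy_flag key is dropped
```

Old documents are left untouched on disk; the missing field is simply filled in with a default when the object is materialized.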

Ayende Rahien commented on That No SQL Thing

Sun, 28 Mar 2010 09:50:02 GMT

Frans, it is actually quite easy to get to a point where SQL doesn't really work for you. We _are_ dealing with a lot more scale than in the past. Sure, for apps that have relatively few users (where few is tens of thousands), that is not an issue, but when you start talking about a large number of users, a large amount of data, and complex interactions that you need to deal with, you run into the limitations of CAP, and when you do that, you can't use an RDBMS. Case in point, [http://www.gilt.com/](http://www.gilt.com/): this is an online shopping site that has a sale every day at noon. They have _huge_ peaks at those times, they _tried_ using an RDBMS, and then moved to other options, because those handled the insane peaks much better. Even a few thousand users hammering the site at a given point in time is likely to kill it. And we haven't talked yet about how to handle tables with billions of rows where you _are actually touching those rows_, rather than just writing them out. Yes, RDBMSs are perfectly fine for a lot of apps, but I think that I did a good job of establishing the _context_ in which you shouldn't use them. Data in doc DBs is rigid? What gave you that idea?

Frans Bouma commented on That No SQL Thing

Sun, 28 Mar 2010 09:38:49 GMT

> "At some point, RDBMS can't handle the load without running into CAP. At that point, they stop working for real world usage." I've been using relational databases for many, many years, and I've yet to see a relational database so big that it was too slow to keep up. Mind you: millions and millions of new rows per day isn't too much. For the vast majority of people using databases in their applications, relational databases are just fine. Only the very few who write the new amazon.com or the next Google competitor might need different databases, but how many are those? A handful. I'm not saying NoSQL should just die off because it has no value. It does have value, but the scope is pretty limited, as the data contained in document databases is rigid and you need the software to give meaning to the data. Plus, creating new projections on the data without the software (e.g. in different software) is hard. If that's of no concern, be my guest, but people should realize that. Btw, NoSQL means Not Only SQL. It doesn't mean !SQL.

Ayende Rahien commented on That No SQL Thing

Sun, 28 Mar 2010 08:58:15 GMT

Frans, At some point, RDBMS can't handle the load without running into CAP. At that point, they stop working for real world usage. See the links that I provided.