Ayende @ Rahien

It's a girl

NoSQL without web-scale

Application data is one of the most precious assets that we have. For a long time, there was no question about where we were going to put this data: the RDBMS was the only game in town. The initial move away from the RDBMS was indeed driven by the need to scale, but that was just the original impetus for developing NoSQL solutions. Once those solutions came into being and matured, it wasn't just the "we need web-scale" players that benefited.

Proven & Mature NoSQL solutions aren't applicable just at the high end of scaling. They provide a lot of benefits even for applications that will never need to scale beyond a single machine. Document databases drastically simplify things like user-defined fields or working with Aggregates. The performance of a NoSQL solution can often exceed that of a comparable RDBMS solution, because the NoSQL solution will usually focus on a very small subset of the feature set that an RDBMS has.
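As a rough illustration of the user-defined-fields point, here is a minimal Python sketch that treats a document as a plain JSON object. The `customer` document, its field names, and the `add_custom_field` helper are made up for this example and are not taken from any particular database's API.

```python
import json

def add_custom_field(document, name, value):
    """Return a copy of the document with one extra user-defined field."""
    updated = dict(document)
    custom = dict(updated.get("custom_fields", {}))
    custom[name] = value
    updated["custom_fields"] = custom
    return updated

# A hypothetical customer document; no schema change is needed to extend it.
customer = {"id": "customers/1", "name": "Acme", "orders": [{"sku": "A-1", "qty": 2}]}
customer = add_custom_field(customer, "preferred_courier", "DHL")

# The whole aggregate, including the new field, round-trips as one JSON value.
stored = json.dumps(customer)
loaded = json.loads(stored)
print(loaded["custom_fields"]["preferred_courier"])
```

In a relational schema the same feature typically needs either an extra column per field or a separate attribute/value table.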

Comments

Louis Haußknecht
10/05/2010 02:58 PM by
Louis Haußknecht

That's why CouchDB is targeting mobile devices which is anything but webscale...

BTW: What's the state of the managed file storage for RavenDB? ;)

Ayende Rahien
10/05/2010 03:22 PM by
Ayende Rahien

Louis,

We have a sort of working impl in a branch.

addys
10/05/2010 06:02 PM by
addys

"Proven & Mature NoSQL solutions" ? come on, you know better...

gandjustas
10/05/2010 06:48 PM by
gandjustas

Another NoSQL-better-than-RDB post...

Let's try a real solution.

A web site - StackOverflow clone.

Entities:

-Questions

-Answers

-Users

Questions can be tagged with zero or more tags. Questions and Answers can be commented on. Users can vote (+1 or -1) on a Question\Answer\Comment.

Use cases:

1)Show recent Questions

2)Show popular Questions (with most votes)

3)Show Question with Answers and Comments

4)Show Questions by Tag (with sorting by date and popularity)

5)Show "tag cloud"

6)Show user Questions, Answers, Comments

7)Show user votes

8)Show comments to user Questions and Answers

9)Create\Update\Delete Question

10)Create\Update\Delete Answer

11)Add comments

12)Each Question\Answer\Comment should be displayed with author and sum of votes.

In the RDB "world" the solution is trivial.

Create tables for Questions\Answers\Comments\Users\Tags\Votes.

Create junction tables for relations if needed.

Use joins to query data. Use indexed\materialized views for the votes\answers\comments\questions-for-tags counts.
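The relational approach described above could look roughly like the following sketch, which uses Python's built-in `sqlite3` module as a stand-in for any RDBMS. The table and column names are assumptions made for illustration, only three of the tables are modelled, and the vote sum is computed on the fly rather than through an indexed or materialized view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users     (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE questions (id INTEGER PRIMARY KEY,
                        author_id INTEGER REFERENCES users(id),
                        title TEXT);
CREATE TABLE votes     (question_id INTEGER REFERENCES questions(id),
                        user_id INTEGER REFERENCES users(id),
                        value INTEGER);
""")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "ann"), (2, "bob")])
conn.executemany(
    "INSERT INTO questions VALUES (?, ?, ?)",
    [(1, 1, "What is NoSQL?"), (2, 2, "Why joins?")],
)
conn.executemany(
    "INSERT INTO votes VALUES (?, ?, ?)",
    [(1, 2, 1), (2, 1, 1), (2, 2, 1), (1, 1, -1)],
)

# Use cases 2 and 12: popular questions, each with its author and vote sum.
popular = conn.execute("""
SELECT q.title, u.name, COALESCE(SUM(v.value), 0) AS score
FROM questions q
JOIN users u ON u.id = q.author_id
LEFT JOIN votes v ON v.question_id = q.id
GROUP BY q.id, q.title, u.name
ORDER BY score DESC
""").fetchall()
print(popular)
```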

What about NoSQL?

tobi
10/05/2010 09:13 PM by
tobi

It is true that NoSql is less flexible in case of heavily interconnected data. I suspect, without having tried, that using raven indexes will help greatly because you do not have to maintain all those different representations, that you need for querying, yourself.

Ayende Rahien
10/05/2010 09:24 PM by
Ayende Rahien

gandjustas,

Now scale your solution...

It is actually very easy to handle each of your scenarios with RavenDB.

They pretty much translate directly to an index.
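To make "translate directly to an index" a little more concrete, here is a minimal in-memory sketch in Python. It is not RavenDB's API: `build_index`, the document shapes, and the ids are all hypothetical, and the only idea it shows is that an index is a map function applied to every document ahead of query time.

```python
from collections import defaultdict

def build_index(documents, map_fn):
    """Apply map_fn to every document and collect document ids per emitted key."""
    index = defaultdict(list)
    for doc in documents:
        for key in map_fn(doc):
            index[key].append(doc["id"])
    return index

questions = [
    {"id": "questions/1", "title": "What is NoSQL?", "tags": ["nosql", "databases"]},
    {"id": "questions/2", "title": "Why joins?", "tags": ["sql", "databases"]},
]

# Use case 4 ("Show Questions by Tag"): the map function emits one key per tag.
by_tag = build_index(questions, lambda q: q["tags"])
print(by_tag["databases"])
```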

Demis Bellot
10/05/2010 10:20 PM by
Demis Bellot

@gandjustas

I have some documentation available on how you would build a simple blog using Redis available here:

code.google.com/.../DesigningNoSqlDatabase (There's also a refactored version that puts all redis access behind a repository pattern: http://bit.ly/9niEHU).

This essentially mimics what ayende is doing with RavenDB in his series of blog posts here:

ayende.com/.../...al-modeling-anti-pattern-in.aspx

Using Redis Sets / Sorted Sets takes care of voting in a single, super-fast operation. In fact, Jeff Atwood (@codinghorror of StackOverflow fame) has said that they are making use of Redis now in StackOverflow and all their StackExchange sites: http://twitter.com/#!/codinghorror/status/22417440038

At the moment it looks like it's only used for their shared-caching solution, although this is just another scenario where NoSQL db's provide superior solutions over RDBMS's.
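The sorted-set idea for voting can be sketched without a Redis server. The `SortedSetSketch` class below is a hypothetical in-memory stand-in, written in Python; its two methods are only comparable in spirit to the Redis `ZINCRBY` and `ZREVRANGE` commands and do not reproduce their exact semantics.

```python
class SortedSetSketch:
    """In-memory stand-in for a sorted set: member -> score."""

    def __init__(self):
        self._scores = {}

    def increment(self, member, delta):
        # Comparable in spirit to ZINCRBY: adjust one member's score.
        self._scores[member] = self._scores.get(member, 0) + delta
        return self._scores[member]

    def top(self, count):
        # Comparable in spirit to ZREVRANGE: highest scores first,
        # with ties broken by member name to keep the order stable.
        ranked = sorted(self._scores.items(), key=lambda item: (-item[1], item[0]))
        return ranked[:count]

votes = SortedSetSketch()
votes.increment("questions/1", 1)
votes.increment("questions/2", 1)
votes.increment("questions/2", 1)
votes.increment("questions/1", -1)
print(votes.top(2))
```

Each vote is a single increment, and the "popular questions" list is a read of the already-ordered scores.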

gandjustas
10/06/2010 02:35 AM by
gandjustas

@Ayende Rahien

Scaling is not necessary. A database with 1M Questions and 10M Answers, Comments and Votes will fit into 50 GB. That's not a very large database; one server can easily handle this data.

I want to see a NoSQL solution for the StackOverflow cases.

PS. StackOverflow has less than 1M Questions.
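The sizing claim above can be sanity-checked with back-of-the-envelope arithmetic. This small Python sketch assumes that the 10M figure covers answers, comments and votes combined and that 50 GB is meant in decimal units; neither is stated in the comment.

```python
questions = 1_000_000
other_rows = 10_000_000        # assumption: answers, comments and votes combined
budget_bytes = 50 * 10**9      # assumption: 50 GB in decimal units

# Average space available per row if everything must fit in the budget.
bytes_per_row = budget_bytes / (questions + other_rows)
print(round(bytes_per_row))
```

Under those assumptions the budget allows roughly 4.5 KB per row, indexes included.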

gandjustas
10/06/2010 02:45 AM by
gandjustas

@Demis Bellot

A cache is not a primary data store. For caching there are no requirements for Consistency and Durability.

Patrick Huizinga
10/06/2010 08:00 AM by
Patrick Huizinga

Funny how a blog post about how NoSQL can also be a good choice besides scaling requirements turns into comments about 'scaling stackoverflow.com'.

@gandjustas

Just because you find a case where a document database is not the best fit for a problem from a modeling perspective doesn't invalidate the claim that some problems are better modeled with it.

With the little I know, I would take a look into graph databases for highly connected data like stackoverflow.

And there are requirements for consistency for a caching solution. You don't want the user not to see the update he just made. Remember that eventual consistency doesn't mean no consistency; it results in delays at computer scale. For most scaling databases (be that NoSQL or YesSQL), human reaction time is already considered laughably slow.

Frans Bouma
10/06/2010 08:29 AM by
Frans Bouma

NoSQL isn't about scaling, it's about NoACID. dbmsmusings.blogspot.com/.../...w-to-fix-them.html

And before that guy is butchered to death, read his resume.

What I also find funny is the remark "nosql db's are easier with aggregates", you really mean "storing readonly denormalized data", I think. Aggregates calculated on the fly really require a set-based language, not a document graph.

Btw, it's not that RDBMS-s always are the best choice, it's just that claiming RDBMS-s are 'old news because something better came along' as NoSQL solved all the problems is simply naive.

Demis Bellot
10/06/2010 09:07 AM by
Demis Bellot

@gandjustas

I provided those examples because it is existing documentation available that closely matches what you want to achieve. You should be able to extrapolate based on those approaches to meet the other requirements.

There is nothing inherently difficult about creating a StackOverflow using a NoSQL database. Of those features mentioned, what do you think would be the most difficult of those to maintain in a NoSQL db?

IMHO the most unnatural part of building a db with NoSQL is to identify your querying requirements upfront so you can maintain indexes on them. You can always add the indexes afterwards, and doc db's like RavenDB and MongoDB also allow you to perform ad hoc querying after the fact. So there is a little thinking differently involved, and the first link in my previous comment should hopefully help with designing a NoSQL db when coming from an RDBMS background.

Included as test data for my Redis Admin UI demo I've imported the entire Northwind Relational Database you can see here: www.servicestack.net/RedisAdminUI/AjaxClient/

From a list of POCO's importing the entire Northwind DB literally took around 11 lines of code (see bottom: code.google.com/.../ServiceStackRedis) 1 LOC per table.

gandjustas
10/06/2010 09:17 AM by
gandjustas

@Patrick Huizinga

Can you tell me which problems are better modeled with a document database?

Ayende Rahien
10/06/2010 09:40 AM by
Ayende Rahien

Frans,

RavenDB is fully ACID.

There is nothing in NoSQL that says that you need to lose ACID.

"Aggregates calculated on the fly really require a set-based language, not a document graph."

Huh?!

Let us talk about the Order Aggregate, what do you mean about it from there?

gandjustas
10/06/2010 09:49 AM by
gandjustas

@Demis Bellot:

"IMHO the most unnatural part of building a db with NoSQL is to identify your querying requirements upfront..."

I've already identified the querying requirements in my first comment. But no one has offered a solution in NoSQL (I posted these requirements in other blogs and forums).

Demis Bellot
10/06/2010 09:55 AM by
Demis Bellot

@gandjustas

"Can you tell me which problems are better modeled with a document database?"

I would say anything that is non-relational would be a candidate.

I actually view NoSQL dbs as a complementary rather than a replacement technology. It definitely isn't the right choice in all cases, although there are clear scenarios where it holds advantages over an RDBMS - I list a few of them in my blog post here: http://www.servicestack.net/mythz_blog/?p=129

As is the case with any new technology, there is sometimes a fear of the unknown when dealing with NoSQL db's. However, I would approach NoSQL db's like learning a new language: once you spend some time getting familiar with it, you will learn different approaches to solving the same problem, which will at the very least make you a better all-round developer. This will give you a better basis to assess where it makes sense to use it or not.

The beauty of NoSQL is that most of them are free and are very easy to get started (typically just download the server and run). For a quick taste, Google App Engine actually provides a general purpose free hosting web development environment in Python or Java. It uses BigTable as its primary storage and I think you will be surprised how quick and frictionless it is to develop and deploy apps based on it.

Demis Bellot
10/06/2010 10:01 AM by
Demis Bellot

@gandjustas

"I've already identified the querying requirements in my first comment. But no one has offered a solution in NoSQL"

I'm asking you what you think is the most difficult feature so I can explain how you would achieve that particular functionality in detail. Providing a complete Stack Overflow solution is not the best use of our time, especially since the existing documentation I provided should give you a general idea of how you would use NoSQL to model the solution.

Frank Quednau
10/06/2010 10:32 AM by
Frank Quednau

How does the failover story read in RavenDB?

Ayende Rahien
10/06/2010 10:34 AM by
Ayende Rahien

Frank,

If you have replication set up, you have automatic failover.

Peter Morlion
10/06/2010 10:50 AM by
Peter Morlion

@gandjustas:

I believe the NoSQL option has its benefits, but also its drawbacks. One of the big benefits for me is the fact that you no longer need complex mapping schemes (NHibernate) or ugly ActiveRecord-style code (or plain old SQL in your code, ugh).

NoSQL does force you to construct your domain model nicely, with aggregate roots and such, but that can be seen as something positive.

You will have to solve problems where one aggregate needs to reference another, or part of another.

But even then, you get to focus on your domain, business and UI logic, which should be your core business.

gandjustas
10/06/2010 11:09 AM by
gandjustas

The most difficult feature is implementing ALL the cases with good-enough performance, without by-hand denormalization, aggregation, etc.

Ayende Rahien
10/06/2010 11:11 AM by
Ayende Rahien

gandjustas,

Not at all. For that matter, take a look at Raven's MVC Music Store example

gandjustas
10/06/2010 11:12 AM by
gandjustas

@Peter Morlion,

"You will have to solve problems where one aggregate needs to reference another, or part of another."

That will be the biggest problem, with the most performance impact.

gandjustas
10/06/2010 11:26 AM by
gandjustas

Oh, MVC Music Store is a good example.

The data access with EF isn't good: it lacks projections.

The RavenDB version is completely unextensible. What if I want to display a "top sales artist" widget on pages? Or what if I need to create a Rating for Artists: users rate artists, the rating affects catalog sorting, etc.?

tobi mentioned "It is true that NoSql is less flexible in case of heavily interconnected data". But data can become "heavily interconnected" after an application has first shipped. That completely kills NoSQL for the majority of applications.

Ayende Rahien
10/06/2010 11:28 AM by
Ayende Rahien

"top sales artist" widget

Create an index, query the index, done.
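As a sketch of what such an index computes for the "top sales artist" widget, here is a small map/reduce written in plain Python. It is not RavenDB code: the order documents, their `lines`, `artist` and `quantity` fields, and both function names are assumptions made for illustration.

```python
from collections import Counter

orders = [
    {"id": "orders/1", "lines": [{"artist": "AC/DC", "quantity": 2},
                                 {"artist": "Queen", "quantity": 1}]},
    {"id": "orders/2", "lines": [{"artist": "Queen", "quantity": 3}]},
]

def map_order(order):
    """Map step: emit one (artist, quantity) pair per order line."""
    for line in order["lines"]:
        yield line["artist"], line["quantity"]

def reduce_sales(pairs):
    """Reduce step: sum the quantities per artist."""
    totals = Counter()
    for artist, quantity in pairs:
        totals[artist] += quantity
    return totals

# "Create an index": run map and reduce once over the stored orders.
sales_index = reduce_sales(pair for order in orders for pair in map_order(order))

# "Query the index": read the precomputed totals.
top_artists = sales_index.most_common(1)
print(top_artists)
```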

gandjustas
10/06/2010 12:04 PM by
gandjustas

Creating an index for each query prevents dynamic query composition in application code, e.g. no LINQ.

Ayende Rahien
10/06/2010 12:05 PM by
Ayende Rahien

gandjustas,

No, it doesn't.

And RavenDB certainly supports linq.

Jesús López
10/06/2010 06:03 PM by
Jesús López

@Ayende

Does Raven DB support queries defined at run time?

One advantage relational systems have over NoSQL databases is that you can create queries at run time, and they can create a query plan that uses existing indexes.

It seems that with RavenDB you need a predefined index; without that index it is difficult to get decent performance from a query defined at run time.

On the other hand, with NoSQL databases it seems to be a must to know your query needs up front. But query needs change over time. Furthermore, queries defined at run time cannot be predicted.

I see that NoSQL databases can help today, but they cannot replace the RDBMS for now.

Ayende Rahien
10/06/2010 06:35 PM by
Ayende Rahien

Jesus,

That is a relatively new feature, but yes, it supports that.

Frank Quednau
10/06/2010 08:00 PM by
Frank Quednau

Ayende,

yes you wrote about that, but you also said that a runtime query is the equivalent of a full table scan. That is different from an RDBMS's ability to reuse existing indices.

btw, at some point all this stuff shouldn't be called NoSQL anymore. For all I know you could introduce a SQL parser to RavenDB to define your indices, and what then? Subsequent renaming of hundreds of blog posts! When you are young it's nice to differentiate yourself by saying what you are not, but that can't be the end of the road :)

tobi
10/06/2010 08:08 PM by
tobi

gandjustas, I think it is easy to query the same table in different ways with raven (just add an index). I am in favor of that approach. But the problems start to arise (IMHO) when you need a new query that joins two tables. Then you are forced to do manual maintenance because indexes in raven do not support the join clause.

In raven you can create an index that reformats an existing table but you cannot combine data from different tables automatically. I would be so happy if this feature was in the product. Manual maintenance of data structures sucks.

Seemingly, the only support of this scenario to some extent is the "include" feature, but that always does a nested loops join. You cannot get hash or merge join with it. I believe that join indexes can be maintained efficiently as well because sql server can do it. Ayende, why do you not implement it? Do you want to set the right mindset in raven and discourage the use of joins?

Ayende Rahien
10/06/2010 08:18 PM by
Ayende Rahien

Frank,

No, not really. That is because the feature is so new, I haven't had the chance to blog about it :-)

There are 3 ways to query RavenDB:

  • Indexes

  • Linear query (which is what I blogged about, table scan).

  • Auto Query - uses same syntax as usual indexing, but doesn't require an index, and very efficient.
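The difference between these query styles can be sketched in a few lines of Python. This is an in-memory illustration only, not how RavenDB is implemented: `DocumentStoreSketch` and its methods are hypothetical, and the behaviour of `auto_query` (build the missing index on first use, then reuse it) is an assumption about what "doesn't require an index" could mean.

```python
class DocumentStoreSketch:
    def __init__(self, documents):
        self.documents = list(documents)
        self.indexes = {}  # field name -> {value: [documents]}

    def linear_query(self, field, value):
        """Check every document: the equivalent of a table scan."""
        return [d for d in self.documents if d.get(field) == value]

    def build_index(self, field):
        """Group all documents by the value of one field."""
        index = {}
        for d in self.documents:
            index.setdefault(d.get(field), []).append(d)
        self.indexes[field] = index

    def indexed_query(self, field, value):
        """Look the value up in a previously built index."""
        return self.indexes[field].get(value, [])

    def auto_query(self, field, value):
        """Assumption: build the missing index on first use, then reuse it."""
        if field not in self.indexes:
            self.build_index(field)
        return self.indexed_query(field, value)

store = DocumentStoreSketch([
    {"id": "users/1", "city": "Haifa"},
    {"id": "users/2", "city": "Berlin"},
    {"id": "users/3", "city": "Haifa"},
])
scan = store.linear_query("city", "Haifa")
auto = store.auto_query("city", "Haifa")
print([d["id"] for d in auto])
```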

And actually, we DO support set based update/deletes :-)

Ayende Rahien
10/06/2010 08:20 PM by
Ayende Rahien

"you cannot combine data from different tables automatically"

That is what the map/reduce indexes are for. You can do that there.

As for how this is implemented, we probably need to ask this in the mailing list.

tobi
10/06/2010 09:06 PM by
tobi

"you cannot combine data from different tables automatically"

I have seen your example ( ayende.com/.../...ting-the-homecontroller-the.aspx) but only one table is involved in the map reduce index. If the order lines were a separate entity the index could not be constructed.

Ayende Rahien
10/06/2010 09:17 PM by
Ayende Rahien

Tobi,

Yes, it could.

It would be somewhat more awkward, but...

// map
from orderOrLine in docs.With("Orders", "OrderLines")
let order = orderOrLine.Is("Orders") ? orderOrLine : null
let line = orderOrLine.Is("OrderLines") ? orderOrLine : null
select new
{
    Album = order == null ? line.Album : null,
    Quantity = order == null ? line.Quantity : 0,
    Customer = order == null ? null : order.Customer
}

// reduce
from result in results
group result by result.Album into g
select new
{
    Album = g.Key,
    Quantity = g.Sum(x => x.Quantity),
    Customer = g.First(x => x.Customer != null).Customer
}

tobi
10/07/2010 10:57 AM by
tobi

Hm I did not know about the With method. I believe your query does not work because result.Album can be null so grouping by it makes no sense. It can be fixed however and that is all that counts. In general you can get a joined index with the following steps:

var map = orders.Select(x => new { o = x, c = null }).Concat(customers.Select(x => new { o = null, c = x }));

var reduce = map.GroupBy(x => x.o, (order, group) => new { order, first = group.FirstOrDefault() });

Probably you can construct a helper method that constructs such a query by using the expression api. That would restore the convenience factor.
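The general recipe (project both collections to one shape, concatenate, then group by a shared key) can be tried out in memory. The Python sketch below is only an illustration of that recipe, not RavenDB index code; it assumes that each order line carries an `order_id` field pointing at its order, which is not stated anywhere in the thread.

```python
from collections import defaultdict

orders = [
    {"id": "orders/1", "customer": "ann"},
    {"id": "orders/2", "customer": "bob"},
]
order_lines = [
    {"order_id": "orders/1", "album": "Back in Black", "quantity": 2},
    {"order_id": "orders/2", "album": "Back in Black", "quantity": 1},
    {"order_id": "orders/2", "album": "Jazz", "quantity": 4},
]

# Map: project both collections to one shape, keyed by the shared order id,
# so that the grouping key is never null.
mapped = [{"key": o["id"], "customer": o["customer"], "lines": []} for o in orders]
mapped += [{"key": line["order_id"], "customer": None, "lines": [line]}
           for line in order_lines]

# Reduce: group by the key and merge the two halves of each group.
joined = defaultdict(lambda: {"customer": None, "lines": []})
for entry in mapped:
    group = joined[entry["key"]]
    group["customer"] = group["customer"] or entry["customer"]
    group["lines"].extend(entry["lines"])

print(joined["orders/2"]["customer"], len(joined["orders/2"]["lines"]))
```

Grouping by a key that both document types share avoids the null-key problem raised above.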

Comments have been closed on this topic.