Ayende @ Rahien

Refunds available at head office

re: NoSQL, meh

I was pointed to this blog post. It is written by the guy who wrote ZODB (a Python object database), who isn’t excited about NoSQL:

But for me there was a moment of pure joy one morning when an absolutely awesome colleague I worked with at the time said to me something like: "There's a problem with this invoice, I traced it down to this table in the database which has errors in these columns. I've written a SQL statement to correct, or should it be done at the model level?". Not only had he found and analyzed the problem, he was offering to fix it.

Praise the gods. To do similar in Plone, he would have had to learn Python, read up on all the classes. Write a script to get those objects from the ZODB and examine them. Not a small undertaking by any means.

What was going on

The tools for the ZODB just weren't there and it wasn't transparent enough. If there was a good object browser for the ZODB (and yes a few simple projects showed up that tried to do that) that did all the work and exposed things that would really help. But setting up and configuring such a tool is hard and I never saw one that allowed you to do large scale changes.

I also got the following comment, on an unrelated post:

Not directly related, but I'm curious why you don't use Rhino Queues? The tooling with msmq?

Tooling is incredibly important! In fact, it is the tooling that can make or break a system. Rhino Queues is a really nice queuing system, and it offers several benefits over MSMQ, but it has no tooling support, and as such, it has a huge disadvantage in comparison to MSMQ (or other queuing technologies).

With databases, this is even more important. I can (usually) live without having direct access to queued messages, but for databases and data stores, having the ability to access the data in an easy manner is mandatory. Some data is unimportant (I couldn’t care less what the values in user #12094’s session are), but for the most part, you really want the ability to read that data and look at it in a human-readable fashion.

The problems that Andy ran into with ZODB are related more to the fact that ZODB didn’t have any good tooling and that the ZODB storage format was tied directly to Python’s pickle.
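
To illustrate why a pickle-bound format is hard to inspect from outside Python, here is a minimal sketch. The invoice record is made up for the example, and this is not ZODB's actual storage code; it only compares Python's standard pickle and json modules on the same data.

```python
import json
import pickle
from decimal import Decimal

# Hypothetical invoice record, invented for this illustration.
invoice = {"number": "INV-001", "total": Decimal("125.50")}

# The pickle stream is binary and embeds the name of the Python class
# (decimal.Decimal) needed to rebuild the value, so a reader needs Python
# and the right classes just to look at the data.
pickled = pickle.dumps(invoice)
print(pickled)

# The same fields as JSON are plain text that any tool or person can read.
as_json = json.dumps(
    {"number": invoice["number"], "total": str(invoice["total"])},
    sort_keys=True,
)
print(as_json)
```

The first line printed is an opaque byte string, while the second is text you could open in any editor, which is roughly the gap that tooling has to close.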

With Raven, I flat out refuse to release it publicly until we have a good tooling story. Raven comes with its own internal UI (accessible via any web browser), which allows you to define indexes, view/create/edit the documents, browse the database, etc.

I consider such abilities crucial, because without those, a database is basically a black hole, which requires special tooling to work with. By providing such tooling out of the box, you reduce the barrier to entry by a huge margin.

This image was sent to me by a lot of people:

This is funny, and true. There is another layer here: you don’t query a key/value store; that is like asking how to feed a car, because you are used to riding horses. If you need to perform queries on a key/value store, you are using the wrong tool, or perhaps you are trying to solve the problem in a non-idiomatic way.
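
As a rough sketch of what the idiomatic approach can look like, here is a Python example with a plain dict standing in for the key/value store. Every name in it (the put/get helpers, the key layout, the users) is hypothetical and not taken from any particular product. Instead of querying by value, the writer maintains a second key that already holds the answer.

```python
# A plain dict stands in for a key/value store (assumption for illustration).
store = {}


def put(key, value):
    """Store a value under a key."""
    store[key] = value


def get(key):
    """Fetch a value by key, or None if the key is absent."""
    return store.get(key)


def add_user(user_id, name, country):
    """Write the user, then update a lookup key listing users per country."""
    key = f"users/{user_id}"
    put(key, {"name": name, "country": country})
    index_key = f"users-by-country/{country}"
    put(index_key, (get(index_key) or []) + [key])


add_user(1, "alice", "NZ")
add_user(2, "bob", "NZ")
add_user(3, "carol", "FR")

# No query: one get for the lookup key, then one get per user key.
in_nz = [get(k)["name"] for k in get("users-by-country/NZ")]
print(in_nz)
```

The cost moves to write time, and you have to know the questions in advance. If you can't, that is a hint that a key/value store is the wrong tool for that data.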

Comments

04/10/2010 12:45 PM by Demis Bellot

Agreed, I read that as well and couldn't see how the problems he was having with ZODB were related to the current NoSQL databases. It is true that most of the technologies are new and most lack a good GUI atm; however, by design they have a very flat structure that facilitates easy access to your data with a rich set of programmatic or command-line access. Most of the time the values are strings, and complex types are persisted using the simple and ubiquitous JSON format. I believe that most of the problems are solved by using a clear-text, self-describing serialization format.

I can see why you have chosen to implement your own queueing implementation (I've chosen to do the same as well). Out of all the MQ implementations I have used, MSMQ is the most limiting, lacking in that it doesn't support even the basic MQ Enterprise Integration Patterns (book by M. Fowler).

For corporate environments I would recommend RabbitMQ; however, it has been announced that the original designers of the open AMQP spec are abandoning it in favour of a simpler one which you can find at: http://www.zeromq.org - so I guess it's worth considering as well.

04/10/2010 12:46 PM by Rafal

I can see the bright future of NoSQL databases. First, vendors provide tooling for browsing, querying and fixing the data. Then they agree on some industry standards for document format, storage, indexing and querying of NoSql databases. Then they design a universal data manipulation language and call it a NoSQL'2012 and we'll have made a full circle back to SQL.

04/10/2010 02:09 PM by Demis Bellot

@Rafal Devs just need to be educated about their available options and the strengths and weaknesses of each. I've been on projects of teams building long running process services (effectively like a queue of complex requests) where a majority of their efforts were spent trying to awkwardly shoe-horn their offline requests into an RDBMS instead of spending their time concentrating on the core business process logic. This could've been easily replaced with a one-liner using something like db4o.

As usual pick the best tool for the job. When your data is not tabular/relational and you don't need rich querying then most of the time there are better solutions to be using than an RDBMS.

04/10/2010 08:03 PM by Imran

Hah nice comic.

04/12/2010 03:19 PM by Justin Chase

Question, can you access through an API directly in code? Meaning no web-server involved?

04/13/2010 10:32 AM by Demis Bellot

@Justin

Depends on the client. You can access CouchDB directly with an Ajax, Silverlight or Flash app since it provides a JSON+HTTP interface. Otherwise most NoSQL DBs (like Redis) are built to handle thousands of concurrent connections, so you can access them via an unimpeded client like WPF/WinForms/C++ etc. if your firewall permits.

I imagine it will also work in Silverlight if you change the port the server is listening on to one in the Silverlight-approved range.

04/13/2010 04:47 PM by Ayende Rahien

Justin,

It depends on what DB you are using.

Raven has an embedded option; the others, I don't think so.

04/18/2010 12:16 AM by Chris Nicola

Definitely true, if someone is taking the NoSQL approach expecting it's "like SQL but better" then they have some reading to do.

04/19/2010 05:32 AM by Mario Pareja

"...like asking how to feed a car, because you are used to riding horses..."

Pure gold! Sorry Ayende, I'll be stealing that line and using it in a conversation someday.

04/27/2010 01:54 AM by alphadogg

tabular != Relational.

All structured data is Relational in many different ways, some which one taps into for data integrity in a database.

Unstructured data is essentially useless in computation because it is basically chaotic data where structure cannot be found. Without any kind of structure, there can be no computation.

And, semi-structured data is just a hand-waving cop-out for devs too afraid to pick a friggin' schema. :)

Like many other terms in the "NoSQL" movement, from the movement's name onwards, this one is one of the worst.
