Ayende @ Rahien

Refunds available at head office

Designing a document database: What next?

So far I have posted quite a few posts about building the document database. To be frank, the reason I did this is that the idea has been bouncing around in my head a lot recently, and sitting down and actually thinking about it has been great, especially since I now have the design dancing in my head, shiny & beautiful. Here is the full list, in case you missed anything:

  1. Schema-less databases
  2. Designing a document database
  3. Designing a document database: Storage
  4. Designing a document database: Scale
  5. Designing a document database: Authorization
  6. Designing a document database: Concurrency
  7. Designing a document database: Attachments
  8. Designing a document database: Replication
  9. Designing a document database: Views
  10. Designing a document database: Aggregation
  11. Challenge: C# Rewriting
  12. Designing a document database: Aggregation Recalculating
  13. Designing a document database: View syntax
  14. Designing a document database: Remote API & Public API

A few days ago I asked on Twitter what people think: do I have this written up yet or not? Opinions seem to be divided on this score. Let me try to set the record straight. I have a lot of scattered code around this, yes. But it is not a project; it is a lot of tiny experiments to prove that one approach or the other would work. This series of posts has required a lot of research. But I don’t have anything that is even remotely close to a working system.

I am estimating that it would take a month or two to take this from the drawing board to something that I would be willing to use in production*. This is full-time work, by the way. It is likely that I can get something usable faster than that, depending on your definition of usable :-). Most of the challenge is going to be in implementing the views, as I see it now. Everything else seems to be pretty straightforward.

That is somewhat of a problem. I don’t really want to spend several months (and the associated support costs afterward) to build an open source project. The main issue is that while it is fun, there is simply no money in it, and I heard that eating is mandatory. On the other hand, I don’t really see something like that selling as a commercial package. This is infrastructure, and infrastructure has been commoditized. The ideal solution from my point of view is what we tried to do with Linq to NHibernate: getting a company, or several companies, to sponsor its development as an OSS project.

The motivation would be the same as usual: this is something that the aforementioned companies need and are willing to pay for. It didn’t end up the way I expected with Linq to NHibernate, but it ended up very well after all, so I am happy about that.

Oh, and as an aside, if you want more posts in this series, do suggest a few topics that you want to hear about.

* Just to give you an idea about the complexity involved, I estimated Linq to NHibernate to be about 3 months.

Comments

03/17/2009 07:05 AM by Simon Segal

Ayende

What about extending this into queuing? What would be involved in turning the idea on its head a little and using a document to describe a message? What about the API that would be required? Would you simply make the messaging work within the confines of the REST API that you proposed? How about making it work in a sandboxed technology such as Silverlight? Silverlight currently makes the Store and Forward pattern very difficult. I am curious.

03/17/2009 07:19 AM by Rafal

Ayende, there is one question that wasn't asked before: why do you want to build Couch DB at all, since the original is already built and is free? This eliminates the need to pay for a document database. In your posts I didn't see any feature comparison of Couch DB and your project. It would probably help to state something like 'my database will have a better implementation of X and Y and will provide extra features like Z and ...'

How to make money on such a project? Incorporate it into some business application, like a document management system.

BTW, I spent half of last night updating an application on a customer's servers. I needed to change the table structure. Unfortunately the table has millions of records and the update took an hour and a half. My service window was exactly 60 minutes, so I did not make it on time and had to call for extra minutes, making the customer very anxious. And it was just adding several fields. What would happen if the operation failed and had to be repeated? Probably a catastrophe, and the next updates will only be worse. I regret that the application isn't built on a schema-less database.

03/17/2009 09:26 AM by Erik

I have to agree with Rafal, instead of building "CouchDB.Net" maybe we could build a new "Hibernate" or Linq layer for existing schema-less databases?

Of course it depends on what we intend to do with our document DB...

03/17/2009 10:00 AM by Ayende Rahien

Simon,

I am not sure that I understand what you mean.

I don't see how queuing is related to Doc DB.

03/17/2009 10:04 AM by Ayende Rahien

Rafal,

a) Couch DB is not supported on Windows. There is some lengthy process you can go through to get it working, but it is not supported there, and there are several things that it does that make it not work nicely.

b) Couch DB is running on Erlang. In most environments, it is... hard to get a new platform in. .Net is already acceptable for most, and that makes it much easier to adopt.

03/17/2009 10:06 AM by Ayende Rahien

Erik,

ORM == Object Relational Mapping

If there are no relations, there is no ORM.

There is absolutely no challenge in building a layer on top of the doc db.

Querying a doc db is not really feasible, that is why we have views.

03/17/2009 10:45 AM by Rafal

"Couch DB is not supported on Windows"

This reason is good enough for me. Please don't abandon your project :)

03/17/2009 11:30 AM by Simon Segal

Ayende

It's not related directly to DocDB. I know of a project where an API was written to use SQL Server's queues much in the same vein as MSMQ, but only to 'facilitate store and forward' at the point of publishing or sending a message. Transport in this scenario was handled by WCF, and the dequeuing of messages and their transport over the wire was wrapped in a transaction. Do you see any value in extending that idea to a docDB? Do documents map nicely enough to messages, and with the transactional support, is this lightweight enough to be xcopy portable and a real alternative where MSMQ (or the like) is not going to be acceptable? So if you have durable storage with support for transactions (like the doc DB under discussion), could it fill the gap and help make something like Rhino.Queues durable?

03/17/2009 11:30 AM by Fernando Felman

Hmm, I might be really off track here, but isn't it all a re-implementation of Couch DB? I mean, yeah, I do get that Couch DB is not a feasible option for the Windows environment, fair enough, but why redesign it?

Would love to hear what the functional differences are between this and Couch DB, especially the reasoning behind them. I think this can provide some insights into what else Couch DB is lacking, or what kind of scenarios this new project is more suitable for.

Cheers,

F

03/17/2009 12:07 PM by Nuz

Hi,

What are your thoughts on using SharePoint as the document repository and using its API for some of the tasks?

Regards

03/17/2009 12:34 PM by eledu

Companies possibly interested in this project:

Microsoft

I don't see any others who may consider it. Worse if it is a Windows-only solution.

03/17/2009 01:33 PM by Uriel Katz

Isn't it easier to just build an installer for CouchDB on Windows to make it easier to install?

I understand the challenge and satisfaction of building something like that (I am building a database myself, but for really different needs), but there is already a really good product like CouchDB, written in a really good language for the task (except for the IO part, if I am not wrong), so why reinvent the wheel?

Having said that, teaching people who aren't familiar with functional programming how to build something like that may be a good reason.

03/17/2009 04:35 PM by Ayende Rahien

Simon,

The main reason that Rhino Queues was such a pain to write is the persistence format.

Right now, I have a very easy solution for persistent data, Esent, so I don't think it would be much of a challenge.

About using the doc db for this, you could, I just see no reason that you would want to do that.

03/17/2009 07:01 PM by Travis

Did a company or someone step up for Linq to NHibernate? What was the outcome?

03/18/2009 11:11 AM by Vadi

I see one problem, though, with using Esent: the unawareness of existence and reliability. How could I just use my own preferred database server? E.g., I could be interested in using SQL Server or Oracle, which solve a lot of other problems like clustering, replication, etc.

Do you plan to start this as an OSS project so that we can contribute a thing or two?

03/20/2009 10:58 AM by Ayende Rahien

Uriel,

Maybe, but I lack the skills to do so. In addition to that, putting Erlang in the enterprise is not something that goes quickly or easily.

I am running into problems just putting MSMQ into place, because it is an unfamiliar tech to the sys admins. Putting in a totally new platform meets high resistance.

03/20/2009 11:15 AM by Ayende Rahien

Nuz,

Not if you put me over hot coals and made me watch all the Drag & Drop demos in MSDN.

03/20/2009 11:18 AM by Ayende Rahien

Fernando,

Conceptually, they are very similar.

From a design perspective, there are significant changes all around because of different design constraints regarding the implementation.

03/20/2009 11:22 AM by Ayende Rahien

Vadi,

What problem do you have with Esent?

What do you mean by "unawareness of existence and reliability"?

Comments have been closed on this topic.