Designing a document database: What next?
So far I have posted quite a few posts about building the document database. To be frank, the reason I did this is that the idea has been bouncing around in my head a lot recently, and sitting down and actually thinking about it has been great, especially since I now have the design dancing in my head, shiny & beautiful. Here is the full list, in case you missed anything:
- Schema-less databases
- Designing a document database
- Designing a document database: Storage
- Designing a document database: Scale
- Designing a document database: Authorization
- Designing a document database: Concurrency
- Designing a document database: Attachments
- Designing a document database: Replication
- Designing a document database: Views
- Designing a document database: Aggregation
- Challenge: C# Rewriting
- Designing a document database: Aggregation Recalculating
- Designing a document database: View syntax
- Designing a document database: Remote API & Public API
A few days ago I asked on Twitter what people think: do I have this written up yet or not? Opinions seem to be divided on this score. Let me try to set the record straight. I have a lot of scattered code around this, yes. But it is not a project, it is a lot of tiny experiments to prove that one approach or the other would work. This series of posts has required a lot of research. But I don’t have anything that is even remotely close to a working system.
I am estimating that it would take a month or two to take this from the drawing board to something that I would be willing to use in production*. This is full-time work, by the way. It is likely that I can get something usable faster than that, depending on your definition of usable :-). Most of the challenge is going to be in implementing the views, as I see it now. Everything else seems to be pretty straightforward.
That is somewhat of a problem. I don’t really want to spend several months (and the associated support costs afterward) to build an open source project. The main issue is that while it is fun, there is simply no money in it, and I heard that eating is mandatory. On the other hand, I don’t really see something like that selling as a commercial package. This is infrastructure, and infrastructure has been commoditized. The ideal solution from my point of view is what we tried to do with Linq to NHibernate: getting a company, or several companies, to sponsor its development as an OSS project.
The motivation would be the same as usual: this is something that the aforementioned companies need and are willing to pay for. It didn’t end up the way I expected with Linq to NHibernate, but it ended up very well after all, so I am happy about that.
Oh, and as an aside, if you want more posts in this series, do suggest a few topics that you want to hear about.
* Just to give you an idea about the complexity involved, I estimated Linq to NHibernate to be about 3 months.
More posts in "Designing a document database" series:
- (17 Mar 2009) What next?
- (16 Mar 2009) Remote API & Public API
- (16 Mar 2009) Looking at views
- (15 Mar 2009) View syntax
- (14 Mar 2009) Aggregation Recalculating
- (13 Mar 2009) Aggregation
- (12 Mar 2009) Views
- (11 Mar 2009) Replication
- (11 Mar 2009) Attachments
- (10 Mar 2009) Authorization
- (10 Mar 2009) Concurrency
- (10 Mar 2009) Scale
- (10 Mar 2009) Storage
Comments
Ayende,
What about extending this into queuing? What would be involved in turning the idea on its head a little and using a document to describe a message? What about the API that would be required: would you simply make the messaging work within the confines of the REST API that you proposed? How about making it work in a sandboxed technology such as Silverlight? Silverlight currently makes the Store and Forward pattern very difficult. I am curious.
Ayende, there is one question that wasn't asked before: why did you want to build Couch DB at all, since the original is already built and is free? This eliminates the need to pay for a document database. In your posts I didn't see any feature comparison of Couch DB and your project; it would probably help to state something like 'my database will have a better implementation of X and Y and will provide extra features like Z and ...'
How to make money on such a project? Incorporate it into some business application, like a document management system.
BTW, I spent half of last night updating an application on a customer's servers. I needed to change a table structure; unfortunately the table has millions of records and the update took an hour and a half. My service window was exactly 60 minutes, so I did not make it on time and had to call for extra minutes, making the customer very anxious. And it was just adding several fields. What would happen if the operation failed and had to be repeated? Probably a catastrophe, and the next updates will only be worse. I regret that the application isn't built on a schema-less database.
I have to agree with Rafal, instead of building "CouchDB.Net" maybe we could build a new "Hibernate" or Linq layer for existing schema-less databases?
Of course it depends on what we intend to do with our document DB...
Simon,
I am not sure that I understand what you mean.
I don't see how queuing is related to Doc DB.
Rafal,
a) Couch DB is not supported on Windows. There is some lengthy process you can go through to get it working, but it is not supported there, and there are several things that it does that make it not work nicely.
b) Couch DB is running on Erlang. In most environments, it is... hard to get a new platform in. .Net is already acceptable for most, and that makes it much easier to adopt.
Erik,
ORM == Object Relational Mapping
If there are no relations, there is no ORM.
There is absolutely no challenge in building a layer on top of the doc db.
Querying a doc db is not really feasible, that is why we have views.
This reason is good enough for me. Please don't abandon your project :)
Ayende,
It's not related directly to DocDB. I know of a project where an API was written to use SQL Server's queues much in the same vein as MSMQ, but only to 'facilitate store and forward' at the point of publishing or sending a message. Transport in this scenario was handled by WCF, and the dequeuing of messages and their transport over the wire was wrapped in a transaction. Do you see any value in extending that idea to a docDB? Do documents map nicely enough to messages, and with the transactional support, is this lightweight enough for it to be xcopy portable and a real alternative where MSMQ (or the like) is not going to be acceptable? So if you have durable storage with support for transactions (like the doc DB under discussion), could it fill the gap and help make something like Rhino.Queues durable?
Hmm, I might be really off track here, but isn't it all a re-implementation of Couch DB? I mean, yeah, I do get that Couch DB is not a feasible option for the Windows environment, fair enough, but why redesign it?
Would love to hear what the functional differences between this and Couch DB are, especially the reasoning behind them. I think this can provide some insights into what else Couch DB is lacking, or what kind of scenarios this new project is more suitable for.
Cheers,
F
Hi,
What are your thoughts on using SharePoint as the document repository and using its API for some of the tasks?
Regards
Companies possibly interested in this project:
Microsoft
I don't see any others who may consider it. Worse if it is a Windows-only solution.
Isn't it easier to just build an installer for CouchDB on Windows to make it easier to install?
I understand the challenge and satisfaction of building something like that (I am building a database myself, but for really different needs), but there is already a really good product like CouchDB, written in a really good language for the task (except for the IO part, if I am not wrong), so why reinvent the wheel?
Having said that, for the educational purpose of teaching people who aren't familiar with functional programming how to build something like that, that may be a good reason.
Simon,
The main reason that Rhino Queues was such a pain to write is the persistence format.
Right now, I have a very easy solution for persistent data, Esent, so I don't think it would be much of a challenge.
About using the doc db for this, you _could_, I just see no reason that you would want to do that.
Did a company or someone step up for Linq to NHibernate? What was the outcome?
Travis,
iMeta have provided a full time developer for 3 months. See here:
groups.google.com.ar/.../5111835e99d9a8e8?hl=en
One problem I see, though, is using Esent: the unawareness of its existence and reliability. How could I just use my own preferred database server? E.g., I could be interested in using SQL Server or Oracle, which solve a lot of other problems like clustering, replication, etc.
Are you thinking of starting this as an OSS project so that we can contribute a thing or two?
Uriel,
Maybe, but I lack the skills to do so. In addition to that, putting Erlang in the enterprise is not something that goes quickly or easily.
I am running into problems just putting MSMQ into place, because it is an unfamiliar tech to the sys admins. Putting in a totally new platform meets high resistance.
Nuz,
Not if you put me over hot coals and made me watch all the Drag & Drop demos in MSDN.
Fernando,
Conceptually, they are very similar.
From a design perspective, there are significant changes all around because of different design constraints regarding the implementation.
Vadi,
What problem do you have with Esent?
What do you mean by unawareness of existence and reliability?