Ayende @ Rahien


Designing a document database: Attachments

In a previous post, I asked about designing a document DB and brought up the issue of attachments, along with a set of questions that need to be answered:

  • Do we allow them at all?

We pretty much have to. Otherwise, users will stick them into the document directly, resulting in very inefficient use of space (binaries in JSON format suck).
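To make the space cost concrete, here is a minimal sketch of what embedding a binary in a document looks like. It assumes the binary is Base64-encoded before being placed in the JSON, which is the usual workaround since JSON has no binary type; the field names and the `embed_in_document` helper are hypothetical.

```python
import base64
import json


def embed_in_document(payload: bytes) -> str:
    """Serialize a document that carries the binary inline as Base64 text."""
    doc = {
        "title": "example",  # hypothetical field
        "attachment": base64.b64encode(payload).decode("ascii"),
    }
    return json.dumps(doc)


# 30,720 bytes standing in for a real file
payload = bytes(range(256)) * 120
serialized = embed_in_document(payload)

# Base64 emits 4 characters for every 3 input bytes, so the inline copy
# is roughly a third larger than the raw binary before any other overhead.
print(len(payload), len(serialized))
```

On top of the size increase, the whole attachment has to be parsed and re-serialized every time the document is read or written.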

  • How are they stored?
    • In the DB?
    • Outside the DB?

Storing them in the DB will lead to very large databases. There is also the simple question of whether a document DB is the appropriate storage for BLOBs. I think there are better alternatives than the document DB for that: things like Rhino DHT, S3, the file system, a CDN, etc.

  • Are they replicated?

Out of scope for the document DB, I am afraid. That depends on the external storage that you choose.

  • Should we even care about them at all? Can we apply SoC and say that this is the task of some other part of the system?

Yes, we can, and we should.

However, we still want to be able to add attachments to documents. I think we can resolve this pretty easily by adding the notion of document attributes. That would allow us to attach external information to a document, such as the attachment URLs. Attributes should be used for things that are related to the actual document but are conceptually separate from it.

An attribute would be a typed key/value pair, where both the key and the value are strings. The type is an additional piece of information that says what kind of attribute this is. This will allow us to do things like add relations, specify attachment types, etc.
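As a rough illustration of the idea, here is a minimal sketch of typed attributes living next to the document body. The class names, the type names "attachment" and "relation", and the example URL are all assumptions for illustration, not a settled design.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Attribute:
    """A typed key/value pair; the key and the value are both strings."""
    type: str   # hypothetical type names, e.g. "attachment" or "relation"
    key: str
    value: str


@dataclass
class Document:
    id: str
    body: dict
    # Attributes are kept outside the body, so the JSON itself stays clean.
    attributes: list[Attribute] = field(default_factory=list)

    def attributes_of_type(self, type_: str) -> list[Attribute]:
        return [a for a in self.attributes if a.type == type_]


doc = Document(id="users/1", body={"name": "ayende"})
doc.attributes.append(
    Attribute("attachment", "avatar", "https://example.com/files/avatar.png")
)
doc.attributes.append(Attribute("relation", "manager", "users/2"))

# Only the URL is stored here; the binary lives in external storage.
attachment_urls = [a.value for a in doc.attributes_of_type("attachment")]
```

Because the attribute only holds a URL, the document DB never needs to know whether the file sits in S3, on the file system, or behind a CDN.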

Comments

Ayende Rahien, 03/11/2009 08:33 AM

You do realize that I am not going to use SQL Server as the backend, right?

josh, 03/11/2009 04:19 PM

Do you want to support change tracking or version info for the attachments? Or allow multiple attachments per document? Both those are design considerations. For simplicity, I'd assume a user has the same rights on the attachment as the document, but you may think otherwise.

Ayende Rahien, 03/11/2009 04:22 PM

Multiple attachments, certainly.

Version info? I don't think so.

Permissions might be an interesting problem if we store this externally.

josh, 03/11/2009 04:27 PM

I certainly recommend the external storage approach. It worked out well for stuff I've done in the past.

Rafal, 03/11/2009 06:46 PM

Versioning attachments is easy if you track not only versions of the document body, but also its metadata and treat the attachment files as immutable.
