Designing a document database: Attachments
In a previous post, I asked about designing a document DB, and brought up the issue of attachments, along with a set of questions that need to be handled:
- Do we allow them at all?
We pretty much have to. Otherwise users will stick them into the document directly, resulting in very inefficient use of space (binaries in JSON format suck).
- How are they stored?
- In the DB?
- Outside the DB?
Storing them in the DB will lead to very large databases. And there is the simple question of whether a document DB is the appropriate storage for BLOBs. I think that there are better alternatives for that than the document DB: things like Rhino DHT, S3, the file system, a CDN, etc.
- Are they replicated?
Out of scope for the document DB, I am afraid. That depends on the external storage that you choose.
- Should we even care about them at all? Can we apply SoC and say that this is the task of some other part of the system?
Yes we can and we should.
However, we still want to be able to add attachments to documents. I think we can resolve this pretty easily by adding the notion of document attributes. That would allow us to add external information to a document, such as the attachment URLs. Attributes should be used for things that are related to the actual document, but are conceptually separate from it.
An attribute would be a typed key/value pair, where both key and value are strings. The type is an additional piece of information that says what kind of attribute this is. This will allow us to do things like add relations, specify attachment types, etc.
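To make the idea a bit more concrete, here is a minimal sketch of what typed attributes could look like. Everything in it is an assumption made for illustration: the `Attribute` and `Document` classes, the `attributes_of_type` helper, the type names `"attachment"` and `"relation"`, and the example URL are all hypothetical, not part of any actual design or API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Attribute:
    """A typed key/value pair; key and value are both plain strings."""
    type: str   # what kind of attribute this is, e.g. "attachment" or "relation"
    key: str
    value: str


@dataclass
class Document:
    """A document body plus attributes that live alongside it, not inside it."""
    id: str
    body: dict
    attributes: list[Attribute] = field(default_factory=list)

    def attributes_of_type(self, attr_type: str) -> list[Attribute]:
        # Filter on the attribute type, so attachments and relations
        # can share one mechanism without being confused with each other.
        return [a for a in self.attributes if a.type == attr_type]


# Hypothetical usage: the binary lives in external storage,
# and the document only carries a URL pointing at it.
doc = Document(id="invoices/1", body={"total": 120})
doc.attributes.append(
    Attribute("attachment", "scan.pdf", "https://files.example.com/invoices/1/scan.pdf")
)
doc.attributes.append(Attribute("relation", "customer", "customers/42"))

attachment_urls = [a.value for a in doc.attributes_of_type("attachment")]
```

The point of the sketch is that the document body stays pure JSON, while the attachment itself sits wherever the external storage puts it. The type field is what lets the same key/value mechanism serve attachments, relations, and whatever else comes up later.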
More posts in "Designing a document database" series:
- (17 Mar 2009) What next?
- (16 Mar 2009) Remote API & Public API
- (16 Mar 2009) Looking at views
- (15 Mar 2009) View syntax
- (14 Mar 2009) Aggregation Recalculating
- (13 Mar 2009) Aggregation
- (12 Mar 2009) Views
- (11 Mar 2009) Replication
- (11 Mar 2009) Attachments
- (10 Mar 2009) Authorization
- (10 Mar 2009) Concurrency
- (10 Mar 2009) Scale
- (10 Mar 2009) Storage
Comments
How about - msdn.microsoft.com/en-us/library/cc949109.aspx
You do realize that I am not going to use SQL Server as the backend, right?
Do you want to support change tracking or version info for the attachments? Or allow multiple attachments per document? Both those are design considerations. For simplicity, I'd assume a user has the same rights on the attachment as the document, but you may think otherwise.
Multiple attachments, certainly.
Version info? I don't think so.
Permissions might be an interesting problem if we store this externally.
I certainly recommend the external storage approach. It worked out well for stuff I've done in the past.
Versioning attachments is easy if you track not only versions of the document body, but also its metadata and treat the attachment files as immutable.