Ayende @ Rahien

Hi!
My name is Ayende Rahien
Founder of Hibernating Rhinos LTD and RavenDB.
You can reach me by phone or email:

ayende@ayende.com

@

Posts: 5,949 | Comments: 44,546

filter by tags archive

Designing a document databaseAttachments


In a previous post, I asked about designing a document DB, and brought up the issue of attachments, along with a set of questions that needs to be handled:

  • Do we allow them at all?

We pretty much have to, otherwise we will have the users sticking them into the document directly, resulting in very inefficient use of space (binaries in Json format sucks).

  • How are they stored?
    • In the DB?
    • Outside the DB?

Storing them in the DB will lead to very high database sizes. And there is the simple question if a Document DB is the appropriate storage for BLOBs. I think that there are better alternatives for that than the Document DB. Things like Rhino DHT, S3, the file system, CDN, etc.

  • Are they replicated?

Out of scope for the document db, I am afraid. That depend on the external storage that you wish for.

  • Should we even care about them at all? Can we apply SoC and say that this is the task of some other part of the system?

Yes we can and we should.

However, we still want to be able to add attachments to documents. I think we can resolve them pretty easily by adding the notion of a document attributes. That would allow us to add external information to a document, such as the attachment URLs. Those should be used for things that are related to the actual document, but are conceptually separated from it.

An attribute would be a typed key/value pair, where both key and value contains strings. The type is an additional piece of information, containing the type of the attribute. This will allow to do things like add relations, specify attachment types, etc.

More posts in "Designing a document database" series:

  1. (17 Mar 2009) What next?
  2. (16 Mar 2009) Remote API & Public API
  3. (16 Mar 2009) Looking at views
  4. (15 Mar 2009) View syntax
  5. (14 Mar 2009) Aggregation Recalculating
  6. (13 Mar 2009) Aggregation
  7. (12 Mar 2009) Views
  8. (11 Mar 2009) Replication
  9. (11 Mar 2009) Attachments
  10. (10 Mar 2009) Authorization
  11. (10 Mar 2009) Concurrency
  12. (10 Mar 2009) Scale
  13. (10 Mar 2009) Storage

Comments

Ayende Rahien

You do realize that I am not going to use SQL Server as the backend, right?

josh

Do you want to support change tracking or version info for the attachments? Or allow multiple attachments per document? Both those are design considerations. For simplicity, I'd assume a user has the same rights on the attachment as the document, but you may think otherwise.

Ayende Rahien

Multiple attachments, certainly.

Version info? I don't think so.

Permission might be an interesting problem if we store this externally

josh

I certainly recommend the external storage approach. worked out well for stuff I've done in the past.

Rafal

Versioning attachments is easy if you track not only versions of the document body, but also its metadata and treat the attachment files as immutable.

Comment preview

Comments have been closed on this topic.

FUTURE POSTS

  1. The RavenDB Comic Strip: Part III – High availability & sleeping soundly - 9 hours from now

There are posts all the way to May 28, 2015

RECENT SERIES

  1. The RavenDB Comic Strip (3):
    20 May 2015 - Part II – a team in trouble!
  2. Special Offer (2):
    27 May 2015 - 29% discount for all our products
  3. RavenDB Sharding (3):
    22 May 2015 - Adding a new shard to an existing cluster, splitting the shard
  4. Challenge (45):
    28 Apr 2015 - What is the meaning of this change?
  5. Interview question (2):
    30 Mar 2015 - fix the index
View all series

RECENT COMMENTS

Syndication

Main feed Feed Stats
Comments feed   Comments Feed Stats