Ayende @ Rahien

Hi!
My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.
You can reach me by phone or email:

ayende@ayende.com

+972 52-548-6969

, @ Q c

Posts: 18 | Comments: 72

filter by tags archive

Designing a document databaseReplication

time to read 3 min | 454 words

In a previous post, I asked about designing a document DB, and brought up the issue of replication, along with a set of questions that effect the design of the system:

  • How often should we replicate?
    • As part of the transaction?
    • Backend process?
    • Every X amount of time?
    • Manual?

I think that we can assume that the faster we replicate, the better it is. However, there are cost associated with this. I think that a good way of doing replication would be to post a message on a queue for the remote replication machine, and have the queuing system handle the actual process. This make it very simple to scale, and create a distinction between the “start replication” part an the actual replication process. It also allow us to handle spikes in a very nice manner.

  • Should we replicate only the documents?
    • What about attachments?
    • What about the generated view data?

We don’t replicate attachments, since those are out of scope.

Generated view data is a more complex issue. Mostly because we have a trade off here, of network payload vs. cpu time. Since views are by their very nature stateless (they can only use the document data), running the view on source machine or the replicated machine would result in exactly the same output. I think that we can safely ignore the view data, treating this as something that we can regenerate. CPU time tend to be far less costly than network bandwidth, after all.

Note that this assumes that view generation is the same across all machines. We discuss this topic more extensively in the views part.

  • Should we replicate to all machines?
    • To specified set of machines for all documents?
    • Should we use some sharding algorithm?

I think that a sharding algorithm would be the best option, given a document, it will give a list of machine to replicate to. We can provide a default implementation that replicate to all machines or to secondary and tertiaries.

More posts in "Designing a document database" series:

  1. (17 Mar 2009) What next?
  2. (16 Mar 2009) Remote API & Public API
  3. (16 Mar 2009) Looking at views
  4. (15 Mar 2009) View syntax
  5. (14 Mar 2009) Aggregation Recalculating
  6. (13 Mar 2009) Aggregation
  7. (12 Mar 2009) Views
  8. (11 Mar 2009) Replication
  9. (11 Mar 2009) Attachments
  10. (10 Mar 2009) Authorization
  11. (10 Mar 2009) Concurrency
  12. (10 Mar 2009) Scale
  13. (10 Mar 2009) Storage

Comments

Josen

Ayende,

What’s your opinion about an RDF store / SPARQL interface (like the Jena Framework) for a system like the one you are describing?

Thank you for your articles.

Ayende Rahien

Seems like implementing this would be really hard.

Rafal

How to do replication offline and keep the system interface transactional? Some centralized version control registry?

Josh Robb

I can't help but think that a view would be a really nice way of specifying a (flexible) sharding algorithm. This setup could potentially allow dynamic re-sharding as well.

Bomerang

Shading algorithm is the best solution out there

Comment preview

Comments have been closed on this topic.

FUTURE POSTS

  1. RavenDB 3.0 New Stable Release - 3 hours from now
  2. Production postmortem: The industry at large - about one day from now
  3. The insidious cost of allocations - 2 days from now
  4. Buffer allocation strategies: A possible solution - 5 days from now
  5. Buffer allocation strategies: Explaining the solution - 6 days from now

And 3 more posts are pending...

There are posts all the way to Sep 11, 2015

RECENT SERIES

  1. Find the bug (5):
    20 Apr 2011 - Why do I get a Null Reference Exception?
  2. Production postmortem (10):
    01 Sep 2015 - The case of the lying configuration file
  3. What is new in RavenDB 3.5 (7):
    12 Aug 2015 - Monitoring support
  4. Career planning (6):
    24 Jul 2015 - The immortal choices aren't
View all series

RECENT COMMENTS

Syndication

Main feed Feed Stats
Comments feed   Comments Feed Stats