Whitepaper: Couchbase vs RavenDB Performance at Rakuten Kobo
We just published a white paper comparing RavenDB and Couchbase performance in a real customer scenario.
I had to check the results three times before I believed them. RavenDB is pretty awesome, but I had no idea it was that awesome.
The data set was reasonably big, 1.35 billion documents, and the scenario we present is a real-world one based on production load.
Some of the interesting details:
- RavenDB uses 1/3 of the disk space that Couchbase uses, but stores 3 times as much data.
- Operationally, RavenDB just worked; Couchbase needed 6 times the hardware just to scrape by. A single failure in Couchbase meant 15 – 45 minutes for the node to recover. Inducing failures in RavenDB brought the node back up in a few seconds.
- For queries, we pitted a Couchbase cluster with 96 cores and 384 GB RAM against a single RavenDB node running on a Raspberry Pi. RavenDB on the Pi was able to sustain better latencies at the 99th percentile while handling twice as much load as Couchbase could.
There are all sorts of other goodies in the white paper, and we went pretty deep into the overall architecture and the impact of the different design decisions.
As usual, we welcome your feedback.
Comments
That's nice, maybe I'll live to see the day when their JP store won't throw some kind of error on every page navigation 😹
Congrats, looks like RavenDB is not a couch potato, and it managed to do the task with almost no overhead in disk usage vs. the raw data. However, I wonder: if the goal was to optimize the data structure for quick lookup of highlights by user id and book id, I think there's still a lot of overhead even in the raw data. 1.35 billion records; assume big numbers and let's take 8 bytes for the book id, 8 bytes for the user id and 4 bytes for the position in the book - this gives us 27 GB of data. With binary storage of data and indexes we would fit everything in 64 GB. Just put it in bare Voron or BerkeleyDB and a single laptop would handle hundreds of thousands of queries per second. And you don't need clusters, sharding, caching...
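A quick back-of-envelope check of that estimate (a minimal sketch; the 8/8/4-byte field sizes are the commenter's assumptions above, not figures from the whitepaper):

```python
# Back-of-envelope check of the estimate in the comment above.
# Field sizes are the commenter's assumptions, not whitepaper figures.
records = 1_350_000_000           # 1.35 billion highlight records
bytes_per_record = 8 + 8 + 4      # book id + user id + position in book

raw_gb = records * bytes_per_record / 10**9
print(f"raw data: ~{raw_gb:.0f} GB")  # ~27 GB raw; the comment allows ~64 GB
                                      # total once binary indexes are included
```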
Rafal,
The key here isn't the association of users to books; what we were working with here was the highlights. There is a sample document there that shows the data.
Yes, you can try to model things in the manner you describe, but then the cost of loading the data for a user request becomes much higher. You'll need to get the book contents (which may be big), scan to the relevant location, parse the content, translate markup to text, etc. It is cheaper and easier to do it the other way around.
Especially when you have to do that once per highlight, and some people do a LOT of highlights.
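To make the tradeoff concrete, here is a hypothetical sketch of the two modeling options being discussed (the document shapes below are illustrative only, not the sample document from the whitepaper):

```python
# Hypothetical document shapes, for illustration only.

# Option A (as described in the reply above): each highlight is its own small
# document with the extracted text, so "show this user's highlights for this
# book" is a direct lookup with no further processing.
highlight_doc = {
    "UserId": "users/1234",
    "BookId": "books/5678",
    "Text": "the highlighted passage, already extracted as plain text",
    "Position": 88734,
}

# Option B (the compact binary layout suggested in the comment): store only
# ids and a position. To display the highlight you must also load the
# (possibly large) book content, seek to the position, parse the markup and
# extract the text - and you pay that cost once per highlight.
compact_record = ("user:1234", "book:5678", 88734)
```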
Rafal,
Another thing, note that the data wasn't just for the highlights. The dataset included a lot of other details which weren't relevant for this specific benchmark. They were there to show data management for large databases.
Yep, I must have oversimplified it. But you know, 'billions of something' looks impressive until you realize that a gigabyte is a billion bytes, and even your phone has a few GB of RAM. So not everything with a billion records is necessarily a large database that requires a datacenter (but with a careful choice of database you may well need one)
... which reminds me of the recent mention of Parler and their insane data overheads - the Couchbase case doesn't look that bad compared to that
Excellent read. And now go against Mongo and Cosmos DB please...
This is probably a difficult subject - going against competing products while knowing they all serve the same purpose, all get the job done, and none of them is particularly expensive - I would not expect spectacular differences. However, as shown here, if you get a 2-3x reduction in disk usage, need half the RAM, and maybe half the infrastructure, then it's substantial; not spectacular, but still worth showing. Spectacular would be, for example, negating the need for an expensive cluster or reducing the number of servers 10-fold, but this is not possible without changing the approach entirely. And IT folk are not that easy to impress - after all, they are the IT gurus in their companies, the experts and know-it-alls, who made some decisions and need to prove they were right - so anyone coming and announcing 'hey, your database is a slow, bloated, data-losing monstrosity' will be shot immediately, or at least called an idiot.
Much better, in my opinion, is to find a specialization, some niche where your product really solves some problem better than everything else out there, and then it will shine. Not sure if that applies to databases - a very general-purpose tool - but maybe in some particular class of problems, in some specific businesses... NB, there are many specialized, niche products (for example, software for handling medical data) where companies can successfully sell products of inferior quality just because they get a hit on several keywords, or have some compliance certificates that technically mean nothing but that no one else has... not implying that this is the way to go, but it seems a clever strategy.
Rafal,
Do note that for real-world scenarios, you can run at 8% of the hardware costs! That is better than your 10-fold scenario (it works out to more than a 12-fold reduction).
I admit I didn't parse that information from the article. Pretty bad differences at some points; I don't know Couchbase at all, but maybe there's some configuration problem, or it's used in the wrong way for the data? Or the community edition has some speed limit built in?
Rafal,
I pinged someone who is quite knowledgeable about how Couchbase works; they didn't find any glaring issues in the way we set things up. We also tested the Enterprise edition; their license limits the detail I can expose, but it isn't a magic fix.
Then I hope they do the right thing at Rakuten :)