Oren Eini

aka Ayende Rahien

Oren Eini

CEO of RavenDB

a NoSQL Open Source Document Database

Get in touch with me:

oren@ravendb.net +972 52-548-6969

Posts: 7,578
|
Comments: 51,203

Copyright ©️ Ayende Rahien 2004 — 2025

Privacy Policy · Terms
filter by tags archive
stack view grid view
  • architecture (608) rss
  • bugs (450) rss
  • challanges (123) rss
  • community (378) rss
  • databases (481) rss
  • design (894) rss
  • development (640) rss
  • hibernating-practices (71) rss
  • miscellaneous (592) rss
  • performance (397) rss
  • programming (1085) rss
  • raven (1445) rss
  • ravendb.net (529) rss
  • reviews (184) rss
  • 2025
    • May (10)
    • April (10)
    • March (10)
    • February (7)
    • January (12)
  • 2024
    • December (3)
    • November (2)
    • October (1)
    • September (3)
    • August (5)
    • July (10)
    • June (4)
    • May (6)
    • April (2)
    • March (8)
    • February (2)
    • January (14)
  • 2023
    • December (4)
    • October (4)
    • September (6)
    • August (12)
    • July (5)
    • June (15)
    • May (3)
    • April (11)
    • March (5)
    • February (5)
    • January (8)
  • 2022
    • December (5)
    • November (7)
    • October (7)
    • September (9)
    • August (10)
    • July (15)
    • June (12)
    • May (9)
    • April (14)
    • March (15)
    • February (13)
    • January (16)
  • 2021
    • December (23)
    • November (20)
    • October (16)
    • September (6)
    • August (16)
    • July (11)
    • June (16)
    • May (4)
    • April (10)
    • March (11)
    • February (15)
    • January (14)
  • 2020
    • December (10)
    • November (13)
    • October (15)
    • September (6)
    • August (9)
    • July (9)
    • June (17)
    • May (15)
    • April (14)
    • March (21)
    • February (16)
    • January (13)
  • 2019
    • December (17)
    • November (14)
    • October (16)
    • September (10)
    • August (8)
    • July (16)
    • June (11)
    • May (13)
    • April (18)
    • March (12)
    • February (19)
    • January (23)
  • 2018
    • December (15)
    • November (14)
    • October (19)
    • September (18)
    • August (23)
    • July (20)
    • June (20)
    • May (23)
    • April (15)
    • March (23)
    • February (19)
    • January (23)
  • 2017
    • December (21)
    • November (24)
    • October (22)
    • September (21)
    • August (23)
    • July (21)
    • June (24)
    • May (21)
    • April (21)
    • March (23)
    • February (20)
    • January (23)
  • 2016
    • December (17)
    • November (18)
    • October (22)
    • September (18)
    • August (23)
    • July (22)
    • June (17)
    • May (24)
    • April (16)
    • March (16)
    • February (21)
    • January (21)
  • 2015
    • December (5)
    • November (10)
    • October (9)
    • September (17)
    • August (20)
    • July (17)
    • June (4)
    • May (12)
    • April (9)
    • March (8)
    • February (25)
    • January (17)
  • 2014
    • December (22)
    • November (19)
    • October (21)
    • September (37)
    • August (24)
    • July (23)
    • June (13)
    • May (19)
    • April (24)
    • March (23)
    • February (21)
    • January (24)
  • 2013
    • December (23)
    • November (29)
    • October (27)
    • September (26)
    • August (24)
    • July (24)
    • June (23)
    • May (25)
    • April (26)
    • March (24)
    • February (24)
    • January (21)
  • 2012
    • December (19)
    • November (22)
    • October (27)
    • September (24)
    • August (30)
    • July (23)
    • June (25)
    • May (23)
    • April (25)
    • March (25)
    • February (28)
    • January (24)
  • 2011
    • December (17)
    • November (14)
    • October (24)
    • September (28)
    • August (27)
    • July (30)
    • June (19)
    • May (16)
    • April (30)
    • March (23)
    • February (11)
    • January (26)
  • 2010
    • December (29)
    • November (28)
    • October (35)
    • September (33)
    • August (44)
    • July (17)
    • June (20)
    • May (53)
    • April (29)
    • March (35)
    • February (33)
    • January (36)
  • 2009
    • December (37)
    • November (35)
    • October (53)
    • September (60)
    • August (66)
    • July (29)
    • June (24)
    • May (52)
    • April (63)
    • March (35)
    • February (53)
    • January (50)
  • 2008
    • December (58)
    • November (65)
    • October (46)
    • September (48)
    • August (96)
    • July (87)
    • June (45)
    • May (51)
    • April (52)
    • March (70)
    • February (43)
    • January (49)
  • 2007
    • December (100)
    • November (52)
    • October (109)
    • September (68)
    • August (80)
    • July (56)
    • June (150)
    • May (115)
    • April (73)
    • March (124)
    • February (102)
    • January (68)
  • 2006
    • December (95)
    • November (53)
    • October (120)
    • September (57)
    • August (88)
    • July (54)
    • June (103)
    • May (89)
    • April (84)
    • March (143)
    • February (78)
    • January (64)
  • 2005
    • December (70)
    • November (97)
    • October (91)
    • September (61)
    • August (74)
    • July (92)
    • June (100)
    • May (53)
    • April (42)
    • March (41)
    • February (84)
    • January (31)
  • 2004
    • December (49)
    • November (26)
    • October (26)
    • September (6)
    • April (10)
Comparison page for RavenDB and MongoDB
  previous post next post  
Mar 27 2012

Watch your 6, or is it your I/O?

time to read 1 min | 57 words

One of the interesting things about the freedb dataset is that it is distributed as a 3.1 million separate files, most of them in the 1 – 2 KB range.

Loading that to RavenDB took a while, so I set out to fix that. Care to guess what is the absolutely the first thing that I did?

Tweet Share Share 24 comments
Tags:
  • performance

  previous post next post  

Comments

release candidate
27 Mar 2012
10:07 AM
release candidate

gzip all files and then read the compressed stream at once?

Max
27 Mar 2012
10:07 AM
Max

Run it in the profiler.

csokun
27 Mar 2012
10:09 AM
csokun

merge file

Ryan
27 Mar 2012
10:09 AM
Ryan

Compensate for the seek time somehow? Assuming you were on spinning metal drives

Falhar
27 Mar 2012
10:13 AM
Falhar

I would say to merge the files. Makes it easier for subsequent work.

But this merge itself would take some time. But this is really good place for asynchronous IO.

Duckie
27 Mar 2012
10:25 AM
Duckie

Batching Disable indexing (POST /admin/stopindexing )

Duckie
27 Mar 2012
10:39 AM
Duckie

I found some earlier talk about this in google groups 8 feb. Looking a github, you added Self optimizing on batch-sizes:

https://github.com/ravendb/ravendb/commit/294c2134c5fa7b0b95d0297dfac38cb9ab9acd38

Felice Pollano
27 Mar 2012
11:00 AM
Felice Pollano

My two cents: Use IO completion ports ( async ) for reading the files, but mantain the reading strictly sequential. In the "done" function publish an job and consume it by another ( more than one? thread(s) ) pushing data into the target, this will at least compensate the seek time by doing something useful, and possibly if somethime the push opertaion is delayed ( I guess it can be, but maybe I'm wrong ) you can even keep scanning the hard drive.

flukus
27 Mar 2012
11:38 AM
flukus

Move the files to SSD?

Pure Krome
27 Mar 2012
11:42 AM
Pure Krome

Handball the problem over to Itamar?

Sam
27 Mar 2012
13:07 PM
Sam

+1 for Itamar. Handballing is an extremely efficient operation.

Roger Helliwell
27 Mar 2012
13:16 PM
Roger Helliwell

Like flukus said... or even better, mounted a ram disk and filled it with your 3.1 million files.

Itamar
27 Mar 2012
13:18 PM
Itamar

Yeah, he tried with no luck. I agreed to providing mental support at 1AM instead, though.

:-)

SPATEN
27 Mar 2012
13:40 PM
SPATEN

SSD

Dmitry
27 Mar 2012
14:02 PM
Dmitry

You had done a regular data import and RavenDB did its magic.

Josh
27 Mar 2012
14:11 PM
Josh

Increased the file chunk size to match the NTFS storage format chunk size

Martin
27 Mar 2012
14:42 PM
Martin

Put the kettle on?

Every problem is a magnitude simpler when tackled with a hot cup of coffee.

Bundermuft
27 Mar 2012
16:33 PM
Bundermuft

As you probably already had SSD, you did nothing

Joe
27 Mar 2012
17:57 PM
Joe

Loaded them into RavenFS ;-)

Nick
27 Mar 2012
18:50 PM
Nick

You told the freedb team to get their shit together and clean their mess up?

Bordev
27 Mar 2012
18:55 PM
Bordev

Defrag

Tom Robinson
27 Mar 2012
20:37 PM
Tom Robinson

Disabled your anti-virus?

David Cuccia
27 Mar 2012
21:21 PM
David Cuccia

Memory-map the files?

Harry
28 Mar 2012
08:42 AM
Harry

Enjoyed a cup of coffee

Comment preview

Comments have been closed on this topic.

Markdown formatting

ESC to close

Markdown turns plain text formatting into fancy HTML formatting.

Phrase Emphasis

*italic*   **bold**
_italic_   __bold__

Links

Inline:

An [example](http://url.com/ "Title")

Reference-style labels (titles are optional):

An [example][id]. Then, anywhere
else in the doc, define the link:
  [id]: http://example.com/  "Title"

Images

Inline (titles are optional):

![alt text](/path/img.jpg "Title")

Reference-style:

![alt text][id]
[id]: /url/to/img.jpg "Title"

Headers

Setext-style:

Header 1
========
Header 2
--------

atx-style (closing #'s are optional):

# Header 1 #
## Header 2 ##
###### Header 6

Lists

Ordered, without paragraphs:

1.  Foo
2.  Bar

Unordered, with paragraphs:

*   A list item.
    With multiple paragraphs.
*   Bar

You can nest them:

*   Abacus
    * answer
*   Bubbles
    1.  bunk
    2.  bupkis
        * BELITTLER
    3. burper
*   Cunning

Blockquotes

> Email-style angle brackets
> are used for blockquotes.
> > And, they can be nested.
> #### Headers in blockquotes
> 
> * You can quote a list.
> * Etc.

Horizontal Rules

Three or more dashes or asterisks:

---
* * *
- - - - 

Manual Line Breaks

End a line with two or more spaces:

Roses are red,   
Violets are blue.

Fenced Code Blocks

Code blocks delimited by 3 or more backticks or tildas:

```
This is a preformatted
code block
```

Header IDs

Set the id of headings with {#<id>} at end of heading line:

## My Heading {#myheading}

Tables

Fruit    |Color
---------|----------
Apples   |Red
Pears	 |Green
Bananas  |Yellow

Definition Lists

Term 1
: Definition 1
Term 2
: Definition 2

Footnotes

Body text with a footnote [^1]
[^1]: Footnote text here

Abbreviations

MDD <- will have title
*[MDD]: MarkdownDeep

 

FUTURE POSTS

No future posts left, oh my!

RECENT SERIES

  1. Recording (16):
    29 May 2025 - RavenDB's Upcoming Optimizations Deep Dive
  2. Webinar (6):
    27 May 2025 - RavenDB's Upcoming Optimizations Deep Dive
  3. RavenDB News (2):
    02 May 2025 - May 2025
  4. Production Postmortem (52):
    07 Apr 2025 - The race condition in the interlock
  5. RavenDB (13):
    02 Apr 2025 - .NET Aspire integration
View all series

RECENT COMMENTS

  • What a massive presentation! As a person who spent some time with a db written in .NET I can strongly relate to some points. ...
    By Scooletz on Recording: RavenDB's Upcoming Optimizations Deep Dive
  • I’d love to learn your thoughts on SPANN https://arxiv.org/abs/2111.08566 that with centroids and keeping the posting lists s...
    By Scooletz on Comparing DiskANN in SQL Server & HNSW in RavenDB
  • Joel, The DiskANN paper talks about it being viable for more than a billion vectors datasets.  In such a scenario, it would ...
    By Oren Eini on Comparing DiskANN in SQL Server & HNSW in RavenDB
  • Do you know why they chose DiskANN? These things are usually about tradeoffs but it seems DiskANN is just worse in every way.
    By Joel on Comparing DiskANN in SQL Server & HNSW in RavenDB
  • Scooletz, Yes, we look at other stuff. Most of those are _not_ allowing you to build themselves incrementally nor are they...
    By Oren Eini on Scaling HNSW in RavenDB: Optimizing for inadequate hardware

Syndication

Main feed Feed Stats
Comments feed   Comments Feed Stats
}