Oren Eini

aka Ayende Rahien

Oren Eini

CEO of RavenDB

a NoSQL Open Source Document Database

Get in touch with me:

oren@ravendb.net +972 52-548-6969

Posts: 7,575
|
Comments: 51,189

Copyright ©️ Ayende Rahien 2004 — 2025

Privacy Policy · Terms
filter by tags archive
stack view grid view
  • architecture (606) rss
  • bugs (450) rss
  • challanges (123) rss
  • community (377) rss
  • databases (481) rss
  • design (893) rss
  • development (640) rss
  • hibernating-practices (71) rss
  • miscellaneous (592) rss
  • performance (397) rss
  • programming (1085) rss
  • raven (1442) rss
  • ravendb.net (526) rss
  • reviews (184) rss
  • 2025
    • May (7)
    • April (10)
    • March (10)
    • February (7)
    • January (12)
  • 2024
    • December (3)
    • November (2)
    • October (1)
    • September (3)
    • August (5)
    • July (10)
    • June (4)
    • May (6)
    • April (2)
    • March (8)
    • February (2)
    • January (14)
  • 2023
    • December (4)
    • October (4)
    • September (6)
    • August (12)
    • July (5)
    • June (15)
    • May (3)
    • April (11)
    • March (5)
    • February (5)
    • January (8)
  • 2022
    • December (5)
    • November (7)
    • October (7)
    • September (9)
    • August (10)
    • July (15)
    • June (12)
    • May (9)
    • April (14)
    • March (15)
    • February (13)
    • January (16)
  • 2021
    • December (23)
    • November (20)
    • October (16)
    • September (6)
    • August (16)
    • July (11)
    • June (16)
    • May (4)
    • April (10)
    • March (11)
    • February (15)
    • January (14)
  • 2020
    • December (10)
    • November (13)
    • October (15)
    • September (6)
    • August (9)
    • July (9)
    • June (17)
    • May (15)
    • April (14)
    • March (21)
    • February (16)
    • January (13)
  • 2019
    • December (17)
    • November (14)
    • October (16)
    • September (10)
    • August (8)
    • July (16)
    • June (11)
    • May (13)
    • April (18)
    • March (12)
    • February (19)
    • January (23)
  • 2018
    • December (15)
    • November (14)
    • October (19)
    • September (18)
    • August (23)
    • July (20)
    • June (20)
    • May (23)
    • April (15)
    • March (23)
    • February (19)
    • January (23)
  • 2017
    • December (21)
    • November (24)
    • October (22)
    • September (21)
    • August (23)
    • July (21)
    • June (24)
    • May (21)
    • April (21)
    • March (23)
    • February (20)
    • January (23)
  • 2016
    • December (17)
    • November (18)
    • October (22)
    • September (18)
    • August (23)
    • July (22)
    • June (17)
    • May (24)
    • April (16)
    • March (16)
    • February (21)
    • January (21)
  • 2015
    • December (5)
    • November (10)
    • October (9)
    • September (17)
    • August (20)
    • July (17)
    • June (4)
    • May (12)
    • April (9)
    • March (8)
    • February (25)
    • January (17)
  • 2014
    • December (22)
    • November (19)
    • October (21)
    • September (37)
    • August (24)
    • July (23)
    • June (13)
    • May (19)
    • April (24)
    • March (23)
    • February (21)
    • January (24)
  • 2013
    • December (23)
    • November (29)
    • October (27)
    • September (26)
    • August (24)
    • July (24)
    • June (23)
    • May (25)
    • April (26)
    • March (24)
    • February (24)
    • January (21)
  • 2012
    • December (19)
    • November (22)
    • October (27)
    • September (24)
    • August (30)
    • July (23)
    • June (25)
    • May (23)
    • April (25)
    • March (25)
    • February (28)
    • January (24)
  • 2011
    • December (17)
    • November (14)
    • October (24)
    • September (28)
    • August (27)
    • July (30)
    • June (19)
    • May (16)
    • April (30)
    • March (23)
    • February (11)
    • January (26)
  • 2010
    • December (29)
    • November (28)
    • October (35)
    • September (33)
    • August (44)
    • July (17)
    • June (20)
    • May (53)
    • April (29)
    • March (35)
    • February (33)
    • January (36)
  • 2009
    • December (37)
    • November (35)
    • October (53)
    • September (60)
    • August (66)
    • July (29)
    • June (24)
    • May (52)
    • April (63)
    • March (35)
    • February (53)
    • January (50)
  • 2008
    • December (58)
    • November (65)
    • October (46)
    • September (48)
    • August (96)
    • July (87)
    • June (45)
    • May (51)
    • April (52)
    • March (70)
    • February (43)
    • January (49)
  • 2007
    • December (100)
    • November (52)
    • October (109)
    • September (68)
    • August (80)
    • July (56)
    • June (150)
    • May (115)
    • April (73)
    • March (124)
    • February (102)
    • January (68)
  • 2006
    • December (95)
    • November (53)
    • October (120)
    • September (57)
    • August (88)
    • July (54)
    • June (103)
    • May (89)
    • April (84)
    • March (143)
    • February (78)
    • January (64)
  • 2005
    • December (70)
    • November (97)
    • October (91)
    • September (61)
    • August (74)
    • July (92)
    • June (100)
    • May (53)
    • April (42)
    • March (41)
    • February (84)
    • January (31)
  • 2004
    • December (49)
    • November (26)
    • October (26)
    • September (6)
    • April (10)
RavenDB - High-Performance NoSQL Document Database
  previous post next post  
Mar 27 2012

Watch your 6, or is it your I/O?

time to read 1 min | 57 words

One of the interesting things about the freedb dataset is that it is distributed as a 3.1 million separate files, most of them in the 1 – 2 KB range.

Loading that to RavenDB took a while, so I set out to fix that. Care to guess what is the absolutely the first thing that I did?

Tweet Share Share 24 comments
Tags:
  • performance

  previous post next post  

Comments

release candidate
27 Mar 2012
10:07 AM
release candidate

gzip all files and then read the compressed stream at once?

Max
27 Mar 2012
10:07 AM
Max

Run it in the profiler.

csokun
27 Mar 2012
10:09 AM
csokun

merge file

Ryan
27 Mar 2012
10:09 AM
Ryan

Compensate for the seek time somehow? Assuming you were on spinning metal drives

Falhar
27 Mar 2012
10:13 AM
Falhar

I would say to merge the files. Makes it easier for subsequent work.

But this merge itself would take some time. But this is really good place for asynchronous IO.

Duckie
27 Mar 2012
10:25 AM
Duckie

Batching Disable indexing (POST /admin/stopindexing )

Duckie
27 Mar 2012
10:39 AM
Duckie

I found some earlier talk about this in google groups 8 feb. Looking a github, you added Self optimizing on batch-sizes:

https://github.com/ravendb/ravendb/commit/294c2134c5fa7b0b95d0297dfac38cb9ab9acd38

Felice Pollano
27 Mar 2012
11:00 AM
Felice Pollano

My two cents: Use IO completion ports ( async ) for reading the files, but mantain the reading strictly sequential. In the "done" function publish an job and consume it by another ( more than one? thread(s) ) pushing data into the target, this will at least compensate the seek time by doing something useful, and possibly if somethime the push opertaion is delayed ( I guess it can be, but maybe I'm wrong ) you can even keep scanning the hard drive.

flukus
27 Mar 2012
11:38 AM
flukus

Move the files to SSD?

Pure Krome
27 Mar 2012
11:42 AM
Pure Krome

Handball the problem over to Itamar?

Sam
27 Mar 2012
13:07 PM
Sam

+1 for Itamar. Handballing is an extremely efficient operation.

Roger Helliwell
27 Mar 2012
13:16 PM
Roger Helliwell

Like flukus said... or even better, mounted a ram disk and filled it with your 3.1 million files.

Itamar
27 Mar 2012
13:18 PM
Itamar

Yeah, he tried with no luck. I agreed to providing mental support at 1AM instead, though.

:-)

SPATEN
27 Mar 2012
13:40 PM
SPATEN

SSD

Dmitry
27 Mar 2012
14:02 PM
Dmitry

You had done a regular data import and RavenDB did its magic.

Josh
27 Mar 2012
14:11 PM
Josh

Increased the file chunk size to match the NTFS storage format chunk size

Martin
27 Mar 2012
14:42 PM
Martin

Put the kettle on?

Every problem is a magnitude simpler when tackled with a hot cup of coffee.

Bundermuft
27 Mar 2012
16:33 PM
Bundermuft

As you probably already had SSD, you did nothing

Joe
27 Mar 2012
17:57 PM
Joe

Loaded them into RavenFS ;-)

Nick
27 Mar 2012
18:50 PM
Nick

You told the freedb team to get their shit together and clean their mess up?

Bordev
27 Mar 2012
18:55 PM
Bordev

Defrag

Tom Robinson
27 Mar 2012
20:37 PM
Tom Robinson

Disabled your anti-virus?

David Cuccia
27 Mar 2012
21:21 PM
David Cuccia

Memory-map the files?

Harry
28 Mar 2012
08:42 AM
Harry

Enjoyed a cup of coffee

Comment preview

Comments have been closed on this topic.

Markdown formatting

ESC to close

Markdown turns plain text formatting into fancy HTML formatting.

Phrase Emphasis

*italic*   **bold**
_italic_   __bold__

Links

Inline:

An [example](http://url.com/ "Title")

Reference-style labels (titles are optional):

An [example][id]. Then, anywhere
else in the doc, define the link:
  [id]: http://example.com/  "Title"

Images

Inline (titles are optional):

![alt text](/path/img.jpg "Title")

Reference-style:

![alt text][id]
[id]: /url/to/img.jpg "Title"

Headers

Setext-style:

Header 1
========
Header 2
--------

atx-style (closing #'s are optional):

# Header 1 #
## Header 2 ##
###### Header 6

Lists

Ordered, without paragraphs:

1.  Foo
2.  Bar

Unordered, with paragraphs:

*   A list item.
    With multiple paragraphs.
*   Bar

You can nest them:

*   Abacus
    * answer
*   Bubbles
    1.  bunk
    2.  bupkis
        * BELITTLER
    3. burper
*   Cunning

Blockquotes

> Email-style angle brackets
> are used for blockquotes.
> > And, they can be nested.
> #### Headers in blockquotes
> 
> * You can quote a list.
> * Etc.

Horizontal Rules

Three or more dashes or asterisks:

---
* * *
- - - - 

Manual Line Breaks

End a line with two or more spaces:

Roses are red,   
Violets are blue.

Fenced Code Blocks

Code blocks delimited by 3 or more backticks or tildas:

```
This is a preformatted
code block
```

Header IDs

Set the id of headings with {#<id>} at end of heading line:

## My Heading {#myheading}

Tables

Fruit    |Color
---------|----------
Apples   |Red
Pears	 |Green
Bananas  |Yellow

Definition Lists

Term 1
: Definition 1
Term 2
: Definition 2

Footnotes

Body text with a footnote [^1]
[^1]: Footnote text here

Abbreviations

MDD <- will have title
*[MDD]: MarkdownDeep

 

FUTURE POSTS

  1. NOT Sharding RavenDB Vector Search - 8 hours from now
  2. Optimizing the cost of clearing a set - 3 days from now
  3. Scaling HNSW in RavenDB: Optimizing for inadequate hardware - 5 days from now

There are posts all the way to May 14, 2025

RECENT SERIES

  1. RavenDB News (2):
    02 May 2025 - May 2025
  2. Recording (15):
    30 Apr 2025 - Practical AI Integration with RavenDB
  3. Production Postmortem (52):
    07 Apr 2025 - The race condition in the interlock
  4. RavenDB (13):
    02 Apr 2025 - .NET Aspire integration
  5. RavenDB 7.1 (6):
    18 Mar 2025 - One IO Ring to rule them all
View all series

RECENT COMMENTS

  • ถูกใจ ร้านนี้สุดๆ! อุปกรณ์ จาก inkspa pantip ทำให้ ผลงาน สวย จ้าง เสื้อ ทีม ได้ งาน ลงตัว ราคา คุ้ม ชวน ใคร อยากได้ ดีไซน...
    By ink-spa พันทิป on The null check that didn't check for nulls
  • But in case you have nullability checks enabled (i.e. `<Nullable>enable</Nullable>`), then you'll have a compiler warning on ...
    By Samyon Ristov on The null check that didn't check for nulls
  • Grok wasn't *wrong*. It only said that `_items` can't be null for the condition to evaluate to `true`, but didn't say anythi...
    By Johannes Egger on The null check that didn't check for nulls
  • When I started enabling NRT, I remember I was initially confused when all variables (for reference types) declared with `var`...
    By riccardo on The null check that didn't check for nulls
  • That is surprising - I think of var as a shorthand that does not affect the final result of the compilation. I wouldn't expec...
    By Chris B on The null check that didn't check for nulls

Syndication

Main feed Feed Stats
Comments feed   Comments Feed Stats
}