Ayende @ Rahien

Refunds available at head office

Watch your 6, or is it your I/O?

One of the interesting things about the freedb dataset is that it is distributed as 3.1 million separate files, most of them in the 1–2 KB range.

Loading that into RavenDB took a while, so I set out to fix that. Care to guess what was absolutely the first thing that I did?

Comments

release candidate
03/27/2012 10:07 AM

gzip all files and then read the compressed stream at once?
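The compress-then-stream idea can be sketched with Python's standard `tarfile` module. This is a minimal sketch, assuming all the small files sit under one directory; `pack_directory` and `iter_archive` are hypothetical names, not part of RavenDB or freedb tooling. The point is that reading happens as one sequential pass over a single compressed file rather than millions of separate opens and seeks.

```python
import os
import tarfile
from typing import Iterator, Tuple


def pack_directory(src_dir: str, archive_path: str) -> int:
    """Pack every regular file under src_dir into one gzip-compressed tar.

    Returns the number of files packed. Names are stored relative to src_dir.
    (Hypothetical helper; archive_path should live outside src_dir.)
    """
    count = 0
    with tarfile.open(archive_path, "w:gz") as tar:
        for root, _dirs, files in os.walk(src_dir):
            for name in sorted(files):
                full = os.path.join(root, name)
                tar.add(full, arcname=os.path.relpath(full, src_dir))
                count += 1
    return count


def iter_archive(archive_path: str) -> Iterator[Tuple[str, bytes]]:
    """Yield (name, content) pairs from the archive in a single forward pass.

    Mode "r|gz" opens the archive as a non-seekable stream, so members must
    be read in order; each member is fully read before moving to the next.
    """
    with tarfile.open(archive_path, "r|gz") as tar:
        for member in tar:
            if member.isfile():
                f = tar.extractfile(member)
                yield member.name, f.read()
```

Packing is a one-time cost; every later load then reads one large file front to back.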

Max
03/27/2012 10:07 AM

Run it in the profiler.

csokun
03/27/2012 10:09 AM

merge file

Ryan
03/27/2012 10:09 AM

Compensate for the seek time somehow? That assumes you were on spinning metal drives.

Falhar
03/27/2012 10:13 AM

I would say to merge the files. That makes it easier for subsequent work.

The merge itself would take some time, but this is a really good place for asynchronous IO.

Duckie
03/27/2012 10:25 AM

Batching. Disable indexing (POST /admin/stopindexing).
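The batching half of that suggestion can be sketched generically: group documents into fixed-size batches so the loader makes one request per batch instead of one per document. This is a minimal sketch only; `send_batch` is a hypothetical callback standing in for whatever bulk-write mechanism the client actually uses, and the default batch size of 1024 is an arbitrary assumption, not a RavenDB recommendation. It does not cover the indexing part of the comment.

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield lists of at most `size` items, preserving input order."""
    if size <= 0:
        raise ValueError("size must be positive")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []  # start a fresh list so yielded batches stay intact
    if batch:
        yield batch  # trailing partial batch


def load_in_batches(docs: Iterable[T],
                    send_batch: Callable[[List[T]], None],
                    size: int = 1024) -> int:
    """Send docs via the hypothetical send_batch callback, one call per
    batch. Returns the number of batches sent."""
    sent = 0
    for batch in batched(docs, size):
        send_batch(batch)
        sent += 1
    return sent
```

For example, ten documents with `size=4` go out as three calls carrying 4, 4 and 2 documents.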

Duckie
03/27/2012 10:39 AM

I found some earlier discussion about this in Google Groups on 8 Feb. Looking at GitHub, you added self-optimizing batch sizes:

https://github.com/ravendb/ravendb/commit/294c2134c5fa7b0b95d0297dfac38cb9ab9acd38

Felice Pollano
03/27/2012 11:00 AM

My two cents: use IO completion ports (async) for reading the files, but keep the reading strictly sequential. In the "done" function, publish a job and consume it on another thread (or more than one) that pushes the data into the target. This will at least compensate for the seek time by doing something useful, and if the push operation is sometimes delayed (I guess it can be, but maybe I'm wrong), you can even keep scanning the hard drive.
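The read-sequentially, push-concurrently pipeline described in that comment can be sketched as follows. Note the swap: this uses plain Python threads and a bounded `queue.Queue` as a stand-in for IO completion ports, which are a Windows API not shown here. `pipeline_load` and the `push` callback are hypothetical names; `push` must be safe to call from several threads at once.

```python
import queue
import threading
from typing import Callable, List

_SENTINEL = None  # tells a consumer thread there is no more work


def pipeline_load(paths: List[str], push: Callable[[str, bytes], None],
                  consumers: int = 2, max_pending: int = 256) -> None:
    """Read files strictly in the given order on one thread and hand each
    (path, bytes) job to `consumers` threads that call the hypothetical push.

    The bounded queue gives back-pressure: if pushing falls behind, the
    reader blocks instead of buffering the whole dataset in memory; if
    pushing stalls only briefly, the reader keeps scanning the disk.
    """
    if consumers < 1:
        raise ValueError("need at least one consumer")
    jobs = queue.Queue(maxsize=max_pending)
    errors: List[BaseException] = []

    def reader() -> None:
        try:
            for path in paths:  # one file at a time, never out of order
                with open(path, "rb") as f:
                    jobs.put((path, f.read()))
        except BaseException as exc:
            errors.append(exc)
        finally:
            for _ in range(consumers):  # one stop marker per consumer
                jobs.put(_SENTINEL)

    def consumer() -> None:
        while True:
            job = jobs.get()
            if job is _SENTINEL:
                return
            try:
                push(*job)
            except BaseException as exc:
                errors.append(exc)  # keep draining so the reader never blocks forever

    threads = [threading.Thread(target=reader)]
    threads += [threading.Thread(target=consumer) for _ in range(consumers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    if errors:
        raise errors[0]  # surface the first failure to the caller
```

Keeping a single reader preserves the sequential access pattern the comment asks for, while the consumer count controls how much pushing overlaps with reading.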

flukus
03/27/2012 11:38 AM

Move the files to SSD?

Pure Krome
03/27/2012 11:42 AM

Handball the problem over to Itamar?

Sam
03/27/2012 01:07 PM

+1 for Itamar. Handballing is an extremely efficient operation.

Roger Helliwell
03/27/2012 01:16 PM

Like flukus said... or even better, mount a RAM disk and fill it with your 3.1 million files.

Itamar
03/27/2012 01:18 PM

Yeah, he tried with no luck. I agreed to provide mental support at 1AM instead, though.

:-)

SPATEN
03/27/2012 01:40 PM

SSD

Dmitry
03/27/2012 02:02 PM

You did a regular data import and RavenDB did its magic.

Josh
03/27/2012 02:11 PM

Increased the file chunk size to match the NTFS storage format chunk size

Martin
03/27/2012 02:42 PM

Put the kettle on?

Every problem is an order of magnitude simpler when tackled with a hot cup of coffee.

Bundermuft
03/27/2012 04:33 PM

As you probably already had an SSD, you did nothing.

Joe
03/27/2012 05:57 PM

Loaded them into RavenFS ;-)

Nick
03/27/2012 06:50 PM

You told the freedb team to get their shit together and clean their mess up?

Bordev
03/27/2012 06:55 PM

Defrag

Tom Robinson
03/27/2012 08:37 PM

Disabled your anti-virus?

Harry
03/28/2012 08:42 AM

Enjoyed a cup of coffee
