Watch your 6, or is it your I/O?
One of the interesting things about the freedb dataset is that it is distributed as 3.1 million separate files, most of them in the 1 – 2 KB range.
Loading that into RavenDB took a while, so I set out to fix that. Care to guess what the absolute first thing I did was?
gzip all files and then read the compressed stream at once?
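A sketch of that idea in Python (not the author's actual code; RavenDB is .NET, and the file names here are made up): pack the millions of tiny files into one compressed archive up front, then read them back in a single sequential pass instead of opening each one individually.

```python
import os
import tarfile


def bundle(paths, archive_path):
    """Pack many small files into one gzip-compressed tar, so a later
    import can read them all in one sequential scan of the disk."""
    with tarfile.open(archive_path, "w:gz") as tar:
        for p in paths:
            tar.add(p, arcname=os.path.basename(p))


def read_all(archive_path):
    """Stream every member back out of the compressed archive,
    returning {file name: contents}."""
    out = {}
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar.getmembers():
            f = tar.extractfile(member)
            if f is not None:  # skip directories etc.
                out[member.name] = f.read()
    return out
```

The win is that the per-file open/seek cost is paid once while building the archive, and every subsequent pass is one compressed sequential read.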
Run it in the profiler.
Compensate for the seek time somehow? Assuming you were on spinning metal drives.
I would say to merge the files. Makes it easier for subsequent work.
But the merge itself would take some time. This is a really good place for asynchronous I/O, though.
Batching? Disabling indexing (POST /admin/stopindexing)?
I found some earlier talk about this on Google Groups from Feb 8. Looking at GitHub, you added self-optimizing batch sizes:
My two cents: use I/O completion ports (async) for reading the files, but keep the reading strictly sequential. In the "done" callback, publish a job and consume it on another thread (or more than one?), pushing the data into the target. This will at least compensate for the seek time by doing something useful, and if the push operation is sometimes delayed (I guess it can be, but maybe I'm wrong), you can even keep scanning the hard drive.
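I/O completion ports are Windows-specific, but the producer/consumer shape the comment describes is portable. A minimal sketch in Python (the `push` callback is hypothetical, standing in for "write a document to RavenDB"): one reader thread keeps disk access strictly sequential, while consumer threads absorb the push latency.

```python
import queue
import threading


def import_files(paths, push, n_consumers=2):
    """Read files on one thread (strictly sequential disk access) and
    hand them to consumer threads that run the possibly-slow `push`."""
    q = queue.Queue(maxsize=64)  # bounded, so the reader can't outrun memory
    results = []
    lock = threading.Lock()

    def reader():
        for p in paths:
            with open(p, "rb") as f:
                q.put((p, f.read()))
        for _ in range(n_consumers):
            q.put(None)  # one stop sentinel per consumer

    def consumer():
        while True:
            item = q.get()
            if item is None:
                break
            name, data = item
            r = push(name, data)  # e.g. send the document to the target store
            with lock:
                results.append(r)

    threads = [threading.Thread(target=reader)]
    threads += [threading.Thread(target=consumer) for _ in range(n_consumers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

The bounded queue is the important design choice: if the push side stalls, the reader blocks instead of buffering 3.1 million files in memory; if the push side keeps up, the disk never waits on it.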
Move the files to SSD?
Handball the problem over to Itamar?
+1 for Itamar. Handballing is an extremely efficient operation.
Like flukus said... or even better, mount a RAM disk and fill it with your 3.1 million files.
Yeah, he tried, with no luck. I agreed to provide moral support at 1 AM instead, though.
You had done a regular data import and RavenDB did its magic.
Increased the file chunk size to match the NTFS storage format chunk size
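For what that suggestion would look like in practice, here is a hedged sketch (Python for illustration; 4 KB is the common NTFS default cluster size, but the real value should be checked with `fsutil fsinfo ntfsinfo`): read with buffering disabled in cluster-sized chunks, so each read maps onto whole clusters.

```python
CLUSTER_SIZE = 4096  # typical NTFS cluster size; verify per volume


def read_aligned(path, chunk_size=CLUSTER_SIZE):
    """Read a file in cluster-sized chunks with Python's own buffering
    disabled, so read sizes line up with the filesystem's allocation unit."""
    chunks = []
    with open(path, "rb", buffering=0) as f:
        while True:
            block = f.read(chunk_size)
            if not block:
                break
            chunks.append(block)
    return b"".join(chunks)
```

For files in the 1 – 2 KB range, though, most reads fit in a single cluster anyway, which is why this alone was unlikely to help much.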
Put the kettle on?
Every problem is a magnitude simpler when tackled with a hot cup of coffee.
As you probably already had an SSD, you did nothing.
Loaded them into RavenFS ;-)
You told the freedb team to get their shit together and clean their mess up?
Disabled your anti-virus?
Memory-map the files?
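A minimal sketch of that suggestion (Python's `mmap` here, standing in for whatever the .NET equivalent would be): map each file and let the OS page it in, instead of copying it through a userspace buffer.

```python
import mmap


def read_mapped(path):
    """Memory-map a file read-only and return its contents.
    Note: mapping an empty file raises ValueError, so callers
    should handle zero-length files separately."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return bytes(mm)
```

Whether this helps for millions of 1 – 2 KB files is doubtful: the dominant cost is opening and seeking to each file, not copying its handful of bytes, and each mapping has its own setup overhead.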
Enjoyed a cup of coffee