Memory Mapped Files, File I/O & Performance
I have been testing several approaches for writing to files, and I thought the results were interesting enough to share. In all cases, I was writing a 128KB buffer of random data to a 256MB file.
The first thing that I wanted to try was the trivial managed memory map approach:
var buffer = new byte[128 * 1024]; // 128KB of random data

using (var mmf = MemoryMappedFile.CreateFromFile("test.bin", FileMode.Create, "test", 1024 * 1024 * 256))
{
    using (var accessor = mmf.CreateViewAccessor())
    {
        for (int i = 0; i < accessor.Capacity; i += buffer.Length)
        {
            accessor.WriteArray(i, buffer, 0, buffer.Length);
        }
        accessor.Flush();
    }
}
This completed in 3.871 seconds.
Next, I wanted to see what would happen if I used direct memory access, with CopyMemory to do the copying:
[DllImport("kernel32.dll", EntryPoint = "RtlMoveMemory")]
static extern unsafe void CopyMemory(byte* dst, byte* src, long size);

using (var mmf = MemoryMappedFile.CreateFromFile("test.bin", FileMode.Create, "test", 1024 * 1024 * 256))
{
    using (var accessor = mmf.CreateViewAccessor())
    {
        byte* p = null;
        accessor.SafeMemoryMappedViewHandle.AcquirePointer(ref p);
        fixed (byte* src = buffer)
        {
            for (int i = 0; i < accessor.Capacity; i += buffer.Length)
            {
                CopyMemory(p + i, src, buffer.Length);
            }
        }
        accessor.SafeMemoryMappedViewHandle.ReleasePointer();
        accessor.Flush();
    }
}
As you can see, this is somewhat more complex, and requires unsafe code. But it completed in 2.062 seconds, nearly twice as fast.
Then I decided to try with raw file IO:
using (var f = new FileStream("test.bin", FileMode.Create))
{
    f.SetLength(1024 * 1024 * 256);
    for (int i = 0; i < f.Length; i += buffer.Length)
    {
        f.Write(buffer, 0, buffer.Length);
    }
    f.Flush(true);
}
This is about the most trivial code that you can think of, and it completed in about 1.956 seconds. Slightly faster, but within the margin of error (note: in repeated tests, the two were consistently very close, with the raw file I/O always near the top).
So, in other words, the accessor code adds a lot of overhead when using Memory Mapped Files.
Comments
Interesting discovery ;) But are you sure the last example is 'raw io' and not 'buffered IO'? You've done a sequential write which is probably faster than random write that only happens to be sequential in your test case.
BTW C-style pointers in C# are ugly. They just don't look good mixed with LongCamelCased.ClassAndAPISymbols. In C# it would be nicer to declare pointers like:
RawPointer<byte> ptr = something
I'd like to find out where the time was spent in each test. For example, does creating the file and the view take time or is it almost instant?
I'm also surprised that copying 256 MB of memory can take 2s, no matter how many intermediate copies are being done. Memory can be accessed sequentially at >= 10 GB/s as far as I'm informed.
Use Reflector to see what WriteArray does. It is not a memcpy, but a generic function. That must cause the extreme CPU usage. I'd try using a stream on the MMF to write byte[]'s.
I'd not trust the numbers until I'd have seen the profiler results. You might end up measuring stuff you don't care about.
Using mmf.CreateViewStream gives you the same performance as raw IO:
using (var mmf = MemoryMappedFile.CreateFromFile("test.bin", FileMode.Create, "test", 1024 * 1024 * 256))
{
    using (var mmvs = mmf.CreateViewStream(0, 0 /* 0 == create a complete view */, MemoryMappedFileAccess.Write))
    {
        for (int i = 0; i < mmvs.Length; i += buffer.Length)
        {
            mmvs.Write(buffer, 0, buffer.Length);
        }
        mmvs.Flush();
    }
}
Rafal, Yes, this is buffered IO, but note that I called Flush(true), and included that in the cost of doing this.
Tobi, Note that we include the time to flush this to disk.
Ayende, I hope you ran this test under .NET 4.5. In 4.0 it may not always flush http://connect.microsoft.com/VisualStudio/feedback/details/792434/flush-true-does-not-always-flush-when-it-should
Rafal, sometimes when doing externs to unmanaged functions, I wrap pointers in a struct with LayoutKind.Sequential. That makes my wrappers a bit safer. How about that?
Scooletz, Yes, that was run under 4.5, I am aware of that bug.
I don't understand. Does the last code snippet use memory mapped files? If not, then what sense do memory mapped files make if they are 2 times slower?
Guest, The last code snippet didn't use mmap files. It was the control test.
It appears that the memory mapped scenarios do not do an "fsync", whereas the file-based control test does. This may make a significant difference, especially if you try writing files a lot larger than 256 MB.
Also, I would expect that in your intended usage scenario, flushes/fsyncs would be more frequent than every 256 MB, which - especially in combination with large (multiple GB) files - will have a significant effect on performance depending on what kind of I/O strategy you use.
@alex There's accessor.Flush() which writes the modified pages to disk, so all examples are fsynced. However, I wonder how 'Voron' will handle btree page modifications - will it fsync after every update operation?
@Rafal accessor.Flush() does not perform an "fsync", it calls into MemoryMappedView.Flush() which in turn calls "FlushViewOfFile", not the same as an "fsync". See also http://msdn.microsoft.com/en-us/library/windows/apps/aa366563(v=vs.85).aspx.
"The FlushViewOfFile function does not flush the file metadata, and it does not wait to return until the changes are flushed from the underlying hardware disk cache and physically written to disk. To flush all the dirty pages plus the metadata for the file and ensure that they are physically written to disk, call FlushViewOfFile and then call the FlushFileBuffers function."
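To make the distinction concrete, a fully durable variant of the memory-mapped benchmark would pair accessor.Flush() (FlushViewOfFile) with a P/Invoke call to FlushFileBuffers on the underlying file handle. This is a minimal sketch, not the original benchmark code; the FileStream-based CreateFromFile overload used here is an assumption about the runtime, so adjust the parameters for your target framework version:

```csharp
using System.IO;
using System.IO.MemoryMappedFiles;
using System.Runtime.InteropServices;
using Microsoft.Win32.SafeHandles;

class DurableMmfWrite
{
    // FlushFileBuffers is what actually asks the OS to push the file's
    // dirty data and metadata down to the device.
    [DllImport("kernel32.dll", SetLastError = true)]
    static extern bool FlushFileBuffers(SafeFileHandle handle);

    static void Main()
    {
        var buffer = new byte[128 * 1024]; // 128KB buffer, as in the benchmark

        using (var fs = new FileStream("test.bin", FileMode.Create, FileAccess.ReadWrite))
        using (var mmf = MemoryMappedFile.CreateFromFile(fs, "test", 1024L * 1024 * 256,
                   MemoryMappedFileAccess.ReadWrite, HandleInheritability.None, leaveOpen: true))
        using (var accessor = mmf.CreateViewAccessor())
        {
            for (long i = 0; i < accessor.Capacity; i += buffer.Length)
                accessor.WriteArray(i, buffer, 0, buffer.Length);

            accessor.Flush();                    // FlushViewOfFile: writes dirty pages
            FlushFileBuffers(fs.SafeFileHandle); // then flush metadata + device cache
        }
    }
}
```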
Alex, You are correct, except that in both cases, we also close the file handle, which will do the flushing for us, so it is the same thing, effectively.
@Ayende, as far as I am aware, closing the file handle will not cause the drive's caches to be flushed (i.e. it will not issue an "fsync" command to the device: "SYNCHRONIZE CACHE" for SCSI, "FLUSH CACHE" for IDE/ATAPI). Since on an average consumer PC, these drive caches may be as large as 8 MB and on more high end systems 16 MB, that represents the amount of data that is potentially at risk.
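For the raw file I/O control test, one way to reduce exposure to those caches is to open the stream with FileOptions.WriteThrough, which maps to FILE_FLAG_WRITE_THROUGH on Windows. This variant is an illustration, not part of the original benchmark, and note that even write-through only instructs the OS cache; whether the drive's own cache honors it depends on the hardware:

```csharp
using System.IO;

class WriteThroughDemo
{
    static void Main()
    {
        var buffer = new byte[128 * 1024]; // 128KB buffer, as in the benchmark

        // FileOptions.WriteThrough tells the OS to write through its cache
        // directly toward the device on every write.
        using (var f = new FileStream("test.bin", FileMode.Create, FileAccess.Write,
                   FileShare.None, 4096, FileOptions.WriteThrough))
        {
            f.SetLength(1024L * 1024 * 256);
            for (long i = 0; i < f.Length; i += buffer.Length)
                f.Write(buffer, 0, buffer.Length);
            f.Flush(true); // still flush file metadata at the end
        }
    }
}
```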