Ayende @ Rahien

Hi!
My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.
You can reach me by phone or email:

ayende@ayende.com

+972 52-548-6969

, @ Q c

Posts: 5,972 | Comments: 44,523

filter by tags archive

Is select() broken? Memory mapped files with unbufferred writes == race condition?


Let me start this post by stating that I am not even sure if what I am trying to do is legal here. But from reading the docs, it does appear to be a valid use of the API, and it does work, most of the time.

The full code can be found here: https://gist.github.com/ayende/7495987

The gist of it is that I am trying to do two things:

  • Write to a file opened with FILE_FLAG_WRITE_THROUGH | FILE_FLAG_NO_BUFFERING.
  • Read from this file using a memory map.
  • Occasionally, I get into situations where after I wrote to the file, I am not reading what I wrote.

I have a repro, and we reproduced this on multiple machines. Both Windows 7 and Windows 8.

Here is the relevant code (the full code is in the link), explanation on it below:

   1: const uint nNumberOfBytesToWrite = 4096*3;
   2: var buffer = (byte*)(VirtualAlloc(IntPtr.Zero, new UIntPtr(nNumberOfBytesToWrite), AllocationType.COMMIT, MemoryProtection.READWRITE)
   3:             .ToPointer());
   4:  
   5: for (int i = 0; i < nNumberOfBytesToWrite; i++)
   6: {
   7:     *(buffer + i) = 137;
   8: }
   9:  
  10: var g = Guid.NewGuid().ToString();
  11:  
  12: var safeHandle = CreateFile(g,
  13:     NativeFileAccess.GenericRead | NativeFileAccess.GenericWrite,
  14:     NativeFileShare.Read, IntPtr.Zero,
  15:     NativeFileCreationDisposition.OpenAlways,
  16:     NativeFileAttributes.Write_Through | NativeFileAttributes.NoBuffering | NativeFileAttributes.DeleteOnClose,
  17:     IntPtr.Zero);
  18:  
  19: var fileStream = new FileStream(safeHandle, FileAccess.ReadWrite);
  20: fileStream.SetLength(1024 * 1024 * 1024); // 1gb
  21:  
  22: if (safeHandle.IsInvalid)
  23: {
  24:     throw new Win32Exception();
  25: }
  26:  
  27: FileStream mms = fileStream;
  28: //mms = new FileStream(g, FileMode.Open, FileAccess.Read, FileShare.ReadWrite | FileShare.Delete);
  29: var mmf = MemoryMappedFile.CreateFromFile(mms, Guid.NewGuid().ToString(), fileStream.Length,
  30:     MemoryMappedFileAccess.Read, null, HandleInheritability.None, true);
  31:  
  32: MemoryMappedViewAccessor accessor = mmf.CreateViewAccessor(0, fileStream.Length, MemoryMappedFileAccess.Read);
  33: byte* ptr = null;
  34: accessor.SafeMemoryMappedViewHandle.AcquirePointer(ref ptr);
  35:  
  36: Task.Factory.StartNew(() =>
  37: {
  38:     long lastPos = 0;
  39:     while (true)
  40:     {
  41:         int count = 0;
  42:         while (true)
  43:         {
  44:             if (*(ptr + lastPos) != 137)
  45:             {
  46:                 break;
  47:             }
  48:             lastPos += 4096;
  49:             count ++;
  50:         }
  51:         Console.WriteLine();
  52:         Console.WriteLine("Verified {0} MB", count * 4 / 1024);
  53:         Console.WriteLine();
  54:         Thread.Sleep(2000);
  55:     }
  56: });
  57:  
  58: for (int i = 0; i < 1024*64; i++)
  59: {
  60:     var pos = i*nNumberOfBytesToWrite;
  61:     if (i%100 == 0)
  62:         Console.Write("\r{0,10:#,#} kb", pos/1024);
  63:     var nativeOverlapped = new NativeOverlapped
  64:     {
  65:         OffsetHigh = 0,
  66:         OffsetLow = (int) pos
  67:     };
  68:  
  69:     uint written;
  70:     if (WriteFile(safeHandle, buffer, nNumberOfBytesToWrite, out written, &nativeOverlapped) == false)
  71:         throw new Win32Exception();
  72:  
  73:     for (int j = 0; j < 3; j++)
  74:     {
  75:         if (*(ptr + pos) != 137)
  76:         {
  77:             throw new Exception("WTF?!");
  78:         }
  79:         pos += 4096;
  80:     }
  81: }

This code is doing the following:

  • We setup a file handle using NoBuffering | Write_Through, and we also map the file using memory map.
  • We write 3 pages (12Kb) at a time to the file.
  • After the write, we are using memory map to verify that we actually wrote what we wanted to the file.
  • _At the same time_ we are reading from the same memory in another thread.
  • Occasionally, we get an error where the data we just wrote to the file cannot be read back.

Now, here is what I think is actually happening:

  • When we do an unbuffered write, Windows has to mark the relevant pages as invalid.
  • I _think_ that it does so before it actually perform the write.
  • If you have another thread that access that particular range of memory at the same time, it can load the _previously_ written data.
  • The WriteFile actually perform the write, but the pages that map to that portion of the file have already been marked as loaded.
  • At that point, when we use the memory mapped pointer to access the data, we get the data that was there before the write.

As I said, the code above can reproduce this issue (you might have to run it multiple times).

I am not sure if this is something that is valid issue or just me misusing the code. The docs are pretty clear about using regular i/o & memory mapped i/o. The OS is responsible to keeping them coherent with respect to one another. However, that is certainly not the case here.

It might be that I am using a single handle for both, and Windows does less checking when that happens? For what it is worth, I have also tried it using different handles, and I don’t see the problem in the code above, but I have a more complex scenario where I do see the same issue.

Of course, FILE_FLAG_OVERLAPPED is not specified, so what I would actually expect here is serialization of the I/O, according to the docs. But mostly I need a sanity check to tell me if I am crazy.

Don’t play peekaboo with support, damn it!


One of the things that we really pride ourselves with Hibernating Rhinos is the level of support. Just to give you some idea, today & yesterday we had core team members on the phone with people (not customers, yes) who are having problems with RavenDB for quite some time.

Now, I understand that you may not always have the information to give, but what you have, give me! So I can help you.

From a recent exchange in the mailing list:

var clickCount = session.Query<TrackerRequest>().Where(t => t.TrackerCreated == doc.Created).Where(t => t.Type == Type.Click).Count();

This gives:

"Non-static method requires a target"

To which I replied:

What is the full error that you get? Compile time? Runtime?

The answer I got:

This is a document I'm trying to 'Count':

{

  "Created": "2012-11-15T16:12:42.1775747",

  "IP": "127.0.0.1",

  "TrackerCreated": "2012-11-15T14:12:16.3951000Z",

  "Referrer": "http://example.com",

  "Type": "Click"

}

Raven terminal gives:

Request # 172: GET     -     3 ms - <system>   - 200 - /indexes/Raven/DocumentsByEntityName?query=Tag%253ATrackerRequests&start=0&pageSize=30&aggregation=None&noCache=-1129797484

        Query: Tag:TrackerRequests

        Time: 2 ms

        Index: Raven/DocumentsByEntityName

        Results: 3 returned out of 3 total.

By the way, you might note that this ISN’T related in any way to his issue. This query (and document) were gotten from the Studio. I can tell by the URL.

Then there was this:

image

I mean, seriously, I am happy to provide support, even if you aren’t a customer yet, but don’t give me some random bit of information that has absolutely nothing to do to the problem at hand and expect me to guess what the issue is.

Relevant information like the stack trace, what build you are on, what classes are involved, etc are expected.

Negative hiring decisions, Part I


One of the things that I really hate is to be reminded anew how stupid some people are. Or maybe it is how stupid they think I am.  One of the things that we are doing during interviews is to ask candidates to do some fairly simple code tasks. Usually, I give them an hour or two to complete that (using VS and a laptop), and if they don’t complete everything, they can do that at home and send me the results.

This is a piece of code that one such candidate has sent. To be clear, this is something that the candidate has worked on at home and had as much time for as she wanted:

public int GetTaxs(int salary)
{
    double  net, tax;

    switch (salary)
    {
        case < 5070:
            tax = salary  * 0.1;
            net=  salary  - tax ;
            break;

        case < 8660:
        case > 5071:
            tax = (salary - 5071)*0.14;
            tax+= 5070 * 0.1;
            net = salary-tax;   
            break;
        case < 14070:
        case > 8661:
            tax=  (salary - 8661)*0.23;
            tax+= (8661 - 5071 )*0.14;
            tax+= 5070 *0.1;
            net=  salary - tax;
            break;
        case <21240:
        case >14071:
            tax=  (salary- 14071)*0.3;
            tax+= (14070 - 8661)*0.23;
            tax+= (8661 - 5071 )*0.14;
            tax+= 5070 *0.1;
            net= salary - tax;
            break;
        case <40230:
        case >21241:
            tax=  (salary- 21241)*0.33;
            tax+= (21240 - 14071)*0.3;
            tax+= (14070 - 8661)*0.23;
            tax+= (8661 - 5071 )*0.14;
            tax+= 5070 *0.1;
            net= salary - tax;
            break;
        case > 40230:
            tax= (salary - 40230)*0.45;
            tax+=  (40230- 21241)*0.33;
            tax+= (21240 - 14071)*0.3;
            tax+= (14070 - 8661)*0.23;
            tax+= (8661 - 5071 )*0.14;
            tax+= 5070 *0.1;
            net= salary - tax;
            break;
        default:
            break;
    }
}

Submitting code that doesn’t actually compiles is a great way to pretty much ensures that I won’t hire you.

You saved 5 cents, and your code is not readable, congrats!


I found myself reading this post, and at some point, I really wanted to cry:

We had relatively long, descriptive names in MySQL such as timeAdded or valueCached. For a small number of rows, this extra storage only amounts to a few bytes per row, but when you have 10 million rows, each with maybe 100 bytes of field names, then you quickly eat up disk space unnecessarily. 100 * 10,000,000 = ~900MB just for field names!

We cut down the names to 2-3 characters. This is a little more confusing in the code but the disk storage savings are worth it. And if you use sensible names then it isn’t that bad e.g. timeAdded -> tA. A reduction to about 15 bytes per row at 10,000,000 rows means ~140MB for field names – a massive saving.

Let me do the math for a second, okay?

A two terabyte hard drive now costs 120 USD. By my math, that makes:

  • 1 TB = 60 USD
  • 1 GB = 0.058 USD

In other words, that massive saving that they are talking about? 5 cents!

Let me do another math problem, oaky?

Developer costs about 75,000 USD per year.

  • (52 weeks – 2 vacation weeks) x 40 work hours = 2,000 work hours per year.
  • 75,000 / 2,000 = 37.5 $ / hr
  • 37.5 / 60 minutes = 62 cents per minutes.

In other words, assuming that this change cost a single minute of developer time, the entire saving is worse than moot.

And it is going to take a lot more than one minute.

Update: Fixed decimal placement error in the cost per minute. Fixed mute/moot issue.

To those of you pointing out that real server storage space is much higher. You are correct, of course. I am trying to make a point. Even assuming that it costs two orders of magnitudes higher than what I said, that is still only 5$. Are you going to tell me that saving the price of a single cup of coffee is actually meaningful?

To those of you pointing out that MongoDB effectively stores the entire DB in memory. The post talked about disk size, not about memory, but even so, that is still not relevant. Mostly because MongoDB only requires indexes to fit in memory, and (presumably) indexes don't really need to store the field name per each indexed entry. If they do, then there is something very wrong with the impl.

How to pay 3 times for the same flight ticket


My current mood is somewhere between mad, pissed off, frustrated and exhausted.

All I want to do it arrive home, but dark forces has conspired against me. It works like this:

  • I got a ticket both ways to London, and I left to London quite happily, taking the train to the airport. At that point things started going horribly wrong. An idiot has suicided on the train trucks, leading to about 2 hours of delays, which, naturally, caused me to miss my flight. We will leave that aside and just point out that I somehow managed to get to London without being late to my course the next morning.
  • That is when I heard that since I was a no show for the outgoing flight, the airline cancelled my flight back.  I had to order a new one, but because I was busy actually doing the course, I asked the company to order that for me.
  • They did, and I went to the airport quite satisfied, except that on arrival, the stupid machine told me that my booking is invalid, which is the point that I discovered that the ticket was ordered on Ayende Rahien. Unfortunately, legally speaking, that guy doesn’t exists. The airline was quite insistent that they can’t put me on board with the different name. And they made me buy a new ticket.

That was the point (after I paid for the same bloody seat for the third time) that they told me that they oversold the flight, and that they put me on waiting list.

I tried to explain that they could use the second ticket that I bought, and just give me the same seat, but they told me that they already sold that to someone else.

I am currently waiting to hear whatever I’ll get to go home tonight or not.

Your truly,

Pissed off,

Oren Eini (because being Ayende is currently dangerous, given my mood)

FUTURE POSTS

No future posts left, oh my!

RECENT SERIES

  1. Production postmortem (5):
    29 Jul 2015 - The evil licensing code
  2. Career planning (6):
    24 Jul 2015 - The immortal choices aren't
  3. API Design (7):
    20 Jul 2015 - We’ll let the users sort it out
  4. What is new in RavenDB 3.5 (3):
    15 Jul 2015 - Exploring data in the dark
  5. The RavenDB Comic Strip (3):
    28 May 2015 - Part III – High availability & sleeping soundly
View all series

Syndication

Main feed Feed Stats
Comments feed   Comments Feed Stats