Ayende @ Rahien

Hi!
My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.
You can reach me by phone or email:

ayende@ayende.com

+972 52-548-6969




Production postmortem: The case of the man in the middle

time to read 3 min | 553 words

One of the most frustrating things when you are dealing with production issues is when the problem is not in our product, but elsewhere. This post is dedicated to the hard work done by many anti virus products to make our lives harder.

Let us take a look at the following quote, taken from the ESET NOD32 Anti Virus knowledge base (emphasis mine):

By default, your ESET product automatically detects programs that are used as web browsers and email clients, and adds them to the list of programs that the internal proxy scans. This can cause loss of internet connectivity or other undesired results with applications that use network features but are not web browsers/email clients.

Yes, it can. In fact, it very often does.

Previously, we looked at a similar issue with Anti Virus slowing down I/O enough to cause us to slowly die. But in this case, the issue is a lot more subtle.

Because the anti virus is doing content filtering, it puts a much higher overhead on system resources, which means that, as far as the user is concerned, RavenDB is slow. We actually developed features specifically to handle this scenario. The traffic watch mode will tell you how much time you spend on the server side, and we have added a feature that makes RavenDB account for the internal work each query is doing, so we can tell where the actual cost is.

You can enable that by issuing:

GET databases/Northwind/debug/enable-query-timing

And once that is set up, you can get a good idea of what is costly in the query, as far as RavenDB is concerned. Here is an example of a very slow query:

[image: query timing breakdown for the slow query]

You can see that the issue is that we are issuing a very wide range query, so most of the time is spent inside Lucene. Other examples might be ridiculously complex queries, which result in high parsing time (we have seen queries in the hundreds of KB range). Or loading a lot of big documents, or… you get the drift. If we see that the server thinks a query is fast, but the overall time is slow, we know to blame the network.
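For illustration, here is roughly what toggling that endpoint looks like from a client script. This is a hypothetical Python sketch: the server address and database name are placeholders, and only the endpoint path comes from above.

```python
from urllib.parse import urljoin

def enable_query_timing_url(server: str, database: str) -> str:
    """Build the debug endpoint URL for enabling query timing on a database."""
    return urljoin(server, f"databases/{database}/debug/enable-query-timing")

# Placeholder server and database; issue a GET to `url` to enable timing.
url = enable_query_timing_url("http://localhost:8080/", "Northwind")
```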

But an even more insidious issue is that this would drop requests, consistently and randomly (and yes, I know those are contradictions; it was consistently dropping requests in a random pattern that seemed explicitly designed to thwart figuring out what was going on). That led to things breaking, and to escalated support calls. “RavenDB is broken” leads to a lot of headaches, and to a burning desire to hit something when you figure out that not only isn’t it your fault, but the underlying cause is actively trying to prevent you from figuring it out (I assume this is meant to deal with viruses that try to shut the AV off), which leads to some really complex fact finding sessions.

What makes this more annoying is that the issue turned out to be a bug in the AV product in question: under some scenarios, it failed to respect keep alive sessions for authenticated requests. Absolutely not fun!

Is select() broken? Memory mapped files with unbuffered writes == race condition?

time to read 24 min | 4734 words

Let me start this post by stating that I am not even sure if what I am trying to do is legal here. But from reading the docs, it does appear to be a valid use of the API, and it does work, most of the time.

The full code can be found here: https://gist.github.com/ayende/7495987

The gist of it is that I am trying to do two things:

  • Write to a file opened with FILE_FLAG_WRITE_THROUGH | FILE_FLAG_NO_BUFFERING.
  • Read from this file using a memory map.

Occasionally, I get into situations where, after I wrote to the file, I am not reading back what I wrote.

I have a repro, and we reproduced this on multiple machines. Both Windows 7 and Windows 8.

Here is the relevant code (the full code is in the link), explanation on it below:

const uint nNumberOfBytesToWrite = 4096*3;
var buffer = (byte*)(VirtualAlloc(IntPtr.Zero, new UIntPtr(nNumberOfBytesToWrite), AllocationType.COMMIT, MemoryProtection.READWRITE)
            .ToPointer());

for (int i = 0; i < nNumberOfBytesToWrite; i++)
{
    *(buffer + i) = 137;
}

var g = Guid.NewGuid().ToString();

var safeHandle = CreateFile(g,
    NativeFileAccess.GenericRead | NativeFileAccess.GenericWrite,
    NativeFileShare.Read, IntPtr.Zero,
    NativeFileCreationDisposition.OpenAlways,
    NativeFileAttributes.Write_Through | NativeFileAttributes.NoBuffering | NativeFileAttributes.DeleteOnClose,
    IntPtr.Zero);

var fileStream = new FileStream(safeHandle, FileAccess.ReadWrite);
fileStream.SetLength(1024 * 1024 * 1024); // 1 GB

if (safeHandle.IsInvalid)
{
    throw new Win32Exception();
}

FileStream mms = fileStream;
//mms = new FileStream(g, FileMode.Open, FileAccess.Read, FileShare.ReadWrite | FileShare.Delete);
var mmf = MemoryMappedFile.CreateFromFile(mms, Guid.NewGuid().ToString(), fileStream.Length,
    MemoryMappedFileAccess.Read, null, HandleInheritability.None, true);

MemoryMappedViewAccessor accessor = mmf.CreateViewAccessor(0, fileStream.Length, MemoryMappedFileAccess.Read);
byte* ptr = null;
accessor.SafeMemoryMappedViewHandle.AcquirePointer(ref ptr);

Task.Factory.StartNew(() =>
{
    long lastPos = 0;
    while (true)
    {
        int count = 0;
        while (true)
        {
            if (*(ptr + lastPos) != 137)
            {
                break;
            }
            lastPos += 4096;
            count++;
        }
        Console.WriteLine();
        Console.WriteLine("Verified {0} MB", count * 4 / 1024);
        Console.WriteLine();
        Thread.Sleep(2000);
    }
});

for (int i = 0; i < 1024*64; i++)
{
    var pos = i*nNumberOfBytesToWrite;
    if (i%100 == 0)
        Console.Write("\r{0,10:#,#} kb", pos/1024);
    var nativeOverlapped = new NativeOverlapped
    {
        OffsetHigh = 0,
        OffsetLow = (int) pos
    };

    uint written;
    if (WriteFile(safeHandle, buffer, nNumberOfBytesToWrite, out written, &nativeOverlapped) == false)
        throw new Win32Exception();

    for (int j = 0; j < 3; j++)
    {
        if (*(ptr + pos) != 137)
        {
            throw new Exception("WTF?!");
        }
        pos += 4096;
    }
}

This code is doing the following:

  • We set up a file handle using NoBuffering | Write_Through, and we also map the file using a memory map.
  • We write 3 pages (12KB) at a time to the file.
  • After the write, we use the memory map to verify that we actually wrote what we wanted to the file.
  • _At the same time_, we read from the same memory in another thread.
  • Occasionally, we get an error where the data we just wrote to the file cannot be read back.
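A small aside on the write loop above: NativeOverlapped (like the underlying OVERLAPPED structure) carries the file offset as two 32-bit halves, which is why the code sets OffsetLow and OffsetHigh separately. A Python sketch of that split, for illustration only (the loop above never writes past 4 GB, so OffsetHigh stays 0 there):

```python
def split_offset(pos: int) -> tuple[int, int]:
    """Split a 64-bit file offset into the (OffsetLow, OffsetHigh)
    pair that OVERLAPPED/NativeOverlapped expects."""
    return pos & 0xFFFFFFFF, (pos >> 32) & 0xFFFFFFFF

low, high = split_offset(5 * 1024**3)  # an offset past 4 GB needs OffsetHigh
```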

Now, here is what I think is actually happening:

  • When we do an unbuffered write, Windows has to mark the relevant pages as invalid.
  • I _think_ that it does so before it actually performs the write.
  • If another thread accesses that particular range of memory at the same time, it can load the _previously_ written data.
  • WriteFile then actually performs the write, but the pages that map to that portion of the file have already been marked as loaded.
  • At that point, when we use the memory mapped pointer to access the data, we get the data that was there before the write.

As I said, the code above can reproduce this issue (you might have to run it multiple times).

I am not sure whether this is a valid issue or just me misusing the API. The docs are pretty clear about mixing regular I/O & memory mapped I/O: the OS is responsible for keeping them coherent with respect to one another. However, that is certainly not the case here.
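For contrast, here is a minimal POSIX-only sketch (Python, plain buffered write, with nothing like FILE_FLAG_NO_BUFFERING involved) of the coherence the docs promise, using the same 137 marker byte. This is an illustration of the documented guarantee on a normal code path, not a repro of the Windows issue:

```python
import mmap
import os
import tempfile

PAGE = 4096
fd, path = tempfile.mkstemp()
try:
    os.ftruncate(fd, PAGE * 4)                    # size the file before mapping
    view = mmap.mmap(fd, PAGE * 4, prot=mmap.PROT_READ)
    os.pwrite(fd, b"\x89" * PAGE, PAGE)           # 0x89 == 137, the marker byte
    observed = view[PAGE]                         # read the same page via the map
    view.close()
finally:
    os.close(fd)
    os.remove(path)

# With buffered writes, the mapping sees the write immediately.
```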

It might be that I am using a single handle for both, and Windows does less checking when that happens? For what it is worth, I have also tried it using different handles, and I don’t see the problem in the code above, but I have a more complex scenario where I do see the same issue.

Of course, FILE_FLAG_OVERLAPPED is not specified, so what I would actually expect here is serialization of the I/O, according to the docs. But mostly I need a sanity check to tell me if I am crazy.

Don’t play peekaboo with support, damn it!

time to read 2 min | 326 words

One of the things that we really pride ourselves on at Hibernating Rhinos is the level of support. Just to give you some idea, today & yesterday we had core team members on the phone with people (not customers yet) who have been having problems with RavenDB for quite some time.

Now, I understand that you may not always have the information to give, but what you have, give me! So I can help you.

From a recent exchange in the mailing list:

var clickCount = session.Query<TrackerRequest>().Where(t => t.TrackerCreated == doc.Created).Where(t => t.Type == Type.Click).Count();

This gives:

"Non-static method requires a target"

To which I replied:

What is the full error that you get? Compile time? Runtime?

The answer I got:

This is a document I'm trying to 'Count':

{
  "Created": "2012-11-15T16:12:42.1775747",
  "IP": "127.0.0.1",
  "TrackerCreated": "2012-11-15T14:12:16.3951000Z",
  "Referrer": "http://example.com",
  "Type": "Click"
}

Raven terminal gives:

Request # 172: GET - 3 ms - <system> - 200 - /indexes/Raven/DocumentsByEntityName?query=Tag%253ATrackerRequests&start=0&pageSize=30&aggregation=None&noCache=-1129797484
    Query: Tag:TrackerRequests
    Time: 2 ms
    Index: Raven/DocumentsByEntityName
    Results: 3 returned out of 3 total.

By the way, you might note that this ISN’T related in any way to his issue. This query (and document) were taken from the Studio. I can tell by the URL.

Then there was this:

[image]

I mean, seriously, I am happy to provide support, even if you aren’t a customer yet, but don’t give me some random bit of information that has absolutely nothing to do with the problem at hand and expect me to guess what the issue is.

Relevant information, like the stack trace, what build you are on, what classes are involved, etc., is expected.

Negative hiring decisions, Part I

time to read 5 min | 853 words

One of the things that I really hate is to be reminded anew how stupid some people are. Or maybe it is how stupid they think I am.  One of the things that we are doing during interviews is to ask candidates to do some fairly simple code tasks. Usually, I give them an hour or two to complete that (using VS and a laptop), and if they don’t complete everything, they can do that at home and send me the results.

This is a piece of code that one such candidate sent. To be clear, this is something that the candidate worked on at home, with as much time as she wanted:

public int GetTaxs(int salary)
{
    double  net, tax;

    switch (salary)
    {
        case < 5070:
            tax = salary  * 0.1;
            net=  salary  - tax ;
            break;

        case < 8660:
        case > 5071:
            tax = (salary - 5071)*0.14;
            tax+= 5070 * 0.1;
            net = salary-tax;   
            break;
        case < 14070:
        case > 8661:
            tax=  (salary - 8661)*0.23;
            tax+= (8661 - 5071 )*0.14;
            tax+= 5070 *0.1;
            net=  salary - tax;
            break;
        case <21240:
        case >14071:
            tax=  (salary- 14071)*0.3;
            tax+= (14070 - 8661)*0.23;
            tax+= (8661 - 5071 )*0.14;
            tax+= 5070 *0.1;
            net= salary - tax;
            break;
        case <40230:
        case >21241:
            tax=  (salary- 21241)*0.33;
            tax+= (21240 - 14071)*0.3;
            tax+= (14070 - 8661)*0.23;
            tax+= (8661 - 5071 )*0.14;
            tax+= 5070 *0.1;
            net= salary - tax;
            break;
        case > 40230:
            tax= (salary - 40230)*0.45;
            tax+=  (40230- 21241)*0.33;
            tax+= (21240 - 14071)*0.3;
            tax+= (14070 - 8661)*0.23;
            tax+= (8661 - 5071 )*0.14;
            tax+= 5070 *0.1;
            net= salary - tax;
            break;
        default:
            break;
    }
}

Submitting code that doesn’t actually compile is a great way to pretty much ensure that I won’t hire you.
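For reference, the progressive-bracket computation the task appears to call for can be a short table-driven loop. A sketch in Python, using the thresholds and rates from the candidate's code (her off-by-one bracket edges, e.g. 5071 vs. 5070, are normalized here):

```python
import math

# Bracket upper bounds and marginal rates, taken from the candidate's code.
BRACKETS = [
    (5070, 0.10),
    (8660, 0.14),
    (14070, 0.23),
    (21240, 0.30),
    (40230, 0.33),
    (math.inf, 0.45),
]

def tax(salary: float) -> float:
    """Progressive tax: each bracket's rate applies only to the income
    that falls inside that bracket."""
    total, lower = 0.0, 0.0
    for upper, rate in BRACKETS:
        if salary <= lower:
            break
        total += (min(salary, upper) - lower) * rate
        lower = upper
    return total
```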

You saved 5 cents, and your code is not readable, congrats!

time to read 3 min | 444 words

I found myself reading this post, and at some point, I really wanted to cry:

We had relatively long, descriptive names in MySQL such as timeAdded or valueCached. For a small number of rows, this extra storage only amounts to a few bytes per row, but when you have 10 million rows, each with maybe 100 bytes of field names, then you quickly eat up disk space unnecessarily. 100 * 10,000,000 = ~900MB just for field names!

We cut down the names to 2-3 characters. This is a little more confusing in the code but the disk storage savings are worth it. And if you use sensible names then it isn’t that bad e.g. timeAdded -> tA. A reduction to about 15 bytes per row at 10,000,000 rows means ~140MB for field names – a massive saving.

Let me do the math for a second, okay?

A two terabyte hard drive now costs 120 USD. By my math, that makes:

  • 1 TB = 60 USD
  • 1 GB = 0.058 USD

In other words, that massive saving that they are talking about? 5 cents!

Let me do another math problem, okay?

A developer costs about 75,000 USD per year.

  • (52 weeks – 2 vacation weeks) x 40 work hours = 2,000 work hours per year.
  • 75,000 / 2,000 = 37.5 $ / hr
  • 37.5 / 60 minutes = ~62 cents per minute.

In other words, assuming that this change cost a single minute of developer time, the entire saving is worse than moot.

And it is going to take a lot more than one minute.
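The arithmetic above is easy to check mechanically; a few lines of Python, using the numbers from the post:

```python
# Storage side: 2 TB drive at 120 USD; field names cut from ~900 MB to ~140 MB.
usd_per_gb = 120 / 2048                      # 2 TB = 2048 GB, ~0.058 USD/GB
saved_gb = (900 - 140) / 1024                # ~760 MB saved
storage_saving_usd = saved_gb * usd_per_gb   # on the order of 5 cents

# Developer side: 75,000 USD/year, 50 working weeks of 40 hours.
work_hours = (52 - 2) * 40                   # 2,000 work hours per year
usd_per_hour = 75_000 / work_hours           # 37.5 USD / hr
usd_per_minute = usd_per_hour / 60           # ~62 cents per minute
```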

Update: Fixed decimal placement error in the cost per minute. Fixed mute/moot issue.

To those of you pointing out that real server storage costs much more: you are correct, of course. I am trying to make a point. Even assuming that it costs two orders of magnitude more than what I said, that is still only $5. Are you going to tell me that saving the price of a single cup of coffee is actually meaningful?

To those of you pointing out that MongoDB effectively stores the entire DB in memory. The post talked about disk size, not about memory, but even so, that is still not relevant. Mostly because MongoDB only requires indexes to fit in memory, and (presumably) indexes don't really need to store the field name per each indexed entry. If they do, then there is something very wrong with the impl.

How to pay 3 times for the same flight ticket

time to read 2 min | 344 words

My current mood is somewhere between mad, pissed off, frustrated and exhausted.

All I want to do is arrive home, but dark forces have conspired against me. It works like this:

  • I got a ticket both ways to London, and I left for London quite happily, taking the train to the airport. At that point things started going horribly wrong. An idiot committed suicide on the train tracks, leading to about 2 hours of delays, which, naturally, caused me to miss my flight. We will leave that aside and just point out that I somehow managed to get to London without being late for my course the next morning.
  • That is when I heard that since I was a no show for the outgoing flight, the airline had cancelled my flight back. I had to order a new one, but because I was busy actually giving the course, I asked the company to order it for me.
  • They did, and I went to the airport quite satisfied, except that on arrival, the stupid machine told me that my booking was invalid, which is when I discovered that the ticket had been ordered under the name Ayende Rahien. Unfortunately, legally speaking, that guy doesn’t exist. The airline was quite insistent that they couldn’t put me on board under a different name. And they made me buy a new ticket.

That was the point (after I had paid for the same bloody seat for the third time) that they told me that they had oversold the flight, and that they had put me on the waiting list.

I tried to explain that they could use the second ticket that I bought, and just give me the same seat, but they told me that they had already sold it to someone else.

I am currently waiting to hear whether I’ll get to go home tonight or not.

Yours truly,

Pissed off,

Oren Eini (because being Ayende is currently dangerous, given my mood)
