time to read 3 min | 414 words

We run our test suite in a loop to discover any race conditions, timing issues, errors, etc. While doing so, we got a hard crash from dotnet.exe, and investigating the issue produced a stack trace inside the GC.

So I took a dump of the process memory and created an issue about it in the CoreCLR repository, while giving it a very high priority internally and having someone look at it very closely. We use unsafe code extensively, so it was either a real GC bug or we had messed up somewhere and corrupted our own state.

Very quickly, Jan Kotas was able to point out that it was a heap corruption issue, as well as the likely avenues for investigation.

After looking into it, we found that the problem was in our tests; in particular, in one specific test. To investigate the memory corruption, we changed the test to add markers showing where it overwrote the buffer, and then the test passed.

This caused us additional concern, because the only thing we could think of was that maybe some invariant was being broken. Our suspicion focused on the fixed statement in C# not working properly. Yes, I know, “hoof beats, horses, not zebras”.

So I went back to the issue and reported my findings, and Andy Ayers was kind enough to find the problem and point it out to me.

Here is the relevant test code:

[screenshot: the test code, shown in the debugger]

This is during debugging, so you can see what the problem is. We defined size to be 40, and we defined an input buffer whose size is 100.

A little bit below, we created an output buffer based on the size variable (40), and then wrote to it with the expected size of input.Length, which is 100. Everything behaved as it should: we had a buffer overrun in the test, the heap was corrupted, and sometimes the GC died.
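To make the mistake concrete, here is a minimal sketch of the shape of the bug. The variable names and the copy mechanism are my reconstruction, not the actual test code:

const int size = 40;
byte[] input = new byte[100];
byte[] output = new byte[size];

unsafe
{
    fixed (byte* src = input)
    fixed (byte* dst = output)
    {
        // BUG: copies input.Length (100) bytes into a 40-byte buffer,
        // overwriting whatever happens to live next to it on the managed heap.
        for (int i = 0; i < input.Length; i++)
            dst[i] = src[i];
    }
}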

Also, I feel very stupid about spouting all sorts of nonsense about bugs in the CLR when our own code was unable to do simple arithmetic.

The good news is that the bug was only in the tests, and the kind of support that you get from Microsoft on the CoreCLR is absolutely phenomenal. Thank you very much, guys.

time to read 3 min | 507 words

We get some… fascinating replies from candidates to our code tests. Some of them are so bad that I really wish I could revoke some people’s keyboard access:

Case in point, we had a promising candidate from Israel’s equivalent of MIT (Technion, which is in the top 25 engineering schools in the world).

He submitted code that went something like this:

var nameFirstChar = char.Parse(name.Substring(0,1).ToLower());

switch (nameFirstChar)
{                                
    case 'a':
        using (StreamWriter indexFile = new StreamWriter(Path.Combine(basePath, "Name_IndeX_A.csv"), true))
        {
            indexFile.WriteLine(name);
        }
        break;
    case 'b':
        using (StreamWriter indexFile = new StreamWriter(Path.Combine(basePath, "Name_IndeX_B.csv"), true))
        {
            indexFile.WriteLine(name);
        }
        break;
    // ...
    // you can guess :-(
    // ...
    case 'y':
        using (StreamWriter indexFile = new StreamWriter(Path.Combine(basePath, "Name_IndeX_Y.csv"), true))
        {
            indexFile.WriteLine(name);
        }
        break;
    case 'z':
        using (StreamWriter indexFile = new StreamWriter(Path.Combine(basePath, "Name_IndeX_Z.csv"), true))
        {
            indexFile.WriteLine(name);
        }
        break;
}

And yes, he had the full A-Z there. This was part of a 1,475-line method. And no, he didn’t handle all the cases for Unicode.

Yes, he just graduated, but some things are expected. Like knowing about loops.
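For contrast, here is a sketch of the same behavior with the file name derived from the first character, assuming the same name and basePath variables (and still ignoring anything outside a-z, as the original did):

char nameFirstChar = char.ToLowerInvariant(name[0]);
if (nameFirstChar >= 'a' && nameFirstChar <= 'z')
{
    // One code path instead of 26 nearly identical switch cases.
    string fileName = "Name_IndeX_" + char.ToUpperInvariant(nameFirstChar) + ".csv";
    using (var indexFile = new StreamWriter(Path.Combine(basePath, fileName), true))
    {
        indexFile.WriteLine(name);
    }
}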

time to read 1 min | 169 words

I’m currently reviewing CVs, seemingly by the hundreds*. And I ran into a guy who had a GitHub profile link in his CV. Such links are always followed, because seeing someone’s actual work is so much better than just reading some document about it.

But then I saw this:

[screenshot of the GitHub profile]

And looking into the actual repository we have:

[screenshot of the repository]

While this isn’t quite enough to earn you a Darwin Award in the job hunting department (sadly, I have seen worse), how could anyone think that having a publicly visible repository that says “I do illegal things to software” is a good idea? Leaving aside the fact that you link to it from your CV.

* It isn’t that many, it is just annoying.

time to read 3 min | 553 words

One of the most frustrating things when you are dealing with production issues is when the problem is not in our product, but elsewhere. In particular, this post is dedicated to the hard work done by many anti-virus products to make our life harder.

Let us take a look at the following quote, taken from the ESET NOD32 Antivirus knowledge base (emphasis mine):

By default, your ESET product automatically detects programs that are used as web browsers and email clients, and adds them to the list of programs that the internal proxy scans. This can cause loss of internet connectivity or other undesired results with applications that use network features but are not web browsers/email clients.

Yes, it can. In fact, it very often does.

Previously, we looked at a similar issue with anti-virus software slowing down I/O enough to cause us to slowly die. But in this case, the issue is a lot more subtle.

Because it is doing content filtering, it tends to put a much higher overhead on the system resources, which means that as far as the user is concerned, RavenDB is slow. We actually developed features specifically to handle this scenario. The traffic watch mode will tell you how much time you spend on the server side, and we have added a feature that will make RavenDB account for the internal work each query is doing, so we can tell where the actual cost is.

You can enable that by issuing:

GET databases/Northwind/debug/enable-query-timing
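For example, from the command line; the host and port here are assumptions, so adjust them for your own setup:

curl http://localhost:8080/databases/Northwind/debug/enable-query-timing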

And once that is set up, you can get a good idea about what is costly in the query, as far as RavenDB is concerned. Here is an example of a very slow query:

[screenshot: query timing breakdown, with most of the time spent inside Lucene]

You can see that the issue is that we are issuing a very wide range query, so most of the time is spent inside Lucene. Other examples might be ridiculously complex queries, which result in high parsing time (we have seen queries in the hundreds-of-KB range), or loading a lot of big documents, or… you get the drift. If we see that the server thinks a query is fast but the overall time is slow, we know to blame the network.

But an even more insidious issue is that this would drop requests, consistently and randomly (and yes, I know those are contradictions; it was consistently dropping requests in a random pattern that seemed explicitly designed to thwart figuring out what is going on), leading to things breaking and to escalated support calls. “RavenDB is broken” leads to a lot of headaches, and to a burning desire to hit something when you figure out that not only isn’t it your fault, but the underlying reason is actively trying to prevent you from figuring it out (I assume this is to deal with viruses that try to shut it off), which leads to really complex fact-finding sessions.

It is all the more annoying because it turned out that the issue was a bug in respecting keep-alive connections for authenticated requests under some scenarios, in the AV product in question! Absolutely not fun!

time to read 24 min | 4734 words

Let me start this post by stating that I am not even sure if what I am trying to do is legal here. But from reading the docs, it does appear to be a valid use of the API, and it does work, most of the time.

The full code can be found here: https://gist.github.com/ayende/7495987

The gist of it is that I am trying to do two things:

  • Write to a file opened with FILE_FLAG_WRITE_THROUGH | FILE_FLAG_NO_BUFFERING.
  • Read from this file using a memory map.

Occasionally, I get into situations where, after I wrote to the file, I am not reading back what I wrote.

I have a repro, and we have reproduced this on multiple machines, on both Windows 7 and Windows 8.

Here is the relevant code (the full code is in the link); an explanation follows below:

const uint nNumberOfBytesToWrite = 4096 * 3;
var buffer = (byte*)(VirtualAlloc(IntPtr.Zero, new UIntPtr(nNumberOfBytesToWrite), AllocationType.COMMIT, MemoryProtection.READWRITE)
            .ToPointer());

// Fill the write buffer with a known marker value (137).
for (int i = 0; i < nNumberOfBytesToWrite; i++)
{
    *(buffer + i) = 137;
}

var g = Guid.NewGuid().ToString();

var safeHandle = CreateFile(g,
    NativeFileAccess.GenericRead | NativeFileAccess.GenericWrite,
    NativeFileShare.Read, IntPtr.Zero,
    NativeFileCreationDisposition.OpenAlways,
    NativeFileAttributes.Write_Through | NativeFileAttributes.NoBuffering | NativeFileAttributes.DeleteOnClose,
    IntPtr.Zero);

if (safeHandle.IsInvalid)
{
    throw new Win32Exception();
}

var fileStream = new FileStream(safeHandle, FileAccess.ReadWrite);
fileStream.SetLength(1024 * 1024 * 1024); // 1 GB

FileStream mms = fileStream;
//mms = new FileStream(g, FileMode.Open, FileAccess.Read, FileShare.ReadWrite | FileShare.Delete);
var mmf = MemoryMappedFile.CreateFromFile(mms, Guid.NewGuid().ToString(), fileStream.Length,
    MemoryMappedFileAccess.Read, null, HandleInheritability.None, true);

MemoryMappedViewAccessor accessor = mmf.CreateViewAccessor(0, fileStream.Length, MemoryMappedFileAccess.Read);
byte* ptr = null;
accessor.SafeMemoryMappedViewHandle.AcquirePointer(ref ptr);

// Background thread: scan the mapped view and report how much of the
// file already contains the marker value.
Task.Factory.StartNew(() =>
{
    long lastPos = 0;
    while (true)
    {
        int count = 0;
        while (true)
        {
            if (*(ptr + lastPos) != 137)
            {
                break;
            }
            lastPos += 4096;
            count++;
        }
        Console.WriteLine();
        Console.WriteLine("Verified {0} MB", count * 4 / 1024);
        Console.WriteLine();
        Thread.Sleep(2000);
    }
});

// Main loop: write 3 pages at a time, then immediately read them back
// through the memory map.
for (int i = 0; i < 1024 * 64; i++)
{
    var pos = i * nNumberOfBytesToWrite;
    if (i % 100 == 0)
        Console.Write("\r{0,10:#,#} kb", pos / 1024);
    var nativeOverlapped = new NativeOverlapped
    {
        OffsetHigh = 0,
        OffsetLow = (int)pos
    };

    uint written;
    if (WriteFile(safeHandle, buffer, nNumberOfBytesToWrite, out written, &nativeOverlapped) == false)
        throw new Win32Exception();

    for (int j = 0; j < 3; j++)
    {
        if (*(ptr + pos) != 137)
        {
            throw new Exception("WTF?!");
        }
        pos += 4096;
    }
}

This code is doing the following:

  • We set up a file handle using NoBuffering | Write_Through, and we also map the file using a memory map.
  • We write 3 pages (12KB) at a time to the file.
  • After the write, we use the memory map to verify that we actually wrote what we wanted to the file.
  • _At the same time_, we read from the same memory in another thread.
  • Occasionally, we get an error where the data we just wrote to the file cannot be read back.

Now, here is what I think is actually happening:

  • When we do an unbuffered write, Windows has to mark the relevant pages as invalid.
  • I _think_ that it does so before it actually performs the write.
  • If another thread accesses that particular range of memory at the same time, it can load the _previously_ written data.
  • WriteFile actually performs the write, but the pages that map to that portion of the file have already been marked as loaded.
  • At that point, when we use the memory mapped pointer to access the data, we get the data that was there before the write.
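If that hypothesis is right, the interleaving would look roughly like this (my speculation about the internals, not documented behavior):

// Writer thread                          Reader thread (mapped view)
// ------------------------------------  ------------------------------------
// WriteFile(pos) begins
// OS invalidates the mapped pages
//                                        *(ptr + pos) touches the page;
//                                        the OS faults in the OLD on-disk data
// WriteFile puts the new data on disk
//                                        the page keeps showing the stale data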

As I said, the code above can reproduce this issue (you might have to run it multiple times).

I am not sure if this is a valid issue or just me misusing the API. The docs are pretty clear about mixing regular I/O and memory mapped I/O: the OS is responsible for keeping them coherent with respect to one another. However, that is certainly not the case here.

It might be that I am using a single handle for both, and Windows does less checking when that happens? For what it is worth, I have also tried it using different handles; I don’t see the problem in the code above, but I have a more complex scenario where I do see the same issue.

Of course, FILE_FLAG_OVERLAPPED is not specified, so what I would actually expect here is serialization of the I/O, according to the docs. But mostly I need a sanity check to tell me if I am crazy.

time to read 2 min | 326 words

One of the things that we really pride ourselves on at Hibernating Rhinos is the level of support. Just to give you some idea, today and yesterday we had core team members on the phone with people (not customers, yet) who have been having problems with RavenDB for quite some time.

Now, I understand that you may not always have the information to give, but what you do have, give me! So I can help you.

From a recent exchange in the mailing list:

var clickCount = session.Query<TrackerRequest>()
    .Where(t => t.TrackerCreated == doc.Created)
    .Where(t => t.Type == Type.Click)
    .Count();

This gives:

"Non-static method requires a target"

To which I replied:

What is the full error that you get? Compile time? Runtime?

The answer I got:

This is a document I'm trying to 'Count':

{
  "Created": "2012-11-15T16:12:42.1775747",
  "IP": "127.0.0.1",
  "TrackerCreated": "2012-11-15T14:12:16.3951000Z",
  "Referrer": "http://example.com",
  "Type": "Click"
}

Raven terminal gives:

Request # 172: GET - 3 ms - <system> - 200 - /indexes/Raven/DocumentsByEntityName?query=Tag%253ATrackerRequests&start=0&pageSize=30&aggregation=None&noCache=-1129797484
    Query: Tag:TrackerRequests
    Time: 2 ms
    Index: Raven/DocumentsByEntityName
    Results: 3 returned out of 3 total.

By the way, you might note that this ISN’T related in any way to his issue. This query (and document) were taken from the Studio; I can tell by the URL.

Then there was this:

[screenshot: another reply from the exchange]

I mean, seriously, I am happy to provide support, even if you aren’t a customer yet, but don’t give me some random bit of information that has absolutely nothing to do with the problem at hand and expect me to guess what the issue is.

Relevant information, such as the stack trace, what build you are on, what classes are involved, etc., is expected.

time to read 5 min | 853 words

One of the things that I really hate is being reminded anew how stupid some people are. Or maybe it is how stupid they think I am. One of the things that we do during interviews is ask candidates to complete some fairly simple code tasks. Usually, I give them an hour or two for that (using VS and a laptop), and if they don’t complete everything, they can finish at home and send me the results.

This is a piece of code that one such candidate sent. To be clear, this is something that the candidate worked on at home and had as much time for as she wanted:

public int GetTaxs(int salary)
{
    double  net, tax;

    switch (salary)
    {
        case < 5070:
            tax = salary  * 0.1;
            net=  salary  - tax ;
            break;

        case < 8660:
        case > 5071:
            tax = (salary - 5071)*0.14;
            tax+= 5070 * 0.1;
            net = salary-tax;   
            break;
        case < 14070:
        case > 8661:
            tax=  (salary - 8661)*0.23;
            tax+= (8661 - 5071 )*0.14;
            tax+= 5070 *0.1;
            net=  salary - tax;
            break;
        case <21240:
        case >14071:
            tax=  (salary- 14071)*0.3;
            tax+= (14070 - 8661)*0.23;
            tax+= (8661 - 5071 )*0.14;
            tax+= 5070 *0.1;
            net= salary - tax;
            break;
        case <40230:
        case >21241:
            tax=  (salary- 21241)*0.33;
            tax+= (21240 - 14071)*0.3;
            tax+= (14070 - 8661)*0.23;
            tax+= (8661 - 5071 )*0.14;
            tax+= 5070 *0.1;
            net= salary - tax;
            break;
        case > 40230:
            tax= (salary - 40230)*0.45;
            tax+=  (40230- 21241)*0.33;
            tax+= (21240 - 14071)*0.3;
            tax+= (14070 - 8661)*0.23;
            tax+= (8661 - 5071 )*0.14;
            tax+= 5070 *0.1;
            net= salary - tax;
            break;
        default:
            break;
    }
}

Submitting code that doesn’t actually compile is a great way to pretty much ensure that I won’t hire you.
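For reference, the same progressive calculation can be expressed as a short loop over the brackets. This is a sketch using the boundaries from the candidate’s code; the exact expected answer would depend on the exercise:

public static double GetTax(int salary)
{
    // (upper bound, rate) for each bracket, lowest first.
    var brackets = new[]
    {
        (UpTo: 5070, Rate: 0.10),
        (UpTo: 8660, Rate: 0.14),
        (UpTo: 14070, Rate: 0.23),
        (UpTo: 21240, Rate: 0.30),
        (UpTo: 40230, Rate: 0.33),
        (UpTo: int.MaxValue, Rate: 0.45),
    };

    double tax = 0;
    int previousCap = 0;
    foreach (var bracket in brackets)
    {
        if (salary <= previousCap)
            break;
        // Tax only the slice of the salary that falls inside this bracket.
        int slice = Math.Min(salary, bracket.UpTo) - previousCap;
        tax += slice * bracket.Rate;
        previousCap = bracket.UpTo;
    }
    return tax;
}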

time to read 3 min | 444 words

I found myself reading this post, and at some point, I really wanted to cry:

We had relatively long, descriptive names in MySQL such as timeAdded or valueCached. For a small number of rows, this extra storage only amounts to a few bytes per row, but when you have 10 million rows, each with maybe 100 bytes of field names, then you quickly eat up disk space unnecessarily. 100 * 10,000,000 = ~900MB just for field names!

We cut down the names to 2-3 characters. This is a little more confusing in the code but the disk storage savings are worth it. And if you use sensible names then it isn’t that bad e.g. timeAdded -> tA. A reduction to about 15 bytes per row at 10,000,000 rows means ~140MB for field names – a massive saving.

Let me do the math for a second, okay?

A two terabyte hard drive now costs 120 USD. By my math, that makes:

  • 1 TB = 60 USD
  • 1 GB = 0.058 USD

In other words, that massive saving that they are talking about? 5 cents!
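Spelling that arithmetic out (a quick check using the figures quoted above):

// Savings: ~900 MB - ~140 MB = ~760 MB, or about 0.74 GB.
double savedGB = (900.0 - 140.0) / 1024;
double dollarsSaved = savedGB * 0.058; // ≈ $0.04, i.e. about 5 cents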

Let me do another math problem, okay?

Developer costs about 75,000 USD per year.

  • (52 weeks – 2 vacation weeks) x 40 work hours = 2,000 work hours per year.
  • 75,000 / 2,000 = $37.50 per hour.
  • 37.50 / 60 minutes ≈ 62 cents per minute.

In other words, assuming that this change cost a single minute of developer time, the entire saving is worse than moot.

And it is going to take a lot more than one minute.

Update: Fixed decimal placement error in the cost per minute. Fixed mute/moot issue.

To those of you pointing out that real server storage costs much more: you are correct, of course. I am trying to make a point. Even assuming that it costs two orders of magnitude more than what I said, that is still only $5. Are you going to tell me that saving the price of a single cup of coffee is actually meaningful?

To those of you pointing out that MongoDB effectively stores the entire DB in memory: the post talked about disk size, not memory, but even so, that is still not relevant. Mostly because MongoDB only requires indexes to fit in memory, and (presumably) indexes don’t need to store the field name for each indexed entry. If they do, then there is something very wrong with the implementation.
