Ayende @ Rahien

Hi!
My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.
You can reach me by phone or email:

ayende@ayende.com

+972 52-548-6969


Dazed and confused: The state of the Core

time to read 5 min | 886 words

It feels like I took my eyes off the ball for a second, and suddenly I'm in an entirely new game.

We have been working with CoreCLR (the RC1 release from last year), and have been grateful to be able to run our software oh so easily on both Windows and Linux (a heaven and earth difference from trying to run on Mono).

However, stuff changed. I know about the delay in shipping CoreCLR (and I wholeheartedly approve of it, we need something stable), and I have some rough idea about the general direction. But what is the current state? No idea.

Googling this finds me fragments of information, none of it particularly useful.

The decision to go with .NET Standard instead of the bazillion individual packages is something that I applaud. It should make our lives easier. Except that when I tried it, I ran into a whole bunch of issues that caused me to stop trying for a time.

Basically, here is what I need to know: is there currently a way to get CoreCLR (dotnet cli, or whatever the current term is) working against the latest framework / libraries that would actually make sense?

In this context, "make sense" means that I get to do things like use Visual Studio normally (more or less) to build / debug my application. I'm fine with quite a lot of manual / scripted work, but I really like being able to debug. The nice thing about the RC1 bits is that they mostly work: you can debug, profilers work, the whole thing. It is different than running on the standard framework, because some things don't work, but that is something we can work with. (In particular, having no easy way to get pdbs for things like profiling is really annoying.)

When I read Hanselman's post about the dotnet cli, I was quite excited. It made sense. But now I feel that instead of having all the usual power that I'm used to plus the convenience of just opening a shell and starting to pound out code, it is the other way around. It is somewhat like being back working in Perl. If stuff works, great, but if it doesn't, you might as well just start from scratch.

I'm talking about the build system, the dependencies, "how do I run this?", "how do I debug this?" questions.

What makes this worse is that there doesn't appear to be a roadmap that we can peek at (not that I could find). I can see some discussions in chats and in issues, but there is no real "this is where we are going" sense.

To explain some of my frustration, let me take my simplest project, which currently contains the following details:

  "frameworks": {
    "dnxcore50": {
      "dependencies": {
        "Microsoft.CSharp": "4.0.1-beta-23516",
        "System.Collections.Concurrent": "4.0.11-beta-23516",
        "System.Diagnostics.Debug": "4.0.11-beta-23516",
        "System.Diagnostics.Contracts": "4.0.1-beta-23516",
        "System.Runtime.InteropServices.RuntimeInformation": "4.0.0-beta-23516",
        "Microsoft.AspNet.Hosting": "1.0.0-rc1-final",
        "Microsoft.AspNet.Server.Kestrel": "1.0.0-rc1-final",
        "Microsoft.Extensions.Configuration.FileProviderExtensions": "1.0.0-rc1-final",
        "Microsoft.Extensions.Configuration.Json": "1.0.0-rc1-final",
        "Microsoft.AspNet.WebSockets.Server": "1.0.0-rc1-final",
        "NLog": "4.4.0-*"
      }
    }
  },
  "dependencies": {
  }

Now, I want to move this code from DNX RC1 to the latest release. So this will probably look like this:

"frameworks": {
  "dnxcore50": {
    "dependencies": {
      "Microsoft.Extensions.Configuration.FileProviderExtensions": "1.0.0-rc1-final",
      "Microsoft.Extensions.Configuration.Json": "1.0.0-rc1-final",
      "Microsoft.AspNet.WebSockets.Server": "1.0.0-rc1-final",
      "NLog": "4.4.0-*"
    }
  }
},
"dependencies": {
  "NETStandard.Library": "1.0.0-rc3-23909"
},

I think, at least, that this is what it is supposed to be. Everything that I expect to be in the standard library is in the standard library, and the rest I still need to reference explicitly. Now we get to the fun part. Using "dotnet restore / dotnet build" it compiles successfully. But using "dnu restore / dnu build" it does not.

I think that this relates to the magic incantation of the specific nuget feeds that it checks, but I'm not sure. Not being able to also compile that on DNX means that I can't use Visual Studio to actually work on this project.

And a more complex project just doesn't work.

So I'm lost, and feeling quite stupid at this point. And more than a bit frustrated.

The role of logs

time to read 2 min | 344 words

You are probably familiar with logging, and log levels. In particular, the following scheme is very popular.

I have been using this scheme for pretty much every software project I had in over a decade. And it works. Which is good.

But I found that they are actually not really useful for our scenarios. In particular, we found many cases where a component that logged something as a warn or error would generate a support ticket from an eagle-eyed admin, while it was just a valid part of the software's behavior. The other side of that is that when we need to use the logs to find a problem, we never use the fine grained levels, we just need it all.

This led me to believe that we actually need only two and a half levels.

  • Informative – we are reporting so one of our support engineers or a team member can look at that and understand what the software is doing.
  • Operational – something deserves attention from the customer’s operations team.

There isn’t really anything in between those. Either this is something that we fully expect the operations team at the customer to look at, or it is something that the development and/or support engineers need to see.

But I mentioned two and a half, what is that half? Well, above operational, there is another half a level, which I like to call Pay Attention Now. This isn’t actually a log level, it is a way to attract someone’s attention to the log, or to some sort of operational alert. This is an SMS sent, or an email, etc.

But beyond the “something needs attention now”, there is nothing else that is needed. The production logs are either for routine operations monitoring (in which case they should be relatively succinct and written with an eye to a reader who isn’t familiar with the internal workings of RavenDB) or for “what is going on here” mode, which is typically used by a team member who needs as much information as possible.
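
To make this concrete, here is a minimal sketch of what such a scheme could look like (the names and the alert hook are mine, not how RavenDB actually implements it):

public enum LogLevel
{
    Informative, // for support engineers / team members following what the software is doing
    Operational  // something that deserves the customer's operations team's attention
}

public class Logger
{
    // Hypothetical hook for the "half" level: SMS, email, pager, etc.
    public event Action<string> PayAttentionNow;

    public void Log(LogLevel level, string message, bool payAttentionNow = false)
    {
        Console.WriteLine($"{DateTime.UtcNow:o} [{level}] {message}");

        // "Pay Attention Now" isn't a level of its own, it just escalates an entry to an alert.
        if (payAttentionNow)
            PayAttentionNow?.Invoke(message);
    }
}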

Code through the looking glass: And a linear search to rule them

time to read 1 min | 187 words

The final piece for this candidate is the code to actually search through the sorted array. As you'll recall, the candidate certainly made our computer pay for that sorting and merging. But we are fine now, we have managed to recover from all of that abuse, and we are ready to rock.

Here is how the search (over sorted array) was implemented.

public List<int> EmailSearcher(string email)
{
    List<int> answer = new List<int>();
    for (int i = 0; i < emailsArray.Length; i++)
    {
        if (emailsArray[i].STR.ToString().Equals(email, StringComparison.OrdinalIgnoreCase))
        {
            answer = emailsArray[i].ID;
            return answer;
        }
    }

    return answer;
}

And with that, I have nothing left to say.
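
For contrast, a lookup over a sorted array could use a binary search instead of a linear scan. Here is a sketch, assuming Structure exposes STR as a writable string property holding the email, and that the array was sorted with the same case-insensitive comparison:

public List<int> EmailSearcher(string email)
{
    // Binary search over the sorted array; the comparer must match the one used for sorting.
    var probe = new Structure { STR = email };
    int index = Array.BinarySearch(emailsArray, probe,
        Comparer<Structure>.Create((x, y) =>
            string.Compare(x.STR, y.STR, StringComparison.OrdinalIgnoreCase)));

    return index >= 0 ? emailsArray[index].ID : new List<int>();
}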

Code through the looking glass: Sorting large data sets

time to read 3 min | 590 words

The task in question was to read a CSV file (which is large, but can fit in memory) and allow quick searches on the data by email to find the relevant users. Most candidates handle that with a dictionary, but since our candidate decided to flip things around, he also included an implementation of doing this with sorted arrays. That is actually quite nice, in theory. In practice, it ended up something like this:

(image: a bizarre Photoshop hybrid animal)

In order to truly understand the code, I have to walk you through a bunch of it.

It starts with the following fields:

List<Structure> emails = new List<Structure>();

Structure[] namesArray = new Structure[0];

Structure, by the way, is a helper class that just has a key and a list of ids. The duplicate is strange, but whatever, let's move on.

The class constructor reads one line at a time and adds a Structure instance with the email and the line id to the emails list.

public void ToArray()
{
    emailsArray = this.emails.ToArray();
}

This code just copies the emails list to the array; we are not sure why yet, but then we have:

public void Sort()
{
    Array.Sort(emailsArray, delegate(Structure x, Structure y) { return x.STR.CompareTo(y.STR); });
}

So far, this is right in line with the "use a sorted array" method that the candidate talked about. There is a small problem here, because emails are allowed to be duplicated, but no fear, our candidate can solve that…

public void UnifyIdLists()
{
    for (int i = 0; i < emailsArray.Length; i++)
    {
        if (i + 1 == emailsArray.Length)
            break;
        if (emailsArray[i].STR.Equals(emailsArray[i + 1].STR))
        {
            emailsArray[i].ID.AddRange(emailsArray[i + 1].ID);
            emailsArray[i + 1] = null;
            List<Structure> temp = emailsArray.ToList<Structure>();
            temp.RemoveAll(item => item == null);
            emailsArray = temp.ToArray();
        }
    }
}

The intent of this code is to merge all identical email values into a single entry in the list.

Now, to put things in perspective, we are talking about a file that is going to be around 500MB in size, and there are going to be about 3.5 million lines in it.

That means that the emailsArray alone is going to take about 25MB.

Another aspect to consider is that we are using dummy data in our test file. Do you want to guess how many duplicates there are going to be? Each of them generates two 25MB allocations and multiple passes over an array 3.5 million items in size.

Oh, and for the hell of it, the code above doesn't even work. Consider the case when we have three duplicates…
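
For contrast, here is a sketch of a single-pass merge over the sorted array (reusing the same hypothetical Structure type), which handles runs of three or more duplicates and doesn't rebuild the array for every pair it merges:

public void UnifyIdLists()
{
    var merged = new List<Structure>(emailsArray.Length);
    foreach (var entry in emailsArray)
    {
        // The array is sorted, so duplicates are adjacent: fold each run into a single entry.
        if (merged.Count > 0 && merged[merged.Count - 1].STR.Equals(entry.STR))
            merged[merged.Count - 1].ID.AddRange(entry.ID);
        else
            merged.Add(entry);
    }
    emailsArray = merged.ToArray();
}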

Code through the looking glass: I'm the God of all OOP

time to read 3 min | 472 words

We typically don't look too deeply into the overall design of a candidate's code. Most coding tasks just aren't big enough to warrant it.

When we do, it is typically because the candidate has done something… peculiar. For example, we have had a candidate who sent a submission with no fewer than 17 classes to manage reading a CSV file and populating a dictionary. At some point we knew that this candidate wasn't going to move forward, but it became a challenge to try to outline how the data actually traveled through that particular maze. But as much as we love Mr. Goldberg, I'm afraid that dear Rubbie isn't going to be remembered for much longer.

Here is how this candidate code started:

static void Main(string[] args)
{
    //Call for parsing method
    HelperClass cr = new HelperClass();
    cr.CsvParser();
    cr.ToArray();
    cr.Sort();
    cr.UnifyIdLists();
    //ask the user to enter input
    string input = cr.InputReader();

I actually started explaining the issues in this code, but I ran out of… pretty much everything.

I have a challenge for you, can you think of a single OOP principle that isn't being broken here?

For… fun, I guess, you can also look at the following code, which comes just after the one above:

Start:
    //check input type
    switch (cr.InputTester(input))
    {
        case 0:
            //valid email structure, but the input is not found in the file
            List<int> emails = new List<int>();
            emails = cr.EmailSearcher(input);
            if (emails.Count == 0)
            {
                Console.WriteLine("The input you have entered has not been found in the file. \nPlease try again.");
                input = cr.InputReader();
                goto Start;
            }

I guess someone heard that loops are a performance bottleneck in most applications and decided to do away with them.
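
For what it's worth, the goto buys nothing here; a plain loop expresses the same retry flow. A sketch, reusing the candidate's hypothetical method names and ignoring the other switch cases:

while (true)
{
    // valid email structure, but the input may not be found in the file
    List<int> emails = cr.EmailSearcher(input);
    if (emails.Count > 0)
        break;

    Console.WriteLine("The input you have entered has not been found in the file. \nPlease try again.");
    input = cr.InputReader();
}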

Code through the looking glass: All you need is a dictionary (and O(N))

time to read 2 min | 362 words

The first question that we ask in the coding task goes something like this: "Given a CSV file containing users' data, which is large but can fully fit into memory, we want to be able to search by email address very quickly and get all the matching user ids. Optimize for fast queries over fast startup".

The intent is basically that you'll read the file and load that into a dictionary, then use that to perform queries.
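
Roughly, the intended shape is something like this (a sketch, assuming the usual System.IO / System.Collections.Generic usings and a hypothetical "id,email" layout per CSV line):

Dictionary<string, List<int>> byEmail =
    new Dictionary<string, List<int>>(StringComparer.OrdinalIgnoreCase);

public void Load(string path)
{
    foreach (var line in File.ReadLines(path))
    {
        var parts = line.Split(',');
        int id = int.Parse(parts[0]);
        string email = parts[1];

        List<int> ids;
        if (byEmail.TryGetValue(email, out ids) == false)
            byEmail[email] = ids = new List<int>();
        ids.Add(id);
    }
}

public List<int> EmailSearcher(string email)
{
    // O(1) lookup instead of scanning the whole data set.
    List<int> ids;
    return byEmail.TryGetValue(email, out ids) ? ids : new List<int>();
}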

This candidate has done just that, although things started being strange pretty much from the get go…

Dictionary<int, string> emails = new Dictionary<int, string>();

That seemed like a pretty strange way to set things up, but we have seen crazier. At this point I thought that they were probably storing the hash code of the email as the key, with the string value being the concatenated list of all the matching ids.

The truth was… far more interesting. Here is the code for querying:

public Dictionary<int, int> EmailSearcher(string email)
{
    Dictionary<int, int> answer = new Dictionary<int, int>();
    int count = 0;
    foreach (var entry in emails)
    {
        if (entry.Value.ToString().Equals(email, StringComparison.OrdinalIgnoreCase))
        {
            answer.Add(count, entry.Key);
            count++;
        }

    }
    return answer;
}

This code actually has multiple levels of strangeness. The obvious one is that this is doing a linear search on the dictionary, but look at the return type as well…

The candidate was well aware that this code is not able to handle large amounts of information, so the candidate sent us an alternative implementation. But I'll talk about that in my next post.

Code through the looking glass: Finding today's log file

time to read 4 min | 615 words

You might have noticed that we are looking for more people again. Mostly because I gripe about some of the bad ones here every now and then.

One of my absolute requirements is that I want to read a candidate's code before hiring them. Some of them have made significant contributions to Open Source code, but a large number don't have any significant body of code that they can show. That is why we ask candidates to send us a coding test.

We had people flat out refuse to do the coding test, and some who thought that the questions were incredibly hard and unrealistic. We had people who sent in code that was so bad it caused migraines, we had people who sent in code that wouldn't compile, and we had people do stuff like read a 15TB file (16,492,674,416,640)! times. And just to clarify, that is factorial(16,492,674,416,640). I don't know what the actual value of that number is, but it is big.

The nice thing is that you can usually tell right away when the code is bad. We also play a game called "guess the candidate's background". We have about an 89% success rate there*.

Spotting good code is much harder, because a really successful submission is going to be boring. It does what needs to be done, and that is it. Our most recent hire's code submission was so boring, we had to analyze it using our standard metrics to find issues (our standard metrics are for production code running in a high performance DB for months on end, not the metrics we normally use to evaluate code submissions).

And then… then there is this candidate, whose code is so unique that I decided to dedicate a full week to exploring it. The code looks good and documented, there are explanations that show what is going on, and on the surface of it they are going in the right direction.

And then there is the devil in the details. I have quite a lot to write about this particular submission, but I'll just start with the following:

//Find today's log file in the directory
public string LogFileFinder()
{
    string[] files = Directory.GetFiles(LoggingAggregation.Program.GlobalVar.path, "*.log");
    for (int i = 0; i < files.Length; i++)
    {
        int slash = files[i].LastIndexOf(@"\");
        int dot = files[i].LastIndexOf(".");
        string fileName = files[i].Substring(slash + 1, dot - slash - 1);
        string fileDate = DateTime.Now.ToString("yyyy-MM-dd");
        if (fileName.Equals(fileDate))
            return fileName + ".log";
    }

    return null;
}

Now, another way to write this code would be:

Path.Combine(LoggingAggregation.Program.GlobalVar.path, DateTime.Now.ToString("yyyy-MM-dd") +".log") 

I literally stared at this piece of code for several minutes, trying to figure out what is actually going on there. I wasn't able to.

As an aside, we sent the candidate a rejection notice, along with a few pointers to some of the more egregious things that were wrong in the code. They know how to code, it is just that it goes sideways in the middle.

* The game is to guess the candidate's default language settings. That is, someone who writes C# code, but was brought up on C, and so has a C style to their code.

Rate limiting fun

time to read 2 min | 352 words

I was talking with a few of the guys at work, and the concept of rate limiting came up. In particular, being able to limit the number of operations a particular user can do against the system. The actual scenario isn't really important, but the idea kept bouncing in my head, so I sat down and wrote a quick test case.

Nitpicker corner: This is scratchpad code, it isn't production worthy, tested or validated.

The idea is probably best explained in code, like so:

private SemaphoreSlim _rateLimit = new SemaphoreSlim(10);

public async Task HandlePath(HttpContext context, string method, string path)
{
    if (await _rateLimit.WaitAsync(3000) == false)
    {
        context.Response.StatusCode = 429; // Too many requests
        return;
    }

    // actually process requests

}

Basically, we define a semaphore, with the maximum number of operations that we want to allow, and we wait on the semaphore when we are starting the operation.

However, there is nothing that actually releases the semaphore. Here we get into design choices.

We can release the semaphore when the request is over, which effectively gives us rate limiting in terms of concurrent requests.
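
A sketch of that variant, building on the HandlePath method above (release in a finally block once the request completes):

if (await _rateLimit.WaitAsync(3000) == false)
{
    context.Response.StatusCode = 429; // Too many requests
    return;
}
try
{
    // actually process the request
}
finally
{
    // releasing on completion limits concurrent requests, not requests per second
    _rateLimit.Release();
}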

The more interesting approach from my perspective was to use this:

 _timer = new Timer(state =>
 {
     var currentCount = 10 - _rateLimit.CurrentCount;
     if (currentCount == 0)
         return;
     _rateLimit.Release(currentCount);
 }, null, 1000, 1000);

Using this approach, we are actually limited to 10 requests a second.

And yes, this actually allows more concurrent requests than the previous option, because if a request takes more than one second, we'll reset its count on the timer's tick.

I actually tested this using gobench, and it confirmed that this is actually serving exactly 10 requests / second.

The cost of dog fooding

time to read 2 min | 264 words

You might have noticed that there were a few errors on the blog recently.

That is related to a testing strategy that we employ for RavenDB.

In particular, part of our test scenario is to shove a RavenDB build to our own production system, to see how it works in a live production system with real workloads.

Our production systems run entirely on RavenDB, and we have been playing with all sorts of configuration and deployment options recently. The general idea is to give us a good indication of the performance of RavenDB and make sure that we don’t have any performance regressions.

Here is an example taken from our tracking system, for this blog:

(image: response time graph for this blog, taken from our tracking system)

You can see that we had a few periods with longer than usual response times. The actual reason for that was that we had some code throwing a tremendous amount of work at RavenDB, which is really a case of noisy neighbor syndrome (that is, the blog itself is behaving fine, but the machine it is on is very busy). That gave us an indication about a possible optimization and some better internal metrics.

At any rate, the downside of living on the bleeding edge in production is that we sometimes get a lemon build.

That is the cost of dog fooding: sometimes you need to clean up the results.

Dependencies management in a crisis

time to read 3 min | 480 words

Typically when people talk about dependencies they talk about how easy it is to version them, deploy them, change & replace them, etc. There seems to be a very deep focus on the costs of dependencies during development.

Today I want to talk about another aspect of that. The cost of dependencies when you have a crisis. In particular, consider the case of having a 2 AM support call that is rooted in one of your dependencies. What do you do then?

The customer sees a problem in your software, so they call you, and you are asked to resolve it. After you have narrowed the problem down to a particular dependency, you now need to check whether it is your usage of the dependency that is broken, or whether there is a genuine issue with the dependency itself.

Let us take a case in point from a recent support call we had. When running RavenDB on a Windows Cluster with both nodes sharing the same virtual IP, authentication doesn’t work. It took us a while to narrow it down to “Windows authentication doesn’t work”, and that is where we got stuck. Windows authentication is a wonderfully convenient tool, but if there is an error, just finding out about it requires specialized knowledge and skills. After verifying that our usage of the code looked correct, we ended up writing a minimal reproduction with about 20 lines of code, which also reproduced the issue.

At that point, we were able to escalate to Microsoft with the help of the customer. Apparently this is a Kerberos issue, you need to use NTLM, and there is a workaround with some network configuration (check our docs if you really care about the details). But the key point here is that we would have had absolutely no way to figure it out on our own. Our usage of Windows authentication was according to the published best practices, but in this scenario you had to do something different to get it to work.

The point here is that if we weren’t able to escalate that to Microsoft, we would be in a pretty serious situation with the customer; “we can’t fix this issue” is something that no one wants to hear.

As much as possible, we try to make sure that any dependencies that we take are either:

  • Stuff that we wrote and understand.
  • Open source* components that are well understood.
  • Have a support contract that we can fall back on, with the SLA we require.
  • Non essential / able to be disabled without major loss of functionality.

* Just taking an OSS component from some GitHub repo is a bad idea. You need to be able to trust it, which means that you need to be sure that you can go into the code and either fix things or understand why they are broken.
