Ayende @ Rahien

Hi!
My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.
You can reach me by email or phone:

ayende@ayende.com

+972 52-548-6969

, @ Q j

Posts: 6,644 | Comments: 48,400

filter by tags archive

.NET Core 2.1 broke my software, thank you very much!

time to read 1 min | 137 words

We just upgraded our stable branch to .NET Core 2.1. The process was pretty smooth overall, but we did get the following exchange in our internal Slack channel.

It went something like this:

  • is it known that import doesn't work ?
  • As you can imagine, Import is pretty important for us.
  • no
  • does it work on your machine ?
  • checking,,,
  • what's an error?
  • no error.
  • so UI is blocked?
  • image
  • do you have any errors in dev tools console?
  • `TypeError: e is undefined`
    doesn't says to me much
  • same thing in incognito
  • export doesn't work either
  • lol the reason is: dotnet core 2.1
  • the websockets are faster and I had race in code
    will push fix shortly

There you have it, .NET Core 2.1 broke our code. Now I have to go and add Thread.Sleep somewhere…

This will be $11,509,586,000, please (excluding tip)

time to read 2 min | 251 words

imageAbout a decade ago I was working at a client, and I was trying to get some idea about the scope of the project. This was a fairly large B2B system, with some interesting requirements around business logic behaviors.

For the life of me, I couldn’t get the customer to commit to even rough SLA or hard numbers around performance, capacity and scale. Whenever I asked, I got some variant of: “It has to be fast, really fast.”

But what is fast, and under what scenario, that was impossible to figure out. I got quite frustrated by this issue and pressed hard on this topic, and finally got something that was close to a hard number, “it has to be like Google”.

That was a metric I could do something with. I went to last year’s Google financial statements, figured out how much money they spent that year and sent the customer a quote for 11.5 billion US dollars.

As you might imagine, that number caught people’s attention, especially since I sent it to quite a few people at the customer. On a call with the customer I explained, “You want it to be like Google, so I used the same budget as Google”.

From that point it was much easier to actually get performance and scale numbers, although I did have to cut a few zeroes (like, all of them) from the quote.

Giving Demeter PTSD

time to read 1 min | 113 words

The law of Demeter goes like this, a method m of an object O may only invoke the methods of the following kinds of objects:

  1. O itself
  2. m's parameters
  3. Any objects created/instantiated within m
  4. O's direct component objects
  5. A global variable, accessible by O, in the scope of m

And then we have this snippet from a 1,741 lines index that was sent to us to diagnose some performance problems.

image

There are at least two separate leaks of customer data here, by the way, can you spot them?

This is it for this post, I really don’t have anything else left to say.

The things that come out late at night

time to read 2 min | 205 words

The following is the opening paragraphs for discussion RavenDB 4.0 clustering and distribution model in the Inside RavenDB 4.0 book.

You might be familiar with the term "murder of crows" as a way to refer to a group for crows[1]. It has been used in literature and arts many times. Of less reknown is the group term for ravens, which is "unkindness". Personally, in the name of all ravens, I'm torn between being insulted and amused.

Professionally, setting up RavenDB as a cluster on a group of machines is a charming exercise (however, that term is actually reserved for finches) that bring a sense of exaltation (taken too, by larks) by how pain free this is. I'll now end my voyage into the realm of ornithology's etymology and stop speaking in tongues.

On a more serious note, the fact that RavenDB clustering is easy to setup is quite important, because it means that it is much more approachable.


[1] If you are interested in learning why, I found this answer fascinating

It was amusing to write, and it got me to actually start writing that part of the book. Although I’m not sure if this will survive editing and actually end up in the book.

Code through the looking glassAnd a linear search to rule them

time to read 1 min | 187 words

The final piece for this candidate is the code to actually search through the sorted array. As you'll recall, the candidate certainly made our computer pay for that sorting and merging. But we are fine now, we have manage to recover from all of that abuse, and we are ready to rock.

Here is how the search (over sorted array) was implemented.

public List<int> EmailSearcher(string email)
{
    List<int> answer = new List<int>();
    for (int i = 0; i < emailsArray.Length; i++)
    {
        if (emailsArray[i].STR.ToString().Equals(email, StringComparison.OrdinalIgnoreCase))
        {
            answer = emailsArray[i].ID;
            return answer;
        }
    }

    return answer;
}

And with that, I have nothing left to say.

Code through the looking glassSorting large data sets

time to read 3 min | 590 words

The task in question was to read a CSV file (which is large, but can fit in memory) and allow quick searches on the data by email to find the relevant users. Most candidates handle that with a dictionary, but since our candidate decided to flip things around, he also included an implementation of doing this with sorted arrays. That is actually quite nice, in theory. In practice, it ended up something like this:

funny, funny pictures, funny photos, hilarious, wtf, weird, bizarre, funny animals, 17 Bizarre Photoshop Hybrid Animals

In order to truly understand the code, I have to walk you through a  bunch of it.

It starts with the following fields:

List<Structure> emails = new List<Structure>();

Structure[] namesArray = new Structure[0];

Structure, by the way, is a helper class that just has a key and a list of ids.  The duplicate is strange, but whatever, let move on.

The class constructor is reading one line at a time and add an structure instance with the email and the line id to the emails list.

public void ToArray()
{
    emailsArray = this.emails.ToArray();
}

This code just copy emails to the array, we are not sure why yet, but then we have:

public void Sort()
{
    Array.Sort(emailsArray, delegate(Structure x, Structure y) { return x.STR.CompareTo(y.STR); });
}

So far, this is right in line with the "use a sorted array" method that the candidate talked about. There is a small problem here, because emails are allowed to be duplicated, but no fear, our candidate can solve that…

public void UnifyIdLists()
{
    for (int i = 0; i < emailsArray.Length; i++)
    {
        if (i + 1 == emailsArray.Length)
            break;
        if (emailsArray[i].STR.Equals(emailsArray[i + 1].STR))
        {
            emailsArray[i].ID.AddRange(emailsArray[i + 1].ID);
            emailsArray[i + 1] = null;
            List<Structure> temp = emailsArray.ToList<Structure>();
            temp.RemoveAll(item => item == null);
            emailsArray = temp.ToArray();
        }
    }
}

The intent of this code is to merge all identical email values into a single entry in the list.

Now, to put things in perspective, we are talking about a file that is going to be around the 500MB in size, and there are going to be about 3.5 million lines in it.

That means that the emailsArray alone is going to take about 25MB.

Another aspect to consider is that we are using dummy data in our test file. Do you want to guess how many duplicates there are going to be there? Each of which is generating two 25MB allocations and multiple passes over an array of 3.5 million items in size.

Oh, and for the hell of it, the code above doesn't even work. Consider the case when we have three duplicates…

Code through the looking glassI'm the God of all OOP

time to read 3 min | 472 words

We typically don't look too deeply into the overall design on a candidate's code. Most coding tasks just aren't big enough to warrant it.

When we do, it is typically because the candidate has done something… peculiar. For example, we have had a candidate that send a submission with no less than 17 classes to manager reading a CSV file and populating a dictionary. At some point we knew that this candidate wasn't going to move forward, but it became a challenge, to try to outline how the data actually traveled through that particular maze. But as much as we love Mr. Goldberg, I'm afraid that dear Rubbie isn't going to be remembered for much longer.

Here is how this candidate code started:

static void Main(string[] args)
{
    //Call for parsing method
    HelperClass cr = new HelperClass();
    cr.CsvParser();
    cr.ToArray();
    cr.Sort();
    cr.UnifyIdLists();
    //ask the user to enter input
    string input = cr.InputReader();

I actually started explaining the issues in this code, but I run out of… pretty much everything.

I have a challenge for you, can you think of a single OOP principle that isn't being broken here?

For… fun, I guess, you can also look at the following code, which come just after the one above:

Start:
    //check input type
    switch (cr.InputTester(input))
    {
        case 0:
            //valid email structure, but the input is not found in the file
            List<int> emails = new List<int>();
            emails = cr.EmailSearcher(input);
            if (emails.Count == 0)
            {
                Console.WriteLine("The input you have entered has not been found in the file. \nPlease try again.");
                input = cr.InputReader();
                goto Start;
            }

I guess someone heard that loops are a performance bottleneck in most applications and decided to do away with them.

Code through the looking glassFinding today's log file

time to read 4 min | 615 words

You might have noticed that we are looking for more people again. Mostly because I gripe about some of the bad ones here every now and then.

One of my absolute requirements is that I want to read a candidate's code before hiring them. Some of them have made significant contributions to Open Source code, but a large number don't have any significant body of code that they can show. That is why we ask candidates to send us a coding test.

We had people flat out refuse to do the coding test, and some who thought that the questions were incredibly hard and unrealistic. We had people who sent in code that was so bad it caused migraines, we had people who sent in code that wouldn't compile, we  had people do stuff like read a 15TB file (16,492,674,416,640)! times. And just to clarify, that is factorial(16492674416640). I don't know what the actual value of this number is, but is it big.

The nice thing is that you can usually tell right away when the code is bad. We also play a game called "guess the candidate's background". We have about 89% success rate there*.

Spotting good code is much harder, because a really successful submission is going to be boring. It does what needs to be done, and that is it. Our most recent hire's code submission was so boring, we had to analyze it using our standard metrics to find issues (our standard metrics are for production code running in a high performance DB for months on end, not the same metrics we evaluate code submissions).

And then… then there is this candidate, whose code is so unique that I decided to dedicate a full week to explore it. The code looks good, documented, there are explanations that show what is going on and they are going in the right direction, on the surface of it.

And then there is the devil in the details. I have quite a lot to write about this particular submission, but I'll just start with the following:

//Find today's log file in the directory
public string LogFileFinder()
{
    string[] files = Directory.GetFiles(LoggingAggregation.Program.GlobalVar.path, "*.log");
    for (int i = 0; i < files.Length; i++)
    {
        int slash = files[i].LastIndexOf(@"\");
        int dot = files[i].LastIndexOf(".");
        string fileName = files[i].Substring(slash + 1, dot - slash - 1);
        string fileDate = DateTime.Now.ToString("yyyy-MM-dd");
        if (fileName.Equals(fileDate))
            return fileName + ".log";
    }

    return null;
}

Now, another way to write this code would be:

Path.Combine(LoggingAggregation.Program.GlobalVar.path, DateTime.Now.ToString("yyyy-MM-dd") +".log") 

I literally stared at this piece of code for several minutes, trying to figure out what is actually going on there. I wasn't able to.

As an aside, we sent the candidate a rejection notice, along with a few pointers to some of the more egregious things that were wrong in the code.  They know how to code, it is just that it goes sideway in the middle.

* The game call to guess the candidate's default language settings. That is, someone who write C# code, but has been brought up on C, so have a C style to the code.

Nepotism in the interview process

time to read 1 min | 79 words

I admit, she only go the interview because of who her father is. But she still managed to show more enthusiasm for the job than some candidates.

image

Her test was:

try
{
	Tamar.LearnToWalk();
	Tamar.LearnToProgram();
	Tamar.Debug();
}
catch(PoopException)
{
	Tamar.Change(DiaperMode.Clean);
}

She passed, but decided to take a nap instead of the job.

FUTURE POSTS

  1. RavenDB Security Vulnerability Advisory - 3 hours from now

There are posts all the way to Jun 18, 2018

RECENT SERIES

  1. Codex KV (2):
    06 Jun 2018 - Properly generating the file
  2. I WILL have order (3):
    30 May 2018 - How Bleve sorts query results
  3. RavenDB 4.1 features (4):
    22 May 2018 - Highlighting
  4. Inside RavenDB 4.0 (10):
    22 May 2018 - Book update
  5. RavenDB Security Report (5):
    06 Apr 2018 - Collision in Certificate Serial Numbers
View all series

RECENT COMMENTS

Syndication

Main feed Feed Stats
Comments feed   Comments Feed Stats