Oren Eini

CEO of RavenDB

a NoSQL Open Source Document Database

Get in touch with me:

oren@ravendb.net +972 52-548-6969

Posts: 7,546
|
Comments: 51,161
Privacy Policy · Terms
filter by tags archive

Backup goes wild

time to read 1 min | 143 words

image

We have a lot of internal test scenarios that run RavenDB through its paces.

One of them had an issue. It would test that RavenDB backups worked properly, but it would (sometimes) fail to clean up after itself.

The end result was that over time, the test database would accumulate more and more backup tasks that it had to execute (and all of them on the same schedule).

You can see here how RavenDB is allocating them to different nodes in an attempt to spread the load as much as possible.

We fixed the bug in the test case, but I also consider this a win because we now have protection from backup DoS attacks Smile. And I just love this image.

time to read 2 min | 275 words

imageMy daughter is 3​¼ years old now. About the time that she was born, I decided that I needed to make a small change in my language. Whenever I felt the urge to curse, I would say a food’s name. For example, after being puked on, my reaction would be some variant of: “PASTA”, “PASTA BOLOGNESE” or other pasta’s favorites.

As time went by, I got better and better at expressing emotion through increasingly disturbing food references. My current favorite is: “Pasta Bolognese with pickled carrots in a bun with anchovies and raw eggs”.

A couple of days ago, I took my daughter and a friend to an ice cream shop. As expected of an ice cream shop in the middle of (very hot) summer, the place was packed. My daughter was quite excited to go there and expressed her emotions by standing up and shouting at the top of her lungs (she is three, with a voice that carry like a foghorn): “PASTA! PASTA BOLOGNESE” over and over again.

This is a crowded shop, full of small kids and parents. I got some looks for the little girl holding up a full ice cream cone and shouting about pasta, but it was infinitely preferable to the alternative.

An unforeseen side effect, however, is that because I can, I’m very free with pasta based profanities. This had led to what is effectively a competition, with her trying to cause me to go overboard with that.

And now I must go back to work, before the gluten police arrival.

time to read 1 min | 137 words

We just upgraded our stable branch to .NET Core 2.1. The process was pretty smooth overall, but we did get the following exchange in our internal Slack channel.

It went something like this:

  • is it known that import doesn't work ?
  • As you can imagine, Import is pretty important for us.
  • no
  • does it work on your machine ?
  • checking,,,
  • what's an error?
  • no error.
  • so UI is blocked?
  • image
  • do you have any errors in dev tools console?
  • `TypeError: e is undefined`
    doesn't says to me much
  • same thing in incognito
  • export doesn't work either
  • lol the reason is: dotnet core 2.1
  • the websockets are faster and I had race in code
    will push fix shortly

There you have it, .NET Core 2.1 broke our code. Now I have to go and add Thread.Sleep somewhere…

time to read 2 min | 251 words

imageAbout a decade ago I was working at a client, and I was trying to get some idea about the scope of the project. This was a fairly large B2B system, with some interesting requirements around business logic behaviors.

For the life of me, I couldn’t get the customer to commit to even rough SLA or hard numbers around performance, capacity and scale. Whenever I asked, I got some variant of: “It has to be fast, really fast.”

But what is fast, and under what scenario, that was impossible to figure out. I got quite frustrated by this issue and pressed hard on this topic, and finally got something that was close to a hard number, “it has to be like Google”.

That was a metric I could do something with. I went to last year’s Google financial statements, figured out how much money they spent that year and sent the customer a quote for 11.5 billion US dollars.

As you might imagine, that number caught people’s attention, especially since I sent it to quite a few people at the customer. On a call with the customer I explained, “You want it to be like Google, so I used the same budget as Google”.

From that point it was much easier to actually get performance and scale numbers, although I did have to cut a few zeroes (like, all of them) from the quote.

time to read 1 min | 113 words

The law of Demeter goes like this, a method m of an object O may only invoke the methods of the following kinds of objects:

  1. O itself
  2. m's parameters
  3. Any objects created/instantiated within m
  4. O's direct component objects
  5. A global variable, accessible by O, in the scope of m

And then we have this snippet from a 1,741 lines index that was sent to us to diagnose some performance problems.

image

There are at least two separate leaks of customer data here, by the way, can you spot them?

This is it for this post, I really don’t have anything else left to say.

time to read 2 min | 205 words

The following is the opening paragraphs for discussion RavenDB 4.0 clustering and distribution model in the Inside RavenDB 4.0 book.

You might be familiar with the term "murder of crows" as a way to refer to a group for crows[1]. It has been used in literature and arts many times. Of less reknown is the group term for ravens, which is "unkindness". Personally, in the name of all ravens, I'm torn between being insulted and amused.

Professionally, setting up RavenDB as a cluster on a group of machines is a charming exercise (however, that term is actually reserved for finches) that bring a sense of exaltation (taken too, by larks) by how pain free this is. I'll now end my voyage into the realm of ornithology's etymology and stop speaking in tongues.

On a more serious note, the fact that RavenDB clustering is easy to setup is quite important, because it means that it is much more approachable.


[1] If you are interested in learning why, I found this answer fascinating

It was amusing to write, and it got me to actually start writing that part of the book. Although I’m not sure if this will survive editing and actually end up in the book.

time to read 1 min | 187 words

The final piece for this candidate is the code to actually search through the sorted array. As you'll recall, the candidate certainly made our computer pay for that sorting and merging. But we are fine now, we have manage to recover from all of that abuse, and we are ready to rock.

Here is how the search (over sorted array) was implemented.

public List<int> EmailSearcher(string email)
{
    List<int> answer = new List<int>();
    for (int i = 0; i < emailsArray.Length; i++)
    {
        if (emailsArray[i].STR.ToString().Equals(email, StringComparison.OrdinalIgnoreCase))
        {
            answer = emailsArray[i].ID;
            return answer;
        }
    }

    return answer;
}

And with that, I have nothing left to say.

time to read 3 min | 590 words

The task in question was to read a CSV file (which is large, but can fit in memory) and allow quick searches on the data by email to find the relevant users. Most candidates handle that with a dictionary, but since our candidate decided to flip things around, he also included an implementation of doing this with sorted arrays. That is actually quite nice, in theory. In practice, it ended up something like this:

funny, funny pictures, funny photos, hilarious, wtf, weird, bizarre, funny animals, 17 Bizarre Photoshop Hybrid Animals

In order to truly understand the code, I have to walk you through a  bunch of it.

It starts with the following fields:

List<Structure> emails = new List<Structure>();

Structure[] namesArray = new Structure[0];

Structure, by the way, is a helper class that just has a key and a list of ids.  The duplicate is strange, but whatever, let move on.

The class constructor is reading one line at a time and add an structure instance with the email and the line id to the emails list.

public void ToArray()
{
    emailsArray = this.emails.ToArray();
}

This code just copy emails to the array, we are not sure why yet, but then we have:

public void Sort()
{
    Array.Sort(emailsArray, delegate(Structure x, Structure y) { return x.STR.CompareTo(y.STR); });
}

So far, this is right in line with the "use a sorted array" method that the candidate talked about. There is a small problem here, because emails are allowed to be duplicated, but no fear, our candidate can solve that…

public void UnifyIdLists()
{
    for (int i = 0; i < emailsArray.Length; i++)
    {
        if (i + 1 == emailsArray.Length)
            break;
        if (emailsArray[i].STR.Equals(emailsArray[i + 1].STR))
        {
            emailsArray[i].ID.AddRange(emailsArray[i + 1].ID);
            emailsArray[i + 1] = null;
            List<Structure> temp = emailsArray.ToList<Structure>();
            temp.RemoveAll(item => item == null);
            emailsArray = temp.ToArray();
        }
    }
}

The intent of this code is to merge all identical email values into a single entry in the list.

Now, to put things in perspective, we are talking about a file that is going to be around the 500MB in size, and there are going to be about 3.5 million lines in it.

That means that the emailsArray alone is going to take about 25MB.

Another aspect to consider is that we are using dummy data in our test file. Do you want to guess how many duplicates there are going to be there? Each of which is generating two 25MB allocations and multiple passes over an array of 3.5 million items in size.

Oh, and for the hell of it, the code above doesn't even work. Consider the case when we have three duplicates…

FUTURE POSTS

  1. Partial writes, IO_Uring and safety - about one day from now
  2. Configuration values & Escape hatches - 5 days from now
  3. What happens when a sparse file allocation fails? - 7 days from now
  4. NTFS has an emergency stash of disk space - 9 days from now
  5. Challenge: Giving file system developer ulcer - 12 days from now

And 4 more posts are pending...

There are posts all the way to Feb 17, 2025

RECENT SERIES

  1. Challenge (77):
    20 Jan 2025 - What does this code do?
  2. Answer (13):
    22 Jan 2025 - What does this code do?
  3. Production post-mortem (2):
    17 Jan 2025 - Inspecting ourselves to death
  4. Performance discovery (2):
    10 Jan 2025 - IOPS vs. IOPS
View all series

Syndication

Main feed Feed Stats
Comments feed   Comments Feed Stats
}