Excerpts from the RavenDB Performance team reportEtags and evil code, Part I

time to read 7 min | 1350 words

As part of the performance work we have been doing, we focused on the Etag class we use as a performance hot spot. For this post, I’m going to talk about Etag.Parse(string) and what we did to improve its performance.

Etag is a core type in RavenDB, it is how we represent changes in the database, and we deal with it a lot. As such, it turned up as a performance hot spot in our profiling. This post is based mostly around Maxim’s work to improve the overall performance.

One of the things that we do with Etag is to take a string and turn that into an instance of an Etag. An Etag looks just like a Guid, but its structure has meaning that we care about. Here is what a typical Etag looks like:

01000000-0000-0005-0000-00000000012C

We send them over the wire as strings for a lot of purposes. Here is the initial code that we used for parsing Etags:

public static Etag Parse2(string str)
{
    if (string.IsNullOrEmpty(str))
        throw new ArgumentException("str cannot be empty or null");
    if (str.Length != 36)
        throw new ArgumentException(string.Format("str must be 36 characters (Etag::Parse is {0})", str));

    var buffer = new byte[16]
    {
        byte.Parse(str.Substring(16, 2), NumberStyles.HexNumber),
        byte.Parse(str.Substring(14, 2), NumberStyles.HexNumber),
        byte.Parse(str.Substring(11, 2), NumberStyles.HexNumber),
        byte.Parse(str.Substring(9, 2), NumberStyles.HexNumber),
        byte.Parse(str.Substring(6, 2), NumberStyles.HexNumber),
        byte.Parse(str.Substring(4, 2), NumberStyles.HexNumber),
        byte.Parse(str.Substring(2, 2), NumberStyles.HexNumber),
        byte.Parse(str.Substring(0, 2), NumberStyles.HexNumber),
        byte.Parse(str.Substring(34, 2), NumberStyles.HexNumber),
        byte.Parse(str.Substring(32, 2), NumberStyles.HexNumber),
        byte.Parse(str.Substring(30, 2), NumberStyles.HexNumber),
        byte.Parse(str.Substring(28, 2), NumberStyles.HexNumber),
        byte.Parse(str.Substring(26, 2), NumberStyles.HexNumber),
        byte.Parse(str.Substring(24, 2), NumberStyles.HexNumber),
        byte.Parse(str.Substring(21, 2), NumberStyles.HexNumber),
        byte.Parse(str.Substring(19, 2), NumberStyles.HexNumber)
    };

    return new Etag
    {
        restarts = BitConverter.ToInt64(buffer, 0),
        changes = BitConverter.ToInt64(buffer, 8)
    };
}

You can note several things here. First, we are being kinda funky here with the order of parsing. This is because we read the string in Big Endian format, to allow direct comparisons of the bytes strings.

The other issue is the shear number of allocations and calls we are making. Because this is such a small code base, we created 10,000 etags and parsed them a million times .That took 17.3 seconds. For a throughput of roughly 5 Etags per millisecond.

Then we sat down and re-wrote it using a lot of somewhat nasty tricks.

private readonly static int[] _asciisOfHexToNum = CreateHexCharsToNumsTable();

private static int[] CreateHexCharsToNumsTable()
{
    var c = new int['z' + 1];
    for (var i = '0'; i <= '9'; i++)
    {
        c[i] = (char)(i - '0');
    }
    for (var i = 'A'; i <= 'Z'; i++)
    {
        c[i] = (char)((i - 'A') + 10);
    }
    for (var i = 'a'; i <= 'z'; i++)
    {
        c[i] = (char)((i - 'a') + 10);
    }

    return c;
}

public unsafe static Etag Parse(string str)
{
    if (str == null || str.Length != 36)
        throw new ArgumentException("str cannot be empty or null");

    fixed (char* input = str)
    {
        var etag = new Etag();
        int fst = ((byte)(_asciisOfHexToNum[input[0]] * 16 + _asciisOfHexToNum[input[1]])) << 24 |
            ((byte)(_asciisOfHexToNum[input[2]] * 16 + _asciisOfHexToNum[input[3]])) << 16 |
            ((byte)(_asciisOfHexToNum[input[4]] * 16 + _asciisOfHexToNum[input[5]])) << 8 |
            (byte)(_asciisOfHexToNum[input[6]] * 16 + _asciisOfHexToNum[input[7]]);
        int snd = ((byte)(_asciisOfHexToNum[input[9]] * 16 + _asciisOfHexToNum[input[10]])) << 24 |
            ((byte)(_asciisOfHexToNum[input[11]] * 16 + _asciisOfHexToNum[input[12]])) << 16 |
            ((byte)(_asciisOfHexToNum[input[14]] * 16 + _asciisOfHexToNum[input[15]])) << 8 |
            ((byte)(_asciisOfHexToNum[input[16]] * 16 + _asciisOfHexToNum[input[17]]));
        etag.restarts = (uint)snd | ((long)fst << 32);


        fst = ((byte)(_asciisOfHexToNum[input[19]] * 16 + _asciisOfHexToNum[input[20]])) << 24 |
            ((byte)(_asciisOfHexToNum[input[21]] * 16 + _asciisOfHexToNum[input[22]])) << 16 |
            ((byte)(_asciisOfHexToNum[input[24]] * 16 + _asciisOfHexToNum[input[25]])) << 8 |
            ((byte)(_asciisOfHexToNum[input[26]] * 16 + _asciisOfHexToNum[input[27]]));
        snd = ((byte)(_asciisOfHexToNum[input[28]] * 16 + _asciisOfHexToNum[input[29]])) << 24 |
            ((byte)(_asciisOfHexToNum[input[30]] * 16 + _asciisOfHexToNum[input[31]])) << 16 |
            ((byte)(_asciisOfHexToNum[input[32]] * 16 + _asciisOfHexToNum[input[33]])) << 8 |
            ((byte)(_asciisOfHexToNum[input[34]] * 16 + _asciisOfHexToNum[input[35]]));
        etag.changes = (uint)snd | ((long)fst << 32);
        return etag;
    }
}

Here is what we did. We created a translation table, so every possible byte is pre calculated. That means that for the cost of actually parsing the string is basically doing a lookup into an array, which is constant type. There are no allocations, nor is there anything expensive going on. On the same machine the previous code took 17.3 to run, this code can process a million Etags in just 287 milliseconds. Or close to 3,500 Etag per milliseconds.

I leave the rate of improvement as an exercise for the reader. Also, can you think of any way in which we can improve this code even further? We weren’t able to figure out anything, but you never knows.

More posts in "Excerpts from the RavenDB Performance team report" series:

  1. (20 Feb 2015) Optimizing Compare – The circle of life (a post-mortem)
  2. (18 Feb 2015) JSON & Structs in Voron
  3. (13 Feb 2015) Facets of information, Part II
  4. (12 Feb 2015) Facets of information, Part I
  5. (06 Feb 2015) Do you copy that?
  6. (05 Feb 2015) Optimizing Compare – Conclusions
  7. (04 Feb 2015) Comparing Branch Tables
  8. (03 Feb 2015) Optimizers, Assemble!
  9. (30 Jan 2015) Optimizing Compare, Don’t you shake that branch at me!
  10. (29 Jan 2015) Optimizing Memory Comparisons, size does matter
  11. (28 Jan 2015) Optimizing Memory Comparisons, Digging into the IL
  12. (27 Jan 2015) Optimizing Memory Comparisons
  13. (26 Jan 2015) Optimizing Memory Compare/Copy Costs
  14. (23 Jan 2015) Expensive headers, and cache effects
  15. (22 Jan 2015) The long tale of a lambda
  16. (21 Jan 2015) Dates take a lot of time
  17. (20 Jan 2015) Etags and evil code, part II
  18. (19 Jan 2015) Etags and evil code, Part I
  19. (16 Jan 2015) Voron vs. Esent
  20. (15 Jan 2015) Routing