As part of the performance work we have been doing, we focused on the Etag class we use as a performance hot spot. For this post, I’m going to talk about Etag.Parse(string) and what we did to improve its performance.
Etag is a core type in RavenDB, it is how we represent changes in the database, and we deal with it a lot. As such, it turned up as a performance hot spot in our profiling. This post is based mostly around Maxim’s work to improve the overall performance.
One of the things that we do with Etag is to take a string and turn that into an instance of an Etag. An Etag looks just like a Guid, but its structure has meaning that we care about. Here is what a typical Etag looks like:
01000000-0000-0005-0000-00000000012C
We send them over the wire as strings for a lot of purposes. Here is the initial code that we used for parsing Etags:
public static Etag Parse2(string str) { if (string.IsNullOrEmpty(str)) throw new ArgumentException("str cannot be empty or null"); if (str.Length != 36) throw new ArgumentException(string.Format("str must be 36 characters (Etag::Parse is {0})", str)); var buffer = new byte[16] { byte.Parse(str.Substring(16, 2), NumberStyles.HexNumber), byte.Parse(str.Substring(14, 2), NumberStyles.HexNumber), byte.Parse(str.Substring(11, 2), NumberStyles.HexNumber), byte.Parse(str.Substring(9, 2), NumberStyles.HexNumber), byte.Parse(str.Substring(6, 2), NumberStyles.HexNumber), byte.Parse(str.Substring(4, 2), NumberStyles.HexNumber), byte.Parse(str.Substring(2, 2), NumberStyles.HexNumber), byte.Parse(str.Substring(0, 2), NumberStyles.HexNumber), byte.Parse(str.Substring(34, 2), NumberStyles.HexNumber), byte.Parse(str.Substring(32, 2), NumberStyles.HexNumber), byte.Parse(str.Substring(30, 2), NumberStyles.HexNumber), byte.Parse(str.Substring(28, 2), NumberStyles.HexNumber), byte.Parse(str.Substring(26, 2), NumberStyles.HexNumber), byte.Parse(str.Substring(24, 2), NumberStyles.HexNumber), byte.Parse(str.Substring(21, 2), NumberStyles.HexNumber), byte.Parse(str.Substring(19, 2), NumberStyles.HexNumber) }; return new Etag { restarts = BitConverter.ToInt64(buffer, 0), changes = BitConverter.ToInt64(buffer, 8) }; }
You can note several things here. First, we are being kinda funky here with the order of parsing. This is because we read the string in Big Endian format, to allow direct comparisons of the bytes strings.
The other issue is the shear number of allocations and calls we are making. Because this is such a small code base, we created 10,000 etags and parsed them a million times .That took 17.3 seconds. For a throughput of roughly 5 Etags per millisecond.
Then we sat down and re-wrote it using a lot of somewhat nasty tricks.
private readonly static int[] _asciisOfHexToNum = CreateHexCharsToNumsTable(); private static int[] CreateHexCharsToNumsTable() { var c = new int['z' + 1]; for (var i = '0'; i <= '9'; i++) { c[i] = (char)(i - '0'); } for (var i = 'A'; i <= 'Z'; i++) { c[i] = (char)((i - 'A') + 10); } for (var i = 'a'; i <= 'z'; i++) { c[i] = (char)((i - 'a') + 10); } return c; }public unsafe static Etag Parse(string str) { if (str == null || str.Length != 36) throw new ArgumentException("str cannot be empty or null"); fixed (char* input = str) { var etag = new Etag(); int fst = ((byte)(_asciisOfHexToNum[input[0]] * 16 + _asciisOfHexToNum[input[1]])) << 24 | ((byte)(_asciisOfHexToNum[input[2]] * 16 + _asciisOfHexToNum[input[3]])) << 16 | ((byte)(_asciisOfHexToNum[input[4]] * 16 + _asciisOfHexToNum[input[5]])) << 8 | (byte)(_asciisOfHexToNum[input[6]] * 16 + _asciisOfHexToNum[input[7]]); int snd = ((byte)(_asciisOfHexToNum[input[9]] * 16 + _asciisOfHexToNum[input[10]])) << 24 | ((byte)(_asciisOfHexToNum[input[11]] * 16 + _asciisOfHexToNum[input[12]])) << 16 | ((byte)(_asciisOfHexToNum[input[14]] * 16 + _asciisOfHexToNum[input[15]])) << 8 | ((byte)(_asciisOfHexToNum[input[16]] * 16 + _asciisOfHexToNum[input[17]])); etag.restarts = (uint)snd | ((long)fst << 32); fst = ((byte)(_asciisOfHexToNum[input[19]] * 16 + _asciisOfHexToNum[input[20]])) << 24 | ((byte)(_asciisOfHexToNum[input[21]] * 16 + _asciisOfHexToNum[input[22]])) << 16 | ((byte)(_asciisOfHexToNum[input[24]] * 16 + _asciisOfHexToNum[input[25]])) << 8 | ((byte)(_asciisOfHexToNum[input[26]] * 16 + _asciisOfHexToNum[input[27]])); snd = ((byte)(_asciisOfHexToNum[input[28]] * 16 + _asciisOfHexToNum[input[29]])) << 24 | ((byte)(_asciisOfHexToNum[input[30]] * 16 + _asciisOfHexToNum[input[31]])) << 16 | ((byte)(_asciisOfHexToNum[input[32]] * 16 + _asciisOfHexToNum[input[33]])) << 8 | ((byte)(_asciisOfHexToNum[input[34]] * 16 + _asciisOfHexToNum[input[35]])); etag.changes = (uint)snd | ((long)fst << 32); return etag; } }
Here is what we did. We created a translation table, so every possible byte is pre calculated. That means that for the cost of actually parsing the string is basically doing a lookup into an array, which is constant type. There are no allocations, nor is there anything expensive going on. On the same machine the previous code took 17.3 to run, this code can process a million Etags in just 287 milliseconds. Or close to 3,500 Etag per milliseconds.
I leave the rate of improvement as an exercise for the reader. Also, can you think of any way in which we can improve this code even further? We weren’t able to figure out anything, but you never knows.