Fast bitmap iteration in C#
I had a task for which I needed to track a union of documents and then iterate over them in order. It is easier to explain in code than in words. Here is the rough API:
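The original snippet isn't reproduced here, so below is a sketch of what such an API might look like. The type and member names (`IndexBitmap`, `Contains`, `Iterate`) are my invention, not the post's actual code, and the `Iterate` shown here is a deliberately naive per-bit scan; the rest of the post is about doing that iteration much faster.

```csharp
using System;
using System.Collections.Generic;

public sealed class IndexBitmap
{
    private readonly ulong[] _bits;
    private readonly int _maxId;

    public IndexBitmap(int maxId, IEnumerable<IEnumerable<int>> streams)
    {
        _maxId = maxId;
        _bits = new ulong[(maxId + 63) / 64];
        foreach (var stream in streams)
            foreach (var id in stream)
                _bits[id / 64] |= 1UL << (id % 64); // the same id in several streams is harmless
    }

    public bool Contains(int id) => (_bits[id / 64] & (1UL << (id % 64))) != 0;

    // Naive per-bit scan, just to show the contract: ids >= fromId, ascending.
    public IEnumerable<int> Iterate(int fromId)
    {
        for (int id = fromId; id < _maxId; id++)
            if (Contains(id))
                yield return id;
    }
}
```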
As you can see, we initialize the value with a list of streams of ints. Each of the streams can contain any number of values in the range [0 … maxId). Different streams can contain the same or different ids.
After initialization, we have to allow querying the result, to test whether a particular id was stored, which is easy enough. If that were all I needed, we could make do with a simple HashSet<int> and mostly call it a day. However, we also need to support iteration, and more interestingly, sorted iteration.
A quick solution would be to use something like SortedList<int,int>, but that is going to be massively expensive (O(N log N) to insert). It is also going to waste a lot of memory, which matters here. A better solution is a bitmap, which lets us use a single bit per value. Given that we know the size of the data in advance, that is much cheaper, and the cost of insertion is O(N) in the number of ids we want to store. Iteration, on the other hand, is a bit harder on a bitmap.
Luckily, we have Lemire to provide a great solution. I have taken his C code and translated that to C#. Here is the result:
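The translated code isn't reproduced here, so here is a sketch of what a C# translation of Lemire's decoding loop looks like; the class and method names are mine. The core trick is to process the bitmap one 64-bit word at a time, find the lowest set bit with `TrailingZeroCount`, and then clear it with `bits &= bits - 1`, so each inner iteration costs only a couple of instructions and empty words are skipped entirely.

```csharp
using System.Collections.Generic;
using System.Numerics;

public static class BitmapIterator
{
    public static IEnumerable<int> Iterate(ulong[] bitmap)
    {
        for (int k = 0; k < bitmap.Length; k++)
        {
            ulong bits = bitmap[k];
            while (bits != 0)
            {
                // Index of the lowest set bit; JITs to a TZCNT instruction on x64.
                int r = BitOperations.TrailingZeroCount(bits);
                yield return k * 64 + r;
                bits &= bits - 1; // clear the lowest set bit
            }
        }
    }
}
```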
I’m using BitOperations.TrailingZeroCount, which uses compiler intrinsics to compile down to code very similar to what Lemire wrote. This allows us to iterate over the bitmap in large chunks, so even for a large bitmap, if it is sparsely populated, we are going to get good results.
Depending on the usage, a better option might be a Roaring Bitmap, but even there, dense sections will likely use something similar for optimal results.
Awesome post! Thanks @Lemire & @Ayende.
P.S. is it safe to assume the compiler will replace n / 64 with n >> 6?
Yes, but why? The compiler already does that for you.
This is basically a hardware-accelerated pigeonhole sort, right? It's a bit confusing that the result doesn't look like the API at all, but it is trivially converted. I think that the clearing of lower bits on line 22 makes the branch on line 30 redundant.
The docs of ArrayPool.Rent say: `The array returned by this method may not be zero-initialized.` It will mess up the algorithm when you get a previously rented buffer.
You are correct, however, ArrayPool.Shared is set up to return clean instances. See: https://github.com/dotnet/runtime/issues/30649
You are absolutely correct about the branch. But I'm not sure why you think this is about sorting.
Since the requirement was to allow iteration in order, this implies that it has sorting properties. Case in point: Iterate(0) would return all input, sorted ascending.
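To illustrate the commenter's point, here is a self-contained sketch (my own, not from the post): insertion order doesn't matter, and walking the bitmap words low-to-high yields the ids in ascending order, much like a pigeonhole sort with one bit per hole.

```csharp
using System;
using System.Collections.Generic;
using System.Numerics;

ulong[] bits = new ulong[2]; // covers ids [0 .. 128)
foreach (int id in new[] { 90, 3, 64, 3, 17 }) // unsorted input, with a duplicate
    bits[id / 64] |= 1UL << (id % 64);

var result = new List<int>();
for (int k = 0; k < bits.Length; k++)
{
    ulong word = bits[k];
    while (word != 0)
    {
        result.Add(k * 64 + BitOperations.TrailingZeroCount(word));
        word &= word - 1; // clear the lowest set bit
    }
}
// result is [3, 17, 64, 90]: sorted ascending, duplicate collapsed
```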
There is a definite similarity in pigeonhole sorting and this algorithm.
I read your linked issue as an indication that ArrayPool.Shared is explicitly not set up to return clean instances. See also https://github.com/dotnet/runtime/issues/24416
I'm curious where you found that ArrayPool.Shared returns zeroed instances. Looking at the code in ArrayPool.cs shows that Shared is implemented by TlsOverPerCoreLockedStacksArrayPool. So we look there: TlsOverPerCoreLockedStacksArrayPool.cs#L137
buffer = GC.AllocateUninitializedArray<T>(_bucketArraySizes[bucketIndex]);
which is clearly returning an uninitialized instance.
You are correct; I never considered this to be the case, since I'm actually getting the data in a sorted fashion already. Re-reading the issue, I think you are right, and I have fixed the bug.