Fast bitmap iteration in C#

Oct 26 2020

Fast bitmap iteration in C#

time to read 2 min | 333 words

I had a task for which I need to track a union of documents and then iterate over them in order. It is actually easier to explain in code than in words. Here is the rough API:

As you can see, we initialize the value with a list of streams of ints. Each of the streams can contain any number of values in the range [0 … maxId). Different streams can the same or different ids.

After initialization, we have to allow to query the result, to test whatever a particular id was stored, which is easy enough. If this was all I needed, we could make do with a simple HashSet<int> and mostly call it a day. However, we also need to support iteration, more interesting, we have to support sorted iteration.

A quick solution would be to use something like SortedList<int,int>, but that is going to be massively expensive to do (O(N*logN) to insert). It is also going to waste a lot of memory, which is important. A better solution would be to use a bitmap, which will allow us to use a single bit per value. Given that we know the size of the data in advance, that is much cheaper, and the cost of insert is O(N) to the number of ids we want to store. Iteration, on the other hand, is a bit harder on a bitmap.

Luckily, we have Lemire to provide a great solution. I have taken his C code and translated that to C#. Here is the result:

I’m using BitOperations.TrailingZeroCount, which will use the compiler intrinsics to compile this to a very similar code to what Lemire wrote. This allows us to iterate over the bitmap in large chunks, so even for a large bitmap, if it is sparsely populated, we are going to get good results.

Depending on the usage, a better option might be a Roaring Bitmap, but even there, dense sections will likely use something similar for optimal results.

Tweet Share Share 8 comments

Tags:

Comments

26 Oct 2020
19:17 PM

Ben

Awesome post! Thanks @Lemire & @Ayende.

P.S. is it safe to assume the compiler will replace n / 64 with n >> 8?

26 Oct 2020
19:32 PM

Oren Eini

Ben,

Yes, but why? The compiler already does that for you.

26 Oct 2020
20:17 PM

kpvleeuwen

This is basically a hardware-accelerated pigeonhole sort, right? It's a bit confusing that the result doesn't look like the API at all, but it is trivially converted. I think that the clearing of lower bits on line 22 makes the branch on line 30 redundant.

28 Oct 2020
15:37 PM

kpvleeuwen

The docs of ArrayPool.Rent say ` The array returned by this method may not be zero-initialized ~~~ It will mess up the algorithm when you get a previously rented buffer.

28 Oct 2020
15:51 PM

Oren Eini

kpvleeuwen,

You are correct, however, ArrayPool.Shared is set up to return clean instances. See : https://github.com/dotnet/runtime/issues/30649

28 Oct 2020
15:52 PM

Oren Eini

kpvleeuwen ,

You are absolutely correct about the branch. But I'm not sure why you think this is about sorting.

29 Oct 2020
19:11 PM

kpvleeuwen

Since the requirement was to allow iteration in order, this implies that it has sorting properties. Case in point: Iterate(0) would return all input, sorted ascending.

There is a definite similarity in pigeonhole sorting and this algorithm.

I read your linked issue as indication that ArrayPool.Shared is explicitly not set up to return clean instances. See also github.com/dotnet/runtime/issues/24416 .

I'm curious where you found that ArrayPool.Shared returns zeroed instances. Looking in the code at ArrayPool.cs says that Shared is implemented by TlsOverPerCoreLockedStacksArrayPool.

So we look there TlsOverPerCoreLockedStacksArrayPool.cs#L137

buffer = GC.AllocateUninitializedArray<T>(_bucketArraySizes[bucketIndex]);

which is clearly returning an uninitialized instance.

01 Nov 2020
08:40 AM

Oren Eini

kpvleeuwen,

You are correct, I never considered this to be the case, since I'm actually getting the data in a sorted fashion already. Re-reading the issue, I think that you are correct, and I fixed the issue.

Comment preview

Comments have been closed on this topic.

Oren Eini

Oren Eini

CEO of RavenDB