Fast bitmap iteration in C#

time to read 2 min | 333 words

I had a task for which I need to track a union of documents and then iterate over them in order. It is actually easier to explain in code than in words. Here is the rough API:

void Init(int maxId, IEnumerable<IEnumerable<int>> items);
bool Exists(int id);
int GetNextId(int id);
view raw challenge.cs hosted with ❤ by GitHub

As you can see, we initialize the value with a list of streams of ints. Each of the streams can contain any number of values in the range [0 … maxId). Different streams can the same or different ids.

After initialization, we have to allow to query the result, to test whatever a particular id was stored, which is easy enough. If this was all I needed, we could make do with a simple HashSet<int> and mostly call it a day.  However, we also need to support iteration, more interesting, we have to support sorted iteration.

A quick solution would be to use something like SortedList<int,int>, but that is going to be massively expensive to do (O(N*logN) to insert). It is also going to waste a lot of memory, which is important. A better solution would be to use a bitmap, which will allow us to use a single bit per value. Given that we know the size of the data in advance, that is much cheaper, and the cost of insert is O(N) to the number of ids we want to store. Iteration, on the other hand, is a bit harder on a bitmap.

Luckily, we have Lemire to provide a great solution. I have taken his C code and translated that to C#. Here is the result:

public class FastBitArray : IDisposable
{
private ulong[] _bits;
public FastBitArray(int countOfBits)
{
_bits = ArrayPool<ulong>.Shared.Rent(countOfBits / 64 + (countOfBits % 64 == 0 ? 0 : 1));
new Span<ulong>(_bits).Clear();
}
public void Set(int index)
{
_bits[index / 64] |= 1UL << index % 64;
}
public IEnumerable<int> Iterate(int from)
{
// https://lemire.me/blog/2018/02/21/iterating-over-set-bits-quickly/
int i = from / 64;
if (i >= _bits.Length)
yield break;
ulong bitmap = _bits[i];
bitmap &= ulong.MaxValue << (from % 64);
while (true)
{
while (bitmap != 0)
{
ulong t = bitmap & (ulong)-(long)bitmap;
int count = BitOperations.TrailingZeroCount(bitmap);
int setBitPos = i * 64 + count;
if (setBitPos >= from)
yield return setBitPos;
bitmap ^= t;
}
i++;
if (i >= _bits.Length)
break;
bitmap = _bits[i];
}
}
public void Dispose()
{
if (_bits == null)
return;
ArrayPool<ulong>.Shared.Return(_bits);
_bits = null;
}
}
view raw FastBitArray.cs hosted with ❤ by GitHub

I’m using BitOperations.TrailingZeroCount, which will use the compiler intrinsics to compile this to a very similar code to what Lemire wrote. This allows us to iterate over the bitmap in large chunks, so even for a large bitmap, if it is sparsely populated, we are going to get good results.

Depending on the usage, a better option might be a Roaring Bitmap, but even there, dense sections will likely use something similar for optimal results.