## Integer compression: SIMD bit packing and unusual usages

I talked a bit before about the nature of bit packing and how the simdcomp library isn’t actually doing compression. Why do I care about that, then?

Because the simdcomp library provides a very useful building block: a simple set of functions that allows you to very quickly pack and unpack bits. That is important, since those operations are fast enough that we can treat them as effectively free. Once that exists, we can start finding interesting usages for it.

Let’s assume that we have the following functions:

• Pack(Span<uint> input, Stream output, int bit);
• Unpack(Stream input, Span<uint> output, int bit);

They get a set of numbers and the number of bits to use per value, and encode or decode them as needed.
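To make the contract concrete, here is a minimal sketch in Python. It is not the simdcomp library and not the C# API from the signatures above: `pack` and `unpack` are hypothetical stand-ins, the bit layout (first value in the lowest bits of the first byte) is an assumption, and `unpack` takes an explicit `count` and returns a list instead of filling a span. A scalar version like this shows the behavior only; the speed that makes the technique attractive comes from the SIMD implementation.

```python
import io


def pack(values, output, bits):
    """Write each value to `output` using exactly `bits` bits (assumed layout: LSB first)."""
    mask = (1 << bits) - 1
    acc = 0      # bit accumulator
    filled = 0   # number of valid bits currently held in acc
    for v in values:
        if not 0 <= v <= mask:
            raise ValueError(f"{v} does not fit in {bits} bits")
        acc |= v << filled
        filled += bits
        while filled >= 8:          # flush every complete byte
            output.write(bytes([acc & 0xFF]))
            acc >>= 8
            filled -= 8
    if filled:                      # flush the final partial byte, zero padded
        output.write(bytes([acc & 0xFF]))


def unpack(stream, count, bits):
    """Read `count` values of `bits` bits each from `stream`."""
    mask = (1 << bits) - 1
    data = stream.read((count * bits + 7) // 8)
    out, acc, filled, pos = [], 0, 0, 0
    for _ in range(count):
        while filled < bits:        # pull in bytes until a full value is available
            acc |= data[pos] << filled
            pos += 1
            filled += 8
        out.append(acc & mask)
        acc >>= bits
        filled -= bits
    return out


buf = io.BytesIO()
pack([0, 1, 2, 3, 4, 1], buf, 3)    # six 3-bit values = 18 bits
buf.seek(0)
restored = unpack(buf, 6, 3)
```

Six values at 3 bits each take 18 bits, so the buffer holds 3 bytes rather than the 24 bytes that six 32-bit integers would need.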

Because those methods are fast, we can play with various interesting designs. For example, let’s take dictionary compression. Assume that we have a set of values, such as country names. The data looks like this:

[ "United States", "China", "Japan", "India", "United States", "Brazil", "Germany", "India", "France", "China", "Japan", "Brazil", … ]

This array contains thousands or millions of entries, and we want to encode it in an efficient manner. Here is a simple way to utilize the speed of bit-packing to do something useful:

• We first iterate over the set of countries, finding the unique names and giving an index for each one of them.
• Along the way, we produce an integer array of the indexes of country names.
• Then we compute the maximum number of bits that we have to use to store those indexes.
• To the output, we write the number of unique country names, the names themselves and then the bit-packed set of indexes.
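The steps above can be sketched as follows. This is a Python illustration under stated assumptions, not the post's original listing: `encode`, `decode`, `bits_needed`, `pack_bits` and `unpack_bits` are hypothetical names, the bit-packer is a deliberately naive stand-in for the fast SIMD routines, and the length prefixes and item count in the output are additions needed to make decoding possible.

```python
import io
import struct


def bits_needed(unique_count):
    """Bits required to store the largest index, unique_count - 1."""
    return max(unique_count - 1, 0).bit_length()


def pack_bits(values, bits):
    """Naive stand-in for a fast bit-packer (fine for a sketch, slow for big inputs)."""
    acc = 0
    for i, v in enumerate(values):
        acc |= v << (i * bits)
    return acc.to_bytes((len(values) * bits + 7) // 8, "little")


def unpack_bits(data, count, bits):
    acc = int.from_bytes(data, "little")
    mask = (1 << bits) - 1
    return [(acc >> (i * bits)) & mask for i in range(count)]


def encode(values, out):
    index_of = {}   # name -> index, in first-seen order
    indexes = []    # one dictionary index per input value
    for v in values:
        indexes.append(index_of.setdefault(v, len(index_of)))
    bits = bits_needed(len(index_of))
    # Header (assumed layout): number of unique names, then number of items.
    out.write(struct.pack("<II", len(index_of), len(indexes)))
    for name in index_of:           # dicts preserve insertion order
        raw = name.encode("utf-8")
        out.write(struct.pack("<H", len(raw)))   # assumed 2-byte length prefix
        out.write(raw)
    out.write(pack_bits(indexes, bits))


def decode(inp):
    unique, count = struct.unpack("<II", inp.read(8))
    names = []
    for _ in range(unique):
        (n,) = struct.unpack("<H", inp.read(2))
        names.append(inp.read(n).decode("utf-8"))
    bits = bits_needed(unique)
    packed = inp.read((count * bits + 7) // 8)
    return [names[i] for i in unpack_bits(packed, count, bits)]


countries = ["United States", "China", "Japan", "India", "United States",
             "Brazil", "Germany", "India", "France", "China", "Japan", "Brazil"]
buf = io.BytesIO()
encode(countries, buf)
buf.seek(0)
roundtrip = decode(buf)
```

The dictionary is written once, so its cost is fixed by the number of unique names. Everything that grows with the length of the list is stored at only a few bits per entry.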

Let’s assume that we have 30 unique values in this list, but the list itself is 5 million items in size. What is the actual cost here?

We’ll assume an average country name of 8 bytes, to make the math easy. That means that storing the list of countries would cost us about 38MB (5 million × 8 bytes), if we did that using the most obvious approach.

Using the approach above, 30 unique values fit in 5 bits each (2^5 = 32), so we’ll have 5 million values with 5 bits each, leading to a total cost of about 3MB, instead of 38MB. The dictionary itself adds only a few hundred bytes on top of that.
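The arithmetic can be checked with a few lines of Python. The variable names are illustrative, and the figures are the ones assumed in the text (5 million items, 30 unique values, 8 bytes per name), with MB taken as 1024 × 1024 bytes.

```python
items = 5_000_000
unique = 30
avg_name_bytes = 8
MIB = 1024 * 1024

raw_bytes = items * avg_name_bytes              # every name stored in full
bits_per_index = (unique - 1).bit_length()      # bits needed for the largest index, 29
packed_bytes = (items * bits_per_index + 7) // 8
dictionary_bytes = unique * avg_name_bytes      # the unique names, stored once

print(f"raw:    {raw_bytes / MIB:.1f} MB")
print(f"packed: {(packed_bytes + dictionary_bytes) / MIB:.1f} MB")
print(f"ratio:  {raw_bytes / packed_bytes:.1f}x")
```

The ratio is simply 64 bits per name versus 5 bits per index, which is where the roughly 12.8x saving comes from.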

The nice thing about this approach is that once you have fast bit-packing routines, you can now start using them in all sorts of interesting ways.

The reason I talked about this topic before starting to discuss FastPFor is that it makes a lot more sense to understand how bit-packing can be applied separately from FastPFor, since FastPFor is just making interesting use of that capability.