Ayende @ Rahien

My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.
You can reach me by phone or email:


+972 52-548-6969

, @ Q c

Posts: 6,007 | Comments: 44,761

filter by tags archive

Decompression code & Discussion

time to read 4 min | 763 words

As I said, a good & small interview question is this one, it is a good one because it is short, relatively simple to handle, but it should show a lot of things about your code. To start with, being faced with a non trivial task that most people are not that familiar with.

Implement a Decompress(Stream input, Stream output, byte[][] dictionary) routine for the following protocol:

Prefix byte – composed of the highest bit + 7 bits len

If the highest bit is not set, this is a uncompressed data, copy the next len from the input to the output.

If the highest bit is set, this is compressed data, the next byte will be the index in the dictionary, copy len bytes from that dictionary entry to the output.

I couldn’t resist doing this myself, and I came up with the following:

public void Decompress(Stream input, Stream output, byte[][] dictionary)
    var tmp = new byte[128];
    while (true)
        var readByte = input.ReadByte();
        if (readByte == -1)
        var prefix = (byte) readByte;
        var compressed = (prefix & 0x80) != 0;
        var len = prefix & 0x7f;

        if (compressed == false)
            while (len > 0)
                var read = input.Read(tmp, 0, len);
                if(read == 0)
                    throw new InvalidDataException("Not enough data to read from compressed input stream");
                len -= read;
                output.Write(tmp, 0, read);
            readByte = input.ReadByte();
            if(readByte == -1)
                throw new InvalidDataException("Not enough data to read from compressed input stream");
            output.Write(dictionary[readByte], 0, len);

Things to pay attention to: Low memory allocations, error handling, and handling of partial reads from the stream.

But that is just part of the question. After reading the protocol, and implementing it. The question now turns to what does the protocol says about this kind of compression scheme. The use of just 7 bits to store len drastically limit the compression utility in a general format. It also requires an external dictionary, which most compression formats don’t use, they use the actual compressed text itself as the dictionary.  Of course, I’ve been reading compression algorithms for a while now, so that isn’t that fair. But I would expect people to note that that 7 bit limits the compression usability.

And hopefully, with a bit of a hint, they should note that the external dictionary is useful for small data sets where the repetitions are actually between entry, not per entry.


Joe Buschmann

My solution is close to yours. One difference is the while (len > 0) loop when compressed is false which my solution was missing. Is this because Stream.Read() may return fewer bytes than requested?

    public static void Decompress(Stream input, Stream output, byte[][] dictionary)
        const int prefixMask = 128;
        int prefix = input.ReadByte();

        while (prefix != -1)
            bool isCompressed = (prefix & prefixMask) == prefixMask;

            if (isCompressed)
                int len = prefix ^ prefixMask;
                int index = input.ReadByte();
                byte[] buffer = dictionary[index];
                output.Write(buffer, 0, len);
                int len = prefix;
                byte[] buffer = new byte[len];
                input.Read(buffer, 0, len);
                output.Write(buffer, 0, len);

            prefix = input.ReadByte();

I wrote a tiny solution just for fun, but I wasn't aware that streams could read fewer than the number of bytes requested outside of the end of the stream.

[code] static void Decompress(Stream i, Stream o, byte[][] d) { for (dynamic b = i.ReadByte(), u = new byte[127]; b > 0; b = i.ReadByte()) if (b < 128) o.Write(u, 0, i.Read(u, 0, b)); else o.Write(d[i.ReadByte()], 0, b & 127); } [/code]

Ayende Rahien

Joseph, Yes, Read() may return less bytes than you wanted, for any number of reasons.

Ayende Rahien

Aaron, That code is ugly. And why on earth would you use dynamic here?


Ayende: yes it is extremely ugly. As I said, I wrote a tiny answer just for fun. Using dynamic was fewer characters than two declarations outside the for loop. (There's probably a much smaller solution, but I'm not a code golf enthusiast.)

Comment preview

Comments have been closed on this topic.


No future posts left, oh my!


  1. Speaking (3):
    23 Sep 2015 - Build Stuff 2015 (Lithuania & Ukraine), Nov 18 - 24
  2. Production postmortem (11):
    22 Sep 2015 - The case of the Unicode Poo
  3. Technical observations from my wife (2):
    15 Sep 2015 - Disk speeds
  4. Find the bug (5):
    11 Sep 2015 - The concurrent memory buster
  5. Buffer allocation strategies (3):
    09 Sep 2015 - Bad usage patterns
View all series



Main feed Feed Stats
Comments feed   Comments Feed Stats