Ayende @ Rahien

Refunds available at head office

Riddle me this, why won’t this code work?

The following code will not result in the expected output:

using(var mem = new MemoryStream())
{
    using(var gzip = new GZipStream(mem, CompressionMode.Compress, leaveOpen:true))
    {
        gzip.WriteByte(1);
        gzip.WriteByte(2);
        gzip.WriteByte(1);
        gzip.Flush();
    }
    
    using (var gzip = new GZipStream(mem, CompressionMode.Compress, leaveOpen: true))
    {
        gzip.WriteByte(2);
        gzip.WriteByte(1);
        gzip.WriteByte(2);
        gzip.Flush();
    }

    mem.Position = 0;

    using (var gzip = new GZipStream(mem, CompressionMode.Decompress, leaveOpen: true))
    {
        Console.WriteLine(gzip.ReadByte());
        Console.WriteLine(gzip.ReadByte());
        Console.WriteLine(gzip.ReadByte());
    }


    using (var gzip = new GZipStream(mem, CompressionMode.Decompress, leaveOpen: true))
    {
        Console.WriteLine(gzip.ReadByte());
        Console.WriteLine(gzip.ReadByte());
        Console.WriteLine(gzip.ReadByte());
    }
}

Why? And what can be done to solve this?

Comments

Simon
12/27/2012 10:16 AM by
Simon

It's because the memory stream is left at the wrong position after the first GZipStream reads.

Either you can set it to the right position manually by assigning it after the first using block to a variable and re-setting it after the first set of reads, or you can do the writes in one using block and the reads in one using block, or use separate memory streams for each read/write pair.

rodster
12/27/2012 10:22 AM by
rodster

The Flush() does not nothing? You should dispose the stream to flush..

rodster
12/27/2012 10:25 AM by
rodster

Never mind, they are being disposed :-)

Ali Kheyrollahi
12/27/2012 10:27 AM by
Ali Kheyrollahi

AFAIK, GZipStream on the does its bit:

1) at dispose time for compression 2) at creation for decompression

Considering many compression algorithms do not quite work with writing small buffers (have to work on bigger chunks), this is natural.

Solution is to use stream's CopyTo to copy to another stream and read from that.

Erik
12/27/2012 10:37 AM by
Erik

The problem is that DeflateStream (which is used by GZipStream) uses a buffer when reading. It tries to read up to 8 kB of the source stream, regardless of the contents, after which the contents of the buffer are parsed. Might be that the easiest solution is to prepend the compressed length of each block, and then feeding only that block to GZipStream.

Rémi BOURGAREL
12/27/2012 10:37 AM by
Rémi BOURGAREL

your Flush are useless http://msdn.microsoft.com/fr-fr/library/system.io.compression.gzipstream.flush.aspx but the dispose should do it. it seems that the first readByte on your GziptStreamm is reading the full memory stream (its position is 250). But I can't figure out where between the Inflater class or the OutputWindow class this is done. Still searching...

PJL
12/27/2012 10:54 AM by
PJL

You can make the code work by setting the underlying MemoryStreams Position at position 23 between the two decompression blocks. Cfr what Erik says it uses internelly a buffer. So you will need to store the memorystream positionsafter every compress block.

kerrubin
12/27/2012 11:00 AM by
kerrubin
        using (var mem = new MemoryStream())
        {
            using (var gzip = new GZipStream(mem, CompressionMode.Compress, leaveOpen: true))
            {
                // mem => Length: 0, Position: 0, Capacity: 0
                gzip.WriteByte(1); // mem => Length: 110, Position: 110, Capacity: 256
                gzip.WriteByte(2); // mem => Length: 112, Position: 112, Capacity: 256
                gzip.WriteByte(1); // mem => Length: 114, Position: 114, Capacity: 256
            }

            // mem => Length: 129, Position: 129, Capacity: 256
            using (var gzip = new GZipStream(mem, CompressionMode.Compress, leaveOpen: true))
            {
                // mem => Length: 129, Position: 129, Capacity: 256
                gzip.WriteByte(2); // mem => Length: 239, Position: 239, Capacity: 256
                gzip.WriteByte(1); // mem => Length: 241, Position: 241, Capacity: 256
                gzip.WriteByte(2); // mem => Length: 243, Position: 243, Capacity: 256
            }

            // mem => Length: 258, Position: 258, Capacity: 512
            mem.Position = 0;
            // mem => Length: 258, Position: 0, Capacity: 512

            using (var gzip = new GZipStream(mem, CompressionMode.Decompress, leaveOpen: true))
            {
                Console.WriteLine(gzip.ReadByte()); // mem => Length: 258, Position: 258, Capacity: 512, has read the whole MemoryStream
                Console.WriteLine(gzip.ReadByte());
                Console.WriteLine(gzip.ReadByte());
            }

            mem.Position = 129; // size of the MemoryStream after the first compression

            using (var gzip = new GZipStream(mem, CompressionMode.Decompress, leaveOpen: true))
            {
                Console.WriteLine(gzip.ReadByte());
                Console.WriteLine(gzip.ReadByte());
                Console.WriteLine(gzip.ReadByte());
            }
        }
rvin100
12/27/2012 11:11 AM by
rvin100

As PJL says, storing the MemoryStream position after each using block seems to be the only solution to make this code works...

payam yazdkhasti
12/27/2012 09:23 PM by
payam yazdkhasti

The problem is how GzipStream reads it's base stream. kerrubin answer is correct but is not good in real application. I prefer this approch: first create GzipDecorator like this: public class GzipStreamDecorator : Stream {

    private readonly GZipStream _inputStream;
    private long _position = 0;
    public GzipStreamDecorator(GZipStream inputStream)
    {
        _inputStream = inputStream;
    }
    private Stream BaseStream
    {
        get
        {
            return _inputStream.BaseStream;
        }
    }
    public override bool CanRead
    {
        get { return _inputStream.CanRead; }
    }

    public override bool CanSeek
    {
        get { return _inputStream.CanSeek; }
    }

    public override bool CanWrite
    {
        get { return _inputStream.CanWrite; }
    }

    public override void Flush()
    {
        _inputStream.Flush();
    }

    public override long Length
    {
        get { return _inputStream.Length; }
    }

    public override long Position
    {
        get
        {
            return _position;
        }
        set
        {
            _position = value;
        }
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        if (_position != BaseStream.Position)
        {
            BaseStream.Position = _position;
        }
        var bytesRead = _inputStream.Read(buffer, offset, count);
        _position += bytesRead;
        return bytesRead;
    }

    public override long Seek(long offset, SeekOrigin origin)
    {
        return _inputStream.Seek(offset, origin);
    }

    public override void SetLength(long value)
    {
        _inputStream.SetLength(value);
    }

    public override void Write(byte[] buffer, int offset, int count)
    {
        _inputStream.Write(buffer, offset, count);
    }
}

then change your code like this: using (var mem = new MemoryStream()) { using (var gzip = new GZipStream(mem, CompressionMode.Compress, leaveOpen: true)) { gzip.WriteByte(1); gzip.WriteByte(2); gzip.WriteByte(1); gzip.Flush(); }

            using (var gzip = new GZipStream(mem, CompressionMode.Compress, leaveOpen: true))
            {
                gzip.WriteByte(2);
                gzip.WriteByte(1);
                gzip.WriteByte(2);
                gzip.Flush();
            }


            using (var gzip = new GzipStreamDecorator(new GZipStream(mem, CompressionMode.Decompress, leaveOpen: true)))
            {
                Console.WriteLine(gzip.ReadByte());
                Console.WriteLine(gzip.ReadByte());
                Console.WriteLine(gzip.ReadByte());
            }


            using (var gzip = new GzipStreamDecorator(new GZipStream(mem, CompressionMode.Decompress, leaveOpen: true)))
            {
                Console.WriteLine(gzip.ReadByte());
                Console.WriteLine(gzip.ReadByte());
                Console.WriteLine(gzip.ReadByte());
            }
        }
Jer
12/28/2012 12:38 AM by
Jer

Ooh, a puzzle! Intentionally jumping to the bottom to not taint my brain, so apols if this is already answered:

The GZipStream, if I recall correctly, consumes the whole underlying stream in decompress mode...if not that, it definitely won't leave your position where you'd expect it to. A "fix" would be to keep track of where you were after each batch of writes, albeit annoying:

using(var mem = new MemoryStream())
{
    using(var gzip = new GZipStream(mem, CompressionMode.Compress, leaveOpen:true))
    {
        gzip.WriteByte(1);
        gzip.WriteByte(2);
        gzip.WriteByte(1);
    }
    var first = mem.Position;

    using (var gzip = new GZipStream(mem, CompressionMode.Compress, leaveOpen: true))
    {
        gzip.WriteByte(2);
        gzip.WriteByte(1);
        gzip.WriteByte(2);
    }
    var second = mem.Position;

    mem.Position = 0;       
    using (var gzip = new GZipStream(mem, CompressionMode.Decompress, leaveOpen: true))
    {
        Console.WriteLine(gzip.ReadByte());
        Console.WriteLine(gzip.ReadByte());
        Console.WriteLine(gzip.ReadByte());
    }
    mem.Position = first;

    using (var gzip = new GZipStream(mem, CompressionMode.Decompress, leaveOpen: true))
    {
        Console.WriteLine(gzip.ReadByte());
        Console.WriteLine(gzip.ReadByte());
        Console.WriteLine(gzip.ReadByte());
    }
}
payam yazdkhasti
12/28/2012 11:13 AM by
payam yazdkhasti

Sorry, correct code is: public class GzipStreamDecorator : Stream {

    private readonly GZipStream _inputStream;
    private int _count = 0;
    private long _basePosition = 0;
    public GzipStreamDecorator(GZipStream inputStream)
    {
        _inputStream = inputStream;
        _basePosition = _inputStream.BaseStream.Position;
    }
    private Stream BaseStream
    {
        get
        {
            return _inputStream.BaseStream;
        }
    }
    public override bool CanRead
    {
        get { return _inputStream.CanRead; }
    }

    public override bool CanSeek
    {
        get { return _inputStream.CanSeek; }
    }

    public override bool CanWrite
    {
        get { return _inputStream.CanWrite; }
    }

    public override void Flush()
    {
        _inputStream.Flush();
    }

    public override long Length
    {
        get { return _inputStream.Length; }
    }

    public override long Position
    {
        get
        {
            return BaseStream.Position;
        }
        set
        {
            BaseStream.Position = value;
            _basePosition = value;
        }
    }

    private long GetCurrentPosition()
    {
        return _basePosition + ((_count * 8) - 1);
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        var bytesRead = _inputStream.Read(buffer, offset, count);
        _count += bytesRead;
        BaseStream.Position = GetCurrentPosition();
        return bytesRead;
    }

    public override long Seek(long offset, SeekOrigin origin)
    {
        return _inputStream.Seek(offset, origin);
    }

    public override void SetLength(long value)
    {
        _inputStream.SetLength(value);
    }

    public override void Write(byte[] buffer, int offset, int count)
    {
        _inputStream.Write(buffer, offset, count);
        _count += count;
    }
}
wal
12/29/2012 05:36 AM by
wal

Hi Ayende, are you hiring and is this the interview question? :)

could you please give us some feedback on the answers and perhaps some more context around this issue. (personally I like PJL's answer)

Brandon
01/09/2013 04:48 AM by
Brandon

The disposal of the GZipStreams is disposing of the MemoryStream as well even though we want to append to it later. There is no need for all of the inner using statements.

Comments have been closed on this topic.