Ayende @ Rahien

Refunds available at head office

Finding chrome bugs

That one was annoying to figure out. Take a look at the following code:

using System;
using System.IO;
using System.IO.Compression;
using System.Net;
using System.Text;

static void Main(string[] args)
{
    var listener = new HttpListener();
    listener.Prefixes.Add("http://+:8080/");
    listener.Start();

    Console.WriteLine("Started");

    while(true)
    {
        var context = listener.GetContext();
        context.Response.Headers["Content-Encoding"] = "deflate";
        context.Response.ContentType = "application/json";
        using(var gzip = new DeflateStream(context.Response.OutputStream, CompressionMode.Compress))
        using(var writer = new StreamWriter(gzip, Encoding.UTF8))
        {
            writer.Write("{\"CountOfIndexes\":1,\"ApproximateTaskCount\":0,\"CountOfDocuments\":0}");
            writer.Flush();
            gzip.Flush();
        }
        context.Response.Close();
    }
}

Firefox and IE have no trouble with this response, but here is how it looks in Chrome:

[Image: screenshot of the response as rendered in Chrome]

To make matters worse, pay attention to the conditions of the bug:

  • If I use Gzip instead of deflate, it works.
  • If I use "text/plain" instead of "application/json", it works.
  • If I tunnel this through Fiddler, it works.

I hate stupid bugs like that.

Posted By: Ayende Rahien

Comments

08/22/2010 09:13 AM by Igal Tabachnik

I had something like this happen when I saved a batch file using Notepad2, which defaulted to "UTF-8 with Signature". What you're seeing is the BOM (byte order mark)...

08/22/2010 09:30 AM by anton

seems like a UTF8 BOM, clean up the files that introduce this and you'll be good to go
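
For readers who have not met the BOM before, here is a minimal sketch of what it is in .NET terms. The class and method names (`BomBytes`, `Utf8Preamble`, `MisdecodedAsLatin1`) are made up for illustration; only the `System.Text` calls are real. It shows the three bytes that `Encoding.UTF8` prepends, and what they turn into if a client decodes them as ISO-8859-1 rather than UTF-8, which is one common way a stray BOM becomes visible.

```csharp
using System;
using System.Text;

static class BomBytes
{
    // The UTF-8 byte order mark as Encoding.UTF8 reports it.
    public static byte[] Utf8Preamble()
    {
        return Encoding.UTF8.GetPreamble();
    }

    // What the given bytes look like when decoded as ISO-8859-1 instead of UTF-8.
    public static string MisdecodedAsLatin1(byte[] bytes)
    {
        return Encoding.GetEncoding("iso-8859-1").GetString(bytes);
    }

    static void Main()
    {
        // Prints EF-BB-BF
        Console.WriteLine(BitConverter.ToString(Utf8Preamble()));

        // An encoding built with encoderShouldEmitUTF8Identifier = false has no preamble; prints 0
        Console.WriteLine(new UTF8Encoding(false).GetPreamble().Length);
    }
}
```

Whether Chrome did exactly this mis-decoding in the case above is an assumption, not something the post establishes.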

08/22/2010 09:58 AM by 13xforever

I agree with anton. You'd be better off with

new UTF8Encoding(false)

08/22/2010 10:13 AM by Rik Hemsley

I haven't checked the relevant RFCs, but as others have said, looks like a BOM where there shouldn't be one. As far as I am aware, BOMs are for files only.

Oh and this blog software still doesn't remember me properly. And no it's not a bug in my browser.

08/22/2010 10:52 AM by Itamar Syn-Hershko

@Rik, BOM identifies the encoding used for a stream of text. It is good to have whenever you are fetching a textual stream - from FS or not.

@Ayende, try adding a charset header. Apparently all other browsers detect the BOM even when it isn't provided, although Chrome is perfectly alright when not doing so.

Not providing a BOM is possible, but you may hit walls later on when this code is used with other encodings (UTF16/32 for CJK for example).

08/22/2010 11:56 AM by configurator

Like everyone said it's the BOM. Chrome shows everything for encodings that don't have specific rules about not showing them, like text/plain; application/json is good for applications, not for showing the text. Why is this a problem? Does the json not get parsed properly? A charset header should fix it - chrome is probably using the wrong charset here.

08/22/2010 12:49 PM by tobi

Ok, I am the 10th person to confirm: It is the BOM-header^^ Such bugs make me believe that it would be very beneficial for most standards to have a reference implementation. That way the standards body can detect mistakes by themselves and implementers hopefully get even such details right.

08/22/2010 02:09 PM by tobi

Maybe I am the real Joel Spolsky in disguise of a nickname... You will never know for sure ;-)

08/22/2010 03:43 PM by humpbacked lout

The best part is the title: how to shift responsibility from your own lameness to someone else (Chrome in this case). The more posts I read from this person, the more I see a disguised lamer. Not only was there no charset in the declaration (which is a violation of the standard), but also no use of the (lame but working) solution of a StreamWriter constructor that explicitly specifies no BOM. I think there was also no clue what a BOM is...

08/22/2010 05:02 PM by Frank

My goodness, what a bunch of crap commentary directed at Ayende.

Have a look at the RFC 4627 standard, third part about encoding.

http://www.faqs.org/rfcs/rfc4627.html

"JSON text SHALL be encoded in Unicode. The default encoding is UTF-8.

Since the first two characters of a JSON text will always be ASCII characters [RFC0020], it is possible to determine whether an octet stream is UTF-8, UTF-16 (BE or LE), or UTF-32 (BE or LE) by looking at the pattern of nulls in the first four octets."

In other words, no charset that you need to specify in the headers. The BOM will specify the encoding.

08/22/2010 05:03 PM by Frank

Hmmm, I'll have to correct myself about the BOM. The browser needs to check it based upon the null characters.
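
The null-pattern check from the RFC 4627 passage quoted above can be sketched roughly as follows. `JsonEncodingSniffer` and `Detect` are hypothetical names, and this is not how any particular browser does it. The sketch assumes the stream starts directly with the JSON text (no BOM) and that at least four octets are available; with fewer it simply falls back to UTF-8.

```csharp
using System;
using System.Text;

static class JsonEncodingSniffer
{
    // Guesses the encoding of a JSON text from which of its first four octets are zero
    // (the table in RFC 4627, section 3). Assumes no BOM precedes the text.
    public static string Detect(byte[] b)
    {
        if (b == null || b.Length < 4) return "UTF-8";

        bool z0 = b[0] == 0, z1 = b[1] == 0, z2 = b[2] == 0, z3 = b[3] == 0;

        if (z0 && z1 && z2 && !z3) return "UTF-32BE";   // 00 00 00 xx
        if (z0 && !z1 && z2 && !z3) return "UTF-16BE";  // 00 xx 00 xx
        if (!z0 && z1 && z2 && z3) return "UTF-32LE";   // xx 00 00 00
        if (!z0 && z1 && !z2 && z3) return "UTF-16LE";  // xx 00 xx 00
        return "UTF-8";                                  // xx xx xx xx
    }

    static void Main()
    {
        string json = "{\"CountOfIndexes\":1}";
        Console.WriteLine(Detect(Encoding.UTF8.GetBytes(json)));     // UTF-8
        Console.WriteLine(Detect(Encoding.Unicode.GetBytes(json)));  // UTF-16LE
    }
}
```

Note that the scheme relies on the first two characters being ASCII, so a leading BOM is exactly the kind of input it does not account for.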

08/23/2010 04:53 AM by PandaWood

I'd like to shake it up a bit and go with the argument that this is a Chrome bug (or unacceptably dumb behaviour).

Software (text editors) that can't handle the BOM are usually referred to as "Older Software" - from Wikipedia: "Older text editors may display the BOM as "ï»¿" at the start of the document, even if the UTF-8 file contains only ASCII and would otherwise display correctly".

So in this case, the document would display correctly if Chrome were simply able to recognise the BOM, ignore it and read the remaining text. That doesn't sound like much to expect from software written sometime after 2000...?

So, I would ask: is there any real excuse for modern software to fail to interpret the BOM and therefore leave the page in the state shown above (i.e. completely broken)? Is it not an "obvious" requirement to be able to interpret both BOM and no BOM in UTF-8?
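
To make "recognise the BOM, ignore it and read the remaining text" concrete, here is a small sketch of a tolerant UTF-8 decoder. `BomTolerantReader` and `Decode` are hypothetical names, and this says nothing about how Chrome actually decodes responses; it only shows how little code the consumer side needs.

```csharp
using System;
using System.Text;

static class BomTolerantReader
{
    // Decodes UTF-8 bytes, skipping a leading EF BB BF if it is present.
    public static string Decode(byte[] bytes)
    {
        byte[] bom = Encoding.UTF8.GetPreamble();
        int offset = 0;

        if (bytes.Length >= bom.Length
            && bytes[0] == bom[0] && bytes[1] == bom[1] && bytes[2] == bom[2])
        {
            offset = bom.Length;
        }

        return Encoding.UTF8.GetString(bytes, offset, bytes.Length - offset);
    }

    static void Main()
    {
        byte[] withBom = { 0xEF, 0xBB, 0xBF, 0x7B, 0x7D };
        byte[] withoutBom = { 0x7B, 0x7D };

        // Both lines print {}
        Console.WriteLine(Decode(withBom));
        Console.WriteLine(Decode(withoutBom));
    }
}
```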

08/23/2010 08:33 AM by Daniel Fernandes

I experienced nasty bugs with Chrome in the past too.

I guess Chrome could do with more if statements ;)

08/23/2010 09:40 AM by Ayende Rahien

I love it how no one has actually run the code.

Guys, if I use the charset=utf-8, there is still a problem.

So yes, it is a bug.

08/23/2010 02:44 PM by James_2JS

1) I ran the code, and did this with a fresh install of Chrome... by default the page encoding was set to Unicode (UTF-8). Choosing auto-detect and re-running removed the BOM.

2) You can force the removal of the BOM by changing the code to this:

using (StreamWriter writer = new StreamWriter(gzip, new UTF8Encoding(false)))
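
To check the effect of that change without a browser in the loop, the same writer chain can be pointed at a `MemoryStream` instead of the response stream. This is only a sketch: `DeflateBomCheck` and its methods are invented for illustration, and it inspects the bytes that would be sent, not how any browser treats them.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

static class DeflateBomCheck
{
    // Writes json through StreamWriter -> DeflateStream, as the handler in the post does,
    // but into memory so the compressed body can be inspected.
    public static byte[] Compress(string json, Encoding encoding)
    {
        using (var body = new MemoryStream())
        {
            using (var deflate = new DeflateStream(body, CompressionMode.Compress))
            using (var writer = new StreamWriter(deflate, encoding))
            {
                writer.Write(json);
            }
            return body.ToArray();
        }
    }

    // Inflates a body produced by Compress back to the raw bytes a client would decode.
    public static byte[] Decompress(byte[] compressed)
    {
        using (var input = new DeflateStream(new MemoryStream(compressed), CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            input.CopyTo(output);
            return output.ToArray();
        }
    }

    public static bool StartsWithBom(byte[] bytes)
    {
        return bytes.Length >= 3 && bytes[0] == 0xEF && bytes[1] == 0xBB && bytes[2] == 0xBF;
    }

    static void Main()
    {
        string json = "{\"CountOfIndexes\":1,\"ApproximateTaskCount\":0,\"CountOfDocuments\":0}";

        // Encoding.UTF8 emits the BOM inside the compressed body; prints True
        Console.WriteLine(StartsWithBom(Decompress(Compress(json, Encoding.UTF8))));

        // new UTF8Encoding(false) does not; prints False
        Console.WriteLine(StartsWithBom(Decompress(Compress(json, new UTF8Encoding(false)))));
    }
}
```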
