Ayende @ Rahien


How to get REALLY fast benchmarks

I was very excited when I tried this code, it produced amazing performance:

private static void LoadDataFor(string searchPattern)
{
    foreach (var file in Directory.GetFiles("Docs", searchPattern).OrderBy(x=>x))
    {
        var sp = Stopwatch.StartNew();
        var httpWebRequest = (HttpWebRequest)WebRequest.Create("http://localhost:8080/bulk_docs");
        httpWebRequest.Method = "POST";
        using(var requestStream = httpWebRequest.GetRequestStream())
        {
            var readAllBytes = File.ReadAllBytes(file);
            requestStream.Write(readAllBytes, 0, readAllBytes.Length);
        }
        Console.WriteLine("{0} - {1}", Path.GetFileName(file), sp.Elapsed);
    }
}

Can you figure out what the fatal flaw in this code is?

Comments

04/25/2010 09:17 AM by Marco De Sanctis

Uhm... perhaps you're not flushing the request stream, and thus you're only measuring how long it takes to write data into the stream's internal buffer?

04/25/2010 09:28 AM by Demis Bellot

What @Marco said, plus you're not checking the response so you don't know if the request was successful.

04/25/2010 09:41 AM by Marco De Sanctis

Uh oh... nope, sorry, I made a mistake. The request stream is within a using block, so you're actually flushing it! It's Sunday morning here in Italy, and my brain started in Safe Mode today, LOL ;-)

04/25/2010 10:04 AM by Philipp

Another try (my last comment was flagged as spam): It might be that the server doesn't get the request until you invoke GetResponse due to buffering.

Not related to the performance issue: Stopping the Stopwatch might be a good idea, or directly encapsulating the whole measurement. I did something similar here:

http://tinyurl.com/39op9j7

04/25/2010 10:12 AM by Sam

If you're trying to benchmark the overall performance of loading all the documents in the directory, then I'm guessing that the 'var sp = Stopwatch.StartNew();' should be outside the foreach loop.

Otherwise the elapsed time is going to get reset on each iteration.

04/25/2010 10:23 AM by Merill Fernando

Marco was right, you need to call GetResponse to perform the upload. If not, nothing is going to be uploaded.

04/25/2010 10:43 AM by leonard

Should this httpWebRequest be created in the loop?

04/25/2010 11:36 AM by Ken Egozi

As people pointed out, GetResponse is what actually fires the HTTP request.

@Leonard - you should not reuse HttpWebRequest instances. Consecutive calls to GetResponse() will return a cached result of the first call, without re-issuing an HTTP request.

btw, I'd use WebClient as long as you do not need to mingle too much with the returned response in terms of headers, certificates, etc.
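[A minimal sketch of the WebClient approach suggested above, reusing the post's localhost endpoint. The "*" search pattern is an assumption, since the original takes searchPattern as a parameter. UploadData blocks until the server responds, so the timing covers the full round trip.]

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Linq;
using System.Net;

class BulkLoader
{
    static void Main()
    {
        using (var client = new WebClient())
        {
            // "*" is a stand-in for the original searchPattern argument
            foreach (var file in Directory.GetFiles("Docs", "*").OrderBy(x => x))
            {
                var sp = Stopwatch.StartNew();
                // UploadData sends the body and waits for the response,
                // so the measured time includes the server's processing
                client.UploadData("http://localhost:8080/bulk_docs", "POST",
                                  File.ReadAllBytes(file));
                Console.WriteLine("{0} - {1}", Path.GetFileName(file), sp.Elapsed);
            }
        }
    }
}
```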

04/25/2010 01:18 PM by manningj

Haven't tested it much (would definitely only work for sync, for instance), but:

public class TimedWebClient : WebClient
{
    public TimeSpan LastResponseTime { get; private set; }

    protected override WebResponse GetWebResponse(WebRequest request)
    {
        var stopWatch = Stopwatch.StartNew();
        var response = base.GetWebResponse(request);
        this.LastResponseTime = stopWatch.Elapsed;
        return response;
    }
}

consume like:

static void Main(string[] args)
{
    var webClient = new TimedWebClient();
    var url = "http://www.google.com";
    webClient.DownloadString(url);
    Console.WriteLine("Downloading {0} took {1} ms", url, webClient.LastResponseTime.TotalMilliseconds);
    Console.ReadLine();
}

04/25/2010 03:10 PM by Shane

You initialized your variable in the for loop so technically you are measuring only the last request and none of the file access or any of the other requests.

var sp = Stopwatch.StartNew();

foreach (var file in Directory.GetFiles("Docs", searchPattern).OrderBy(x=>x))
{
    //var sp = Stopwatch.StartNew();

04/25/2010 05:34 PM by Aaron Carlson

You never hit Stop on the stop watch.

04/26/2010 06:13 AM by Magesh

The HTTP request is not submitted until the GetResponse() method of the HttpWebRequest class is invoked. The code in the blog post writes the data to the local stream and discards it without ever initiating the actual web request.

04/26/2010 06:36 AM by Justin

Regarding the statement "Directory.GetFiles("Docs", searchPattern).OrderBy(x=>x)":

By default, GetFiles would usually return the files in alphabetical order anyway. Also, GetFiles is a blocking operation that waits for the entire array to be built up.

Therefore, the Stopwatch would only start measuring after the more expensive operations had begun. This would also only measure the very last cycle of the foreach loop... not the overall response time of the ENTIRE method.

04/26/2010 07:51 AM by Manu

You must call the Stream.Close method to close the stream and release the connection for reuse. Failure to close the stream causes your application to run out of connections.

04/26/2010 08:02 AM by Leyu Sisay

@Justin I think what you said above is correct, but when using Directory.GetFiles the order of the returned file names is not guaranteed, so using OrderBy is necessary.

04/26/2010 04:56 PM by meisinger

you have to either consume or close the ResponseStream

at the very least, you have to call GetResponse().Close()

the "fatal flaw" being that you are not closing the underlying HTTP connection

at some point you are going to run out of available connections and be blocked waiting for an HTTP connection to time out

in testing, and even in "production", the code will work given a small number of files
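[Putting the thread's points together, a hedged sketch of what the corrected loop could look like: GetResponse() actually fires the request and waits for the server, and disposing the response releases the connection. The status check is an addition for illustration, not part of the original post.]

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Linq;
using System.Net;

class BulkLoader
{
    private static void LoadDataFor(string searchPattern)
    {
        foreach (var file in Directory.GetFiles("Docs", searchPattern).OrderBy(x => x))
        {
            var sp = Stopwatch.StartNew();
            var httpWebRequest = (HttpWebRequest)WebRequest.Create("http://localhost:8080/bulk_docs");
            httpWebRequest.Method = "POST";
            using (var requestStream = httpWebRequest.GetRequestStream())
            {
                var readAllBytes = File.ReadAllBytes(file);
                requestStream.Write(readAllBytes, 0, readAllBytes.Length);
            }
            // GetResponse() is what actually sends the request and blocks
            // until the server replies; disposing it frees the connection
            using (var response = (HttpWebResponse)httpWebRequest.GetResponse())
            {
                if (response.StatusCode != HttpStatusCode.OK)
                    Console.WriteLine("Request failed: {0}", response.StatusCode);
            }
            Console.WriteLine("{0} - {1}", Path.GetFileName(file), sp.Elapsed);
        }
    }
}
```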

04/27/2010 07:30 AM by Steve Py

I'd say it's the missing requestStream.Close within the loop. My guess would be that the Close operation would block the thread until the upload was complete so without it, (aside from the risk of running out of connections) your timer wouldn't reflect the actual upload performance. Hence, a false report of smashing performance.

04/27/2010 08:48 AM by Gavin

I would use DirectoryInfo instead of Directory to get the file listing. It does fewer security checks; combined with FileInfo.OpenText, it should speed up the process when handling a significant number of files ...

04/28/2010 09:19 AM by Magesh

Would you care to post the solution?

04/28/2010 10:08 AM by Ayende Rahien

Magesh,

KAE figured that out :-)

Comments have been closed on this topic.