Ayende @ Rahien

Hi!
My name is Ayende Rahien
Founder of Hibernating Rhinos LTD and RavenDB.

How to get REALLY fast benchmarks


I was very excited when I tried this code, it produced amazing performance:

private static void LoadDataFor(string searchPattern)
{
    foreach (var file in Directory.GetFiles("Docs", searchPattern).OrderBy(x=>x))
    {
        var sp = Stopwatch.StartNew();
        var httpWebRequest = (HttpWebRequest)WebRequest.Create("http://localhost:8080/bulk_docs");
        httpWebRequest.Method = "POST";
        using(var requestStream = httpWebRequest.GetRequestStream())
        {
            var readAllBytes = File.ReadAllBytes(file);
            requestStream.Write(readAllBytes, 0, readAllBytes.Length);
        }
        Console.WriteLine("{0} - {1}", Path.GetFileName(file), sp.Elapsed);
    }
}

Can you figure out what the fatal flaw in this code is?


Comments

Marco De Sanctis

Uhm... perhaps you're not flushing the request stream, and thus you're only measuring how long it takes to write the data into the stream's internal buffer?

Demis Bellot

What @Marco said, plus you're not checking the response so you don't know if the request was successful.

Marco De Sanctis

Uh oh... nope, sorry, I made a mistake... The request stream is within a using block, so you're actually flushing it! It's Sunday morning here in Italy, and my brain started in Safe Mode today, LOL ;-)

Philipp

Another try (my last comment was flagged as spam): It might be that the server doesn't get the request until you invoke GetResponse due to buffering.

Not related to the performance issue: Stopping the Stopwatch might be a good idea, or directly encapsulating the whole measurement. I did something similar here:

http://tinyurl.com/39op9j7

Sam

If you're trying to benchmark the overall performance of loading all the documents in the directory, then I'm guessing that the 'var sp = Stopwatch.StartNew();' should be outside the foreach loop.

Otherwise the elapsed time is going to get reset on each iteration.
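Sam's variant could be sketched like this (a hypothetical, stripped-down sketch, not code from the post; "*.json" stands in for the real searchPattern and the upload itself is elided):

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Linq;

class TotalTiming
{
    static void Main()
    {
        // Start the timer once, before the loop, to measure the whole run.
        var total = Stopwatch.StartNew();
        foreach (var file in Directory.GetFiles("Docs", "*.json").OrderBy(x => x))
        {
            // ... upload the file here, as in the post ...
        }
        total.Stop();
        Console.WriteLine("All files took {0}", total.Elapsed);
    }
}
```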

Merill Fernando

Marco was right, you need to call GetResponse to perform the upload. If not, nothing is going to be uploaded.

leonard

Should this httpWebRequest be created in the loop?

Ken Egozi

As people pointed out, GetResponse is what actually fires the HTTP request.

@Leonard - you should not reuse HttpWebRequest instances. Consecutive calls to GetResponse() will return a cached result of the first call, without re-issuing an HTTP request.

btw, I'd use WebClient as long as you do not need to fiddle too much with the returned Response in terms of headers, certificates, etc.
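Ken's WebClient suggestion might look roughly like this (a sketch, not code from the post; "*.json" is a stand-in for the real search pattern). UploadData both sends the POST body and reads the response, so the per-file timing covers the actual round trip:

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Linq;
using System.Net;

class BulkLoad
{
    static void Main()
    {
        using (var client = new WebClient())
        {
            foreach (var file in Directory.GetFiles("Docs", "*.json").OrderBy(x => x))
            {
                var sp = Stopwatch.StartNew();
                // UploadData sends the request body and waits for the response,
                // so sp.Elapsed includes the real network round trip.
                client.UploadData("http://localhost:8080/bulk_docs", "POST",
                                  File.ReadAllBytes(file));
                Console.WriteLine("{0} - {1}", Path.GetFileName(file), sp.Elapsed);
            }
        }
    }
}
```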

manningj

Haven't tested it much (would definitely only work for sync, for instance), but:

public class TimedWebClient : WebClient
{
    public TimeSpan LastResponseTime { get; private set; }

    protected override WebResponse GetWebResponse(WebRequest request)
    {
        var stopWatch = Stopwatch.StartNew();
        var response = base.GetWebResponse(request);
        this.LastResponseTime = stopWatch.Elapsed;
        return response;
    }
}

consume like:

static void Main(string[] args)
{
    var webClient = new TimedWebClient();
    var url = "http://www.google.com";
    webClient.DownloadString(url);
    Console.WriteLine("Downloading {0} took {1} ms", url, webClient.LastResponseTime.TotalMilliseconds);
    Console.ReadLine();
}

Shane

You initialized your variable inside the foreach loop, so you are only measuring each individual request, and none of the file access or the overall time across requests.

var sp = Stopwatch.StartNew();
foreach (var file in Directory.GetFiles("Docs", searchPattern).OrderBy(x=>x))
{
    //var sp = Stopwatch.StartNew();

Aaron Carlson

You never hit Stop on the stop watch.

Magesh

The http request is not submitted until the "GetResponse()" method of the HttpWebRequest class is invoked. The code in the blog post writes the data to the local stream and discards it without initiating the actual web request.
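Putting that together, a corrected version of the method from the post might look like this (a sketch; the key change is the GetResponse call, which is what actually sends the request, plus a using block that disposes the response so the connection is released back to the pool):

```csharp
private static void LoadDataFor(string searchPattern)
{
    foreach (var file in Directory.GetFiles("Docs", searchPattern).OrderBy(x => x))
    {
        var sp = Stopwatch.StartNew();
        var httpWebRequest = (HttpWebRequest)WebRequest.Create("http://localhost:8080/bulk_docs");
        httpWebRequest.Method = "POST";
        using (var requestStream = httpWebRequest.GetRequestStream())
        {
            var readAllBytes = File.ReadAllBytes(file);
            requestStream.Write(readAllBytes, 0, readAllBytes.Length);
        }
        // GetResponse actually sends the request and waits for the server;
        // disposing the response releases the connection for reuse.
        using (var response = (HttpWebResponse)httpWebRequest.GetResponse())
        {
            sp.Stop();
            Console.WriteLine("{0} - {1} ({2})", Path.GetFileName(file), sp.Elapsed, response.StatusCode);
        }
    }
}
```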

Justin

The statement "Directory.GetFiles("Docs", searchPattern).OrderBy(x=>x)"

By default, GetFiles usually returns the files in alphabetical order anyway. Also, GetFiles is a blocking operation that waits for the entire array to be built up.

Therefore, the Stopwatch only starts measuring after those more expensive operations have already run. It also only measures one cycle of the foreach loop at a time... not the overall response time of the ENTIRE method.

Manu

You must call the Stream.Close method to close the stream and release the connection for reuse. Failure to close the stream causes your application to run out of connections.

Leyu Sisay

@Justin I think what you said above is correct, but when using Directory.GetFiles the order of the returned file names is not guaranteed, so using OrderBy is necessary.

meisinger

you have to either consume or close the ResponseStream

the very least that you have to do is call GetResponse().Close()

the "fatal flaw" being that you are not closing the underlying HTTP connection

at some point in time you are going to run out of available connections and be blocked waiting for an HTTP connection to timeout

testing and even "production" running code will work given a small number of files

Steve Py

I'd say it's the missing requestStream.Close within the loop. My guess would be that the Close operation would block the thread until the upload was complete so without it, (aside from the risk of running out of connections) your timer wouldn't reflect the actual upload performance. Hence, a false report of smashing performance.

Gavin

I would use DirectoryInfo instead of Directory to get the file listing. It does fewer security checks, and combined with FileInfo.OpenText it should speed things up when processing a significant number of files ...

Magesh

Would you care to post the solution?

Ayende Rahien

Magesh,

KAE figured that out :-)


Comments have been closed on this topic.
