Ayende @ Rahien

Hi!
My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.

Not all bytes weigh exactly 8 bits


Or, pay attention to how you write to the disk. Here is a simple example:

using System;
using System.Diagnostics;
using System.IO;

class Program
{
	static void Main(string[] args)
	{
		var count = 10000000;

		// 1. BinaryWriter: one 4-byte Write call per int.
		Stopwatch stopwatch = Stopwatch.StartNew();

		using (var stream = CreateWriter())
		using (var bw = new BinaryWriter(stream))
		{
			for (var i = 0; i < count; i++)
			{
				bw.Write(i);
			}
			bw.Flush();
		}
		stopwatch.Stop();
		Console.WriteLine("Binary Writer: " + stopwatch.ElapsedMilliseconds);

		// 2. BitConverter: allocate a 4-byte array per int and write it to the file stream.
		stopwatch = Stopwatch.StartNew();

		using (var stream = CreateWriter())
		{
			for (var i = 0; i < count; i++)
			{
				var bytes = BitConverter.GetBytes(i);
				stream.Write(bytes, 0, 4);
			}
			stream.Flush();
		}
		stopwatch.Stop();
		Console.WriteLine("BitConverter: " + stopwatch.ElapsedMilliseconds);

		// 3. Memory stream: collect everything in memory, then write it to the file in one call.
		stopwatch = Stopwatch.StartNew();

		using (var stream = CreateWriter())
		using (var ms = new MemoryStream())
		{
			for (var i = 0; i < count; i++)
			{
				var bytes = BitConverter.GetBytes(i);
				ms.Write(bytes, 0, 4);
			}
			var array = ms.ToArray();
			stream.Write(array, 0, array.Length);
			stream.Flush();
		}
		stopwatch.Stop();
		Console.WriteLine("Memory stream: " + stopwatch.ElapsedMilliseconds);

		// 4. Single buffer: fill one preallocated array by hand (low byte first), then write it in one call.
		stopwatch = Stopwatch.StartNew();

		using (var stream = CreateWriter())
		{
			byte[] buffer = new byte[sizeof(int) * count];
			int index = 0;
			for (var i = 0; i < count; i++)
			{
				buffer[index++] = (byte)i;
				buffer[index++] = (byte)(i >> 8);
				buffer[index++] = (byte)(i >> 16);
				buffer[index++] = (byte)(i >> 24);
			}
			stream.Write(buffer, 0, buffer.Length);
			stream.Flush();
		}
		stopwatch.Stop();
		Console.WriteLine("Single buffer: " + stopwatch.ElapsedMilliseconds);
	}

	// Opens a new temp file as a write-through stream with a 64 KB (0x10000) buffer.
	private static FileStream CreateWriter()
	{
		return new FileStream(Path.GetTempFileName(), FileMode.Create, FileAccess.Write, FileShare.Read, 0x10000,
			FileOptions.SequentialScan | FileOptions.WriteThrough);
	}
}

And the results (elapsed milliseconds):

Binary Writer: 1877
BitConverter: 1985
Memory stream: 1702
Single buffer: 1022


Comments

Bil Simser

Is there a line missing in the code when you wrote it to the blog? There's no call to StartNew() in the last chunk.

Ayende Rahien

Bil,

Yes, that makes sense. You found a bug :-)

I updated the post accordingly.

Rik Hemsley

Similar results here, though MemoryStream and 'Single buffer' seem proportionally faster for some reason (tried many iterations, same results):

Binary writer: 1581
BitConverter: 1608
MemoryStream: 1016
Single buffer: 709

With the target stream being a MemoryStream rather than FileStream:

Binary writer: 362
BitConverter: 479
MemoryStream: 683
Single buffer: 349

Davy Landman

I find it rather logical that when you create your own buffering system with knowledge of the data size, it will be faster than the default buffering in a framework (which aims for overall average performance).

I looked at the source of the FileStream class, and it does hold an internal buffer of 4096 bytes. When Write is called, the data is copied to the buffer, and when the buffer is full, it is flushed to the actual file handle.

So using the binary writer and bitconverter you have 10000000 copies to an internal buffer and 19532 separate flushes.

The single buffer, by contrast, bypasses the buffering of the FileStream class, so the data is not copied again but written directly to the file handle.

I suspect the memory stream uses a different buffering mechanism, but that's for someone else to look at?

Alessandro Riolo

It is not related, but a byte is not always 8 bits. There are (mostly were, e.g. the PDP-10) many architectures where a byte has a different size.

Davy Landman

Since the buffering of the FileStream doesn't slow us down with the single buffer method, maybe we could convert the int array to a byte array faster...

I created a faster variant, but it's ugly (unmanaged code) and I wouldn't use it unless this part was really a bottleneck.

        // Requires: using System.Runtime.InteropServices; (for Marshal)
        stopwatch = Stopwatch.StartNew();
        using (var stream = CreateWriter())
        {
            byte[] buffer = new byte[sizeof(int) * count];
            int[] data = new int[count];
            for (int i = 0; i < count; i++)
            {
                data[i] = i;
            }

            // Copy the ints into unmanaged memory, then back out as raw bytes.
            IntPtr tempBuffer = Marshal.AllocHGlobal(buffer.Length);
            Marshal.Copy(data, 0, tempBuffer, count);
            Marshal.Copy(tempBuffer, buffer, 0, buffer.Length);
            Marshal.FreeHGlobal(tempBuffer);

            stream.Write(buffer, 0, buffer.Length);
            stream.Flush();
        }
        stopwatch.Stop();

        Console.WriteLine("Single buffer (Marshalling): " + stopwatch.ElapsedMilliseconds);
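A managed alternative to the marshalling variant above might be Buffer.BlockCopy, which copies the raw bytes of a primitive array without going through unmanaged memory. This is only a sketch and was not measured in this thread: the IntPacking class and the ToBytes helper are hypothetical names, and the byte order matches the hand-written shifts in the post only on a little-endian machine.

```csharp
using System;

static class IntPacking
{
    // Hypothetical helper (not from the post): returns the raw bytes of an int[]
    // in the machine's native byte order.
    public static byte[] ToBytes(int[] data)
    {
        var buffer = new byte[data.Length * sizeof(int)];
        // Buffer.BlockCopy takes offsets and a count in bytes, not in elements.
        Buffer.BlockCopy(data, 0, buffer, 0, buffer.Length);
        return buffer;
    }

    static void Main()
    {
        var bytes = ToBytes(new[] { 1, 256 });
        // On a little-endian machine this prints 01-00-00-00-00-01-00-00
        Console.WriteLine(BitConverter.ToString(bytes));
    }
}
```

The resulting array could then be passed to stream.Write(buffer, 0, buffer.Length) exactly as in the single buffer case. Whether it actually beats the manual loop would need to be timed.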

Davy Landman

Correction. Where I wrote:

So using the binary writer and bitconverter you have 10000000 copies to an internal buffer and 19532 separate flushes.

it should read:

So using the binary writer or the bitconverter solution you have 10000000 copies to the internal buffer of the FileStream and 153 separate flushes to the real file.

(I overlooked the buffer size parameter.)

tcmaster

It seems you really like the "var" thing.

I'm really a stupid guy, and I like to be able to figure out the meaning of code by reading it first, then debugging. But "var" does a good job of preventing this.

Ayende Rahien

tcmaster,

var is always initialized; just look at what the value is.
