Fast transaction log: Windows
In my previous post, I tested journal writing techniques on Linux. In this post, I want to do the same for Windows and see what impact the various options have on system performance.
Windows has slightly different options than Linux. In particular, on Windows the various flags and promises are very clear, and it is quite easy to figure out what you are supposed to do.
We have tested the following scenarios:
- Doing buffered writes (pretty useless for any journal file, which needs to be reliable, but a good baseline metric).
- Doing buffered writes and calling FlushFileBuffers after each transaction (a pretty common way to handle committing to disk in databases). This is the equivalent of calling fsync.
- Using the FILE_FLAG_WRITE_THROUGH flag and asking the kernel to make sure that after every write, everything is flushed to disk. Note that the disk may or may not buffer things.
- Using the FILE_FLAG_NO_BUFFERING flag to bypass the kernel’s caching and go directly to disk. This has special memory alignment considerations.
- Using the FILE_FLAG_WRITE_THROUGH | FILE_FLAG_NO_BUFFERING flags to ensure that we don’t do any caching, and actually force the disk to do its work. On Windows, this is guaranteed to ask the disk to flush to persistent media (but the disk can ignore this request).
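The alignment considerations mentioned for FILE_FLAG_NO_BUFFERING can be sketched as follows. As documented for unbuffered I/O on Windows, file offsets and write sizes must be multiples of the volume sector size, and the buffer address must be sector-aligned. This is a minimal Python sketch and not the code used in the test: `SECTOR_SIZE`, `align_up`, `is_valid_unbuffered_io` and `aligned_buffer` are hypothetical names, and the 4,096-byte sector size is an assumption (real code should query the volume).

```python
import mmap

SECTOR_SIZE = 4096  # assumed sector size; real code should query the volume


def align_up(size, sector=SECTOR_SIZE):
    """Round size up to the next multiple of the sector size."""
    return (size + sector - 1) // sector * sector


def is_valid_unbuffered_io(offset, length, sector=SECTOR_SIZE):
    """Check that a file offset and a transfer length are both sector multiples."""
    return offset % sector == 0 and length % sector == 0


def aligned_buffer(size, sector=SECTOR_SIZE):
    """Return a writable buffer whose length is a sector multiple.

    Anonymous mmap memory is page-aligned, so the buffer address is also
    sector-aligned as long as the page size is a multiple of the sector size.
    """
    return mmap.mmap(-1, align_up(size, sector))
```

A 16 KB transaction already satisfies these rules with a 4,096-byte sector; anything that is not a sector multiple would have to be padded with `align_up` before it is written.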
Here is the code:
We have tested this on an AWS machine (i2.2xlarge – 61 GB, 8 cores, 2x 800 GB SSD drive, 1 GB/sec EBS), which was running Microsoft Windows Server 2012 R2 RTM 64-bit. The code was compiled for 64 bits with the default release configuration.
What we are doing is writing a 1 GB journal file, simulating 16 KB transactions and 65,535 separate commits to disk. That is a lot of work that needs to be done.
First, again, I ran it on the system drive, to compare how it behaves:
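To make the workload concrete, here is a minimal, scaled-down Python sketch of the first two scenarios (buffered writes, with and without a flush after each transaction). It is not the benchmark code from this post: `run_journal` and its parameters are hypothetical names, and it relies on Python's `os.fsync`, which the Python documentation says uses the MS `_commit()` function on Windows and `fsync()` elsewhere. The per-write cost is simply the elapsed time divided by the number of commits.

```python
import os
import tempfile
import time

TRANSACTION_SIZE = 16 * 1024  # 16 KB per simulated transaction, as in the post


def run_journal(path, commits, flush_each_commit):
    """Write `commits` transactions to `path`; return (total ms, ms per write)."""
    payload = b"\0" * TRANSACTION_SIZE
    start = time.perf_counter()
    # buffering=0 disables Python's own buffer; the OS cache still buffers writes.
    with open(path, "wb", buffering=0) as journal:
        for _ in range(commits):
            journal.write(payload)
            if flush_each_commit:
                os.fsync(journal.fileno())  # ask the OS to flush this file to disk
    total_ms = (time.perf_counter() - start) * 1000.0
    return total_ms, total_ms / commits


if __name__ == "__main__":
    # 64 commits keeps the demo quick; the post uses 65,535.
    with tempfile.TemporaryDirectory() as folder:
        journal_path = os.path.join(folder, "journal.bin")
        for flush in (False, True):
            total_ms, per_write_ms = run_journal(journal_path, 64, flush)
            print(f"flush={flush}: {total_ms:.1f} ms total, {per_write_ms:.3f} ms per write")
```

The two flag-based scenarios are not covered here, since they need flags passed to the Win32 file-open call rather than a flush after each write.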
| Method | Time (ms) | Write cost (ms) |
|---|---|---|
| Buffered | 396 | 0.006 |
| Buffered + FlushFileBuffers | 121,403 | 1.8 |
| FILE_FLAG_WRITE_THROUGH | 58,376 | 0.89 |
| FILE_FLAG_NO_BUFFERING | 56,162 | 0.85 |
| FILE_FLAG_WRITE_THROUGH \| FILE_FLAG_NO_BUFFERING | 55,432 | 0.84 |
Remember, that was running on the system disk, not on the SSD drive. Here are the SSD numbers, which are much more interesting for us:
| Method | Time (ms) | Write cost (ms) |
|---|---|---|
| Buffered | 410 | 0.006 |
| Buffered + FlushFileBuffers | 21,077 | 0.321 |
| FILE_FLAG_WRITE_THROUGH | 10,029 | 0.153 |
| FILE_FLAG_NO_BUFFERING | 8,491 | 0.129 |
| FILE_FLAG_WRITE_THROUGH \| FILE_FLAG_NO_BUFFERING | 8,378 | 0.127 |
And those numbers are very significant. Unlike the system disk, where we basically get whatever spare cycles we have, on both Linux and Windows the SSD disk provides really good performance. But even on an identical machine, running nearly identical code, there are significant performance differences between the two.
Let me lay it out for you:
| Options | Windows (ms) | Linux (ms) | Difference |
|---|---|---|---|
| Buffered | 0.006 | 0.03 | 80% Win |
| Buffered + fsync() / FlushFileBuffers() | 0.32 | 0.35 | 9% Win |
| O_DSYNC / FILE_FLAG_WRITE_THROUGH | 0.153 | 0.29 | 48% Win |
| O_DIRECT / FILE_FLAG_NO_BUFFERING | 0.129 | 0.14 | 8% Win |
| O_DIRECT \| O_DSYNC / FILE_FLAG_WRITE_THROUGH \| FILE_FLAG_NO_BUFFERING | 0.127 | 0.31 | 60% Win |
In pretty much all cases, Windows has been able to outperform Linux in this specific scenario, in many cases by a significant margin. In particular, in the scenario that I actually really care about, we see a 60% performance advantage for Windows.
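The Difference column can be approximately reproduced from the rounded per-write costs in the table. The sketch below assumes the column is the relative saving, (Linux - Windows) / Linux, which the post does not state; `relative_saving` and `COSTS_MS` are hypothetical names. With the rounded inputs, the results land within about one percentage point of the published column.

```python
# Per-write cost in ms as (Windows, Linux), copied from the comparison table above.
COSTS_MS = {
    "Buffered": (0.006, 0.03),
    "Buffered + flush": (0.32, 0.35),
    "Write through": (0.153, 0.29),
    "No buffering": (0.129, 0.14),
    "Write through + no buffering": (0.127, 0.31),
}


def relative_saving(windows_ms, linux_ms):
    """Assumed formula: percentage by which the Windows cost is below the Linux cost."""
    return (linux_ms - windows_ms) / linux_ms * 100.0


if __name__ == "__main__":
    for name, (windows_ms, linux_ms) in COSTS_MS.items():
        print(f"{name}: {relative_saving(windows_ms, linux_ms):.1f}%")
```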
One of the reasons for this blog post and the detailed code and scenario is the external verification of these numbers. I’d love to know that I missed something that would make Linux speed comparable to Windows, because right now this is pretty miserable.
I do have a hunch about those numbers, though. SQL Server is a major business for Microsoft, so it has a lot of pull in the company. And SQL Server uses FILE_FLAG_WRITE_THROUGH | FILE_FLAG_NO_BUFFERING internally to handle its transaction log. Like quite a few other Win32 APIs (WriteFileGather, for example), it looks tailor-made for database journaling. I’m guessing that this code path has been gone over multiple times over the years, trying to optimize SQL Server by smoothing out anything in the way.
As a result, if you know what you are doing, you can get some really impressive numbers on Windows in this scenario. Oh, and just to quiet the nitpickers:
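For reference, opening a journal file with both flags looks roughly like this when calling the Win32 `CreateFileW` API through Python's `ctypes`. This is a hedged sketch, not SQL Server's code or the code from this post: `open_journal` is a hypothetical helper, the constants are the values from the Windows SDK headers, and it only runs on Windows. Writes through the returned handle must follow the sector-alignment rules of unbuffered I/O, and the caller is responsible for closing the handle with `CloseHandle`.

```python
import ctypes
import sys

# Constant values as defined in the Windows SDK headers.
GENERIC_WRITE = 0x40000000
OPEN_ALWAYS = 4
FILE_FLAG_WRITE_THROUGH = 0x80000000
FILE_FLAG_NO_BUFFERING = 0x20000000
JOURNAL_FLAGS = FILE_FLAG_WRITE_THROUGH | FILE_FLAG_NO_BUFFERING


def open_journal(path):
    """Open `path` for unbuffered, write-through writes; return the Win32 handle."""
    if sys.platform != "win32":
        raise NotImplementedError("CreateFileW is only available on Windows")
    from ctypes import wintypes  # imported here so the module loads on any OS

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.CreateFileW.argtypes = [
        wintypes.LPCWSTR,  # lpFileName
        wintypes.DWORD,    # dwDesiredAccess
        wintypes.DWORD,    # dwShareMode
        wintypes.LPVOID,   # lpSecurityAttributes
        wintypes.DWORD,    # dwCreationDisposition
        wintypes.DWORD,    # dwFlagsAndAttributes
        wintypes.HANDLE,   # hTemplateFile
    ]
    kernel32.CreateFileW.restype = wintypes.HANDLE
    handle = kernel32.CreateFileW(
        path, GENERIC_WRITE, 0, None, OPEN_ALWAYS, JOURNAL_FLAGS, None
    )
    # INVALID_HANDLE_VALUE is -1 reinterpreted as a pointer-sized value.
    if handle == wintypes.HANDLE(-1).value:
        raise ctypes.WinError(ctypes.get_last_error())
    return handle
```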
Comments
It's not a secret that Windows performs faster than Linux, and C# is faster than Java; most of us already know this. What's interesting is that even at the lower levels of the kernel API, Windows is still better.
Have you tried to change the default linux disk I/O scheduler?
http://www.hecticgeek.com/2014/10/change-disk-i-0-scheduler-cfq-ubuntu-14-10/
@brian: in the comments at the url you pointed they clearly say that, on SSDs, the Deadline scheduler is still faster than CFQ.
This workload should not require any IO scheduling. There is no concurrency except in the buffered case which nobody cares about.
I always wondered why OS'es provide different performance for such simple workloads. It does not get much more simple than that. Small, aligned unbuffered IOs at no concurrency to a pre-allocated file.
Interestingly, Windows has its flaws, too. For example, you get different speeds at different buffer sizes with buffered IO even when using big buffers (64KB+). This should not be. The read-ahead and write-behind should be perfect in those simple cases.
This just landed on HackerNews, prepare yourselves
Did you run these tests on several instances? Sometimes ec2 instance performance can be variable or bad.
@Uri
C# faster than Java? LoL! The .net platform is nowhere near the JVM and its GC. Just look at the benchmarks - you'll find some on the internet. Also, comparing OS performance by logging... Noobs.
I would speculate that there is variability between Windows and Linux benchmarks due to running on shared hardware. I would LOVE to see these numbers again but using dedicated hardware. It wouldn't be too much trouble to rent a SoftLayer box for an hour and run the Windows benchmark, then re-image the box as Linux and run that benchmark as well...
Brian, This is on SSD, and we tried that, it didn't have measurable impact
andrew, Yes, to eliminate variability from a lemon EC2 instance, we ran each test on three different machines (terminated and recreated new ones). There weren't any major differences between them.
Jonathan, I highly doubt that. We see similar results when running on physical hardware. I'm posting the results of EC2 instances here to ensure that they can be easily reproduced, but we have two identical boxes in the office that sit there and show Windows being much faster in this kind of thing.
There is a slew of filesystems for Linux, maybe this article is of interest for making an optimized choice: Filesystem benchmarks
Anders, None of which are testing the same thing, durable writes.
LOL I will never forget one of our developers optimizing with write caching enabled :)
There are, however, some reasons to use and not to use FlushFileBuffers.
Have you tested on a networked drive?
Greg, What would be the point in networked drive?
a) Depending on configuration, this isn't going to pass the right flags and actually persist to disk. b) The I/O cost is way too high; we are trying to build something that would work locally and take advantage of that. c) The cost effectiveness is similar, to a certain degree: bigger writes, fewer calls.