Ayende @ Rahien

It's a girl

Code Critique: Transactional File

Update: To clarify, the only purpose of this code is to give atomic writes, it either complete successfully or it is rolled back.

Okay, this looks like it is too simple to work, thoughts?

/// <summary>
/// Allow to overwrite a file in a transactional manner.
/// That is, either we completely succeed or completely fail in writing to the file.
/// Read will correct previous failed transaction if a previous write has failed.
/// Assumptions:
///  * You want to always rewrite the file, rathar than edit it.
///  * The underlying file system has at least transactional metadata.
///  * Thread safety is provided by the calling code.
/// 
/// Write implementation:
///  - Rename file to "[file].old_copy" (overwrite if needed
///  - Create new file stream with the file name and pass it to the client
///  - After client is done, flush & close the stream
///  - Delete old file
/// 
/// Read implementation:
///  - If old file exists, remove new file and rename old file
/// 
/// </summary>
public static class TransactionalFile
{
	public static void Read(string name, Action<Stream> action)
	{
		if (File.Exists(name + ".old_copy"))
		{
			File.Delete(name);
			File.Move(name + ".old_copy", name);
		}

		using (
			var stream = new FileStream(name, FileMode.Open, FileAccess.Read, FileShare.None, 0x1000,
										FileOptions.SequentialScan))
		{
			action(stream);
		}
	}

	public static void Write(string name, Action<Stream> action)
	{
		File.Delete(name + ".old_copy");

		if (File.Exists(name))
			File.Move(name, name + ".old_copy");

		using (
			var stream = new FileStream(name, FileMode.CreateNew, FileAccess.Write, FileShare.None, 0x1000,
			                            FileOptions.WriteThrough | FileOptions.SequentialScan))
		{
			action(stream);
			stream.Flush();
		}

		File.Delete(name + ".old_copy");
	}
}

Comments

Justin Chase
07/27/2008 09:50 PM by
Justin Chase

There seem to be a few potential problems here:

1.) This doesn't seem to be very thread safe, which is probably a very important part of being transactional. I suppose this is something that should be handled higher up than these methods since you can't actually control how other processes might be accessing these files but if you can guarantee that your application is the only one accessing them then a little locking could go a long way.

2.) There doesn't seem to be any actual rollback going on. I mean if you always use these two methods to read and write the files it should work. But suppose a write fails then it will leaves files in the ".old_copy" state then other processes won't be able to find the real file. What if no read ever goes on again... I'm guessing this is the only process that accesses these files though? If that's true this is probably a minor concern.

3.) What if you try Read or Write to the same file multiple times in the same transaction?

4.) Performance... deleting and moving files around a lot must get expensive. I'm not sure what the solution for this would be though.

I guess basically what I'm saying is that it seems less like it's 'transactional' and more like it's a Read/Write with a primitive failsafe... which is better than nothing but doesn't quite meet my expectations for being transactional.

Ayende Rahien
07/27/2008 10:03 PM by
Ayende Rahien

Justin,

1) Note the huge comment block? It is explicitly not thread / process safe.

2) The assumption is that you always read / write using those methods.

3) This is not a supported operation.

4) Moving / deleting files is actually cheap, all you do is make a metadata update.

Agreed about the transactional part being misleading somewhat

Jason Olson
07/27/2008 10:12 PM by
Jason Olson

Not to be knit-picky, but in the sense of "ACID" transactions like you get with databases, this is far from being a "transactional file".

First of all, there's no "scope" to the transaction. So there's no way to start a transaction, and make many writes within the transaction. Also, a good transactional file should not only have a scope, but should also have a resource manager that can interop with a transaction coordinate (let's say, to be able to do file operations within the same transaction as a SQL transaction for example).

What if the action writes some stuff to the file and then throws an exception (but some of it has already been written to disk)? How is it "rolled back"? What happens if the machine crashes half-way through the final flush? Now you have no consistency with the file. What about isolation?

There are definitely problems around the using of a temporary file as well.

Programming a transaction resource to work around the success cases is easy. The hard part of transactional programming is guaranteeing that no matter what fails, consistency and durability is guaranteed (up-to, and including OS crashes at the most inopportune moments). This is also why a lot of companies writing transactional systems using native code, often being drivers as well.

Of course, if you are in Vista or WS08, you can use the new Transactional functionality built directly into Transactional NTFS. That doesn't help you with Windows XP or Windows Server 2003 unfortunately as this functionality relies on kernel changes :(.

El Guapo
07/28/2008 12:22 AM by
El Guapo

Sorry but "transactional" implies some level of isolation. There is no isolation offered by this code. It's fine, for what it does, just don't call it that.

Ayende Rahien
07/28/2008 12:28 AM by
Ayende Rahien

Actually, in this case, what I am trying to guarantee is atomicity.

Transaction implies ACID

Ayende Rahien
07/28/2008 12:33 AM by
Ayende Rahien

Jason,

If I didn't wanted nit picking, I wouldn't have posted that.

The purpose of this code is to provide an atomic update ( I updated the post accordingly).

That is, either the write complete successfully, or it is rolled back.

I agree about the complexity of transactional systems.

All of that said, all the failure scenarios that you pointed out are handled by the code above.

Jeremy Wiebe
07/28/2008 12:41 AM by
Jeremy Wiebe

This type of solution looks like it would work well in FTP situations. I worked on a project where we were monitoring for files to arrive in an FTP folder. For sufficiently large files we'd start pulling a new file down before the client pushing up the file had finished uploading the file. Our solution was very similar to this in that the client uploaded to a temporary-named file and once uploaded renamed it to the target filename.

Sammy
07/28/2008 07:32 AM by
Sammy

Alright, maybe this is not directly related to this post but I've been waiting for a chance to jump at you with this question for so long, so here it goes!

I'm desperately looking for a library/algorithm/method/whatever way to store files on disk in a hierarchical manner, in such a way where I can retrieve them using my own API. A simple example would be: store this video file, video file later (by a back end service) gets encoded into other formats, then I would ask the store provider to get me the file with ID x but with, not it's original format, but format y instead. Do you know of such a thing? FWIW I did Google a lot with no luck :(

Ayende Rahien
07/28/2008 08:18 AM by
Ayende Rahien

File.Write("video1.avi")

File.Read("video1.wmv") ??

Ramon Smits
07/28/2008 09:34 AM by
Ramon Smits

Why no exception handling? For example, writing and then failing because no disk space for example or application shutdown should then immediately restore the previous version?

This code can be used for some simple scenarios but what about versioning? The code is not thread safe (as noted) but the current code lacks information about the last write.

Ayende Rahien
07/28/2008 09:50 AM by
Ayende Rahien

If you failed because of disk space, next read will fix it to the previous state.

I am not following the issue with regards to versioning

Stephen Denne
07/28/2008 12:15 PM by
Stephen Denne

If you fail while writing, you better not try writing the same file again till after you've called Read, otherwise you lose your backup.

Are you in control of the names being passed in? So as to avoid passing in directory names, or illegal names, or names already ending in .old_copy?

Ayende Rahien
07/28/2008 12:20 PM by
Ayende Rahien

Stephen,

Great! thanks for catching that issue.

Yes, the names are assumed to be valid

Bill Pierce
07/28/2008 03:04 PM by
Bill Pierce

I know it locks you in to Vista/Server 2008 but have you considered TxF?

http://msdn.microsoft.com/en-us/library/aa363764(VS.85).aspx

alberto
07/28/2008 05:26 PM by
alberto

I guess extracting methods ((WasPreviousWriteComplete(), RestorePreviousVersion(), BackupVersion()) would help eliminate the clutter and would have made easier to spot Stephen's bug.

Oran
08/01/2008 05:48 AM by
Oran

I built one of these (thread-safe journaled data store) and was bitten by the fact that FileStream.Flush doesn't actually flush to disk, it only flushes to the OS. If you want consistency through power outages, you need to supplement it with:

[DllImport("kernel32", SetLastError=true)]

private static extern bool FlushFileBuffers(IntPtr handle);

and perhaps with hashing to detect corruption / incomplete writes.

Ayende Rahien
08/01/2008 10:07 AM by
Ayende Rahien

Oran,

You likely didn't specify WriteThrough in the flush options.

Oran
08/01/2008 04:12 PM by
Oran

This was written in .NET 1.1 before FileOptions.WriteThrough existed, but that is good to know, I overlooked it in your example.

Comments have been closed on this topic.