Code Critique: Transactional File

time to read 2 min | 333 words

Update: To clarify, the only purpose of this code is to give atomic writes, it either complete successfully or it is rolled back.

Okay, this looks like it is too simple to work, thoughts?

/// <summary>
/// Allow to overwrite a file in a transactional manner.
/// That is, either we completely succeed or completely fail in writing to the file.
/// Read will correct previous failed transaction if a previous write has failed.
/// Assumptions:
///  * You want to always rewrite the file, rathar than edit it.
///  * The underlying file system has at least transactional metadata.
///  * Thread safety is provided by the calling code.
/// 
/// Write implementation:
///  - Rename file to "[file].old_copy" (overwrite if needed
///  - Create new file stream with the file name and pass it to the client
///  - After client is done, flush & close the stream
///  - Delete old file
/// 
/// Read implementation:
///  - If old file exists, remove new file and rename old file
/// 
/// </summary>
public static class TransactionalFile
{
	public static void Read(string name, Action<Stream> action)
	{
		if (File.Exists(name + ".old_copy"))
		{
			File.Delete(name);
			File.Move(name + ".old_copy", name);
		}

		using (
			var stream = new FileStream(name, FileMode.Open, FileAccess.Read, FileShare.None, 0x1000,
										FileOptions.SequentialScan))
		{
			action(stream);
		}
	}

	public static void Write(string name, Action<Stream> action)
	{
		File.Delete(name + ".old_copy");

		if (File.Exists(name))
			File.Move(name, name + ".old_copy");

		using (
			var stream = new FileStream(name, FileMode.CreateNew, FileAccess.Write, FileShare.None, 0x1000,
			                            FileOptions.WriteThrough | FileOptions.SequentialScan))
		{
			action(stream);
			stream.Flush();
		}

		File.Delete(name + ".old_copy");
	}
}

Tweet Share Share 18 comments

Tags:

Reviews

Comments

27 Jul 2008
21:50 PM

Justin Chase

There seem to be a few potential problems here:

1.) This doesn't seem to be very thread safe, which is probably a very important part of being transactional. I suppose this is something that should be handled higher up than these methods since you can't actually control how other processes might be accessing these files but if you can guarantee that your application is the only one accessing them then a little locking could go a long way.

2.) There doesn't seem to be any actual rollback going on. I mean if you always use these two methods to read and write the files it should work. But suppose a write fails then it will leaves files in the ".old_copy" state then other processes won't be able to find the real file. What if no read ever goes on again... I'm guessing this is the only process that accesses these files though? If that's true this is probably a minor concern.

3.) What if you try Read or Write to the same file multiple times in the same transaction?

4.) Performance... deleting and moving files around a lot must get expensive. I'm not sure what the solution for this would be though.

I guess basically what I'm saying is that it seems less like it's 'transactional' and more like it's a Read/Write with a primitive failsafe... which is better than nothing but doesn't quite meet my expectations for being transactional.

27 Jul 2008
22:03 PM

Ayende Rahien

Justin,

1) Note the huge comment block? It is explicitly not thread / process safe.

2) The assumption is that you always read / write using those methods.

3) This is not a supported operation.

4) Moving / deleting files is actually cheap, all you do is make a metadata update.

Agreed about the transactional part being misleading somewhat

27 Jul 2008
22:12 PM

Jason Olson

Not to be knit-picky, but in the sense of "ACID" transactions like you get with databases, this is far from being a "transactional file".

First of all, there's no "scope" to the transaction. So there's no way to start a transaction, and make many writes within the transaction. Also, a good transactional file should not only have a scope, but should also have a resource manager that can interop with a transaction coordinate (let's say, to be able to do file operations within the same transaction as a SQL transaction for example).

What if the action writes some stuff to the file and then throws an exception (but some of it has already been written to disk)? How is it "rolled back"? What happens if the machine crashes half-way through the final flush? Now you have no consistency with the file. What about isolation?

There are definitely problems around the using of a temporary file as well.

Programming a transaction resource to work around the success cases is easy. The hard part of transactional programming is guaranteeing that no matter what fails, consistency and durability is guaranteed (up-to, and including OS crashes at the most inopportune moments). This is also why a lot of companies writing transactional systems using native code, often being drivers as well.

Of course, if you are in Vista or WS08, you can use the new Transactional functionality built directly into Transactional NTFS. That doesn't help you with Windows XP or Windows Server 2003 unfortunately as this functionality relies on kernel changes :(.

28 Jul 2008
00:22 AM

El Guapo

Sorry but "transactional" implies some level of isolation. There is no isolation offered by this code. It's fine, for what it does, just don't call it that.

28 Jul 2008
00:28 AM

Ayende Rahien

Actually, in this case, what I am trying to guarantee is atomicity.

Transaction implies ACID

28 Jul 2008
00:33 AM

Ayende Rahien

Jason,

If I didn't wanted nit picking, I wouldn't have posted that.

The purpose of this code is to provide an atomic update ( I updated the post accordingly).

That is, either the write complete successfully, or it is rolled back.

I agree about the complexity of transactional systems.

All of that said, all the failure scenarios that you pointed out are handled by the code above.

28 Jul 2008
00:41 AM

Jeremy Wiebe

This type of solution looks like it would work well in FTP situations. I worked on a project where we were monitoring for files to arrive in an FTP folder. For sufficiently large files we'd start pulling a new file down before the client pushing up the file had finished uploading the file. Our solution was very similar to this in that the client uploaded to a temporary-named file and once uploaded renamed it to the target filename.

28 Jul 2008
07:32 AM

Sammy

Alright, maybe this is not directly related to this post but I've been waiting for a chance to jump at you with this question for so long, so here it goes!

I'm desperately looking for a library/algorithm/method/whatever way to store files on disk in a hierarchical manner, in such a way where I can retrieve them using my own API. A simple example would be: store this video file, video file later (by a back end service) gets encoded into other formats, then I would ask the store provider to get me the file with ID x but with, not it's original format, but format y instead. Do you know of such a thing? FWIW I did Google a lot with no luck :(

28 Jul 2008
08:18 AM

Ayende Rahien

File.Write("video1.avi")

File.Read("video1.wmv") ??

28 Jul 2008
09:34 AM

Ramon Smits

Why no exception handling? For example, writing and then failing because no disk space for example or application shutdown should then immediately restore the previous version?

This code can be used for some simple scenarios but what about versioning? The code is not thread safe (as noted) but the current code lacks information about the last write.

28 Jul 2008
09:50 AM

Ayende Rahien

If you failed because of disk space, next read will fix it to the previous state.

I am not following the issue with regards to versioning

28 Jul 2008
12:15 PM

Stephen Denne

If you fail while writing, you better not try writing the same file again till after you've called Read, otherwise you lose your backup.

Are you in control of the names being passed in? So as to avoid passing in directory names, or illegal names, or names already ending in .old_copy?

28 Jul 2008
12:20 PM

Ayende Rahien

Stephen,

Great! thanks for catching that issue.

Yes, the names are assumed to be valid

28 Jul 2008
15:04 PM

Bill Pierce

I know it locks you in to Vista/Server 2008 but have you considered TxF?

http://msdn.microsoft.com/en-us/library/aa363764(VS.85).aspx

28 Jul 2008
17:26 PM

alberto

I guess extracting methods ((WasPreviousWriteComplete(), RestorePreviousVersion(), BackupVersion()) would help eliminate the clutter and would have made easier to spot Stephen's bug.

01 Aug 2008
05:48 AM

Oran

I built one of these (thread-safe journaled data store) and was bitten by the fact that FileStream.Flush doesn't actually flush to disk, it only flushes to the OS. If you want consistency through power outages, you need to supplement it with:

[DllImport("kernel32", SetLastError=true)]

private static extern bool FlushFileBuffers(IntPtr handle);

and perhaps with hashing to detect corruption / incomplete writes.

01 Aug 2008
10:07 AM

Ayende Rahien

Oran,

You likely didn't specify WriteThrough in the flush options.

01 Aug 2008
16:12 PM

Oran

This was written in .NET 1.1 before FileOptions.WriteThrough existed, but that is good to know, I overlooked it in your example.

Comment preview

Comments have been closed on this topic.

Markdown turns plain text formatting into fancy HTML formatting.

Phrase Emphasis

*italic*   **bold**
_italic_   __bold__

Links

Inline:

An [example](http://url.com/ "Title")

Reference-style labels (titles are optional):

An [example][id]. Then, anywhere
else in the doc, define the link:
  [id]: http://example.com/  "Title"

Images

Inline (titles are optional):

![alt text](/path/img.jpg "Title")

Reference-style:

![alt text][id]
[id]: /url/to/img.jpg "Title"

Headers

Setext-style:

Header 1
========
Header 2
--------

atx-style (closing #'s are optional):

# Header 1 #
## Header 2 ##
###### Header 6

Lists

Ordered, without paragraphs:

1.  Foo
2.  Bar

Unordered, with paragraphs:

*   A list item.
    With multiple paragraphs.
*   Bar

You can nest them:

*   Abacus
    * answer
*   Bubbles
    1.  bunk
    2.  bupkis
        * BELITTLER
    3. burper
*   Cunning

Blockquotes

> Email-style angle brackets
> are used for blockquotes.
> > And, they can be nested.
> #### Headers in blockquotes
> 
> * You can quote a list.
> * Etc.

Horizontal Rules

Three or more dashes or asterisks:

---
* * *
- - - -

Manual Line Breaks

End a line with two or more spaces:

Roses are red,   
Violets are blue.

Fenced Code Blocks

Code blocks delimited by 3 or more backticks or tildas:

```
This is a preformatted
code block
```

Header IDs

Set the id of headings with {#<id>} at end of heading line:

## My Heading {#myheading}

Tables

Fruit    |Color
---------|----------
Apples   |Red
Pears	 |Green
Bananas  |Yellow

Definition Lists

Term 1
: Definition 1
Term 2
: Definition 2

Footnotes

Body text with a footnote [^1]
[^1]: Footnote text here

Abbreviations

MDD <- will have title
*[MDD]: MarkdownDeep

Oren Eini

Oren Eini

CEO of RavenDB