Ayende @ Rahien

So, how does this work on Linux?

For the past few days I have been talking about our findings with regard to creating an ACID storage solution. I've mostly been focusing on how it works on Windows, using Windows-specific terms and APIs.

The problem is that I am not sure if those are still relevant when we talk about Linux. I know that fsync perf is still an issue (if only because both Windows and Linux are running on the same hardware). But would the same solutions apply?

For example, the nearest that I can find to FILE_FLAG_NO_BUFFERING is O_DIRECT, and FILE_FLAG_WRITE_THROUGH appears to be similar to O_SYNC. But I am not sure whether they actually behave in the same fashion.

Any ideas? Does anyone have something like Process Monitor for Linux, and can you look at the actual commit behavior of industry-grade databases?

From my exploring, it appears that PostgreSQL uses fdatasync() as the default approach, but it can use O_DIRECT and O_DSYNC as well, so that is promising. But I would like to have someone who actually knows Linux intimately tell me if I am even heading in the right direction.
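To make the two approaches concrete, here is a minimal sketch in Python (Linux only) of a commit that writes and then calls fdatasync(), next to one that opens the file with O_DSYNC so each write() is already synchronous. The function names and the append-only log file are my own illustration, not how PostgreSQL actually implements its WAL.

```python
import os


def write_all(fd, data):
    # os.write may write fewer bytes than requested, so loop until done.
    view = memoryview(data)
    while view:
        written = os.write(fd, view)
        view = view[written:]


def commit_with_fdatasync(path, record):
    # Buffered write followed by an explicit flush of the file data.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        write_all(fd, record)
        os.fdatasync(fd)  # returns once the data has been handed to stable storage
    finally:
        os.close(fd)


def commit_with_o_dsync(path, record):
    # With O_DSYNC, each write() returns only after the data is flushed,
    # so no separate fdatasync() call is needed.
    flags = os.O_WRONLY | os.O_CREAT | os.O_APPEND | os.O_DSYNC
    fd = os.open(path, flags, 0o644)
    try:
        write_all(fd, record)
    finally:
        os.close(fd)
```

Which of the two is faster depends on the file system, the hardware and the write pattern, so this is something to measure rather than assume.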

Posted By: Ayende Rahien

Comments

11/25/2013 10:17 AM by Greg Young

It's a bit more involved. Which file system are you using? Ext3?

Other things: do you have noatime and nodiratime set? The file systems in Linux are far more varied and configurable than in Windows. In general, though, O_DIRECT is what you are looking for.

Cheers,

Greg

11/25/2013 02:15 PM by Howard Chu

Use strace to see what system calls are issued on Linux.

11/25/2013 02:22 PM by OmariO

You may be interested in how Sybase does it:

http://www.sybase.com/content/1043413/DirectIO-082906-wp.pdf

11/25/2013 06:31 PM by Or

fio is a very nice tool for read/write benchmarks, and has many 'engines', which are just methods of writing to the disk (sync, libaio, mmap, ...).

http://git.kernel.dk/?p=fio.git;a=tree

11/27/2013 12:11 PM by Ants Aasma

Check out pg_test_fsync in the PostgreSQL contrib modules:

http://www.postgresql.org/docs/devel/static/pgtestfsync.html

In general, if you can arrange for the data to be written with single write() calls, use O_DSYNC. Add in O_DIRECT if you know that your writes are aligned and you won't need to read the data afterwards (e.g. in PostgreSQL, replication and WAL archiving get the WAL by reading it back from the OS).
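As a rough illustration of that advice, the sketch below (Python, Linux only) writes one block with a single call on a file opened with O_DSYNC, and optionally adds O_DIRECT. The 4096-byte block size, the `write_block` name and the zero padding are assumptions for the example. The real alignment requirement depends on the device and file system, and some file systems reject O_DIRECT outright.

```python
import mmap
import os

BLOCK = 4096  # assumed alignment and block size; check your device and file system


def write_block(path, payload, use_direct=False):
    """Write payload, zero-padded to one block, at offset 0 with a single call."""
    if len(payload) > BLOCK:
        raise ValueError("payload does not fit in one block")

    # Anonymous mmap memory is page-aligned, which gives O_DIRECT the
    # aligned buffer it expects. Unused bytes stay zero.
    buf = mmap.mmap(-1, BLOCK)
    buf[:len(payload)] = payload

    flags = os.O_WRONLY | os.O_CREAT | os.O_DSYNC
    if use_direct:
        flags |= os.O_DIRECT  # bypass the page cache; may fail with EINVAL
    fd = os.open(path, flags, 0o644)
    try:
        written = os.pwrite(fd, buf, 0)
        if written != BLOCK:
            raise OSError("short write: %d of %d bytes" % (written, BLOCK))
    finally:
        os.close(fd)
        buf.close()
```

Calling `write_block(path, data, use_direct=True)` raises OSError where O_DIRECT is not supported, so a caller would probably want to catch that and fall back to the buffered O_DSYNC path.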

Beware that with kernel <2.6.33 or glibc <2.12, O_SYNC actually means O_DSYNC. You need explicit fsyncs for metadata operations like creating new files.
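For the metadata point, a common pattern on Linux is to fsync the new file and then fsync the directory that contains it, so the directory entry itself is durable. The sketch below shows that pattern in Python. The `create_durably` name is hypothetical, and whether the directory fsync is strictly required depends on the file system and its mount options.

```python
import os


def create_durably(path, data):
    """Create a new file, then flush both the file and its directory entry."""
    directory = os.path.dirname(os.path.abspath(path))

    # O_EXCL makes this fail if the file already exists.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    try:
        view = memoryview(data)
        while view:  # os.write may write fewer bytes than requested
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)  # flush file data and the file's own metadata
    finally:
        os.close(fd)

    # Flush the parent directory so the new name survives a crash.
    dir_fd = os.open(directory, os.O_RDONLY | os.O_DIRECTORY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```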
