A different sort of cross platform bug
During the move of RavenDB to Linux we had to deal with the usual share of cross platform bugs. The common ones are things like file system case sensitivity, concatenating paths with \ separators, drive letters, etc. The usual stuff.
Note that the hairy stuff (different threading model, different I/O semantics, etc) I’m not counting, we already had dealt with them before.
The real problems had to do with the file systems, in particular, ext3 doesn’t support fallocate, which we kind of wanna use to allocate our disk space in advance (which can give the file system information so it can allocate all of that together). But that was pretty straightforward. ENOSUP is clear enough for me, and we could just use ext4, which works.
But this bug was special. To start with, it snuck in and lain there dormant for quite a while, and it was a true puzzle. Put simply, a certain database would work just fine as long as we didn’t restart it. On restart, some of its indexes would be in an error state.
It took a while to figure out what was going on. At first, we thought that there was some I/O error, but the issue turned out to be much simpler. We assumed that we would be reading indexes in order, in fact, our code output the index directories in such a way that they would be lexically sorted. That worked quite well, and had the nice benefit that on NTFS, we would always read the indexes in order.
But on Linux, there is no guaranteed order, and on one particular machine, we indeed hit the out of order issue. And in that case, we had an error raised because we were trying to create an index whose id was too low. Perfectly understandable, when you realize what is going on, and trivial to fix, but quite hard to figure out, because we kept assuming that we did something horrible to the I/O system.
Comments
I love happy endings to horror stories like this. For those of us venturing out of our Windows shell (all puns intended), this is very important information.
Comment preview