Production post mortem: The opener of files and destroyer of peace
A customer opened a support call about a recurring problem that they had in their system.
System.InvalidOperationException: Errno: 1224='The requested operation cannot be performed on a file with a user-mapped section open. ' (rc=0) - 'Failed to rvn_write_header 'headers.one', reason : FailOpenFile'. FailCode=FailOpenFile. at Sparrow.Server.Platform.PalHelper.ThrowLastError(FailCodes rc, Int32 lastError, String msg) in C:\Builds\RavenDB-Stable-4.2\42040\src\Sparrow.Server\Platform\PalHelper.cs:line 35 at Voron.StorageEnvironmentOptions.DirectoryStorageEnvironmentOptions.WriteHeader(String filename, FileHeader* header) in C:\Builds\RavenDB-Stable-4.2\42040\src\Voron\StorageEnvironmentOptions.cs:line 706 at Voron.Impl.FileHeaders.HeaderAccessor.Modify(ModifyHeaderAction modifyAction) in C:\Builds\RavenDB-Stable-4.2\42040\src\Voron\Impl\FileHeaders\HeaderAccessor.cs:line 200 at Voron.Impl.Journal.WriteAheadJournal.JournalApplicator.SyncOperation.UpdateDatabaseStateAfterSync() in C:\Builds\RavenDB-Stable-4.2\42040\src\Voron\Impl\Journal\WriteAheadJournal.cs:line 998 at Voron.Impl.Journal.WriteAheadJournal.JournalApplicator.LockTaskResponsible.RunTaskIfNotAlreadyRan() in C:\Builds\RavenDB-Stable-4.2\42040\src\Voron\Impl\Journal\WriteAheadJournal.cs:line 1192
And the code that failed was:
Based on that information, the issue is obvious. Someone is poking their nose into RavenDB’s files. It is important to understand that this will cause the I/O operation to fail. And since this is part of a critical path in managing the file, it will force the entire database to unload.
We asked the customer to make sure that all indexing services, anti virus programs and the like would be instructed to ignore the RavenDB directories and were assured that this is already the case.
That led us to a problem. If no one else is touching our files, is it possible that it is our own code that is causing the issue?
8:13:31.4865562 AM MsMpEng.exe 3112 CreateFileMapping D:\RavenDB\Server\RavenData\System\headers.one FILE LOCKED WITH WRITERS SyncType: SyncTypeCreateSection, PageProtection: PAGE_EXECUTE
We had our culprit, Windows Defender would mess around in our files, causing some I/O operations to fail. The customer added RavenDB’s directories to Windows Defender’s excluded list as well, and that should be the end of things.
Overall, this is a short issue, but it was only short because we were able to prove what other process was causing us issues.
why would any antivirus or indexer need a write lock?
I didn't write it, no clue.
That said, maybe they want to make sure that the examine a stable version (with no modifications) of the file to verify if this is a good / bad item?
Not unlike the COVID-19 measures, the troubles antivirus programs cause are way more visible than the throuble they prevent. They work so well, that I have had four times more issues with antivirus (all brands) than with actual viruses in our deployments.