re: How memory mapped files, filesystems and cloud storage works


Kelly has an interesting post about memory mapped files and the cloud. This is in response to a comment on my post where I stated that we don’t reserve space up front in Voron because we support cloud providers that charge by the amount of storage used.

From Kelly’s post, I assume she is thinking about running it herself on her own cloud instances, and that is what her pricing indicates. Indeed, if you want to get a 100GB cloud disk from pretty much anywhere, you’ll pay for the full 100GB disk from day 1. But that isn’t the scenario that I actually had in mind.

I was thinking about the cloud providers. Imagine that you want to go to RavenHQ and get a db there. You sign up for a 2 GB plan, and all is great. Except that on the very first write, we allocate a fixed 10 GB, and you start paying overage charges. This isn’t a cost you pay when you run on your own hardware. This is what you would have to deal with as a cloud DBaaS provider, and as a consumer of such a service.

That aside, let me deal a bit with the issues of memory mapped files & sparse files. I created 6 sparse files, each of them 128GB in size, on my E drive.

As you can see, this is a 300GB disk, but I just “allocated” 640GB of space in it.


This also shows that there has been no reservation of space on the disk. In fact, it is entirely possible to create files that are entirely too big for the volume they are located on.
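To make the "no reservation" point concrete, here is a minimal sketch in Python. It is not the test I ran above: it assumes a Unix-like system, where extending a file with `truncate()` normally produces a sparse file, and the helper name `sparse_file_sizes` is made up for illustration. (On NTFS, a file has to be explicitly marked as sparse first, which this sketch does not do.) It compares the size the file claims to have with the space actually allocated for it.

```python
import os
import tempfile


def sparse_file_sizes(path, apparent_size):
    """Create a sparse file and return (apparent size, bytes actually allocated)."""
    with open(path, "wb") as f:
        # Extending a file with truncate() writes no data, so on most Unix
        # filesystems no blocks are allocated for the new range.
        f.truncate(apparent_size)
    st = os.stat(path)
    # st_blocks is reported in 512-byte units (Unix only).
    return st.st_size, st.st_blocks * 512


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        size, allocated = sparse_file_sizes(os.path.join(d, "sparse.bin"), 1 << 30)
        print(f"apparent: {size} bytes, allocated: {allocated} bytes")
```

The gap between the two numbers is exactly the space the filesystem has promised but not yet reserved.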


I did a lot of testing with mmap files & sparseness, and I came to the conclusion that you can’t trust it. You especially can’t trust it in a cloud scenario.

But why? Well, imagine the scenario where you need to use a new page, and the FS needs to allocate one for you. At this point, it needs to find an available page. That might fail. Let us imagine that it fails because there is no free space, since that is the easiest case.

What happens then? Well, you aren’t accessing things via an API, so there isn’t an error code it can return, or an exception to be thrown.

In Windows, it will use Structured Exception Handling to raise the error. In Linux, that will probably generate a SIGXXX signal (most likely SIGBUS). Now, to make things interesting, this may not actually happen when you are writing to the newly reserved page; it may be deferred by the OS to a later point in time (or to when you call msync / FlushViewOfFile). At any rate, that means that at some point the OS is going to wake up and realize that it promised something it can’t deliver, and at that point (which, again, may be later than the point you actually wrote to that page) you are going to find yourself in a very interesting situation. I’ve actually tested that scenario, and it isn’t a good one from the point of view of reliability. You really don’t want to get there, because then all bets are off with regards to what happens to the data you wrote. And you can’t even do graceful error handling at that point, because you might already be past the point where the write happened.
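For contrast, here is a sketch of the alternative: ask for the space through an API call before touching the mapping, so that a full disk comes back as an ordinary error. This is an illustration under stated assumptions, not a description of how Voron does it. It assumes a Unix-like system where `os.posix_fallocate` is available, and `grow_and_map` is a hypothetical helper name. The reservation is only as strong as the filesystem makes it; copy-on-write filesystems in particular may still fail later.

```python
import mmap
import os
import tempfile


def grow_and_map(fd, new_size):
    """Ask the OS to allocate space for [0, new_size), then map the file.

    posix_fallocate() raises OSError (for example ENOSPC) if the space cannot
    be allocated, so a full disk is reported here, as a normal error, before
    any mapped page is touched.
    """
    os.posix_fallocate(fd, 0, new_size)
    return mmap.mmap(fd, new_size)  # shared, read/write mapping by default


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        fd = os.open(os.path.join(d, "data.bin"), os.O_RDWR | os.O_CREAT, 0o600)
        try:
            view = grow_and_map(fd, 1 << 20)
        except OSError as e:
            # The failure arrives where it can be handled gracefully.
            print("could not reserve space:", e)
        else:
            view[:5] = b"hello"
            view.flush()  # msync: push the dirty page to the file
            view.close()
        finally:
            os.close(fd)
```

The trade-off is the one this post started with: the space is paid for as soon as it is allocated, which is why growing the file in small steps matters when storage is billed by usage.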

Considering the fact that disk full is one of those things that you really need to be aware of, you can’t really trust this intersection of features.
