Implementing a file pager in Zig: What do we need?
A file pager is the component in a database system that is responsible for reading pages (typically 8KB blocks) from the file system and writing them back. The pager handles the I/O operations and is crucial for the overall performance of the system. Ideally, it should manage details such as caching pages in memory, reducing I/O costs and continuously optimizing the overall behavior of the storage.
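That can be a pretty big chunk of a storage system, and it can have a significant impact on the way the storage system behaves. Here is the most basic version that I can think of: a Pager that exposes tryGet(), getBlocking(), release(), write() and sync(), all working in terms of a Page.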
The idea is that whenever you need a particular page, you ask for it using tryGet(), which returns the page if it is already in memory and never blocks. You can call getBlocking() to force the current thread to wait for the page to be in memory. That allows the calling code to perform some really nice optimizations.
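One optimization this split makes possible is to handle whatever is already in memory first and only then wait for the rest. The following is a minimal sketch of that calling pattern, not the real pager: `FakePager`, `resident_below`, `blocking_loads` and `visitAll` are hypothetical names invented for the illustration, a "page" is reduced to its number, and tryGet() is assumed to only report residency.

```zig
const std = @import("std");

/// Hypothetical stand-in for the real pager: pages numbered below
/// `resident_below` count as already being in memory.
const FakePager = struct {
    resident_below: usize,
    blocking_loads: usize = 0,

    /// Non-blocking: returns the page (here just its number) or null.
    fn tryGet(self: *FakePager, number: usize) ?usize {
        if (number < self.resident_below) return number;
        return null;
    }

    /// Stands in for a call that would wait on I/O.
    fn getBlocking(self: *FakePager, number: usize) usize {
        self.blocking_loads += 1;
        return number;
    }
};

/// Visits every wanted page, handling resident pages first and deferring
/// the ones that would block. `pending` is caller-provided scratch space
/// that must be at least as long as `wanted`.
fn visitAll(pager: *FakePager, wanted: []const usize, pending: []usize) usize {
    var visited: usize = 0;
    var pending_len: usize = 0;

    // First pass: only touch pages that are already in memory.
    for (wanted) |number| {
        if (pager.tryGet(number) != null) {
            visited += 1; // real code would process the page here
        } else {
            pending[pending_len] = number;
            pending_len += 1;
        }
    }

    // Second pass: now pay the I/O wait for the pages that were missing.
    for (pending[0..pending_len]) |number| {
        _ = pager.getBlocking(number);
        visited += 1;
    }
    return visited;
}
```

With four wanted pages of which two are resident, this performs two blocking loads, and both happen only after the resident pages have been handled.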
Once we have the page, the Pager is charged with keeping it in memory until we release it. Note that I’m talking about a Page, but that might actually contain multiple sequential pages. The release() call tells the Pager that the memory is no longer in active use; the Pager may then decide to do something about that.
Finally, we have the write() method, which will write the data from the in-memory page to storage, and the sync() method, which will ensure that all previous writes are durable to disk.
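To make the shape of the API described above concrete, here is a toy sketch of the five operations. It is an illustration under loud assumptions rather than the actual implementation: the type name `ToyPager`, the field names, the parameter lists and the `error.OutOfRange` error are all invented, the "file" is just a byte slice in memory, and there is no concurrency handling or eviction at all.

```zig
const std = @import("std");

pub const PAGE_SIZE: usize = 8192; // 8KB pages, as in the text

/// One or more sequential pages held in memory.
pub const Page = struct {
    number: usize, // first page number
    count: usize, // how many sequential pages `buffer` covers
    buffer: []u8,
};

/// Toy pager (hypothetical): `storage` stands in for the file on disk.
pub const ToyPager = struct {
    allocator: std.mem.Allocator,
    storage: []u8,
    cache: std.AutoHashMap(usize, Page),

    pub fn init(allocator: std.mem.Allocator, storage: []u8) ToyPager {
        return ToyPager{
            .allocator = allocator,
            .storage = storage,
            .cache = std.AutoHashMap(usize, Page).init(allocator),
        };
    }

    pub fn deinit(self: *ToyPager) void {
        var it = self.cache.valueIterator();
        while (it.next()) |page| self.allocator.free(page.buffer);
        self.cache.deinit();
    }

    /// Returns the page only if it is already in memory; never loads.
    pub fn tryGet(self: *ToyPager, number: usize) ?Page {
        return self.cache.get(number);
    }

    /// Loads the page(s) if needed. A real pager would wait on I/O here.
    /// Toy simplification: a cached entry is returned regardless of `count`.
    pub fn getBlocking(self: *ToyPager, number: usize, count: usize) !Page {
        if (self.cache.get(number)) |page| return page;
        const start = number * PAGE_SIZE;
        const end = start + count * PAGE_SIZE;
        if (end > self.storage.len) return error.OutOfRange;
        const buffer = try self.allocator.dupe(u8, self.storage[start..end]);
        errdefer self.allocator.free(buffer);
        const page = Page{ .number = number, .count = count, .buffer = buffer };
        try self.cache.put(number, page);
        return page;
    }

    /// The toy never evicts, so releasing a page is a no-op.
    pub fn release(self: *ToyPager, page: Page) void {
        _ = self;
        _ = page;
    }

    /// Copies the in-memory page back to the backing storage.
    pub fn write(self: *ToyPager, page: Page) void {
        const start = page.number * PAGE_SIZE;
        var i: usize = 0;
        while (i < page.buffer.len) : (i += 1) {
            self.storage[start + i] = page.buffer[i];
        }
    }

    /// Nothing to flush for an in-memory slice; a real pager would fsync.
    pub fn sync(self: *ToyPager) void {
        _ = self;
    }
};
```

Because the cache hands back the same buffer for as long as the entry lives, repeated calls for a page see the same memory location. A real pager would also have to decide what release() does with that memory, which this sketch deliberately ignores.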
There aren’t that many moving pieces, right? Note in particular that we don’t have the notion of transactions here; this is lower level than that. This API has the following properties:
- The same page will always be represented in memory by the same location. However, if we release and get the page again, it may move.
- The methods tryGet(), getBlocking() and release() have no locking or threading limits. You may call them in any context and the Pager will deal with any concurrency internally.
- The write() and sync() calls, on the other hand, require synchronization by the client. There can be no concurrency between the two.
With that in place, we can build quite a sophisticated storage system. But we’ll focus on how the pager works for now.
There are a bunch of ways to implement this, so I’ll have at least a couple of posts on the topic. How would you approach implementing this?
More posts in "Implementing a file pager in Zig" series:
- (24 Jan 2022) Pages, buffers and metadata, oh my!
- (21 Jan 2022) Write behind implementation
- (19 Jan 2022) Write behind policies
- (18 Jan 2022) Write durability and concurrency
- (17 Jan 2022) Writing data
- (12 Jan 2022) Reclaiming memory
- (11 Jan 2022) Reading from the disk
- (10 Jan 2022) Managing the list of files
- (05 Jan 2022) Reading & Writing from the disk
- (04 Jan 2022) Rethinking my approach
- (28 Dec 2021) Managing chunk metadata
- (27 Dec 2021) Overall design
- (24 Dec 2021) Using mmap
- (23 Dec 2021) What do we need?
The semantics of tryGet are a bit unclear, since from a developer's point of view it could mean either "get only if already loaded" or "prefetch". IMHO it would be better if those were two different functions. Another thing is that the Page struct is not actually a page but any number of pages. It would make much more sense if it used a scatter-gather-like struct instead.