Ayende @ Rahien

http://ayende.com
Copyright (C) Ayende Rahien 2004 - 2021 (c) 2026

zihotki commented on Building data stores – Append Only

Another good point of an append-only data store is that you don't have to lock the data store or shut it down to do a backup. Backups are really fast and cheap. Thanks a lot for the series and for pointing me to very interesting topics.

http://ayende.com/4542/building-data-stores-append-only#comment10
Tue, 13 Jul 2010 11:37:08 GMT

Ayende Rahien commented on Building data stores – Append Only

OmariO, SSDs mean that random reads are _much_ faster, but random writes are still more expensive than sequential writes.

http://ayende.com/4542/building-data-stores-append-only#comment9
Wed, 23 Jun 2010 10:29:26 GMT

OmariO commented on Building data stores – Append Only

Have SSDs changed anything?

http://ayende.com/4542/building-data-stores-append-only#comment8
Tue, 22 Jun 2010 23:56:41 GMT

Ajai Shankar commented on Building data stores – Append Only

CouchDB has great documentation; just linking to the free online book:

[http://books.couchdb.org/relax/appendix/btrees](http://books.couchdb.org/relax/appendix/btrees)
[http://books.couchdb.org/relax/](http://books.couchdb.org/relax/)

It just helps to appreciate how RavenDB brings document + search so very nicely packaged for .NET developers...

Ajai

http://ayende.com/4542/building-data-stores-append-only#comment7
Sun, 20 Jun 2010 18:41:54 GMT

Ayende Rahien commented on Building data stores – Append Only

Frans, in Windows you can call FlushFileBuffers to ensure that the data is on the disk. Without that facility, you couldn't build databases on Windows.
Yes, in a multi-application system, it is entirely possible that two different applications would queue different requests to different parts of the disk at the same time. This reduces the benefit of seekless writes, but doesn't eliminate it.

Deletes are done by removing the data from the index and writing the new metadata. Yes, that wastes space, and append-only data stores usually handle that by running a compaction process every now and then.

http://ayende.com/4542/building-data-stores-append-only#comment6
Sat, 19 Jun 2010 10:06:45 GMT

Frans Bouma commented on Building data stores – Append Only

In Windows, disk IO is abstracted to a level where you'll never be able to say "this byte is now physically stored on disk at this very moment". You leave that to Windows, to the HDD low-level subsystem, and to the HDD hardware and command queue. So this comes down to bundling commands to the HDD in such a way that the disk head steps as little as possible due to YOUR actions. You can never anticipate another process getting the CPU and causing a disk step as well, and you shouldn't; that's what the low-level subsystems are for. So you bundle as many commands as possible in such a way that disk stepping is avoided as much as possible _inside your bundle_, and that's what you can do, nothing more. What Andrew suggests seems impossible to me due to the nature of how Windows handles IO.

About the article: what if you delete elements from the data? RDBMSs use their own filesystem inside files, using pages (which are typically the size of a disk block or smaller, but fit in a disk block without being split up), and need some form of reuse to avoid fragmentation. Appending to a file ultimately runs you into the situation where deletes fragment the file a lot. Unless your db is insert-only, which isn't what you're striving for, IMHO.
http://ayende.com/4542/building-data-stores-append-only#comment5
Sat, 19 Jun 2010 09:39:48 GMT

Ayende Rahien commented on Building data stores – Append Only

Uriel, writes to a single sector are supposed to be atomic. The problem is that hard drives don't really use sectors anymore. I find it safest not to rely on that, and to assume that writes aren't really atomic at the hard drive level until fsync is called.

http://ayende.com/4542/building-data-stores-append-only#comment4
Fri, 18 Jun 2010 21:40:18 GMT

Uriel Katz commented on Building data stores – Append Only

Isn't a single write/flush atomic? What I mean is: if you put your whole record in a buffer, then write it to a file and flush, is it written as a whole or not? If yes, you can have a sequential id (those can be made thread safe with CAS operations) in each record of the append-only file, and when you read backwards you check whether you got the biggest id.

http://ayende.com/4542/building-data-stores-append-only#comment3
Fri, 18 Jun 2010 17:52:30 GMT

Ayende Rahien commented on Building data stores – Append Only

Andrew, yeah, that is a nice optimization, but I consider the downside to be pretty bad.

http://ayende.com/4542/building-data-stores-append-only#comment2
Fri, 18 Jun 2010 16:28:39 GMT

Andrew commented on Building data stores – Append Only

Good to see you posting again. I'm not sure if it's appropriate for your application, but there is an optimization you can use to do bulk writes instead of one write at a time, to improve write performance/concurrency.
When a transaction calls Commit(), instead of taking the only writer, writing that transaction to disk, and then releasing the writer (causing other transactions to repeat this step in sequence in a blocking fashion, waiting for IO to complete), you could do async or bulk writes. This technique has been used before on the InnoDB database by having a Prepare() method simply reserve the amount of space needed in the data file, advancing the 'next write position' pointer by the size of the data to be written (so the next transaction can start after our reserved space). Then the actual Commit() method does the writing in one hit (e.g. several transactions are written to disk at once).

Of course, this means you lose transaction safety: when Commit() returns after the client requests a commit, the data may still be in memory, pending a bulk write.

http://ayende.com/4542/building-data-stores-append-only#comment1
Fri, 18 Jun 2010 16:20:17 GMT
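
For readers who want to see the shape of the reserve-then-bulk-write idea from Andrew's comment, here is a minimal Python sketch. It is not how InnoDB or RavenDB implement it; the class `GroupCommitLog` and its `prepare`, `commit`, and `flush` methods are hypothetical names made up for illustration, and `os.fsync` stands in for the flush call discussed in the thread. It assumes each transaction writes exactly the number of bytes it reserved.

```python
import os
import tempfile
import threading


class GroupCommitLog:
    """Hypothetical append-only file: reserve space first, write in bulk later."""

    def __init__(self, path):
        self._file = open(path, "wb")
        self._lock = threading.Lock()
        self._next_pos = 0   # the 'next write position' pointer
        self._pending = []   # (offset, data) pairs that are not on disk yet

    def prepare(self, size):
        """Reserve `size` bytes and return the offset where they will live."""
        with self._lock:
            offset = self._next_pos
            self._next_pos += size
            return offset

    def commit(self, offset, data):
        """Queue the write. The data is NOT durable until flush() has run."""
        with self._lock:
            self._pending.append((offset, data))

    def flush(self):
        """Write every queued transaction, then pay for a single fsync."""
        with self._lock:
            batch, self._pending = sorted(self._pending), []
            for offset, data in batch:
                self._file.seek(offset)
                self._file.write(data)
            self._file.flush()
            os.fsync(self._file.fileno())
            return len(batch)

    def close(self):
        self.flush()
        self._file.close()


# Demo: two transactions reserve space, commit out of order, and share one fsync.
path = os.path.join(tempfile.mkdtemp(), "data.log")
log = GroupCommitLog(path)
first = log.prepare(5)
second = log.prepare(5)
log.commit(second, b"world")
log.commit(first, b"hello")
flushed = log.flush()
log.close()
```

The sketch also makes the downside visible: `commit` returns before anything has reached the disk, so a crash between `commit` and `flush` loses a transaction the caller believes was committed.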