﻿<?xml version="1.0" encoding="utf-8"?><rss version="2.0"><channel><title>Ayende @ Rahien</title><link>http://ayende.com</link><description>Ayende @ Rahien</description><copyright>Copyright (C) Ayende Rahien  2004 - 2021 (c) 2026</copyright><ttl>60</ttl><item><title>Torvin commented on Designing RavenFS</title><description>I personally think
  
GET /path/to/file/metadata
  
is better than
  
GET /metadata/path/to/file
  
because '/path/to/file' can't be both a file and a folder on most filesystems, so we won't confuse 'metadata' with a file. And it looks nicer that way :3
</description><link>http://ayende.com/4828/designing-ravenfs#comment60</link><guid>http://ayende.com/4828/designing-ravenfs#comment60</guid><pubDate>Thu, 05 May 2011 16:25:19 GMT</pubDate></item><item><title>Ayende Rahien commented on Designing RavenFS</title><description>Aaron,
  
Yes, that will be supported.
</description><link>http://ayende.com/4828/designing-ravenfs#comment59</link><guid>http://ayende.com/4828/designing-ravenfs#comment59</guid><pubDate>Wed, 04 May 2011 13:13:01 GMT</pubDate></item><item><title>Aaron Olson commented on Designing RavenFS</title><description>What kind of streaming scenarios will RavenFS support? In particular, could one client be streaming a file into RavenFS while another client reads the same file simultaneously?
  
  
Thanks!
</description><link>http://ayende.com/4828/designing-ravenfs#comment58</link><guid>http://ayende.com/4828/designing-ravenfs#comment58</guid><pubDate>Wed, 04 May 2011 12:57:30 GMT</pubDate></item><item><title>Ayende Rahien commented on Designing RavenFS</title><description>Jalchr,
  
Hibernating Torrents:
  
[ayende.com/.../...-torrent-server-for-Windows.aspx](http://ayende.com/Blog/archive/2008/04/08/Hibernating-Torrent-A-simple-torrent-server-for-Windows.aspx)</description><link>http://ayende.com/4828/designing-ravenfs#comment57</link><guid>http://ayende.com/4828/designing-ravenfs#comment57</guid><pubDate>Wed, 04 May 2011 09:00:03 GMT</pubDate></item><item><title>jalchr commented on Designing RavenFS</title><description>MonoTorrent (C#)
  
A cross platform open source .NET Framework based BitTorrent Client written in C#. MonoTorrent is a cross platform and open source implementation of the BitTorrent protocol. It supports many advanced features such as Encryption, DHT, Peer Exchange, Web Seeding and Magnet Links. Frontends: Curses TUI, Gtk GUI, WinForms GUI
  
  
Homepage: 
[projects.qnetp.net/projects/show/monotorrent](http://projects.qnetp.net/projects/show/monotorrent)  
Source: 
[anonsvn.mono-project.com/viewvc/trunk/bitsharp/](http://anonsvn.mono-project.com/viewvc/trunk/bitsharp/)  
WinForms GUI homepage: 
[http://code.google.com/p/monotorrent/](http://code.google.com/p/monotorrent/)  
Blog: 
[http://monotorrent.blogspot.com/](http://monotorrent.blogspot.com/)  
  
Monsoon (C#)
  
Monsoon Project is a GTK+ BitTorrent client based on C# and MonoTorrent.
</description><link>http://ayende.com/4828/designing-ravenfs#comment56</link><guid>http://ayende.com/4828/designing-ravenfs#comment56</guid><pubDate>Wed, 04 May 2011 08:58:52 GMT</pubDate></item><item><title>Ayende Rahien commented on Designing RavenFS</title><description>Jalchr,
  
I am looking at those as well, but the way we are expecting to do things is quite different.
  
For example, eMule / BitTorrent assume that the file doesn't exist on the other end, while I assume that it most probably does.
</description><link>http://ayende.com/4828/designing-ravenfs#comment55</link><guid>http://ayende.com/4828/designing-ravenfs#comment55</guid><pubDate>Wed, 04 May 2011 08:57:13 GMT</pubDate></item><item><title>jalchr commented on Designing RavenFS</title><description>I like the idea, but I do think it is already implemented in peer-to-peer applications like emule and bittorrent ... 
  
  
[http://sourceforge.net/projects/emule/](http://sourceforge.net/projects/emule/)  
  
They utilize HTTP and handle GBs of files ... over a very unreliable medium of communication.
  
I'm sure you can make it better ... 
</description><link>http://ayende.com/4828/designing-ravenfs#comment54</link><guid>http://ayende.com/4828/designing-ravenfs#comment54</guid><pubDate>Wed, 04 May 2011 08:39:32 GMT</pubDate></item><item><title>Ayende Rahien commented on Designing RavenFS</title><description>Francisco,
  
Random access will be supported, yes.
</description><link>http://ayende.com/4828/designing-ravenfs#comment53</link><guid>http://ayende.com/4828/designing-ravenfs#comment53</guid><pubDate>Tue, 03 May 2011 04:47:55 GMT</pubDate></item><item><title>Ayende Rahien commented on Designing RavenFS</title><description>Jason,
  
That is pretty much my point.
</description><link>http://ayende.com/4828/designing-ravenfs#comment52</link><guid>http://ayende.com/4828/designing-ravenfs#comment52</guid><pubDate>Tue, 03 May 2011 04:47:30 GMT</pubDate></item><item><title>Francisco A. Lozano commented on Designing RavenFS</title><description>What about random access? both read/write, either for modifications or for appends... I've been looking for a solution to this without much success
</description><link>http://ayende.com/4828/designing-ravenfs#comment51</link><guid>http://ayende.com/4828/designing-ravenfs#comment51</guid><pubDate>Mon, 02 May 2011 23:39:35 GMT</pubDate></item><item><title>Jason Hurdlow commented on Designing RavenFS</title><description>Video files are actually a bad example. When editing video, you don't typically edit the original video files at all, but rather build a set of edits &amp; effects (saved in a scene file of some sort) that are then rendered to a new file. So once you had a copy of the large video files they could be read-only without issue. No video editor in their right mind would overwrite or modify their original footage files. The rendered output might change though, and that could be applicable here.
</description><link>http://ayende.com/4828/designing-ravenfs#comment50</link><guid>http://ayende.com/4828/designing-ravenfs#comment50</guid><pubDate>Mon, 02 May 2011 23:32:10 GMT</pubDate></item><item><title>Jimmy Shimizu commented on Designing RavenFS</title><description>Interesting project. I have been looking into similar systems myself and just wanted to tip you about 
[http://www.xtreemfs.org/](http://www.xtreemfs.org/) which might incorporate some of the features you are aiming for. 
</description><link>http://ayende.com/4828/designing-ravenfs#comment49</link><guid>http://ayende.com/4828/designing-ravenfs#comment49</guid><pubDate>Mon, 02 May 2011 11:52:26 GMT</pubDate></item><item><title>Ayende Rahien commented on Designing RavenFS</title><description>Joshka,
  
Then I might have got it wrong, because my understanding was that the sender sent the hashes for each fixed output, and the receiver then sent back the matches.
</description><link>http://ayende.com/4828/designing-ravenfs#comment48</link><guid>http://ayende.com/4828/designing-ravenfs#comment48</guid><pubDate>Mon, 02 May 2011 08:25:47 GMT</pubDate></item><item><title>Joshka commented on Designing RavenFS</title><description>Ayende,
  
I'm missing something here. What I wrote above is a pictorial example of the algorithm on p52. In the article, 'A' is the sender and 'B' is the receiver.
</description><link>http://ayende.com/4828/designing-ravenfs#comment47</link><guid>http://ayende.com/4828/designing-ravenfs#comment47</guid><pubDate>Mon, 02 May 2011 08:22:24 GMT</pubDate></item><item><title>Ayende Rahien commented on Designing RavenFS</title><description>Joshka,
  
Yes, that is why I referred to this as reversed rsync, since this is the exact opposite of how it works.
  
In rsync, it is the receiver who does all of the hashing and matching.
</description><link>http://ayende.com/4828/designing-ravenfs#comment46</link><guid>http://ayende.com/4828/designing-ravenfs#comment46</guid><pubDate>Mon, 02 May 2011 08:06:29 GMT</pubDate></item><item><title>Joshka commented on Designing RavenFS</title><description>Here's a simplified example with a block size of 4 (obviously the overhead of the checksums is way too high).
  
  
receiver: ABCD EFGH IJKL
  
Example receiver rolling block checksums:
  
  sum(ABCD)=11 -&gt; Receiver sends 'sum(block 1) is 11'
  
  sum(EFGH)=22 -&gt; Receiver sends 'sum(block 2) is 22'
  
  sum(IJKL)=33 -&gt; Receiver sends 'sum(block 3) is 33'
  
  
sender: 0ABCDEFGHIJKL
  
sender rolling byte checksums:
  
  sum(0ABC) = 00 -&gt; sender sends '0'
  
  sum(ABCD) = 11 -&gt; sender sends ' block 1 match found'
  
  sum(EFGH) = 22 -&gt; sender sends ' block 2 match found'
  
  sum(IJKL) = 33 -&gt; sender sends ' block 3 match found'
  
  
This demonstrates that the receiver only needs to calculate / store / know the block running checksum, not a byte running checksum.
  
The checksums can be calculated during upload time - i.e. a point when you'll have a copy of the file streaming through your machine's memory on its way to persistent storage. You'd need to do this anyway for your proposal.
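
The exchange above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `weak_sum`, `receiver_signatures` and `sender_delta` are hypothetical names, the checksum is a toy byte sum rather than the rolling checksum rsync uses, and the window sum is recomputed at every offset instead of being rolled in constant time.

```python
import hashlib

BLOCK = 4  # block size from the example above


def weak_sum(chunk):
    # Toy checksum (sum of byte values), an assumption for illustration.
    return sum(chunk)


def receiver_signatures(data):
    # The receiver stores one weak checksum and one strong hash per block.
    sigs = {}
    for index, start in enumerate(range(0, len(data), BLOCK), 1):
        block = data[start:start + BLOCK]
        sigs[weak_sum(block)] = (index, hashlib.md5(block).digest())
    return sigs


def sender_delta(data, sigs):
    # The sender slides a window one byte at a time over its own copy.
    out = []
    pos = 0
    while pos != len(data):
        window = data[pos:pos + BLOCK]
        hit = sigs.get(weak_sum(window))
        strong = hashlib.md5(window).digest()
        if len(window) == BLOCK and hit is not None and hit[1] == strong:
            out.append(("match", hit[0]))  # receiver already has this block
            pos += BLOCK
        else:
            out.append(("literal", data[pos:pos + 1]))  # send the raw byte
            pos += 1
    return out


delta = sender_delta(b"0ABCDEFGHIJKL", receiver_signatures(b"ABCDEFGHIJKL"))
```

With these inputs the delta is one literal byte followed by three block matches, which mirrors the walk-through above.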
</description><link>http://ayende.com/4828/designing-ravenfs#comment45</link><guid>http://ayende.com/4828/designing-ravenfs#comment45</guid><pubDate>Mon, 02 May 2011 07:58:23 GMT</pubDate></item><item><title>Ayende Rahien commented on Designing RavenFS</title><description>Joshka,
  
The difference is where this is happening, client side / server side.
  
I am actually thinking of doing a reverse mode, where we ask the server for what it has, then do the change calculation on the client side.
  
Still thinking on this.
</description><link>http://ayende.com/4828/designing-ravenfs#comment44</link><guid>http://ayende.com/4828/designing-ravenfs#comment44</guid><pubDate>Mon, 02 May 2011 06:18:46 GMT</pubDate></item><item><title>Ayende Rahien commented on Designing RavenFS</title><description>Joshka,
  
No, because you need to do a per byte rolling hash to check for the insertion case.
</description><link>http://ayende.com/4828/designing-ravenfs#comment43</link><guid>http://ayende.com/4828/designing-ravenfs#comment43</guid><pubDate>Mon, 02 May 2011 06:17:06 GMT</pubDate></item><item><title>joshka commented on Designing RavenFS</title><description>... becomes invalid for all blocks after the insertion point...
  
  
I don't understand the point about having to read the entire file in this context. In your original proposed solution, an insertion would require a full recalculation of all future block hashes. A store operation would require transmission of all blocks and hence the effective read of the entire file.
</description><link>http://ayende.com/4828/designing-ravenfs#comment42</link><guid>http://ayende.com/4828/designing-ravenfs#comment42</guid><pubDate>Mon, 02 May 2011 02:09:49 GMT</pubDate></item><item><title>joshka commented on Designing RavenFS</title><description>Re caching, I believe it's only necessary for the receiver to store the checksum per block, not per byte. The cache of those bytes only becomes invalid when byte insertion occurs.
</description><link>http://ayende.com/4828/designing-ravenfs#comment41</link><guid>http://ayende.com/4828/designing-ravenfs#comment41</guid><pubDate>Sun, 01 May 2011 22:39:47 GMT</pubDate></item><item><title>Markus commented on Designing RavenFS</title><description>Ayende,
  
Have you evaluated SynchronEX? I haven't looked at what kind of algorithm it uses, but wanted to mention it just in case you haven't yet stumbled upon it. 
[http://www.xellsoft.com/SynchronEX.html](http://www.xellsoft.com/SynchronEX.html)  
  
Thanks for a great blog!
  
</description><link>http://ayende.com/4828/designing-ravenfs#comment40</link><guid>http://ayende.com/4828/designing-ravenfs#comment40</guid><pubDate>Sun, 01 May 2011 21:43:37 GMT</pubDate></item><item><title>Ayende Rahien commented on Designing RavenFS</title><description>Joshka,
  
In order to calculate the rolling hash, you have to read the entire file.
  
If you want to cache the rolling checksum, you would have to store 32 bits for every byte, which is... inadvisable.
  
Without doing this on a byte boundary, you are vulnerable to missing the "one byte changed at beginning of file".
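
To put rough numbers on that cost, here is a small back-of-the-envelope sketch. The 500 MB file size is borrowed as an illustrative figure, and the 4096-byte block size and the function names are assumptions made for the comparison only.

```python
MB = 1024 * 1024


def rolling_cache_bytes(file_size, checksum_bytes=4):
    # One 32-bit checksum cached for every byte offset in the file.
    return file_size * checksum_bytes


def block_cache_bytes(file_size, block_size, checksum_bytes=4):
    # One 32-bit checksum per block, rounding up for a partial last block.
    blocks = -(-file_size // block_size)
    return blocks * checksum_bytes


per_byte = rolling_cache_bytes(500 * MB)       # 2000 MB of cached checksums
per_block = block_cache_bytes(500 * MB, 4096)  # 512000 bytes of cached checksums
```

Under these assumptions the per-byte cache is four times the size of the file itself, while the per-block cache stays around half a megabyte.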
  
  
I read the rsync papers; they are fascinating, but they also detail something with a very high cost to solve the general problem in very large files.
  
  
Show me a solution for a single byte addition in a 500 MB file. I admit that I find the rsync approach both insightful and brilliant, but I don't really see a way to make it work for my case.
  
I would be very happy to be proven wrong, mistaken and stupid.
</description><link>http://ayende.com/4828/designing-ravenfs#comment39</link><guid>http://ayende.com/4828/designing-ravenfs#comment39</guid><pubDate>Sun, 01 May 2011 17:43:53 GMT</pubDate></item><item><title>joshka commented on Designing RavenFS</title><description>Ayende, the rsync paper details algorithms to calculate the best block size. It doesn't require a full file read at destination on every upload, only the blocks where there is a change. Calculating the checksums can be done once, either client or server side. It also uses two "hashes", md5 and a rolling checksum, and handles the 1 byte change problem. In effect, the delta for that 1 byte addition is transferred as that byte plus 25k acks. Your method is 1 byte plus a 100 MB transfer. The paper is worth reading regardless of whether you use the same method. It also mentions potential HTTP implementations and related tools, e.g. rsdiff
  
</description><link>http://ayende.com/4828/designing-ravenfs#comment38</link><guid>http://ayende.com/4828/designing-ravenfs#comment38</guid><pubDate>Sun, 01 May 2011 17:29:46 GMT</pubDate></item><item><title>Thomas Krause commented on Designing RavenFS</title><description>I would suggest looking at BitTorrent a bit more as a protocol option...
  
  
It already does a lot of what you want to achieve, uses HTTP, and supports multiple peers synchronizing with each other very efficiently.
  
  
There is certainly a business model in packaging the protocol into an application which is more suitable for synchronizing servers etc.
</description><link>http://ayende.com/4828/designing-ravenfs#comment37</link><guid>http://ayende.com/4828/designing-ravenfs#comment37</guid><pubDate>Sun, 01 May 2011 15:11:21 GMT</pubDate></item><item><title>Ayende Rahien commented on Designing RavenFS</title><description>Jeff,
  
1) That would be handled, sure. We are thinking about large files, but small ones would be supported.
  
2) I assume you meant writes followed by reads, in which case, the local RavenFS is fully transactional. 
  
3) Replication is more an issue of configuration &amp; reliability, than anything else. Assuming that you have configured RavenFS to replicate to a remote node, it will do so as long as it is able.
  
4) Failover to a remote repository is possible, I guess. We will probably not handle that by default, because the cost of accessing very large files remotely can be very big. But I'll make sure to abstract that to a strategy that you can override.
  
5) Catch up - already handled.
</description><link>http://ayende.com/4828/designing-ravenfs#comment36</link><guid>http://ayende.com/4828/designing-ravenfs#comment36</guid><pubDate>Sun, 01 May 2011 14:46:58 GMT</pubDate></item><item><title>Ayende Rahien commented on Designing RavenFS</title><description>Configurator,
  
That would actually be pretty hard to do. How do you detect this change? 
  
How do you manage the next change?
  
As I said, it is actually a lot of work to do, and would result in pages that are 1 byte long eventually.
  
  
As for video editing software, that really depends on the software and the format. As you noted, saves are painful, so most software moves to an append-mode / fixed-size way of working.
</description><link>http://ayende.com/4828/designing-ravenfs#comment35</link><guid>http://ayende.com/4828/designing-ravenfs#comment35</guid><pubDate>Sun, 01 May 2011 14:43:48 GMT</pubDate></item><item><title>Jeff Lewis  commented on Designing RavenFS</title><description>I've been looking for something like this for some time.  I've said that what I want is S3 that I can host locally.  My requirements are slightly different though, so I'll list them in hopes that some might get included:
  
  
-most of the files I deal with are less than a meg, but can be much larger depending on the attachments.  
  
-must allow immediate consistency.  Reads often immediately follow writes and must be up to date.  When unable to sync between repositories, consistency between repositories can/must be relaxed so that system remains active
  
-multiple copies: I'd like to have 2 copies of files locally and one, or more, thousands of miles away
  
-no single point of failure: if local repos is down, writes and reads go to remote repositories
  
-catch-up: if the connection to remote repositories goes down, when restored, changes should be replicated 
</description><link>http://ayende.com/4828/designing-ravenfs#comment34</link><guid>http://ayende.com/4828/designing-ravenfs#comment34</guid><pubDate>Sun, 01 May 2011 14:27:39 GMT</pubDate></item><item><title>configurator commented on Designing RavenFS</title><description>Ayende, for different page sizes consider this example.
  
(Maximal) page size = 4.
  
File on server: [1,2,3,5], [6,7,8,9], [10,11,12,13]
  
We add a byte, 4.
  
File on server will now be: [1,2,3], [4,5], [6,7,8,9], [10,11,12,13]
  
Note that only the changed page needs to be sent - the last two pages are untouched. However, the pages need to be able to have different sizes.
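
A minimal sketch of that split, assuming pages are plain Python lists and using a hypothetical `insert_value` helper (neither is part of any RavenFS design):

```python
MAX_PAGE = 4  # maximal page size from the example above


def insert_value(pages, page_index, offset, value):
    # Returns a new page list; untouched pages are reused as-is.
    page = pages[page_index]
    if len(page) == MAX_PAGE:
        # The page is already full, so split it at the insertion point.
        if offset == 0:
            replacement = [[value], page]
        else:
            replacement = [page[:offset], [value] + page[offset:]]
    else:
        # There is still room, so the page simply grows by one element.
        replacement = [page[:offset] + [value] + page[offset:]]
    return pages[:page_index] + replacement + pages[page_index + 1:]


old = [[1, 2, 3, 5], [6, 7, 8, 9], [10, 11, 12, 13]]
new = insert_value(old, 0, 3, 4)
```

Here `new` holds the four pages listed above, and its last two pages are the very same objects as in `old`, so only the split page would need to be sent.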
  
  
Regarding your data-in-big-files-doesn't-move hypothesis: I've used video editing software more than once where a change (e.g. cut an ad out of the middle/beginning) caused the entire file to move by an odd (i.e. strange) number of bytes, although there were very few changes inside the file (I checked with a binary diff). What happened was the change was quick, but saving the file was excruciating...
</description><link>http://ayende.com/4828/designing-ravenfs#comment33</link><guid>http://ayende.com/4828/designing-ravenfs#comment33</guid><pubDate>Sun, 01 May 2011 13:32:32 GMT</pubDate></item><item><title>Ryan Heath commented on Designing RavenFS</title><description>@ayende
  
Now that you've mentioned 'commit' and Huberto's suggestion, I think you may want a UI which shows which files are the 'latest version' or need to be 'committed'.
  
Is that an idea?
  
  
// Ryan
</description><link>http://ayende.com/4828/designing-ravenfs#comment32</link><guid>http://ayende.com/4828/designing-ravenfs#comment32</guid><pubDate>Sun, 01 May 2011 13:27:59 GMT</pubDate></item><item><title>Ayende Rahien commented on Designing RavenFS</title><description>Uriel,
  
Pages are immutable; if you want to update a page, you need to send a new version of it.
  
That makes keeping versions around _very_ cheap.
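
One way to picture why this is cheap is the sketch below. It is a hypothetical in-memory model, not the RavenFS implementation: the `PageStore` class and the idea of identifying pages by a SHA-256 hash of their content are assumptions made for illustration.

```python
import hashlib


class PageStore:
    def __init__(self):
        self.pages = {}     # page id (content hash) to bytes, never overwritten
        self.versions = {}  # file name to list of versions (tuples of page ids)

    def put_page(self, data):
        # Identical content maps to the same id, so it is stored only once.
        page_id = hashlib.sha256(data).hexdigest()
        self.pages.setdefault(page_id, data)
        return page_id

    def commit(self, name, chunks):
        # A version is just an ordered list of page ids.
        ids = tuple(self.put_page(chunk) for chunk in chunks)
        self.versions.setdefault(name, []).append(ids)
        return len(self.versions[name]) - 1

    def read(self, name, version):
        return b"".join(self.pages[i] for i in self.versions[name][version])


store = PageStore()
v0 = store.commit("video.bin", [b"aaaa", b"bbbb", b"cccc"])
v1 = store.commit("video.bin", [b"aaaa", b"BBBB", b"cccc"])
```

In this model the second version adds a single new page to the store, and both versions remain readable because no existing page is ever modified.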
</description><link>http://ayende.com/4828/designing-ravenfs#comment31</link><guid>http://ayende.com/4828/designing-ravenfs#comment31</guid><pubDate>Sun, 01 May 2011 13:17:13 GMT</pubDate></item></channel></rss>