Ayende @ Rahien

Thoughts about building your own source control

Let me start by stating that you really don't want to do that. This is not something that you want to do, period.

Now that we are over that, I had the chance lately to go fairly deeply into SCMs and how they are implemented, from two fairly different perspectives. This is a randomly collected set of observations about SCM systems. As usual, the order is arbitrary, and no attempt was made to make any coherent idea out of this.

  • It is all about the client. The client in an SCM system has significant responsibilities. It is in charge of reporting the client state, managing all the errors that the user can cause, and shouldering a lot of the burden.
  • It is all about the protocol. Anyone who designs an SCM system should be given a lousy DSL line with disconnects every 15 minutes. Oh, and they should also have to work on a plane a lot.
  • On the wire, it is all so simple. It is really surprising to see how the SCM complexity is really just a lot of tiny, easy to handle, details.
  • The devil is in the details, though.
  • Complexities on the server side:
    • Space management - do you save the diff or the whole file?
    • If just the diff, how do you construct an arbitrary version?
    • Keeping history around for branches and copies
    • Cheap copies

  • Complexities on the client side:
    • Do you have one version on the client, per the working copy?
    • Do you have multiple versions, one per each file?
    • Handling inconsistencies between server version, working copy version and original version.

  • What do you optimize for? Bandwidth? Roundtrips?
    • I know of one SCM product that optimizes poorly for both

  • Distributed SCM can be handled on top of centralized SCM.
  • It is not hard at all, except for all the details.
  • Don't write your own SCM.
  • Trust matters, and you really don't want to be in the situation where you don't trust your SCM.
  • Remember that SCM is temporal: you can go backward in time, and even sideways, to a branch.
  • There are only three types of operations in SCM:
    • Generate a change set between two paths at two versions
    • Apply a diff to a path, generating a new version
    • Reporting (logs, mostly, and outputting various formats of a changeset)
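
To make the first two operations concrete, and to show one possible answer to the "diff or whole file" question above, here is a minimal Python sketch. Everything in it is an assumption for illustration: `make_delta`, `apply_delta` and `DeltaStore` are hypothetical names, the delta format is made up, and the "full snapshot every few versions" policy is just one common trade-off, not how any particular SCM does it.

```python
from difflib import SequenceMatcher


def make_delta(old, new):
    """Generate a change set that turns the line list `old` into `new`."""
    delta = []
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2))      # reuse old[i1:i2]
        elif tag in ("replace", "insert"):
            delta.append(("add", new[j1:j2]))   # literal new lines
        # "delete" needs no entry: those lines are simply not copied
    return delta


def apply_delta(old, delta):
    """Apply a change set to `old`, generating the new version."""
    out = []
    for op in delta:
        if op[0] == "copy":
            out.extend(old[op[1]:op[2]])
        else:
            out.extend(op[1])
    return out


class DeltaStore:
    """Hypothetical server-side store for a single file.

    Every `snapshot_every`-th version is kept whole; the others are
    stored as deltas against the previous version.
    """

    def __init__(self, snapshot_every=3):
        self.snapshot_every = snapshot_every
        self.entries = []  # ("full", lines) or ("delta", delta)

    def commit(self, lines):
        version = len(self.entries)
        if version % self.snapshot_every == 0:
            self.entries.append(("full", list(lines)))
        else:
            previous = self.checkout(version - 1)
            self.entries.append(("delta", make_delta(previous, lines)))
        return version

    def checkout(self, version):
        # Walk back to the nearest full snapshot, then replay deltas forward.
        start = version
        while self.entries[start][0] != "full":
            start -= 1
        lines = list(self.entries[start][1])
        for _kind, delta in self.entries[start + 1:version + 1]:
            lines = apply_delta(lines, delta)
        return lines
```

The snapshot interval is the knob here: a larger value saves space but makes constructing an arbitrary version cost more delta replays.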
Overall, it is a very simple endeavor. It gets complex when you start talking beyond the wire protocol. As a simple example, how much does it cost you to branch? How much does it cost you to find out if there have been any changes to the working copy?
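
As an illustration of the working copy question, here is a rough Python sketch of one way a client might keep that check cheap: record size, modification time and a content hash per file, then hash again only when the cheap stat data differs. The function names and the index layout are hypothetical, and trusting size plus mtime is an assumption that can miss an edit that preserves both.

```python
import hashlib
import os


def file_digest(path):
    """Hash the file contents in chunks (the expensive check)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def snapshot(root):
    """Record (size, mtime, digest) for every file under `root`."""
    index = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            rel = os.path.relpath(path, root)
            index[rel] = (st.st_size, st.st_mtime_ns, file_digest(path))
    return index


def changed_files(root, index):
    """Return (modified, added, deleted) paths relative to `root`."""
    modified, added, seen = [], [], set()
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root)
            seen.add(rel)
            if rel not in index:
                added.append(rel)
                continue
            size, mtime_ns, digest = index[rel]
            st = os.stat(path)
            if (st.st_size, st.st_mtime_ns) == (size, mtime_ns):
                continue  # cheap path: assume unchanged, skip hashing
            if file_digest(path) != digest:
                modified.append(rel)
    deleted = sorted(set(index) - seen)
    return sorted(modified), sorted(added), deleted
```

Even in this form, the check still walks the whole tree and stats every file, which is exactly the kind of cost that stays hidden until the working copy gets large.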
The other major issue is: How do you ensure that it is reliable?
Now, let me repeat myself, do not write your own source control!

Comments

Marcus Wyatt aka. Maruis Marais
04/30/2008 12:39 AM by
Marcus Wyatt aka. Maruis Marais

Currently, my preferred SCM is Git. I've used the following SCMs:

CVS - Just horrible

SVN - Still use it day to day because I have to. It is still OK.

Team System - Too heavy and quite brittle.

Source Safe - No thanks....

In short, Git is a distributed SCM that makes tasks like branching and merging extremely easy. You also have features like stash, and what is nice is that you can run Git locally against a remote SVN repo while the rest of the team is totally oblivious to this fact.

When you first look at Git, the whole idea of being distributed without a single main repo (à la SVN) just sounds weird. But once you start using Git, you very quickly realize what an awesome SCM it is. I can share my branch of the code with your Git repo while the remote (let's call it the main repo) has the official branch. You can then merge my branch into your branch easily and then rebase your branch onto the official main branch. Or you can fork the main branch and take it in a completely separate direction. Anyway, as you can see, there is so much you can do. And Git makes tasks you would normally not even attempt as easy as saying cheese... (if you know Git, of course)

There are multiple good sources of information on the web about Git. (Google Video of Linus, PeepCode, etc.)

Ken Sykora
04/30/2008 12:51 AM by
Ken Sykora

For some reason, I feel very strongly that I should not write my own source control.

cristian
04/30/2008 12:54 AM by
cristian

Yep, Git is good, but its support for Windows platforms still sucks; I prefer Mercurial myself.

Chad Myers
04/30/2008 01:07 AM by
Chad Myers

Sweet! Ayende announces Rhino.SCM! I know I'm not alone when I say that I look eagerly to your estimated end-of-May beta release date.

shanebush
04/30/2008 01:37 AM by
shanebush

"Distributed SCM can be handled on top of centralized SCM."

First thing I thought of: Git. First comment on blog post: about Git.

As much as I use and like Subversion, I do believe that if the client tools were there for Git like they are for svn, Git would soon surpass it.

Since you have so much experience with SVN now, why not go all out and help out Torvalds with a Git implementation that rivals TortoiseSVN?

RhinoGit... got a good ring to it! You could also holler "Rhino! Git!" while making commits in Rhino Tools.

Shane

axl
04/30/2008 09:12 AM by
axl

Hmm, crap. :) I have just begun writing my own after giving up on finding one that does things the way I want it to.

As cristian says, Git lacks decent Windows support and requires manual db management.

Mercurial and Bazaar each have at least one feature the other doesn't and I want both. Trying to build either one from source to extend them failed miserably after I spent two days trying to build Python from source so it would accept extensions compiled in VS 2008. That's open source dependency hell for you.

Perforce has served me well for almost ten years, but its lack of good move/rename functionality as well as being cumbersome when it comes to branching is beginning to get on my nerves. It's got to go.

Subversion is not an option, its only upside is that it's free, everything else annoys me.

ClearCase is too expensive, too slow, too big and has too much legacy to be a real option.

BitKeeper looks nice, but the licensing model and license agreement, as well as the available documentation and the general attitude of the company puts me off.

AccuRev seems to have gotten the back-end right, but their GUI sucks eggs when you actually try to work in it.

And there's a bunch of other commercial alternatives that all lack in features or force me to work in ways I don't like.

So, even if it takes forever, I'm going to write my own.

And I'll use Rhino.Mocks with xUnit.Net to test it. :)

Neil Mosafi
04/30/2008 10:59 AM by
Neil Mosafi

Hmm... I wonder if anyone ever thought of building an SCM on top of Microsoft's new FeedSync protocol?

Dan
05/01/2008 08:51 AM by
Dan

..or on top of MS Mesh, and integrate it into SharePoint and then use WCF. Oh, and be sure to use SSIS at some point, just to top it all off.

Ayende Rahien
05/01/2008 08:58 AM by
Ayende Rahien

Dan,

I may never recover from this suggestion.

jdn
05/01/2008 08:55 PM by
jdn

Don't forget to export to Excel and run it through a web service as part of the check-in process.

Neil Mosafi
05/02/2008 08:34 AM by
Neil Mosafi

I'll just get me coat then...
