While driving to work today, I started wondering what pieces of code are owned by someone, and how you can detect it. Owned means that they are the only one that can touch that code, whether it is by policy or simply because they are the only one with the skills / ability to do so.
I wonder if you can use the source control history to figure it out. Something like:
- Find all files changed within the last year
- Remove all files whose changes span a period of less than two weeks (that usually indicates a completed feature).
- Remove all the files that are modified by more than 2 people.
- Show the result and the associated names.
That might be a good way to indicate a risky location, some place that only very few people can touch and modify.
I started to think about how to do this in Git, but I got lost. Anyone want to try and take that up?
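Here is one possible take on the steps above, as a rough sketch rather than a finished tool. The function names (`read_git_changes`, `find_owned_files`) are made up for illustration, and I am reading "modified by more than 2 people" as "keep files with at most two authors". It assumes git is on the path and that author names are consistent across commits.

```python
import subprocess
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def read_git_changes(repo_dir="."):
    """Yield (path, author, when) for every file touched in the last year.

    Hypothetical helper: it shells out to `git log` and uses the byte 0x01
    to mark the commit header lines, so file names are easy to tell apart.
    """
    out = subprocess.run(
        ["git", "log", "--since=1 year ago", "--name-only",
         "--pretty=format:%x01%an%x01%aI"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    ).stdout
    author, when = None, None
    for line in out.splitlines():
        if line.startswith("\x01"):
            _, author, stamp = line.split("\x01")
            when = datetime.fromisoformat(stamp)
        elif line.strip():
            yield line, author, when


def find_owned_files(changes, min_span=timedelta(weeks=2), max_authors=2):
    """Return {path: [authors]} for files that look owned.

    A file is kept when its changes span at least `min_span` (shorter bursts
    are treated as a completed feature) and it has at most `max_authors`
    distinct authors.
    """
    authors = defaultdict(set)
    first, last = {}, {}
    for path, author, when in changes:
        authors[path].add(author)
        first[path] = min(first.get(path, when), when)
        last[path] = max(last.get(path, when), when)
    return {
        path: sorted(names)
        for path, names in authors.items()
        if last[path] - first[path] >= min_span and len(names) <= max_authors
    }
```

Calling `find_owned_files(read_git_changes("."))` in a repository would give the candidate files and the names attached to them. Renames are not followed, so a renamed file shows up as a new path.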
Following my post yesterday, I decided that for now, we will have the following system:
- A GitHub-hosted repository with the binaries
- A GitHub hook that posts a notification on push
- A server-side listener that receives that notification
On push, we will simply call git pull in the website directory. We use git ignores for the logs & data files, but that is about it.
It is primitive in the extreme, and it likely has many failure scenarios, but for now, it works. And very nicely, too.
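To give an idea of what such a listener might look like, here is a minimal sketch. The site path, the port, and the branch are placeholders I made up, and I am assuming the notification arrives as a JSON body with a `ref` field naming the pushed branch. It does no authentication of the sender at all, which is one of those failure scenarios.

```python
import json
import subprocess
from http.server import BaseHTTPRequestHandler, HTTPServer

SITE_DIR = "/var/www/mysite"       # hypothetical website directory
DEPLOY_REF = "refs/heads/master"   # hypothetical branch that triggers a pull


def should_deploy(body, deploy_ref=DEPLOY_REF):
    """Return True when the notification body is JSON naming our branch."""
    try:
        payload = json.loads(body)
    except ValueError:
        # Not JSON (or not valid UTF-8): ignore the notification.
        return False
    return isinstance(payload, dict) and payload.get("ref") == deploy_ref


class PushHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        if should_deploy(body):
            # The whole deployment: git pull in the website directory.
            result = subprocess.run(["git", "pull"], cwd=SITE_DIR)
            status = 200 if result.returncode == 0 else 500
        else:
            status = 202  # accepted, but nothing to do
        self.send_response(status)
        self.end_headers()


def serve(port=8080):
    """Block forever, handling push notifications on the given port."""
    HTTPServer(("", port), PushHandler).serve_forever()
```

Running `serve()` on the server and pointing the hook at that port would be enough to try it out. The logs & data files stay untouched because they are ignored by git.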
Basically, I currently have a very simple procedure for deploying software, it is called: git push, and I really want to be able to do that for my web applications as well.
I know that Rob Conery has talked about this in the past:
And I know about Heroku and AppHarbor, that isn’t what I am talking about.
On my own server, I have a set of web applications that I want to be able to update using git push.
- It has to be an explicit operation (pushing to a specific branch is okay).
- It can’t be something that happens periodically. I want to push and, as soon as possible, be able to see the changes. Waiting 5 minutes for the periodic check is going to be a non-starter.
- It has to take into account local information (logs, data, etc).
- I have to be able to easily roll back.
- I don’t really care for things like migrations, those are handled by the application, or manually.
I specifically don’t care about actually building the code, I am perfectly fine with pushing binaries to the git repository.
At first I thought about simply making the site a git repository and just push there. But you can’t push to non bare repositories by default (and rightly so). When I gave it some more thought, I realized that there are more reasons to want to avoid that.
Any thoughts? Any existing solutions?
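One approach that seems to fit these requirements is a bare repository on the server with a post-receive hook that checks the pushed branch out into the site directory. What follows is only a sketch under assumptions: the site path and the branch name are invented, and the function names are mine. Git feeds the hook one line per updated ref on standard input, in the form "old-sha new-sha ref-name".

```python
import subprocess

SITE_DIR = "/var/www/mysite"    # hypothetical directory the web server reads
DEPLOY_BRANCH = "production"    # hypothetical branch that means "deploy"


def revisions_to_deploy(lines, branch=DEPLOY_BRANCH):
    """Return the new SHAs pushed to the deploy branch.

    Pushes to other branches are ignored, and so is a deletion of the
    deploy branch (git reports that as an all-zero new SHA).
    """
    wanted = f"refs/heads/{branch}"
    result = []
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue
        _old, new, ref = parts
        if ref == wanted and set(new) != {"0"}:
            result.append(new)
    return result


def deploy(site_dir=SITE_DIR, branch=DEPLOY_BRANCH):
    """Check the branch out into the site directory.

    Inside a hook git already knows which repository it is in, so only the
    work tree has to be given. Untracked and ignored files (logs, data)
    are left alone by the checkout.
    """
    subprocess.run(
        ["git", f"--work-tree={site_dir}", "checkout", "-f", branch],
        check=True,
    )


def main(stdin):
    if revisions_to_deploy(stdin):
        deploy()

# In the real hooks/post-receive file, finish with:
#     import sys; main(sys.stdin)
```

The push is explicit, it takes effect immediately, and rolling back would presumably be a matter of pushing a revert commit to the same branch, which goes through the same hook.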
I just love git pull requests, but the new behavior from GitHub is beyond moronic. Take a look at a typical pull request:
The problem is that clicking on this button would actually merge the changes to the public repository. I don’t know about you, but there are very few cases where this is what I want to do.
In 99.9999% of the cases, I want to merge this locally to see what the bloody changes are, run some tests, maybe modify the changes before I take them. In this case, this particular pull request contains a failing test. I never want to commit that to the public repo automatically.
What is worse is that I now need to manually construct the pull command in the command line, whereas GitHub previously offered the option to generate that for me, which I liked much more.
The builtin answer for sharing code between multiple projects is quite simple…
But it introduces several problems along the way:
- You can’t just git clone the repository, you need to clone the repository, then call git submodule init & git submodule update.
- You can’t just download the entire source code from GitHub, since the downloaded archive does not include the content of the submodules.
- You can’t branch easily with submodules, well, you can, but you have to branch in the related projects as well. And that assumes that you have access to them.
- You can’t fork easily with submodules, well, you can, if you really feel like updating the associations all the time. Which is really nasty.
Let me present you with a simple scenario, okay? I have two projects that share a common license. Obviously I want all projects to use the same license and the whole thing to be under source control.
Here is our basic setup:
PS C:\Work\temp> git init R1
Initialized empty Git repository in C:/Work/temp/R1/.git/
PS C:\Work\temp> git init R2
Initialized empty Git repository in C:/Work/temp/R2/.git/
PS C:\Work\temp> git init Lic
Initialized empty Git repository in C:/Work/temp/Lic/.git/
PS C:\Work\temp> cd R1
PS C:\Work\temp\R1> echo "Hello Dolly" > Dolly.txt
PS C:\Work\temp\Lic> cd ..\R1
PS C:\Work\temp\R1> git add --all
PS C:\Work\temp\R1> git commit -m "initial commit"
[master (root-commit) 498ab77] initial commit
 1 files changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 Dolly.txt
PS C:\Work\temp\R1> cd ..\R2
PS C:\Work\temp\R2> echo "Hello Jane" > Jane.txt
PS C:\Work\temp\R2> git add --all
PS C:\Work\temp\R2> git commit -m "initial commit"
[master (root-commit) deb45bc] initial commit
 1 files changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 Jane.txt
PS C:\Work\temp\R2> cd ..\Lic
PS C:\Work\temp\Lic> echo "Copyright Ayende (C) 2011" > license.txt
PS C:\Work\temp\Lic> git add --all
PS C:\Work\temp\Lic> git commit -m "initial commit"
[master (root-commit) 8e8b1b4] initial commit
 1 files changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 license.txt
This just gives us the basics. Now I want to share the license.txt file between the projects. I can do that with submodules, like so:
PS C:\Work\temp\R1> git submodule init
PS C:\Work\temp\R1> git submodule add C:\Work\temp\Lic Legal
Cloning into Legal...
done.
PS C:\Work\temp\R1> cd ..\R2
PS C:\Work\temp\R2> git submodule init
PS C:\Work\temp\R2> git submodule add C:\Work\temp\Lic Legal
Cloning into Legal...
done.
Now, this looks nice, and it works beautifully. Until you start sharing this with other people. Then it starts to become somewhat messy.
For example, let us say that I want to add a disclaimer in R1:
PS C:\Work\temp\R1\Legal> echo "Not for Jihad use" > Disclaimer.txt
PS C:\Work\temp\R1\Legal> git add .\Disclaimer.txt
PS C:\Work\temp\R1\Legal> git commit -m "adding disclaimer"
[master db3987c] adding disclaimer
 1 files changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 Disclaimer.txt
And here is where the problems start. Let us assume that I want to make a change that is local to just this project.
Well, guess what, you can’t. Not if you intend to share this with other people. You need to push your changes to the submodule somewhere, and that means that you need to fork the original project and update the references to it. Of course, if there is an update to the original submodule, you need to go through two stages to pick it up.
And we haven’t spoken yet about the fun of pushing the main repository but forgetting to push the submodule. It gives a new meaning to “it works on my machine”.
In short, git submodules look like a good idea, but they aren’t really workable in the real world. I’ll have a new post shortly showing how to deal with the issue.
I am getting really sick of git submodules, and I am trying to find alternatives.
So far, I have discovered the following options:
- git subtree – a shell script that doesn’t work on Windows (https://github.com/apenwarr/git-subtree)
- Braid – a ruby script that fails to run on Windows (https://github.com/evilchelu/braid)
PS C:\Work\RavenDB> braid add email@example.com:ravendb/raven.munin.git
F, [2011-01-09T18:41:09.788525 #224] FATAL -- : uninitialized constant Fcntl::F_SETFD (NameError)
C:/Ruby186/lib/ruby/gems/1.8/gems/open4-1.0.1/lib/open4.rb:20:in `popen4'
C:/Ruby186/lib/ruby/gems/1.8/gems/evilchelu-braid-0.5/lib/braid/operations.rb:103:in `exec'
C:/Ruby186/lib/ruby/gems/1.8/gems/evilchelu-braid-0.5/lib/braid/operations.rb:114:in `exec!'
C:/Ruby186/lib/ruby/gems/1.8/gems/evilchelu-braid-0.5/lib/braid/operations.rb:51:in `version'
C:/Ruby186/lib/ruby/gems/1.8/gems/evilchelu-braid-0.5/lib/braid/operations.rb:57:in `require_version'
C:/Ruby186/lib/ruby/gems/1.8/gems/evilchelu-braid-0.5/lib/braid/operations.rb:78:in `require_version!'
C:/Ruby186/lib/ruby/gems/1.8/gems/evilchelu-braid-0.5/lib/braid/command.rb:51:in `verify_git_version!'
C:/Ruby186/lib/ruby/gems/1.8/gems/evilchelu-braid-0.5/lib/braid/command.rb:10:in `run'
C:/Ruby186/lib/ruby/gems/1.8/gems/evilchelu-braid-0.5/bin/braid:58:in `run'
C:/Ruby186/lib/ruby/gems/1.8/gems/main-4.4.0/lib/main/program/class_methods.rb:155:in `run!'
C:/Ruby186/lib/ruby/gems/1.8/gems/main-4.4.0/lib/main/program/class_methods.rb:155:in `run'
C:/Ruby186/lib/ruby/gems/1.8/gems/main-4.4.0/lib/main/program/class_methods.rb:144:in `catch'
C:/Ruby186/lib/ruby/gems/1.8/gems/main-4.4.0/lib/main/program/class_methods.rb:144:in `run'
C:/Ruby186/lib/ruby/gems/1.8/gems/main-4.4.0/lib/main/factories.rb:18:in `run'
C:/Ruby186/lib/ruby/gems/1.8/gems/main-4.4.0/lib/main/factories.rb:25:in `Main'
C:/Ruby186/lib/ruby/gems/1.8/gems/evilchelu-braid-0.5/bin/braid:13
C:/Ruby186/bin/braid:19:in `load'
C:/Ruby186/bin/braid:19
- Piston – another ruby script that fails to run (http://piston.rubyforge.org/)
Does anyone know about a good solution that will work on Windows? Most specifically, I am looking for something that is plug & play. I don’t want to write code or to understand how git works. I just want it to work.
One of the things that some people fear in a distributed source control is that they might run into conflicts all the time.
My experience has shown that this isn’t the case, but even when it is, there isn’t anything really scary about that.
Here is an example of a merge conflict in Git:
Double click the conflicting file, and you get the standard diff dialog. You can then resolve the conflict, and then you are done.
Today, I had two separate incidents in which my git repository was corrupted! To the point that nothing worked: not git fsck, not git reflog, not git just-work-or-i-WILL-shoot-you.
The first time, there was no harm done, I just cloned my repository again, and moved on. The second time that it happened, it was after I had ~10 commits locally that weren’t pushed. I had my working copy intact, but I didn’t want to lose the history. I asked around, and got a couple of suggestions to move to mercurial instead, because git has no engineering behind it.
Based on that feedback, I …
Oh, wait, it isn’t this sort of a post.
What I actually did was set up Process Monitor and watch what git.exe was actually doing. I noticed that it was searching for a .git/objects directory, and couldn’t find it anywhere in the path. Indeed, looking there myself, it appeared clear that there was no objects directory under the .git dir. And checking in other repositories showed that they had it. So now I knew why, but I still had no idea who the #*@# decided to randomly @#$%( my repository, totally derailing my productivity.
That is where having multiple personalities comes in handy: he did it. The one that isn’t writing this blog post. At some point during the day, there was a need to zip the repository and send it somewhere. Since the working copy is full of crap, that idiot issued the following:
ls -R obj | rm -F
ls -R bin | rm -F
(Not the exact commands, the idiot used the UI to do a search & delete).
You can guess the following from there. At this point, having come to this astounding discovery, I heroically went to the recycle bin, found the objects directory there, and rescued it! All is well, except that there is still a thrashing for uncommon stupidity owed.
And remember, it wasn’t me, it was the other one who did that!
And yes, the spelling mistake in the title is intentional.
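For the other one's benefit, here is a sketch of a cleanup that should not be able to eat the repository. It matches directory names exactly (so "objects" is not mistaken for "obj") and never descends into .git. The function name and the set of build directory names are my own choices for illustration.

```python
import os
import shutil

BUILD_DIRS = {"bin", "obj"}  # assumed names of the build output directories


def remove_build_dirs(root):
    """Delete build output directories under root and return their paths.

    Names must match exactly (ignoring case), and the .git directory is
    pruned from the walk so nothing inside it is ever considered.
    """
    removed = []
    for current, dirs, _files in os.walk(root):
        # Prune .git in place so os.walk does not descend into it.
        if ".git" in dirs:
            dirs.remove(".git")
        for name in list(dirs):
            if name.lower() in BUILD_DIRS:
                path = os.path.join(current, name)
                shutil.rmtree(path)
                removed.append(path)
                dirs.remove(name)  # it is gone, do not walk into it
    return removed
```

Calling `remove_build_dirs(r"C:\Work\MyProject")` (a made-up path) before zipping would clean the working copy and leave .git/objects where it belongs.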
I just had to go through a code base where I had a bunch of comments.
Instead of going with the usual route of just noting the changes that I think should be done, I decided to do something else. I fixed each individual change, and committed them individually.
This is what it looks like, each commit is usually less than a single screen of changes (diff mode).
I wonder if it is something that I can apply more generally.
One of the things that I like about Git is that I don’t have to think about operations that I make in my source. For example, I am working on the refactoring from NH Prof to ÜberProf, and I wanted to change the directories & project files. So I just went to explorer and renamed them.
Then I had to fix some namespaces references in the project file. It looks like this:
Notice that it captures both the rename and the content change?
You can also see how it looks in the log file:
Trying to do stuff like that with SVN is just a PITA. With Git, I didn’t have to think about it.
I had a short discussion with Steve Bohlen about distributed source control, and how it differs from centralized source control. After using Git for a while, I can tell you that there are several things that I am not really willing to give up.
- Fast commits
- Local history
- Easy merging
To be sure, a centralized SCM will have commits, history and merging. But something like Git takes it to a whole new level. Looking at how it changed my workflow is startling. There is no delay to committing, so I can commit every minute or so. I could do it with SVN, but it would take 30 seconds to a minute to execute, blocking my work, so I use bigger commits with SVN.
Having local history means that I can deal with a lot of small commits, because diffing a file from two commits ago is as fast as diffing the local copy. I tend to browse around in the history quite a lot, especially when I am doing stuff like code reviews, or trying to look at how I did something three weeks ago.
Merging is another thing that DVCS excels at. Not so much because of better merge algorithms (although that is a part of this), but simply because having all the information locally makes the merge process so much faster.
All in all, it ends up being a much easier process to work with. It takes time to get used to it, though.
And given those requirements, Fast commits, Local history, Easy merging, you pretty much end up with a distributed solution. Even with a DVCS, you still have the master repository, but just the fact that you have full local history frees you from having to manage all of that.
If I needed more reasons to move to Git, this would be it:
Just to make things more interesting, a couple of those are pull requests, but the middle one is just a patch that I got sent. That patch also includes a binary file.
Automatically tracking who did what and where, even for people who aren’t members of the project? Not having to handle binary files in patches in a special way?
That just makes things so much simpler…
I have been using Git for the past week or so, enough to get a good handle on its benefits and disadvantages.
I moved to Git from Subversion, after having done a stint of almost 6 years of using Subversion. A stint which also included doing some development on Subversion.
Despite appearances, I actually took a fairly structured (and long running) approach to learning Git, I got a book and read it, I played around with it, and I mostly dismissed it as “it isn’t solving my problem” and “I already know how source control works”.
Last week I had the chance to pick Aaron Jensen’s mind about Git, and he was able to clear up some conceptual issues about how Git operates. I have since made a bigger effort to learn how I can make better use of Git and I think that I now know enough to be able to talk about it.
First, let me talk about Subversion a little bit. As I said, I am a long time user of Subversion, and I consider it an excellent source control system. It is, however, strongly aimed at meeting corporate development scenarios.
What do I mean by that? In Subversion, you have the root repository, and everything else falls out from there. That doesn’t sound like such a problem, until you realize that except for your local working copy, every single operation is a remote operation.
Just to give you an idea why this is a problem, looking up a history of changes (including file diffs) is a real pain in Subversion. Merging (the actual act, not keeping track of it) is a pretty long operation as well.
That was the final deal breaker for me. I feel insulted whenever I have to wait for the machine, it should be the other way around, damn it!
But there are other issues with Subversion usage, specifically, for Open Source projects I don’t believe that the centralized model works anymore.
Consider the workflow for getting a patch in Rhino Mocks. You get the code, make the patch, send the patch. In the meantime, the project is moving ahead and you are forced to keep up with what is basically a dirty and unversioned working copy.
Worse, for me, when you send me the patch, it has to go back to the server for any old versions (slow! slow! slow!) and make me do a lot of the work.
Having a single source of truth is important, for official releases. But in the meantime, I like the idea of having multiple disparate copies that people are working on independently. They should.
Some other important thoughts, with absolutely no order:
- Local history is another major aspect. I mentioned that browsing history is something that I sometimes do, and it is a total pain to go through that with Subversion, and it is totally painless to do so with Git. I have TortoiseGit installed specifically for that, so I get a UI that I am very familiar with but with no network round tripping time.
- I am not sure how to qualify this point, but it feels like Git is faster even when it does go over the network. Project checkouts and remote commits seem to be faster, even though on the face of it Subversion should be much faster, at least for checkouts (Git gets the entire history, Subversion gets a single revision).
- People keep mentioning having private commits as an advantage. I guess I see the point, but I am not sold on that yet. Sure, it is fun to be able to do that, but this is a paper advantage so far. What I really do like is that commits are fast. Which means that in many cases I can commit & push in the background while resuming work.
GitHub is another consideration. Its major advantage is that it takes care of a lot of the details related to actually managing git.
For open source work, I love looking at this:
This gives me a good indication about the actual interest in a project.
Other things, like pull requests and their management, make me tingle all over, since they represent how people actually work in OSS projects in real life and this represents a significant time saving.
There is other stuff as well: having a download button on the site means that I don’t get questions from people who have no Subversion tools or are behind firewalls.
GitHub is also a huge disadvantage, since in less than a week I caught it broken at least twice. There are some things that I want to be stable, the place where I put my source code is one of them, and I don’t care if the unstable parts aren’t the Git repositories. As far as I am concerned, an unstable site equals nervousness on my part.
Anyway, those are my reasons for moving to Git. The tooling is pretty good, I got used to the git command line, gitk and git gui fairly quickly.