Idle musing while commuting: The ownership index
While driving to work today, I started wondering which pieces of code are owned by someone, and how you could detect that. Owned means that one person is the only one who can touch that code, whether by policy or simply because they are the only one with the skills or ability to do so.
I wonder if you can use the source control history to figure it out. Something like:
- Find all files changed within the last year.
- Remove all files whose changes span a period of less than two weeks (that usually indicates a completed feature).
- Remove all the files that are modified by more than 2 people.
- Show the result and the associated names.
That might be a good way to identify risky locations: places that only very few people can touch and modify.
I started to think about how to do this in Git, but I got lost. Anyone want to try and take that up?
Comments
What about forks which were not pulled back?
Scooletz, this assumes that you are doing this check on a single branch.
Another way to find troublesome code is to analyze which file(s) are being changed most often. This could indicate a violation of SRP, perhaps other things too. There are tools that do this already, but I can't remember any names right now.
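Git can already produce the raw data for that change-frequency list. A rough Python sketch, assuming `git` is installed; `change_counts` and `most_changed` are made-up names, not one of the existing tools. Under `--name-only` each commit lists a path once, so the count for a path is the number of (non-merge) commits that touched it.

```python
import subprocess
from collections import Counter


def change_counts(log_text):
    """Count commits per path in `git log --pretty=format: --name-only` output."""
    # Blank lines separate commits; every non-blank line is one changed path.
    return Counter(line for line in log_text.splitlines() if line.strip())


def most_changed(repo=".", top=10):
    """Return the `top` most frequently changed paths with their commit counts."""
    out = subprocess.run(
        ["git", "-C", repo, "log", "--no-merges", "--pretty=format:", "--name-only"],
        capture_output=True, text=True, check=True,
    ).stdout
    return change_counts(out).most_common(top)
```

For example, `most_changed("path/to/repo", top=20)` lists the 20 hottest files. Renames are not followed, so a renamed file is counted separately under each of its names.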
Interesting idea. I found this (http://stackoverflow.com/questions/6572728/svn-list-of-files-changed-exclusively-by-1-user) solution for SVN. I guess it is pretty simple to port it to Git.
Actually, the code needs a small modification, as it solves the task "find files modified exclusively by a specific user" rather than "find files modified exclusively by one user". But the general idea is the same: retrieve the list of current files with the "list" command, then pass this list to the "log" command, which returns who modified each file.
What about automatic code reformatting that could be made by anyone but is not an indication of knowledge sharing?
This can only mean one thing, Ayende is about to create his own DVCS xD
I can see it already: RavenVC, running on a RavenDB backend. A "second generation distributed version control system"!
:)
Nice idea. Btw, I would be interested in how you handle code ownership in your company. Does every developer own a piece? Microsoft does it this way.
Define a nice model around this question, create a git hook which updates some ravendb docs (curl) on each commit and let indexing do the rest?
There's prior academic research in this area. As a starting point, here's a paper by some smart folks at Microsoft Research: http://research.microsoft.com/en-us/um/people/abegel/papers/codebook-icse2010.pdf
More generally, there's the Mining Software Repositories conference: http://2012.msrconf.org/
Michael Feathers has a couple of interesting blog posts very similar to yours: http://michaelfeathers.typepad.com/michael_feathers_blog/2011/01/measuring-the-closure-of-code.html http://michaelfeathers.typepad.com/michael_feathers_blog/2011/03/data-rich-development.html http://michaelfeathers.typepad.com/michael_feathers_blog/2011/09/temporal-correlation-of-class-changes.html
Maybe get in touch with him?