Ayende @ Rahien

Idle musing while commuting: The ownership index

While driving to work today, I started wondering which pieces of code are owned by someone, and how you could detect that. Owned means that they are the only one who can touch that code, whether by policy or simply because they are the only one with the skills / ability to do so.

I wonder if you can use the source control history to figure it out. Something like:

  • Find all files changed within the last year.
  • Remove all files whose changes span a period of less than two weeks (that usually indicates a completed feature).
  • Remove all the files that were modified by more than 2 people.
  • Show the result and the associated names.

That might be a good way to indicate a risky location, some place that only very few people can touch and modify.

I started to think about how to do this in Git, but I got lost. Anyone want to try and take that up?
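
One possible take on the steps listed above, as a minimal sketch rather than a finished tool. It assumes Python 3.7+ and a `git` executable on the PATH, and it only looks at the branch that is currently checked out. The function names (`read_history`, `parse_log`, `ownership_index`) and the `@@` marker are made up for this example. Renames are not followed, and files that were deleted during the year still show up.

```python
import subprocess
from collections import defaultdict

TWO_WEEKS = 14 * 24 * 60 * 60  # seconds
MARKER = "@@"                  # hypothetical prefix that tags each commit header line


def read_history(repo="."):
    """Return (author, unix_time, path) tuples for the last year of the current branch."""
    out = subprocess.run(
        ["git", "-C", repo, "log", "--since=1 year ago", "--no-merges",
         "--name-only", f"--format={MARKER}%an%x09%at"],
        capture_output=True, text=True, check=True,
    ).stdout
    return list(parse_log(out))


def parse_log(text):
    """Parse the log text: a header line '@@author<TAB>timestamp', then one path per line."""
    author, when = None, None
    for line in text.splitlines():
        if line.startswith(MARKER):
            author, stamp = line[len(MARKER):].rsplit("\t", 1)
            when = int(stamp)
        elif line.strip() and author is not None:
            yield author, when, line


def ownership_index(changes, min_span=TWO_WEEKS, max_authors=2):
    """Map each 'owned' path to the sorted names of the people who changed it."""
    authors = defaultdict(set)
    times = defaultdict(list)
    for author, when, path in changes:
        authors[path].add(author)
        times[path].append(when)

    owned = {}
    for path, names in authors.items():
        span = max(times[path]) - min(times[path])
        if span < min_span:
            continue  # all changes within two weeks: probably a completed feature
        if len(names) > max_authors:
            continue  # modified by more than max_authors people: not owned
        owned[path] = sorted(names)
    return owned
```

Calling `ownership_index(read_history("."))` from inside a repository returns a dictionary of file paths and the names associated with them. Passing `max_authors=1` narrows the result to files that only a single person has touched.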

Posted By: Ayende Rahien

Comments

10/11/2011 10:47 AM by Scooletz

What about forks which were not pulled back?

10/11/2011 10:49 AM by Ayende Rahien

Scooletz, this assumes that you are doing this check on a single branch.

10/11/2011 11:01 AM by Daniel Lidström

Another way to find troublesome code is to analyze which file(s) are being changed most often. This could indicate a violation of SRP, perhaps other things too. There are tools that do this already, but I can't remember any names right now.

10/11/2011 11:38 AM by Alexander

Interesting idea. I found this (http://stackoverflow.com/questions/6572728/svn-list-of-files-changed-exclusively-by-1-user) solution for SVN. I guess it is pretty simple to port it to Git.

10/11/2011 11:53 AM by Alexander

Actually, the code needs a small modification, as it solves the task "find files modified exclusively by a specific user" rather than "find files modified exclusively by one user". But the general idea is the same: retrieve the list of current files with the "list" command, then pass this list to the "log" command, which returns the modifiers.

10/11/2011 12:43 PM by Ivan

What about automatic code reformatting that could be made by anyone but is not an indication of knowledge sharing?

10/11/2011 03:51 PM by Eber Irigoyen

This can only mean one thing, Ayende is about to create his own DVCS xD

10/11/2011 04:36 PM by David Thibault

I can see it already: RavenVC, running on a RavenDB backend. A "second generation distributed version control system"!

:)

10/11/2011 07:19 PM by tobi

Nice idea. Btw, I would be interested in how you handle code ownership in your company. Does every developer own a piece? Microsoft does it this way.

10/11/2011 11:31 PM by Remco Ros

Define a nice model around this question, create a git hook which updates some ravendb docs (curl) on each commit and let indexing do the rest?

10/12/2011 03:07 AM by Lorin Hochstein

There's prior academic research in this area. As a starting point, here's a paper by some smart folks at Microsoft Research: http://research.microsoft.com/en-us/um/people/abegel/papers/codebook-icse2010.pdf

More generally, there's the Mining Software Repositories conference: http://2012.msrconf.org/

10/12/2011 08:04 AM by Michael Chandler

Michael Feathers has a couple of interesting blog posts very similar to yours:

  • http://michaelfeathers.typepad.com/michaelfeathersblog/2011/01/measuring-the-closure-of-code.html
  • http://michaelfeathers.typepad.com/michaelfeathersblog/2011/03/data-rich-development.html
  • http://michaelfeathers.typepad.com/michaelfeathersblog/2011/09/temporal-correlation-of-class-changes.html

Maybe get in touch with him?
