Patch management approaches using centralized SCM
Without getting into the centralized vs. decentralized SCM argument (I understand the differences, I just don't grok them), patch management is important in many scenarios. Contributing to OSS projects is a major one, I admit, but I have previously used these techniques to take emergency fixes made on production and merge them into the development trunk.
The question came up on the NHibernate Contrib mailing list, and Josh Robb has commented on it at length. I thought that it would be a good idea to take that and expand on it a bit.
We want to submit a changeset to a project, without having direct access to its source control. The solution is to generate a patch and send it to the destination.
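To make "generate a patch" concrete, here is a minimal sketch using Python's standard difflib module. In practice your SCM client produces the patch for you; the function name make_patch and the a/ and b/ path prefixes are my own choices for illustration, not something any particular tool requires.

```python
import difflib


def make_patch(old_text, new_text, path):
    """Return a unified diff that turns old_text into new_text.

    `path` is only used to label the two sides of the diff.
    An empty string is returned when the texts are identical.
    """
    old_lines = old_text.splitlines(keepends=True)
    new_lines = new_text.splitlines(keepends=True)
    diff = difflib.unified_diff(
        old_lines,
        new_lines,
        fromfile="a/" + path,
        tofile="b/" + path,
    )
    return "".join(diff)


if __name__ == "__main__":
    before = "first line\nsecond line\n"
    after = "first line\nsecond line, fixed\n"
    print(make_patch(before, after, "notes.txt"))
```

The resulting text is what you would attach to a mail or a ticket for the project maintainers to review and apply.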
So far, it is simple. It gets complex when you need to deal with more than a single changeset that hasn't been merged to the root.
Let us say that we have several changesets that we have generated. Let us see how we treat them, according to the different scenarios we encounter. A scenario, in this case, is the dependence between the changesets.
Scenario #1 - No dependencies between the patches.
This is a common scenario if you are working on several things in parallel. A classic case is when you are fixing several bugs. In most cases, the changes in each bug fix are unrelated to each other, and can be applied independently.
In this case, you usually generate a separate patch for each changeset. This allows each patch to be evaluated in isolation, which significantly eases its acceptance.
This leads us to the First Rule of Patches: keep them small. It is easier to go through seven small patches than one big one.
Scenario #2 - No dependencies between the patches, but touching the same files.
This is the case if two changesets have touched the same file, but there is no logical dependency between the patches. In this case, we still want to get separate patches. Usually, I generate one patch, revert to base, work on the second one, generate a patch, and so on.
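A quick way to tell whether you are in this scenario is to compare the files each patch touches. The sketch below assumes the patches are in unified diff format and reads the file names from the "---" / "+++" header pairs; touched_files and shared_files are hypothetical helper names. It is a heuristic, since a removed line starting with "-- " directly followed by an added line starting with "++ " would look like a header.

```python
def touched_files(patch_text):
    """Collect the target paths named in a unified diff's file headers."""
    files = set()
    lines = patch_text.splitlines()
    # A file header is a '--- old' line immediately followed by '+++ new'.
    for prev, line in zip(lines, lines[1:]):
        if prev.startswith("--- ") and line.startswith("+++ "):
            # Drop an optional tab-separated timestamp after the path.
            path = line[4:].split("\t", 1)[0].strip()
            files.add(path)
    return files


def shared_files(patch_a, patch_b):
    """Return the paths that both patches modify."""
    return touched_files(patch_a) & touched_files(patch_b)
```

An empty result means the patches are in scenario #1; a non-empty one means they should each be generated against the same base so they can still be reviewed separately.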
Scenario #3 - Logical dependencies between the patches
One patch relies on behavior / API created in another patch. In this case, the best solution is to create a patch for each distinct behavior, and number them, so it is still possible to review them in isolation, but the merge order is clear.
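Numbering only helps if the patches are applied in numeric rather than alphabetical order (otherwise "10-..." sorts before "2-..."). A small sketch of that ordering, assuming a naming convention where each patch file starts with its sequence number; both the convention and the name apply_order are made up for this example.

```python
import re


def apply_order(patch_names):
    """Sort patch file names by their leading sequence number."""

    def sequence_number(name):
        match = re.match(r"(\d+)", name)
        if match is None:
            raise ValueError("patch name has no leading number: " + name)
        return int(match.group(1))

    return sorted(patch_names, key=sequence_number)


if __name__ == "__main__":
    series = ["10-use-new-api.patch", "2-add-api.patch", "1-refactor.patch"]
    for name in apply_order(series):
        print(name)
```

Zero-padding the numbers (01, 02, ... 10) gets the same result with a plain alphabetical sort, which is friendlier to whoever has to apply the patches by hand.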
Scenario #4 - Several revisions of the same patch
In this case, you submitted a patch, but continued to work on the same feature/bug and have a new patch before the first one was applied. The later patch supersedes the previous one, which can now be discarded. You need to be careful with this scenario, because too much disconnected work can create huge patches. It is better to review your work and see whether you are really in scenario #4, or actually in scenario #3.
Anything that I missed?
Maybe this will help you with the grokking bit:
You bugger! ;)
I've got a draft of this in my queue....
Write a better one!
You are missing the hard part of the problem. 1-4 deal with 1 person submitting patches to a project; in each of these cases the solution you propose is perfectly valid.
The trouble is in the We part of the problem. If several people are working on related components of the system, there comes a point where people are having to deal with dependencies on either work that hasn't been committed centrally yet, or functionality that is changing by patches from another person.
This is the hard part (in a centralized vcs) because it inevitably comes down to somebody maintaining a merge of someone else's work as part of their patch (normally we see this in terms of patches rotting and needing to be updated to current sources). While this works somewhat (it fails to scale past a small-to-medium number of contributors), it is a failure from a separation of concerns perspective.
Scenario 5: multiple people with patches at the same time somehow in conflict with each other
A. Review/apply 1 patch; other patches must be resubmitted as they no longer apply (the patch contributor takes on the burden of merging the patch forward with the new source, and submits a new patch for review).
B. Review/apply 1 patch, commit, and update to the revision without the patch applied; review/apply another patch, update and modify to merge the work together, finally committing a patch modified from the one contributed. Repeat until complete (or opt out to one of the other solutions). (This second patch is an implicit branch; someone has to do the work to merge them together because only explicit branches exist in a centralized vcs.)
C. Review/apply patch 1 and commit; explicitly create a branch for patch 2, then review, apply and commit; repeat until complete; have someone create patches for merging and review as necessary.
I have seen all 3 solutions used in various OSS projects. A is how most seem to work, I've seen B on some very small projects, and C is used when a project wants to always have a shippable main branch.
You are absolutely correct in that. But note the part about centralized SCM, I don't think you can get away with that as long as you are centralized.
There IS one truth in the centralized model.