The problem with Git Submodules
The built-in answer for sharing code between multiple projects is quite simple…
But it introduces several problems along the way:
- You can’t just git clone the repository. You need to clone it, then call git submodule init and git submodule update.
- You can’t just download the entire source code from GitHub, because the downloaded archive doesn’t include the contents of the submodules.
- You can’t branch easily with submodules. Well, you can, but you have to branch in the related projects as well, and that assumes that you have access to them.
- You can’t fork easily with submodules. Well, you can, if you really feel like updating the associations all the time, which is really nasty.
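The first bullet is easy to reproduce in a scratch directory. What follows is a minimal sketch, not something from the original setup. It assumes a POSIX shell rather than the PowerShell prompt used in this post. The g wrapper and the plain and full directory names are made up for illustration. The protocol.file.allow=always setting is there only because recent Git releases refuse local-path submodule URLs by default.

```shell
#!/bin/sh
set -eu

# Scratch area; every repository name and path here is hypothetical.
demo=$(mktemp -d)
cd "$demo"

# Wrapper (an assumption for this demo): throwaway identity for commits, plus
# permission to use local paths as submodule URLs.
g() {
    git -c user.name=demo -c user.email=demo@example.com \
        -c protocol.file.allow=always "$@"
}

# A shared repository...
g init -q Lic
echo "Copyright Ayende (C) 2011" > Lic/license.txt
g -C Lic add --all
g -C Lic commit -q -m "initial commit"

# ...and a project that pulls it in as a submodule named Legal.
g init -q R1
echo "Hello Dolly" > R1/Dolly.txt
g -C R1 add --all
g -C R1 commit -q -m "initial commit"
g -C R1 submodule add "$demo/Lic" Legal
g -C R1 commit -q -m "add Legal submodule"

# A plain clone creates Legal/ but leaves it empty.
g clone -q R1 plain
if [ -e plain/Legal/license.txt ]; then before=present; else before=missing; fi
echo "after plain clone: license.txt is $before"

# The two extra steps fill it in.
g -C plain submodule init
g -C plain submodule update

# Alternatively, clone and populate the submodules in one command.
g clone -q --recurse-submodules R1 full
```

The --recurse-submodules flag on clone removes the extra steps, but only for people who know to pass it. Anyone who runs a bare git clone still ends up with an empty Legal directory.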
Let me present you with a simple scenario, okay? I have two projects that share a common license. Obviously I want all projects to use the same license and the whole thing to be under source control.
Here is our basic setup:
PS C:\Work\temp> git init R1
Initialized empty Git repository in C:/Work/temp/R1/.git/
PS C:\Work\temp> git init R2
Initialized empty Git repository in C:/Work/temp/R2/.git/
PS C:\Work\temp> git init Lic
Initialized empty Git repository in C:/Work/temp/Lic/.git/
PS C:\Work\temp> cd R1
PS C:\Work\temp\R1> echo "Hello Dolly" > Dolly.txt
PS C:\Work\temp\Lic> cd ..\R1
PS C:\Work\temp\R1> git add --all
PS C:\Work\temp\R1> git commit -m "initial commit"
[master (root-commit) 498ab77] initial commit
 1 files changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 Dolly.txt
PS C:\Work\temp\R1> cd ..\R2
PS C:\Work\temp\R2> echo "Hello Jane" > Jane.txt
PS C:\Work\temp\R2> git add --all
PS C:\Work\temp\R2> git commit -m "initial commit"
[master (root-commit) deb45bc] initial commit
 1 files changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 Jane.txt
PS C:\Work\temp\R2> cd ..\Lic
PS C:\Work\temp\Lic> echo "Copyright Ayende (C) 2011" > license.txt
PS C:\Work\temp\Lic> git add --all
PS C:\Work\temp\Lic> git commit -m "initial commit"
[master (root-commit) 8e8b1b4] initial commit
 1 files changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 license.txt
This just gives us the basics. Now I want to share the license.txt file between the projects. I can do that with submodules, like so:
PS C:\Work\temp\R1> git submodule init
PS C:\Work\temp\R1> git submodule add C:\Work\temp\Lic Legal
Cloning into Legal...
done.
PS C:\Work\temp\R1> cd ..\R2
PS C:\Work\temp\R2> git submodule init
PS C:\Work\temp\R2> git submodule add C:\Work\temp\Lic Legal
Cloning into Legal...
done.
Now, this looks nice, and it works beautifully. Until you start sharing this with other people. Then it starts to become somewhat messy.
For example, let us say that I want to add a disclaimer in R1:
PS C:\Work\temp\R1\Legal> echo "Not for Jihad use" > Disclaimer.txt
PS C:\Work\temp\R1\Legal> git add .\Disclaimer.txt
PS C:\Work\temp\R1\Legal> git commit -m "adding disclaimer"
[master db3987c] adding disclaimer
 1 files changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 Disclaimer.txt
And here is where the problems start. Let us assume that I want to make a change that is local to just this project.
Well, guess what, you can’t. Not if you intend to share this with other people. You need to push your submodule changes somewhere, and that means you have to fork the original project and update the references to point at your fork. And of course, if there is an update to the original submodule, bringing it in becomes a two-stage update.
And we haven’t even talked about the fun of pushing the main repository but forgetting to push the submodule. It gives a new meaning to “it works on my machine”.
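Git does have a guard against that last mistake: git push accepts --recurse-submodules=check, which refuses the push when the main repository points at a submodule commit that is not on any of the submodule's remotes. The sketch below is an illustration under assumptions rather than part of the original walkthrough. It uses a POSIX shell, bare repositories named lic.git and r1.git standing in for the shared servers, and a made-up g wrapper that supplies a throwaway identity and allows local-path submodule URLs.

```shell
#!/bin/sh
set -eu

# Scratch area; every repository name and path here is hypothetical.
demo=$(mktemp -d)
cd "$demo"

# Wrapper (an assumption for this demo): throwaway identity for commits, plus
# permission to use local paths as submodule URLs.
g() {
    git -c user.name=demo -c user.email=demo@example.com \
        -c protocol.file.allow=always "$@"
}

# Bare repositories stand in for the shared servers.
g init -q Lic
echo "Copyright Ayende (C) 2011" > Lic/license.txt
g -C Lic add --all
g -C Lic commit -q -m "initial commit"
g clone -q --bare Lic lic.git
g init -q --bare r1.git

# The main project, with Legal as a submodule, pushed once so both sides agree.
g init -q R1
echo "Hello Dolly" > R1/Dolly.txt
g -C R1 add --all
g -C R1 commit -q -m "initial commit"
g -C R1 remote add origin "$demo/r1.git"
g -C R1 submodule add "$demo/lic.git" Legal
g -C R1 commit -q -m "add Legal submodule"
g -C R1 push -q origin HEAD

# Commit inside the submodule and record the new pointer in R1,
# but "forget" to push Legal.
echo "disclaimer" > R1/Legal/Disclaimer.txt
g -C R1/Legal add Disclaimer.txt
g -C R1/Legal commit -q -m "adding disclaimer"
g -C R1 add Legal
g -C R1 commit -q -m "update Legal"

# With the check enabled, the push is refused instead of publishing
# a pointer to a commit nobody else can fetch.
if g -C R1 push -q --recurse-submodules=check origin HEAD; then
    first=pushed
else
    first=refused
fi
echo "push with unpushed submodule commit: $first"

# Publish the submodule commit first; then the same push goes through.
g -C R1/Legal push -q origin HEAD
g -C R1 push -q --recurse-submodules=check origin HEAD
second=pushed
```

The check only helps people who remember to ask for it (or who set push.recurseSubmodules in their configuration), so it softens the problem rather than removing it.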
In short, git submodules look like a good idea, but they aren’t really workable in the real world. I’ll have a new post shortly showing how to deal with the issue.
quote: " they aren’t really workable in the real world" - a bit harsh. With the adoption rate of git, I'd say that probably at least one team has found it workable in the real world.
As for your use-case, another quote - "Let us assume that I want to make a change that is local to just this project" - well then it is no longer really a shared resource, as it is manifested differently in every project using it, right?
If you keep it simple, put shared projects in a shared location, and keep localized stuff (like the disclaimer) within your main repo, then by the "principle of least surprise" things are simple and straightforward. Have your build script copy whatever is needed to the artifacts/legal folder, and you're done.
You might want to take a look here:
Not workable in the real world means that they introduce a huge amount of pain.
As for a local change, what if this isn't really a local change, but part of a work in progress that I want to do in a branch?
Or if I really want to make local changes to a remote repository that is part of my project?
You are also ignoring the problem of what happens to people who are not using my build scripts.
For example, people going to GitHub to get the source by clicking the download button.
I actually do not know how the download button works. I'd say that if you want to give people the ability to get the source without cloning (which makes them poor OSS citizens), then you'd host it somewhere and link from the Readme.
Now I never said that submodules are terrific or anything. They do work, for some people (most people do not submodule 25 projects anyway), just not as smoothly as we'd wish them to.
There are many reasons for people to want to get just the source. And my problems with submodules don't happen with 25 modules, they happen with 1, in the presence of branching.
I started looking at submodules for two projects. One where we are going to switch to Git and one where the code was already in git.
As soon as I found out that new developers needed to set up the submodule themselves after they pulled a repository, I quickly gave up on that idea.
I agree, submodule is a good idea, but its setup needs to follow the parent repository.
The whole point of having to pull them separately is for when they aren't necessarily needed. I have an application that has two optional dependencies. I include them in the vendors folder, and if someone wants to run my unit tests they can git submodule init vendor/something and then update it themselves. I'm not incurring extra burden in my pulls to make it easier for someone who's too lazy to type two commands.
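The opt-in workflow described in this last comment can be sketched roughly as follows. This is an illustration under assumptions, not the commenter's actual repository. It uses a POSIX shell, two stand-in dependencies named something and other, an application repository named app, and a made-up g wrapper that supplies a throwaway identity and allows local-path submodule URLs.

```shell
#!/bin/sh
set -eu

# Scratch area; every repository name and path here is hypothetical.
demo=$(mktemp -d)
cd "$demo"

# Wrapper (an assumption for this demo): throwaway identity for commits, plus
# permission to use local paths as submodule URLs.
g() {
    git -c user.name=demo -c user.email=demo@example.com \
        -c protocol.file.allow=always "$@"
}

# Two stand-in dependency repositories.
for dep in something other; do
    g init -q "$dep"
    echo "$dep" > "$dep/README.txt"
    g -C "$dep" add --all
    g -C "$dep" commit -q -m "initial commit"
done

# An application that vendors both of them as submodules.
g init -q app
echo "app" > app/main.txt
g -C app add --all
g -C app commit -q -m "initial commit"
g -C app submodule add "$demo/something" vendor/something
g -C app submodule add "$demo/other" vendor/other
g -C app commit -q -m "vendor optional dependencies"

# A fresh clone gets neither dependency. Opting in to one of them
# leaves the other directory empty.
g clone -q app checkout
g -C checkout submodule init vendor/something
g -C checkout submodule update
```

Only the submodule that was explicitly initialized is fetched by the update. That is the trade-off the comment is defending: cheaper pulls by default, at the price of extra commands for anyone who needs the optional code.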