git multiple repository management

I am working on a project where we manage external libs/headers and qa with git. Here is what every developer’s directory structure looks like:

~/dev/proj 
~/dev/ext 
~/dev/qa

proj, ext and qa are different git repositories. Under svn, synchronization of these dirs was simple: a single update under ~/dev would update all of them recursively. With git, we need to do ‘git pull’ separately for each dir. This is not nice; someone will always forget to update (git pull) one of these dirs and his project will be out of sync (e.g. new qa will not pass with old code). I looked into ‘git submodules’ and it doesn’t provide a single point for ‘git pull’ to update these three separate modules at the same time [Correction: I was wrong here but please read my answer below].

You could argue that we should have put proj, ext and qa under the same git repository but I thought that would have been against the git philosophy of keeping different concepts in different repositories.

    Does anyone have a solution (other than writing a script to do git pull on every dir under ~/dev) to this trivial problem?

    Thanks,

    Altan

8 Solutions collected from the web for “git multiple repository management”

    Herr Doktor,

    You are comparing apples to oranges. git-submodules is similar to svn:externals, aka svn-submodules. In fact, when you use the -r to attach an svn submodule at a specific revision, the behavior is nearly identical. To commit with svn-submodules, you have to commit in each submodule directory separately, just as with git-submodules.

    There is a big difference though: Most devs, at least during some phase of development, prefer to attach to a branch of each submodule, which is not supported by git-submodules. That can be useful for coordinated development. (Google’s Repo tool, a wrapper around Git meant for use with the Gerrit code-review tool, is sort of similar. But trust me: Stay away from Repo. It solves a different problem.) The huge drawback of attaching to branches is that you cannot recover the exact state of your codebase as it was at an earlier point in time. That seems fine for a while, but I’ve heard nasty war stories.

    The alternative for you is not Subversion, but simply a single repository, which could be in Git, Subversion, or whatever. But you actually want a combination of single repo and multiple repos, right? You want the benefits of each. So you need a more sophisticated solution.

    One idea is to have one project repo, where you do most of your development, plus several separate repos, from which you distribute modules:

    proj/.git
    proj/subA
    proj/subB
    subA/.git
    subB/.git
    

    You could move code between them using rsync. The beauty is that you’ve made a sharp distinction between development and distribution. You develop your large project as normal, with branches, merges, etc. When you are ready to distribute a sub-directory as a library, you decide exactly what version of that library you want, and you copy it over to its own repo. When you need to merge instead of just copy, there is the git subtree merge strategy.

    There is another system, built on top of the subtree-merge strategy. It’s called git-subtree, and it has been part of Git since version 1.7.11. Here is a nice description of its operation. You can see from the pictures that its timelines can look confusing, but functionally it’s exactly what you want. Here is a more recent write-up, with excellent advice.
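A minimal sketch of that workflow, assuming the shared library lives under ext/ inside proj. The URL, the branch name and the three function names are hypothetical placeholders, not anything from the write-ups mentioned above.

```shell
# Hypothetical values: replace with your own library URL and branch.
EXT_URL="https://example.com/ext.git"
EXT_BRANCH="master"

# Import the library under ext/ (run once, from the top level of proj).
subtree_import() {
    git subtree add --prefix=ext --squash "$EXT_URL" "$EXT_BRANCH"
}

# Merge newer upstream library commits into ext/.
subtree_update() {
    git subtree pull --prefix=ext --squash "$EXT_URL" "$EXT_BRANCH"
}

# Publish commits that touched ext/ back to the library repository.
subtree_publish() {
    git subtree push --prefix=ext "$EXT_URL" "$EXT_BRANCH"
}
```

With this layout a plain ‘git pull’ in proj brings ext/ along, because the library files are ordinary files in the proj repository. The --squash option keeps the library's full history out of proj; drop it if you would rather have that history.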

    If you don’t mind the extra ‘update’ step of git-submodules, but you’re upset about how it handles conflicts, you could try giternal. The author has included a script to show how its behavior compares with git-submodules and braid (which is for vending submodules, but not merging them).

    Personally, I like git-slave, which is a simple wrapper around git. Basically, it takes the commands you give to its ‘gits’ front end and runs them as git commands in all your repos. It’s really just a convenience. It’s very easy to understand, has zero impact on the individual repos, and is great for branch-switching (which is not yet supported in git-subtrees).

    My philosophy is this: if I will always need to pull X and Y together, then logically they belong in the same repository. Using submodules only makes sense if there is appropriate isolation – think external vendor libraries where you don’t want to have updates brought in willy nilly and you don’t want your team able to edit them directly – that makes sense. But still, it adds steps no matter how you slice it. I for one stick to “put it in one repository if it’s one project”, regardless of how I might theoretically break it up to be more “git-like”.

    You can still use submodules.

    git submodule update

    will update all submodules in one go.
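As a rough sketch, assuming ~/dev is itself a git repository with proj, ext and qa registered as submodules (the question does not set it up that way, so this is an assumption), the whole tree could be synchronized as follows. The function names are made up for illustration.

```shell
# Pull the parent repository, then check out in every submodule
# the exact commit that the parent records for it.
sync_recorded() {
    git pull &&
    git submodule update --init --recursive
}

# Alternatively, move every submodule to the tip of the remote branch
# it tracks (the --remote option needs Git 1.8.2 or later).
sync_latest() {
    git submodule update --init --remote
}
```

Note that sync_recorded only moves a submodule forward once someone has committed its new revision in the parent repository, whereas sync_latest ignores the recorded revisions and follows the branches.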

    We tried ‘git submodule’ and it is not satisfactory. It seems like git submodule is designed for modules that don’t change much. Here are the steps to make and push a change to any module:

    cd ~/dev/proj
    git checkout master
    git pull
    # ... make changes to your files ...
    git commit -a -m "comment"    # commit inside the module
    git push
    cd ..
    git commit -a -m "comment"    # record the module's new commit in the parent repository
    git push
    

    And this has to be repeated for each module under ~/dev. Excuse me but I find this ridiculous. In svn, the same thing is accomplished by

    cd ~/dev
    svn commit -m "done in one line"
    

    I understand the benefits of git over svn; however, the lack of proper submodule support and of good large-file support is probably going to make us switch from git to svn (unless we get a solution here; I’d rather stay with git). Honestly, I am surprised this hasn’t come up in git at all. Different projects share common modules [that are live] all the time.

    I would object to putting proj, ext and qa under the same repository because

    • ext will be shared with other projects (repositories)
    • qa should be able to be checked out (cloned) without code

    Altan

    IMHO, submodules are the way to go here.

    Instead of asking whether you always need X and Y together, you should ask yourself whether or not you always want the exact same versions of X and Y to go together.

    Git Submodules offer you this very powerful tool of quickly fixing a bug in X, without having to also update Y.

    For instance, if you’re developing a product which runs on different operating systems (let’s say Mac OS X and Windows), then it may make sense to put the operating-system-specific code into separate submodules. This is especially true if different people work on these different operating system ports. Using git submodules allows you to easily deploy a fix for one operating system to your customers, without having to go through the QA process on the other OS.

    Another very powerful use case is “workspace” modules. You simply create some local module (for instance /Workspace), then add all the dependencies that you’re working with.

    The great thing about git submodules is that they record not only the modules that you use, but also their specific revisions. While fixing bugs, I often have to test specific versions of some dependencies – git submodules allow me to easily record these in my workspace module’s history, allowing me to easily get back to that exact state at a later time.
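A minimal sketch of recording a dependency's revision in a workspace repository. The URL, the path deps/libfoo and the function names are hypothetical; ‘git -C’ needs Git 1.8.5 or later.

```shell
# Hypothetical dependency: replace the URL and the deps/libfoo path with your own.
DEP_URL="https://example.com/libfoo.git"

# Register the dependency in the workspace repository (run once).
add_dependency() {
    git submodule add "$DEP_URL" deps/libfoo &&
    git commit -m "Add libfoo as a submodule"
}

# Check out a given revision of the dependency and record that choice
# as a commit in the workspace repository.
pin_dependency() {
    rev="$1"
    git -C deps/libfoo fetch &&
    git -C deps/libfoo checkout "$rev" &&
    git add deps/libfoo &&
    git commit -m "Pin libfoo to $rev"
}
```

For example, ‘pin_dependency v1.2.0’ would record a tag named v1.2.0, if the dependency has one. To get back to an earlier combination later, check out the corresponding workspace commit and run ‘git submodule update’.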

    I was facing the same problem and wrote a program (bash script) to do that: gws

    Roughly the idea is the following:

    1. Create a list of project paths and URLs in the dev/.projects.gws file:

      work/proj  | https://...
      perso/ext  | git@github.com:...
      perso/qa   | https://...
      
    2. Use one of the gws commands:
      • init: used to automatically create .projects.gws file from the existing repositories in the current folder tree.
      • update: clone missing local repositories, for instance when a project is added in .projects.gws.
      • status: show the status of all repositories (clean, untracked files, uncommitted changes, …).
      • fetch: do a git fetch in all repositories (then status will be able to detect differences with the origin repository if it was modified in between).
      • ff: do a git fast-forward pull in all repositories
      • check: verify the state of the workspace (known, unknown, missing repositories in the workspace)

    The .projects.gws file can then be versioned with git and used on many computers (work, home, laptop…). There is also the possibility to write an .ignore.gws file to ignore locally some paths with regexp, e.g. ignore repositories in work/ folder with ^work/.* on the home computer.
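To illustrate the idea behind the update command, here is a small sketch that reads a ‘path | url’ list in the format shown above and clones whatever is missing. It is only an illustration under that assumption and is not the gws implementation; the function name clone_missing is made up.

```shell
# Clone every repository listed in a "path | url" manifest that is not
# present locally yet. Illustrative only: this is not the gws code.
clone_missing() {
    manifest="$1"
    # The "|| [ -n ... ]" part also handles a last line without a newline.
    while IFS='|' read -r path url || [ -n "$path" ]; do
        # Trim surrounding whitespace from both fields.
        path=$(printf '%s' "$path" | sed 's/^[[:space:]]*//; s/[[:space:]]*$//')
        url=$(printf '%s' "$url" | sed 's/^[[:space:]]*//; s/[[:space:]]*$//')
        # Skip blank lines and comment lines.
        case "$path" in ''|'#'*) continue ;; esac
        if [ -d "$path/.git" ]; then
            echo "ok: $path"
        else
            # Redirect stdin so that git (or ssh) cannot consume the manifest.
            git clone "$url" "$path" </dev/null
        fi
    done < "$manifest"
}
```

It would be run from the dev folder as ‘clone_missing .projects.gws’.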

    See the Readme for more information.

    I use it every day and it fits my needs (and maybe yours too). Note that I am planning to rewrite it (under another name) in Python when I have time. The reason is that the code is becoming difficult to manage in bash, and I want to add more functionality (e.g. support for mercurial, darcs, …).

    git-multi is the answer. https://github.com/grahamc/git-multi

    Set up git-multi, and clone all the repos you need under the ‘~/dev’ folder.

    Then, from ‘~/dev’, run “git multi pull”, “git multi status” or other commands; each of these in turn runs the corresponding command in all the child repos.
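For readers who want to see what running a command in all the child repos amounts to, here is a small shell function along the same lines. It is an illustrative sketch, not how git-multi itself is implemented; the name git_each is made up, and ‘git -C’ needs Git 1.8.5 or later.

```shell
# Run the given git subcommand in every immediate child directory
# that is a git repository. Illustrative only: this is not git-multi.
git_each() {
    rc=0
    for dir in */; do
        # .git is a directory in a normal clone and a file in a submodule.
        [ -e "${dir}.git" ] || continue
        echo "== ${dir%/}"
        # Keep going after a failure, but remember it for the exit status.
        git -C "${dir%/}" "$@" || rc=1
    done
    return "$rc"
}
```

From ‘~/dev’, ‘git_each pull --ff-only’ would then update proj, ext and qa in one go, and ‘git_each status’ would show the state of each.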
