Will Git garbage-collect a commit in a submodule that is referenced by a top-level repository?

Let’s say:

 top.git     
 └── sub.git => 75fc7
  • The top-level Git repository top.git refers to commit 75fc7 in sub.git.
  • The submodule Git repository sub.git has neither branches nor tags leading to commit 75fc7 (unreachable).

Will sub.git eventually garbage-collect this commit 75fc7 because nothing can reach it?

AFAIK, Git submodules are designed in such a way that, in this example, sub.git cannot know that it is a submodule of any other repository. In other words, commit 75fc7 is effectively a candidate for garbage collection. If so, restoring the state of all submodules would be unreliable, because they may “forget” required commits.

2 solutions collected from the web for this question:

    Yes, the commit will eventually be garbage-collected.

    But don’t forget that, to be reusable, a submodule referenced by its parent repo must also publish the recorded SHA1 (recorded as a gitlink, a special entry in the index of the parent repo).

    If that SHA1 is not published (pushed to an upstream repo), then no clone of the parent repo would be able to check out the submodule anyway.
    That means a submodule must push the recorded SHA1, which makes that SHA1 referenced (by a branch or tag, as pushed to the upstream repo).

    So the issue here is not so much the garbage collector as the ability of a parent repo to check out its submodule at the right SHA1.
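To see what the parent repo actually records, the gitlink can be inspected with git ls-tree. The following is a minimal sketch rather than the procedure from the answer: it builds two throwaway repos in a temporary directory (the names sub and top and the demo identity are assumptions) and writes the gitlink directly with git update-index instead of git submodule add, purely to keep the example short.

```shell
set -eu
tmp=$(mktemp -d)
cd "$tmp"

# Throwaway "submodule" repo with a single commit (hypothetical name: sub).
git init -q sub
git -C sub -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "sub commit"
sub_sha=$(git -C sub rev-parse HEAD)

# Throwaway parent repo (hypothetical name: top).
git init -q top
cd top

# Mode 160000 marks an index entry as a gitlink: the parent stores only
# the submodule commit's SHA1, not a branch or tag name.
git update-index --add --cacheinfo 160000,"$sub_sha",sub
git -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "record submodule commit"

# Prints: 160000 commit <sub_sha> <TAB> sub
git ls-tree HEAD sub
```

Nothing in sub is touched when the parent records the SHA1, which is why sub has no way of knowing that the commit is needed elsewhere.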

    My scenario (not explicitly mentioned in the question) is actually different and more specific. What if the commits are actually pushed upstream for both top.git and sub.git?

    Then you don’t need to wait for a gc to remove an unreachable SHA1 for the issue to manifest.
    If the published SHA1 is no longer referenced, no clone of top.git will be able to check out the sub.git submodule repo at the right SHA1 (even if gc hasn’t run yet), because the unreferenced SHA1 won’t be part of the sub.git clone anyway.


    The key point to understand: an upstream repo sub.git has no idea it is used as a submodule by another upstream repo (like top.git).

    If sub.git does not include the right SHA1 (the one used by top.git) for any reason (gc, a rebase, a push --force, or …), a clone of top.git will fail to restore the submodule to its proper state.

    Actually, it was easy to test thanks to this answer.

    Yes, the commit was garbage-collected even though it was referenced by the top-level repository.
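The linked answer is not reproduced here, so the following is only a sketch of one way such a test could look, under stated assumptions: a throwaway repo in a temporary directory, a hypothetical branch named scratch, and immediate expiry of reflogs and unreachable objects. With default settings Git keeps unreachable objects and reflog entries for a grace period, so the commit would not vanish this quickly.

```shell
set -eu
tmp=$(mktemp -d)
git init -q "$tmp/sub"
cd "$tmp/sub"

# Small wrapper so the demo does not depend on a configured identity.
commit() {
    git -c user.name=demo -c user.email=demo@example.com \
        commit -q --allow-empty -m "$1"
}

commit "base"

# Create a commit on a temporary branch, then delete the branch so that
# no branch or tag leads to the commit any more.
git checkout -q -b scratch
commit "soon unreachable"
orphan=$(git rev-parse HEAD)
git checkout -q -
git branch -q -D scratch

# Reflog entries still reference the commit; expire them, then prune
# unreachable objects immediately instead of after the grace period.
git reflog expire --expire=now --all
git gc -q --prune=now

if git cat-file -e "$orphan" 2>/dev/null; then
    result="still present"
else
    result="garbage-collected"
fi
echo "$result"
```

Whether a parent repo has recorded the commit as a gitlink makes no difference to this outcome, since gc only looks at references inside the repo it runs in.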

    This demands some measures or discipline regarding which commits can be used in the top-level repository, so that the entire tree spanning submodules can be reliably restored at any time in the future. Such commits must be ancestors of some long-term maintained branch or tag.
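One way to enforce that discipline is to check, before recording a submodule commit, that some branch or tag in the submodule contains it. The sketch below is an illustration, not something taken from the answers: is_anchored is a hypothetical helper name, and the demo repo and identity are assumptions. It relies on git for-each-ref --contains, which lists the refs whose history includes a given commit.

```shell
set -eu

# Hypothetical helper: succeeds if commit $2 is reachable from at least
# one branch or tag in the repo at path $1.
is_anchored() {
    refs=$(git -C "$1" for-each-ref --contains "$2" \
        refs/heads refs/tags 2>/dev/null) || return 1
    [ -n "$refs" ]
}

# Demo in a throwaway repo: one commit on a branch, one on a detached HEAD.
tmp=$(mktemp -d)
git init -q "$tmp/sub"
cd "$tmp/sub"
commit() {
    git -c user.name=demo -c user.email=demo@example.com \
        commit -q --allow-empty -m "$1"
}

commit "on branch"
kept=$(git rev-parse HEAD)

git checkout -q --detach
commit "detached only"
loose=$(git rev-parse HEAD)

for sha in "$kept" "$loose"; do
    if is_anchored . "$sha"; then
        echo "anchored"
    else
        echo "not anchored"
    fi
done
```

The loop prints "anchored" for the first commit and "not anchored" for the second. In a real setup the SHA1 to check would come from the parent repo, for example from the output of git ls-tree HEAD on the submodule path, and the check would ideally run against the upstream copy of the submodule.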
