git-subtree is not retaining history so I cannot push subtree changes, how can I fix this/avoid this issue in the future?
I’ve been using the git-subtree extension (https://github.com/apenwarr/git-subtree) to manage sub-projects within our main project. It’s doing exactly what I want other than the fact that it fails when I try to split out changes made to a sub-project from our main project.
e.g. earlier on I had done
git subtree add -P Some/Sub/Dir --squash git@gitserver:lib.git master
to bring in the library code to Some/Sub/Dir in our main project. Everything went well, so I pushed my changes to the central bare repo of our main project. I then decided to make a change to my local version of the lib in Some/Sub/Dir, committed it, and split it out to push it back to the lib.git repo:
git subtree split -P Some/Sub/Dir -b some_branch
Everything worked as expected. No longer needing the local copy of the repo, I deleted it.
After cloning a new copy of the repo from our central repo, I made some changes to the lib in Some/Sub/Dir and decided I wanted to split those changes out and push them back to the lib.git repository. I attempted to use the same subtree split command as before, but this time I ended up with the following output:
1/ 3 (0) 2/ 3 (1) 3/ 3 (1) fatal: bad object d76a03f0ec7e20724bcfa253e6a03683211a7bb1
d76a03f0ec7e20724bcfa253e6a03683211a7bb1 comes from when I added the subtree:
commit 43b3eb7d69d5eb64241eddb12e5bd74fd0215083
Author: Ian Bond <firstname.lastname@example.org>
Date:   Fri Apr 22 15:06:50 2011 -0400

    Squashed 'Subtree/librepoLib/' content from commit d76a03f

    git-subtree-dir: Subtree/librepoLib
    git-subtree-split: d76a03f0ec7e20724bcfa253e6a03683211a7bb1
which actually refers to a commit in the lib.git repo.
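To check whether a clone still has the upstream commit that a squash commit points at, one can pull the hash out of the git-subtree-split line and ask git whether the object exists. The following is only a sketch: the helper names subtree_split_hash and upstream_commit_present are made up for illustration, and the message text is the one quoted above.

```shell
#!/bin/sh
# Print the upstream commit id recorded in a squash commit message (read from stdin).
# Hypothetical helper name; it only parses text and does not need a repository.
subtree_split_hash() {
    sed -n 's/^[[:space:]]*git-subtree-split:[[:space:]]*\([0-9a-f]\{7,40\}\).*$/\1/p'
}

# Succeed if the given commit object exists in the current repository.
# Defined for illustration only; it must be run inside a git work tree.
upstream_commit_present() {
    git cat-file -e "$1^{commit}" 2>/dev/null
}

# Example input: the squash commit message quoted in the question.
message="Squashed 'Subtree/librepoLib/' content from commit d76a03f

git-subtree-dir: Subtree/librepoLib
git-subtree-split: d76a03f0ec7e20724bcfa253e6a03683211a7bb1"

split_hash=$(printf '%s\n' "$message" | subtree_split_hash)
echo "$split_hash"
```

In a real repository the message could come from git log -1 --format=%B 43b3eb7 instead of a literal string, and upstream_commit_present "$split_hash" would then indicate whether a split has the history it needs.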
What I’ve been able to piece together (and I’m a git noob so I may be wrong, overlooking something, or using incorrect terminology here) is that ‘git subtree add --squash’ brings the entire history of the remote lib.git repo into the current repo, squashes it down into a separate commit, then adds that commit to the working branch. The lib.git commits remain in the current repo, but they are dangling since nothing references them other than the text of the squash commit. As long as those dangling commits remain, git-subtree can use them to perform splits. However, since a push or pull doesn’t transfer dangling objects (and a gc that fully prunes dangling objects deletes them), those commits are eventually lost and git-subtree no longer has the information it needs to perform the split.
I’ve added a script that will fully reproduce the issues I’ve been having.
My questions are:
1) What can I do to handle the existing situation where I now have subtrees that I want to merge back to their origin repo, but no longer have any sort of history that links them together? My current thought is to do something like:
git subtree split -P Some/Sub/Dir 43b3eb7^.. --ignore-joins -b splitBranch
to split out all of the history since the ‘git subtree add’ and merge it back into the origin repo (which thankfully has not had any changes since the add). Is this the best way to go? Any recommendations for how I should perform the merge?
2) Is there anything I can do to make git-subtree work as expected? I believe if I omit the --squash parameter on ‘git subtree add’ then everything will work, however that causes a bunch of unrelated history to be injected into my repo. Is there some way to keep the needed commits around (preferably without keeping the entire history of the library around)?
The purpose of git subtree split is to create some new commits (representing “local” changes originally made in the subtree’s local directory) on top of the subtree’s original history. Since it directly involves the subtree’s original history (as the parent commit of the first rewritten local commit that touches the subtree), the split operation cannot be done without the subtree’s original history itself being present.
Think about what you will be doing with the history that git subtree split generates. You will probably want to push it to a repository where you can merge it into the rest of the “upstream” history. In order for this merge operation to make sense, the split history needs to be based on the original history itself [1].
Probably the most reliable way to arrange for users to have the subtree’s original history is to publish the URL for the subtree’s upstream repository in your documentation and have them define a remote for it (it is perfectly fine to have “unrelated” remotes in a single repository). E.g.
If you need to work with the “upstream” of Some/Sub/Dir (to pull in external changes or push out local changes), please define and update a remote for the library’s repository before using git subtree:
git remote add lib git@host:the-lib-repository && git fetch lib
You would need to do something like this even if you were not using --squash, since users would need to know where to get new upstream commits (and where, ultimately, to push new split-generated commits).
--squash gives you a “clean” history in your main project and means that only those users who need to deal with the subtree’s “upstream” actually have to have its objects in their repositories.
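Put together, the steps for a user who does need to talk to the upstream might look like the sketch below. The helper publish_subtree and the run wrapper are hypothetical names, and the branch some_branch is just the one used in the question. By default the script only prints the commands so the plan can be reviewed first; setting DRY_RUN=0 would actually execute them.

```shell
#!/bin/sh
# Print each command; execute it only when DRY_RUN is set to 0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# Hypothetical helper: publish local subtree changes to the library's repository.
#   $1 = subtree prefix, $2 = remote name, $3 = remote URL, $4 = upstream branch
# Note: "git remote add" fails if the remote is already defined, which stops the helper.
publish_subtree() {
    prefix=$1 remote=$2 url=$3 branch=$4
    run git remote add "$remote" "$url" || return 1
    # Fetching brings in the upstream history that the split must be based on.
    run git fetch "$remote" || return 1
    # "git subtree push" performs the split and pushes the result in one step.
    run git subtree push --prefix="$prefix" "$remote" "$branch"
}

plan=$(publish_subtree Some/Sub/Dir lib git@host:the-lib-repository some_branch)
printf '%s\n' "$plan"
```

Pushing to a separate branch rather than straight to master leaves room to review and merge the split commits on the upstream side.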
It seems like you have a good understanding of the object model. You are correct that the history that git subtree add --squash pulls in will become dangling [2], but that git subtree split can still use it until it is pruned away.
(with reference to your reproduction script)
You are able to successfully split in your repoMainClone only because local clones automatically hardlink (or copy) all the files in .git/objects/ (thus getting access to repoMain’s copies of the dangling (or nearly dangling [2]) objects from repoLib) instead of using the usual “pack protocol” transport (which would limit the transferred objects to only those needed for the transferred refs, i.e. omitting the dangling objects). repoMainPull is effectively equivalent to cloning with git clone file://"$(pwd)"/repoMain repoMainCloneFile (the file:// URL forces local clones to use pack-based transfers instead of just linking/copying everything).
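The difference between the two kinds of clone can be reproduced without git subtree at all. The sketch below assumes git is installed and uses a throwaway directory; the commit made with git commit-tree merely stands in for the squashed-away library history, and the demo identity and directory names are invented for the example.

```shell
#!/bin/sh
# Compare a plain local clone with a file:// clone for an object no ref points at.
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

git init -q "$work/repoMain"
echo hello > "$work/repoMain/file.txt"
git -C "$work/repoMain" add file.txt
git -C "$work/repoMain" commit -q -m "referenced commit"

# Create a commit object that no branch, tag, or other ref points at.
tree=$(git -C "$work/repoMain" write-tree)
orphan=$(echo "unreferenced commit" | git -C "$work/repoMain" commit-tree "$tree")

# Plain path: objects are hardlinked or copied wholesale.
git clone -q "$work/repoMain" "$work/cloneLocal"
# file:// URL: objects are transferred as a pack built from the refs.
git clone -q "file://$work/repoMain" "$work/cloneFile"

# Report whether the unreferenced commit exists in the given repository.
has_object() {
    if git -C "$1" cat-file -e "$orphan" 2>/dev/null; then echo present; else echo missing; fi
}
local_result=$(has_object "$work/cloneLocal")
file_result=$(has_object "$work/cloneFile")
echo "local path clone: $local_result"
echo "file:// clone:    $file_result"
```

Following the explanation above, the unreferenced commit should show up as present in the plain local clone and as missing in the file:// clone, which mirrors the repoMainClone versus repoMainPull behaviour.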
[1] Actually, you can directly merge unrelated histories, but you lose the ability to do three-way merges (since there is no common ancestor). This would be quite a sacrifice.
git subtree split -P Some/Sub/Dir 43b3eb7^.. --ignore-joins … (where 43b3eb7 is the synthetic commit that resulted from git subtree add --squash …) would generate an unrelated history (except that the range needs adjusting, since 43b3eb7^ means “the first parent of 43b3eb7” and 43b3eb7 has no parents). I am not sure that git subtree split was designed to take ranges like this, though. The documentation for git subtree split just says <commit>, but never really mentions its purpose. Reading the code shows that it defaults to HEAD, which might indicate that it is intended to be a single commit specifying the “tip” of the history that should be processed for splitting. Also, turning on the debug output shows an “incorrect order:” message, which might indicate that a range argument puts the split operation in an unexpected situation (it expects to have processed all of the parents of a commit before processing the commit itself, but the range ensures that 43b3eb7, which is the parent of the subtree merge commit, is never processed). I think you can just use --ignore-joins and leave off the range if you want to generate “unrelated” history and try to use it in some way:
git subtree split -P Some/Sub/Dir --ignore-joins …
[2] They are not actually dangling immediately after git subtree add --squash because they are still referenced by FETCH_HEAD. Once an unrelated fetch is done, however, they will become truly dangling.
@Chris Johnsen’s answer is right; it explains why splitting works in the clone situation but not in the pull situation.
For the workaround proposed in the question and discussed in footnote 1 of @Chris Johnsen’s answer, I can confirm that git subtree split -P Some/Sub/Dir -b splitBranch --ignore-joins and git subtree split -P Some/Sub/Dir -b splitBranch 43b3eb7.. actually produce the same commit and the same branch. That branch reflects the modifications made in the local repo, but it cannot be pushed to the original repoLib repo because the two have no common ancestor, even though git diff shows
43b3eb7d69d are the same.
So, to get subtree push working in the pull situation, the original repoLib remote must be added and fetched so that d76a03f0ec7e2 exists locally; the split can then produce a branch that has a common ancestor with the original repoLib.
The original reproduction script did not run smoothly under Linux; here is a new one: http://pastebin.com/3NAQKEz9