How to reduce the depth of an existing git clone?

I have a clone. I want to reduce the history on it, without cloning from scratch with a reduced depth. Worked example:

$ git clone git@github.com:apache/spark.git
# ...
$ cd spark/
$ du -hs .git
193M    .git

OK, so that’s not so bad, but it’ll serve for this discussion. If I try gc it gets smaller:

$ git gc --aggressive
Counting objects: 380616, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (278136/278136), done.
Writing objects: 100% (380616/380616), done.
Total 380616 (delta 182748), reused 192702 (delta 0)
Checking connectivity: 380616, done.
$ du -hs .git
108M    .git

Still pretty big, though (git pull suggests that it’s still push/pullable to the remote). How about repack?

$ git repack -a -d --depth=5
Counting objects: 380616, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (95388/95388), done.
Writing objects: 100% (380616/380616), done.
Total 380616 (delta 182748), reused 380616 (delta 182748)
$ du -hs .git
108M    .git

Yup, didn’t get any smaller. --depth for repack isn’t the same as --depth for clone:

$ git clone --depth 1 git@github.com:apache/spark.git
Cloning into 'spark'...
remote: Counting objects: 8520, done.
remote: Compressing objects: 100% (6611/6611), done.
remote: Total 8520 (delta 1448), reused 5101 (delta 710), pack-reused 0
Receiving objects: 100% (8520/8520), 14.82 MiB | 3.63 MiB/s, done.
Resolving deltas: 100% (1448/1448), done.
Checking connectivity... done.
Checking out files: 100% (13386/13386), done.
$ cd spark
$ du -hs .git
17M .git

Git pull says it’s still in step with the remote, which surprises nobody.

OK – so how to change an existing clone to a shallow clone, without nixing it and checking it out afresh?

4 solutions collected from the web for “How to reduce the depth of an existing git clone?”

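    # file:// because --depth is ignored for plain local paths; --no-single-branch so every ref keeps its history
    git clone --mirror --no-single-branch --depth=5 "file://$PWD/reponame" temp.git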
    rm -rf reponame/.git/objects
    mv temp.git/{shallow,objects} reponame/.git
    rm -rf temp.git
    

    This really isn’t cloning “from scratch”: it’s purely local work, and it creates virtually nothing more than the shallowed-out pack files, probably in the tens of kbytes total. I’d venture you’re not going to get more efficient than this; you’d wind up with custom work that uses more space, in the form of scripts and test work, than this does in the form of a few kB of temporary repo overhead.
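
To see the steps above end to end without touching a real repository, here is a minimal sketch that builds a throwaway 20-commit repo and shrinks a clone of it to depth 5. The names upstream, reponame and temp.git, the demo identity, and the use of file:// with --no-single-branch are my assumptions for the illustration, not part of the original recipe.

```shell
#!/bin/sh
# Sketch: shrink a throwaway clone to depth 5 via a local shallow mirror.
set -e
work=$(mktemp -d)
cd "$work"

# Build a toy "upstream" with 20 commits (all names here are hypothetical).
git init -q upstream
i=1
while [ "$i" -le 20 ]; do
  echo "$i" > upstream/file.txt
  git -C upstream add file.txt
  git -C upstream -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "commit $i"
  i=$((i + 1))
done

git clone -q upstream reponame
before=$(git -C reponame rev-list --count HEAD)

# Shallow mirror of the existing clone; a file:// URL makes --depth take effect.
git clone -q --mirror --no-single-branch --depth=5 "file://$work/reponame" temp.git

# Swap the full object store for the shallow one, plus its 'shallow' marker file.
rm -rf reponame/.git/objects
mv temp.git/shallow temp.git/objects reponame/.git/
rm -rf temp.git

after=$(git -C reponame rev-list --count HEAD)
echo "commits before: $before, after: $after"
```

The working tree in reponame is never touched; only the object store and the shallow marker are replaced.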

    Edit, Feb 2017: this answer is now outdated / wrong. Git can make a shallow clone shallower, at least internally. Git 2.11 also has --deepen to increase the depth of a clone, and it looks as though there are eventual plans to allow negative values (though right now they are rejected). It’s not clear how well this works in the real world, and your best bet is still to clone the clone, as in jthill’s answer.


    You can only deepen a repository. This is primarily because Git is built around adding new stuff. The way shallow clones work is that your (receiving) Git gets the sender (another Git) to stop sending “new stuff” upon reaching the shallow-clone-depth argument, and coordinates with the sender so as to understand why they have stopped at that point even though more history is obviously required. Your Git then writes the IDs of “truncated” commits into a special file, .git/shallow, that both marks the repository as shallow and notes which commits are truncated.

    Note that during this process, your Git is still adding new stuff. (Also, when it has finished cloning and exits, Git forgets what the depth was, and over time it becomes impossible even to figure out what it was. All Git can tell is that this is a shallow clone, because the .git/shallow file containing commit IDs still exists.)
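
As a rough illustration of the .git/shallow bookkeeping described above, the sketch below makes a depth-2 clone of a throwaway six-commit repo and compares the ID recorded in .git/shallow with the commit that sits at the cut-off point. The repo names and demo identity are hypothetical.

```shell
#!/bin/sh
# Sketch: inspect what a shallow clone records in .git/shallow.
set -e
work=$(mktemp -d)
cd "$work"

# Toy upstream with six empty commits (names are hypothetical).
git init -q upstream
for i in 1 2 3 4 5 6; do
  git -C upstream -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "commit $i"
done

git clone -q --depth 2 "file://$work/upstream" shallow-clone

# The file lists the commits whose parents were left out of the transfer.
shallow_ids=$(cat shallow-clone/.git/shallow)
# With depth 2 that should be the parent of the upstream tip.
expected=$(git -C upstream rev-parse HEAD~1)
visible=$(git -C shallow-clone rev-list --count HEAD)

echo "recorded in .git/shallow: $shallow_ids"
echo "upstream HEAD~1:          $expected"
echo "commits visible in clone: $visible"
```

Nothing in the clone records the number 2 itself; only the truncated commit ID survives.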

    The rest of Git continues to be built around this “add new stuff” concept, so you can deepen the clone, but not increase its shallowness. (There’s no good, agreed-upon verb for this: the opposite of deepening a pit is filling it in, but fill has the wrong connotation. Diminish might work; I think I’ll use that.)

    In theory, git gc, which is the only part of Git that ever actually throws anything out,[1] could perhaps diminish a repository, even converting a full clone into a shallow one, but no one has written code to do that. There are some tricky bits, e.g., do you discard tags? Shallow clones start out sans tags for implementation reasons, so converting a repository to shallow, or diminishing an existing shallow repository, might call for discarding at least some tags. Certainly any tag pointing to a commit wiped out by the diminish action would have to go.


    Meanwhile, the --depth argument to git-pack-objects (passed through from git repack) means something else entirely: it’s the maximum length of a delta chain, when Git uses its modified xdelta compression on Git objects stored in each pack-file. This has nothing to do with the depth of particular parts of the commit DAG (as computed from each branch head).
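
A quick way to convince yourself of the difference is to repack a throwaway repo with a small --depth and check that the commit history is untouched. This is only a sketch; the repo name, file name and demo identity are hypothetical.

```shell
#!/bin/sh
# Sketch: 'git repack --depth' limits delta chains, not commit history.
set -e
work=$(mktemp -d)
cd "$work"

git init -q repo
cd repo
for i in 1 2 3 4 5 6 7 8; do
  echo "line $i" >> notes.txt
  git add notes.txt
  git -c user.name=demo -c user.email=demo@example.com commit -q -m "commit $i"
done

commits_before=$(git rev-list --count HEAD)
git repack -a -d -q --depth=1
commits_after=$(git rev-list --count HEAD)

# A truncated history would have left a .git/shallow file behind.
if [ -f .git/shallow ]; then shallow=yes; else shallow=no; fi

echo "commits before: $commits_before, after: $commits_after, shallow: $shallow"
```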


    [1] Well, git repack winds up throwing things out as a side effect, depending on which flags are used, but it’s invoked this way by git gc. This is also true of git prune. For these two commands to really do their job properly, they need git reflog expire run first. The “normal user” end of the clean-things-up sequence is git gc; it deals with all of this. So we can say that git gc is how you discard accumulated “new stuff” that turned out to be unwanted after all.

    OK, here’s an attempt at a shell script for it. It ignores non-default branches and assumes the remote is called ‘origin’:

    #!/bin/sh
    # Swap a clone's .git for a depth-1 clone of the same remote.
    # Run from the directory that contains the repo folder given as $1.
    
    set -e
    
    cd "$1"
    
    # Gather state from the existing clone before touching anything.
    changed_lines=$(git status --porcelain | wc -l)
    ahead_of_remote=$(git status | grep "Your branch is ahead" | wc -l)
    remote_url=$(git config --get remote.origin.url)
    latest_sha=$(git rev-parse HEAD)
    
    cd ..
    
    if [ "$changed_lines" -gt 0 ]
    then
      echo "Uncommitted or untracked changes - won't make the clone slimmer in that situation"
      exit 1
    fi
    
    if [ "$ahead_of_remote" -gt 0 ]
    then
      echo "Local commits not in the remote - won't make the clone slimmer in that situation"
      exit 1
    fi
    
    # Make a scratch depth-1 clone; no working tree is needed.
    mkdir .git_slimmer
    cd .git_slimmer
    git clone --no-checkout --depth 1 "$remote_url" foo
    cd foo
    latest_sha_for_new=$(git rev-parse HEAD)
    cd ../..
    
    # POSIX sh compares strings with '=', not '=='.
    if [ "$latest_sha" = "$latest_sha_for_new" ]
    then
      mv "$1/.git" "$1/.gitOLD"
      mv ".git_slimmer/foo/.git" "$1/"
      rm -rf "$1/.gitOLD"
      cd "$1"
      git add .   # rebuild the index from the unchanged working tree
      cd ..
    else
      echo "SHA at the head of the existing git clone does not match the latest one from the remote: do a git pull first"
      rm -rf .git_slimmer
      exit 1
    fi
    
    rm -rf .git_slimmer
    

    Use: ‘git-slimmer.sh <folder_containing_git_repo>’

    Since at least Git version 2.14.1 there is:

    git fetch --depth 10
    

    This will cut off (or lengthen) the local history to a depth of 10 (and also fetch the newest commits from origin).

    The cut commits will no longer be reachable by normal means, but they will still linger in the repository. They will eventually be removed by automatic git gc.

    You can remove the dangling objects immediately, for example to free disk space. To do so, you have to remove all references that might hold the old commits. These are mostly the reflog and the tags. Then run git gc.

    Clear the reflog:

    git reflog expire --expire=all --all
    

    Remove all tags:

    git tag -l | xargs git tag -d
    

    Delete all dangling objects:

    git gc --prune=all
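
Putting the whole sequence together on a throwaway repo gives a rough picture of what happens to a cut-off commit at each stage. This is a sketch under assumptions: the repo names, the v1 tag and the demo identity are hypothetical, and the reflog step uses the --expire=all spelling of git reflog expire.

```shell
#!/bin/sh
# Sketch: make a full clone shallower with 'git fetch --depth', then clean up.
set -e
work=$(mktemp -d)
cd "$work"

# Toy upstream: ten empty commits, with a tag on the first one.
git init -q upstream
for i in 1 2 3 4 5 6 7 8 9 10; do
  git -C upstream -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "commit $i"
  if [ "$i" -eq 1 ]; then git -C upstream tag v1; fi
done

git clone -q "file://$work/upstream" clone
cd clone
cut=$(git rev-parse HEAD~5)     # a commit that falls outside depth 3

git fetch -q --depth 3
depth_after=$(git rev-list --count HEAD)

# The cut commit is no longer part of the history, but its object still exists.
if git cat-file -e "$cut" 2>/dev/null; then lingering=yes; else lingering=no; fi

# Drop everything that could still reference old commits, then prune.
git reflog expire --expire=all --all
git tag -l | xargs git tag -d >/dev/null
git gc -q --prune=all

if git cat-file -e "$cut" 2>/dev/null; then still_there=yes; else still_there=no; fi
echo "depth: $depth_after, lingering after fetch: $lingering, present after gc: $still_there"
```

Deleting every tag is heavy-handed; in a real repo you may prefer to delete only the tags that point into the history being cut.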
    