How to optimize git update-index?

I have a rather large repository (11 GB, 900,000+ files) and I'm having trouble iterating over it in a reasonable time. After a bit of profiling, the real bottleneck seems to be git update-index:

$ time git update-index --replace $path > /dev/null

real    0m5.766s
user    0m1.984s
sys     0m0.391s

At that rate, processing every file would take an unbearable number of days. Is there any way to speed the update-index operation up?
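If most of that 5.7 seconds is process startup (which is notoriously expensive under Cygwin), one option worth trying is to batch all the paths into a single git update-index invocation via --stdin, rather than spawning one process per file. A minimal sketch, assuming the paths can be piped in NUL-separated:

```shell
# Feed every path to one long-lived `git update-index` process via --stdin,
# instead of paying process-startup cost once per file.
# -z makes both sides use NUL separators, so unusual filenames are safe.
git ls-files -z | git update-index --add --replace -z --stdin
```

Whether this helps depends on how much of the per-file cost is process spawn versus index rewriting, but it turns 900,000 invocations into one.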

For what it’s worth, I’m running Cygwin on Windows 7.

EDIT: To add some more context to the question.

    The large repository comes from an SVN import, and contains a number of binaries that shouldn’t be in the repository. However, I want to keep the commit history and commit logs. In order to do that, I’m trying to replace the contents of the binaries with file hashes, which should compact the repository and allow me to retain history.
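The hash-placeholder idea itself can be sketched in a couple of lines: overwrite each binary in the work tree with the blob id Git would assign it, leaving a small text file in its place. A hedged sketch (the file name is hypothetical, not from the question):

```shell
# Hypothetical sketch: overwrite a binary with its own Git blob hash,
# leaving a tiny text placeholder where the binary used to be.
file="big.bin"                      # example path, not from the question
hash=$(git hash-object "$file")     # the blob id Git would assign this content
printf '%s\n' "$hash" > "$file"     # replace the contents with the 40-char id
```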

One solution for “How to optimize git update-index?”

You want to use the BFG Repo-Cleaner, a faster, simpler alternative to git-filter-branch, specifically designed for removing large files from Git repositories.

    Download the BFG jar (requires Java 6 or above) and run this command:

$ java -jar bfg.jar --strip-blobs-bigger-than 1MB my-repo.git
    

Any files over 1MB in size (that aren’t in your latest commit) will be removed from your Git repository’s history and replaced with a .git-id file containing the Git hash-id of the original file — which matches the question’s requirement to “replace the contents of the binaries with file hashes”.
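If you ever need one of the stripped files back, the id in its .git-id placeholder can be resolved against a backup clone that still contains the original blobs (after git gc below, they are gone from the cleaned repo). A sketch, with hypothetical repo and file names:

```shell
# Hypothetical sketch: recover an original file from a pre-cleanup backup
# clone, using the blob id stored in its .git-id placeholder.
cd backup-of-my-repo.git                               # clone made before BFG ran
git cat-file blob "$(cat ../my-repo/big.bin.git-id)" > big.bin
```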

    You can then use git gc to clean away the dead data:

    $ git gc --prune=now --aggressive
    

    The BFG is typically 10-50x faster than running git-filter-branch and the options are tailored around these two common use-cases:

    • Removing Crazy Big Files
    • Removing Passwords, Credentials & other Private data

    Full disclosure: I’m the author of the BFG Repo-Cleaner.
