How to optimize git update-index?
I have a rather large repository (11 GB, 900,000+ files) and am having trouble iterating over it in a reasonable amount of time. After a bit of profiling, the real bottleneck seems to be git update-index:
$ time git update-index --replace $path > /dev/null

real    0m5.766s
user    0m1.984s
sys     0m0.391s
At nearly six seconds per file, processing the full list of files would take an unbearable number of days. Is there any way to speed the update-index operation up?
For what it’s worth, I’m running Cygwin on Windows 7.
EDIT: To give the question more context.
The large repository comes from an SVN import and contains a number of binaries that shouldn’t be in the repository. However, I want to keep the commit history and commit logs. In order to do that, I’m trying to replace the contents of the binaries with file hashes, which should compact the repository while letting me retain history.
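Essentially, the per-file loop looks something like this (a simplified sketch; the git hash-object call and the *.bin pathspec are illustrative):

# Simplified sketch: swap each binary's contents for its hash, then update the index
git ls-files -z -- '*.bin' | while IFS= read -r -d '' path; do
    hash=$(git hash-object "$path")       # hash of the original contents
    printf '%s\n' "$hash" > "$path"       # replace the binary with its hash
    git update-index --replace "$path"    # ~6 s per call is the bottleneck
done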
One solution
You want to use the BFG Repo-Cleaner, a faster, simpler alternative to git-filter-branch specifically designed for removing large files from Git repos.
Download the BFG jar (requires Java 6 or above) and run this command:
$ java -jar bfg.jar --strip-blobs-bigger-than 1MB my-repo.git
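As with any history rewrite, it’s safest to work on a fresh copy; the BFG is typically run against a bare --mirror clone of the repo, so the original stays untouched (the URL below is a placeholder):

$ git clone --mirror git://example.com/my-repo.git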
Any files over 1MB in size (that aren’t in your latest commit) will be removed from your Git repository’s history and replaced with a .git-id file containing the old Git hash-id of the original file (which matches the question’s requirement to replace the contents of the binaries with file hashes).
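Should you ever need one of the originals back, the stored hash can be fed to git cat-file in any clone that still contains the old objects, i.e. before the pruning step below. A sketch, with a hypothetical file name:

$ git cat-file -p $(cat big-asset.bin.git-id) > big-asset.bin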
You can then use git gc to clean away the dead data:
$ git gc --prune=now --aggressive
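To confirm the repository actually shrank, git count-objects can report the pack size before and after the cleanup; the size-pack figure should drop substantially:

$ git count-objects -vH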
The BFG is typically 10-50x faster than running git-filter-branch, and its options are tailored around these two common use-cases:
- Removing Crazy Big Files
- Removing Passwords, Credentials & other Private data (see the sketch after this list)
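For the second use-case, the BFG’s --replace-text option rewrites matching strings throughout history. A sketch, assuming a passwords.txt file listing one secret per line (by default each match is replaced with ***REMOVED***):

$ java -jar bfg.jar --replace-text passwords.txt my-repo.git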
Full disclosure: I’m the author of the BFG Repo-Cleaner.