How to find the N largest files in a git repository?

I wanted to find the 10 largest files in my repository. The script I came up with is as follows:

REP_HOME_DIR=<top level git directory>
max_huge_files=10

cd ${REP_HOME_DIR}
# List every packed object, keep only the blobs, sort by object size
# (column 3, in bytes) and keep the largest ${max_huge_files} entries;
# then look up each blob's path with git rev-list and print both sizes in MB.
git verify-pack -v ${REP_HOME_DIR}/.git/objects/pack/pack-*.idx | \
  grep blob | \
  sort -r -k 3 -n | \
  head -n ${max_huge_files} | \
  awk '{ system("printf \"%-80s \" `git rev-list --objects --all | grep " $1 " | cut -d\" \" -f2`"); printf "Size:%5d MB Size in pack file:%5d MB\n", $3/1048576,  $4/1048576; }'
cd -

Is there a better/more elegant way to do the same?

Note: by “files” I mean the files that have been checked into the repository.

4 Solutions for “How to find the N largest files in a git repository?”

    I found another way to do it:

    git ls-tree -r -t -l --full-name HEAD | sort -n -k 4 | tail -n 10
    

    Quoted from: SO: git find fat commit
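
    A small wrapper can make that one-liner reusable. Below is a minimal sketch (the function name is my own, and I drop -t so tree entries, which have no size, are skipped); column 4 of git ls-tree -l is the blob size in bytes:

    # List the N largest tracked files in HEAD (default: 10)
    largest_tracked_files() {
      local n="${1:-10}"
      git ls-tree -r -l --full-name HEAD | sort -n -k 4 | tail -n "$n"
    }

    largest_tracked_files 10
    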

    How about

    git ls-files | xargs ls -l | sort -nrk5 | head -n 10
    
    git ls-files: List all the files tracked in the repo
    xargs ls -l: Run ls -l on every file returned by git ls-files
    sort -nrk5: Numerically reverse-sort the lines on the 5th column (the file size)
    head -n 10: Print the top 10 lines
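
    If any tracked paths contain spaces, a NUL-separated variant of the same pipeline avoids word-splitting; a sketch, assuming your git and xargs support -z / -0 (GNU and BSD versions do):

    # NUL-terminated names keep paths with spaces intact
    git ls-files -z | xargs -0 ls -l | sort -nrk5 | head -n 10
    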
    

    You can also use du against the object database (run from inside the .git directory). For example:

    du -ah objects | sort -rh | head -n 10
    

    du reports the size of each object file, sort orders the entries by size, and head picks the top 10. Note that -ah prints human-readable sizes, so they need GNU sort’s -h rather than -n.
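
    The same idea can also be run over the tracked files in the working tree rather than the object database; a sketch, assuming GNU du and sort (sizes reported in 1 KiB blocks):

    # Size only the tracked files, largest first
    git ls-files -z | xargs -0 du | sort -rn | head -n 10
    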

    You can use find to find files larger than a given threshold, then pass them to git ls-files to exclude untracked files (e.g. build output):

    find * -type f -size +100M -print0 | xargs -0 git ls-files
    

    Adjust 100M (100 megabytes) as needed until you get results.

    Minor caveat: this won’t search top-level “hidden” files and folders (i.e. those whose names start with .). This is because I used find * instead of just find to avoid searching the .git database.
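
    If you do want top-level dotfiles and dot-directories included, one option is to start from . and prune the .git directory explicitly; a sketch, assuming GNU find:

    # Search hidden paths too, but skip the .git database
    find . -path ./.git -prune -o -type f -size +100M -print0 | xargs -0 git ls-files
    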

    I was having trouble getting the sort -n solutions to work (on Windows under Git Bash). I’m guessing it’s due to column-alignment differences when xargs batches the arguments into several ls invocations, which xargs -0 seems to do automatically to work around Windows’ 32767-character command-line limit.
