Find all binary files in git HEAD

I have a huge git repo that I eventually want to clean up with BFG.
But first, I want to track down and remove files in the HEAD which git treats as binary…

So, what I'm looking for is a command to find all files in HEAD that git treats as binary.

These didn't help:

    • List all text (non-binary) files in repo < I am looking for binary files. not text files.
    • Git find all binary files in history < I only care about the HEAD
    • http://git.661346.n2.nabble.com/git-list-binary-and-or-non-binary-files-td3506370.html < I tried those commands and they don’t help.

    Thank you in advance for your help.

4 solutions collected from the web for “Find all binary files in git HEAD”

    diff <(git grep -Ic '') <(git grep -c '') | grep '^>' | cut -d : -f 1 | cut -d ' ' -f 2-
    

    Breaking it down:

    • git grep -c '' prints the names and line counts of each file in the repository. Adding the -I option makes the command ignore binary files.
    • diff <(cmd1) <(cmd2) uses process substitution to give diff two named pipes through which the output of cmd1 and cmd2 is sent.
    • The grep and cut commands are used to extract the filenames from the output of diff.
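
    To see what each filtering stage does, here is a small sketch that feeds the same pipeline two hand-written listings instead of real git grep output. The file names and counts are invented for illustration, and it assumes bash, since process substitution is not available in plain sh.

```shell
#!/usr/bin/env bash
# Hypothetical stand-ins for `git grep -Ic ''` (text files only)
# and `git grep -c ''` (all files); names and counts are invented.
text_counts='README.md:10
src/main.c:42'
all_counts='README.md:10
logo.png:3
src/main.c:42'

# diff marks lines found only in the second listing with "> ".
# grep keeps those lines, the first cut drops the ":count" suffix,
# and the second cut drops the "> " prefix.
binaries=$(diff <(printf '%s\n' "$text_counts") <(printf '%s\n' "$all_counts") \
  | grep '^>' \
  | cut -d : -f 1 \
  | cut -d ' ' -f 2-)

printf '%s\n' "$binaries"
```

    With these sample listings the only name left is logo.png, the one entry missing from the text-only listing.
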
    grep -Fvxf <(git grep --cached -Il '';
                 git config --file .gitmodules --get-regexp path | awk '{ print $2 }';) \
               <(git ls-files)
    

    Explanation:

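    • grep -Fvxf: print the lines of the second file that do not appear in the first (-f reads the patterns from a file, -F treats them as fixed strings, -x matches whole lines only, -v inverts the match). See: Remove Lines from File which appear in another File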
    • git grep part: list all text (non-binary) files. See: List all text (non-binary) files in repo
    • git config part: get rid of submodules: List submodules in a git repository
    • git ls-files: list all files
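
    The filtering step can be tried in isolation. The sketch below replaces the git commands with two invented file lists, so it only illustrates how grep -Fvxf subtracts one list from another; it assumes bash for the process substitution.

```shell
#!/usr/bin/env bash
# Hypothetical stand-ins for the text-file list and for `git ls-files`.
text_files='README.md
src/main.c'
all_files='README.md
assets/logo.png
src/main.c'

# -f: read patterns from the first input, -F: treat them as fixed strings,
# -x: match whole lines only, -v: print the lines that do NOT match.
not_text=$(grep -Fvxf <(printf '%s\n' "$text_files") <(printf '%s\n' "$all_files"))

printf '%s\n' "$not_text"
```

    Here only assets/logo.png survives, because it is the one path in the full list that is absent from the text list.
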

    Or you could loop over the output of git ls-files and test each file with the method from “How to determine if Git handles a file as binary or as text?”

    Here is the same script for Windows using PowerShell:

    $textFiles = git grep -Il .
    $allFiles = git ls-files
    
    foreach ($line in $allFiles){
        if ($textFiles -notcontains $line) {
            $line;
        }
    }
    

    Or in the short form:

    $textFiles = git grep -Il .
    git ls-files | where { $textFiles -notcontains $_ }
    

    That takes O(n^2) time to complete. Here is a faster approach using a hashtable:

    $files = @{}
    git ls-files | foreach { $files[$_] = 1 }
    git grep -Il . | foreach { $files[$_] = 0 }
    $files.GetEnumerator() | where Value -EQ 1 | sort Name | select -ExpandProperty Name
    

    That takes O(n) to complete.
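
    For comparison, the same lookup-table idea can be sketched in bash with awk, whose associative arrays play the role of the hashtable. This is not part of the PowerShell answer; the two lists are invented stand-ins for the output of git grep -Il and git ls-files.

```shell
#!/usr/bin/env bash
# Hypothetical stand-ins for the text-file list and the full file list.
text_files='README.md
src/main.c'
all_files='README.md
assets/logo.png
src/main.c'

# While reading the first input (NR == FNR), store each name as an array key.
# While reading the second input, print the names that were never stored.
not_text=$(awk 'NR == FNR { text[$0]; next } !($0 in text)' \
  <(printf '%s\n' "$text_files") <(printf '%s\n' "$all_files"))

printf '%s\n' "$not_text"
```

    Each name is stored or looked up once, so the work grows linearly with the number of files, as in the hashtable version.
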

    A simplified solution based on the answer of @jangler (https://stackoverflow.com/a/30690662/808101)

    comm -13 <(git grep -Il '' | sort -u) <(git grep -al '' | sort -u)
    

    Explanation:

    1. git grep

      • -l prints only the names of files that match the pattern '' (which matches every line of every file)
      • -I makes the command ignore binary files
      • -a forces binary files to be processed as if they were text
    2. sort -u sorts the output of each git grep, since comm only works on sorted input

    3. comm -13 suppresses the lines unique to the first list (-1) and the lines common to both (-3), leaving only the files listed by the second git grep alone, that is, the binary files
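
    The behaviour of comm -13 is easy to check on its own. The sketch below uses two invented lists in place of the git grep output and assumes bash for the process substitution; it prints only the lines unique to the second list.

```shell
#!/usr/bin/env bash
# Hypothetical stand-ins for `git grep -Il ''` and `git grep -al ''`.
text_files='README.md
src/main.c'
all_files='README.md
assets/logo.png
src/main.c'

# -1 hides lines found only in the first input, -3 hides lines found in both,
# so what remains is the set of lines found only in the second input.
only_in_all=$(comm -13 <(printf '%s\n' "$text_files" | sort -u) \
                       <(printf '%s\n' "$all_files" | sort -u))

printf '%s\n' "$only_in_all"
```

    With these lists the result is assets/logo.png, the single entry that the text-only list lacks.
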
