Find all binary files in git HEAD

I have a huge git repo that I eventually want to clean up with BFG.
But first, I want to track down and remove files in the HEAD which git treats as binary…

So, what I’m looking for is a command to find all files in the HEAD that git treats as binary.

These didn’t help:

    • List all text (non-binary) files in repo (I am looking for binary files, not text files)
    • Git find all binary files in history (I only care about the HEAD)
    • http://git.661346.n2.nabble.com/git-list-binary-and-or-non-binary-files-td3506370.html (I tried those commands and they didn’t help)

    Thank you in advance for your help.

4 solutions collected from the web for “Find all binary files in git HEAD”

    diff <(git grep -Ic '') <(git grep -c '') | grep '^>' | cut -d : -f 1 | cut -d ' ' -f 2-
    

    Breaking it down:

    • git grep -c '' prints the name and line count of each file in the repository. Adding the -I option makes the command ignore binary files.
    • diff <(cmd1) <(cmd2) uses process substitution to provide diff with named pipes through which the outputs of cmd1 and cmd2 are sent.
    • The grep and cut commands extract the filenames from the lines that diff marks with >, that is, the files present only in the second listing, which are the binary ones.

    grep -Fvxf <(git grep --cached -Il '';
                 git config --file .gitmodules --get-regexp path | awk '{ print $2 }';) \
               <(git ls-files)
    

    Explanation:

    • grep -Fvxf: print the lines of the second file that do not appear in the first. See: Remove Lines from File which appear in another File
    • git grep part: list all text (non-binary) files. See: List all text (non-binary) files in repo
    • git config part: exclude submodules. See: List submodules in a git repository
    • git ls-files: list all files
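
    To see the filtering step in isolation, here is a minimal sketch that wraps the same grep flags in a helper. The function name not_in_first and the two list files are hypothetical, used only for illustration; in the command above, process substitution supplies the two lists instead.

```shell
#!/usr/bin/env bash
# Print the lines of the second file that do not appear,
# as whole lines, in the first file.
#   -F  treat patterns as fixed strings, not regular expressions
#   -x  match whole lines only
#   -v  invert the match (keep the non-matching lines)
#   -f  read the patterns from the given file
not_in_first() {
    grep -Fvx -f "$1" "$2"
}
```

    For example, if the first file lists a.txt and c.md, and the second lists a.txt, b.png and c.md, the helper prints only b.png.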

    Or you could loop over the output of git ls-files and test each file with the technique from “How to determine if Git handles a file as binary or as text?”
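
    One possible variant of that idea, sketched here as an assumption rather than taken from the linked answer: instead of looping per file, diff the empty tree against HEAD with --numstat, which reports "-" for both the added and deleted counts of any file git treats as binary. The function name list_binary_in_head is hypothetical, and paths containing unusual characters may come out quoted by git.

```shell
#!/usr/bin/env bash
# List files in HEAD that git's diff machinery reports as binary.
list_binary_in_head() {
    local empty_tree
    # Compute the empty tree's object id instead of hard-coding it.
    empty_tree=$(git hash-object -t tree /dev/null) || return 1
    # --numstat prints "-<TAB>-<TAB>path" for binary files and
    # "added<TAB>deleted<TAB>path" for text files.
    git diff --numstat "$empty_tree" HEAD -- |
        awk -F '\t' '$1 == "-" && $2 == "-" { print $3 }'
}
```

    Run it from inside the repository. Because it goes through git diff, it should also honor binary settings from .gitattributes.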

    Here is the same script for Windows using PowerShell:

    $textFiles = git grep -Il .
    $allFiles = git ls-files
    
    foreach ($line in $allFiles){
        if ($textFiles -notcontains $line) {
            $line;
        }
    }
    

    Or in the short form:

    $textFiles = git grep -Il .
    git ls-files | where { $textFiles -notcontains $_ }
    

    That takes O(n^2) time to complete. Here is a faster approach using a hashtable:

    $files = @{}
    git ls-files | foreach { $files[$_] = 1 }
    git grep -Il . | foreach { $files[$_] = 0 }
    $files.GetEnumerator() | where Value -EQ 1 | sort Name | select -ExpandProperty Name
    

    The two hashtable passes take O(n) time; the final sort adds O(n log n).

    A simplified solution based on the answer of @jangler (https://stackoverflow.com/a/30690662/808101)

    comm -13 <(git grep -Il '' | sort -u) <(git grep -al '' | sort -u)
    

    Explanation:

    1. git grep

      • -l Print only the names of files that match the pattern '' (which matches every line of every file)
      • -I Ignore binary files
      • -a Process binary files as if they were text
    2. sort -u Sort each result, since comm only works on sorted input

    3. comm -13 Suppress the lines unique to the first list (column 1) and the lines common to both (column 3), leaving only the files that appear in the second list alone, which are the binary files
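
    To make the comm step concrete, here is a small sketch on plain list files instead of git output. The helper name only_in_second is hypothetical; it simply wraps comm -13, which prints the lines found only in its second input.

```shell
#!/usr/bin/env bash
# Print the lines that occur only in the second file.
# Both files must already be sorted in the current locale,
# because comm compares its inputs line by line in order.
#   -1  suppress column 1 (lines unique to the first file)
#   -3  suppress column 3 (lines common to both files)
only_in_second() {
    comm -13 "$1" "$2"
}
```

    With a first file containing a.txt and c.md and a second containing a.txt, b.png and c.md, the helper prints b.png, which mirrors how the text-only listing is subtracted from the full listing.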
