Trying to fix line-endings with git filter-branch, but having no luck

I have been bitten by the Windows/Linux line-ending issue with git. It seems, via GitHub, MSysGit, and other sources, that the best solution is to have your local repos set to use linux-style line endings, but set core.autocrlf to true. Unfortunately, I didn’t do this early enough, so now every time I pull changes the line endings are borked.

I thought I had found an answer here, but I can’t get it to work for me. My Linux command line knowledge is limited at best, so I am not even sure what the “xargs fromdos” line does in his script. I keep getting messages about no such file or directory existing, and when I manage to point it to an existing directory, it tells me I don’t have permissions.

I’ve tried this with MSysGit on Windows and via the Mac OS X terminal. Any help would be GREATLY appreciated.

8 Solutions collected from the web for “Trying to fix line-endings with git filter-branch, but having no luck”

    The git documentation for gitattributes now documents another approach for “fixing” or normalizing all the line endings in your project. Here’s the gist of it:

    $ echo "* text=auto" >>.gitattributes
    $ rm .git/index     # Remove the index to force git to
    $ git reset         # re-scan the working directory
    $ git status        # Show files that will be normalized
    $ git add -u
    $ git add .gitattributes
    $ git commit -m "Introduce end-of-line normalization"
    

    If any files that should not be normalized show up in git status, unset their text attribute before running git add -u.

    manual.pdf -text

    Conversely, text files that git does not detect can have normalization enabled manually.

    weirdchars.txt text

    The easiest way to fix this is to make one commit that fixes all the line endings. Assuming that you don’t have any modified files, you can do this as follows.

    # From the root of your repository remove everything from the index
    git rm --cached -r .
    
    # Change the autocrlf setting of the repository (you may want
    #  to use true on Windows):
    git config core.autocrlf input
    
    # Re-add all the deleted files to the index
    # (You should get lots of messages like:
    #   warning: CRLF will be replaced by LF in <file>.)
    git diff --cached --name-only -z | xargs -0 git add
    
    # Commit
    git commit -m "Fixed crlf issue"
    
    # If you're doing this on a Unix/Mac OSX clone then optionally remove
    # the working tree and re-check everything out with the correct line endings.
    git ls-files -z | xargs -0 rm
    git checkout .
    

    My procedure for dealing with the line endings is as follows (battle tested on many repos):

    When creating a new repo:

    • put .gitattributes in the very first commit along with other typical files such as .gitignore and README.md

    When dealing with an existing repo:

    • Create / modify .gitattributes accordingly
    • git commit -a -m "Modified gitattributes"
    • git rm --cached -r . && git reset --hard && git commit -a -m 'Normalize CRLF' -n
      • -n (--no-verify) skips pre-commit hooks
      • I have to do it often enough that I defined it as an alias: alias fixCRLF="..."
    • repeat the previous command
      • yep, it’s voodoo, but generally I have to run the command twice, first time it normalizes some files, second time even more files. Generally it’s probably best to repeat until no new commit is created 🙂

    In .gitattributes I declare all text files explicitly as having LF EOL, since Windows tooling is generally compatible with LF while non-Windows tooling is often not compatible with CRLF (even many nodejs command-line tools assume LF and hence can change the EOL in your files).

    Contents of .gitattributes

    My .gitattributes usually looks like:

    *.html eol=lf
    *.js   eol=lf
    *.json eol=lf
    *.less eol=lf
    *.md   eol=lf
    *.svg  eol=lf
    *.xml  eol=lf
    

    To figure out what distinct extensions are tracked by git in the current repo, look here
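
One way to list the distinct extensions is sketched below. This is an assumption, not the linked answer: list_extensions is a hypothetical helper name. It reads paths on standard input, one per line, and ignores dotfiles such as .gitignore as well as files with no extension.

```shell
# Print the distinct file extensions found in a list of paths read from stdin.
# Only the last extension counts (x.tar.gz -> gz); dotfiles like .gitignore
# and names without a dot are skipped.
list_extensions() {
  sed -n 's/.*[^/]\.\([^./][^./]*\)$/\1/p' | sort -u
}

# Typical use inside a repository (assumes git is on the PATH):
#   git ls-files | list_extensions
```

Each extension printed is a candidate for its own eol=lf line in .gitattributes; binary formats should be left out.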

    Issues after normalization

    Once this is done, there’s one more common caveat though.

    Say your master is already up-to-date and normalized, and then you check out outdated-branch. Quite often, right after checking out that branch, git marks many files as modified.

    The solution is to do a fake commit (git add -A . && git commit -m 'fake commit') and then git rebase master. After the rebase, the fake commit should go away.

    git status --short|grep "^ *M"|awk '{print $2}'|xargs fromdos
    

    Explanation:

    • git status --short

      This lists one line per changed or untracked file. Files that are not under git control are marked at the beginning of the line with a ‘?’. Files that are modified are marked with an M.

    • grep "^ *M"

      This keeps only the lines for files that have been modified.

    • awk '{print $2}'

      This shows only the filename without any markers.

    • xargs fromdos

      This takes the filenames from the previous command and runs them through the utility ‘fromdos’ to convert the line-endings.

    The “| xargs fromdos” part reads file names from standard input (the files that find finds) and passes them as arguments to the command fromdos, which converts the line endings. (Is fromdos standard in those environments? I’m used to dos2unix.) Note that you can avoid xargs altogether (especially useful if your file names contain spaces, which xargs splits on by default):

    find <path, tests...> -exec fromdos '{}' \;
    

    or

    find <path, tests...> | while IFS= read -r file; do fromdos "$file"; done
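
If neither fromdos nor dos2unix is installed, a small shell function can stand in for them. The sketch below is a fallback under stated assumptions, not part of the original script: strip_cr is a hypothetical helper name, it relies on mktemp being available, and unlike dos2unix it does not skip binary files, so it should only be given text files.

```shell
# Strip one trailing CR from every line of each named file, in place.
# Assumption: mktemp exists. Binary files are NOT detected or skipped.
strip_cr() {
  cr=$(printf '\r')                       # a literal carriage return
  for f in "$@"; do
    tmp=$(mktemp) || return 1
    # Write the converted text to a temp file, then copy it back so the
    # original file keeps its permissions.
    sed "s/${cr}\$//" "$f" > "$tmp" && cat "$tmp" > "$f"
    rm -f "$tmp"
  done
}

# Example: strip_cr README.txt "notes with spaces.txt"
```

Because it is a shell function, it fits the while read loop above in place of fromdos, but not find -exec, which can only run external programs.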
    

    I’m not totally sure about your error messages. I successfully tested this method. What program is producing each? What files/directories do you not have permissions for? However, here’s a stab at guessing what it might be:

    One easy way to get a ‘file not found’ error for the script is by using a relative path – use an absolute one. Similarly you could get a permissions error if you haven’t made your script executable (chmod +x).

    Add comments and I’ll try and help you work it out!

    okay… under cygwin we don’t have fromdos easily available, and that awk substep blows up in your face if you have any spaces in paths to modified files (which we had), so I had to do that somewhat differently:

    git status --short | grep "^ *M" | sed 's/^ *M//' | xargs -n 1 dos2unix
    

    kudos to @lloyd for the bulk of this solution
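
Another way around the spaces problem is to skip parsing git status altogether. The sketch below assumes git ls-files -m -z and xargs -0 are available; convert_modified is a hypothetical helper name, and the converter defaults to dos2unix. It is not quite the same file list as the pipeline above: git ls-files -m reports unstaged modifications only, and it also counts unstaged deletions, so the converter may complain about missing files.

```shell
# Run a line-ending converter on every tracked file that has unstaged
# modifications. Paths are passed NUL-separated, so spaces and quotes in
# file names survive intact.
# $1: converter command (assumed default: dos2unix)
convert_modified() {
  git ls-files -m -z | xargs -0 -n 1 "${1:-dos2unix}"
}

# Dry run that only prints the paths it would convert:
#   convert_modified echo
```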

    Here’s how I fixed all line endings in the entire history using git filter-branch. The ^M character needs to be entered using CTRL-V + CTRL-M. I used dos2unix to convert the files since this automatically skips binary files.

    $ git filter-branch --tree-filter 'grep -IUrl "^M" | xargs -I {} dos2unix "{}"'
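
To preview which files the grep step would pick up, without having to type a literal ^M, the carriage return can be produced with printf. This is only a sketch: files_with_cr is a hypothetical helper name, and it assumes a grep that supports the same -I, -U, -r and -l options used in the command above.

```shell
# List non-binary files under a directory (default: current directory)
# that contain at least one carriage return.
# printf '\r' emits the CR, so no CTRL-V + CTRL-M is needed.
files_with_cr() {
  grep -IUrl "$(printf '\r')" "${1:-.}"
}

# Example: files_with_cr src
```

Running it on the current checkout first gives a rough idea of how many files the history rewrite will touch. The same printf substitution should work inside the --tree-filter string, though the nested quoting there has not been tested here.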
    

    Follow these steps if none of the other answers work for you:

    1. If you are on Windows, do git config --global core.autocrlf true; if you are on Unix, do git config core.autocrlf input
    2. Run git rm --cached -r .
    3. Delete the file .gitattributes
    4. Run git add -A
    5. Run git reset --hard

    Your local repository should now be clean.
