Github Repo Corruption – Sha1 Collision

Yesterday one of my team’s checkins corrupted our GitHub repo. GitHub was showing this error:

$ git fsck
error: sha1 mismatch 87859f196ec9266badac7b2b03e3397e398cdb18

error: 87859f196ec9266badac7b2b03e3397e398cdb18: object corrupt or missing
missing blob 87859f196ec9266badac7b2b03e3397e398cdb18

When I tried to pull onto a different machine, I got this:

    Hyperion:Convoy-clone saalon$ git fsck
    warning in tree 5b7ff7b4ac7039c56e04fc91d0bf1ce5f6b80a67: contains zero-padded file modes
    warning in tree 5db54a0cdcd5775c09365c19c061aff729579209: contains zero-padded file modes
    broken link from    tree 6697c12387f8909cfe7250e9d5854fd6713d25c1
                  to    blob 87859f196ec9266badac7b2b03e3397e398cdb18
    dangling tree 144becf61ae14cec34b6af1bd8a0cf4f00d346d1
    missing blob 87859f196ec9266badac7b2b03e3397e398cdb18
    

    (I get the zero-padded file warnings on both the offending machine and the second machine I pulled to. I get the broken link error only on the second machine).

    I tracked down the blob to the specific file that’s the problem, but after going through the Git FAQ’s process on fixing a broken link error, I had no luck.

    I went through Github’s documentation and found a process to delete the master repo from github and repush from the offending machine. I tried this, but when I went to re-push the master branch, I got the following error:

    fatal: SHA1 COLLISION FOUND WITH 87859f196ec9266badac7b2b03e3397e398cdb18 !
    error: unpack failed: index-pack abnormal exit
    

    I’ve got an open ticket with Github but it’s taking them forever to respond. Any idea what the problem might be? Is there a problem at Github that they need to fix, or is there something I can do to take care of this?

3 Solutions collected from the web for “Github Repo Corruption – Sha1 Collision”

    After some back and forth with GitHub (and some troubleshooting help from ssmir), this problem is split between a thing I needed to solve and a thing Github needed to solve.

    What needed to be solved on my end was this:

    Hyperion:Convoy-clone saalon$ git fsck
    warning in tree 5b7ff7b4ac7039c56e04fc91d0bf1ce5f6b80a67: contains zero-padded file modes
    warning in tree 5db54a0cdcd5775c09365c19c061aff729579209: contains zero-padded file modes
    broken link from    tree 6697c12387f8909cfe7250e9d5854fd6713d25c1
                  to    blob 87859f196ec9266badac7b2b03e3397e398cdb18
    dangling tree 144becf61ae14cec34b6af1bd8a0cf4f00d346d1
    missing blob 87859f196ec9266badac7b2b03e3397e398cdb18
    

    Notice the broken link from a tree to a blob. A tree is Git’s record of a directory and a blob is the contents of a file, so this is saying that a directory lists a file whose contents are not actually in the repo. Someone added a file to their local repo and pushed it, but the file itself didn’t end up in the remote repo. Now every time someone pulls down the repo, they get the same broken link.
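
As a small aid, here is a sketch of pulling the tree and blob IDs out of fsck output like the listing above. The sample text is copied from that listing. The variable names are made up for illustration, and the awk patterns assume the two-line "broken link from / to" layout shown here, which other Git versions may format differently.

```shell
#!/bin/sh
# Sample fsck output, copied from the listing above.
fsck_output='warning in tree 5b7ff7b4ac7039c56e04fc91d0bf1ce5f6b80a67: contains zero-padded file modes
broken link from    tree 6697c12387f8909cfe7250e9d5854fd6713d25c1
              to    blob 87859f196ec9266badac7b2b03e3397e398cdb18
dangling tree 144becf61ae14cec34b6af1bd8a0cf4f00d346d1
missing blob 87859f196ec9266badac7b2b03e3397e398cdb18'

# Remember the tree ID from the "broken link from" line, then print it
# together with the blob ID from the "to blob" line that follows.
pairs=$(printf '%s\n' "$fsck_output" | awk '
    /^broken link from/ { tree = $5 }
    /^ *to +blob/       { print tree, $3 }
')

# Each output line is "<tree id> <blob id>".
echo "$pairs"
```

In a real repo you would pipe `git fsck` into the same awk program instead of using the pasted sample.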

    The instructions here do a good job of explaining what to do if you get the problem, but in the midst of the actual crisis, I found the description a little lacking in context. It gave a clear list of steps but not a great idea of the why – at least, not for someone who’s still a little new to Git.

    Basically, what you need to do is figure out what file that missing blob is, track down what computer checked it in last and go to work on their local repo. Their computer has both the SHA1 link to the file and the contents of the file itself. Everyone else has a pile of broken.

    So first, we need to find out what blobs/files are in that tree. To do that, you use git ls-tree.

    git ls-tree 6697c12387f8909cfe7250e9d5854fd6713d25c1
    

    In my case, that listed only one file: the file that was corrupt. In your case, it might give a whole list of files, in which case what you need to do is match up the blob/file’s SHA1 hash to the one mentioned in the broken link error. In my case, it was this:

    100644 blob 87859f196ec9266badac7b2b03e3397e398cdb18    short_description.html
    

    Notice that it doesn’t give you the directory the file is actually supposed to be in. That’s kind of frustrating, but with a little detective work you can find it. The file might be uniquely named, in which case you can just do a find for the file name. Or you can look through your commit history and see when and where a file called short_description.html was placed.
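
One possible way to do that detective work is to list every commit's full tree recursively and look for the blob's hash. The sketch below is not from the Git FAQ. The helper name `find_blob_path` and the demo paths are invented for illustration, and it builds a throwaway repo so it can run on its own. It only reads tree objects, so it should not need the blob's contents to exist.

```shell
#!/bin/sh
# find_blob_path SHA: print every path under which blob SHA appears in any commit.
# (Hypothetical helper name, for illustration only.)
find_blob_path() {
    git rev-list --all | while read -r commit; do
        # ls-tree -r prints "<mode> <type> <sha>TAB<path>" for every file in the commit.
        git ls-tree -r "$commit" |
            awk -v sha="$1" '$3 == sha { sub(/^[^\t]*\t/, ""); print }'
    done | sort -u
}

# Demo in a throwaway repo (the directory layout here is made up).
demo=$(mktemp -d)
cd "$demo" || exit 1
git init -q .
mkdir -p pages/venues
printf 'hello venue\n' > pages/venues/short_description.html
git add .
git -c user.name=demo -c user.email=demo@example.com commit -q -m "add description"

sha=$(git hash-object pages/venues/short_description.html)
found=$(find_blob_path "$sha")
echo "$found"
```

In the broken repo you would call `find_blob_path` with the hash from the broken link error rather than one computed from a file.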

    Here’s the part the GitFaq wasn’t entirely clear on. They say to recreate the file, then run this command:

    git hash-object -w db/content/page_parts/venues/86/short_description.html 
    

    But what is that doing?

    Basically, when you run git hash-object it returns the SHA1 hash for that file. And (here’s the important part) with the -w flag it also writes a blob for the file into the repo, and a blob was just what we were missing. Here’s the part the FAQ isn’t clear on, though: for this to work, the file needs to match exactly the file that initially caused the problem. In other words, if that short_description.html file had content in it, you can’t just create a blank file and run hash-object. If you do, the blob’s SHA1 hash won’t match the one git is missing, and that broken link will still be broken.
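
To see how sensitive the hash is to the exact bytes, here is a small sketch run in a throwaway repo. The file names and contents are made up for illustration. It hashes an original file, an exact copy, a blank file, and a copy with one extra space.

```shell
#!/bin/sh
work=$(mktemp -d)
cd "$work" || exit 1
git init -q .

printf '<p>Original text</p>\n'  > original.html
cp original.html recreated.html                      # byte-for-byte copy
: > blank.html                                       # empty file
printf '<p>Original text</p> \n' > trailing_space.html   # one extra space

# Without -w, hash-object only prints the hash and writes nothing.
h_original=$(git hash-object original.html)
h_recreated=$(git hash-object recreated.html)
h_blank=$(git hash-object blank.html)
h_space=$(git hash-object trailing_space.html)

echo "original:       $h_original"
echo "recreated:      $h_recreated"
echo "blank:          $h_blank"
echo "trailing space: $h_space"
```

Only the exact copy reproduces the original hash. The blank file and the one with an extra space each get a different one, which is why a placeholder file will not repair the link.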

    This is why you need to be on the offending machine’s repo. Everyone else has a link but no file and no blob. The offending machine (hopefully) still has the original file. In my case, they didn’t have the original file (in my flailing, it had been deleted inadvertently), but when I looked at the commit history on their box, the diff contained the content of the file that had been committed but never made it to GitHub. I copied that out, recreated the file and ran hash-object. The next time I ran git fsck, the broken link was gone.

    One note: technically, this problem can be fixed on someone else’s repo, provided you can recreate the missing file. In my case, I actually had the file created on the offending machine, but had it e-mailed to me and fixed the problem in a clean repo on a different system. The important thing is recreating the file exactly so it generates the same sha1 hash that the repo is missing.
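
The whole repair can be rehearsed safely in a throwaway repo. The sketch below is a simulation under stated assumptions, not a record of what happened on GitHub: it fakes the breakage by deleting one loose object file, and the path and file contents are invented. It then recreates the file exactly and restores the blob with `git hash-object -w`.

```shell
#!/bin/sh
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
mkdir -p pages
printf '<p>Short description</p>\n' > pages/short_description.html
git add pages/short_description.html
git -c user.name=demo -c user.email=demo@example.com commit -q -m "add page"

# The blob ID that the commit's tree points at.
sha=$(git rev-parse HEAD:pages/short_description.html)

# Mimic the corruption: delete the loose object file that stores the blob.
obj_dir=$(printf '%s' "$sha" | cut -c1-2)
obj_file=$(printf '%s' "$sha" | cut -c3-)
rm -f ".git/objects/$obj_dir/$obj_file"

# cat-file -e exits non-zero when the object does not exist.
if git cat-file -e "$sha" 2>/dev/null; then before=present; else before=missing; fi

# Repair: recreate the file byte-for-byte and write it back as a blob.
printf '<p>Short description</p>\n' > pages/short_description.html
restored=$(git hash-object -w pages/short_description.html)

if git cat-file -e "$sha" 2>/dev/null; then after=present; else after=missing; fi
echo "before=$before after=$after restored=$restored"
```

The restored hash equals the one the tree refers to only because the recreated contents are identical, which is the point made above about matching the original file exactly.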

    As for the SHA1 collision problem I got when I tried to push to github? This ugly sucker?

    fatal: SHA1 COLLISION FOUND WITH 87859f196ec9266badac7b2b03e3397e398cdb18 !
    error: unpack failed: index-pack abnormal exit
    

    That was a problem on GitHub’s side that they needed to fix.

    Just a reminder. A small likelihood of something happening is not the same as it not being able to happen. You can get hash collisions with git’s use of sha-1. Once you have two files that collide, the likelihood becomes 100%. At that point, there’s slim consolation from the theoretical likelihood. Add a space to one and you’ll be fine though.

    I ran into the same issue and ran:

    git prune  
    git gc  
    

    which mentioned

    error: bad ref for refs/remotes/origin/ticketName

    so I removed the reference and that fixed the issue:

    rm .git/refs/remotes/origin/ticketName
    