How does Git(Hub) handle possible collisions from short SHAs?
Both Git and GitHub display short versions of SHAs (just the first 7 characters instead of all 40), and both accept these short SHAs as arguments, for example:
git show 962a9e8
Given that the possibility space is now orders of magnitude smaller, "just" 268 million (16^7), how do Git and GitHub protect against collisions here? And how do they handle collisions when they do occur?
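For a sense of scale, the 268 million figure is 16^7. By the birthday bound, prefixes start to collide long before a repository holds anywhere near that many objects. The following is a minimal sketch that assumes object hashes behave like uniformly random values. `collision_probability` is a hypothetical helper, and the result is an approximation rather than an exact count:

```python
import math

HEX_DIGITS = 7
SPACE = 16 ** HEX_DIGITS  # number of distinct 7-hex-digit prefixes (268,435,456)


def collision_probability(n_objects: int, space: int = SPACE) -> float:
    """Birthday-bound estimate of the chance that at least two of
    n_objects uniformly random hashes share the same truncated prefix."""
    # P(all prefixes distinct) ~= exp(-n(n-1) / (2 * space))
    return 1.0 - math.exp(-n_objects * (n_objects - 1) / (2 * space))


if __name__ == "__main__":
    print(f"{SPACE:,} possible 7-digit prefixes")
    for n in (1_000, 10_000, 100_000, 1_000_000):
        print(f"{n:>9,} objects -> P(some shared prefix) ~ {collision_probability(n):.4f}")
```

A repository with only a few tens of thousands of objects already has a realistic chance that some pair shares a 7-digit prefix. This is why the answers below matter.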
These short forms exist only to simplify visual recognition and make your life easier. Git doesn't really truncate anything; internally, everything is handled with the complete value. You can use a partial SHA-1 at your convenience, though:
Git is smart enough to figure out what commit you meant to type if you provide the first few characters, as long as your partial SHA-1 is at least four characters long and unambiguous — that is, only one object in the current repository begins with that partial SHA-1.
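The rule quoted above requires at least four characters, and only one object may match. It can be modelled as a prefix search over a sorted list of full object IDs. This is a simplified sketch that assumes every object ID fits in one in-memory list; real Git searches loose objects and pack indexes instead. `resolve_prefix`, `AmbiguousPrefix` and `MIN_PREFIX` are hypothetical names chosen for illustration:

```python
from bisect import bisect_left

MIN_PREFIX = 4  # minimum abbreviation length, per the rule quoted above
HEX = set("0123456789abcdef")


class AmbiguousPrefix(Exception):
    """Hypothetical error: more than one object starts with the prefix."""


def resolve_prefix(prefix: str, sorted_ids: list[str]) -> str:
    """Return the single full ID in sorted_ids that starts with prefix."""
    prefix = prefix.lower()
    if len(prefix) < MIN_PREFIX or not set(prefix) <= HEX:
        raise ValueError(f"not a usable abbreviation: {prefix!r}")
    i = bisect_left(sorted_ids, prefix)  # first ID that sorts at or after prefix
    matches = []
    while i < len(sorted_ids) and sorted_ids[i].startswith(prefix):
        matches.append(sorted_ids[i])
        if len(matches) == 2:  # two matches already prove ambiguity
            break
        i += 1
    if not matches:
        raise KeyError(f"unknown revision {prefix!r}")
    if len(matches) > 1:
        raise AmbiguousPrefix(f"short SHA1 {prefix} is ambiguous")
    return matches[0]
```

Because the list is sorted, every ID that starts with a given prefix sits in one contiguous run. A binary search followed by a short scan is therefore enough, and the scan can stop as soon as a second match proves the prefix ambiguous.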
I have a repository that has a commit with an id of
git show 00018
shows the revision, but
git show 0001
error: short SHA1 0001 is ambiguous.
error: short SHA1 0001 is ambiguous.
fatal: ambiguous argument '0001': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions
(If you’re curious, it’s a clone of the git repository for git itself; that commit is one that Linus Torvalds made in 2005.)
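The reverse direction works the same way. When Git prints an abbreviation (for example with `git rev-parse --short`), it uses at least the configured length and adds digits until no other object shares the prefix. That is why `00018` above still names one commit while `0001` does not. The sketch below assumes the full IDs are available as a sorted list, and `shortest_unique_abbrev` is a hypothetical helper rather than Git's API. It relies on a property of sorted lists: the ID sharing the longest prefix with a given ID is always one of its immediate neighbours.

```python
from bisect import bisect_left


def common_prefix_len(a: str, b: str) -> int:
    """Number of leading characters a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def shortest_unique_abbrev(object_id: str, sorted_ids: list[str], min_len: int = 7) -> str:
    """Shortest prefix of object_id (at least min_len digits) shared by no other ID."""
    i = bisect_left(sorted_ids, object_id)
    if i == len(sorted_ids) or sorted_ids[i] != object_id:
        raise KeyError(f"{object_id} is not in the list")
    longest_shared = 0
    for j in (i - 1, i + 1):  # only the sorted neighbours can share the longest prefix
        if 0 <= j < len(sorted_ids):
            longest_shared = max(longest_shared, common_prefix_len(object_id, sorted_ids[j]))
    # One digit beyond the longest shared prefix makes it unique.
    return object_id[:max(min_len, longest_shared + 1)]
```

With hypothetical IDs `00018aaa…` and `0001fbbb…` in the same list, a 4-digit minimum yields `00018`, because both IDs share the first four digits.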
Two notes here:
If you press y anywhere on a GitHub page displaying a commit, the URL expands to show the full 40-character SHA-1 of that commit.
That illustrates emboss’s point: GitHub doesn’t truncate anything.
And 7 hex digits haven't been enough since 2010 anyway.
See commit dce9648 by Linus Torvalds himself (Oct 2010):
The default of 7 comes from fairly early in git development, when seven hex digits was a lot (it covers about 250+ million hash values). Back then I thought that 65k revisions was a lot (it was what we were about to hit in BK), and each revision tends to be about 5-10 new objects or so, so a million objects was a big number.
(BK = BitKeeper)
These days, the kernel isn't even the largest git project, and even the kernel has about 220k revisions (much bigger than the BK tree ever was) and we are approaching two million objects. At that point, seven hex digits is still unique for a lot of them, but when we're talking about just two orders of magnitude difference between number of objects and the hash size, there will be collisions in truncated hash values. It's no longer even close to unrealistic – it happens all the time.
We should both increase the default abbrev that was unrealistically small, and add a way for people to set their own default per-project in the git config file.
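Git's per-project setting for this is `core.abbrev`. To put numbers on Linus's argument, the birthday estimate gives the expected number of object pairs that share a prefix of a given length. It also gives the shortest length that keeps that expectation under a chosen target. The sketch below again assumes uniformly random hashes. `expected_colliding_pairs` and `digits_for` are hypothetical helpers, and the "fewer than one expected pair" target is an illustrative choice, not the rule Git itself uses:

```python
def expected_colliding_pairs(n_objects: int, hex_digits: int) -> float:
    """Birthday estimate of how many object pairs share a hex_digits-long prefix."""
    return n_objects * (n_objects - 1) / 2 / 16 ** hex_digits


def digits_for(n_objects: int, max_expected_pairs: float = 1.0) -> int:
    """Smallest prefix length whose expected colliding pairs stay below the target."""
    k = 1
    while expected_colliding_pairs(n_objects, k) >= max_expected_pairs:
        k += 1
    return k


if __name__ == "__main__":
    # "a million objects" (the old estimate) vs "approaching two million" (2010 kernel)
    for n in (1_000_000, 2_000_000):
        print(f"{n:,} objects: ~{expected_colliding_pairs(n, 7):,.0f} pairs share "
              f"7 digits; {digits_for(n)} digits for < 1 expected pair")
```

At two million objects, the estimate gives roughly 7,450 pairs sharing a 7-digit prefix. Most objects remain unique at 7 digits, but collisions "happen all the time", just as the commit message says.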