Does git similarity index 75% mean git thinks I have renamed a file?
I am using GitExtensions with Visual Studio, and when I go to commit my change, it says I have added two new files. It also shows a third file (a .resx file), which it seems to be comparing with another .resx file, and it says they have a similarity index of 75%.
The files are not related, but a large portion of each file is the standard template that appears in all .resx files, so I can understand them being treated as similar.
So the question is: does this message mean that git thinks I have renamed the older file, and will it mess things up if I continue with the commit as is?
Git does not store diffs.[1] Instead, each commit stores complete files (as listed in the index-at-the-time-the-commit-is-made), as a sort of stand-alone entity. To retrieve a previous commit, git simply finds the commit ID and extracts the associated files.[2]
The “similarity index” and any presentation of “a file was renamed” or “a file was copied” are simply git guessing at what happened, in an attempt to make things clearer to the human, or present the shortest way to get from one commit to another, for instance. You are correct that the template match is misleading git at this point, but “this point” is the “presentation to user of how to get from Point A to Point B”, not “what was or will be stored”.
The git status command (presumably Visual Studio, which I’ve never used, just runs git status for you) makes git produce a new comparison, this time “most recent/current commit” (HEAD) vs “current index”, i.e., “what will be committed if you commit now”. In fact, you actually get two comparisons: HEAD-vs-index, and index-vs-work-tree. This gets you git’s best guess at what happened, including computing that similarity index, so that it can guess whether some file(s) were renamed.
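The two comparisons can be reproduced by hand with git diff. This is a rough sketch in a scratch repository; the file names staged.txt and unstaged.txt are hypothetical, chosen so that each comparison reports exactly one file:

```shell
#!/bin/sh
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "you@example.com"   # placeholder identity
git config user.name "Example User"

# Baseline commit with two files.
printf 'one\n' > staged.txt
printf 'one\n' > unstaged.txt
git add staged.txt unstaged.txt
git commit -q -m "initial"

# Change both files, but only add one of them to the index.
printf 'two\n' >> staged.txt
git add staged.txt
printf 'two\n' >> unstaged.txt

# Comparison 1: HEAD vs index ("what will be committed if you commit now").
head_vs_index=$(git diff --cached --name-status)
# Comparison 2: index vs work-tree ("changes not staged for commit").
index_vs_worktree=$(git diff --name-status)

echo "HEAD vs index:      $head_vs_index"
echo "index vs work-tree: $index_vs_worktree"

# git status presents both comparisons together.
git status --short
```

Only the first comparison describes what the next commit will contain, so that is the one where a guessed rename would show up before committing.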
Note that once you have any two commits to give to git diff, you can specify different copy and/or rename thresholds to get “what happened” shown to you in different ways. Git does this on demand, by extracting (mostly in-memory) the two commits, comparing them, computing each similarity index (again) at that time, and making its best guess at copies or renames from there.
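As an illustration of changing the threshold, the sketch below builds two commits in a scratch repository where a file is deleted and a new file sharing roughly three quarters of its content is added. The names old.resx and new.resx, the helper template_lines, and the file contents are invented for the example (they are plain text, not real .resx files). The same pair of commits is then diffed with a loose and a strict rename threshold using git diff's -M option:

```shell
#!/bin/sh
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "you@example.com"   # placeholder identity
git config user.name "Example User"

# Hypothetical helper: print "template line 1" .. "template line N".
template_lines() {
    i=1
    while [ "$i" -le "$1" ]; do
        echo "template line $i"
        i=$((i + 1))
    done
}

# Commit 1: a file made entirely of shared "template" lines.
template_lines 20 > old.resx
git add old.resx
git commit -q -m "add old.resx"

# Commit 2: delete it and add an unrelated file that reuses most of the template.
git rm -q old.resx
{ template_lines 15; printf 'custom line %s\n' 1 2 3 4 5; } > new.resx
git add new.resx
git commit -q -m "add new.resx, remove old.resx"

# Same two commits, two different rename thresholds.
loose=$(git diff -M50% --name-status HEAD~1 HEAD)
strict=$(git diff -M90% --name-status HEAD~1 HEAD)

echo "With -M50%:"
echo "$loose"
echo "With -M90%:"
echo "$strict"
```

With the loose threshold git should report a single rename entry (status R followed by a similarity score); with the strict one it should report a separate deletion and addition. The commits themselves are identical in both cases; only the presentation changes.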
[1] This glosses over git’s “pack” files, which do use deltas. However, pack files are generally constructed long after a commit (or series of commits). New commits always make new, stand-alone object files, which may be packed and re-packed in various ways later.
[2] To speed up operation, git will use the current index (cache) information to figure out a quick way to change from “commit currently checked out” (as noted by the index/cache) to “new commit to be checked out” (given as an argument to git checkout). In particular, as long as you have not modified your work-tree, so that the index is still current, this allows git checkout to avoid touching or even inspecting most files when switching between similar branches or commits.
You don’t need to worry about either of these footnotes, though: it’s all handled automatically, behind the scenes. (Footnote two can come into play when you start using --work-tree= arguments, as people do in fancy auto-deployment scripts with bare repositories on servers. However, even there it usually just works, all automatically.)
Git does not base what it stores on the similarity index. Instead, it stores each file’s contents, identified by a hash value.
TL;DR: You can commit as is without worrying about git thinking you simply renamed the file.