Language agnostic way to remove sensitive information from files before committing to Git
What are the recommended procedures for automatically removing sensitive information from files before committing to Git?
For example, say I have the following in a file called code.rb:
personal_stuff = "some personal stuff"
How can I automatically remove the personal information from
code.rb before committing to version control? The solution should be language-agnostic.
5 solutions collected from the web for “Language agnostic way to remove sensitive information from files before committing to Git”
Using a “clean filter” for specific files is another way to go.
Update: an example, as requested.
Add a “clean” filter to the local repository configuration, consisting of one call to
sed. This could be a path to a shell script or to any program which consumes data on its standard input and writes processed data to its standard output:
$ git config --add filter.classify.clean \
    'sed -e '\''s!\<\(personal_stuff\s\+=\s\+\)"[^"]\+"!\1"SECRET"!'\'
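Since the filter can be any program that reads standard input and writes standard output, the same substitution could also be written as a small script. The following is only a sketch: the file name classify_clean.py and the `clean` mode argument are assumptions made for illustration and are not part of the original answer.

```python
#!/usr/bin/env python3
"""Hypothetical clean filter: masks the value assigned to personal_stuff."""
import re
import sys

# Same idea as the sed expression: keep `personal_stuff = `, replace the quoted value.
PATTERN = re.compile(r'\b(personal_stuff\s+=\s+)"[^"]+"')


def redact(text: str) -> str:
    """Return text with every personal_stuff value replaced by "SECRET"."""
    return PATTERN.sub(r'\1"SECRET"', text)


# Only act as a filter when called as `classify_clean.py clean` (assumed convention).
if __name__ == "__main__" and sys.argv[1:] == ["clean"]:
    sys.stdout.write(redact(sys.stdin.read()))
```

Assuming the script is saved somewhere outside the work tree, it could be registered in place of the sed call with something like git config filter.classify.clean 'python3 /path/to/classify_clean.py clean'.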
Now register our filter to be applied to files whose names match *.rb:
$ cat >.gitattributes
*.rb filter=classify
^D
Create a couple of test files:
$ cat >test.rb
aaa
bbb
personal_stuff = "sensitive data"
ccc
^D
$ cat >test.txt
aaa
xxx
personal_stuff = "super secret"
yyy
^D
Now add and commit them:
$ git add test.*
$ git commit -q -m 'root commit'
...
Now see what has happened to the contents of test.rb, that is, what its blob in the just-recorded commit contains:
$ git cat-file -p HEAD
tree 7adaac5cc23c69ff9459635d666ca63ffb9757aa
author Konstantin Khomoutov <flatworm@...ourceforge.net> 1368453302 +0400
committer Konstantin Khomoutov <flatworm@...ourceforge.net> 1368453302 +0400

root commit
$ git cat-file -p 7adaa
100644 blob e49630236eb74d8c7ccbcccc83c7c18af0cb4b96    test.rb
100644 blob aecd9ade78e18d5b5ded99a1e41cf366fa52e619    test.txt
$ git cat-file -p e496302
aaa
bbb
personal_stuff = "SECRET"
ccc
Verify this did not affect the work tree:
$ cat test.rb
aaa
bbb
personal_stuff = "sensitive data"
ccc
You can write your own pre-commit hook. The hook scans your changes and rejects the commit if it finds something it does not like.
Writing the actual hook can be a challenge, but you should be able to find some examples online.
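To give an idea of what such a hook might look like, here is a rough sketch rather than a finished tool. It assumes the hook pipes the staged diff into a helper script; the script name check_secrets.py, the `scan` argument and the two patterns are all made up for illustration.

```python
#!/usr/bin/env python3
"""Hypothetical pre-commit helper: rejects staged lines that look sensitive."""
import re
import sys

# Assumed patterns; adjust to whatever counts as sensitive in your project.
SENSITIVE = [
    re.compile(r'\bpersonal_stuff\s*=\s*"[^"]+"'),
    re.compile(r'\bpassword\s*=\s*"[^"]+"'),
]


def find_sensitive(diff_text: str) -> list:
    """Return added lines of a unified diff that match a sensitive pattern."""
    hits = []
    for line in diff_text.splitlines():
        # Added lines start with '+'; lines starting with '+++' are treated
        # as file headers and skipped.
        if not line.startswith("+") or line.startswith("+++"):
            continue
        if any(p.search(line) for p in SENSITIVE):
            hits.append(line[1:])
    return hits


# Assumed invocation as the last command of .git/hooks/pre-commit:
#   git diff --cached -U0 | python3 check_secrets.py scan
if __name__ == "__main__" and sys.argv[1:] == ["scan"]:
    found = find_sensitive(sys.stdin.read())
    for line in found:
        print(f"refusing to commit: {line.strip()}", file=sys.stderr)
    # A non-zero exit status makes git abort the commit.
    sys.exit(1 if found else 0)
```

Only added lines are checked, so removing an old secret from a file does not block the commit.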
One solution is to move your confidential information to an external file which will be ignored.
There are two ways to ignore a file in git:
- Using the .gitignore file
- Using the git update-index command (temporary)
In your case, the more flexible solution would be:
- Create a file with fake personal data (like
password = "mypassword1234" or whatever…)
- Commit and push this file
- Ignore its future modifications with
git update-index --assume-unchanged your_file
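With the confidential values kept in a separate file, the code under version control only has to read them at run time. A minimal sketch, assuming a hypothetical JSON file named secrets.local.json and a hypothetical key personal_stuff (neither comes from the original answer):

```python
import json
from pathlib import Path

# Hypothetical file name; it would be listed in .gitignore or marked assume-unchanged.
SECRETS_FILE = Path("secrets.local.json")


def load_secrets(path: Path = SECRETS_FILE) -> dict:
    """Read key/value secrets from a local JSON file; return {} if it is absent."""
    if not path.exists():
        return {}
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)


# The tracked code holds only a harmless default, never the real value.
personal_stuff = load_secrets().get("personal_stuff", "placeholder")
```

The same pattern works in any language that can read a file, which keeps the approach language-agnostic.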
Use ‘.gitattributes’ with ‘.gitfilters’. Here is an example with ‘rcs-keywords’; you’d follow the same structure but with filters for your sensitive data.
Your attributes file maps file globs to filters, like so:
# .gitattributes
# Map file extensions to git filters
*.h filter=rcs-keywords
*.c filter=rcs-keywords
The scripts in your .gitfilters directory implement a ‘clean’ and a ‘smudge’ filter. For the ‘rcs-keywords’ filter above, these are:
$ ls .gitfilters/
rcs-keywords.clean*  rcs-keywords.smudge*
The ‘clean’ filter removes stuff prior to commit; the ‘smudge’ filter adds stuff back on checkout.
A filter can be any script. Again, for ‘rcs-keywords’ the ‘clean’ filter looks like:
#!/usr/bin/perl -p
s/\$Id[^\$]*\$/\$Id\$/;
s/\$Date[^\$]*\$/\$Date\$/;
Date information is removed. The associated ‘smudge’ filter adds that information back in.
Lastly, you configure git as
git config --add filter.rcs-keywords.clean .gitfilters/rcs-keywords.clean
git config --add filter.rcs-keywords.smudge .gitfilters/rcs-keywords.smudge
For your case, the clean filter axes the sensitive data and the smudge filter adds it back in.
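As a rough illustration of such a pair for sensitive data, the sketch below handles both directions in one script. Everything specific in it is an assumption: the script name sensitive_filter.py, the `clean` and `smudge` arguments, the SECRET placeholder, and the PERSONAL_STUFF environment variable that holds the real value.

```python
#!/usr/bin/env python3
"""Hypothetical clean/smudge pair for a personal_stuff assignment."""
import os
import re
import sys

PLACEHOLDER = "SECRET"
ASSIGNMENT = re.compile(r'\b(personal_stuff\s*=\s*)"[^"]*"')
MARKER = re.compile(r'\b(personal_stuff\s*=\s*)"' + re.escape(PLACEHOLDER) + '"')


def clean(text: str) -> str:
    """Run when staging: swap the real value for the placeholder."""
    return ASSIGNMENT.sub(lambda m: f'{m.group(1)}"{PLACEHOLDER}"', text)


def smudge(text: str, secret: str) -> str:
    """Run on checkout: put the real value back where the placeholder is."""
    # A function replacement keeps backslashes in the secret from being
    # interpreted as replacement-template escapes.
    return MARKER.sub(lambda m: f'{m.group(1)}"{secret}"', text)


# Assumed usage: `sensitive_filter.py clean` or `sensitive_filter.py smudge`,
# with the real value supplied through the PERSONAL_STUFF environment variable.
if __name__ == "__main__" and sys.argv[1:] in (["clean"], ["smudge"]):
    data = sys.stdin.read()
    if sys.argv[1] == "clean":
        sys.stdout.write(clean(data))
    else:
        sys.stdout.write(smudge(data, os.environ.get("PERSONAL_STUFF", PLACEHOLDER)))
```

It would be wired up like the rcs-keywords filters, for example with git config --add filter.sensitive.clean '.gitfilters/sensitive_filter.py clean' and the matching smudge line, where the filter name sensitive is again only a placeholder. If the environment variable is unset, the smudge step leaves the placeholder in the work tree.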
If you can’t use .gitignore because you need to make parallel changes in the same file (as mentioned in your comments), then one option is
git add -p
Using this, you can stage or skip each change individually.
The problem with this command is that it is a manual process. I guess you may not find an automated approach for your problem.