How safe is it to host sensitive data on repository sites like GitHub, Bitbucket, etc.?
This is just a question out of curiosity. How safe is it generally considered to host sensitive data on repository websites like GitHub, Bitbucket, etc.? Is it safe enough to get rid of all code on local machines and just store it all there? How about safety in the sense of keeping company secrets? I notice these sites tout that big companies like Google and Yahoo use their services, but do these big companies actually store their trade secrets and important company code on websites like this?
GitHub has a page (http://help.github.com/security) with some interesting information, which shows they are marketing it as something foolproof like I described. But in practice, do big companies like Google really find that their proprietary secrets and massive amounts of code are safe from prying eyes and disastrous occurrences on sites like these?
2 solutions collected from the web for “How safe is it to host sensitive data on repository sites like GitHub, Bitbucket, etc.?”
As always, it depends 🙂
There can be two different meanings of “safety”:
1. Can I trust the hoster to keep my stuff (intellectual property, company secrets…) private?
2. What happens to my code if the hoster suddenly goes out of service?
For 1., there is no 100% guarantee.
Of course, the big hosters like GitHub and Bitbucket won’t intentionally share your code with third parties, but there is always the possibility that some hacker manages to get at the content of your private repositories.
(This could happen to you as well if you host your code internally in your company, but it is less likely: unless your company is as well known as, say, Google, the chance of someone trying to attack your company is much smaller than the chance of someone trying to attack a well-known public hoster.)
Plus, you have to consider the laws of the country where the hoster resides.
A few weeks ago I read somewhere that if your hoster is in the USA, they can be forced by law to give your data to the US government under certain circumstances, and they are not even allowed to tell you about that (I don’t remember the name of the law, but maybe someone else knows).
I guess that all this causes most “big” companies not to host their code on a public service (my company is mid-sized, and we host our code privately as well).
By the way, as you mentioned Google:
I’m sure that Google in particular does not use Bitbucket or GitHub. They have the complete infrastructure for project hosting themselves, so I guess they are using it internally, too. Why should they use an external service? It’s in the cloud, yes…but it’s their cloud.
Concerning 2.: it’s unlikely that GitHub or Bitbucket will go bankrupt tomorrow, but you never know.
IMO it’s your responsibility to take backups of your code yourself.
The nature of DVCS makes sure that you have some local copies of your code anyway, but it might be difficult to search lots of developer machines for the newest versions of all of your projects.
I do this by regularly pulling all my repositories to my local machine (I wrote a tool that can do this for Bitbucket, which I use for my private projects).
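To give an idea of what such a backup job can look like, here is a minimal sketch in Python. It is not the tool mentioned above. It assumes you already have a list of clone URLs and a local backup folder; the function names (`repo_dir_name`, `mirror_command`, `backup_all`) and the example URL are made up for illustration. It uses `git clone --mirror` for the first backup and `git remote update --prune` to refresh an existing mirror.

```python
import subprocess
from pathlib import Path


def repo_dir_name(url):
    """Derive a local directory name such as 'repo.git' from a clone URL."""
    # Treat the ':' in scp-style URLs (git@host:user/repo.git) like a '/'.
    name = url.rstrip("/").replace(":", "/").rsplit("/", 1)[-1]
    return name if name.endswith(".git") else name + ".git"


def mirror_command(url, backup_root):
    """Return the git command that creates or refreshes a mirror of `url`."""
    target = Path(backup_root) / repo_dir_name(url)
    if target.exists():
        # Mirror already exists: fetch all refs and drop deleted branches.
        return ["git", "-C", str(target), "remote", "update", "--prune"]
    # First run: a bare mirror clone copies every branch and tag.
    return ["git", "clone", "--mirror", url, str(target)]


def backup_all(urls, backup_root, run=subprocess.run):
    """Mirror every repository in `urls` below `backup_root`."""
    for url in urls:
        run(mirror_command(url, backup_root), check=True)


if __name__ == "__main__":
    # Hypothetical repository list; replace with your own clone URLs.
    backup_all(["git@bitbucket.org:example-user/example-repo.git"],
               Path.home() / "repo-backups")
```

Run from a scheduled job, this keeps a bare copy of each repository that can be pushed to a new hoster if the old one disappears. Two repositories with the same name under different accounts would collide in this sketch.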
I looked at GitHub a little while ago, and compared with our previous git hosting (which was on our own Linux virtual server), I’m not overly impressed with the security. We do use it, but only for projects where keeping the source code private isn’t a major concern.
1. There’s no company control at all over the user accounts. We control which users have access to our repository, but there are no password policies, the users pick their own email addresses, etc.
2. There’s no way to limit access by IP address.
3. Passwords can only be reset by the user.
4. Compromising a user’s email account (and we can’t see which account they’ve set it to) also compromises their GitHub account, because GitHub uses an email challenge to reset forgotten passwords.
5. There are no access logs (there is an audit trail for most or possibly all changes, but no logging at all for access).
6. Access to the web front end is only password protected, so it is vulnerable to password reuse from other sites and to some extent to brute forcing (GitHub’s statement about what they do for failed logins is pretty unclear).
We could live with one or two of these, but in combination they basically make GitHub completely unsuitable.
They have added two-factor authentication recently, and there is an API so that organisations can at least check whether users with access to their repositories have two-factor authentication enabled. Whilst I don’t feel this is really the best solution, it probably just about moves GitHub into being secure enough that it can be considered for private repos.
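As a rough illustration of that check, here is a sketch in Python against the GitHub REST API’s “list organization members” endpoint with `filter=2fa_disabled`. To my knowledge that filter only works for organisation owners, so verify the endpoint and the required permissions in the current API documentation before relying on it. The organisation name and token below are placeholders, and the helper names are my own.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_ROOT = "https://api.github.com"


def members_without_2fa_url(org, page=1, per_page=100):
    """Build the URL listing org members who have no two-factor auth."""
    query = urlencode({"filter": "2fa_disabled",
                       "per_page": per_page,
                       "page": page})
    return f"{API_ROOT}/orgs/{org}/members?{query}"


def logins(payload):
    """Extract the login names from a JSON array of member objects."""
    return [member["login"] for member in json.loads(payload)]


def fetch_members_without_2fa(org, token):
    """Fetch the first page of members without 2FA (needs network access)."""
    request = Request(members_without_2fa_url(org), headers={
        "Authorization": f"Bearer {token}",
        "Accept": "application/vnd.github+json",
    })
    with urlopen(request) as response:
        return logins(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Placeholder values: supply a real organisation and an owner's token.
    print(fetch_members_without_2fa("example-org", "YOUR_TOKEN"))
```

This only reads the first page of results; an organisation with more than 100 such members would need to loop over `page`.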
You can run an enterprise install instead, which presumably improves security significantly, but the cost difference between that and a standard GitHub company account is staggering, and it would probably mean you miss out on all the third-party tools that integrate with GitHub.
GitHub has recently announced new business plans with extra features, which could solve points 1, 4 and 5 above. (Though the ‘uptime guarantee’ that’s part of it is pretty laughable: it’s not even “four 9s”, it excludes scheduled maintenance and anything they deem ‘outside their reasonable control’, and it’s not an actual guarantee, just a small credit against your next bill, capped at no more than a third of that bill. Basically, very carefully worded marketing weasel words instead of any kind of commitment from them.)
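For a sense of scale on the “four 9s” remark, the arithmetic is easy to sketch. The percentages below are generic examples, not the figure from GitHub’s plan (which isn’t quoted here), and the function name is my own.

```python
def allowed_downtime_minutes(availability_percent, days=365):
    """Minutes of downtime per period that a given availability permits."""
    unavailable_fraction = 1 - availability_percent / 100
    return unavailable_fraction * days * 24 * 60


if __name__ == "__main__":
    # Example availability levels only, chosen for comparison.
    for percent in (99.0, 99.9, 99.99):
        minutes = allowed_downtime_minutes(percent)
        print(f"{percent}% uptime allows about {minutes:.1f} minutes of downtime per year")
```

“Four 9s” (99.99%) leaves under an hour of downtime per year, while “three 9s” (99.9%) already allows close to nine hours, before any exclusions for scheduled maintenance.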