Does GitHub handle large repositories well?
My company’s SVN code repository is about 250 MB when downloaded. With years of changes, the full history is probably much larger (maybe 4x that size). If we moved all this over to GitHub, would each user have to download the 250MBs or would they have to download 1GB or more to get the full history of the repository?
3 solutions collected from the web for “Does GitHub handle large repositories well?”
If we moved all this over to GitHub, would each user have to download the 250MBs or would they have to download 1GB or more to get the full history of the repository?
Each user, when cloning for the first time, would have to retrieve the whole repository. However, the Git server would send a “compressed” version of the repository as a packfile, so the transmitted data would weigh much less than 1 GB.
Each successive fetch/pull operation would only retrieve the new Git objects (commits, trees and blobs) that the server knows about and that are not already in the client’s local repository. Those would also be sent over the wire as a packfile.
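To make the "only the new objects" point concrete, here is a toy model in Python. It is only a sketch: the object names and the graph are made up, and real Git negotiates over commit ids rather than comparing full object sets, but the result is the same idea (send what is reachable on the server and missing on the client).

```python
# Toy model of an incremental fetch: the server sends only the objects
# reachable from its branch tip that the client does not already have.
# Object names and the graph below are made up for illustration.

def reachable(graph, start):
    """Return every object reachable from `start` by following references."""
    seen = set()
    stack = [start]
    while stack:
        obj = stack.pop()
        if obj in seen:
            continue
        seen.add(obj)
        stack.extend(graph.get(obj, []))
    return seen

# Each commit references its tree and its parent commit; trees reference blobs.
server_objects = {
    "commit2": ["tree2", "commit1"],
    "commit1": ["tree1"],
    "tree2": ["blob_a", "blob_b2"],
    "tree1": ["blob_a", "blob_b1"],
}

# The client cloned when commit1 was the tip, so it holds that history.
client_has = reachable(server_objects, "commit1")

# On the next fetch only the missing objects need to travel over the wire.
to_send = reachable(server_objects, "commit2") - client_has
print(sorted(to_send))  # ['blob_b2', 'commit2', 'tree2']
```

Note that blob_a is not sent again: it is unchanged between the two commits, so the client already has it.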
Although @akonsu is correct when stating that you can clone a shallow version of your repository (i.e. without the whole history), in older versions of Git that would prevent the user from further interacting with a GitHub-hosted main upstream repository.
Indeed, the git clone documentation used to state: “A shallow repository has a number of limitations (you cannot clone or fetch from it, nor push from nor into it)”. Git 1.9 and later lift most of these restrictions.
You can clone without the history: git clone --depth 1 your_repo_url (see https://git.wiki.kernel.org/index.php/GitFaq#How_do_I_do_a_quick_clone_without_history_revisions.3F)
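If you want to script this, a minimal Python sketch could look like the following. The helper names, the URL and the destination directory are hypothetical; the only Git features assumed are the --depth option of git clone and the --unshallow option of git fetch, which downloads the rest of the history later if it turns out to be needed.

```python
import subprocess

def shallow_clone_cmd(repo_url, dest, depth=1):
    """Build the git command that clones only the most recent `depth` commits."""
    return ["git", "clone", "--depth", str(depth), repo_url, dest]

def unshallow_cmd():
    """Build the git command that later downloads the rest of the history."""
    return ["git", "fetch", "--unshallow"]

def run(cmd, cwd=None):
    """Run a git command and raise CalledProcessError if it fails."""
    subprocess.run(cmd, cwd=cwd, check=True)

# Hypothetical URL and directory; replace them with your own.
url = "https://github.com/your_org/your_repo.git"
print(" ".join(shallow_clone_cmd(url, "your_repo")))
print(" ".join(unshallow_cmd()))

# To actually execute (needs git installed and network access):
# run(shallow_clone_cmd(url, "your_repo"))
# run(unshallow_cmd(), cwd="your_repo")
```

The commands are built as lists rather than one string so that they can be passed to subprocess.run without going through a shell.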
If there are lots of versions of lots of files, your object database will become larger and larger over time.
By default, git uses the zlib compression algorithm to store individual blobs. But it’s possible to tell git to merge multiple objects into one pack file, which also uses delta compression to save space. Your entire history still exists; it will just take a few moments longer to unpack when executing commands based on previous states (e.g. checking out older versions). But I need to stress how minor this is. Honestly, a fraction of a second at most.
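Here is a small Python sketch of why packing helps. The first function mimics how git stores a single blob as a loose object (a "blob <size>" header, a NUL byte, the content, all zlib-compressed and named by its SHA-1). The second part is only a stand-in for delta compression, not git's actual pack format: it compresses two nearly identical versions in one zlib stream so that the second version can reuse bytes from the first. The file contents are made-up filler.

```python
import hashlib
import zlib

def loose_object(content):
    """Return (object id, zlib-compressed bytes) for a blob, loose-object style."""
    store = b"blob " + str(len(content)).encode() + b"\0" + content
    return hashlib.sha1(store).hexdigest(), zlib.compress(store)

def pseudo_random_bytes(n):
    """Deterministic, poorly compressible filler standing in for file content."""
    out, block = b"", b"seed"
    while len(out) < n:
        block = hashlib.sha256(block).digest()
        out += block
    return out[:n]

# Two versions of the same "file": the second only appends one line.
v1 = pseudo_random_bytes(4000)
v2 = v1 + b"one appended line\n"

oid1, z1 = loose_object(v1)
oid2, z2 = loose_object(v2)
separate = len(z1) + len(z2)

# Stand-in for packing: compress both versions in one stream so the second
# can reuse bytes of the first. Git's real packfiles store explicit deltas.
together = len(zlib.compress(v1 + v2))

print("stored separately:", separate, "bytes")
print("stored together:  ", together, "bytes")
```

Stored separately, each version costs roughly its full size. Stored together, the second version costs almost nothing, which is the effect a packfile gets from deltas between similar objects.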
More info on packfiles from the Pro Git book: http://git-scm.com/book/en/Git-Internals-Packfiles