Does GitHub handle large repositories well?

My company’s SVN code repository is about 250 MB when checked out. With the years of changes, the full history is probably quite a bit larger (roughly 4× that size). If we moved all of this over to GitHub, would each user have to download only the 250 MB, or would they have to download 1 GB or more to get the full history of the repository?

3 Solutions collected from the web for “Does GitHub handle large repositories well?”

    If we moved all this over to GitHub, would each user have to download the 250MBs or would they have to download 1GB or more to get the full history of the repository?

    Each user, when cloning for the first time, would have to retrieve the whole repository. However, the Git server-side implementation would send a “compressed” version of the repository as a packfile, so the transmitted data would weigh much less than 1 GB.
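
    As a rough sketch (the repository URL below is just a placeholder), a first clone pulls down that single compressed packfile, and you can then inspect how much disk the object database actually uses:

        # First clone: the server streams the whole history as one compressed packfile.
        git clone https://github.com/yourcompany/repo.git
        cd repo

        # Report the size of the local object database (loose and packed objects).
        git count-objects -vH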

    Each successive fetch/pull operation would only retrieve the new Git objects (commits, trees and blobs) that the server knows about and that are not already in the client’s local repository. Those would also be sent over the wire as a packfile.
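
    For instance (assuming the remote is named origin, which is what git clone sets up by default), later updates only transfer the objects added since the last fetch:

        # Fetch only the new commits, trees and blobs; they arrive as a packfile.
        git fetch origin

        # Or fetch and integrate them into the current branch in one step.
        git pull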


    Although @akonsu is correct in stating that you can clone a shallow version of your repository (i.e. without the whole history), doing so would prevent the user from further interacting with a GitHub-hosted main upstream repository.

    Indeed, the git clone documentation states: “A shallow repository has a number of limitations (you cannot clone or fetch from it, nor push from or into it)”
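
    If a user does start from a shallow clone and later needs the full history, newer Git releases can deepen it in place. A minimal sketch, assuming the remote is reachable as origin:

        # Fetch the commits that were omitted by --depth, turning the shallow
        # clone into a complete one.
        git fetch --unshallow origin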

    You can clone without the history: git clone --depth 1 your_repo_url (see https://git.wiki.kernel.org/index.php/GitFaq#How_do_I_do_a_quick_clone_without_history_revisions.3F)
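
    As a small example (the URL is again a placeholder), a depth-1 clone downloads only the latest snapshot, which you can confirm by listing the history:

        # Shallow clone: only the most recent commit and its tree/blobs are fetched.
        git clone --depth 1 https://github.com/yourcompany/repo.git repo-shallow
        cd repo-shallow

        # Only a single commit should be listed, since the rest of the history
        # was never transferred.
        git log --oneline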

    If there are lots of versions of lots of files, your object database will become larger and larger over time.

    By default, Git uses the zlib compression algorithm to store individual blobs, but it is possible to tell Git to merge multiple objects into a single packfile, which also uses delta compression to save space. Your entire history still exists; it will just take a few moments longer to unpack when executing commands based on previous states (e.g. checking out older versions). I need to stress how minor this is, though: honestly, it adds less than a fraction of a second.
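
    For example, in any local repository you can trigger that packing yourself and compare the object database before and after (no assumptions here beyond a standard Git installation):

        # Show how many loose and packed objects exist and how much space they use.
        git count-objects -vH

        # Repack loose objects into delta-compressed packfiles (and prune garbage).
        git gc

        # Running the report again should show fewer loose objects and a smaller footprint.
        git count-objects -vH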

    More info on packfiles can be found in the Pro Git book: http://git-scm.com/book/en/Git-Internals-Packfiles
