How do GitHub and other cloud-based repository services (GitLab, Bitbucket) store source code files and directories?

Do they store the metadata of files and directories in databases, and the actual files and directories on the file system of the server instances?

One solution collected from the web for “How do GitHub and other cloud-based repository services (GitLab, Bitbucket) store source code files and directories?”

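    The code itself is most likely kept in bare Git repositories (repositories with no working tree, only the Git object database and refs) on the servers' file systems.
    How those bare repositories are organized, however, depends on the scale of the Git hosting service you are talking about.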
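
    To make "bare repository" more concrete, here is a minimal sketch of how Git stores one file as a loose blob object inside a repository's objects directory. It assumes the default SHA-1 object format; the function names are hypothetical, and real servers mostly keep objects in packfiles rather than as loose files, so treat this as an illustration of the on-disk idea only.

```python
import hashlib
import tempfile
import zlib
from pathlib import Path


def write_loose_blob(repo_dir: Path, content: bytes) -> str:
    """Store `content` the way Git stores a loose blob; return its object id."""
    # Git hashes a short header plus the raw bytes: b"blob <size>\0<content>".
    store = b"blob " + str(len(content)).encode() + b"\0" + content
    object_id = hashlib.sha1(store).hexdigest()
    # The object lives at objects/<first 2 hex chars>/<other 38>, zlib-compressed.
    path = repo_dir / "objects" / object_id[:2] / object_id[2:]
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(zlib.compress(store))
    return object_id


def read_loose_blob(repo_dir: Path, object_id: str) -> bytes:
    """Read a loose blob back and strip the header."""
    path = repo_dir / "objects" / object_id[:2] / object_id[2:]
    raw = zlib.decompress(path.read_bytes())
    _header, _sep, content = raw.partition(b"\0")
    return content


if __name__ == "__main__":
    repo = Path(tempfile.mkdtemp())  # stand-in for a bare repo directory
    oid = write_loose_blob(repo, b"hello\n")
    print(oid)  # same id that `git hash-object` reports for this content
    print(read_loose_blob(repo, oid))
```

    File names and directory structure are not stored in the blob; Git records them in separate tree objects that point at blobs by id, which is why no external database is needed for that part.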

    For instance, GitHub uses DGit:

    DGit is short for “Distributed Git.”

    As many readers already know, Git itself is distributed—any copy of a Git repository contains every file, branch, and commit in the project’s entire history.
    DGit uses this property of Git to keep three copies of every repository, on three different servers.
    The design of DGit keeps repositories fully available without interruption even if one of those servers goes down. Even in the extreme case that two copies of a repository become unavailable at the same time, the repository remains readable; i.e., fetches, clones, and most of the web UI continue to work.
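
    The availability property described in that quote can be sketched as a toy model. This is not GitHub's implementation: the class names and host names are hypothetical, and the write rule (a majority of copies must be online) is an assumption added for illustration, since the quote only describes read availability.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Replica:
    host: str            # hypothetical server name
    online: bool = True


@dataclass
class ReplicatedRepo:
    name: str
    replicas: List[Replica]  # three copies on three different servers

    def healthy(self) -> List[Replica]:
        return [r for r in self.replicas if r.online]

    def can_read(self) -> bool:
        # Per the quoted description: readable while at least one copy is up.
        return len(self.healthy()) >= 1

    def can_write(self) -> bool:
        # ASSUMPTION (not stated in the quote): writes need a majority of copies.
        return len(self.healthy()) > len(self.replicas) // 2


if __name__ == "__main__":
    repo = ReplicatedRepo(
        "octo/demo",
        [Replica("fs1"), Replica("fs2"), Replica("fs3")],
    )
    repo.replicas[0].online = False
    repo.replicas[1].online = False
    print(repo.can_read(), repo.can_write())
```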

    The point is: you cannot just store the bare repo without dealing with the rest:

    • authentication (https or ssh)
    • authorization (tied to authentication, or also membership)
    • diff: as GitHub found out, you cannot just query a diff from Git and return it as is. See “How we made diff pages three times faster”.
    • search (see “How to search for a commit message on GitHub?”), in the master branch or in all branches: only the master branch is supported for now (2017).
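
    As an illustration of the first two points, here is a minimal sketch of an access check that a hosting service would run before touching the bare repository on disk. Everything in it is hypothetical (the `Access` levels, the in-memory tables, the user and repository names); a real service would keep this data in a database and tie it to its https or ssh authentication layer.

```python
from enum import IntEnum
from typing import Optional


class Access(IntEnum):
    NONE = 0
    READ = 1   # clone / fetch
    WRITE = 2  # push


# Hypothetical data that lives outside Git, e.g. in a database.
PUBLIC_REPOS = {"octo/demo"}
MEMBERSHIPS = {
    ("alice", "octo/demo"): Access.WRITE,
    ("bob", "octo/private"): Access.READ,
}


def granted_access(user: Optional[str], repo: str) -> Access:
    """Highest access level for an authenticated user (or None if anonymous)."""
    level = Access.READ if repo in PUBLIC_REPOS else Access.NONE
    if user is not None:
        level = max(level, MEMBERSHIPS.get((user, repo), Access.NONE))
    return level


def authorize(user: Optional[str], repo: str, wanted: Access) -> bool:
    """Return True if the requested operation may proceed."""
    return granted_access(user, repo) >= wanted


if __name__ == "__main__":
    print(authorize(None, "octo/demo", Access.READ))
    print(authorize("bob", "octo/private", Access.WRITE))
```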

    And all of this does not even take into account the other services and metadata those hosting servers offer (projects, wikis, issues, …). See for instance “Moving persistent data out of Redis” for a glimpse of the kind of technical challenge that poses.

    And that is just GitHub. Bitbucket and GitLab have their own technical challenges and solutions.
