How do GitHub and other cloud-based repository services (GitLab, Bitbucket) store source code files and directories?
Do they store the metadata of files and directories in databases, and the actual files and directories on the file system of the server instances?
The repositories themselves should be stored as bare repositories, that is, Git repositories without a working tree.
But it depends on the scale of the Git repository server you are talking about.
For instance, GitHub uses DGit:
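To make "bare repository" more concrete: file contents are not kept as ordinary files with their original names, but as content-addressed objects under the repository's objects directory. Here is a minimal sketch of how a single file becomes a loose blob object, assuming the default SHA-1 object format. The function names git_blob_id and write_loose_blob are hypothetical, and real servers mostly keep objects in packfiles, which this sketch does not cover.

```python
import hashlib
import zlib
from pathlib import Path


def git_blob_id(content: bytes) -> str:
    """Return the object ID Git gives a blob (SHA-1 object format assumed)."""
    # Git hashes a small header ("blob <size>\0") followed by the raw content.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def write_loose_blob(git_dir: Path, content: bytes) -> Path:
    """Write content as a loose object under <git_dir>/objects (illustrative only)."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    store = header + content
    oid = hashlib.sha1(store).hexdigest()
    # Loose objects live at objects/<first 2 hex chars>/<remaining 38 hex chars>.
    path = git_dir / "objects" / oid[:2] / oid[2:]
    path.parent.mkdir(parents=True, exist_ok=True)
    # The header plus content is zlib-compressed on disk.
    path.write_bytes(zlib.compress(store))
    return path


if __name__ == "__main__":
    # Same ID that `echo 'hello world' | git hash-object --stdin` reports.
    print(git_blob_id(b"hello world\n"))
```

File names and directory structure live in separate tree objects, and history in commit objects, so the "metadata" of the source tree is also inside the repository rather than in an external database.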
DGit is short for “Distributed Git.”
As many readers already know, Git itself is distributed—any copy of a Git repository contains every file, branch, and commit in the project’s entire history.
DGit uses this property of Git to keep three copies of every repository, on three different servers.
The design of DGit keeps repositories fully available without interruption even if one of those servers goes down. Even in the extreme case that two copies of a repository become unavailable at the same time, the repository remains readable; i.e., fetches, clones, and most of the web UI continue to work.
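The availability behavior described above can be sketched as a toy model. This is not GitHub's code: the Replica class, the host names, and especially the write quorum of two are assumptions made for illustration. The quoted text only states that reads keep working when a single copy is left.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Replica:
    """One copy of a repository on one file server (hypothetical model)."""
    host: str
    available: bool = True


def can_read(replicas: List[Replica]) -> bool:
    # Every replica holds the full history, so one live copy is enough to serve reads.
    return any(r.available for r in replicas)


def can_write(replicas: List[Replica], write_quorum: int = 2) -> bool:
    # ASSUMPTION: a write needs a majority of the three replicas to accept it.
    return sum(r.available for r in replicas) >= write_quorum


if __name__ == "__main__":
    copies = [Replica("fs1"), Replica("fs2"), Replica("fs3")]
    copies[0].available = False
    copies[1].available = False
    # With two of three copies down, reads still work but writes do not (under the assumption above).
    print(can_read(copies), can_write(copies))
```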
The point is: you cannot just store the bare repo without dealing with the rest:
- authentication (https or ssh)
- authorization (tied to authentication, or also membership)
- diff: as GitHub found out, you cannot just query a diff from Git and return it. See “How we made diff pages three times faster”.
- search (see “How to search for a commit message on GitHub?”), in the master branch or in all branches: only the master branch is supported for now (2017).
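For comparison, on a plain bare repository the "master only" versus "all branches" distinction for commit message search is just a different revision argument to git log. The sketch below only builds the command line; the helper name commit_search_command and the repository path are hypothetical, and this says nothing about how GitHub implements its own search.

```python
from typing import List


def commit_search_command(git_dir: str, pattern: str, all_branches: bool = False) -> List[str]:
    """Build a `git log` command that searches commit messages in a bare repository."""
    cmd = ["git", f"--git-dir={git_dir}", "log", "--oneline", f"--grep={pattern}"]
    # --all walks every ref; otherwise only history reachable from master is searched.
    cmd.append("--all" if all_branches else "master")
    return cmd


if __name__ == "__main__":
    # Hypothetical repository path, for illustration only.
    print(commit_search_command("/srv/git/project.git", "fix typo"))
    print(commit_search_command("/srv/git/project.git", "fix typo", all_branches=True))
```

The resulting list can be passed to subprocess.run on a machine where Git is installed. At hosting scale, running this per request is exactly the kind of cost that pushes providers toward dedicated search indexes.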
And all of this does not even take into account all the services and metadata those hosting servers offer (projects, wikis, issues, …). See for instance “Moving persistent data out of Redis” for a glimpse of the kind of technical challenge that poses.
And that is just GitHub. Bitbucket and GitLab have their own technical challenges and solutions.