How are DVCS used in large teams?

I’ve recently started getting into Git on a personal project, and I can see how a DVCS might benefit us at work (which is a large enterprise software company, currently running Perforce). Feature work in my team for example mostly consists of developers creating their own branches; sometimes these are shared between small teams of developers. I think it would be more efficient in this instance to use a DVCS.

In the more general case, though, I’d be interested to hear from people that use a DVCS at work, in medium to large teams.

  1. How do you deal with N-way merges? Is this even a common scenario? Mercurial only supports N-way merges by doing (N-1) 2-way merges (and I have read that this is the preferred solution in other DVCSs), which sounds like a very laborious process for even relatively small N.
  2. Do you use a single central authoritative repository, or is it truly P2P?
  3. Do developers often push and pull code to and from each other, or does everything go via the central repository?

6 solutions collected from the web for “How are DVCS used in large teams?”

    My team at my previous employer used Git, and it worked well for us. We weren’t all that large (maybe 16 or so, with maybe 8 really active committers?), but I have answers to your questions:

    1. N-Way merges aren’t terribly common. We came up with some conventions about branch naming that allowed us to write scripts that eased the “release engineering” process (I use scare quotes because we didn’t have a release engineer), and people would create private feature branches, but we rarely had an issue with merging more than two branches (see the next one).
    2. (and #3). We had a central repository on a development server for three reasons: (a) the development machine had a RAID5 array (more fault tolerant) and nightly backups (dev workstations were not backed up nightly), (b) production releases were built on the development server, and (c) having a central repository simplified scripting. As a result, N-way merges simply never happened. The closest thing we had to N-way was when someone merged laterally and then merged vertically.

    Git was a really great thing for us because of its high degree of flexibility; however, we did have to establish some conventions (branch and tag names, repo locations, scripts, process, etc.) or it might have been a little chaotic. Once we got the conventions set up, the flexibility we had was just fantastic.

    Update: our conventions basically were thus:

    • a directory on our NFS server that housed all central repositories
    • we had several projects that shared components, so we broke them out into libraries, essentially, with their own repositories, and the deliverable projects just included them as git submodules.
    • there were version strings and release names imposed on us from above, so we just used variants of those as branch names
    • similarly, for tags, they followed the process-dictated release names
    • the deliverable projects contained a properties file which I read into the shell scripts, and that allowed me to write a single script to manage the release process for all the projects, even though each one had slight variations on the process – the variations were accounted for in those property files
    • I wrote scripts that would rebuild a deliverable package from any tag
    • using git allowed us to control access using PAM and/or normal user permissions (ssh, etc)
    • There were other conventions that are harder to put in a bulleted list, like when merges should happen. Really, me and another guy were sort of the in-house “git gurus”, and we helped everyone figure out how to use branches and when to merge.
    • getting people to commit in small chunks and not drop diff-bombs in the master branch was a challenge. One guy dropped about two solid weeks of work into one commit, and we eventually had to unravel it all. A huge waste of time, and frustrating to all.
    • informative and detailed comments to go with commits

    There were other things that you learn as your team gets experienced and learns to work with each other, but this was enough to get us started.
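
    A minimal sketch of the properties-file convention from the list above, written in Python rather than the shell scripts described. The property keys (`project.name`, `release.name`, `release.version`) and the tag format are assumptions made up for illustration; the original scripts and file layout are not shown in this answer.

```python
import re

# Hypothetical per-project properties; these keys are assumptions
# made up for this sketch, not taken from the original setup.
SAMPLE_PROPERTIES = """
# deliverable project settings
project.name = billing
release.name = R2
release.version = 1.4.0
"""


def parse_properties(text):
    """Parse simple `key = value` lines, skipping blanks and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a key=value line: {raw!r}")
        props[key.strip()] = value.strip()
    return props


def release_tag(props):
    """Build a tag name from the release name and version (assumed format)."""
    tag = f"{props['release.name']}-{props['release.version']}"
    # A deliberately simple check; Git's own ref-name rules are more detailed.
    if not re.fullmatch(r"[A-Za-z0-9][A-Za-z0-9._-]*", tag):
        raise ValueError(f"unsuitable tag name: {tag!r}")
    return tag


props = parse_properties(SAMPLE_PROPERTIES)
print(release_tag(props))  # R2-1.4.0
```

    The point of the design is that one script can serve every deliverable: the per-project variations live in the data file, and the script only reads them.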

    Update: anyone who follows such things already knows about it by now, but Vincent Driessen has written a solid and pretty comprehensive (but not exhaustive) take on branching and release engineering using Git. I would highly encourage using his process as a starting point, for two reasons:

    • lots of teams do it this way or are using some close variant (including Linux, Git, and many other OSS project teams), which means this method has been tested and tweaked to be successful in most circumstances. You are very unlikely to face an issue that hasn’t been faced and solved within the constraints of this model.
    • because of the foregoing, almost any engineer with Git experience will understand what’s going on. You won’t have to write detailed documentation about the fundamental nature of your release process; you’ll only have to document things specific only to your project or team.

    Workflow schema from whygitisbetterthanx:

    [Figure: Git workflow with an integration manager]

    To scale this up to even more developers, you simply add another layer of “trusted lieutenants” between the integration manager and the developers.
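
    A rough Python model of that layered hierarchy, as a sketch only. The names in `PULLED_BY` are hypothetical, and "blessed_repository" is simply a label assumed here for the repository that the integration manager maintains.

```python
from collections import Counter

# Hypothetical hierarchy: each key is pulled from by the value.
PULLED_BY = {
    "dev_a": "lieutenant_1",
    "dev_b": "lieutenant_1",
    "dev_c": "lieutenant_2",
    "lieutenant_1": "integration_manager",
    "lieutenant_2": "integration_manager",
    "integration_manager": "blessed_repository",
}


def route_to_blessed(author, pulled_by=PULLED_BY):
    """Return the chain of hops a change takes from its author upwards."""
    route = [author]
    while route[-1] in pulled_by:
        nxt = pulled_by[route[-1]]
        if nxt in route:
            raise ValueError("cycle in hierarchy")
        route.append(nxt)
    return route


def fan_in(pulled_by=PULLED_BY):
    """Count how many sources each person or repository pulls from."""
    return Counter(pulled_by.values())


print(" -> ".join(route_to_blessed("dev_c")))
# dev_c -> lieutenant_2 -> integration_manager -> blessed_repository
print(fan_in()["integration_manager"])  # 2
```

    In this toy model the integration manager pulls from two lieutenants rather than from three developers directly, which is the reason the extra layer helps as the team grows.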

    I’ve been working for several years with the Glasgow Haskell Compiler team using Darcs. I’ve recently (several months ago) started using git for my own copy of the repo, both for performance and to improve my education.

    1. How do you deal with N-way merges?

      There are no N-way merges. Each developer originates a stream of patches, and streams are merged one at a time at each repo. So if N developers make changes simultaneously, they get merged pairwise.

    2. Do you use a single central authoritative repository?

      Absolutely. It’s the only way to tell what’s GHC and what isn’t.

    3. Do developers often push and pull code to and from each other, or does everything go via the central repository?

      I think it depends on the developers and the VCS you are using. On the GHC project almost all the pulls and pushes I see go through the central repository. But there’s a heavyweight (self-administered) gatekeeper on pushes to the central repo, and if a colleague has a bug fix I need now, I’ll pull it direct from his or her repo. With darcs it is very easy to pull just a single patch (rather than the whole state as in git), and I know that my fellow developers, who have more experience with darcs, use this feature a lot more than I do, and they like it a lot.

      With git, when I am working closely with one other developer, I will frequently create a new branch just for the purpose of sharing it with one other person. That branch will never hit the central repo.
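
    The pairwise merging described under question 1 above can be pictured as a fold over patch streams. This is a toy Python sketch under strong assumptions: patches are plain strings, a merge just appends the patches that are missing, and conflicts are ignored entirely. It does not model how Darcs or git actually merge.

```python
from functools import reduce


def merge_pair(repo, stream):
    """Merge one patch stream into a repo, skipping patches already present."""
    merged = list(repo)
    for patch in stream:
        if patch not in merged:
            merged.append(patch)
    return merged


def merge_all(base, streams):
    """Fold several streams into the base, one two-way merge per stream."""
    return reduce(merge_pair, streams, base)


base = ["p1", "p2"]
streams = [["p1", "a1"], ["p2", "b1", "b2"], ["c1"]]
print(merge_all(base, streams))
# ['p1', 'p2', 'a1', 'b1', 'b2', 'c1']
```

    Each step is an ordinary two-way merge, so no special N-way operation is needed, however many developers contribute at once.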

    The fairly famous “Tech Talk: Linus Torvalds on git” explains how it is used for Linux (about as big a team as I can think of).

    If I recall correctly, its use was likened to a military chain of command: each module has a maintainer, who handles pull requests from developers, and then there are a few “most trusted” people who deal with pulling data from the module maintainers into the official git repository.

    “Linux: Managing the Kernel Source With ‘git’” also explains it, although again it’s hardly a concise explanation.

    Here is one example (by no means a “universal” one).

    We have a central VCS (ClearCase or Subversion, depending on the project), and we use it for “official” development efforts (dev, patches, fixes), where the number of branches is limited and well-identified.

    However, for refactoring developments involving a lot of intermediate states, where nothing works, and where many developers need to have their own activity-based branch or branches, some Git repositories are set up between those developers, in a P2P way.
    Once the work achieves some kind of 0.1 stability, and merges are reduced, it is re-imported into the VCS, where the work can go on in an “orderly” central fashion.

    Since Git on Windows works well (MSysGit), we manage to have small initial developments quickly done on the side that way.

    We are still evaluating Git for a full-scale project development though.

    It’s probably best to look into how the Linux kernel developers work. They have quite a complex workflow where changes are submitted from many sources, and then trusted developers for each subsystem (called lieutenants) pull in the changes, and when they’re happy submit them to Linus, who eventually either pulls them into his tree or rejects them. Of course it’s more complex than that, but that’s a general overview.
