Git and Mercurial – can someone explain this test result

I was comparing the speed of Git and Mercurial.
I chose a big project of 9072 files (mainly PHP files and several images) with a size of 95.1 MB.

This is a fake project, and knowing that may give someone an idea of how to explain the results I got. It is a WordPress download, unchanged, copied 12 times into each of two folders: one for the Git repository and the other for the Mercurial repository.

  • I then created a Git repository and committed everything (using TortoiseGit). Once that had finished, I did the same in the other folder for Mercurial, using TortoiseHg.

    Git results:
    Time: 32 minutes and 30 seconds to commit everything
    Repository size: 6.38 MB, with only 847 files

    Mercurial results:
    Time: 1 minute and 25 seconds (yes, only about a minute)
    Repository size: 58.8 MB, with 9087 files

    I’m not arguing about which one is best. I’m just trying to understand the differences and how each SCM created its repository.

    It looks like Hg made a copy of the files, with some sort of compression.
    But I don’t understand what Git did.
    Can someone explain the results?

    PS: I know there are already some questions about Git and Mercurial. I’m only trying to understand the result of this test, and whether it is even a valid test. When I started I was only checking speed, but I ended up with some question marks over my head…

  • Answers

    Get your tools checked; both hg and git (command line) import these
    trees in about a second. Consider the command-line versions of the tools
    in preference to the GUI wrappers.

    You’re running into a case at which git excels and hg is less
    efficient. Mercurial uses a separate file as the revlog of each file,
    while git likes to keep things more unified. In particular, copying the
    same directory twelve times takes virtually no extra space in git. But
    how often does that happen? Hopefully not very often. If you routinely import
    thousands of files and images, and not just as the initial commit,
    a DVCS may not be the right tool for you. Something like rsync or a
    centralized VCS would be better — a DVCS is generally tuned for a
    single project that holds text files and receives patches and merges
    over time. Other kinds of tools make different tradeoffs.
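    Here is a minimal git-only sketch of why twelve identical copies cost git almost nothing. Blobs and trees are named by the hash of their content, so every copy of a directory resolves to the same tree object. The file names and contents are made up for illustration.

```bash
#!/usr/bin/env bash
set -euo pipefail
repo=$(mktemp -d)
cd "$repo"
git init -q

# One small directory (hypothetical contents), then twelve copies of it.
mkdir src
printf 'one\n' > src/a.php
printf 'two\n' > src/b.php
for n in $(seq 1 12); do
  cp -R src "copy$n"
done
rm -rf src

git add -A
git -c user.name=demo -c user.email=demo@example.com commit -q -m "12 copies"

# Every copy resolves to the same tree object, so this prints one hash.
for n in $(seq 1 12); do
  git rev-parse "HEAD:copy$n"
done | sort -u

# Loose objects: 2 blobs + 1 shared subtree + 1 root tree + 1 commit.
git count-objects -v | grep '^count:'
```

    If the reasoning holds, the script prints a single tree hash followed by `count: 5`.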

    There’s really not much point importing large directory trees
    and carefully examining the files that appear; you can read the
    documentation if you like. The main lesson here is that git keeps
    a snapshot of the entire directory structure, which allows it to
    efficiently pack things (the bundle for WordPress is 2.7MB, which is no
    larger than the tarball), but it can be more expensive to compute diffs.
    Mercurial maintains a lot more per-file information like logs and diffs,
    which means that accessing the log of just one file is much faster than
    in git, but lots of identical files and directories can have a higher
    space cost.
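    The packing step can be observed with `git count-objects -v`, which reports loose objects (`count`) and packed objects (`in-pack`). In this sketch, a hypothetical `numbers.txt` grows by one line per commit. Each commit writes three loose objects (blob, tree, commit), and `git gc` then moves all of them into a packfile, where similar blobs can be stored as deltas.

```bash
#!/usr/bin/env bash
set -euo pipefail
repo=$(mktemp -d)
cd "$repo"
git init -q

# 20 revisions of one file; each revision appends one more line.
for i in $(seq 1 20); do
  seq 1 "$i" > numbers.txt
  git add numbers.txt
  git -c user.name=demo -c user.email=demo@example.com commit -q -m "rev $i"
done

# Before packing, every object is its own zlib-deflated file.
loose_before=$(git count-objects -v | sed -n 's/^count: //p')
echo "loose objects before gc: $loose_before"

# After gc, 'count' (loose) drops to 0 and 'in-pack' holds everything.
git gc --quiet
git count-objects -v
```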

    I can create a pathological case, too. Here’s one where git wins:

    hg init; git init
    for dir in {1..100}; do
      mkdir $dir
      for file in {1..100}; do
        touch $dir/$file
      done
    done
    hg add {1..100}; hg commit -m tweedledee
    git add {1..100}; git commit -m tweedledum

    Yep, that’s 10,000 empty files across 100 identical directories. Git
    imports the entire thing in a tenth of a second, and the commit itself
    is less than a kilobyte. Mercurial, which creates a logfile for each
    file, takes about four seconds to commit the entire thing, and ends up
    with 10140 new files in .hg, totalling 40MB.
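    The git side of this case can be checked by counting objects. Assuming nothing else is in the repository, every one of the 10,000 empty files maps to the same empty blob and every directory maps to the same subtree, so the commit should produce only four loose objects. The loop calls `touch` once per directory rather than once per file, which only speeds up the setup.

```bash
#!/usr/bin/env bash
set -euo pipefail
repo=$(mktemp -d)
cd "$repo"
git init -q

# 100 identical directories, each holding 100 empty files.
for dir in {1..100}; do
  mkdir "$dir"
  touch "$dir"/{1..100}
done

git add -A
git -c user.name=demo -c user.email=demo@example.com commit -q -m tweedledum

# Expect 4 loose objects: the empty blob, one subtree shared by all
# 100 directories, the root tree, and the commit.
git count-objects -v | grep '^count:'

# Every file points at git's well-known empty-blob ID.
git rev-parse HEAD:1/1
```

    With the default SHA-1 object format, the last command prints `e69de29bb2d1d6434b8b29ae775ad8c2e48c5391`, the ID of the empty blob.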

    Here’s one where mercurial wins:

    hg init; git init
    mkdir -p a/b/c/d/e
    for i in {1..1000}; do
      echo hello >> a/b/c/d/e/file
      hg add a; hg commit -m "Commit $i"
      git add a; git commit -m "Commit $i"
    done

    That’s one thousand commits, each introducing a tiny change in
    a deeply nested file. Each commit in git introduces eight new
    objects, which are individually deflated but stored as separate
    files. Eventually, git decides to repack, which takes time. Unpacked,
    the whole thing is about 32MB, and packed it’s 620K. Mercurial, on the
    other hand, simply appends a few notes to a few logfiles each time, and
    the .hg is 396K at the end.
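    The "eight new objects" per git commit in this case can be checked the same way. They are one blob, the five nested trees `a` to `e`, the root tree, and the commit object. This sketch runs 10 commits instead of 1000 and sets `gc.auto` to 0 so that automatic repacking does not hide the loose objects. Both choices are assumptions made to keep the count deterministic.

```bash
#!/usr/bin/env bash
set -euo pipefail
repo=$(mktemp -d)
cd "$repo"
git init -q
git config gc.auto 0   # keep every object loose so they can be counted

mkdir -p a/b/c/d/e
for i in $(seq 1 10); do
  echo hello >> a/b/c/d/e/file
  git add a
  git -c user.name=demo -c user.email=demo@example.com commit -q -m "Commit $i"
done

# 10 commits x (1 blob + 5 nested trees + 1 root tree + 1 commit) = 80
git count-objects -v | grep '^count:'
```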

    What’s the point of all this? The point is that none of the cases
    discussed in this thread are realistic. In everyday usage, with
    realistic repositories, both tools are great. Just learn one.

    The manuals themselves don’t exactly show you from beginning to end how a commit is constructed, but Git Internals in Pro Git, Internals in the Mercurial wiki, and Mercurial Internals from PyCon 2010 should get you started.
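    As a small complement to those documents, git's plumbing command `git cat-file` can walk one commit from top to bottom: `-t` prints an object's type and `-p` pretty-prints its content. The repository, directory and file names here are hypothetical.

```bash
#!/usr/bin/env bash
set -euo pipefail
repo=$(mktemp -d)
cd "$repo"
git init -q
mkdir docs
printf 'hello\n' > docs/readme.txt
git add -A
git -c user.name=demo -c user.email=demo@example.com commit -q -m first

git cat-file -p HEAD                  # commit: a root tree, author, message
git cat-file -p 'HEAD^{tree}'         # root tree: one entry, the docs subtree
git cat-file -p HEAD:docs             # subtree: one entry, the readme blob
git cat-file -p HEAD:docs/readme.txt  # blob: the raw file content
```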

    I suggest you compare DVCSs on features and workflow rather than on speed and disk space. Disk space is pretty cheap, and both Git and Mercurial store data efficiently. As for speed, neither one will let you down, even for very big projects. Choose the one whose features fit the workflow you use (or want to use).

    As for the difference in storage space in your example: git doesn’t track individual files, so it notices that the content is repeated and stores it more efficiently (while taking more time)… yet how often does that happen in real life?
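    Git notices repeated content because a blob's name is the SHA-1 of a short header (`blob <size>` plus a NUL byte) followed by the file's bytes, so identical files always get the same ID no matter where they live. This sketch recomputes one ID by hand and compares it with `git hash-object`. It assumes GNU `sha1sum` (on macOS, `shasum` computes the same SHA-1 digest) and git's default SHA-1 object format.

```bash
#!/usr/bin/env bash
set -euo pipefail
content='hello world'

# Hash "blob <byte count>\0<content>" by hand (ASCII, so chars == bytes).
manual=$(printf 'blob %d\0%s' "${#content}" "$content" | sha1sum | cut -d' ' -f1)

# Ask git for the blob ID of the same bytes (no repository needed).
fromgit=$(printf '%s' "$content" | git hash-object --stdin)

echo "by hand: $manual"
echo "by git:  $fromgit"
```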

    I suggest you read mpe’s linked posts/articles too. 😀

    That doesn’t sound like a very good test, i.e., it’s not often that you commit to a project with no history and 12 identical copies of the same content.

