Is there a way to easily convert a series of tarballs of a source tree into a git repository?

I’m new to git and I have a moderately large number of weekly tarballs from a long-running project. Each tarball has on average a few hundred files in it. I’m looking for a git strategy that will allow me to add the expanded contents of each tarball to a new git repository, starting from version 1.001 and going through version 1.650. At this stage of the project, 99.5% of tarball(n) is just a copy of version(n-1) – in other words, a perfect candidate for git. The desired end result is to have only the master branch remaining at the end of the process.

I think I know git well enough to do this “by hand”. As I understand it, there is no possibility of a merge conflict, since there will be no opportunity to change master before the next version is added and committed. A shell script is my first guess, but I’m not sure how well bash will like it when git checkout branch_n gets processed while bash is executing in branch_n-1. For the purposes of this project the host environment is Ubuntu 10.04; the resources available are 8 GB of RAM, 500 GB of free disk space, and a 4-CPU processor at 3.ghz.

I don’t need someone else to solve the problem but I could use a nudge in the right direction as to how a git expert would approach it. Any advice from someone who’s “been there done that” would be appreciated.


    PS: I have looked at site’s suggested “related questions” and found nothing relevant.

4 solutions collected from the web for “Is there a way to easily convert a series of tarballs of a source tree into a git repository?”

    Regarding this comment:

    I’m not sure how well bash will like it when git checkout branch_n gets processed while bash is executing in branch_n-1

    Are you concerned about two operations running concurrently and getting in each other’s way? This shouldn’t be a problem unless you intentionally run operations in parallel.

    Assuming the tarballs follow a linear evolution, branching shouldn’t come into this at all.

    The process should be fairly straightforward:

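    1. git init
    2. untar tarball n into the working directory
    3. git add --all .; git commit (with appropriate flags)
    4. git tag -a v1.001 -m "Version 1.001." (substituting the version number of tarball n)
    5. rm -rf * (to handle deletions in the history; you want to leave .git intact, of course; note that * does not match dotfiles, so any other dotfiles have to be removed separately)
    6. goto 2 with the next tarball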
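The loop above could be scripted roughly as follows. This is only a sketch under assumptions that are not in the original post: the tarballs are gzip-compressed, named project-<version>.tar.gz (a hypothetical naming scheme), sort correctly by file name, and each unpacks into a single top-level directory. The function name import_tarballs is made up for illustration, and GNU tar and find are assumed.

```shell
# import_tarballs TARBALL_DIR REPO
# Commits every project-<version>.tar.gz in TARBALL_DIR to REPO, in file-name
# order, one tagged commit per tarball. The body runs in a subshell, so the
# cd and "set -eu" do not leak into the calling shell.
import_tarballs() (
    set -eu
    tarball_dir=$(cd "$1" && pwd)   # absolute path, because we cd into the repo
    repo=$2

    git init -q "$repo"
    cd "$repo"

    for tarball in "$tarball_dir"/project-*.tar.gz; do
        # Derive the version from the (assumed) file name.
        version=${tarball##*/project-}
        version=${version%.tar.gz}

        # Empty the work tree, dotfiles included, but keep .git, so files
        # dropped between versions are recorded as deletions.
        find . -mindepth 1 -maxdepth 1 ! -name .git -exec rm -rf {} +

        # Assumes one top-level directory inside each tarball.
        tar -xzf "$tarball" --strip-components=1

        git add --all .
        # --allow-empty keeps the loop going if two tarballs are identical.
        git commit -q --allow-empty -m "Import version $version"
        git tag -a "v$version" -m "Version $version."
    done
)
```

It would be called as something like import_tarballs ~/tarballs ~/project-git. Everything stays on the single branch that git init creates, so no checkout happens while the script runs.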

    Take a look at $GIT_SRC_DIR/contrib/fast-import/import-tars.perl in the git source tree; it imports a series of tarballs through git fast-import.

    What I would do in this situation, as you have tarballs that are in the end ‘tagged versions’:

    1. create empty git repository
    2. extract a tarball to that directory overwriting any files
    3. add all files: git add .
    4. git commit -a -m 'version foo'
    5. tag the current version with git tag
    6. remove all files (except .git)
    7. repeat from step 2 for each tarball

    In your case it’s not necessary to create branches, as all your tarballs are distinct, successive versions; each iteration overwrites the previous one.

    Without having been exactly there, you should simply:

    • untar an archive anywhere you want
    • rsync it with the git working directory in order to:
      • update the files that changed
      • add the new files from that archive to the working directory
      • remove the files from the working directory that are no longer part of the current archive
    • git add -A
    • git commit -m "archive n"
    • repeat

    The idea is not to check out branch_n+1, but to stay within the same branch, committing each tarball’s content one after the other on that branch of the same git repo.
    Should you truly have somehow two concurrent processes, you could then:

    • git clone the first git repo
    • git checkout -b a_new_branch (git branch has no -b option) to make sure you isolate that parallel process in its own branch, which you will be able to push back to the first repo when done.
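For the parallel case, the clone-and-branch steps could be wrapped as in this sketch. The function name isolate_import and its argument layout are assumptions made for illustration; the command passed after the branch name stands in for whatever import loop the second process runs.

```shell
# isolate_import FIRST_REPO WORK_CLONE BRANCH COMMAND [ARGS...]
# Clones FIRST_REPO, creates BRANCH in the clone, runs COMMAND there, then
# pushes BRANCH back to FIRST_REPO. Runs in a subshell so the cd stays local.
isolate_import() (
    set -eu
    first_repo=$1
    work_clone=$2
    branch=$3
    shift 3

    git clone -q "$first_repo" "$work_clone"
    cd "$work_clone"
    git checkout -q -b "$branch"

    # The caller-supplied import step, executed inside the clone.
    "$@"

    # Publishing a new branch does not touch the branch checked out
    # in the first repo.
    git push -q origin "$branch"
)
```

The first repo never has its own work tree modified; it only gains a new branch that can be inspected or merged later.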