Annexed submodules in git

I’d like to keep some binary files (documentation, executables, images, etc.) in a git-annex repository and then include that repository in several projects as a git submodule. I think this would let me track the correct versions of these large files as they change, keeping old projects pinned to the old versions and new projects to the new ones.

So I make the following repo for my big files:

    mkdir annexedrepo
    cp big_files annexedrepo/
    cd annexedrepo
    git init
    git annex init
    git annex add .
    git commit -m "Add big files"
    

    Then I go to my project repo and add the annexed repo as a submodule:

    cd ../otherrepo
    mkdir data
    git submodule add ../annexedrepo data/annexed
    

    I’d love it if these just appeared as symlinks to the correct files in the other repo. But I guess it’s good enough if I can make the copies as I need them with:

    git annex get data/annexed
    

    This copies the files over; I can see them in otherrepo/.git/modules/data/annexed/objects/. But the annexed files in the working tree are still just dead symlinks. I can list them with ls data/annexed/, but nobody’s home.
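One way to confirm which links are dangling, and where they point, is sketched below. It assumes a POSIX `find` and `readlink`; `dangling_links` and `show_targets` are hypothetical helper names, not part of git or git-annex. A likely cause (an assumption, not something verified here) is that the annex symlinks are relative paths that pass through a `.git` directory, while in a submodule `.git` is a plain file.

```shell
# List symlinks under a directory whose targets do not exist.
# "dangling_links" is a hypothetical helper name.
dangling_links() {
    # test -e follows the link, so it fails for a dead symlink.
    find "$1" -type l ! -exec test -e {} \; -print
}

# Print each dangling link together with the path it points at.
show_targets() {
    dangling_links "$1" | while IFS= read -r link; do
        printf '%s -> %s\n' "$link" "$(readlink "$link")"
    done
}
```

Running `show_targets data/annexed` from the project root should show whether the targets go through `.git` inside the submodule.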

    Am I trying to do something wrongheaded? Is there a way to fix this? Are these bugs in either git-submodule or git-annex? Thanks for your help!

2 solutions collected from the web for “Annexed submodules in git”

    I use the same source tree structure and also tried git-annex, but ran into the same problem. I found that the git-fat extension can be used instead of git-annex and does not have this issue. My source tree looks like this:

    /project
        .git
        .gitmodules
        ...
        <project files and folders>
        ...
        submodule
            .git
            .gitattributes
            .gitfat
            ...
            <binary files>
            ...
    

    To clone such a project:

    git clone git://... project
    cd project
    git submodule init
    git submodule update
    cd submodule
    git fat init
    git fat pull
    

    git-fat uses rsync to push and pull files. See the git-fat documentation for more.
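For reference, the two files that drive git-fat in the submodule might be created as in the sketch below. The remote host and path and the `*.png` / `*.pdf` patterns are placeholders chosen for illustration, and `write_fat_config` is a hypothetical helper. The `[rsync]` section and the `filter=fat -crlf` attribute follow the layout shown in the git-fat README as I recall it, so check them against your installed version.

```shell
# Write a minimal .gitfat and .gitattributes into a repository directory.
# "write_fat_config" is a hypothetical helper; the patterns are examples only.
write_fat_config() {
    repo=$1      # path to the submodule working tree
    remote=$2    # rsync destination, e.g. host:/path (placeholder)

    # .gitfat tells git-fat where rsync should push and pull from.
    printf '[rsync]\nremote = %s\n' "$remote" > "$repo/.gitfat"

    # .gitattributes routes matching files through the fat filter.
    # Appended so that existing attributes are kept.
    printf '*.png filter=fat -crlf\n*.pdf filter=fat -crlf\n' >> "$repo/.gitattributes"
}
```

A call such as `write_fat_config submodule example.org:/share/fat-store` would then be followed by committing both files.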

    Well, with a bit of fiddling, I’ve found a workaround. I’m posting it for posterity, but I’d love to see a better solution.

    In data/annexed there is a file, .git, that contains a reference to ../../.git/modules/data/annexed/. I removed this file and replaced it with a symlink to the same location. I now have access to the annexed files from inside data/annexed, and they are at the right version. I’m a bit worried about this workaround causing problems down the line…
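The replacement step can be sketched as a small shell function. This is only an illustration of the workaround described above, assuming the `.git` file holds a single `gitdir: <path>` line; `gitfile_to_symlink` is a hypothetical helper, not a git or git-annex command.

```shell
# Replace a submodule's ".git" file with a symlink to the git directory it names.
# "gitfile_to_symlink" is a hypothetical helper name.
gitfile_to_symlink() {
    worktree=$1
    gitfile="$worktree/.git"

    # Only act on a regular file; leave real directories and symlinks alone.
    if [ -L "$gitfile" ] || [ ! -f "$gitfile" ]; then
        echo "$gitfile is not a plain gitfile" >&2
        return 1
    fi

    # Extract the path after "gitdir: ".
    target=$(sed -n 's/^gitdir: //p' "$gitfile")
    [ -n "$target" ] || return 1

    # A relative target resolves against the link's own directory,
    # which is the same base the gitfile used.
    rm "$gitfile"
    ln -s "$target" "$gitfile"
}
```

It would be run from the project root as `gitfile_to_symlink data/annexed`. Keeping a copy of the original `.git` file makes the change easy to undo.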
