A version control system with minimum space requirements on the client side, and is good with binaries
(This is my first post so be gentle)
I am using Subversion for version control on large binaries.
I have about 2.5 GB of binaries that I update hourly.
I get about 400 MB worth of differences each day.
Some of the files are PEs, but it is mainly compressed files that are difficult to get good diffs on.
The “.svn” folders on my clients are growing daily and I do not have space on the clients to absorb this increase.
This size is caused by Subversion’s pristine copy on the client (the repository is quite small).
Distributed version control like Git or Mercurial will store a repository of sorts on the client, which I don’t have space for. I will never really do diffs, just updates to the head or to a given version, so the speed advantage of the pristine copy on the client side makes no difference to me.
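For anyone who wants to confirm that the pristine copies really are what fills the disk, here is a rough sketch in Python that adds up the bytes under the “.svn” folders and compares them with the whole working copy. The function names `dir_size` and `svn_overhead` are made up for this illustration, and it assumes the metadata lives in directories literally named “.svn”.

```python
import os

def dir_size(path):
    """Total size in bytes of all regular files below path (symlinks skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total

def svn_overhead(working_copy):
    """Return (metadata_bytes, total_bytes) for a Subversion working copy."""
    meta = 0
    for root, dirs, _files in os.walk(working_copy):
        if ".svn" in dirs:
            meta += dir_size(os.path.join(root, ".svn"))
            dirs.remove(".svn")  # already counted, so do not descend into it
    return meta, dir_size(working_copy)

if __name__ == "__main__":
    meta, total = svn_overhead(".")
    print(f".svn folders: {meta} of {total} bytes")
```

Run from the top of a working copy, this prints how many of the bytes on disk belong to the “.svn” folders.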
So I am planning on using CVS because it is:
- light on the client side (no pristine copy, which is very important to me)
- a server-based architecture
- open source (I am poor)
Is there something completely different I should be using, a backup solution etc.?
Is there another version control system other than CVS that meets these requirements?
4 solutions collected from the web for “A version control system with minimum space requirements on the client side, and is good with binaries”
Mercurial with the largefiles extension is similar to git-annex with the assistant. http://kiln.stackexchange.com/questions/4846/how-do-i-use-the-mercurial-largefiles-extension It’s appropriately labeled a “feature of last resort” because it breaks the D in DVCS (as does git-annex), but that’s what you’re asking for. It works fine and is supported by Fog Creek (the folks bringing you this site, to a first-order approximation).
I spent 10 years in the CVS gulag. I respect you for considering it, but you don’t want to go there. The first time someone presses Ctrl-C during a commit and leaves your repo in a half-committed state, and you’re picking through ,v files trying to undo the damage, you’ll want to kick yourself. The first time someone wants UTF-8 content without remembering to do -kb, or puts UTF-8 in file names, or (IIRC) tries to put a space in a file name, you’ll curse CVS.
As far as I can tell, CVS doesn’t do binary diffs, but will store each binary for each version. If that (disk space) is an issue, CVS is not the proper VCS for your intended use.
You may want to have a look at git-annex. It uses Git to organize information about files using commits and branches, but the actual file content is not stored in the Git repository, which lets you control the space used by your files. It is particularly well suited to managing a collection of large files in a distributed fashion.
The git-annex assistant provides a nice interface over
The question is a bit moot, as it’s not clear whether clients must be able to fetch arbitrary historic versions of the files. So I’m going to list a set of options which might or might not fit your requirements; I hope this will at least give you some hints…
So here we go:
- git-annex is a solution which lets you manage a set of huge files with Git without actually keeping them on the clients.
- Sparkleshare is a “non-proprietary Dropbox”.
- rsync might be used to pull the data to your clients highly effectively.
- rdiff-backup might be used as a variation on the former: while working generally the way rsync does, it’s able to keep an arbitrary number of “deltas” representing past states of the directory being synchronized, so any such state can be extracted or rolled back to. Old deltas can be purged at will.
  This might be combined with rsync: rdiff-backup is used on the server, and the actual copy managed by it is offered to the clients via rsync. If a rollback is needed, another version is restored on the server and the clients then fetch it using rsync.
- A BitTorrent server and clients: this protocol transfers its files in chunks, so if changes to your binaries are somewhat localized, this might work just OK.
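
To get a feel for what “somewhat localized” means for chunk-based transfer, here is a minimal sketch in Python. It is not how BitTorrent or rsync is actually implemented; it only splits two versions of a file into fixed-size chunks, hashes each one, and reports which chunk positions differ. The names `chunk_hashes` and `changed_chunks` and the tiny chunk size are made up for the illustration.

```python
import hashlib

def chunk_hashes(data, chunk_size):
    """Hash each fixed-size chunk of a bytes object."""
    return [hashlib.sha1(data[i:i + chunk_size]).hexdigest()
            for i in range(0, len(data), chunk_size)]

def changed_chunks(old, new, chunk_size=4):
    """Indices of chunks in `new` that are missing from or differ in `old`."""
    old_h = chunk_hashes(old, chunk_size)
    new_h = chunk_hashes(new, chunk_size)
    return [i for i, h in enumerate(new_h)
            if i >= len(old_h) or old_h[i] != h]

if __name__ == "__main__":
    old = b"AAAABBBBCCCC"
    # One byte overwritten in place: only the chunk containing it changes.
    print(changed_chunks(old, b"AAAAXBBBCCCC"))
    # One byte inserted at the front: every later chunk boundary shifts.
    print(changed_chunks(old, b"XAAAABBBBCCCC"))
```

An in-place change touches a single chunk, while an insertion shifts every following chunk, so with fixed chunks the whole file looks new. Compressed files tend to behave like the second case, since a small change in the input usually alters most of the compressed output.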