On computer one "ingegerdsdator"
mkdir annex
cd annex
git init
git-annex init "ingegerdsdator"
mv ../stuff-I-want-to-git-annex-to-manage .
git-annex add .
On computer two "hans-vita"
git clone ssh://ingegerdsdator/home/hans/annex ~/annex
cd annex
git-annex init hans-vita
git remote add ingegerdsdator ssh://ingegerdsdator/home/hans/annex
The last line is needed because without it, ingegerdsdator is only known to hans-vita as "origin".
From now on it is assumed that the working directory is ~/annex.
On computer one "ingegerdsdator"
git remote add hans-vita ssh://hans-vita/home/hans/annex
git-annex sync
At one computer, say ingegerdsdator, do
mkdir -p text/
cp -av ../text/weblog text
git-annex add .
git commit -a -m added
git-annex sync
At the other computer, hans-vita, do
git-annex sync
git-annex get text/*
Add a set of files at the other computer, hans-vita, by
mkdir text/code
cp ../text/code/GitAnnex.muse text/code/
git-annex add .
git-annex sync
And to get it to ingegerdsdator, do
git-annex sync
git-annex get .
At computer A do
git-annex unlock text/code/GitAnnex.muse
## 1. Edit the file (which in this case happened to be the source for the page
##    you are reading right now)
## 2. Save the new version
##
## Since locking and unlocking is a bit tedious, save, but do not commit until
## you know you won't edit again in a while. Delaying the commit is not harmful.
##
## 3. Commit when you're done (and not before).
git-annex add text/code/GitAnnex.muse
git-annex sync
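Deleting an annexed file has two possible meanings:
1. Removing this repository's copy of the file's content, while the file stays in the annex and the other repositories keep their copies.
2. Removing the file altogether, from all repositories.
To do 1, issue: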
git-annex drop path/to/the.file
When syncing, the other repositories will learn that this repository no longer holds a copy of that file.
To do 2, and get the file removed from all repositories:
git rm path/to/the.file
git commit -a -m "removed annoying file"
git-annex sync
When syncing, the other repositories will also delete the file.
In order to get the changes that were made at computer A to the file GitAnnex.muse (or actually to all modified files) propagated to computer B, do the following at computer B:
git-annex sync
git-annex get .
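If an annexed file is modified in two repositories, a conflict will arise when these repositories are synced. Git-annex handles this by keeping both versions: the conflicting file is replaced by one ".variant-XXXX" file per version (XXXX being a short identifier), so neither change is lost.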
Below is an excerpt of the output from git-annex when the file foo.R had been modified independently in two repos.
merge synced/master
Merge made by the 'recursive' strategy.
 projekt/foo.variant-655a.R            | 1 +
 projekt/{foo.R => foo.variant-adde.R} | 0
 2 files changed, 1 addition(+)
 create mode 120000 projekt/foo.variant-655a.R
 rename projekt/{foo.R => foo.variant-adde.R} (100%)
ok
In order to resolve the conflict, do the following:
git-annex get foo*.R
git-annex unlock foo*.R
Create foo.R with content from both files, if appropriate. In this case one of the files was the correct one, so I simply renamed that one to the original name and deleted the other file:
mv foo.variant-adde.R foo.R
rm foo.variant-655a.R
git-annex add foo.R
git-annex sync
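After a conflicting sync it is not always obvious which files were affected. Since the excerpt above shows git-annex naming the two versions foo.variant-655a.R and foo.variant-adde.R, a small find wrapper can list every such file in the annex. This is a sketch that assumes the ".variant-" naming seen above; the function name list_variants is made up for this example.

```shell
# Hypothetical helper: list files that a conflicting sync split into
# "<name>.variant-XXXX<suffix>" copies, skipping the .git directory.
# Takes an optional directory to search (default: the current one).
list_variants() {
  find "${1:-.}" -name .git -prune -o -name '*.variant-*' -print
}
```

Running list_variants ~/annex before and after resolving shows whether any conflicts remain.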
git-annex can be used as a backup system. From within an annex, use:
git-annex get --auto .
to get content which is not backed up satisfactorily. You can define what "satisfactorily" means by editing .gitattributes.
However, you must explicitly add .gitattributes with the --force option, since git-annex ignores dot-files by default.
.gitattributes is inherited by subfolders, which is awesome. In the top-most .gitattributes, which resides directly in the annex (not in the .git directory), I define how different file types should be backed up. In sub-folders I can require a certain number of copies of all files in those directories (and their sub-directories), regardless of file type. File types are defined by the suffix of the file name.
In order to get .gitattributes copied to new repositories automatically, I set its annex.numcopies to a really high number (99).
Here is my top-most .gitattributes:
.gitattributes annex.numcopies=99
*.Rnw annex.numcopies=99
*.muse annex.numcopies=99
*.tex annex.numcopies=99
*.pl annex.numcopies=99
*.pm annex.numcopies=99
*.R annex.numcopies=99
*.sql annex.numcopies=99
*.sh annex.numcopies=99
*.mbox annex.numcopies=2
*.odt annex.numcopies=2
*.ods annex.numcopies=2
*.png annex.numcopies=1
*.mp3 annex.numcopies=1
*.mp4 annex.numcopies=1
*.wav annex.numcopies=1
*.pdf annex.numcopies=1
In some sub-folders I have a .gitattributes like this (notice the first line, which catches important files that cannot be matched by their suffix):
* annex.numcopies=2
.gitattributes annex.numcopies=99
*.Rnw annex.numcopies=99
*.muse annex.numcopies=99
*.tex annex.numcopies=99
*.pl annex.numcopies=99
*.pm annex.numcopies=99
*.R annex.numcopies=99
*.sql annex.numcopies=99
*.sh annex.numcopies=99
*.mbox annex.numcopies=2
*.odt annex.numcopies=2
*.ods annex.numcopies=2
*.png annex.numcopies=1
*.mp3 annex.numcopies=1
*.mp4 annex.numcopies=1
*.wav annex.numcopies=1
*.pdf annex.numcopies=1
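When several lines of a .gitattributes match the same path, git lets the later line override the earlier one, which is why the catch-all * line can come first and still be overridden by the suffix rules. A minimal sketch for checking which value git actually resolves uses git check-attr. The names show_numcopies and demo_numcopies are hypothetical, and the demo builds a throwaway repository with a two-line rule file instead of the full one above.

```shell
# Print the annex.numcopies value git resolves for each path given.
# Must be run inside a git repository; the paths need not exist.
show_numcopies() {
  git check-attr annex.numcopies -- "$@"
}

# Hypothetical demo: a throwaway repo with a catch-all rule followed by a
# suffix rule, showing that the later matching line wins.
demo_numcopies() (
  dir=$(mktemp -d) || exit 1
  cd "$dir" || exit 1
  git init -q .
  printf '%s\n' '* annex.numcopies=2' '*.R annex.numcopies=99' > .gitattributes
  show_numcopies foo.R notes.txt
  cd / && rm -rf "$dir"
)
```

In the demo, foo.R matches both lines and gets 99, while notes.txt only matches the catch-all and gets 2. Inside a real annex, show_numcopies text/code/GitAnnex.muse answers the same question for an actual file.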
There are providers giving away free shell accounts with storage. While I would not trust such providers enough to put my files on their hosts unencrypted, you can securely use such shell accounts with encrypted files; with LUKS, all files will appear unencrypted to git-annex. The only thing you need to take care of is to not use an "ssh remote", since that implies that the untrusted host will not only see the contents of the files, but also get your credentials for your own box. (The first step in setting up an ssh remote involves ssh:ing from the remote to your own box, a big no-no with untrusted hosts.)
What you need is explained here: free-secure-online-backup.
git-annex sync provides a new repository with information on which files exist, and where, but to actually get content in an automated way, you need two things:
1. .gitattributes with definitions of how many copies of each file you want.
2. The .gitattributes files copied to the new repo.
The latter is done by the following:
git-annex get --force *.gitattributes
Now, the new repository knows what files it is expected to keep a copy of, and it will get the right content (including dot-files) when you issue:
git-annex get --force --auto .
To push the contents of files according to the principles defined in *.gitattributes, I use the following snippet, in a command I call global-sync.sh. It parses the list of repositories in .git/config, and for each host mentioned there, it ssh:es into the host, syncs, and then gets the contents with get --auto.
So, when you have run global-sync.sh, you know that content that is supposed to be available in all repos actually is there, even if the content has changed recently.
## global-sync.sh
git-annex sync
## Each remote line looks like "url = ssh://host/path"; keep only "host".
for host in $(grep ssh ~/annex/.git/config | cut -d ":" -f 2 | cut -d "/" -f 3); do
    ssh "$host" "cd annex; git-annex sync; git-annex get --auto ."
done
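The grep in global-sync.sh matches every line of .git/config that contains "ssh" and relies on the exact URL shape. A sketch of a somewhat sturdier variant asks git for the remote URLs instead. It assumes the same layout as above (ssh:// remotes, annex in ~/annex on every host); ssh_hosts and global_sync are hypothetical names, not part of git or git-annex.

```shell
#!/bin/sh
# Hypothetical variant of global-sync.sh; a sketch, not the original script.
# Assumes ssh remotes use ssh://[user@]host[:port]/path URLs and that the
# annex lives in ~/annex on every host.

# Print the host part (including any user@, dropping any :port) of each
# ssh:// URL read on stdin; other URLs are ignored.
ssh_hosts() {
  sed -n 's|^ssh://\([^/:]*\).*|\1|p'
}

# Sync locally, then sync and fetch wanted content on every ssh remote.
# The body runs in a subshell so the caller's directory is unchanged.
global_sync() (
  cd ~/annex || exit 1
  git-annex sync
  # Each output line looks like: remote.<name>.url <url>
  git config --get-regexp '^remote\..*\.url$' | cut -d ' ' -f 2 | ssh_hosts |
    while read -r host; do
      # -n stops ssh from swallowing the rest of the host list on stdin.
      ssh -n "$host" 'cd annex && git-annex sync && git-annex get --auto .'
    done
)
```

Source the file and call global_sync. The ssh -n matters here: inside a while-read loop, a plain ssh would read the remaining host names from stdin and only the first host would be synced.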
This is slightly off-topic, but relevant for the problem: "When I get a completely new $HOME, what do I need to do to get everything working as in my other $HOME:s?"
If you want dot-directories containing symlinks pointing to files managed by git-annex, then you need to create these for each new repository too.
Some collections of files are not suitable for use with git-annex, and if you have a directory structure to impose order on these files, that directory structure needs to be created too (I use Dropbox for some directories under $HOME, but outside the annex).
Files like .emacs need to be in $HOME rather than in $HOME/annex, which can easily be solved by letting $HOME/.emacs be a symlink to $HOME/annex/.emacs. git-annex will cater for $HOME/annex/.emacs, but the symlink must be put in $HOME by some other mechanism. I keep a tar archive of the symlinks pointing into $HOME/annex, so all symlinks can be recreated with one command.
You may, of course, manage symlinks.tbz with git-annex. Doing so will ease the distribution of updated versions of symlinks.tbz.
To find and archive the symlinks:
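#!/bin/sh
## Dot-files in $HOME can be symlinked into ~/annex. The symlinks themselves
## are not backed up by git-annex. This script backs up the symlinks.
## Catch symlinks pointing to $HOME/annex or "annex" (relative links).
## The parentheses make ! -iname and -type l apply to both -lname tests;
## -print0 with GNU tar's --null -T - handles names containing spaces.
find "$HOME" -xdev ! -iname "annex*" -type l \( -lname "$HOME/annex*" -o -lname "annex*" \) -print0 |
    tar -jcvf ~/annex/symlinks.tbz --null -T -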
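Before creating the archive it can help to preview which links would be packed. A minimal sketch, using the same find expression as the script above; the function name list_annex_symlinks and its optional directory argument (defaulting to $HOME) are hypothetical.

```shell
# Hypothetical helper: print the symlinks under a directory (default $HOME)
# that point into its annex, i.e. what the archiving script would pack.
# The body runs in a subshell so the variable "root" stays local.
list_annex_symlinks() (
  root=${1:-$HOME}
  find "$root" -xdev ! -iname 'annex*' -type l \
    \( -lname "$root/annex*" -o -lname 'annex*' \) -print
)
```

Both absolute links (target $HOME/annex/...) and relative ones (target annex/...) are listed; links pointing elsewhere are left out.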
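To use them, that is, to unpack them in a new $HOME, make sure symlinks.tbz is present in the new ~/annex (for example with git-annex get), and issue:
tar -C / -jxvf ~/annex/symlinks.tbz
Since the archive stores the paths relative to /, this recreates the symlinks at the same absolute paths, so the new $HOME must have the same path as the old one.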
I got fsck errors from a USB stick where I have an annex. git-annex fsck reported that everything was OK. git fsck informed about dangling blobs, but those are harmless, I think. git-annex unused resulted in:
git-annex unused
unused . (checking for unused data...) (checking master...) (checking Ekbrands_data/master...) (checking Ekbrands_data/synced/master...)
Some corrupted files have been preserved by fsck, just in case:
NUMBER KEY
1 SHA256-s8005--8c04a49afbdcd054036db8d3c3d884bd2da41d810cc15bb2797b824ff24a84dc
To remove unwanted data: git-annex dropunused NUMBER