Title : Introduction to git-annex (Port Of The Week)
Author: Solène
Date : 12 May 2021
Tags : git openbsd
# Introduction
Now that git-annex is available as a package on OpenBSD I can use it again. I relied on it a few years ago, but compiling it was really complicated for me and I gave up. I really missed it, so I'm now back to it and I think it's time to share this wonderful piece of software.
git-annex is meant to help you manage your data like you would manage books in a library: you have a database telling you where the books are, so you can find them on the shelves, or at least know who borrowed a book. Here we are working with digital files that can be copied, so the analogy doesn't fully work. Still, you may want to put some of your data, but not everything, on an external hard drive, and you may want to keep some data on multiple devices for safety reasons. git-annex automates this.
It works very well for files that don't change much. I call them "static files": music, videos, pictures, documents. You don't really want to use git-annex with files you edit every day, because the process of editing a file is a bit tedious.
git-annex may not be easy to understand at first. I suggest you try it locally to grasp its purpose.
git-annex official
what git-annex is
# Cheat sheet
Let's create a cheat sheet first. Most git-annex commands have a dedicated man page, and a shorter help is available with "git annex help somecommand".
## Create the repository
The first step is to create a repository which is based on git, then we will tell git-annex to init it too.
```command line example
mkdir ~/MyDataLibrary && cd ~/MyDataLibrary
git init
git annex init "my-computer"
```
## Add a file
When you want to register a file in git-annex, use "git annex add" to add it and then "git commit" to make it permanent. The file content is not stored in git itself, git only tracks metadata.
```command line example
git annex add Something
git commit -m "I added something"
```
Example:
```command line example
$ echo "hello there" > hello
$ ls -l hello
-rw-r--r-- 1 solene wheel 12 May 12 18:38 hello
$ git annex add hello
add hello
ok
(recording state in git...)
$ ls -l hello
lrwxr-xr-x 1 solene wheel 180 May 12 18:38 hello -> .git/annex/objects/qj/g5/SHA256E-s12--aadc1955c030f723e9d89ed9d486b4eef5b0d1c6945be0dd6b7b340d42928ec9/SHA256E-s12--aadc1955c030f723e9d89ed9d486b4eef5b0d1c6945be0dd6b7b340d42928ec9
$ git status hello
On branch master
Changes to be committed:
(use "git restore --staged <file>..." to unstage)
new file: hello
```
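The symlink target in the example above ends with a key. As far as I understand git-annex key naming, it is made of the backend name, then "s" followed by the size in bytes, then the checksum. Here is a minimal sketch that splits such a key using plain shell parameter expansion; the variable names are mine and nothing here calls git-annex.

```shell
# Key copied from the symlink target shown in the example above
key="SHA256E-s12--aadc1955c030f723e9d89ed9d486b4eef5b0d1c6945be0dd6b7b340d42928ec9"

backend=${key%%-*}   # everything before the first "-"
rest=${key#*-s}      # drop the backend and the "-s" marker
size=${rest%%--*}    # digits before the "--" separator
hash=${rest#*--}     # everything after the "--" separator

echo "backend: $backend"
echo "size:    $size bytes"
echo "hash:    $hash"
```

The size matches the 12 bytes reported by the first "ls -l hello", which is a quick way to convince yourself that the symlink really describes the file content.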
## Make changes to a file
If you want to make changes to a file, you first need to "unlock" it in git-annex, which means the symbolic link is replaced by the file itself and the file is no longer read-only. Then, after your changes, you need to add it to git-annex again and commit.
```command line example
git annex unlock file
vi file
git annex add file
git commit -m "I changed something" file
```
## Add a remote encrypted repository
If you want to store data (for duplication) on a remote server using ssh, you can use a remote of type "rsync" and encrypt the data in several ways (I find GPG in hybrid mode the best). This lets you store data on untrusted remote devices.
```command line example
git annex initremote my-remote-server type=rsync rsyncurl=remote-server.com:/home/solene/git-annex-data keyid=my-gpg@address encryption=hybrid
```
After this command, I can send files to my-remote-server.
git-annex website about
git-annex website about special
## Manage data from multiple computers (with ssh)
**This is a way to have a central git repository for many computers. It is not the best way to store data on remote servers.**
If you want to use a remote server through ssh, there are two ways: mounting the remote file system with sshfs, or using plain ssh. With sshfs, the remote behaves like a standard local file system, such as an external USB drive. With plain ssh, it's different.
You need key-based authentication for the remote ssh server, and git-annex must also be installed on the remote server. It's important to have a bare git repo. On the remote server:
```command line example
cd /home/data/
git init --bare
git annex init "remote-server"
```
On your computer:
```command line example
git remote add remote-server ssh://hostname:/home/data/
git fetch remote-server
```
You will be able to use commands related to repositories now!
## List files and where they are stored
You can use the "git annex list" command to list where your files are physically stored.
In the following example you can see which files are on my computer ("here") and which are available on my remote server called "network". "web" and "bittorrent" are special remotes.
```command line example
here
|network
||web
|||bittorrent
||||
X___ Documentation/Nim/Dominik Picheta - Nim in Action-Manning Publications (2017).pdf
X___ Documentation/ada/Ada-Distilled-24-January-2011-Ada-2005-Version.pdf
X___ Documentation/ada/courseada1.pdf
X___ Documentation/ada/courseada2.pdf
X___ Documentation/ada/courseada3.pdf
X___ Documentation/scheme/artanis.pdf
X___ Documentation/scheme/guix.pdf
X___ Documentation/scheme/manual_guix.pdf
X___ Documentation/skribilo/skribilo.pdf
X___ Documentation/uck2ep1.pdf
X___ Documentation/uck2ep2.pdf
X___ Documentation/usingckermit3e.pdf
XX__ Musique/Daft Punk/01 - Albums/1997 - Homework/01 - Daftendirekt.flac
XX__ Musique/Daft Punk/01 - Albums/1997 - Homework/02 - Wdpk 83.7 fm.flac
XX__ Musique/Daft Punk/01 - Albums/1997 - Homework/03 - Revolution 909.flac
XX__ Musique/Daft Punk/01 - Albums/1997 - Homework/04 - Da Funk.flac
XX__ Musique/Daft Punk/01 - Albums/1997 - Homework/05 - Phoenix.flac
_X__ Musique/Alan Walker/Alan Walker - Different World/01 - Alan Walker - Intro.flac
_X__ Musique/Alan Walker/Alan Walker - Different World/02 - Alan Walker, Sorana - Lost Control.flac
_X__ Musique/Alan Walker/Alan Walker - Different World/03 - Alan Walker, Julie Bergan - I Don_t Wanna Go.flac
```
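Each X or _ in the first columns corresponds to one repository, in the order of the header. As a small sketch of how to read that output, the following filters a few lines copied from the listing above to keep the files present "here" but not on "network". The variable names are mine, and the fixed column offset assumes four repositories as in my listing.

```shell
# Sample lines copied from the listing above; real input would come from "git annex list"
listing='X___ Documentation/uck2ep1.pdf
XX__ Musique/Daft Punk/01 - Albums/1997 - Homework/04 - Da Funk.flac
_X__ Musique/Alan Walker/Alan Walker - Different World/01 - Alan Walker - Intro.flac'

# Column 1 is "here", column 2 is "network".
# Keep lines with X in column 1 and _ in column 2, then strip the 4 markers and the space.
only_here=$(printf '%s\n' "$listing" | grep '^X_' | cut -c6-)
echo "$only_here"
```

git-annex can also do this selection itself with its matching options, for example "git annex find --in here --not --in network", which is more robust than parsing the listing.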
## List files locally available
If you want to list the files whose content is available locally, you can use the "list" command from git-annex restricted to "here", which represents your local repository.
```command line example
git annex list --in here
```
# Work with a remote repository
## Copy files to a remote
If you want to duplicate files between repositories to have multiple copies, you can use "git annex copy".
```command line example
git annex copy Music -t remote-server
```
## Move files to a remote
If you want to move files from one repository to another (removing the content from the origin), you can use "git annex move", which copies to the destination and then removes from the origin.
```command line example
git annex move Music -t remote-server
```
## Get a file content
If you don't have a file locally, you can fetch it from a remote to get the content.
```command line example
git annex get Music/Queen
```
## Forget a file locally
If you don't want to have the file locally because you don't have disk space or you simply don't want it, you can use the "drop" command. Note that "drop" is safe because git-annex won't allow you to drop files that have only one copy (except if you use --force of course).
```command line example
git annex drop Music/Queen
```
Real life example: I have a huge music library but my laptop SSD is too small, so I get the music I want and drop the files I don't want to listen to for a while.
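To try copy, drop and get without risking real data, here is a sandbox sketch using two repositories in a temporary directory. The names "laptop" and "server", the file Music/song.flac and the throwaway commit identity are all made up for the example, and the "server" is a local bare repository standing in for a real ssh remote. The script does nothing if git-annex is not installed.

```shell
# Sandbox sketch: everything lives in a temporary directory
if command -v git-annex >/dev/null 2>&1; then
    sandbox=$(mktemp -d)
    # Throwaway identity, in case git is not configured on this machine
    export GIT_AUTHOR_NAME=sandbox GIT_AUTHOR_EMAIL=sandbox@example.invalid
    export GIT_COMMITTER_NAME=sandbox GIT_COMMITTER_EMAIL=sandbox@example.invalid

    # The "server" is a bare repository, reached by path instead of ssh
    git init -q --bare "$sandbox/server.git"
    git -C "$sandbox/server.git" annex init "server"

    # The "laptop" holds the working tree
    git init -q "$sandbox/laptop"
    cd "$sandbox/laptop"
    git annex init "laptop"
    git remote add server "$sandbox/server.git"

    mkdir Music
    echo "pretend this is a song" > Music/song.flac
    git annex add Music
    git commit -q -m "Add a song"

    git annex copy Music --to server   # a second copy now exists on the "server"
    git annex drop Music               # should be allowed, the other copy remains
    if [ -e Music/song.flac ]; then after_drop=present; else after_drop=missing; fi

    git annex get Music                # fetch the content back
    if [ -e Music/song.flac ]; then after_get=present; else after_get=missing; fi

    echo "after drop: $after_drop, after get: $after_get"
else
    echo "git-annex is not installed, nothing to try"
fi
```

After the drop, the symlink is still there but points to nothing, which is why the "-e" test is a convenient way to see whether the content is local. Running "git annex drop" before the copy is a good experiment too: git-annex should refuse because no other copy exists.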
## Use mincopies to enforce multi repository data duplication
The numcopies and mincopies settings tell git-annex how many copies of the files you want: numcopies is the number of copies it tries to preserve, and mincopies is the minimum it must never go below. This protects you from accidental deletions and also helps uploading files to other repositories to match the requirements.
### Enable per directory recursively
```command line example
echo "* annex.mincopies=2" > .gitattributes
```
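The attribute can also be limited to one directory with a path pattern, and git itself can tell you which value applies to a given path with "git check-attr". This is a minimal sketch in a throwaway repository; the directory "documents" and both file paths are made up, and the files don't need to exist for the check.

```shell
# Throwaway repository so nothing touches real data
repo=$(mktemp -d)
git init -q "$repo" 2>/dev/null
cd "$repo"

# Require two copies only for what is below documents/ (hypothetical directory)
echo "documents/** annex.mincopies=2" >> .gitattributes

# Ask git which value applies to two made-up paths
attrs=$(git check-attr annex.mincopies -- documents/salary.pdf music/song.flac)
echo "$attrs"
```

I would expect the first path to report the value 2 and the second one to be "unspecified". Note that I used ">>" here to append, because ">" replaces an existing .gitattributes file.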
### Only upload files not matching the num copies
If you have multiple repositories and some files don't match the copies requirements, you can use the following command to only push the files missing copies.
```command line example
git annex copy --auto -t remote-server
```
Real life example: I want my salary PDFs to be really safe, so I ask for 2 copies of those and then run the command above against the remote server, which uploads them if there is only one copy of the file so far.
## Verifying integrity and requirements
The "git annex fsck" command checks the integrity of every file in the local repository and reports whether they are sane (or not). It also tells you which files don't meet the required number of copies.
```command line example
git annex fsck
```
# Reversibility
If for some reason you want to give up git-annex, you can easily get all your files back like on a normal file system by using "git annex unlock ." in the top directory of your repository: every local file will be replaced by its physical copy instead of the symlink. Reversibility is very important when you deal with your data, because it means you are not stuck forever with a tool in case it's broken or if you want to switch to another process.
# My workflow
I have a ~/DATA/ directory in which I have subdirectories {documents,documentation,pictures,videos,music,images}. Documents are papers or legal papers, documentation is mostly PDFs. Pictures are family pictures and images are wallpapers or stupid images I want to keep.
I've set mincopies to 2 for documents and pictures. My music is not on my computer but on a remote: I get the music files I want to listen to when I'm on the local network with the computer having the files, and I drop them locally when I'm bored.
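As a rough sketch of that layout, the following builds the directory tree in a temporary directory instead of the real ~/DATA, so it is safe to run, and drops a .gitattributes file in the two directories I want duplicated. It only prepares files, the git and git-annex initialization from the cheat sheet is left out.

```shell
# Build the layout in a temporary directory instead of the real ~/DATA
data=$(mktemp -d)
for dir in documents documentation pictures videos music images; do
    mkdir "$data/$dir"
done

# Two copies required for the directories I care most about
for dir in documents pictures; do
    echo "* annex.mincopies=2" > "$data/$dir/.gitattributes"
done

# Show which directories carry a copies requirement
protected=$(cd "$data" && ls -d */.gitattributes)
echo "$protected"
```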
# Conclusion
git-annex separates content from indexing. It can be used in many ways but it implies an archivist philosophy: redundancy, safety, immutability (sort of). It is not meant for backup. You can back up a directory managed by git-annex, but this only saves the data you have locally, and you still have to back up your other data as well.
I love that tool, it's a very nice piece of software. It's unique, I didn't find any other program that achieves this.
## More resources
git-annex official
git-annex special remotes (S3, webdav, bittorrent)
git-annex