drs is a small set of shell scripts that allows you to store directory revisions (snapshots if you like) remotely. Revision metadata is stored in a Git repository while the directory contents are stored on a remote host using SSH and rsync. The metadata repository can be kept small since it's completely independent of the directory contents.
It's really easy to set up, depends only on standard tools, and is easy to extend.
Where does it fit?
I needed to store large builds (>5GB) and distribute them efficiently to testers. The actual differences between builds were quite small: a few changed jars along with hundreds of other jars that rarely change. In such a case, rsync does a spectacular job of speeding things up. Git is great for keeping track of everything else: branches, build information etc.
Relation to Git
drs uses Git as a minimalistic database. Commands like drs-put and drs-get are integrated as Git aliases and organized around a producer/consumer concept. The producer is usually a build job on CI; the consumer can be a human tester or a regression test job, for example. Most workflow tasks (except git init and tag) are covered by drs commands, so users don't have to know much Git. For more details see Differences to Git
- drs - an uncomplicated directory revision storage
🎉 For a complete dockerized example see drs demo
It's fully functional, you can play with the put and get commands.
- Clone this repository to a suitable directory on your computer
- Add this directory plus src to the DRS_HOME environment variable
export DRS_HOME=~/drs/src
- Download drs.tar.gz from the latest release
curl -o drs.tar.gz -L https://github.com/bvarnai/drs/releases/latest/download/drs.tar.gz
- Extract archive (to a directory of your choosing)
tar -zxvf drs.tar.gz
- Add this directory to the DRS_HOME environment variable
export DRS_HOME=~/drs
📝 You can set DRS_HOME in ~/.profile or ~/.bashrc to make it permanent
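A minimal sketch of making the setting permanent, assuming Bash and an install path such as ~/drs. The helper name persist_drs_home is made up for illustration and is not part of drs; it appends the export line only if that exact line is not already in the file.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of drs: append an export line for DRS_HOME
# to a shell startup file unless the exact line is already there.
persist_drs_home() {
  local drs_home="$1"                  # e.g. "$HOME/drs"
  local rc_file="${2:-$HOME/.bashrc}"  # startup file to modify
  local line="export DRS_HOME=\"${drs_home}\""

  touch "${rc_file}"
  # -x: match the whole line, -F: fixed string, -q: no output
  if ! grep -qxF "${line}" "${rc_file}"; then
    printf '%s\n' "${line}" >> "${rc_file}"
  fi
}
```

Calling persist_drs_home "$HOME/drs" and then opening a new shell (or sourcing ~/.bashrc) should make DRS_HOME available in every session.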
sudo apt install openssh-client git rsync uuid-runtime jq
Install client prerequisites on Git for Windows (Git-Bash/MinGW/MSYS2)
Unfortunately Git-Bash doesn't have a default package manager, so installing additional tools is a manual process.
Git-Bash leverages MSYS2 and ships with a subset of its files. To go deeper on MSYS2 architecture see Environment
The good news is that there are pre-compiled packages available: you just have to download and extract the archives and add them to your existing Git-Bash installation.
Tested with Git for Windows versions:
- 2.43.0
- 2.41.0
To extract the archives you need the zstd tool, which needs to be installed first.
Make sure your Git-Bash installation directory is correct.
mkdir tmp
cd tmp
curl -L https://github.com/facebook/zstd/releases/download/v1.5.5/zstd-v1.5.5-win64.zip -o zstd-v1.5.5-win64.zip
unzip zstd-v1.5.5-win64.zip
gsudo cp zstd-v1.5.5-win64/zstd.exe 'C:\Program Files\Git\usr\bin'
The last cp command requires elevation. If you don't have gsudo installed, then copy zstd-v1.5.5-win64/zstd.exe to the C:\Program Files\Git\usr\bin directory manually.
Once zstd is working, download the following packages:
- libxxhash-0.8.1-1-x86_64.pkg.tar.zst
- xxhash-0.8.1-1-x86_64.pkg.tar.zst
- libzstd-1.5.5-1-x86_64.pkg.tar.zst
- liblz4-1.9.4-1-x86_64.pkg.tar.zst
- libopenssl-3.2.0-1-x86_64.pkg.tar.zst
- rsync-3.2.7-2-x86_64.pkg.tar.zst
- util-linux-2.35.2-1-x86_64.pkg.tar.zst
You can use the get-git-bash-packages.sh script to automate this step. Run it from the tmp directory.
cd tmp
. $DRS_HOME/get-git-bash-packages.sh
gsudo cp -r usr/ 'C:\Program Files\Git'
The last cp command requires elevation. If you don't have gsudo installed, then copy usr/ to the C:\Program Files\Git directory manually.
💡 If you don't want to pollute your vanilla Git-Bash installation, move these packages to any directory and add it to the PATH variable.
There is a script, check-client-prerequisites.sh, to check if your installation is ready:
$DRS_HOME/check-client-prerequisites.sh
It should print all OK.
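If the check reports a problem, a quick way to see which tool is missing is to probe PATH directly. The sketch below is not the check-client-prerequisites.sh script and may check different things; the function name missing_tools is made up, and the tool list is taken from the apt install line (uuidgen is assumed to be the binary shipped by uuid-runtime).

```shell
#!/usr/bin/env bash
# Illustrative only: this is not the check-client-prerequisites.sh script.
# Prints the names of the given tools that are not found on PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    if ! command -v "${tool}" > /dev/null 2>&1; then
      printf '%s\n' "${tool}"
    fi
  done
}

# Tool list assumed from the apt install line above.
missing="$(missing_tools ssh git rsync uuidgen jq)"
if [[ -z "${missing}" ]]; then
  echo "all OK"
else
  echo "missing tools:"
  printf '%s\n' "${missing}"
fi
```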
You will need an SSH server, pick your own favorite. For basic setup instructions see SSH server setup or check out the demo server Dockerfile
drs uses ssh to connect to the remote host. The SSH configuration should be added to the ~/.ssh/config file. This must be done on every client.
Host <drs-host-name>
HostName <drs-real-host-name>
User <drs-user>
IdentityFile <drs-user-key>
IdentitiesOnly yes
Port <drs-server-port>
ForwardX11 no
Ciphers ^[email protected],[email protected],aes256-ctr,aes192-ctr,[email protected],aes128-ctr
- <drs-host-name>: a name used to identify the host. I recommend using something simple like drs-server, which allows you to change the real host name without changing the configuration in the repository
- <drs-user-key>: ssh private key
- <drs-real-host-name>: the real host name, for example drs.mycompany.com
- <drs-user>: ssh user name to log in with
- <drs-server-port>: ssh port of the host
💡 The cipher list is optional; it is based on the post Benchmark SSH Ciphers
An example configuration
Host drs-server
HostName drs.mycompany.com
IdentityFile id_rsa
IdentitiesOnly yes
User drs
Port 2222
ForwardX11 no
Ciphers ^[email protected],[email protected],aes256-ctr,aes192-ctr,[email protected],aes128-ctr
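Since every client needs the same entry, it can help to generate it instead of typing it. This is a minimal sketch under stated assumptions: the function name print_drs_host_stanza is made up, the key path is only an example, and the optional Ciphers line is left out.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of drs: prints a Host stanza for ~/.ssh/config.
# Arguments: alias, real host name, user, private key path, port (default 22).
print_drs_host_stanza() {
  local host_alias="$1" real_host="$2" user="$3" key="$4" port="${5:-22}"
  cat <<EOF
Host ${host_alias}
  HostName ${real_host}
  User ${user}
  IdentityFile ${key}
  IdentitiesOnly yes
  Port ${port}
  ForwardX11 no
EOF
}

# Review the output before appending it to ~/.ssh/config.
print_drs_host_stanza drs-server drs.mycompany.com drs ~/.ssh/id_rsa 2222
```

Once the entry is in place, running ssh drs-server should log you in without any further options, which is a reasonable smoke test before using drs.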
📝 Note that SSH configuration is an extensive topic with endless options to choose from. You can find out more about the options in How to Use The SSH Config File
💡 If you are working in a secure, trusted environment, for example a company intranet, you can use a shared user for drs. It greatly simplifies client setup.
If you don't have an SSH server, please follow the guide Initial Server Setup
If you don't have SSH keys, please follow the guide How to Set Up SSH Keys
This section explains how to set up the drs metadata repository, which is nothing more than a normal Git repository.
- Create an empty Git repository (or use an existing one)
mkdir myrepo
cd myrepo
git init
- Copy the configuration template file from $DRS_HOME/drs.json
- Add your project directory ("name" property in drs.json) to your .gitignore file. It's myproject in the template
- Install Git aliases
. $DRS_HOME/install.sh
- Add and commit configuration
git add .
git commit -m "Add initial drs configuration"
- Set remote
git remote add origin [email protected]
- Push
git push -u origin master
The configuration file is called drs.json and it's located in the root of the metadata repository.
{
"name": "<project-name>",
"defaultBranch": "<default-branch>",
"remote": {
"host": "<drs-host-name>",
"directory": "<remote-directory>",
"rsyncOptions": {
"get":"<rsync-options>",
"put":"<rsync-options>"
}
}
}
- name - project name, defines the project directory on the remote as $remote.directory/$name and the working directory locally as $name
- defaultBranch - commands will fall back to this default branch if nothing is specified
- remote - configuration section for the remote
  - host - host name as specified in ~/.ssh/config (see drs-host-name)
  - directory - base directory on the remote
  - rsyncOptions - configuration section for rsync
    - get - options passed to rsync for the get command
    - put - options passed to rsync for the put command
directory will expand on the client side, so using an absolute path is highly recommended
For all available rsync options see rsync docs. The following rsync options are added implicitly:
- -v, --info=progress2 and --itemize-changes if -v|--verbose is set
- --quiet if -q|--quiet is set
rsyncOptions
by default
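The way configured and implicit rsync options combine can be sketched as follows. This is not taken from the drs sources: the function name build_rsync_options and its true/false arguments are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Sketch only, not taken from the drs sources: combine the configured
# rsyncOptions string with the implicitly added flags described above.
build_rsync_options() {
  local configured="$1"   # value of rsyncOptions.get or rsyncOptions.put
  local verbose="$2"      # "true" if -v|--verbose was given
  local quiet="$3"        # "true" if -q|--quiet was given
  local -a options=()

  # The configured string is split on whitespace on purpose.
  read -r -a options <<< "${configured}"

  if [[ "${verbose}" == "true" ]]; then
    options+=(-v --info=progress2 --itemize-changes)
  fi
  if [[ "${quiet}" == "true" ]]; then
    options+=(--quiet)
  fi

  # Print nothing at all when there are no options.
  if (( ${#options[@]} > 0 )); then
    printf '%s\n' "${options[*]}"
  fi
}

build_rsync_options "-az --delete-during --stats" true false
# prints: -az --delete-during --stats -v --info=progress2 --itemize-changes
```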
Example configuration
{
"name": "myproject",
"defaultBranch": "main",
"remote": {
"host": "drs-server",
"directory": "/var/drs",
"rsyncOptions": {
"get":"-az --delete-during --stats",
"put":"-az --delete-during --whole-file --stats"
}
}
}
This will store data on drs-server in the /var/drs/myproject directory.
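To see how the remote location follows from the configuration, the values can be read with jq, which is already a client prerequisite. The helper names below are made up and not part of drs; they only combine the name, remote.host and remote.directory properties described above.

```shell
#!/usr/bin/env bash
# Illustrative helpers, not part of drs. Both take the path to a drs.json file.

# Prints the project directory on the remote: <remote.directory>/<name>
remote_project_dir() {
  jq -r '"\(.remote.directory)/\(.name)"' "$1"
}

# Prints "<remote.host>:<remote.directory>/<name>", the form rsync and scp
# use for a location on an SSH host.
remote_location() {
  jq -r '"\(.remote.host):\(.remote.directory)/\(.name)"' "$1"
}
```

With the example configuration, remote_project_dir drs.json would print /var/drs/myproject and remote_location drs.json would print drs-server:/var/drs/myproject.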
📝 For my projects, the repository is called myapp-builds and the working directory is called myapp, which gives a myapp-builds/myapp local directory.
But there is nothing wrong with having a myapp/myapp structure.
The actual contents/files are not stored in the drs metadata repository, but there is a dedicated directory called the working directory (a working copy if you please). For convenience this is placed under a sub directory in drs repository and it's ignored by Git.
Example structure
myrepo
  myproject
  .gitignore
  drs.json
- myproject is your working directory
- .gitignore contains the myproject entry
Otherwise there is no limitation on what you put in the metadata repository. For example you can store build information, logs, anything really. I like to think of it as where you keep your complete build history. It should provide enough information to reproduce a specific build.
Hooks are shell scripts to allow project specific extensions. They are committed to the metadata repository with a predefined name and function to implement.
- drs-info-hook.sh is called by the info command. It can be used to print out user-friendly information such as links to Jenkins builds, source references etc.
function info_hook()
{
  # your hook implementation
  :
}
- drs-put-hook.sh is called by the put command before commit. It can be used to collect all necessary information about a revision (a build), which can then be used by the info command, for example.
function put_hook()
{
  # your hook implementation
  :
}
Suppose you have a Jenkins job that produces your builds. drs-put-hook.sh will dump env to a file called env.json, which is then committed and pushed to the metadata repository.
drs-put-hook.sh
function put_hook()
{
jq -n env > env.json
}
Clients consuming these builds can use info to get valuable information.
drs-info-hook.sh
function info_hook()
{
  # keep hook variables out of the caller's scope
  local change_branch branch build_url job_url
  local pr=""

  # jq prints "null" when CHANGE_BRANCH is absent, i.e. not a pull request build
  change_branch=$(jq -r '.CHANGE_BRANCH' env.json)
  if [[ "${change_branch}" != "null" ]]; then
    branch="${change_branch}"
    pr="true"
  else
    branch=$(jq -r '.BRANCH_NAME' env.json)
  fi
  echo "branch: ${branch}"

  # extra details for pull request builds
  if [[ -n "${pr}" ]]; then
    echo "PR: $(jq -r '.BRANCH_NAME' env.json)"
    echo "PR link: $(jq -r '.CHANGE_URL' env.json)"
  fi

  build_url=$(jq -r '.BUILD_URL' env.json)
  echo "build link: ${build_url}"
  job_url=$(jq -r '.JOB_URL' env.json)
  echo "job link: ${job_url}"
}
📝 Jenkins adds many environment variables to builds implicitly. The actual availability depends on your job setup.
- Make sure you pushed your configuration files drs.json and .gitignore
- Copy your initial content to the working directory
- Put your directory to remote
git drs-put
# create a new branch (based on the source branch)
git drs-create myFeature
# put new build artifacts to remote
git drs-put
# select the branch you need a build from
git drs-select myFeature
# update to the latest available build
git drs-update
# get the build
git drs-get
Command syntax is the following:
git drs-<command> [options] [arguments]
Optional elements are shown in brackets [ ]. For example, many commands take a branch name as an argument.
To get some information about a command and a link to its reference documentation, use the command with help:
git drs-<command> help
💡 You can also use commands without the Git alias; this is recommended for scripts. Call the script named after the command:
$DRS_HOME/<command>.sh
The commit message is not very informative. To get more user-friendly information use info:
git drs-info
The info command implementation is project specific, see section Hooks
To get the current branch name use name:
git drs-name
To select and switch to an existing branch use select:
git drs-select [<branch>|<tag>|<uuid>]
Arguments:
- branch, tag - the branch or tag to select; if not specified, the defaultBranch property will be used (optional)
- uuid - the uuid to select; alternatively this searches the log for a specific uuid (optional)
📝 uuid based selection is useful to identify builds. For example, Jenkins can post the uuid for each build and users can use this directly
To get to the latest revision use update:
git drs-update
📝 If you are in detached HEAD state (not on any branch), update will fail. You need to select a branch, then update it
To get the directory revision specified by the current commit use get. The working directory content will be synchronized with this revision.
git drs-get [-v,--verbose] [-q,--quiet] [--stats] [--latest] [<target_directory>]
Options:
- verbose - sets rsync verbose mode (optional)
- quiet - sets rsync quiet mode (optional)
- stats - enables rsync statistics (optional)
- latest - combines update and get to get the latest version
Arguments:
- target_directory - the directory to get content to; if not specified, the name property will be used (optional)
💡 Usually you are only interested in the latest version, this can be done with a one-liner:
git drs-get --latest
To create a new branch use create:
git drs-create <branch>
Arguments:
- branch - the branch to create (mandatory)
To put a revision to the remote host use put:
git drs-put [-v,--verbose] [--no-sequence-check] [-s,--sequence <sequence_number>] [<source_directory>]
Options:
- verbose - sets rsync verbose mode (optional)
- quiet - sets rsync quiet mode (optional)
- stats - enables rsync statistics (optional)
- no-sequence-check - disables sequence number checking
- sequence_number - the sequence number, must be a comparable decimal (optional)
Arguments:
- source_directory - the directory to put content from (optional)
Simple Jenkins example for using --sequence
$DRS_HOME/create.sh $BRANCH_NAME
$DRS_HOME/update.sh
$DRS_HOME/put.sh --sequence $BUILD_ID my_build_dir
📝 BRANCH_NAME and BUILD_ID are Jenkins job variables
source_directory allows you to use a source directory, eliminating the need to stage (copy) content to the working directory
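The idea behind sequence checking can be illustrated with a small comparison. This is a sketch, not the drs implementation: the function name is_newer_sequence is made up, and it assumes a put is accepted only when the new sequence number is strictly greater than the last recorded one.

```shell
#!/usr/bin/env bash
# Sketch of a sequence check, not the drs implementation. Assumption: a put
# is accepted only when the new sequence number is strictly greater than the
# last recorded one.
is_newer_sequence() {
  local new="$1" last="$2"

  # Accept plain non-negative decimals only.
  if [[ ! "${new}" =~ ^[0-9]+$ || ! "${last}" =~ ^[0-9]+$ ]]; then
    echo "sequence numbers must be decimal" >&2
    return 2
  fi
  # 10# forces base 10 so a leading zero is not read as octal.
  (( 10#${new} > 10#${last} ))
}

if is_newer_sequence 1622824489 1622824400; then
  echo "accept"
else
  echo "reject: outdated build"
fi
```

A monotonically increasing value such as the Jenkins BUILD_ID or a Unix timestamp fits this kind of check, because a slow build that finishes late carries a smaller number than the build that overtook it.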
Since drs uses Git more like a database, not all Git concepts apply. Collaboration in particular is completely different in a drs metadata repository.
- Origin has precedence
  To keep the workflow simple and robust, origin has precedence. Commands will force you to be up-to-date with origin and drs-put will implicitly try to push the new revision. This ensures that, whatever happens, users can fall back to the last known public version. Origin is the single source of truth, which is much less error-prone in a single producer, multiple consumer context.
- No merging
  Revisions are not stored in Git, they are simple directories somewhere. As you cannot merge a directory on a filesystem, you cannot merge in drs either.
- Commit message format
  The commit message has a strict format. You should not create commit messages manually.
📝 No merging implies that branches are not merged. They are created, then deleted when no longer needed. It's possible to keep all branches if you want to keep all history.
Deleting revisions is done by deleting directories on the remote host. drs will try to locate a revision; if it is not found, it's assumed to be deleted. This is part of the normal workflow and will not be treated as an error. To implement a simple retention policy, you can set up a cron job or Jenkins job to delete directories older than 2 weeks, for example.
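A retention job along those lines could look like the sketch below. It is not part of drs, the function names are made up, and it assumes that every revision is a direct subdirectory of the project directory on the remote host; check your own layout before deleting anything.

```shell
#!/usr/bin/env bash
# Sketch of a retention job, not part of drs. Assumption: every revision is a
# direct subdirectory of the project directory on the remote host.

# Lists revision directories whose modification time is older than <days> days
# (as judged by find's -mtime test).
list_expired_revisions() {
  local project_dir="$1" days="${2:-14}"
  find "${project_dir}" -mindepth 1 -maxdepth 1 -type d -mtime "+${days}" -print
}

# Deletes them. Run list_expired_revisions first to review what would go.
delete_expired_revisions() {
  local project_dir="$1" days="${2:-14}"
  find "${project_dir}" -mindepth 1 -maxdepth 1 -type d -mtime "+${days}" \
    -exec rm -rf -- {} +
}
```

Run on the server from cron or a Jenkins job, for example as delete_expired_revisions /var/drs/myproject 14, this would keep roughly the last two weeks of revisions.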
Git was a convenient choice to make something distributed and transactional. Directory metadata is published as a Git commit message in json format. 😰 ugh, you might say, and you are probably right. I abused the commit message, but in a good way, embracing the tremendous flexibility Git offers. I didn't use Git notes because I don't have anything to annotate, I just want to record something.
So a typical drs commit message looks like this:
{"uuid":"c1ca82b1-7f34-4f4c-9a76-05e3297b2a23","seq":"1622824489"}
The uuid is used to identify the directory on the remote host. The sequence number helps to drop outdated builds.
rsync is a great tool when you have small deltas to deal with. Initially I wanted to use a "trendy" S3 (minIO for example) based solution, but I realized not much is gained there. I think for a small development team, these just add unnecessary overhead.
Obviously this is a very subjective topic. I wanted to rely on external tools and keep it as simple as possible. No advanced logic and the seamless integration with Git aliases pushed me in the direction of using shell only.
I used Google's Shell Style Guide with the help of ShellCheck