HOW I BACK UP

2023-04-27
If you are a system administrator, backups are one of the most crucial aspects of your job. It's not enough to just create them: you also have to store them properly and test their effectiveness, i.e., check that the recovery process can be completed without any issues. Furthermore, you have to consider the scalability of your disaster recovery plan, i.e., how you will back up your whole infrastructure as it grows.

There are many ways to accomplish this. A backup can be anything from an external hard drive where you manually copy files one by one to a fully automated orchestration service that provides disaster recovery through cloud computing (Disaster Recovery as a Service, DRaaS).

RAID is not enough!

Some people also consider snapshots, volume management technologies (such as LVM or ZFS) or data mirroring (such as RAID) to be reliable backup solutions. Let's be clear about this: they are not. RAID cannot protect you against human error (e.g., accidental file deletion) or against viruses; in fact, it will efficiently propagate those changes across every disk of the array. RAID is good for hardware failures, i.e., when one of your disks breaks down. But if you want to be able to recover a file that has been accidentally deleted, you need a backup.

How to make a backup

A proper backup plan should consider, at least, the following three aspects:
  1. The 3-2-1 strategy;
  2. Off-SAN copies;
  3. Periodic recovery testing.
The first point is, in my opinion, the most important. The 3-2-1 backup strategy involves three copies of your data: the first one is the original file, the second one is a copy of the file on an external, onsite medium, and the third one is another copy on a different external medium stored at another location (offsite). Hence the name: three copies, on two different media, one of which is offsite. Because the copies do not share a single point of failure, this strategy prepares you for both natural and human-made disasters: even a fire that destroys the whole site leaves the offsite copy intact.
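
The copy side of that strategy fits in a few lines of shell. What follows is a minimal, hypothetical sketch: the function name `replicate_3_2_1` and the mount points are illustrative placeholders, not part of any tool discussed here. It assumes the onsite medium and the offsite location are both reachable as mounted directories, for example an external disk and a network share.

```shell
# Hypothetical helper (not part of backup.sh): copy one backup file to an
# onsite and an offsite directory, then confirm each copy is byte-identical.
# Assumes both destinations are already mounted and writable.
# Usage: replicate_3_2_1 ARCHIVE ONSITE_DIR OFFSITE_DIR
replicate_3_2_1() {
    archive=$1   # copy no. 1: the original backup file
    onsite=$2    # copy no. 2: external medium kept on site
    offsite=$3   # copy no. 3: medium stored at another location

    [ -f "$archive" ] || { echo "missing archive: $archive" >&2; return 1; }

    for dest in "$onsite" "$offsite"; do
        cp "$archive" "$dest/" || return 1
        # cmp -s is silent and exits non-zero if the two files differ
        cmp -s "$archive" "$dest/$(basename "$archive")" || {
            echo "copy in $dest does not match the original" >&2
            return 1
        }
    done
}

# Example (placeholder paths):
# replicate_3_2_1 /home/marco/backup-myhost-20230427.tar.gz.enc /mnt/external /mnt/offsite
```

Comparing each copy against the original before returning catches a truncated or failed write at copy time rather than at restore time.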

How I back up my machines

On UNIX and Windows there are many programs for making backups, both free and paid. In this article, however, I want to show you my personal backup solution. Before giving you the name of the software I wrote, let me digress a little about the requirements I had for my backup process.

My machines are all small-scale virtual private servers (VPSs); I don't have a large infrastructure with TBs of data. Therefore, I was looking for a backup solution that could back up single directories instead of entire operating systems. I wanted something very small and UNIX-compatible that would allow me to run backups from a cron job. Furthermore, I needed something that would encrypt the final backup using a secure encryption algorithm (like AES).

The first prototype of this backup software was a simple shell script that copied each file using a combination of rsync(1) and gpg(1). At first it worked fine, but it had a major problem: the source code was tightly coupled with the data (the files and folders to back up), meaning that to add a new folder or file to the backup, I had to modify the source code of the program. It was very inconvenient to use, even for a few files. So I decided to build a new backup program from scratch: a program where I could dynamically specify the paths to back up using a configuration file, and that could back up, compress, encrypt, decrypt and verify a backup, all from a clean command-line interface. A program where the source code was decoupled from the user data. That's how backup.sh was born.

backup.sh

backup.sh is a POSIX-compliant, modular and lightweight backup utility to save and encrypt your data. It is intended to be used on relatively small UNIX systems, such as VPSs, personal servers or workstations, to back up single directories. backup.sh uses a combination of rsync(1), tar(1), gpg(1) and sha256sum(1) to copy, compress, encrypt and checksum your data.
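
The sketch below shows how those four tools can be chained, with each one handling a single step. It is not backup.sh's source code. The function name `pipeline_sketch`, the temporary staging directory and the exact gpg options are assumptions, chosen only to illustrate what each tool contributes. It targets GnuPG 2.1 or later, because `--pinentry-mode loopback` does not exist in GnuPG 1.x.

```shell
# Hypothetical sketch, NOT the actual backup.sh implementation:
# copy -> compress -> encrypt -> hash, one tool per step.
# Usage: pipeline_sketch SRC_DIR LABEL DEST_DIR PASSWORD
pipeline_sketch() {
    src=$1; label=$2; dest=$3; pass=$4
    stamp=$(date +%Y%m%d)
    name="backup-$label-$stamp"
    stage=$(mktemp -d) || return 1     # no cleanup on failure, for brevity

    # 1. copy: rsync -a preserves permissions, ownership and timestamps
    rsync -a "$src/" "$stage/$name/" || return 1
    # 2. compress: gzip-compressed tarball of the staged directory
    tar -czf "$dest/$name.tar.gz" -C "$stage" "$name" || return 1
    # 3. encrypt: symmetric AES-256; the passphrase arrives on stdin (fd 0),
    #    which keeps it off gpg's own command line
    printf '%s' "$pass" | gpg --batch --yes --pinentry-mode loopback \
        --passphrase-fd 0 --symmetric --cipher-algo AES256 \
        --output "$dest/$name.tar.gz.enc" "$dest/$name.tar.gz" || return 1
    # 4. hash: print the SHA-256 digest of the encrypted archive
    sha256sum "$dest/$name.tar.gz.enc"

    # keep only the encrypted archive
    rm -rf "$stage" "$dest/$name.tar.gz"
}
```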

This backup utility works on pretty much any UNIX operating system. I personally use it on both GNU/Linux and FreeBSD servers, and I also know a couple of people who use it on macOS without any issues.

To define the backup sources, backup.sh reads a simple text file that maps labels to paths (essentially an associative array). The syntax to specify a new backup entry is the following:

<LABEL>=<PATH>

where <LABEL> is the name of the backup and <PATH> is its path. For example, if you want to back up your nginx and ssh config files, you should create the following entries inside the configuration file:

nginx=/etc/nginx/
ssh=/etc/ssh/

backup.sh will create two folders inside the backup archive with the following names:

backup-nginx-<YYYYMMDD>
backup-ssh-<YYYYMMDD>
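
POSIX sh has no associative arrays, so a file like this is typically read line by line and split at the `=`. The function below is a hypothetical reader, not backup.sh's own parser. Skipping blank lines and `#` comments is an assumption (the format above does not mention comments). For each entry, it prints the folder name that entry would get.

```shell
# Hypothetical reader for a sources file of <LABEL>=<PATH> lines.
# Prints "backup-<LABEL>-<YYYYMMDD> <PATH>" for each entry.
# Usage: list_sources SOURCES_FILE [YYYYMMDD]   (date defaults to today)
list_sources() {
    file=$1
    stamp=${2:-$(date +%Y%m%d)}
    # IFS='=' makes read split at '='; the label is the text before it.
    # "|| [ -n "$label" ]" also handles a last line without a newline.
    while IFS='=' read -r label src_path || [ -n "$label" ]; do
        case $label in
            ''|'#'*) continue ;;   # skip blank lines and comments
        esac
        printf 'backup-%s-%s %s\n' "$label" "$stamp" "$src_path"
    done < "$file"
}
```

For example, `list_sources sources.bk 20230427` run against the two entries above prints `backup-nginx-20230427 /etc/nginx/` and `backup-ssh-20230427 /etc/ssh/`.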

After that, you can start the backup using the following command:

$> sudo ./backup.sh --backup <SOURCES_FILE> <DEST> <ENCRYPTION_PASSWORD>

where <SOURCES_FILE> is the sources file, <DEST> is the absolute path of the output directory (without a trailing slash) and <ENCRYPTION_PASSWORD> is the password used to encrypt the compressed archive.
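
The constraints on <DEST> (an absolute path, with no trailing slash) are easy to get wrong, especially when the command is called from a cron job. The following is a hypothetical pre-flight check written for this article, not taken from backup.sh. It validates the three arguments and normalises <DEST> before anything is copied.

```shell
# Hypothetical pre-flight check for: --backup <SOURCES_FILE> <DEST> <PASSWORD>
# On success prints the normalised DEST; otherwise prints an error, returns 1.
check_backup_args() {
    [ "$#" -eq 3 ] || { echo "usage: --backup SOURCES_FILE DEST PASSWORD" >&2; return 1; }
    sources=$1; dest=$2; pass=$3

    [ -r "$sources" ] || { echo "cannot read sources file: $sources" >&2; return 1; }
    case $dest in
        /*) ;;                                    # absolute path: accepted
        *) echo "DEST must be an absolute path: $dest" >&2; return 1 ;;
    esac
    # strip trailing slashes, but leave "/" itself alone
    while [ "$dest" != "/" ] && [ "${dest%/}" != "$dest" ]; do
        dest=${dest%/}
    done
    [ -n "$pass" ] || { echo "empty encryption password" >&2; return 1; }
    printf '%s\n' "$dest"
}
```

Stripping the slash instead of rejecting the path is a design choice. Rejecting it would be equally valid, since the documented contract simply says "without trailing slashes".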

For example:

$> sudo ./backup.sh --backup sources.bk /home/marco "qwerty1234"

backup.sh will begin to copy the files defined in the sources file:

Copying nginx(1/2)
Copying ssh(2/2)
Compressing backup...
Encrypting backup...
File name: /home/marco/backup-<HOSTNAME>-<YYYYMMDD>.tar.gz.enc
File size: 7336400696(6.9G)
File hash: 0e75ca393117f389d9e8edfea7106d98
Elapsed time: 259 seconds.

This will create a new encrypted backup in /home/marco/backup-<HOSTNAME>-<YYYYMMDD>.tar.gz.enc.
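
The name and the summary lines above can be reproduced with standard tools. The sketch below is hypothetical and is not backup.sh's code. It assumes the hostname comes from `uname -n` and the size in bytes from `wc -c`. The human-readable size and the hash line are omitted.

```shell
# Hypothetical helpers mirroring the summary printed after a backup.

# Build "<DEST>/backup-<HOSTNAME>-<YYYYMMDD>.tar.gz.enc"
archive_name() {
    printf '%s/backup-%s-%s.tar.gz.enc\n' "$1" "$(uname -n)" "$(date +%Y%m%d)"
}

# Print name, size in bytes and seconds elapsed since START (epoch seconds).
# date +%s is not POSIX, but GNU, BSD and macOS date all support it.
print_summary() {
    file=$1; start=$2
    size=$(wc -c < "$file" | tr -d ' ')   # BSD wc pads the number with spaces
    echo "File name: $file"
    echo "File size: $size"
    echo "Elapsed time: $(( $(date +%s) - start )) seconds."
}

# Example:
# start=$(date +%s)
# ... create the archive ...
# print_summary "$(archive_name /home/marco)" "$start"
```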

On top of that, you can also ask backup.sh to generate a checksum file for your backup. To do that, simply add the --checksum flag before the --backup or --extract command. backup.sh will then generate a checksum file at /home/marco/backup-<HOSTNAME>-<YYYYMMDD>.sha256, which can later be used to verify the integrity of the backup. Let's see how.

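For reference, a checksum file of this kind is typically just sha256sum(1) output: one `<digest>  <path>` line per file. The hypothetical sketch below is not backup.sh's code, and the function name is illustrative. It produces such a database for every regular file under a directory.

```shell
# Hypothetical sketch: write one "<sha256>  <path>" line per regular file
# under DIR, in the format that "sha256sum -c" can read back.
# Keep OUTPUT_FILE outside DIR, otherwise find may hash it while it is written.
# Usage: make_checksums DIR OUTPUT_FILE
make_checksums() {
    dir=$1; out=$2
    # -type f: regular files only; "{} +" passes many files per sha256sum call
    find "$dir" -type f -exec sha256sum {} + > "$out"
}
```
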
To tell the program that you want to verify the integrity of the files before restoring the backup, use the --checksum flag again, followed by the checksum file:

$> ./backup.sh --checksum --extract <ENCRYPTED_ARCHIVE> <ARCHIVE_PASSWORD> <CHECKSUM_FILE>

For instance:

$> ./backup.sh --checksum --extract backup-<HOSTNAME>-<YYYYMMDD>.tar.gz.enc "qwerty1234" backup-<HOSTNAME>-<YYYYMMDD>.sha256

The program will decrypt the backup archive and then verify the integrity of each file using the provided checksum database. If the integrity check fails, it will abort the restore process with an error message like the following:

[FATAL] - integrity error for 'backup.sh.tmp/backup-nginx-YYYYMMDD/modules-enabled/50-mod-http-geoip2.conf'.

Otherwise, it will create a new folder called backup.sh.tmp in your current directory. Inside it, you will find the following folders:

backup-nginx-<YYYYMMDD>
backup-ssh-<YYYYMMDD>
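
The verify-then-restore behaviour can be sketched as a loop over the checksum database that stops at the first mismatch. This is a hypothetical illustration. Only the [FATAL] message format comes from the output above; the function name is illustrative. The plain-text parsing assumes file names contain no newlines or backslashes, since sha256sum escapes those.

```shell
# Hypothetical sketch: check every "<sha256>  <path>" line of CHECKSUM_FILE
# and abort on the first file whose digest does not match.
# Usage: verify_checksums CHECKSUM_FILE
verify_checksums() {
    while read -r expected file; do
        [ -n "$expected" ] || continue        # skip blank lines
        file=${file#\*}                       # drop sha256sum's binary-mode '*'
        # a missing file yields an empty digest, which is reported as a mismatch
        actual=$(sha256sum < "$file" | cut -d ' ' -f 1)
        if [ "$actual" != "$expected" ]; then
            echo "[FATAL] - integrity error for '$file'." >&2
            return 1
        fi
    done < "$1"
}
```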

Privacy concerns

backup.sh uses standard encryption (AES-256) to secure your backup files. However, this tool is not intended to provide plausible deniability: many of the copying, compressing and encrypting operations performed by backup.sh during the backup process can be used to defeat plausible deniability.

In particular, you should be aware that the checksum option (--checksum) generates an UNENCRYPTED checksum file containing the digests of EVERY file in your backup archive. If your files are known to your adversary, they do not even need a rainbow table: hashing their own copy of a file and searching for that digest in your checksum file is enough to determine whether you own it, voiding your plausible deniability. Use this option at your own risk.
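
The lookup needs nothing more than the plaintext .sha256 file and a copy of the file the adversary is looking for. The minimal sketch below illustrates it. The function name `contains_known_file` is hypothetical, and no decryption of the backup is involved at any point.

```shell
# Hypothetical sketch of the attack described above: given a leaked,
# unencrypted checksum file and a file the adversary already has, decide
# whether that exact file is part of the backup.
# Usage: contains_known_file CHECKSUM_FILE KNOWN_FILE
contains_known_file() {
    digest=$(sha256sum < "$2" | cut -d ' ' -f 1)
    # first column = digests; -x matches whole lines, -F treats input as text
    cut -d ' ' -f 1 "$1" | grep -qxF "$digest"
}
```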

Conclusions

As you can see, backup.sh is not difficult to use. It is very lightweight, and it can be helpful for all those backup strategies where you just want to back up individual files or directories. If this brief introduction has piqued your curiosity and you would like to know more about this project, feel free to visit this repository.