Backing up the Ubuntu system

This is a great backup system, especially since it can handle large files easily.

https://git-annex.branchable.com/

Sorry @paperdigits, I didn’t mean to be offensive; I just wanted to contribute a solution that I found to work reasonably well. For me it’s simple: one solution gives a long list of all installed packages, the other gives a short list of the packages that I selected for installation but misses the ones that come preinstalled with the distro. Either can be the wrong choice, so maybe one should keep both lists. Especially when you decide to switch to a new version of the distro after replacing your broken hard disk (happens more often than one might think; for me, a full hard disk comes close to broken :wink:), it’s better to drop dependencies, since they can cause a lot of trouble.

Seems I am the bad guy today, but the very first words on the “what git-annex is not” page state that it is not a backup system. Unfortunately, the site does not explain why. Besides this, I use git-annex for some applications, though neither for system backup nor for photo management; I intend to do the latter soon (unfortunately, time is a scarce resource these days).

Most certainly no offense taken, @chris, nor was your statement offensive to me.

“What comes preinstalled” might be easier to define on Ubuntu, but on Debian I always start from the netinstall ISO, get the most minimal base system (no Xorg, nothing selected in tasksel when installing), then build my system up exactly the way I want it. So getting every package is OK for me, since my “base” install is really basic :wink: I probably wouldn’t use the get-selections file on a new operating system version; I’d go through the list by hand if that were the case. However, on Debian one can pretty safely dist-upgrade from one release to the next, so I wouldn’t wipe my machine and reinstall in order to upgrade to a new version anyway, and using get-selections for that use case just doesn’t happen on Debian.
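
Keeping both lists, as suggested above, could look roughly like the sketch below. It assumes dpkg --get-selections for the long list and apt-mark showmanual for the short one; this part of the thread does not name the short-list tool, and what apt-mark reports as manual depends on how the system was installed. The function name and file names are made up for illustration.

```shell
# Save both package lists side by side.
# save_package_lists and the file names are hypothetical, for illustration only.
save_package_lists() {
    out_dir="$1"
    mkdir -p "$out_dir" || return 1
    # Long list: every package dpkg knows about, with its selection state.
    dpkg --get-selections > "$out_dir/all-packages.txt" || return 1
    # Short list: packages marked as manually installed (assumed tool).
    apt-mark showmanual > "$out_dir/manual-packages.txt" || return 1
}

# Example:
#   save_package_lists "$HOME/package-lists"
#
# Replaying the long list on a fresh install of the same release (as root):
#   dpkg --set-selections < all-packages.txt
#   apt-get dselect-upgrade
```

The short list is the one to review by hand when moving to a new release; the long list is mainly useful for rebuilding the same release.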

Thanks for the suggestions folks. I’ll look into clonezilla and systemback. Git-annex says it’s for git users who love the command line, so that rules me out twice over just for starters!

There is a graphical interface…

https://git-annex.branchable.com/assistant/

Git-annex is not a backup solution because it syncs files and does not sync binary file history. So if you change or overwrite a binary file in the annex and then run git annex sync --content, the new binary will propagate to your remotes. The old binary might still be there if one hasn’t run git annex unused and removed it, but git-annex doesn’t try very hard to retain the copy of the old binary. That’s bad if your intention is for it to be a backup all by itself.

I wouldn’t use it for system backup either… my use case is to have a server with git-annex and a bunch of other computers with git-annex clients, and to sync all our files in one place. This happens to work as a backup in case one of the disks fails, but it also works as a kind of Dropbox to sync files with your team. The great advantage here is of course that it can handle big chunks of raw video flawlessly.

But this is what the OP asked for :grin:. But I agree with you that it’s great!

Please don’t confuse file redundancy with backup. Git-annex is great at giving you file redundancy, which is what you’ve described, but falls short in the backup use case.

Just thought I’d share another use case scenario. :slight_smile:

Yep, to me they are kind of the same… what would be the advantage of a backup over this?

I use rdiff-backup. I described my backup setup here: https://29a.ch/2014/11/13/desktop-rdiff-backupscript

With that said, I don’t do full system backups, just what I consider valuable data. I have a script to relatively quickly set up a new box, and enough boxes around so I can continue working.


I’ve used many backup tools over the years, including clonezilla, dejadup, custom scripts, basic cp commands, etc. I maintain about 10 Ubuntu systems between my personal computers and my lab on campus. I’ve never found a perfect solution, but recently I’ve really been enjoying aptik. It uses duplicity (and thus, rsync) as the engine and makes it easy to back up files, settings, and the program list. The developer is active and attentive (I’ve filed a few feature requests, and he got on it quickly!). So far, it’s been a pretty great piece of backup software…


Redundancy is multiple copies of a file. In the case of git-annex, Dropbox, and other file syncing solutions, the software works to make sure the same files are present everywhere.

Backups are copies of files from a certain point in time. Backups are often kept on separate systems or external hard drives.

Consider this use case: you’re using git-annex to sync video files between you and a friend. Your friend puts a huge video file in the file share. You add it to your project. Then your friend runs out of disk space and removes the video file. The software syncs the file deletion across all the clients. Now that file is gone. If you have a backup from the period of time when you had the file, you can recover it, but having redundancy didn’t help, since the removal was intentional.

Maybe you also delete a file you think you don’t need, but end up needing it later. If you have a backup from that point in time, you can get the file. If git-annex is auto-syncing files (or you manually sync them as well), then the file is gone, as you removed it.
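
To make the point-in-time idea concrete, here is a minimal sketch of dated snapshots using rsync’s --link-dest option, which hard-links unchanged files to the previous snapshot so each snapshot looks complete without storing everything twice. Nobody in this thread describes this exact setup; the function name, the “latest” symlink, and the snapshot names are assumptions for illustration.

```shell
# Create a named snapshot of SRC under ROOT, hard-linking unchanged files
# to the previous snapshot. snapshot_backup is a hypothetical helper name.
# ROOT should be an absolute path, because rsync resolves a relative
# --link-dest against the destination directory.
snapshot_backup() {
    src="$1"
    root="$2"
    name="$3"
    mkdir -p "$root" || return 1
    if [ -d "$root/latest" ]; then
        # Unchanged files become hard links into the previous snapshot.
        rsync -a --link-dest="$root/latest" "$src"/ "$root/$name"/ || return 1
    else
        # First run: plain full copy.
        rsync -a "$src"/ "$root/$name"/ || return 1
    fi
    # Point "latest" at the snapshot just taken.
    ln -sfn "$name" "$root/latest"
}

# Example:
#   snapshot_backup "$HOME/Videos" /mnt/BACKUP_DRIVE/snapshots "$(date +%Y-%m-%d)"
```

A file deleted from the source after one snapshot is simply absent from the next one, while the older snapshot still holds it, which is the recovery path that a pure sync does not give you.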

The dd command will do the job.

Create a backup of a partition as an image file:

$ sudo dd if=/dev/sda1 of=/home/jt/part.img
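
The round trip (image, verify, restore) can be tried safely before touching a real disk. In the sketch below a regular file stands in for the partition, so the paths are made up and no sudo is needed; on a real system the source would be a device such as /dev/sda1, and the partition should not be mounted read-write while it is being imaged.

```shell
# A regular file plays the role of the partition so this is safe to run.
work=$(mktemp -d)
SRC="$work/fake-partition"
IMG="$work/part.img"
RESTORED="$work/restored-partition"

# Create 1 MiB of sample data to stand in for the partition contents.
dd if=/dev/urandom of="$SRC" bs=1024 count=1024 2>/dev/null

# Back up: copy the "partition" into an image file.
dd if="$SRC" of="$IMG" bs=4096 2>/dev/null

# Verify: the image should be byte-for-byte identical to the source.
cmp "$SRC" "$IMG" && echo "image matches source"

# Restore: write the image back (here to a second file, not a real device).
dd if="$IMG" of="$RESTORED" bs=4096 2>/dev/null
cmp "$IMG" "$RESTORED" && echo "restore matches image"
```

Restoring to a real partition is the same command with if= and of= swapped, which is also why it pays to double-check both arguments: dd overwrites its target without asking.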

In our case we would choose one folder to serve as an archive and we never delete anything. I think you can also configure it in a way that the files would only get deleted locally and not on the server. But I see what you mean, cheers.

I know this thread is a little old now, but I wanted to share the process I use to back up my system and photos, just in case anyone else may be looking for this.

There really is no reason to use external tools, as some of the best backup tools are already included with Linux, and all it takes is creating a small script file.

This is my systembackup.sh script file, residing in my root scripts folder:

#!/usr/bin/env bash
#
# Back up the system
# Preserve permissions and ACLs
# Run the task on CPU cores 10 and 11

# Exclude patterns are quoted so the shell never tries to expand the globs.
taskset -c 10,11 \
    rsync -aAX --delete \
    --exclude="/dev/*" \
    --exclude="/proc/*" \
    --exclude="/sys/*" \
    --exclude="/tmp/*" \
    --exclude="/run/*" \
    --exclude="/mnt/*" \
    --exclude="/media/*" \
    --exclude="swapfile" \
    --exclude="lost+found" \
    --exclude=".cache" \
    --exclude="Downloads" \
    --exclude=".VirtualBoxVMs" \
    --exclude=".ecryptfs" \
    / \
    /mnt/BACKUP_DRIVE/System \
    &

Let’s break this down a little, starting with the rsync backup part of the script:

  • -a
    This option is a combination flag. It stands for “archive” and syncs recursively and preserves symbolic links, special and device files, modification times, group, owner, and permissions.
  • -A, --acls
    Preserve ACLs (implies --perms)
  • -X, --xattrs
    Preserve extended attributes
  • --delete
    Delete extraneous files from destination dirs. Be careful with this: if a file in the destination is not in the source, it will be deleted. So make sure that your source and destination are correct.
  • --exclude=PATTERN
    Exclude files matching PATTERN. There are a whole bunch of files that you don’t need to back up; this excludes them.
  • /
    The root directory I want to back up.
  • /mnt/BACKUP_DRIVE/System
    The destination drive.
  • &
    Run the process in the background
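
Because --delete removes anything in the destination that is missing from the source, it can help to preview a run before doing it for real. A minimal sketch, using rsync’s --dry-run and --itemize-changes options; the helper name preview_sync and the example paths are made up for illustration:

```shell
# Show what "rsync -a --delete" would change, without touching anything.
# preview_sync is a hypothetical helper name, not part of rsync.
preview_sync() {
    src="$1"
    dest="$2"
    # --dry-run: report only; --itemize-changes: one line per affected path.
    # Files that would be removed from the destination appear as "*deleting".
    rsync -a --delete --dry-run --itemize-changes "$src"/ "$dest"/
}

# Example:
#   preview_sync /path/to/source /path/to/destination
```

If the output lists deletions you did not expect, the source and destination are probably swapped or mistyped.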

So that’s the backup part of the script; after the initial backup it only copies across changes in the root system. Now, an issue I found was that when I ran this backup my system could slow down, even hang, while the process was running. I have a 12-core CPU, and the load was shared across all cores, meaning other running processes would suffer.

  • taskset -c 10,11
    What this does is pin the task to processor cores 10 and 11, allowing my machine to keep running smoothly while the backup is running. I especially found this useful when I had to transfer 2 TB of images to a new drive.
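
Cores 10 and 11 are the last two on a 12-core machine. On a machine with a different core count, the same idea could be sketched as below; the helper name last_two_cores is made up, and using nproc to get the core count is an assumption about the system.

```shell
# Print the indices of the last two cores for a given core count,
# in the comma-separated form that "taskset -c" accepts.
# last_two_cores is a hypothetical helper name.
last_two_cores() {
    total="$1"
    if [ "$total" -lt 2 ]; then
        # Single-core machine: only core 0 exists.
        echo 0
    else
        echo "$((total - 2)),$((total - 1))"
    fi
}

# Example:
#   taskset -c "$(last_two_cores "$(nproc)")" rsync -a /path/to/source/ /path/to/destination
```

For a count of 12 this gives 10,11, matching the script above.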

My backup script can be set as a cron job, or run manually as I desire.
In case of system failure, I just need to boot from a live USB and copy the files back using the same process.

I have a separate script to back up my photos directory; it is pretty much the same:

#!/usr/bin/env bash
#
# Back up photos, preserving permissions
# Run the task on CPU cores 8 and 9

taskset -c 8,9 \
    rsync -ap --delete \
    /mnt/USER_DATA/Photos/ \
    /mnt/BACKUP_DRIVE/Photos

Nothing wrong with using the basic tools already included in a standard Linux system. But if you’re looking for a specialized tool, BorgBackup seems to be quite a popular solution these days: https://www.borgbackup.org/

Free software doesn’t need to be backed up. If you lose a hard drive you buy another one and re-install.

On Linux systems, what I do back up are the /home directory and /var/www/html (where I do web development). I like to copy my /etc/fstab file and /var/spool/cron directory too, but they never change, so one copy is enough.

I have two internal hard drives (three actually): the OS on an SSD, /home on a 6 TB WD Red drive, and another 8 TB WD that mirrors the first.

Here’s my crontab:

# m h  dom mon dow   command
0 0 * * * /usr/bin/updatedb
0 1 * * * /usr/bin/rsync -avz --exclude 'cache' /var/www/html/ /disk1/www/html
0 2 * * * /usr/bin/rsync -avz --exclude 'cache' /home/ /disk1

…where disk1 is my backup mirror disk.
That rsync takes a long time to run the first time but runs quickly thereafter.