Backing up the Ubuntu system

The second answer here explains the drawback of --get-selections and how you can get a list of user-selected packages rather than all installed dependencies, which may not be valid on a new setup, for example if a new release of the host system is installed.
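For reference, a rough sketch of the classic dpkg round trip (the file name packages.list is just an example):

$ dpkg --get-selections > packages.list

# later, on the freshly installed system:
$ sudo dpkg --set-selections < packages.list
$ sudo apt-get dselect-upgrade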

I can only comment on what works for me; I’m a Debian stable user and get-selections does what I need it to.

The fourth comment on the first answer is why I don’t contribute much to Stack Exchange anymore: the “God, this doesn’t work and you didn’t consider every single last use case” sort of comment that comes 6 years after the answer was posted. :stuck_out_tongue:

This is a great backup system, especially since it can handle large files easily.

https://git-annex.branchable.com/

Sorry @paperdigits, I didn’t mean to be offensive, I just wanted to contribute a solution that I found to work reasonably well. For me it’s simple: one solution gives a long list of all installed packages, the other gives a short list of the packages that I selected for installation, but misses the ones that come preinstalled with the distro. Either can be the wrong choice, so maybe one should keep both lists. Especially when you decide to switch to a new version of the distro after replacing your broken hard disk (which happens more often than one might think; for me, a full hard disk comes close to broken :wink:), it’s better to drop dependencies, since they can cause a lot of trouble.
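If it helps, a minimal sketch of keeping both lists side by side (file names are arbitrary; apt-mark showmanual lists only the packages marked as manually installed):

$ dpkg --get-selections > all-packages.list
$ apt-mark showmanual > manual-packages.list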

Seems I am the bad guy today, but the very first words on the “what git-annex is not” page state that it is not a backup system. Unfortunately that page does not explain why. Besides this, I use git-annex for some applications, but neither for system backup nor for photo management, though I intend to do the latter soon (unfortunately time is a scarce resource these days).

Most certainly no offense taken @chris nor was your statement offensive to me.

“What comes preinstalled” might be easier to define on Ubuntu, but on Debian I always start from the netinstall ISO, get the most minimal base system (no Xorg, nothing selected in tasksel during installation), then I build my system up exactly the way I want it. So getting every package is OK for me, since my “base” install is really basic :wink: I probably wouldn’t use the get-selections file on a new operating system version; I’d probably go through the list by hand if that were the case. However, on Debian one can pretty safely dist-upgrade from one release to the next, so I wouldn’t wipe my machine and reinstall in order to upgrade to a new version anyway; using get-selections for that use case just doesn’t happen on Debian.

Thanks for the suggestions folks. I’ll look into clonezilla and systemback. Git-annex says it’s for git users who love the command line, so that rules me out twice over just for starters!

There is a graphical interface…

https://git-annex.branchable.com/assistant/

Git-annex is not a backup solution because it syncs files but does not preserve binary file history. So if you change or overwrite a binary file in the annex and then do git-annex sync --content, the new binary will propagate to your remotes. The old binary might still be there if nobody has run git annex unused and removed it, but git-annex doesn’t try very hard to retain the copy of the old binary. That’s bad if your intention is for it to be a backup all by itself.
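For reference, roughly the commands being described (the number range passed to dropunused comes from the unused listing):

$ git annex sync --content   # propagate current file contents to the remotes
$ git annex unused           # list old object contents no longer referenced
$ git annex dropunused 1-10  # delete those old copies for good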

I wouldn’t use it for system backup either… my use case is to have a server with git-annex and a bunch of other computers with git-annex clients, and sync all our files in one place. This happens to work as a backup in case one of the disks fails, but it also works as a kind of Dropbox to sync files with your team. The great advantage here is of course that it can handle big chunks of raw video flawlessly.

But this is what the OP asked for :grin:. And I agree with you that it’s great!

Please don’t confuse file redundancy with backup. Git-annex is great at giving you file redundancy, which is what you’ve described, but it falls short in the backup use case.

Just thought I’d share another use case scenario. :slight_smile:

Yep, to me they are kind of the same… what would be the advantage of a backup over this?

I use rdiff-backup. I described my backup setup here: Desktop rdiff-backup Script - 29a.ch

With that said, I don’t do full system backups, just what I consider valuable data. I have a script to set up a new box relatively quickly, and enough boxes around so I can continue working.
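Not the script from the link above, just a rough idea of what a plain rdiff-backup run looks like (paths are hypothetical):

$ rdiff-backup /home/user/data /mnt/backup/data
$ rdiff-backup --remove-older-than 8W /mnt/backup/data   # prune increments older than 8 weeks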


I’ve used many backup tools over the years, including clonezilla, dejadup, custom scripts, basic cp commands, etc. I maintain about 10 Ubuntu systems between my personal computers and my lab on campus. I’ve never found a perfect solution, but recently I’ve really been enjoying aptik. It uses duplicity (and thus rsync) as the engine and makes it easy to back up files, settings, and the list of installed programs. The developer is active and attentive (I’ve filed a few feature requests, and he got on them quickly!). So far, it’s been a pretty great piece of backup software…


Redundancy is multiple copies of a file. In the case of git-annex, dropbox, and other file syncing solutions, the software works to make sure the same files are present everywhere.

Backups are copies of files from a certain point in time. Backups are often either separate systems or external hard drives.

Consider this use case: you’re using git-annex to sync video files between you and a friend. Your friend puts a huge video file in the file share. You add it to your project. Then your friend runs out of disk space and removes the video file. The software syncs the file deletion across all the clients. Now that file is gone. If you have a backup from the period of time when you had the file, you can recover it, but redundancy didn’t help, since the removal was intentional.

Maybe you also delete a file you think you don’t need, but end up needing it later. If you have a backup from that point in time, you can get the file. If git-annex is auto-syncing files (or you manually sync them as well), then the file is gone, as you removed it.

The dd command will do the job.

Get a backup of a partition in the form of an image:

$ sudo dd if=/dev/sda1 of=/home/jt/part.img
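And the reverse direction restores it (same device and file names as above; double-check if= and of=, since dd will happily overwrite whatever it is pointed at):

$ sudo dd if=/home/jt/part.img of=/dev/sda1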


In our case we would choose one folder to serve as an archive and never delete anything. I think you can also configure it so that files only get deleted locally and not on the server. But I see what you mean, cheers.

I know this thread is a little old now, but I wanted to share the process I use to back up my system, and my photos, just in case anyone else may be looking for this.

There really is no reason to use external tools, as some of the best backup tools are already included with Linux, and all it takes is creating a small script file.

This is my systembackup.sh script file, residing in my root scripts folder.

#!/usr/bin/env bash
#
# Backup system
# Preserve permissions, ACLs and extended attributes
# Run the task on processor cores 10 and 11

taskset -c 10,11 \
    rsync -aAX --delete \
    --exclude="/dev/*" \
    --exclude="/proc/*" \
    --exclude="/sys/*" \
    --exclude="/tmp/*" \
    --exclude="/run/*" \
    --exclude="/mnt/*" \
    --exclude="/media/*" \
    --exclude="swapfile" \
    --exclude="lost+found" \
    --exclude=".cache" \
    --exclude="Downloads" \
    --exclude=".VirtualBoxVMs" \
    --exclude=".ecryptfs" \
    / \
    /mnt/BACKUP_DRIVE/System \
    &

Let’s break this down a little. First, the rsync backup part of the script:

  • -a
    This option is a combination flag. It stands for “archive” and syncs recursively and preserves symbolic links, special and device files, modification times, group, owner, and permissions.
  • -A, --acls
    Preserve ACLs (implies --perms)
  • -X, --xattrs
    Preserve extended attributes
  • --delete
    Delete extraneous files from destination dirs (be careful with this): if a file in the destination is not in the source, it will be deleted, so make sure that your source and destination are correct.
  • --exclude=PATTERN
    Exclude files matching PATTERN. (There are a whole bunch of files that you don’t need to back up; this excludes them.)
  • /
    The root directory I want to backup.
  • /mnt/BACKUP_DRIVE/System
    The destination drive.
  • &
    Run the process in the background

So that’s the backup part of the script; after the initial backup it only copies changes in the root filesystem across. Now, an issue I found was that when I ran this backup my system could slow down, even hang, while the process was running. I have a 12-core CPU, and the load was shared across all cores, meaning other running processes would suffer.

  • taskset -c 10,11
    What this does is assign processor cores 10 and 11 to the task, allowing my machine to keep running smoothly while the backup is running. I found this especially useful when I had to transfer 2 TB of images to a new drive.
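(If you want to verify that the pinning took effect, and assuming a single running rsync process, something like this shows its CPU affinity:)

$ taskset -cp "$(pgrep -x rsync)"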

My backup script can be set as a cron job, or run manually as I desire.
In case of system failure, I just need to boot from a live USB and copy the files back using the same process.
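The restore would look something like this (mount points are hypothetical: the backup drive mounted at /mnt/BACKUP_DRIVE and the new root partition at /mnt/newroot):

$ sudo rsync -aAX /mnt/BACKUP_DRIVE/System/ /mnt/newroot/

Note the trailing slash on the source, so the contents of System/ land directly in the new root rather than in a System subdirectory.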

I have a separate script to back up my photos directory; it is pretty much the same:

#!/usr/bin/env bash
#
# Backup photos, preserving permissions
# Run the task on processor cores 8 and 9

taskset -c 8,9 \
    rsync -ap --delete \
    /mnt/USER_DATA/Photos/ \
    /mnt/BACKUP_DRIVE/Photos