I’ve noticed that my photo library has now passed 1.6TB, 585GB of it from the past year, so it looks like my pair of 2TB external drives will soon need to be replaced and/or supplemented with new hardware.
My current system is less than optimal. I keep my full library on one 2TB drive, which I manually back up via rsync to a second 2TB drive, usually as part of my preparations before travel and again after I’ve done an initial cull of a new batch of travel photos (not so regularly between big trips). I don’t have a third, off-site copy yet, but that’s on my to-do list.
Two important questions present themselves:
Do I move to larger storage media in order to keep the full collection together, or do I get another set of 2TB drives and retire the first pair to a static archive?
And, particularly if I move onto larger storage media, what do folks recommend for something in the 4TB+ range that hopefully won’t break the bank when I purchase them in triplicate? Is it time to invest in learning RAID or some other system?
I would keep the old drives for an off-site backup and get two new drives of 4TB or larger (maybe 8TB for future-proofing). I don’t think you need to complicate things with RAID or ZFS/Btrfs at this point.
I was a longtime database administrator, and I can tell you from experience that your backup system is as important or more important than your storage system. There are several ways to approach it, and at this point I won’t try to prescribe what you should do.
As for your primary storage, I would suggest two 8TB drives, and I would recommend true RAID-1. This would mean two full copies of your data; if one drive went bad, you would still have your complete data on the other one. Disk drives are relatively cheap these days, and it sounds like your data is very important to you.
All of the details would depend on many factors, starting with: what is your OS?
rsync does not detect filesystem corruption, whereas ZFS or Btrfs will detect it with their checksums and automatically correct it if you have enough data redundancy.
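rsync won’t catch silent corruption on its own, but you can get the detection half of what ZFS/Btrfs checksums give you by keeping your own hash manifest and re-checking it now and then. Here’s a minimal Python sketch of that idea, assuming you record the manifest while the files are known to be good and keep it somewhere other than the drive being checked. The function names (`build_manifest`, `verify`) are made up for illustration; unlike a redundant ZFS/Btrfs pool, this only detects corruption and can’t repair anything.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB pieces so large raw files don't fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 hash."""
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return files that are missing or no longer match their recorded hash."""
    bad = []
    for rel, expected in manifest.items():
        path = root / rel
        if not path.is_file() or sha256_of(path) != expected:
            bad.append(rel)
    return bad


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "a.raw").write_bytes(b"good data")
        (root / "b.raw").write_bytes(b"more good data")
        manifest = build_manifest(root)
        # Simulate bit rot: flip one byte without changing the file size.
        data = bytearray((root / "a.raw").read_bytes())
        data[0] ^= 0xFF
        (root / "a.raw").write_bytes(bytes(data))
        print(verify(root, manifest))  # ['a.raw']
```

A manifest like this only tells you which files to restore; you still need an older, good copy somewhere to restore them from.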
I see one crucial sentence: Don’t break the bank.
With that in mind, I’d suggest an alternative, and since you seem fine with a little hands-on work, a somewhat simpler one.
As an old-time storage admin: yes, RAID-1 and RAID 1+0 are beautiful, but they don’t fit with that first constraint.
Since you are not relying on, e.g., Lightroom, your software will be more flexible about storage. So I would keep the current disk for ‘old stuff’ and add another 2 or 4TB drive for ‘new stuff’ (weighing $/GB against the space in your computer), which at your current growth rate should last you 3 to 7 years!
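To sanity-check the 3 to 7 year figure, here’s a quick Python calculation. It assumes growth stays at the 585GB/year from the first post, uses decimal units (1TB = 1000GB, as drive makers do), and ignores filesystem overhead; `years_of_headroom` is just an illustrative name.

```python
# Rough capacity runway, assuming the library keeps growing by the same
# 585 GB per year quoted in the first post. Decimal units, no FS overhead.
GROWTH_GB_PER_YEAR = 585


def years_of_headroom(capacity_gb: float, used_gb: float = 0.0,
                      growth_gb_per_year: float = GROWTH_GB_PER_YEAR) -> float:
    """Years until a drive of `capacity_gb` fills, starting at `used_gb`."""
    free_gb = max(capacity_gb - used_gb, 0.0)
    return free_gb / growth_gb_per_year


if __name__ == "__main__":
    # The current 2 TB drive, already holding about 1.6 TB.
    print(f"current 2 TB drive: {years_of_headroom(2000, 1600):.1f} years left")
    # An empty drive dedicated to 'new stuff'.
    for size_tb in (2, 4):
        print(f"new {size_tb} TB drive: {years_of_headroom(size_tb * 1000):.1f} years")
```

At that rate the current drive has well under a year left (about 0.7 years), while an empty 2TB or 4TB drive lasts roughly 3.4 or 6.8 years, which lines up with the estimate.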
You can easily add another line to the rsync script to back up both the old and new stuff, and the destination can be one combined directory structure. To make room for that, a single 8TB disk could be fine.
Personally, I use cloud storage for the third copy. The third copy is only for disaster recovery, so it’s simply a mirror of the second copy, with the mirroring done by an app from the cloud provider.
My rsync from the first copy to the second doesn’t include the --delete switch, so moving things around causes extra copies to build up; once in a while I run rsync with --delete to clean them out.
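Here’s a toy Python model of why that build-up happens. It is not rsync, just a dictionary-based stand-in for “copy new or changed files” with and without the delete step (real rsync decides what changed differently, by default from size and modification time); the `sync` function and the folder names are hypothetical.

```python
# Toy model of a one-way sync; a "drive" is {relative_path: file contents}.
# Only mimics the effect of running rsync with and without --delete.
def sync(source: dict[str, bytes], dest: dict[str, bytes],
         delete: bool = False) -> dict[str, bytes]:
    result = dict(dest)
    # Copy anything new or changed from the source.
    for path, data in source.items():
        if result.get(path) != data:
            result[path] = data
    if delete:
        # Drop destination files that no longer exist in the source.
        for path in list(result):
            if path not in source:
                del result[path]
    return result


if __name__ == "__main__":
    source = {"2023/unsorted/img001.raw": b"raw bytes"}
    backup = sync(source, {})
    # Reorganise the library: the same photo moves to a new folder.
    source = {"2023/trip/img001.raw": b"raw bytes"}
    backup = sync(source, backup)
    print(sorted(backup))  # both paths: the old location lingers on the backup
    backup = sync(source, backup, delete=True)
    print(sorted(backup))  # only the new path remains
```

The trade-off is that without the delete step, a file you removed by mistake still survives on the backup until your next clean-up run.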
It’s very low-performance, but depending on your budget, there is a 4-bay TerraMaster NAS on sale in Canada for 207 CAD until the 31st, I think. It’s slow, but since this is a backup solution it might be worth considering. I think I even saw someone hack one recently and put OMV (OpenMediaVault) on it.
If you’re really open to rethinking your backups a little, please consider a tool that gives you multiple point-in-time backups. There are some rsync-based tools that do this, like rsnapshot, but there are also restic, borg, kopia, and quite a few others. I use restic, and we use restic in the pixls infrastructure; it is pretty popular.
Multiple points in time are good because, with your current setup, if you get silent file corruption and then run your rsync backup, your backup also ends up with only the corrupted version of that file.
Also, please get something off-site. A Hetzner Storage Box or Backblaze B2 are both good choices and relatively cheap.
I’ve seen this stated in a few threads now. Is that because true backup includes snapshots, not just a single copy? Or is it more that I need multiple copies, so more than a single RAID system?
Robust backups are all about getting back to a certain point in time. If a file was corrupted two months ago, but you only find that out today, AND your only point-in-time copy of your files is from last week, then you’ve lost the corrupted files. If you have a backup app that lets you go back to multiple points in time, you are much more likely not to lose that data.
So restic adds a “snapshot” to the backup repo every time I run it. In the case outlined above, with file corruption from two months ago, I can mount the snapshot from two months and five days ago and get the files as they were before the corruption happened.
It’s also great protection against ransomware, or at least better: the older snapshots won’t contain the files the ransomware modified, so you can revert to them.
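A rough Python sketch of the point-in-time idea, with snapshots modelled as a list of (date, files) pairs. The `restore_before` helper, the weekly schedule, and the file name are hypothetical; with restic you’d mount or restore an actual snapshot instead.

```python
from datetime import date, timedelta
from typing import Optional

# Hypothetical model: each snapshot is (date taken, {path: file contents}).
Snapshot = tuple[date, dict[str, bytes]]


def restore_before(snapshots: list[Snapshot], path: str,
                   cutoff: date) -> Optional[bytes]:
    """Return `path` as stored in the newest snapshot taken strictly before `cutoff`."""
    candidates = [(taken, files) for taken, files in snapshots
                  if taken < cutoff and path in files]
    if not candidates:
        return None
    _, files = max(candidates, key=lambda item: item[0])
    return files[path]


if __name__ == "__main__":
    start = date(2024, 1, 7)
    snapshots: list[Snapshot] = []
    for week in range(12):
        # Assume the file silently went bad sometime before the week-4 snapshot.
        content = b"good pixels" if week < 4 else b"g00d p1xels"
        snapshots.append((start + timedelta(weeks=week), {"img001.raw": content}))

    first_bad_snapshot = start + timedelta(weeks=4)
    # A plain mirror only holds the latest state, i.e. the corrupted file.
    print(snapshots[-1][1]["img001.raw"])
    # The snapshot history still has the version from before the corruption.
    print(restore_before(snapshots, "img001.raw", first_bad_snapshot))
```

The same lookup covers the ransomware case: pick a cutoff before the encryption happened and the older snapshot still holds the untouched file.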
RAID gives redundancy within a storage system. Many things could happen to either ruin or obliterate that storage system. Backups should be held in a completely separate location, or at least disconnected from the system they are backing up.
I have a moderately small home system. I copy my backups (incrementally) to an external drive. As soon as I have completed my backup procedures, I unmount my backup drive and physically disconnect it from my computer system until the next time I am performing backups. I have monthly copies of my backups dating back to the summer of 2022, and approximately weekly backups going back a month. My monthly backups only go back to 2022 because my backup drive was connected to my system when lightning struck. I had to get a new backup drive and start again from that point.
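A retention scheme like that (frequent recent copies, sparser older ones) can be sketched as “keep the newest snapshot in each of the last N weeks and each of the last M months that have snapshots”. Below is a minimal Python version; restic’s `forget` command offers `--keep-weekly` and `--keep-monthly` options in this spirit, though its exact rules may differ from this sketch, and `keep_policy` is a hypothetical name.

```python
from datetime import date


def keep_policy(snapshot_dates: list[date], weekly: int, monthly: int) -> set[date]:
    """Keep the newest snapshot in each of the `weekly` most recent ISO weeks
    and each of the `monthly` most recent calendar months that have snapshots."""
    keep: set[date] = set()
    newest_first = sorted(set(snapshot_dates), reverse=True)
    for count, bucket_of in (
        (weekly, lambda d: d.isocalendar()[:2]),   # (ISO year, ISO week)
        (monthly, lambda d: (d.year, d.month)),    # (year, month)
    ):
        seen = []
        for d in newest_first:
            bucket = bucket_of(d)
            if bucket in seen:
                continue  # a newer snapshot already represents this bucket
            if len(seen) == count:
                break     # enough buckets kept for this rule
            seen.append(bucket)
            keep.add(d)
    return keep


if __name__ == "__main__":
    from datetime import timedelta

    # Daily snapshots for the first quarter of 2024.
    days = [date(2024, 1, 1) + timedelta(days=i) for i in range(91)]
    for d in sorted(keep_policy(days, weekly=4, monthly=3)):
        print(d)
```

Everything not returned by the policy would be pruned, so storage stays bounded while older months are still reachable.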
I am the first to admit that I don’t have a great backup system, but I do what I can within my budget. But it is probably better than what 95% of people do.
That all sounds great, but if you’ve got monthly snapshots, doesn’t that turn my 2TB library into 24TB in a year? Or is there a way to reduce the size of the snapshots on disk?
I was thinking you’d need a full copy for each snapshot in order to protect yourself against file corruption. But even if the backup program doesn’t know whether a file is corrupt, it will still detect that the contents don’t match the previous version. So a series of snapshots should contain both the pre- and post-corruption versions of the file, while keeping only one copy of any file that hasn’t changed?
Correct. With tools like restic et al., your files are actually split into smaller chunks, and a record is kept of which chunks make up each file. I have snapshots going back about 5 years (which is too much), but since the raw files don’t change, the snapshots only take up slightly more space than the files themselves (for metadata).
You are correct that if a file gets corrupted or ransomware’d, the backup tool will just register it as changed and store that file too.
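A toy Python model of how chunk-level deduplication keeps twelve monthly snapshots from costing twelve full copies. It assumes fixed-size chunks for simplicity; restic actually uses content-defined chunking with a rolling hash (plus compression and encryption), which copes better with edits that shift data around. `ChunkStore` is a hypothetical class, not restic’s API.

```python
import hashlib


class ChunkStore:
    """Toy content-addressed backup repository.

    Files are cut into fixed-size chunks; each distinct chunk is stored once
    under its SHA-256 hash, and a snapshot is just a list of chunk hashes per file.
    """

    def __init__(self, chunk_size: int = 4096) -> None:
        self.chunk_size = chunk_size
        self.chunks: dict[str, bytes] = {}
        self.snapshots: list[dict[str, list[str]]] = []

    def backup(self, files: dict[str, bytes]) -> None:
        snapshot = {}
        for path, data in files.items():
            hashes = []
            for i in range(0, len(data), self.chunk_size):
                piece = data[i:i + self.chunk_size]
                key = hashlib.sha256(piece).hexdigest()
                self.chunks.setdefault(key, piece)  # only unseen chunks use space
                hashes.append(key)
            snapshot[path] = hashes
        self.snapshots.append(snapshot)

    def restore(self, snapshot_index: int, path: str) -> bytes:
        return b"".join(self.chunks[h] for h in self.snapshots[snapshot_index][path])

    def stored_bytes(self) -> int:
        return sum(len(c) for c in self.chunks.values())


if __name__ == "__main__":
    import os

    library = {f"img{i:03}.raw": os.urandom(20_000) for i in range(10)}
    repo = ChunkStore()
    for month in range(12):  # twelve monthly snapshots of an unchanged library
        repo.backup(library)
    print(repo.stored_bytes())  # one copy's worth, not twelve

    # Corrupt one byte of one file: only that file's changed chunk is added.
    damaged = bytearray(library["img000.raw"])
    damaged[100] ^= 0xFF
    library["img000.raw"] = bytes(damaged)
    repo.backup(library)
    print(repo.stored_bytes())
    # Both versions are recoverable: snapshot 0 (good) and snapshot 12 (corrupted).
    print(repo.restore(0, "img000.raw") != repo.restore(12, "img000.raw"))
```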
Restic is a modern tool, so you get deduplication and compression for your backups. I have liked it a lot and I highly suggest checking it out.
Also, as someone who just went through the Southern California wildfires: get an off-site backup now. I think I pay about $3.50 USD a month for ~650 GB at Backblaze.
With ZFS/Btrfs you need to run periodic checks (scrubs) on the filesystems to detect potential corruption. If corruption is detected and you have enough redundancy in your setup, the damaged files will be fixed automatically. You don’t need to go back to your older snapshots.
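For a ballpark on a bigger library, here’s a back-of-the-envelope Python scaling of that ~$3.50 for ~650 GB figure. It assumes the price is linear in stored GB and that the quoted numbers are accurate; the provider’s current price list is the real source, and 2185 GB is simply 1.6TB plus one more year at 585GB/year.

```python
# Scale the ~$3.50/month for ~650 GB quoted above to other library sizes.
# Assumes linear per-GB pricing; check the provider's current prices.
QUOTED_USD_PER_MONTH = 3.50
QUOTED_GB = 650


def monthly_cost(library_gb: float) -> float:
    """Estimated monthly cost in USD at the quoted per-GB rate."""
    return library_gb * QUOTED_USD_PER_MONTH / QUOTED_GB


if __name__ == "__main__":
    for size_gb in (1600, 2185):  # today, and roughly one year from now
        print(f"{size_gb} GB: about ${monthly_cost(size_gb):.2f}/month")
```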
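Here’s a toy Python model of what a scrub does on a two-way mirror: read every block from both copies, compare each against the checksum recorded at write time, and overwrite a bad copy with the good one. Real filesystems keep those checksums in their own metadata and work at a much lower level; `scrub` and `checksum` here are hypothetical helpers that only illustrate the logic.

```python
import hashlib


def checksum(block: bytes) -> str:
    return hashlib.sha256(block).hexdigest()


def scrub(mirror_a: list[bytes], mirror_b: list[bytes],
          checksums: list[str]) -> list[int]:
    """Repair a bad copy from a good one in place.

    Returns the indices of blocks where neither copy matches its checksum.
    """
    unrecoverable = []
    for i, expected in enumerate(checksums):
        a_ok = checksum(mirror_a[i]) == expected
        b_ok = checksum(mirror_b[i]) == expected
        if a_ok and not b_ok:
            mirror_b[i] = mirror_a[i]   # self-heal B from A
        elif b_ok and not a_ok:
            mirror_a[i] = mirror_b[i]   # self-heal A from B
        elif not a_ok and not b_ok:
            unrecoverable.append(i)     # redundancy exhausted: restore from backup
    return unrecoverable


if __name__ == "__main__":
    blocks = [b"block-0", b"block-1", b"block-2"]
    sums = [checksum(b) for b in blocks]
    a, b = list(blocks), list(blocks)
    a[1] = b"bl0ck-1"   # silent corruption on disk A only
    a[2] = b"xxxxx-2"   # block 2 damaged on both disks
    b[2] = b"yyyyy-2"
    print(scrub(a, b, sums))  # block 2 can't be repaired
    print(a[1] == blocks[1])  # block 1 was healed from disk B
```

The unrecoverable case is why redundancy inside one system doesn’t replace separate, point-in-time backups.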