Fellow data-hoarders, how do you manage your files?

Hello,

My image data consists of scans of slides, negatives and prints (linear scans and processed images), plus digital camera (non-raw) and smartphone images.

To start, I copy my images onto an internal hard disk, using a year/month directory structure for the digital camera data, which only holds images. For scans I just use consecutive numbering (but see below).

At irregular intervals, all of these are copied to 4 external hard disks with identical content using a free tool, traybackup (unfortunately only available in German), which does all I need, e.g. copying only files that have changed.

For the scans I produce quick-look JPGs to select good images to work with; these are also used to document the images via metadata, using EXIFtool GUI and GeoSetter (when done, I will copy the metadata to the originals).

I do not rely on a dedicated archiving tool (like Picasa), since you never know how long it will be supported. I export the metadata with EXIFtoolGUI into text files and import these into an Excel spreadsheet with a VBA macro I have written. This lets me select images via filtering (e.g. images of my father in France between August 1968 and December 1970) and display images from within Excel. The macro also allows copying the selected images to, e.g., another directory for distribution to friends and relatives. From Excel I can always fall back to pure text files, which should be supported forever.
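
In case it is useful: the same kind of export can also be done with the plain exiftool command line. This is only a rough sketch (the tag selection and paths are examples, not my actual workflow):

    exiftool -csv -r -FileName -DateTimeOriginal -Keywords -GPSLatitude -GPSLongitude \
        /path/to/quicklook-jpgs > metadata.csv

Excel (or a VBA macro) can then read the resulting CSV directly.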

Hermann-Josef

It’s not the obscurity* of the data format that bothers me, it’s more that having a server at all is a barrier to entry for many people.

Even an off-the-shelf product like Synology or QNAP would be outside my wife’s comfort level to set up on her own. So it’s more of me trying to decide if a server-side component is a necessary evil, or maybe an optional component, or maybe completely unnecessary and all remote storage can simply be treated as “dumb”.

*: I personally don’t consider the data formats of any of the mentioned tools to be “obscure”. I’m a software developer with experience implementing similar data storage solutions, so I understand all the nuts and bolts of the on-disk formats.

It’d be nice to know if you just want to abstract the technicality away from your less technical users, or if you want to avoid technicality altogether.

You could easily run your own MinIO server, and because it speaks S3, a bunch of easy-to-use backup tools support it.
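
As a minimal sketch (paths, ports and credentials below are placeholders), MinIO can run in Docker, and any S3-speaking backup tool (restic here, just as an example) can point at it:

    docker run -d --name minio \
        -p 9000:9000 -p 9001:9001 \
        -v /srv/minio-data:/data \
        minio/minio server /data --console-address ":9001"

    # restic (or any other S3-capable tool) can then use it as a backend
    export AWS_ACCESS_KEY_ID=minioadmin AWS_SECRET_ACCESS_KEY=minioadmin
    restic -r s3:http://localhost:9000/backups init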

Not sure about how other people think about this, but:

  1. I’d like to avoid technical complexities for my spare-time non-tech projects. I’m often debugging enough weird things during work hours. :wink:

  2. I’d like something I could recommend to others. Nobody on any platform should have to use a weird, hard-to-use, homegrown command-line utility for backing up files. (Some of the command-line utilities are not so hard to use, I know… but only if one already knows how to use a shell.)

However, I guess an offsite solution is always going to be a little tricky, and improving my on-site backup is also going to require some effort. So a working solution, at least for me, may be something I just have to settle for. :wink:

Looking forward to what others say. There’s some interesting information in this thread!


Oh, by the way: I was looking around for info and found this comparison of various cloud storage providers: GitHub - josephrocca/cloud-storage-comparison at patch-1 (it’s a fork with a commit that added slightly more up-to-date info). It compares performance and price.

I found something that seems to hit most points:

Vorta

It’s:

  • based on the Borg backup software
  • has a UI
  • works with any storage that is exposed as a local directory (local storage, mounted storage, NFS)
  • works over SSH
  • uses Star Trek names for the UI and the software that powers it :vulcan_salute:
  • cross-platform, on Linux, *BSD, and macOS (for those who use it) — not Windows however
  • can mount the filesystem over FUSE
  • compresses
  • de-duplicates
  • optionally encrypts
  • can visit and/or restore old snapshots (even at a file-level)
  • supports automatic pruning rules
  • lets you fall back to just using “borg” itself if you no longer want to use the UI (see the sketch below)
  • is developed as free software (by BorgBase, a backup hosting company) under the GPL v3
  • is available as flatpak on Flathub (you don’t even have to install the borg backend — just install the flatpak and you’re ready to go!)

I’m testing it out now.
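
For reference, the fallback to plain borg mentioned above looks roughly like this (the repository path and retention numbers are just placeholders):

    # create an encrypted repository and a first archive
    borg init --encryption=repokey /mnt/backup/photos.borg
    borg create --stats --compression zstd /mnt/backup/photos.borg::'{hostname}-{now}' ~/Pictures

    # browse any snapshot read-only via FUSE, and prune with your retention rules
    borg mount /mnt/backup/photos.borg /mnt/restore
    borg prune --keep-daily 7 --keep-weekly 4 --keep-monthly 6 /mnt/backup/photos.borg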

Ideas on usage:

  1. It may be faster to back up to a secondary drive for the first go and then hand it to a friend to plug into their server (if they have one, or provide a Raspberry Pi to do the dirty work) and sync changes over the 'Net.
  2. Or just periodically sync changes to the other filesystem (for offsite storage that’s physically transported without Internet syncing).
  3. And also for a standard backup on a second drive for those of you without a NAS (or want yet another backup).

Another option:

Déjà Dup

  • Also available as a flatpak
  • integrates with GNOME
    • this integration also includes options to revert to previous versions directly from Nautilus (aka: “Files”), which is fantastic
    • …however, the integration doesn’t work if you install Deja Dup as a flatpak
  • uses Duplicity (powered by rsync & rdiffdir)
  • free software (GPL)
  • supports a ton of backup providers

Usage should be mostly the same, plus many more options (if you choose a backup provider).


Comparison experience, with test data (local cache of raws + XMPs):

  • Deja Dup seems to have a progress bar on the initial backup, whereas Vorta didn’t when I tested both (although it looked like Vorta has a widget that might be a progress bar; it just wasn’t doing anything)
  • Vorta supports multiple profiles (Deja doesn’t seem to)
  • Deja had a smaller footprint for the test data of raw files + XMPs: 391.7 MB vs. Vorta’s 417.6 MB… the original directory is 428.1 MB.
    • Vorta supports multiple compression formats; I chose the default. YMMV.
    • It’s probably worth trying different compression types to see if any is more beneficial for photos. The difference between Deja and Vorta suggests that there are probably savings to be had by switching to a non-default compression type.
  • Restoring:
    • Deja Dup: either through Nautilus (if you installed it natively instead of as a flatpak) or in the UI, in a file-manager-like view. But this second option seemed a bit clunky for browsing files (without searching), and it took a while, even for an XMP.
    • Vorta: Mount the version of the backup somewhere (in a local folder). Then view the folder with whatever file manager you use, just like anything else. It’s quick to see files — even with thumbnails (if you have raw thumbnailing). You can also use any program on the (read-only) file from the snapshot — even in the terminal, with things like vim, grep, etc.

I assume either would work for many of us. And we could use both too. For example: Deja for documents (backed up to a cloud provider), Vorta for the massive amount of photos.

I think the benefits of faster file recovery, using native tools on the FUSE mount (especially with thumbnails or simple raw file viewers), and having profiles have made me decide to go for Borg-powered Vorta.

I’ve also heard good things about Duplicati: it has a GUI and lots of different back ends for storage.

I think ideally, you should set up your backup and it should just run automatically. It should tell you of its successes and failures. Other than the initial setup, it should not require interaction unless it fails.

I did what I thought was a pretty thorough survey of backup solutions a couple of years ago… but somehow completely missed that Borg, like Restic, also uses content-addressable storage (CAS), but with much better performance (particularly its memory profile).

Vorta/Borg looks like a pretty good solution for general backup to a local or network-attached storage device. Doesn’t quite check every box for me, but it’s definitely worth considering further.

Duplicati v2.x also looks like a decent option. I had originally written off Duplicati because I thought it used the same storage strategy as Duplicity, but I see that they changed over with the release of 2.0 so I guess I need to re-consider.

Thanks for sharing!


I chose restic for backup because you can send all devices to one repository and dedup across devices. I have a workstation, a laptop, my partner’s laptop, websites, etc., with lots of heavy image files duplicated on many of them. That mitigates restic’s lack of compression for my use case.
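
In practice that just means every machine points at the same repository. A minimal sketch (the SFTP path to a NAS is only an example):

    # one shared repository, e.g. on a NAS reachable over SSH
    restic -r sftp:nas:/srv/restic-repo init

    # each device backs up into the same repo, so identical image files
    # are deduplicated across all of them
    restic -r sftp:nas:/srv/restic-repo backup ~/Pictures
    restic -r sftp:nas:/srv/restic-repo snapshots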

Dataflow and backup

  1. Transfer photos to local storage HDD.

  2. Arrange in folders. I am using chronological year/month/day -style folder structure. I do not rename raw files.

  3. Immediate backup onto an external HDD, stored in the next room, currently using a bash script running rsync. At the same time, everything else in local storage is mirrored.

  4. Memory cards can be formatted.

  5. Periodic transfer of images to a second external HDD stored in another apartment. An rsync script takes care of it. I’m considering doing this over an Internet connection in the future.

I am using rsync with

--delete --backup --backup-dir="/xxx/xxx/deleted_$(date +%Y-%m-%d)"

which moves deleted and altered files into a folder from which they can be rescued if I have second thoughts. I periodically delete old folders to free space. More than half of the images I take are eventually permanently deleted. I currently have something like 2+ TB of photos and videos in storage, and will reconsider the procedure when I have more data.
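
Put together, the command looks roughly like this (source and destination paths are placeholders):

    rsync -av --delete --backup \
        --backup-dir="/mnt/backup/deleted_$(date +%Y-%m-%d)" \
        /home/me/Pictures/ /mnt/backup/Pictures/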

I end up with three copies of every photo, with some lag. Only step 5 is somewhat annoying to do. The system is quite resistant to accidental deletions. The third HDD in another location saves almost all images in case of fire.

This is a great thread, and I’ll have to make some time to look at the various tools for photo organisation. Digikam never quite worked for me, and I’m now trying out Shotwell. It gets the tagging and organizing pretty much right for me (though I’m still getting used to the interface), but it also auto-generates JPEG versions of every raw file, which then confuse RawTherapee and other image viewers, which show them as separate photos.

Regarding Backups, I’ve made good experiences with the following tools, with Photos and other data:

Syncthing keeps directories on multiple machines synced, optionally keeping previous versions of files around when they’re replaced by a version from a different machine. It has a browser interface (which I’m not a big fan of), but there’s Synctrayzor, a tray application for Windows, and Syncthing-GTK for Linux, as well as a Syncthing plasmoid for KDE, all of which I find good ways of managing Syncthing. It’s running on all of my machines, including the (Synology) NAS and my phone, and I’m using it to keep working data synced between the office and my work laptop; it deals with firewalls and NAT fairly well, too. Very useful tool. Although its purpose is mostly synchronization, having updated copies on multiple machines, with some versioning in case of accidents, is also a decent insurance policy.

For cases where I have one “master” copy which I’m working on and require a backup (which I will never actively work with, just restore from), I find Back In Time a very useful tool. It stores compressed, incremental backups either locally or remotely, and has a nicely configurable storage/deletion strategy for older snapshots. So that gets you a nice versioned backup, with a decent GUI and no interaction required after set-up, or to restore data. Duplicati provides very similar functionality on Windows.

And the final data safety tool I like to use is BTRFS snapshots on my Synology NAS. It lets me configure very similar settings to Back In Time, except they’re just local snapshots which I can revert to in case some files get damaged. These also work if you’re using Windows, because the snapshots show up as “previous versions” in the folder properties dialogue of the network folder, so they can be easily browsed as if they were current folders, and you can copy files out of them as needed.
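
For anyone without a Synology, the same idea on a plain Linux btrfs volume looks roughly like the following (paths are just examples; Synology wraps this in its own scheduling UI):

    # read-only, dated snapshot of the photo subvolume
    mkdir -p /volume1/.snapshots
    btrfs subvolume snapshot -r /volume1/photos /volume1/.snapshots/photos-$(date +%F)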


Just a quick update:

If you’re using GNOME and/or you want a super-simple and very nice UI to Borg, there’s now Pika backup.

And it’s available as a Flatpak on Flathub:

https://flathub.org/apps/details/org.gnome.World.PikaBackup

As far as I can tell, it has the same featureset as Vorta (including mounting backup snapshots), with the notable exception of scheduled backups and regular expressions (both of which are currently not implemented, but planned). As both Vorta and Pika are powered by Borg, you can point either tool to an existing backup location too.

Use a NAS for storage of all your data. Use a RAID configuration. Use a file system like ZFS, which fixes bitrot every time you read a file.
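
As a rough sketch of what that looks like on the command line (the pool name and disk devices are just examples): a mirrored pool plus periodic scrubs is what gives ZFS the redundancy to actually repair bad blocks.

    # create a mirrored pool, then scrub it periodically (e.g. monthly from cron)
    zpool create tank mirror /dev/ada1 /dev/ada2
    zpool scrub tank
    zpool status -v tank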

Follow the 3-2-1 backup strategy:

  • 3 copies of every file
  • 2 copies locally, but in different storage types. E.g. one copy on your NAS, one copy on an external HD
  • 1 off-site backup.

I use FreeNAS as my server OS, and Backblaze B2 for offsite backup. I also take backups to an encrypted external HD that I store in my cellar.

My organization tactics are poor, just various directories and subdirectories on my drives.

My backup strategy is btrfs snapshots, sent to an external drive.

Strictly the file system. I have two 8 TB drives; one mirrors the other from cron. Git-annex looks too good to pass up. I’ll be switching over as soon as I have some spare time (the next ten years?).
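
The nightly mirror is just a one-line cron job, something like the following (mount points are examples):

    # crontab entry: mirror drive1 onto drive2 every night at 03:00
    0 3 * * * rsync -a --delete /mnt/drive1/ /mnt/drive2/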

/home/me/Camera/Critters/Yellowstone/09-01-2020/raw1 for instance. Darktable makes darktable_exported under that.

All jpegs or pngs are saved with a descriptive name that also contains the name of the raw, so the two can always be paired again at a later date. For instance:

DSC_1234.NEF might become DSC_1234_Charging-grizzly.jpg

or

DSC_1235.NEF might become DSC_1235_Subsequent-funeral.jpg
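
Because the raw’s name is embedded in the export, pairing them up again can even be scripted. A hypothetical helper, assuming the naming scheme above:

    jpg="DSC_1234_Charging-grizzly.jpg"
    raw="$(echo "$jpg" | cut -d_ -f1,2).NEF"   # -> DSC_1234.NEF
    find ~/Camera -name "$raw"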

On Linux I have the OS on a separate SSD, with /home mounted on one of the two big drives. I copy to a USB drive too, once a month or so. If the house burns down, I’m toast.

Git-annex looks totally cool.


If you mount /home on one disk and the OS on another (an SSD), then you can change Linux distributions every three or four days until you find the one you like best. I’m on the latest and greatest Mint now, but I’ve tried them all.

What I do may be of interest to other Windows users:

  1. Insert card into reader and copy raws into a “Working Photos” folder that resides on SSD
  2. Once I’m done editing them I’ll use darktable’s “Move” command to move raw + xmp to an “Archive” folder that resides on an external RAID 10 system.
  3. I use a free program called Cobian Backup (Windows only, I believe) to monitor for new/changed files in my archive directories, which it then copies to a folder in a mounted cloud storage service (I have unlimited storage with Google Drive, so that’s what I use; I’m somewhere between 2 and 3 TB of photo data on my RAID / G-Drive now).

There is some level of risk in the fact that there are no duplicates of the files until I’ve finished editing and relocated them to the “Archive” folder. I mitigate this by not deleting the files off the SD card immediately (my card is 128 GB, so there’s plenty of room for multiple shooting sessions with my type of work).

I also have Cobian backing up my darktable database file, and it’s set up to run a sync every time the computer boots, but it can be configured in lots of ways. It’s also dead simple to right-click the tray icon and select “Run All Tasks Now” if you’d like to force an immediate backup to the cloud.

Since I’m backing up my raws and the XMPs, I don’t really bother with backing up exported JPEGs. I have some automated tasks that will email any new files in specific folders to various email addresses (associated with Wi-Fi-connected digital frames at the grandparents’ houses). For the most part, my JPEGs do go into specific folders in Google Drive for multi-computer access as desktop backgrounds. But when making something such as a photo book, I don’t save the specific JPEGs I used; I just make a “duplicate” in darktable (if needed) and tag it with a tag specific to that photo book. That way, if I ever need/want to replicate the photo book, it should be fairly simple to search for that tag and re-export.

Not too different from most solutions here, I think, but I had not seen Cobian Backup mentioned (maybe I skimmed past it; if so, sorry!) and it’s been working great for me. I would certainly recommend it to any Windows users looking to back up in a manner similar to what I describe here.

You can import directly to your archive, then use darktable’s local copies feature to copy the raw files you want to edit. The feature will sync back just the XMP file when you’re done editing.


Interesting, thanks Mica! I’ll read up on that so I understand it. Sounds promising.