I feared as much. It’s probably not surprising, then, that some users get into all sorts of problems.
A NAS in itself won’t prevent data loss, which can occur in more ways than just a drive failure. As @patdavid said, you should follow the 3-2-1 rule:
- 3 copies of your data,
- on 2 different types of media,
- with 1 copy off-site.

Keeping the SD card is actually a great idea: it gets you 2/3 on the first point, 2/2 on the second, and 1/1 on the third if you store the cards elsewhere. Now you just need a third copy somewhere. Have you considered just syncing that USB drive elsewhere?
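Syncing it off-site can be as simple as an rsync one-liner run from cron; the host and paths below are made-up examples:

```shell
# Mirror the USB drive to a remote machine over SSH.
# -a preserves permissions and timestamps; --delete mirrors removals
# (leave it out if you want extra protection against accidental deletes).
rsync -a --delete /media/usb-backup/ backuphost:/srv/photo-backup/
```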
P.S. those are just the first ones I found. There are plenty more, and probably cheaper ones.
I back up to a Raspberry Pi sitting on the local network with a USB disk attached. I was already using the Pi for various server duties (CalDAV, movies, music, git-annex) and enlisted it for backups as well.
So I guess I use a NAS, but not for the actual storage of files, only for backups. I have a workstation and a laptop in addition to my partner’s devices. The image library is on the workstation’s hard drive, and I no longer do any raw editing on my laptop.
I’m no expert but have been told that
- raid is not backup
- git-annex is not backup
The above makes sense to me. So I use backup software. Currently https://restic.net/.
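For reference, a minimal restic workflow looks roughly like this; the repository path is just an example:

```shell
# Create an encrypted repository once (you'll be prompted for a password).
restic -r /mnt/nas/restic-repo init

# Back up a directory; restic deduplicates, so repeat runs are fast and cheap.
restic -r /mnt/nas/restic-repo backup ~/Pictures

# List snapshots and verify the repository's integrity from time to time.
restic -r /mnt/nas/restic-repo snapshots
restic -r /mnt/nas/restic-repo check
```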
In addition to the “nas” I have a USB drive that I occasionally run manual backups to. Unfortunately, that one uses abandoned software. I also print 10x15cm old-school photos of a selection of family pictures, which I store in a big stack.
The idea is that the latter USB drive shall leave my flat, and so shall my full backup drive. This has not happened… so that’s a big fail on the “1 copy off-site” part on my end.
I agree that RAID is not a backup, but git-annex is different: it deliberately copies files around and makes sure there are multiple copies (like RAID), but unlike RAID, it keeps you from deleting a version without a trusted copy unless you confirm. Even when a file is marked as “unused” in git-annex (i.e. it has been overwritten by a new version), it won’t let you delete it unless you use `--force` or there are enough copies elsewhere.
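You can see that behaviour in a throwaway repository; the filename and numcopies value below are just examples:

```shell
# Set up a fresh annex and require at least two known copies of each file.
git init photos && cd photos
git annex init "laptop"
git annex numcopies 2

# Annex a file and commit the symlink git-annex leaves in its place.
git annex add IMG_0001.raw
git commit -m "add photo"

# With no other repository holding a copy, this refuses to delete the
# content; it only succeeds with --force or once enough copies exist.
git annex drop IMG_0001.raw
```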
You otherwise seem to have an excellent setup, and running backups on top of git-annex doesn’t hurt, even though I have personally decided not to back up files archived by git-annex. Maybe that will bite me in the future, but so far it works well.
I am definitely considering doing 10x15cm (4x6"?) prints of my best shots as well, although that’s not as good a physical backup as film was back then…
In and of itself, RAID is not a backup. But a RAID array can be used as the media to contain a backup. Depending on the RAID level, RAID can help avoid data loss due to hardware errors.
Yes, and perhaps most importantly, you can have offline copies on various media. However…
https://git-annex.branchable.com/not/ note the first bullet point
what git-annex is not
git-annex is not a backup system. It may be a useful component of an archival system, or a way to deliver files to a backup system. For a backup system that uses git and that git-annex supports storing data in, see bup.
I actually don’t use git-annex for my photo library. I find the risks of foot-shooting too great. This is from personal experience with less important data.
I forgot that I have another layer of redundancy: 7k select copies of my photos are downsized to 2400x2400px and available from my websites (a fair few of those behind authentication).
Well… I guess I disagree with joeyh there. But then again, maybe the point is rather that you don’t need backups when you have proper archival policies in place…
Who needs backups when you have a first-class archiving system like git-annex anyways?
This is drifting off-topic (into git-annex land; can we split this thread out?), but I have had very good experience with git-annex in general. I lost data only once: it was a single movie that I hadn’t copied to my other locations and forcibly removed by mistake. Definitely foot-shooting, and git-annex allows it, but I don’t know of any backup system that could have kept me from pulling that trigger, apart maybe from bup, because it doesn’t support removing older snapshots.
I’d be curious to hear what software you use for that and where it’s hosted. I host my own gallery at home with sigal but my poor uplink is showing its weakness for large images…
Funny you should ask, because if I’m not mistaken I’m using some code you’ve touched for the geolocation/map stuff. I’m hosting with hosthatch OpenVS (because cheapskate) in a location close to where I am. I’m generating my sites/galleries with https://ikiwiki.info. Hehe, not super fast with 3k photos, but I upload once a week to the private site, and the 20-minute generation time doesn’t bother me as it’s fire-and-forget. My mother, however, finds the commenting a bit slow…
Edit: I’ve looked into sigal as well; realistically it does most of what I need. I do intend to expand, complicate, and elaborate on my sites, which prevents me from switching to such simple software at the moment.
About RAID: I’m not well read on the topic, but I thought it’s an uptime/reliability thing, only useful against hardware failure? So essentially about decreasing time lost at failure, and not that useful compared to backups when restore time is a non-issue. I consider myself and other users an equal or greater risk to data than hardware.
Correct. The only reason I’m thinking of RAID here is because if you want to collate multiple drives to make a larger one, the failure of any of those drives will take down the entire array. Therefore it becomes more important to have reliability there. The idea would be that instead of having (say) one large 4TB drive, you’d have (say) 4x2TB drives in RAID-10. Performance and reliability would be better, and you can reuse older drives you might otherwise throw away.
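On Linux, such an array could be sketched with mdadm; the device names below are assumptions, and creating the array destroys whatever is on those drives:

```shell
# Assemble four 2TB drives into RAID-10: ~4TB usable, striped and mirrored.
mdadm --create /dev/md0 --level=10 --raid-devices=4 \
    /dev/sda /dev/sdb /dev/sdc /dev/sdd

# Watch the initial sync, then check array health.
cat /proc/mdstat
mdadm --detail /dev/md0
```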
Synology RS814 NAS using four 1TB WD ‘Y’ type server hard drives since about 2014. The RS814 communicates via link aggregation to a smart switch and thence to the household PCs using NFS. NAS data are stored using Synology’s Hybrid RAID. Digital video can be played in real time from this device. This reminds me that I should test (for the fun of it) whether it can support two PCs playing video. The stored photography files do not yet take up enough space to consider upgrading to larger disks or adding a piggy-back NAS extension.
Thanks to this topic for reminding me it’s time to back-up my back-up.
Do you guys keep your NAS running 24/7? Just wondering how long HDDs would live in these conditions.
I do shut my NAS down, but it’s under my desk, so it’s not a pain to turn it back on. I’ve had some WD Red disks for about 4 years.
Absolutely. Turning things on and off is annoying and error-prone: in particular, my SSHFS setup doesn’t deal well with disconnects, and NFS is not much better. If you do go that route, wake-on-LAN is a good avenue to consider…
HDDs are designed to keep running. It’s counter-intuitive, but spin-up/down can impose more stress on the drive than just the regular spinning. As for the actual numbers, this is usually part of the specs, and also includes how much data is written to the drive.
For example, the Seagate IronWolf 8TB has the following specs, according to Newegg:
- Workload rate of 180TB/year.
- Always-on, always-accessible 24×7 performance.
- 1M hours MTBF, 3-year limited warranty.
If we are to believe those specs (and if my math is right), the drive could run continuously for 114 years. Notice how that stands in stark contrast with the 3-year warranty.
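The back-of-the-envelope math, for the record:

```shell
# Convert 1M hours MTBF into years of continuous (24x7) operation.
mtbf_hours=1000000
hours_per_year=$((24 * 365))             # 8760
echo $((mtbf_hours / hours_per_year))    # prints 114
```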
Those drives are designed to stay on and keep a load. You do pay a little more, but if you’re going to be building a NAS anyways, I think it’s worth the cost.
The real problem with drives is random, out-of-spec failures of all sorts. Some drives will just die earlier than the specs suggest, for no reason: you can return them, get an RMA and a new drive, but your data is still lost. This is why checksums and/or backups and/or RAID are so critical to data integrity and/or reliability and/or high availability. Ultimately, things fail, if only because cosmic radiation will flip a bit on the disk or the controller and damage your data.
Keeping your drives running at least allows you to monitor their health continuously. The real trade-off, for me, is not reliability, but power usage and the associated environmental impact. Then you need to make other calculations regarding the power usage at spin-up/down time and continuous use, which are much harder because less clearly specified.
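That monitoring is typically done with smartmontools; a quick sketch, where the device name is an example:

```shell
# Full SMART report: attributes, error logs, power-on hours, temperature.
smartctl -a /dev/sda

# Start a short self-test; the drive runs it in the background.
smartctl -t short /dev/sda

# For continuous monitoring, the smartd daemon (configured in
# /etc/smartd.conf) can poll drives and alert on degrading attributes.
```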
Yes, turning such stuff on and off reduces its operating life. There’s inrush current that stresses the electronics, and, more significantly, there are friction and mechanical stress dynamics associated with getting masses such as disk platters spun up.
All of this points to having a well-considered redundancy strategy. And it’s not mirrored RAID, especially with same make/model/purchasedate HDDs…
I think RAID is about availability not redundancy.
Depends on the mode. RAID 1 is “mirroring”, where the identical data is written to at least two separate drives. That is redundancy supporting availability…
Certain RAID modes are also about performance, spreading the head movement to access data across multiple drives. This was actually the original incentive behind the concept.
RAID (Redundant Array of Independent Disks, originally Redundant Array of Inexpensive Disks)
I have never used a NAS but I want to try it. I have a Raspberry Pi kicking around and was thinking of starting with something very simple: an RPi with two external HDDs shared over Samba. I will back the HDDs up to another drive via rsync once in a while. The main application will be hosting the image collection for digiKam. I also develop the images I like the most using darktable.
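For what it’s worth, the Samba side of such a setup can be a very small smb.conf share definition; the share name, path, and user below are assumptions:

```ini
[photos]
   path = /mnt/photo-disk
   valid users = me
   read only = no
   ; keep "guest ok" off so only authenticated clients can write
```

Both the Windows 10 and Linux clients can mount that share, and digiKam can then point its collection at the mounted path.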
The host (RPi) will be connected over 5.8 GHz wi-fi but I can hard wire it if required. The clients are two laptops (one with Linux, the other one with Windows 10) and possibly a desktop PC (Linux). Both laptops support 5.8 GHz wi-fi.
I don’t think I will ever need to access the collection from outside of my home network.
Do you think this setup is going to work well? I suspect it will be a bit slow, but I hope the speed will be acceptable for what I do.