Cloud storage - Pros and cons?

Hi all,

My wife suggested I look into putting my work on the cloud as a safe form of backup, but I’ve always been sceptical (dubious) when it comes to the cloud.

I feel that if my work were on the cloud, I might somehow lose some control over it. It might not be that secure. How do I know that, even though it is encrypted, it really is safe from prying eyes? Or rather, not so much prying eyes as thieving hands (individuals or corporations).

How many of you use cloud storage? What service do you use, and why? What are your experiences? Is the cloud something you would recommend? What are its pros and cons from your point of view? What is a reasonable amount to pay per month/year?

Any help would be much appreciated!

My ‘cloud’ is a hot-plug SATA drive I insert into the SATA slot for backup. That’s fast and secure (because once it is removed after the backup, it can’t be damaged by overvoltage). For additional security, from time to time I back up to a second SATA drive, which is stored at my parents’ home (and vice versa). So if one home burns down, there’s still a not-too-old backup available for both.

I don’t use any kind of online storage. I have six external drives that hold redundant copies of my files managed by git-annex.

3 of the 6 drives use the gpg backend, and I rotate them between friends’ and parents’ houses.

The downside is that it isn’t as easy as cloud storage, but it’s not so difficult that I don’t do it.

“Cloud” is a marketing term. Your question is about safety and control. If you want to back your work up to one or more remote servers, I see no problem with that. They are probably more secure against unauthorized access than your personal computer. Authorized access (gov) is also not an issue since this is photography. As for control, read the licence agreement of the storage service you choose to use, to make sure you don’t lose any rights over your work other than giving said service the minimum rights they need to store the data.

If you are looking at storage providers: rsync.net, Amazon S3, maybe Amazon Glacier if you read the ToS very carefully, SpiderOak.

There is a sort of “rule of thumb” that I try to follow that makes sense.

If your work doesn’t exist on at least 3 different drives in at least 2 different physical locations, then it truly isn’t backed up. Some might even add at least 2 different types of media (magnetic disc + tape/etc…).

Personally, I keep my data on two different physical drives in my home, one primary that I normally work from, and one that gets rsync’ed to every night. There is a third drive at my office that also gets rsync’ed to every night. This was easiest as I could simply seed a large drive at home and physically walk it into my office. I also have a secondary Flickr account that I just sync all my jpg to for the heck of it (1TB of free storage with them, why not). Everything in it is private by default.
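For anyone who wants to sanity-check their own setup against that rule of thumb, here is a minimal Python sketch. The `Copy` class, the function names and the drive labels are all made up for illustration, and I am assuming all three drives are ordinary hard disks; the thresholds are the ones from the rule above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One copy of the data. All fields are free-form, hypothetical labels."""
    drive: str      # which physical drive holds this copy
    location: str   # where that drive physically lives
    media: str      # e.g. "hdd", "tape", "optical"


def is_backed_up(copies, min_drives=3, min_locations=2):
    """True if the copies span enough distinct drives and locations."""
    drives = {c.drive for c in copies}
    locations = {c.location for c in copies}
    return len(drives) >= min_drives and len(locations) >= min_locations


def has_media_diversity(copies, min_media=2):
    """Optional extra: True if at least `min_media` media types are used."""
    return len({c.media for c in copies}) >= min_media


if __name__ == "__main__":
    # Assumed layout: working drive and nightly mirror at home, third drive at the office.
    my_copies = [
        Copy(drive="primary", location="home", media="hdd"),
        Copy(drive="nightly-mirror", location="home", media="hdd"),
        Copy(drive="office-mirror", location="office", media="hdd"),
    ]
    print("3 drives / 2 locations:", is_backed_up(my_copies))
    print("2+ media types:", has_media_diversity(my_copies))
```

With the three-drive layout assumed here, the drive and location check passes, while the optional "two types of media" extra does not, since everything sits on hard disks.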

I’ve heard good things about crashplan or mozy (I think owned by Dell now) for creating another remote copy of your data as well as folks like Amazon. At the end of the day, as @Morgan_Hardwood already mentioned, chances are that it’s more secure with them than your own home network. :slight_smile:

Amazon Glacier is for long-term storage where retrieval is not time-sensitive (if you can wait several hours to get the data back). It’s also pretty cheap, I think: something like $0.007 per gigabyte per month.
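To put that rate in perspective, here is a rough back-of-the-envelope calculation in Python. It assumes the $0.007 per gigabyte per month figure is still current (check the provider’s pricing page) and covers storage only, not retrieval or request fees. The function name and the 1000 GB library size are just for illustration.

```python
def monthly_storage_cost(size_gb, price_per_gb_month=0.007):
    """Storage-only cost per month in dollars; retrieval fees are NOT included."""
    return size_gb * price_per_gb_month


if __name__ == "__main__":
    size_gb = 1000  # hypothetical library size
    monthly = monthly_storage_cost(size_gb)
    print(f"{size_gb} GB: about ${monthly:.2f} per month, ${monthly * 12:.2f} per year")
```

At that assumed rate, a 1000 GB library works out to roughly $7 a month for storage alone.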

Amazon Glacier can get pretty pricey if you want a lot of data back all at once. As I said, read the pricing carefully.

I’ve used the cloud for a number of years: Flickr for family albums, and Dropbox mainly for sharing and as a place to put groups of photos in temporary storage until I back up to an external hard drive.

Biggest con: my internet speed is awful, so it can take an age to upload :disappointed:

If you really want to minimise the chances of unwanted access to your photos despite using someone else’s storage, you can always encrypt before syncing. I use duplicity, which uses GnuPG, to encrypt potentially sensitive information before sending data to remote locations.

http://duplicity.nongnu.org

However, I imagine the time overhead of encryption with a largish photography library could be rather inconvenient. As I consider it unlikely that anyone is going to be targeting my photography, unless I had photographs of a sensitive nature I probably wouldn’t bother encrypting before uploading to places like rsync.net.

Data uploaded to Spideroak is, I believe, encrypted and made unreadable, even to the provider, as part of the automated process. (“Zero-knowledge”.) I do use Spideroak for some pre-encrypted work and personal data, and it’s been very easy to use so far.

Actually, I would expect the big ones to automatically process all images that you upload, extract metadata and maybe do some automated image processing like face detection. If not now then in the future, and if not themselves then by some government agencies. Assuming anything else after all the things we know since Snowden (and assumed for a decade before) is careless.

Always remember: There is no cloud, just other people’s computers.

Yep, agreed. I was thinking primarily of targeting in the sense of using my landscape/hobby shots for commercial gain rather than personal photos including people when I was talking about my photography. But even then, I suppose it does make sense to encrypt anything that’s destined for storage out of one’s control and be done with it.

Using duplicity even on large collections of data may not be that slow actually; I know it keeps meta information locally so it may be able to quickly work out which files need re-encrypting based on checksums before it starts the process.
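To illustrate the general idea of using locally stored checksums to skip unchanged files, here is a minimal Python sketch. This is not how duplicity is actually implemented; the function names and the manifest format (a plain dict of relative path to SHA-256 digest) are assumptions made purely for the example.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large photos don't fill memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def changed_files(root, manifest):
    """Compare files under `root` with a previously saved manifest.

    Returns (changed, current): `changed` lists relative paths that are new
    or whose digest differs; `current` is the manifest to keep for next time.
    """
    root = Path(root)
    current = {}
    for p in sorted(root.rglob("*")):
        if p.is_file():
            current[p.relative_to(root).as_posix()] = file_digest(p)
    changed = [name for name, digest in current.items()
               if manifest.get(name) != digest]
    return changed, current
```

Only the files returned in `changed` would then need to be encrypted and uploaded again; the manifest itself stays on the local machine. Note that this sketch still has to read every file to hash it, so a real tool would likely check size and modification time first.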

I have used several “cloud” storage options. Currently, I use Dropbox Pro (1TB for $99 a year), because it offers the best bang for the buck and has a fairly seamless Linux experience. It’s not open source, and I do have some mild concerns about security, but not enough to stop me from using it. I’ve found it to be the best at syncing across multiple computers and for collaborating with others. I currently use Dropbox to back up absolutely all of my data, about 350GB. This includes all of my academic work and data sets, as well as photography and other personal things.
I also have a Google Drive account with 15GB, but only use it for certain things. I use Google Photos as a second “cloud” backup of my photos, but it limits them to 12 megapixels or so. I upload all of my “good” photos to Flickr to back up and share in full resolution.
In the past I have used “spideroak” and “crash plan”. Both have Linux clients, and both are more foss-friendly than is Dropbox. Crash plan offers the most flexibility for backups, and is probably the most secure. It was more expensive than Dropbox for less storage, however, and is more difficult to sync across computers (you need to upgrade to the “business” version). Spideroak was good, but was difficult for collaboration via shared files and folders. If it was just me and one computer to back up, I might still be with one of these two services, but since I have other needs, it’s Dropbox for me for the foreseeable future.
Oh, and I used to use Ubuntu One cloud backup. I loved it. It was basically an open source Dropbox. But, alas, Canonical killed it a few years back. :frowning:

@patdavid mentioned this, but I thought I’d link to the whole article: http://www.dpbestflow.org/backup/backup-overview

I think that article provides a very nice baseline for keeping your data backed up.

I would hesitate to call files in Dropbox a backup, especially if the Dropbox is being synced everywhere. Dropbox has lots of redundancy on their end, and certainly it is another copy of your files, but it is not a backup. What if you accidentally remove all the files and don’t notice? Note that this isn’t aimed at Dropbox specifically, but at any solution that automatically syncs your files.

I’ve used CrashPlan, but it didn’t work out. It took lots of resources on my old box, and then it stopped working for some reason; their only response was to tell me to uninstall and reinstall, which did not work. I did some investigating, and it turned out to be a compatibility issue between GTK+ and Java (in which the CrashPlan client is written), but they ignored my advice and kept telling me to uninstall/reinstall.

Anyway, what I use now is Duplicati v2 (OSS software with strong encryption) + 1TB of storage from Microsoft OneDrive that comes with Office 365 (I don’t use the software, only the storage).

Locally, I store my OS, software and some data on an SSD; large files such as photos and videos go to an HDD. All of that is backed up to an external HDD using dirvish, another nice piece of OSS that would work remotely too, as it uses rsync. Its main advantage is that it lets me keep several days of backups in a small space, since it simply links unchanged files from yesterday’s backup directory into today’s and transfers only new or changed files. Since dirvish uses simple directories for backup, restoring is a matter of simply copying back whatever I need.
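To show why linking unchanged files keeps several days of backups small, here is a minimal Python sketch of the hard-link idea. It is not dirvish’s own code (dirvish leaves the work to rsync); the `snapshot` function and its arguments are made up for the example, and it assumes both backup directories are on the same filesystem, since hard links cannot cross filesystems.

```python
import filecmp
import os
import shutil
from pathlib import Path


def snapshot(source, previous, today):
    """Create a full-looking backup of `source` in `today`.

    Files identical to the copy in `previous` are hard-linked (no extra
    space used); new or changed files are copied. `previous` may be None
    for the very first run. Returns (linked, copied) counts.
    """
    source, today = Path(source), Path(today)
    previous = Path(previous) if previous else None
    today.mkdir(parents=True, exist_ok=True)
    linked = copied = 0
    for src in sorted(source.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source)
        dst = today / rel
        dst.parent.mkdir(parents=True, exist_ok=True)
        old = previous / rel if previous else None
        if old is not None and old.is_file() and filecmp.cmp(src, old, shallow=False):
            os.link(old, dst)        # same data on disk, second directory entry
            linked += 1
        else:
            shutil.copy2(src, dst)   # new or changed: store a fresh copy
            copied += 1
    return linked, copied
```

Each day’s directory looks like a complete copy, so restoring is still just copying files back, while unchanged files take up space only once.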

I have something similar to @patdavid’s setup. The work computers are rsync’ed to an external 3TB USB drive. There’s a second USB drive at my parents’ house, and every time I visit I just swap them. Both drives are LUKS-encrypted in case of robberies. The drives are btrfs-formatted, and on every successful rsync, my script makes a snapshot (having snapshots has saved me many times, and I wouldn’t call it a backup without some way of going at least a little bit back in time). Since the drives are encrypted, I just back up everything from the home directories (with rsync excludes for ~/.cache and similar).

The only issue is that those 3TB are filling up, and 2x4TB is a bit expensive (and I don’t want to start RAIDing drives together, which would take more physical space and mean more things that can go wrong). I’ve seen “unlimited” cloud storage providers, but 1) the first upload will take forever, 2) none of the unlimited ones provide client-side encryption, and 3) they typically require some proprietary client-side program that needs to survive updates, keep up its Linux support, etc. (there’s always something). Maybe if tarsnap comes out with an unlimited option, but I find that unlikely.

I have used Flickr sync (or something like that) in the past on Windows. It worked perfectly, and it was a free option for up to 1TB. I am not sure if it still works or if there is a Linux equivalent. Flickr has made many major changes to its API over the last few years, and many of those third-party tools can’t seem to keep up.