Bash script as rapid photo downloader

For those who (from a terminal window) want to copy files from a card (perhaps removed from the camera and plugged into a USB card reader) to a local directory on Linux, the “cp” command is annoyingly slow. dd is faster.

I find the following useful. Does anyone have anything faster? Or slicker? More concise? Something for copying images directly from the keyboard with bash?

This script does assume the card will be mounted under /media/$USER, which has always been the case for me.

 #!/bin/bash

 SUFF=NEF  ## edit to the file suffix you use. This is for files like PIC_1234.NEF

 ## or send the desired suffix in as the first argument to the script
 if [[ -n $1 ]]
 then
        SUFF=$1
 fi

 ## the ugly sed below hacks around spaces in directory names:
 ## it swaps each space for the placeholder 0237 so the for loop
 ## does not split paths on whitespace, then swaps it back
 for file in `find "/media/$USER" -name "*$SUFF" | sed 's/ /0237/g'`
 do
          ffile=`echo "$file" | sed 's/0237/ /g'`  ## restore the spaces
          base=${ffile##*/}                        ## strip the directory part
          dd if="$ffile" of="$base" 2>/dev/null    ## silence dd's byte-count chatter
 done
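
For example, assuming you save the script as getpics.sh (the name is just a placeholder):

 chmod +x getpics.sh
 ./getpics.sh       ## copy all *.NEF files found on the card to the current directory
 ./getpics.sh CR2   ## or pass a different suffix, e.g. for Canon raw files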

It is extremely strange that there is a noticeable difference between cp and dd, especially since you’re not specifying a block size to dd, which on most implementations defaults to a block size of 1, and that is HORRIBLE for performance.

What distro are you on that has a dd version which doesn’t default to bs=1?

Interesting. I’ll look into it. I’ve been using this script with cp for years. I changed to dd and it got noticeably faster. I’m heading out to my (boat) shop now. I’ll get back to it.

I thought cp does buffering that dd doesn’t do, so when dd ran faster that made sense to me.

Was rsync an option (speed-wise)? I always default to rsync out of paranoia about interrupted transfers, for whatever reason.
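
For reference, a minimal rsync equivalent might look like this (the card path is an assumption):

 ## copy everything under the card's DCIM folder to the current directory;
 ## --partial keeps partially transferred files so an interrupted run can resume
 rsync -av --partial --progress "/media/$USER/CARD/DCIM/" .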


I use an exiftool one liner for this. It renames files according to metadata as well. I can share it a bit later on.
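
(Not the actual one-liner, but a sketch of the pattern, with the card path as an assumption: exiftool can copy files off the card while renaming them from their EXIF timestamp.)

 ## copy images from the card, renaming each from its DateTimeOriginal tag;
 ## with a dynamic -FileName, -o makes exiftool copy instead of move, and
 ## %%-c appends a copy number if two shots share the same timestamp
 exiftool -o . '-FileName<DateTimeOriginal' -d '%Y%m%d_%H%M%S%%-c.%%e' /media/$USER/CARD/DCIM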

OK I’m an idiot. Well I’m old so I have an excuse.

The “cp” script I have used for years echoes the name of each file as it copies.

The “dd” version sends all of its terminal output to /dev/null instead, so on a card with a zillion files that makes a huge time difference. But it has nothing to do with cp vs dd.
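
A quick way to see the effect, with a made-up loop over ten thousand fake filenames:

 ## printing each name to the terminal vs discarding the output
 time for i in $(seq 1 10000); do echo "PIC_$i.NEF"; done
 time for i in $(seq 1 10000); do echo "PIC_$i.NEF"; done > /dev/null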

I stand chastised and humiliated. By myself. It is a useful script, though. It can be made to work with cp or dd.

Hah! Yeah, terminal output can sometimes have surprisingly negative consequences on performance…

I always wondered what the problem was. I use mv to move files from the card to the HDD (less paranoid than some; mv across filesystems is actually safe). I just measured 77 MB/s in the integrated card reader of my PC. Given that the card (a SanDisk 32GB Extreme Pro) is rated at 95 MB/s and the target is a LUKS-encrypted HDD, I doubt I can get much faster anyway.
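
Roughly like this (the mount point and folder names are assumptions; they vary by card and camera):

 ## move the raw files off the card; across filesystems, mv copies each
 ## file and only unlinks the source after the copy succeeds
 mv /media/$USER/CARD/DCIM/100NIKON/*.NEF /path/to/photos/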

Hi Andy,

What distro do you use that has a bs default = 1?
Manjaro == Debian == Gentoo == bs default = 512.

Have fun!
Claes in Lund, Sweden

Hmm, maybe it has changed in the 5-6 years since I last made sure to always use a MUCH larger number.

Even 512 is honestly too small if you want decent performance. For multi-megabyte files I usually make it bs=1M.
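
In the script above, that would mean changing the copy line to something like:

 ## 1 MiB blocks instead of the 512-byte default
 dd if="$ffile" of="$base" bs=1M 2>/dev/null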

Hi @pittendrigh, since you are asking about performance in another discussion thread, you are welcome to benchmark your script against Rapid Photo Downloader and share the results here. Do keep in mind that Rapid Photo Downloader does other things on the system as it downloads, like creating thumbnails for use in programs like Gnome Files. If you do benchmark, be sure to share details about your CPU and storage media, because they make a difference (Rapid Photo Downloader uses multiple cores and can write to multiple destinations in parallel).

Ok. Will do.

Rapid Photo Downloader is cool software. I try to do as much as possible with the keyboard (thereby avoiding menu manipulation with a mouse). But not always. I started off with Borland C++ in the early 1990s and ended up with vi in a terminal window, 20 years later.

I believe you can operate RPD from the command line.

rapid-photo-downloader --help …now that is cool. Thank you


That’s funny. I started with Borland Turbo Pascal for DOS (which was not cheap!), but my first job coding used vi (and word perfect for Unix). These days I use PyCharm, which is an impressive editor, even on a year 2010 Thinkpad. I’ve pretty much forgotten vi. Except for dd, for some reason!

Borland C++. I’m not trying to make this a sparring match. The only contests I’m guaranteed to win are about age and perhaps stubbornness. I got my CS degree in 1995. When I was 45. After 23 years swinging a hammer. And roughnecking in the oil fields. I’m comfortably obsolete now.



Interesting… Multiple cores??? For a file transfer program???

That seems like a great recipe for I/O thrashing to me? Maybe not a huge issue with flash memory as the source media, but unless you’re doing a really oddball variation of RAID striping (one file to drive A, one to drive B), it sounds like a way to make the heads of a rustspinner bounce back and forth constantly, killing transfer rates?

I haven’t seen a file transfer operation be CPU-bound since the PIO IDE days… What am I missing?

(Edit: I assume you’re primarily using multicore for thumbnail generation?)

Read this: Under the Hood of Rapid Photo Downloader
That should answer your questions, but if not, feel free to ask me here.

One thing I may not have mentioned in that document is that libgphoto2 is not multithreaded, meaning that to access multiple cameras in parallel, multiple processes are an absolute necessity.

Also keep in mind the program can be configured to back up as it downloads too, and people can have multiple backup devices.

Aaah. Interesting. Are there defaults that attempt to prevent the thrashing cases (e.g. multiple files read from a source device simultaneously, or multiple files written to a single destination device simultaneously)?

One source with multiple simultaneous destinations (e.g. the multiple-backup use case you describe) would probably be one situation where multicore outside of thumbnailing would be beneficial. Overall it appears the primary multicore use case is thumbnailing?

(In case you haven’t guessed, I’ve experienced severe system degradation due to I/O thrashing many times.)

If you look at the diagram, you will see that there is one copy process per device. So that’s impossible.

When downloading from multiple devices to the computer itself (not backup drives), that’s possible. But it’s normally not an issue because copying from external devices is normally much slower than copying to the hard drive / SSD on the computer.

No, not at all. The point behind using multiple processes is (1) to make the program easier to code and maintain, and (2) performance. Python has a global interpreter lock. A multithreaded Python program uses only one core, which is not good now that most computers have four or more cores. The main Rapid Photo Downloader process draws the GUI, does some housekeeping, and coordinates everything else. The main work is done in a bunch of helper processes, each of whose task is to solve a particular problem in coordination with the other processes.

Consider the timeline. Generating that is computationally expensive. Some people download several hundred thousand files at a time from multiple devices (e.g. time-lapse photographers). The GUI has to be responsive while the timeline is being generated, so timeline generation is done in its own process.

The multi-process approach is elegant and fast.