The Rapid Photo Downloader 0.9 process model

For those who are into programming or just curious, this is a model of OS-level processes that Rapid Photo Downloader uses when it runs:

Rapid Photo Downloader is written in Python. Python threads cannot run Python code on more than one CPU core at a time because of CPython’s global interpreter lock. To get the best performance on modern multi-core computers, I therefore designed the program to use multiple OS-level processes that communicate with each other using the 0MQ messaging library.

Each box in the diagram represents one, and sometimes more than one, OS-level process. A dashed line from the main process means the process it points to is typically short-lived. For example, the Scan Device process exists only as long as it takes to scan a device, and it exits as soon as the scan is finished. Because the program can scan multiple devices at once, the diagram shows two lines going to the Scan Device process. Two lines keep the diagram simple. In reality, the number of scan processes running at any one time matches the number of unique devices being scanned.
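As an illustration of the "one short-lived process per device" idea, here is a minimal sketch using Python’s standard multiprocessing module instead of 0MQ, so it runs without extra libraries. The device names, file lists and the scan_device/scan_all functions are hypothetical, and the sketch assumes a POSIX system where the "fork" start method is available. Each scan process reports its result and then exits by returning from its target function.

```python
import multiprocessing as mp

# Hypothetical device contents standing in for a camera and a memory card.
DEVICES = {
    "camera": ["IMG_0001.CR2", "IMG_0002.JPG", "MVI_0003.MOV"],
    "memory-card": ["DSC_0100.NEF", "DSC_0101.NEF"],
}

PHOTO_EXTENSIONS = (".CR2", ".NEF", ".JPG")
VIDEO_EXTENSIONS = (".MOV", ".MP4")


def scan_device(device, files, results):
    """Body of one short-lived scan process: classify files, report, exit."""
    photos = sum(f.upper().endswith(PHOTO_EXTENSIONS) for f in files)
    videos = sum(f.upper().endswith(VIDEO_EXTENSIONS) for f in files)
    results.put((device, photos, videos))
    # Returning from the target function ends this process.


def scan_all(devices):
    """Start one scan process per unique device and wait for all of them."""
    ctx = mp.get_context("fork")  # assumption: a POSIX system with fork()
    results = ctx.Queue()
    procs = [
        ctx.Process(target=scan_device, args=(name, files, results))
        for name, files in devices.items()
    ]
    for p in procs:
        p.start()
    # Collect one report per process before joining, so no child blocks on the queue.
    summary = {}
    for _ in procs:
        device, photos, videos = results.get()
        summary[device] = (photos, videos)
    for p in procs:
        p.join()
    exit_codes = [p.exitcode for p in procs]
    return summary, exit_codes


if __name__ == "__main__":
    print(scan_all(DEVICES))
```

Because each process is started for one scan and joined afterwards, the number of live scan processes never exceeds the number of devices being scanned.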

Solid lines between processes indicate that the process runs for as long as Rapid Photo Downloader is running. As soon as the program starts, Rapid Photo Downloader launches several processes: one to rename files and generate subfolder names, another to act as the load balancer for thumbnail extraction, two to four thumbnail extractor processes, and a helper process that takes computationally expensive tasks off the main GUI process.
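The set of long-lived processes started at launch can be summarized in a short sketch. The post does not say how the program chooses between two and four extractors. Tying that number to the CPU core count is an assumption made here for illustration, and the process labels are hypothetical.

```python
import os


def extractor_count(cores):
    """Clamp the number of thumbnail extractors to the 2-4 range.

    Assumption: the count follows the CPU core count; the real program's
    rule is not described in the post.
    """
    return max(2, min(4, cores))


def long_lived_processes(cores=None):
    """Names of the processes that run for the whole session (hypothetical labels)."""
    cores = cores or os.cpu_count() or 1
    names = ["rename-files", "thumbnail-load-balancer", "offload-helper"]
    names += [f"thumbnail-extractor-{i}" for i in range(1, extractor_count(cores) + 1)]
    return names


if __name__ == "__main__":
    for name in long_lived_processes():
        print(name)
```

Under this assumption, even a single-core machine still gets two extractors plus the other long-lived processes. That overhead is the start-up cost discussed further on.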

To keep the code as simple as possible, I designed the program so that different units of code handle specific tasks. For example, the Copy Files process does only one thing: copy photos, videos and associated files (XMP files, WAV files etc.) from a device (camera, memory card, hard drive etc.) onto the computer, and that’s it. It doesn’t know anything about backing up the files or generating thumbnails. The Rename Files process generates file and folder names, renames the files that have just been copied, and creates subfolders to put them in. The Backup Files process copies files that have just been copied and renamed from the computer to a backup device. Each backup device has its own backup process. For instance, if you back up to two external drives, two backup processes will run for as long as those drives remain plugged in.
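The rule "one backup process per backup device, for as long as the device is plugged in" boils down to simple bookkeeping. The sketch below models it in a single process. The BackupProcessRegistry class is hypothetical, and its integer worker ids stand in for what would be separate OS processes in the real program.

```python
class BackupProcessRegistry:
    """Hypothetical bookkeeping: one backup worker per mounted backup device."""

    def __init__(self):
        self.workers = {}  # device mount path -> worker id
        self._next_id = 1

    def device_mounted(self, path):
        # Start a worker only if this device doesn't already have one.
        if path not in self.workers:
            self.workers[path] = self._next_id
            self._next_id += 1
        return self.workers[path]

    def device_unmounted(self, path):
        # The worker's lifetime ends when its device goes away.
        return self.workers.pop(path, None)

    def backup_targets(self, filename):
        # Every copied and renamed file is handed to each backup worker.
        return [(wid, path, filename) for path, wid in sorted(self.workers.items())]


if __name__ == "__main__":
    registry = BackupProcessRegistry()
    registry.device_mounted("/media/backup1")
    registry.device_mounted("/media/backup2")
    print(registry.backup_targets("photo_0001.jpg"))
```

Keeping this mapping separate from the copying code matches the design goal above. The Copy Files stage never needs to know how many backup drives exist.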

Although it’s probably not obvious from the diagram at first glance, when downloading simultaneously from multiple devices (e.g. two memory cards), a separate process for each device copies the files onto the computer, but only one process renames them. That keeps the name generation code as simple as possible. Because some aspects of generating file names can be fairly complex, especially sequence numbers, the simpler the overall renaming process is, the better.
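The value of having a single renaming process shows up most clearly with sequence numbers. In the sketch below, messages from several copy processes arrive interleaved, yet one renamer hands out every number, so no two files can receive the same number. The naming scheme, the Renamer class and the interleave helper are hypothetical and are not Rapid Photo Downloader’s actual naming code.

```python
from itertools import zip_longest


class Renamer:
    """Single renaming stage shared by every copy process (hypothetical sketch)."""

    def __init__(self, start=1):
        self.sequence = start  # the only place a sequence number is ever issued

    def rename(self, device, original):
        ext = original.rsplit(".", 1)[-1].lower()
        new_name = f"photo_{self.sequence:04d}.{ext}"
        self.sequence += 1
        return device, original, new_name


def interleave(*streams):
    """Simulate "file copied" messages from several copy processes arriving mixed together."""
    for batch in zip_longest(*streams):
        for message in batch:
            if message is not None:
                yield message


if __name__ == "__main__":
    card_a = [("card-a", "IMG_0001.JPG"), ("card-a", "IMG_0002.JPG")]
    card_b = [("card-b", "DSC_0009.NEF")]
    renamer = Renamer()
    for message in interleave(card_a, card_b):
        print(renamer.rename(*message))
```

If each copy process renamed its own files instead, the processes would have to coordinate a shared counter. A single consumer avoids that coordination entirely.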

You might have already guessed from the diagram that the thumbnailing part of the program is the most complex. It’s complex because I carefully optimized it for speed. Thanks to its load balancer and multiple processes, Rapid Photo Downloader can extract and resize thumbnails from one file’s metadata while it is simultaneously copying another file’s metadata. Consider, for instance, generating thumbnails for photos on a DSLR. To get a file’s metadata, the program must call libgphoto2 to copy that metadata from the camera onto the computer. So Rapid Photo Downloader dedicates a process to doing only that (the Thumbnailer box in the diagram). All that process does is check whether the file already has a thumbnail in Rapid Photo Downloader’s thumbnail cache. If it doesn’t, the process copies the photo’s metadata, which contains the photo’s embedded thumbnails. It then tells the load balancer that one of the Thumbnail Extractors can be assigned the task of extracting that thumbnail. The Thumbnail Extractor process simply extracts and resizes a thumbnail from the metadata if it can, or renders it from the file itself if required. Because there are at least two and up to four extractors running at any one time, the extractors can work on multiple files simultaneously. The extractors always work on metadata or files that are already on the computer, whether in memory or on disk. They don’t need to know whether a file just came from a camera, a hard drive or a memory card. They have a specific task and that’s all they work on.
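The load balancer’s job can be modelled in a single process, in the spirit of the load-balancing broker pattern described in the 0MQ Guide. In that pattern, workers announce that they are ready, and the broker hands each ready worker the next queued task. This sketch is a hypothetical stand-in: in-memory deques replace sockets, and the class and method names are invented for illustration.

```python
from collections import deque


class LoadBalancer:
    """Pairs queued thumbnail tasks with idle extractors (hypothetical sketch)."""

    def __init__(self):
        self.idle_workers = deque()
        self.pending_tasks = deque()
        self.assignments = []  # (worker, task) pairs in dispatch order

    def worker_ready(self, worker):
        # An extractor announces it can take work (at start-up or after finishing a task).
        self.idle_workers.append(worker)
        self._dispatch()

    def submit(self, task):
        # The Thumbnailer reports that a file's metadata is ready for extraction.
        self.pending_tasks.append(task)
        self._dispatch()

    def _dispatch(self):
        # Hand out work only while there is both an idle worker and a waiting task.
        while self.idle_workers and self.pending_tasks:
            worker = self.idle_workers.popleft()
            task = self.pending_tasks.popleft()
            self.assignments.append((worker, task))


if __name__ == "__main__":
    lb = LoadBalancer()
    lb.worker_ready("extractor-1")
    lb.worker_ready("extractor-2")
    for name in ["IMG_0001.CR2", "IMG_0002.CR2", "IMG_0003.CR2"]:
        lb.submit(name)
    print(lb.assignments, list(lb.pending_tasks))
```

Because tasks go only to workers that have said they are idle, a slow extraction on one extractor never holds up files that another extractor could already be processing.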

In case you’re wondering about the purpose of the Thumbnail Daemon process: it generates thumbnails for the GUI after a file has been downloaded (which happens when auto-downloading is activated). It also generates freedesktop.org thumbnails, which are used by Nautilus and other file browsers. Those thumbnails come in two sizes, 128x128 and 256x256. The latter is larger than the 160x120 thumbnail found in a photo’s metadata. Nowadays DSLR cameras output JPEG and RAW files with several embedded thumbnails, and for RAW files the largest of these is full size. However, large embedded thumbnails are much slower to extract and resize than the 160x120 thumbnail, so Rapid Photo Downloader puts off doing that until it absolutely has to. It has no need to do it while generating thumbnails for the GUI, so the Thumbnailer process doesn’t extract them for DSLR photos, for instance. But the program does need those large embedded thumbnails to generate a thumbnail of up to 256x256 for the cache for RAW files. It therefore does that after the file has been downloaded onto the computer, using the Thumbnail Daemon process. Like the Thumbnailer, the Thumbnail Daemon passes the task of extracting the thumbnail from the metadata to the load balancer, which in turn passes it to the extractors.
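The trade-off described above could be written as a small selection function. The idea is to use the small embedded thumbnail whenever it is big enough, and to reach for a larger preview only when a 256x256 thumbnail is needed. The preview sizes, the fallback policy and the function names are assumptions for illustration. Actually reading previews out of a RAW file would need a metadata library, which is left out here.

```python
def choose_preview(previews, target):
    """Return the smallest embedded preview whose longer edge is at least `target`.

    `previews` is a list of (width, height) tuples. If none is large enough,
    fall back to the largest one (hypothetical policy; a real program might
    render from the file itself instead).
    """
    big_enough = [p for p in previews if max(p) >= target]
    if big_enough:
        return min(big_enough, key=lambda p: p[0] * p[1])
    return max(previews, key=lambda p: p[0] * p[1])


def fit_within(size, box):
    """Scale (width, height) down to fit a box x box square, preserving aspect ratio."""
    w, h = size
    scale = min(box / w, box / h, 1.0)  # never upscale
    return max(1, round(w * scale)), max(1, round(h * scale))


if __name__ == "__main__":
    embedded = [(160, 120), (1620, 1080), (6000, 4000)]  # hypothetical RAW previews
    for target in (128, 256):
        chosen = choose_preview(embedded, target)
        print(target, chosen, fit_within(chosen, target))
```

With these example sizes, a 128x128 thumbnail can be made from the cheap 160x120 preview. A 256x256 one needs the next preview up, which is the expensive case the Thumbnail Daemon defers until after download.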

Admittedly, Rapid Photo Downloader’s design does not suit every photo and video download situation. On very simple computers that have only one core and little memory, the multi-process setup is ridiculously complex. Moreover, even on modern multi-core computers, version 0.9 starts up more slowly than previous versions, because running the program involves starting several processes at once.

However the program’s design comes into its own when dealing with thousands of images at a time, or when downloading from or backing up to multiple devices. Some photographers come back to their computer with tens or even hundreds of thousands of images at a time, e.g. time lapse photographers. For them it can take many minutes just to generate the thumbnails. Even if you don’t download thousands of images at a time, you might notice that version 0.9 generates thumbnails considerably more quickly than previous versions.

This post is already getting long, so I won’t go into the details of the 0MQ messaging, which is what the Receive and Send section of each box refers to. For those who are interested, what those terms mean is described in book-level detail in Pieter Hintjens’ marvellous 0MQ Guide. Basically the Rapid Photo Downloader processes communicate with each other by sending messages such as “work on this task” or “I’ve finished working on this task”.
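To give a flavour of what "work on this task" and "I’ve finished working on this task" messages might look like, here is a sketch that encodes them as lists of byte frames. That is the shape pyzmq’s `send_multipart()` and `recv_multipart()` work with. The message types, the JSON body format and the helper functions are hypothetical, and no sockets are involved, so the sketch runs without 0MQ installed.

```python
import json

# Hypothetical message types; the program's real vocabulary is not described here.
WORK = b"work"
DONE = b"done"


def encode(kind, payload):
    """Build a two-frame message: a type frame and a JSON body frame."""
    return [kind, json.dumps(payload).encode("utf-8")]


def decode(frames):
    """Split a two-frame message back into its type and payload."""
    kind, body = frames
    return kind, json.loads(body.decode("utf-8"))


def handle(frames):
    """What a worker might do: accept a work message and answer with a done message."""
    kind, payload = decode(frames)
    if kind != WORK:
        raise ValueError(f"unexpected message type: {kind!r}")
    # ... the actual work (copying, renaming, extracting) would happen here ...
    return encode(DONE, {"file": payload["file"]})


if __name__ == "__main__":
    request = encode(WORK, {"file": "IMG_0001.CR2"})
    print(decode(handle(request)))
```

Keeping the message type in its own frame lets a receiver decide what to do with a message before parsing the body.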

TCP/IP is used to exchange the 0MQ messages, which means that one day Rapid Photo Downloader could be extended to run its various components on separate devices over a network. Imagine, for example, a Rapid Photo Downloader scan process running on a smartphone, and the GUI process on your desktop communicating with it over your local network.

If the Rapid Photo Downloader code interests you and you like to code, please feel free to contribute to it! I’m the only person working on the project, but more hands on deck are definitely welcome.


Great write up!

I am not a developer, but it was still cool to hear about what goes on under the hood 🙂

That was very informative. Thanks a lot!

Wow… Thanks for the post! I’m going to have to reread it five or six more times before I comprehend half of it. But I certainly have a renewed appreciation! Thanks so much for all your efforts!