Building on that, you can get those elements in a shell (if jq is installed) via:
curl "https://raw.pixls.us/json/getrepository.php?set=all" | jq '.data[][7]'
And if you have lynx installed, you can build on my previous command to easily and accurately extract the URLs via:
curl "https://raw.pixls.us/json/getrepository.php?set=all" | jq '.data[][7]' | lynx -stdin -dump -listonly -nonumbers
I eventually hacked this crappy code together to perform the downloads:
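#!/usr/bin/env sh
set -eu
# Fetch the repository listing, pull out the HTML link column, and let lynx list the URLs.
wget -O- "https://raw.pixls.us/json/getrepository.php?set=all" | jq '.data[][7]' | lynx -stdin -dump -listonly -nonumbers | uniq > images.txt
# Download every listed file, skipping any that already exist locally (-nc).
wget -nc -i images.txt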
It downloads the list as JSON, extracts the RAW file URLs, and saves them to images.txt. Then it downloads each file listed in that text file one by one, taking care not to re-download anything already present.
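One caveat with uniq in the script above: it only collapses adjacent duplicate lines, so a URL that repeats further down the list would survive. A minimal sketch of an order-preserving alternative, assuming a POSIX awk is available (dedupe_urls is a hypothetical helper name, not part of the original script):

```shell
# Hypothetical helper: drop repeated lines from stdin, keeping the first
# occurrence of each line and preserving the original order.
dedupe_urls() {
  awk '!seen[$0]++'
}
```

It could stand in for uniq at the end of the pipeline that writes images.txt. In practice wget -nc already skips a file it has fetched, so this mostly keeps images.txt tidy.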
Ideally, I would have used wget's -N flag to check file dates and prevent re-downloading in a more intelligent manner, but unfortunately, every file gets the current date and time. This is (I guess) caused by a bug in getfile.php, which should be using the file’s actual stats.
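Since wget -N relies on the Last-Modified header the server sends, one way to check that guess is to look at the header directly. A rough sketch, assuming curl is available; last_modified is a hypothetical helper, and whether getfile.php answers HEAD requests the same way as GET is untested here:

```shell
# Hypothetical helper: read HTTP response headers on stdin and print the
# value of the Last-Modified header, if there is one.
last_modified() {
  tr -d '\r' | sed -n 's/^[Ll]ast-[Mm]odified:[[:space:]]*//p'
}

# Usage (needs network access), with the URL of one RAW file in $url:
#   curl -sI "$url" | last_modified
```

If the printed value tracks the current time on every request, that would be consistent with the server not reporting the file's real modification time.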
Ideally, we need another API with directly useable data like:
[
  {
    "manufacturer": "Canon",
    "model": "EOS 7D",
    "type": "sRAW2",
    "ratio": "3:2",
    "fileSize": "17.92",
    "license": "Creative Commons 0 - Public Domain",
    "licenseUrl": "https://creativecommons.org/publicdomain/zero/1.0/",
    "created_at": "2016-12-29",
    "added_at": "2016-12-29",
    "updated_at": "2016-12-29",
    "url": "https://raw.pixls.us/getfile.php/129/nice/Canon - EOS 7D - sRAW2 (sRAW) (3:2).CR2",
    "checksum": "9a32e26509c5c7b3346c27a2135d2b8c2e37ba1c",
    "metadataUrl": "https://raw.pixls.us/getfile.php/129/exif/RAW_CANON_EOS_7D-sraw.CR2.exif.txt"
  },
  {
    ...
  }
]
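With data like that, the download step could also verify file integrity. A minimal sketch, under several assumptions: the proposed array is saved as files.json (a hypothetical file name), checksum is the SHA-1 of the file (a guess based on its 40 hex digits), and each file is stored under the last path segment of url. checksum_list is a made-up helper, and none of this exists in the current API:

```shell
# Hypothetical helper: turn the proposed JSON array (path given as $1) into
# "checksum  filename" lines, the format that sha1sum -c reads.
# Assumes .checksum is a SHA-1 digest and the local file name is the last
# path segment of .url.
checksum_list() {
  jq -r '.[] | "\(.checksum)  \(.url | split("/") | last)"' "$1"
}

# Usage, run in the directory holding the downloaded files (GNU coreutils):
#   checksum_list files.json | sha1sum -c -
```

A checksum comparison like this would also sidestep the date problem, since a file only needs re-downloading when its digest no longer matches.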