Flickr’s Uploadr is fine for small uploads, but tends to die consistently and unpleasantly when I have several hundred photos to upload, like those from Thursday’s opening of “The boys next door”, this year’s Morris Area High School one-act. It almost always takes me several tries to get a large pool of photos uploaded, which is a pain, but not fatal. This time, however, it chose to upload them in a semi-random order, so when it died I had 80-ish photos scattered all across the show, which meant I couldn’t just delete the first K from the list and restart the upload. Ugh.
Because it was late and I was in a hurry, I ended up just uploading the whole set (over several attempts), but marked them as private so people wouldn’t end up seeing two copies of that first group of images, figuring I’d sort things out in the morning.
The morning came, and it turned out that I really didn’t have a workable plan. All the pictures were on Flickr, but there was no good (i.e., automated) way to figure out which were the duplicates. If I could identify them, then deleting the duplicates and making the rest visible would be easy, but I didn’t have a clue how to find the duplicates using Flickr’s tools.
Sigh.
This would, however, be pretty straightforward in a script if I had all the data I needed, and this is where Flickr redeemed itself. They have a very rich API for accessing (and modifying) photos and their associated information (like tags), so if I could figure out how to use that I’d be golden. I’d poked a little with some Ruby Flickr libraries in the past, but none of them ever seemed very complete and they were always struggling to stay on top of Flickr’s changes and extensions to the API. A little searching this time, however, turned up Flickraw, which uses some really nifty Ruby metaprogramming to essentially build the Ruby part of the API “on the fly”, ensuring that it will be complete and up-to-date all automagically!
It turns out that Flickraw was indeed powerful, flexible, and easy to use. After authenticating (following the example on the Flickraw web site), I was able to use it to pull down a list of all the photos from “The boys next door”:
    my_owner_id = "68457656@N00"
    play_title = "The boys next door"
    my_stream = flickr.photos.search(
      :user_id => my_owner_id,
      :text => play_title,
      :per_page => 500)
I then split that list into the initial set of publicly visible photos, and the photos I’d uploaded after things got screwy and kept private (i.e., visible only to me):
    public_photos = my_stream.find_all {|photo| photo.ispublic == 1}
    private_photos = my_stream.find_all {|photo| photo.ispublic == 0}
My next task was to determine which of the private photos were duplicates of one of the public photos people were already looking at. All I really needed was the list of duplicates, but I decided to create lists of both the duplicates and the non-duplicates. I had to compare titles here because the Flickr IDs would be different; as far as Flickr knew they were all different photos. Happily, I had named them in a way that they each had a unique title, so if two photos had the same title, I knew they were the same shot uploaded twice.
    dups = []
    non_dups = []
    private_photos.each do |photo|
      public_duplicate = public_photos.find { |pub| photo.title == pub.title }
      if public_duplicate
        dups.push(photo)
      else
        non_dups.push(photo)
      end
    end
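For anyone who wants to try the title-matching idea without a Flickr account, here is a small self-contained sketch of the same split. It is a variation rather than the code I actually ran: it collects the public titles into a Set and uses Enumerable#partition instead of the nested find. The Photo struct, the split_by_title helper, and the sample titles are all made up for illustration; the real objects come from Flickraw.

```ruby
require "set"

# Hypothetical stand-in for the photo objects Flickraw returns;
# only the fields used in this post are modeled.
Photo = Struct.new(:id, :title, :ispublic)

# Splits the private photos into [duplicates, non_duplicates],
# treating a private photo as a duplicate when some public photo
# has exactly the same title.
def split_by_title(public_photos, private_photos)
  public_titles = public_photos.map(&:title).to_set
  private_photos.partition { |photo| public_titles.include?(photo.title) }
end

# Made-up sample data: two public photos, three private ones.
public_photos = [
  Photo.new("1", "bnd-001", 1),
  Photo.new("2", "bnd-017", 1)
]
private_photos = [
  Photo.new("3", "bnd-001", 0),
  Photo.new("4", "bnd-002", 0),
  Photo.new("5", "bnd-017", 0)
]

dups, non_dups = split_by_title(public_photos, private_photos)
puts "duplicates:     #{dups.map(&:id).inspect}"
puts "non-duplicates: #{non_dups.map(&:id).inspect}"
```

With a few hundred photos the nested find is perfectly fast, so the Set is mostly a matter of taste. Either way, the approach only works because every shot has a unique title.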
At this point, I could apply tags to all the photos in the two groups, and all the rest of the fiddling could be done through Flickr’s web tools:
    non_dups.each do |photo|
      flickr.photos.addTags(:photo_id => photo.id, :tags => "to_keep")
    end
    dups.each do |photo|
      flickr.photos.addTags(:photo_id => photo.id, :tags => "to_delete")
    end
I could have actually done everything with the Ruby script (delete the duplicates, change the remaining images to publicly visible, and add them to the appropriate set), but wanted to do that via Flickr so I could see what was happening as I went. And once the tags were in place, the work in Flickr was quite straightforward. The result: A set of 339 images that contains all the photos I uploaded, with no duplicates, all accomplished without deleting any of the original uploads.
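For the curious, here is roughly what letting the script finish the job might look like. This is an untested sketch, not something I ran against my account: it assumes Flickraw exposes the Flickr API methods flickr.photos.delete and flickr.photos.setPerms the same way it exposes addTags, and the permission values are placeholders that should be checked against the Flickr API documentation. The finish_cleanup helper, PhotoStub, and the recording FakePhotos client are all hypothetical, there only so the sketch runs without touching any real photos.

```ruby
# Hypothetical stand-in for a Flickraw photo; only the id is needed here.
PhotoStub = Struct.new(:id, :title)

# Recording stub so the sketch runs without network access or credentials.
# It mimics the two calls used below and just remembers what it was asked.
class FakePhotos
  attr_reader :calls

  def initialize
    @calls = []
  end

  def delete(args)
    @calls << [:delete, args]
  end

  def setPerms(args)
    @calls << [:setPerms, args]
  end
end

FakeClient = Struct.new(:photos)

# Deletes the duplicates and makes the remaining photos public.
# With the real library, `client` would be Flickraw's `flickr` object
# (assumption: both methods are available after authenticating with
# write and delete permissions).
def finish_cleanup(client, dups, non_dups)
  dups.each do |photo|
    client.photos.delete(:photo_id => photo.id)
  end
  non_dups.each do |photo|
    client.photos.setPerms(
      :photo_id => photo.id,
      :is_public => 1,
      :is_friend => 0,
      :is_family => 0,
      :perm_comment => 3,   # placeholder value; check the API docs
      :perm_addmeta => 0)   # placeholder value; check the API docs
  end
end

client = FakeClient.new(FakePhotos.new)
dups = [PhotoStub.new("3", "bnd-001")]
non_dups = [PhotoStub.new("4", "bnd-002")]

finish_cleanup(client, dups, non_dups)
client.photos.calls.each { |name, args| puts "#{name} #{args[:photo_id]}" }
```

Deleting is the one step that can't be undone, which is a big part of why tagging first and eyeballing the results in the browser felt like the safer route.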
Big thanks to Maël Clérambault, the author of Flickraw, for his excellent little library, and thanks to Flickr for providing this very nice set of API calls. (Now go fix Flickr Uploadr, damnit!)
As for the play – I just heard that they took second at today’s sub-sections competition, which means they move on to sections next week, and Tom got a star performance award! Congratulations all!
Thanks for the write-up, now I know where to turn if I run into problems!
No prob! It was nice working through something (mildly) technical like that, especially since it worked out so well.