Setting Good SmugMug Key Words

Finding good key words is not easy.  In many ways it is like creating a good book index.  Decent book indexes are carefully constructed by human readers that understand what is being indexed and why certain terms should be included. Superior indexes showcase the gems and bury the garbage. There is a lot of mediocre software out there that purports to automate this task but I remain unimpressed.  Machine indexing is like machine language translation: they both suffer from a lack of real understanding. I was reminded of how difficult indexing is while writing some code to update my SmugMug key words.

My first attempt at computing key words followed this recipe:

  1. Run my little C# SmugMug metadata dumper to update image metadata.
  2. Sift through the image metadata and extract all current key words.
  3. Remove common English words from current key words.
  4. Similarly, extract all image caption text and remove common English words.
  5. Sort the remaining caption text key words by frequency.
  6. Append the frequency sorted caption key words to corresponding current key words.
  7. Take at most seven words from the appended list as key words.

My thinking was high frequency uncommon English words would make good keys.  This is generally the case but the removal of common English words from currently assigned keys was a big mistake.

I take care when naming image files.  SmugMug picks up words in image file names and uses them as default key words.  If you have good file names you will get useful keys automatically.  Removing common English words from file names deleted words that were present for good reasons.  For example: two common English words, “before” and “after,” are used in the file names of image restorations like this picture I took in Cyprus way back in 1968.

I think “before” and “after” are perfectly good keys for this image.

To avoid such problems I now leave file name words intact and augment these words with high frequency caption text words and print size keys like: 4×5, 4×6, 5×7 and 8×10.  Print size keys is another story. You can view my entire list of SmugMug keys here.

More on SmugMug Duplicates

In my previous post I described a method for eliminating duplicate SmugMug images based on MD5’s.   This method does not get all the duplicates.

If you uploaded the same image to different galleries,  at different times,  the SmugMug MD5’s may differ.  I don’t know what  SmugMug  rolls into their MD5 ‘s but I suspect it’s more than image data.

To get all the copies you must remove duplicate picture ids ,  file names and MD5’s.  Furthermore, to maintain a pure duplicate free state,  you need to check these items often.   Now,  after uploading or rearranging pictures,  I update my SmugMug metadata and execute this J verb to insure I don’t introduce duplicates.

CheckSmugDups=:3 : 0

NB.*CheckSmugDups v-- checks duplicate SmugMug images.
NB. monad:  CheckSmugDups uuIgnore

'albums images'=. readsmugtables 0
images=. }. images [ imhead=. 0 { images

NB. images should be unique in three ways:
r=. ,: 'PID unique: '; # ~.(imhead i. <'PID') {"1 images
r=. r, 'MD5 unique: '; # ~.(imhead i. <'MD5') {"1 images
r=. r, 'FILENAME unique: '; # ~.(imhead i. <'FILENAME') {"1 images

if. 1 <# ~. ;{:"1 r do. smoutput 'duplicates present' end.

SmugMug Duplicate Image Hunting

One of the many things the developers of Thumbsplus got right was a proper normalized database schema.  When I first inspected the layout of a Thumbsplus database I knew I was in good hands.  In Thumbsplus image files get unique keys and image galleries are simply lists of image keys.  Images can appear in any number of galleries, without duplication,  just the way the gods of database design intended.

Assigning unique keys and grouping by key lists is so correct that it was a shock to discover that SmugMug,  until recently,  eschewed this principle.  Prior to a recent upgrade if you wanted to display an image in more than one gallery you had to … shudder with horror …. make copies!  Whenever I made an image copy I felt  like I was  masturbating in an art museum.

This outrage is now fixed and you can place an image in as many galleries as you want without copying.  Unfortunately there is a residual problem.  How do you hunt down and exterminate all your bogus copies?  In an acronym:  MD5.  SmugMug  assigns MD5′ s to all images.  If two MD5’s are the same there is an extremely high likelihood you are dealing with copies.  So all you have to do is find images with identical MD5’s and delete the extra copies.  The following J verb uses image tables created from the XML captured by my SmugMug metadata dumper to do just this. 

SmugDupsFrMD5=:3 : 0

NB.*SmugDupsFrMD5 v-- duplicate SmugMug images from MD5.
NB. monad:  btct =. SmugDupsFrMD5 clDirectory
NB.   SmugDupsFrMD5 'c:\pd\docs\smugmug\data\'

NB. read table files
path=.tslash y
albums=. readtd2 path,SMUGALBUMTABLE
images=. readtd2 path,SMUGIMAGETABLE
images=. }. images [ imhead=. 0 { images

NB. all duplicate MD5's
pos=. imhead i. <'MD5'
md5=. pos {"1 images
dup=. md5 #~ -. ~:md5
images=. (md5 e. dup)#images
images=. (/: pos {"1 images) { images

NB. remove images with matching smugmug pids
NB. these are proper virtual images and not copies
pos=. imhead i. <'PID'
pid=. pos {"1 images
dup=. pid #~ -. ~:pid
if. #images=. ( e. dup)#images do.

  NB. retain selected columns and insert album names
  images=. (imhead i. ;:'FILENAME GID PID MD5 ALBUMURL') {"1 images
  albums=. ((0 {"1 albums) i. 1 {"1 images){ 1 {"1 albums
  images=. albums (<a:;1)} images

  NB. group by MD5
  images=. (~:3 {"1 images) <;.1 images
  images=. >&.>@:(<"1@|:) &> images

  NB. order MD5 groups by galleries in groups
  NB. this results in a good order for editing
  NB. out the duplicates on SmugMug
  images=. (\:&.> 1 {"1 images) {&.> images
  (\: 0 {&> 1 {"1 images){images
  NB. no duplicates
  0 5$''

J source is not supported by the WordPress source code plugin so no syntax coloring for now.

Command Line C# SmugMug API Metadata Download

I have a skeleton in my photographic closet!  I enjoy hacking pictures as much as I enjoy shooting them.  Before digital photography I got my jollies the old fashioned way with chemicals:  dark room chemicals.  I still get all emotional when I remember the scent of a fixer.   Ahhh — those were the days.

Now,  instead of inhaling fumes in the dark, I hang out on picture sites:   SmugMug is my current favorite.   Over the last year I have uploaded thousands of carefully cataloged  images:  you can view them here.   I may not be much of photographer but when it comes to image metadata my anal analytic side shines.  I can EXIF, IPTC and GEOTAG with the best of them.

Because I tweak metadata online, and I suffer from a retentive character flaw,  it’s only natural that I would seek to download my sacred metadata.  This is what SmugMug’s API is for!  When I started experimenting with the SmugMug API I made the mistake of reading the documentation.  SmugMug documentation is,  at best,  a “work in progress.”  It may help but probably not!  I found trolling the web looking for code examples more productive.

To help the next SmugMug API geek I am posting a fragment of a simple command line C# metadata dump utility I put together.   The core of the program  is shown below and all the C# source is available here.  This program is to trivial to license so help yourself.

namespace SmugMugMDDumper
class Program
private const string xmlHeader = @"<?xml version=""1.0"" encoding=""UTF-8""?>";

// defaults - insert your own SmugMug apikey, password, email here
// defaults are used if corresponding command line arguments are missing
private const string apiKey = "<YOUR SMUGMUG APIKEY>";
private const string passWord = "<YOUR SMUGMUG PASSWORD";
private const string emailAddress = "<YOUR SMUGMUG EMAIL>";
private const string outFile = @"c:\temp\smugmugdata.xml";

static void Main(string[] args)
DataSet ds = new DataSet();
XmlDocument doc = new XmlDocument();
Arguments comline = new Arguments(args);
SmugmugMetaData smugmd = new SmugmugMetaData();

// parse and set any command line arguments
if (comline["help"] != null)
string __helpMsg = @"
Typical command line calls:

SmugMugMDDumper.exe -apikey:""xQDzWwLp2I1GUGli88g999VrQWN4Xz56"" -email:""youremail"" -password:""nimcompoop"" -output:""c:\test\smugdata.xml""
SmugMugMDDumper.exe -output:""d:\mystuff\smuggy.xml""
SmugMugMDDumper.exe -password:""newpassword"" -output:""c:\temp\out.xml""
SmugMugMDDumper.exe -help


string __apiKey;
if (comline["apikey"] != null) __apiKey = comline["apikey"];
else __apiKey = apiKey;

string __emailAddress;
if (comline["email"] != null) __emailAddress = comline["email"];
else __emailAddress = emailAddress;

string __passWord;
if (comline["password"] != null) __passWord = comline["password"];
else __passWord = passWord;

string __outputFile;
if (comline["output"] != null) __outputFile = comline["output"];
else __outputFile = outFile;

// start output file
smugmd.WriteToFile(xmlHeader + "<SmugMugData>", __outputFile);

// open SmugMusg session - uses https
string __sessionID = smugmd.StartSMSession(__apiKey, __emailAddress, __passWord);

// collect all galleries
ds = smugmd.GetGalleries(__sessionID, __apiKey, __outputFile);
DataTable myTable = ds.Tables[0];
DataRow myRow;

// image metadata for each gallery
smugmd.AppendToFile("<GalleryImages>", __outputFile);
int rowcnt = myTable.Rows.Count;
string rowstr = "/" + rowcnt.ToString() + "]: ";
for (int i = 0; i < rowcnt; i++)
myRow = myTable.Rows[i];
Console.WriteLine("gallery [" + (i + 1).ToString() + rowstr + (string)myRow["Title"]);
doc = smugmd.GetGalleryImages(__sessionID, __apiKey, (int)myRow["id"], __outputFile);
smugmd.AppendToFile("</GalleryImages>", __outputFile);

// complete output file - end SmugMug session
smugmd.AppendToFile("</SmugMugData>", __outputFile);
smugmd.EndSMSession(__sessionID, __apiKey);

Console.WriteLine("[Complete] output file: " + __outputFile);
catch (Exception ex)
Console.WriteLine("[Fail] SmugMug Metadata Dumper Failure - error message: " + ex.Message);