This blog post started out as an experimental rendering of a Jupyter notebook. I wanted to see how difficult it would be to convert a notebook to a WordPress.com blog post. Even though Jupyter exports notebooks in HTML and Markdown, the exports do not display well “out of the box.” No doubt one could craft CSS that would help, but the entire point of Jupyter exports is to cut down on pointless format thrashing.
This post is a teaser. If you want to get to the source notebook, follow this link to my GitHub repositories.
If you only want to read the notebook, use this nbviewer link.
Why am I doing this?
My photo captions have evolved into a form of milliblogging. Milliposts (milliblog posts) are terse and tiny; many are single sentences or paragraphs. Taken one at a time, milliposts seldom impress, but when gathered in hundreds or thousands, accidental epics emerge. So, to prevent “epic loss,” I want a simple way of downloading and archiving my captions off-line.
If you don’t control it you cannot trust it!
When I started blogging I knew that you could not depend on blogging websites to archive and preserve your documents. We had already seen cases of websites mangling content, shutting down without warning, and, even worse, censoring bloggers. It was a classic case of, “If you don’t control it you cannot trust it.” I resolved to keep complete, off-line, version-controlled copies of my blog posts.
Maintaining off-line copies was made easier by WordPress.com’s excellent blog export utility. A simple button push downloads a large XML file that has all your blog posts with embedded references to images and other inclusions. XML is not my preferred archive format. I am a huge fan of LaTeX and Markdown: two text formats that are directly supported in Jupyter Notebooks. I wrote a little system that parses the WordPress XML file and generates LaTeX and Markdown files. Yet, although I was milliblogging long before I was blogging, I have no similar system for downloading and archiving SmugMug metadata. This Jupyter notebook addresses that omission and shows how you can use Python and the SmugMug API to extract gallery and image metadata and store it in version-controlled local directories as CSV files.
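The last step, storing metadata as CSV files that behave well under version control, might look something like the sketch below. It makes no SmugMug API calls at all: the `rows` records, the column names (`ImageKey`, `FileName`, `Caption`, `Keywords`) and the `write_manifest` helper are hypothetical stand-ins for whatever the API really returns. Sorting the rows and fixing the column order is my assumption about what keeps diffs small between runs.

```python
import csv
import io

# Hypothetical records standing in for image metadata fetched from the API.
rows = [
    {"ImageKey": "b2", "FileName": "cyprus after.jpg",
     "Caption": "Restored.", "Keywords": "cyprus; after"},
    {"ImageKey": "a1", "FileName": "cyprus before.jpg",
     "Caption": "Faded print.", "Keywords": "cyprus; before"},
]

FIELDS = ["ImageKey", "FileName", "Caption", "Keywords"]  # assumed column set


def write_manifest(records, handle, fields=FIELDS):
    """Write records as CSV with a fixed column order, sorted by ImageKey."""
    writer = csv.DictWriter(handle, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    for record in sorted(records, key=lambda r: r["ImageKey"]):
        writer.writerow(record)


# An in-memory buffer keeps the sketch self-contained; for a real archive,
# swap it for open(path, "w", newline="", encoding="utf-8").
buffer = io.StringIO()
write_manifest(rows, buffer)
manifest_text = buffer.getvalue()
```

Because the output is sorted and the columns never move, rerunning the download after editing one caption should change one line in the CSV, which is what you want to see in a version control diff.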
Finding good key words is not easy. In many ways it is like creating a good book index. Decent book indexes are carefully constructed by human readers who understand what is being indexed and why certain terms should be included. Superior indexes showcase the gems and bury the garbage. There is a lot of mediocre software out there that purports to automate this task, but I remain unimpressed. Machine indexing is like machine language translation: both suffer from a lack of real understanding. I was reminded of how difficult indexing is while writing some code to update my SmugMug key words.
My first attempt at computing key words followed this recipe:
- Run my little C# SmugMug metadata dumper to update image metadata.
- Sift through the image metadata and extract all current key words.
- Remove common English words from current key words.
- Similarly, extract all image caption text and remove common English words.
- Sort the remaining caption text key words by frequency.
- Append the frequency-sorted caption key words to the corresponding current key words.
- Take at most seven words from the appended list as key words.
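The recipe above can be sketched in a few lines of Python. This is an illustration, not my original code: the tiny `COMMON_WORDS` stop list, the `first_attempt_keys` function and the sample caption are all made up for the example, and for brevity it counts word frequency within a single caption rather than across every caption.

```python
import re
from collections import Counter

# Tiny stand-in stop list; a real run would use a much longer one.
COMMON_WORDS = {"a", "after", "and", "before", "in", "of", "on", "the", "to"}
MAX_KEYS = 7


def words(text):
    """Lowercase alphabetic tokens in order of appearance."""
    return re.findall(r"[a-z]+", text.lower())


def first_attempt_keys(current_keys, caption):
    """Filter current keys, then append caption words by falling frequency."""
    kept = [k for k in current_keys if k.lower() not in COMMON_WORDS]
    counts = Counter(w for w in words(caption) if w not in COMMON_WORDS)
    by_frequency = [w for w, _ in counts.most_common()]
    merged = []
    for key in kept + by_frequency:
        if key not in merged:      # keep the first occurrence only
            merged.append(key)
    return merged[:MAX_KEYS]


keys = first_attempt_keys(
    ["cyprus", "before", "after"],
    "Harbor in Cyprus: the harbor before restoration and the harbor after.",
)
```

With this input `keys` comes out as `['cyprus', 'harbor', 'restoration']`. Note that the stop list has silently eaten two of the three keys that were already assigned.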
My thinking was that high-frequency, uncommon English words would make good keys. This is generally the case, but removing common English words from currently assigned keys was a big mistake.
I take care when naming image files. SmugMug picks up words in image file names and uses them as default key words. If you have good file names you will get useful keys automatically. Removing common English words from file names deleted words that were present for good reasons. For example: two common English words, “before” and “after,” are used in the file names of image restorations like this picture I took in Cyprus way back in 1968.
I think “before” and “after” are perfectly good keys for this image.
To avoid such problems I now leave file name words intact and augment them with high-frequency caption text words and print size keys like: 4×5, 4×6, 5×7 and 8×10. Print size keys are another story. You can view my entire list of SmugMug keys here.
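A minimal sketch of the revised approach follows, under some loud assumptions: the file name, the caption, the stop list and the `revised_keys` function are hypothetical, the print size keys are simply passed in because how they are chosen is not covered here, and the merge order (file name words, then caption words, then print sizes) is one plausible choice rather than a record of what I actually run.

```python
import re
from collections import Counter
from pathlib import PurePath

# Tiny stand-in stop list, applied to caption text only.
COMMON_WORDS = {"a", "after", "and", "before", "in", "of", "on", "the", "to"}


def file_name_words(file_name):
    """Words from the file name stem, kept intact (no stop-word filtering)."""
    return re.findall(r"[a-z0-9]+", PurePath(file_name).stem.lower())


def caption_words(caption):
    """Uncommon caption words, most frequent first."""
    tokens = re.findall(r"[a-z]+", caption.lower())
    counts = Counter(t for t in tokens if t not in COMMON_WORDS)
    return [word for word, _ in counts.most_common()]


def revised_keys(file_name, caption, print_sizes=()):
    """File name words first, then caption words, then print size keys."""
    merged = []
    candidates = [*file_name_words(file_name), *caption_words(caption), *print_sizes]
    for key in candidates:
        if key not in merged:      # drop repeats, keep first position
            merged.append(key)
    return merged


keys = revised_keys(
    "cyprus-harbor-before-after.jpg",
    "The harbor before restoration and the harbor after.",
    print_sizes=["4×6"],
)
```

Here `keys` is `['cyprus', 'harbor', 'before', 'after', 'restoration', '4×6']`: “before” and “after” survive because they came from the file name, while the same words in the caption are still filtered out.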
I have a skeleton in my photographic closet! I enjoy hacking pictures as much as I enjoy shooting them. Before digital photography I got my jollies the old-fashioned way with chemicals: darkroom chemicals. I still get all emotional when I remember the scent of a fixer. Ahhh, those were the days.
Now, instead of inhaling fumes in the dark, I hang out on picture sites: SmugMug is my current favorite. Over the last year I have uploaded thousands of carefully cataloged images: you can view them here. I may not be much of a photographer, but when it comes to image metadata my anal analytic side shines. I can EXIF, IPTC and GEOTAG with the best of them.
Because I tweak metadata online, and I suffer from a retentive character flaw, it’s only natural that I would seek to download my sacred metadata. This is what SmugMug’s API is for! When I started experimenting with the SmugMug API I made the mistake of reading the documentation. SmugMug documentation is, at best, a “work in progress.” It may help but probably not! I found trolling the web looking for code examples more productive.
To help the next SmugMug API geek I am posting a fragment of a simple command line C# metadata dump utility I put together. The core of the program is shown below and all the C# source is available here. This program is too trivial to license, so help yourself.
    // fragment of the main class - needs: using System; using System.Data; using System.Xml;
    private const string xmlHeader = @"<?xml version=""1.0"" encoding=""UTF-8""?>";

    // defaults - insert your own SmugMug apikey, password, email here
    // defaults are used if corresponding command line arguments are missing
    private const string apiKey = "<YOUR SMUGMUG APIKEY>";
    private const string passWord = "<YOUR SMUGMUG PASSWORD>";
    private const string emailAddress = "<YOUR SMUGMUG EMAIL>";
    private const string outFile = @"c:\temp\smugmugdata.xml";

    static void Main(string[] args)
    {
        DataSet ds = new DataSet();
        XmlDocument doc = new XmlDocument();
        Arguments comline = new Arguments(args);
        SmugmugMetaData smugmd = new SmugmugMetaData();
        string __apiKey, __emailAddress, __passWord, __outputFile;

        // show help and quit if requested
        if (comline["help"] != null)
        {
            string __helpMsg = @"
    Typical command line calls:
    SmugMugMDDumper.exe -apikey:""xQDzWwLp2I1GUGli88g999VrQWN4Xz56"" -email:""youremail"" -password:""nimcompoop"" -output:""c:\test\smugdata.xml""
    SmugMugMDDumper.exe -password:""newpassword"" -output:""c:\temp\out.xml""
    ";
            Console.WriteLine(__helpMsg);
            return;
        }

        // parse and set any command line arguments
        if (comline["apikey"] != null) __apiKey = comline["apikey"];
        else __apiKey = apiKey;
        if (comline["email"] != null) __emailAddress = comline["email"];
        else __emailAddress = emailAddress;
        if (comline["password"] != null) __passWord = comline["password"];
        else __passWord = passWord;
        if (comline["output"] != null) __outputFile = comline["output"];
        else __outputFile = outFile;

        try
        {
            // start output file
            smugmd.WriteToFile(xmlHeader + "<SmugMugData>", __outputFile);

            // open SmugMug session - uses https
            string __sessionID = smugmd.StartSMSession(__apiKey, __emailAddress, __passWord);

            // collect all galleries - the first table holds one row per gallery
            ds = smugmd.GetGalleries(__sessionID, __apiKey, __outputFile);
            DataTable myTable = ds.Tables[0];

            // image metadata for each gallery
            int rowcnt = myTable.Rows.Count;
            string rowstr = "/" + rowcnt.ToString() + "]: ";
            for (int i = 0; i < rowcnt; i++)
            {
                DataRow myRow = myTable.Rows[i];
                Console.WriteLine("gallery [" + (i + 1).ToString() + rowstr + (string)myRow["Title"]);
                doc = smugmd.GetGalleryImages(__sessionID, __apiKey, (int)myRow["id"], __outputFile);
            }

            // complete output file - end SmugMug session
            smugmd.WriteToFile("</SmugMugData>", __outputFile);
            Console.WriteLine("[Complete] output file: " + __outputFile);
        }
        catch (Exception ex)
        {
            Console.WriteLine("[Fail] SmugMug Metadata Dumper Failure - error message: " + ex.Message);
        }
    }