For the last few weeks, my normal hobbies have been preempted by my obsessive refactoring of an old SmugMug keyword setting program. Many years ago I wrote a small application that helped me maintain meaningful SmugMug keywords. I used this application for years until inevitable software changes broke it. Replacing it with something better has been on my to-do list for ages.
I kept putting it off because I knew the tools I used in the past, mostly J and VB, were not well suited to REST API programming. Yes, I know just about any programming tool can be forced to serve; Turing completeness guarantees it, but I have grown weary of programming to prove tiresome points. If a tool makes something more difficult or tedious than it should be, then change tools!
I eventually decided to go with Python. I’ve always found Python code more readable than code in many other programming languages. This is a widespread opinion. Python is free, open source, and comes with a large, diverse set of tools. One of the best tools to come out of the Python world is Jupyter. Jupyter is the first open source literate programming tool that has gained a large following.
I’ve been a fan of literate programming ever since I read Knuth’s seminal book. He used the technique to produce some of the best program documentation ever written. I always wondered why literate programming never caught on. I suspected the basic problem was simply that many programmers are not particularly literate. Well, Jupyter is proving me wrong.
Jupyter is certainly helping me write and program with such clarity that all my ideas seem trivial.
Compare this notebook (use the first link for the best layout) to my earlier blog post about setting SmugMug print size keywords.
- Nbviewer: Setting SmugMug Print Size and Geotag Keywords with Jupyter and Python
- GitHub: Setting SmugMug Print Size and Geotag Keywords with Jupyter and Python
- Be highly suspicious of people who claim to fully understand any programming language. Only delusional nitwits would make such a claim for any natural language. Does anybody, even luminaries like Shakespeare, truly understand English? We all sort-of-know our mother tongues and, if we’re honest, we’re continually surprised by how others make use of them. Literature, it’s a thing. The same holds for programming languages. Every day I’m surprised by unusual, stupid, silly, clever and freaking brilliant code fragments in programming languages that I have used for decades.
This blog post started out as an experimental rendering of a Jupyter notebook. I wanted to see how difficult it would be to convert a notebook to a WordPress.com blog post. Even though Jupyter exports notebooks in HTML and Markdown, the exports do not display well “out of the box.” No doubt one could craft CSS that would help, but the entire point of Jupyter exports is to cut down on pointless format thrashing.
This post is a teaser. If you want to get to the source notebook follow this link to my GitHub repositories.
If you only want to read the notebook use this nbviewer link.
Why am I doing this?
My photo captions have evolved into a form of milliblogging. Milliposts (milliblog posts) are terse and tiny; many are single sentences or paragraphs. Taken one at a time, milliposts seldom impress, but when gathered in hundreds or thousands, accidental epics emerge. So, to prevent “epic loss” I want a simple way of downloading and archiving my captions off-line.
If you don’t control it you cannot trust it!
When I started blogging I knew that you could not depend on blogging websites to archive and preserve your documents. We had already seen cases of websites mangling content, shutting down without warning, and, even worse, censoring bloggers. It was a classic case of, “If you don’t control it you cannot trust it.” I resolved to keep complete off-line, version-controlled copies of my blog posts.
Maintaining off-line copies was made easier by WordPress.com’s excellent blog export utility. A simple button push downloads a large XML file that has all your blog posts with embedded references to images and other inclusions. XML is not my preferred archive format. I am a huge fan of LaTeX and Markdown: two text formats that are directly supported in Jupyter Notebooks. I wrote a little system that parses the WordPress XML file and generates LaTeX and Markdown files. Yet, despite milliblogging long before blogging, I don’t have a similar system for downloading and archiving SmugMug metadata. This Jupyter notebook addresses this omission and shows how you can use Python and the SmugMug API to extract gallery and image metadata and store it in version-controlled local directories as CSV files.
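To give a flavor of the archiving step, here is a minimal sketch of writing image metadata to CSV with Python’s standard `csv` module. It is not the notebook’s code: the sample rows, the field names (`ImageKey`, `FileName`, `Caption`, `Keywords`), and the `write_metadata_csv` helper are all assumptions made for illustration, and no SmugMug API calls are made. Rows are sorted so that repeated exports produce small, readable diffs under version control.

```python
import csv
import io

# Hypothetical sample metadata. In practice these rows would be built from
# SmugMug API responses; the field names used here are assumptions.
SAMPLE_IMAGES = [
    {"ImageKey": "def456", "FileName": "cat.jpg",
     "Caption": 'The cat said, "no".', "Keywords": "cat"},
    {"ImageKey": "abc123", "FileName": "sunset.jpg",
     "Caption": "Sunset over the lake.", "Keywords": "sunset; lake; 4x6"},
]

FIELDS = ("ImageKey", "FileName", "Caption", "Keywords")


def write_metadata_csv(images, out, fieldnames=FIELDS):
    """Write one CSV row per image to the file-like object `out`.

    Rows are sorted by ImageKey so the output order is stable between runs,
    which keeps version control diffs small.
    """
    writer = csv.DictWriter(out, fieldnames=list(fieldnames),
                            lineterminator="\n")
    writer.writeheader()
    for image in sorted(images, key=lambda row: row["ImageKey"]):
        # Missing fields become empty strings; extra fields are dropped.
        writer.writerow({name: image.get(name, "") for name in fieldnames})
    return out


# Write to an in-memory buffer so the sketch runs without touching disk.
csv_text = write_metadata_csv(SAMPLE_IMAGES, io.StringIO()).getvalue()
print(csv_text)
```

To write a real file instead of a buffer, pass a file opened with `open(path, "w", newline="", encoding="utf-8")`. The `csv` module handles the quoting of captions that contain commas or quotation marks.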