Typesetting UTF8 APL code with the LaTeX lstlisting package

UTF8 APL characters within a LaTeX lstlisting environment. Click for *.tex source code

Typesetting APL source code has always been a pain in the ass! In the dark ages, (the 1970’s), you had to fiddle with APL type-balls and live without luxuries like lower case letters. With the advent of general outline fonts it became technically possible to render APL glyphs on standard display devices provided you:

  1. Designed your own APL font.
  2. Mapped the atomic vector of your APL to whatever encoding your font demanded.
  3. Wrote WSFULL‘s of junk transliteration functions to dump your APL objects as font encoded text.

It’s a testament to either the talent, or pig headedness of APL programmers, that many actually did this. We all hated it! We still hate it! But, like an abused spouse, we kept going back for more.  It’s our fault; if we loved APL more it would stop hitting us!

When Unicode appeared APL’ers cheered — our long ASCII nightmare was ending. The more politically astute worked to include the APL characters in the Unicode standard. Hey if Klingon is there why not APL? Everyone thought it was just a matter of time until APL vendors abandoned their nonstandard atomic vectors and fully embraced Unicode. With a few notable exceptions we are still waiting. While we wait the problem of typesetting APL source code festers.

My preferred source code listing tool is the \LaTeX lstlisting package. lstlisting works well for standard ANSI source code.  I use it for J, C#, SQL, C, XML, Ocaml, Mathematica, F#, shell scripts and \LaTeX source code, i.e. everything except APL! lstlisting is an eight bit package; it will not handle arbitrary Unicode out of the box.  I didn’t know how to get around this so I handled APL by enclosing UTF8 APL text in plain \begin{verbatim} … \end{verbatim} environments. This works for XeLaTeX and LuaLaTeX but you lose all the lstlisting goodies. Then I saw an interesting tex.stackexchange.com posting about The ‘listings’ package and UTF-8. One solution to the post’s “French ligature problem” showed how to force Unicode down lstlisting‘s throat. I wondered if the same method would work for APL. It turns out that it does!

If you insert the following snippet of TeX code in your document preamble LuaLaTeX and XeLaTeX will properly process UTF8 APL text in lstlisting environments. You will need to download and install the APL385 Unicode font if it’s not on your system.  A test \LaTeX document illustrating this hack is available here. The compiled PDF is available here. As always these files can be accessed in the files sidebar.

% set lstlisting to accept UTF8 APL text
\makeatletter
\lst@InputCatcodes
\def\lst@DefEC{%
 \lst@CCECUse \lst@ProcessLetter
  ^^80^^81^^82^^83^^84^^85^^86^^87^^88^^89^^8a^^8b^^8c^^8d^^8e^^8f%
  ^^90^^91^^92^^93^^94^^95^^96^^97^^98^^99^^9a^^9b^^9c^^9d^^9e^^9f%
  ^^a0^^a1^^a2^^a3^^a4^^a5^^a6^^a7^^a8^^a9^^aa^^ab^^ac^^ad^^ae^^af%
  ^^b0^^b1^^b2^^b3^^b4^^b5^^b6^^b7^^b8^^b9^^ba^^bb^^bc^^bd^^be^^bf%
  ^^c0^^c1^^c2^^c3^^c4^^c5^^c6^^c7^^c8^^c9^^ca^^cb^^cc^^cd^^ce^^cf%
  ^^d0^^d1^^d2^^d3^^d4^^d5^^d6^^d7^^d8^^d9^^da^^db^^dc^^dd^^de^^df%
  ^^e0^^e1^^e2^^e3^^e4^^e5^^e6^^e7^^e8^^e9^^ea^^eb^^ec^^ed^^ee^^ef%
  ^^f0^^f1^^f2^^f3^^f4^^f5^^f6^^f7^^f8^^f9^^fa^^fb^^fc^^fd^^fe^^ff%
  ^^^^20ac^^^^0153^^^^0152%
  ^^^^20a7^^^^2190^^^^2191^^^^2192^^^^2193^^^^2206^^^^2207^^^^220a%
  ^^^^2218^^^^2228^^^^2229^^^^222a^^^^2235^^^^223c^^^^2260^^^^2261%
  ^^^^2262^^^^2264^^^^2265^^^^2282^^^^2283^^^^2296^^^^22a2^^^^22a3%
  ^^^^22a4^^^^22a5^^^^22c4^^^^2308^^^^230a^^^^2336^^^^2337^^^^2339%
  ^^^^233b^^^^233d^^^^233f^^^^2340^^^^2342^^^^2347^^^^2348^^^^2349%
  ^^^^234b^^^^234e^^^^2350^^^^2352^^^^2355^^^^2357^^^^2359^^^^235d%
  ^^^^235e^^^^235f^^^^2361^^^^2362^^^^2363^^^^2364^^^^2365^^^^2368%
  ^^^^236a^^^^236b^^^^236c^^^^2371^^^^2372^^^^2373^^^^2374^^^^2375%
  ^^^^2377^^^^2378^^^^237a^^^^2395^^^^25af^^^^25ca^^^^25cb%
  ^^00}
\lst@RestoreCatcodes
\makeatother

The Return of APL Fingers

APL typewriter ball (1970s)

APL typewriter ball (1970s)

I am programming in APL again after a six-year hiatus. My APL fingers are rusty but it’s amazing how deep muscle memory goes. I still know where all the beautiful APL glyphs’ hide on standard keyboards. I’ve programmed in almost a dozen programming languages but I maintain warm feelings for APL because I lost my coding virginity to her.

APL was gentle: abstract, clean, austere and so intoxicatingly elegant. I loved Iverson notation and how it manifested in what remains the most beautiful symbol set ever devised for a programming language. Blinded by passion I overlooked APL’s faults but as my ardor cooled I took notice of one glaring APL problem that persists to this day. Displaying, printing, emailing and now blogging with APL is a pain! Much has improved with the steady adoption of Unicode but even today handling APL imposes burdens. For example, without the APL385 Unicode font your browser will mangle the following APL characters.

!*+-/<=>?\^|~¨¯×÷←↑→↓∆∇∊∘∨∩∪∵∼≠≡≢≤≥⊂⊃⊖⊢⊣⊤⊥⋄⌈⌊

⌶⌷⌹⌻⌽⌿⍀⍂⍇⍈⍉⍋⍎⍐⍒⍕⍗⍙⍝⍞⍟⍡⍢⍣⍤⍥⍨⍪⍫⍬⍱⍲⍳⍴⍵⍷⍸⍺⎕◊○

By the way, if you have not installed APL385 Unicode I highly recommend downloading and installing it. You can get APL385 at VectorRegrettably installing a Unicode APL font will not fix all your APL character problems! In particular you cannot reliably:

  1. Copy and paste APL code between APL vendors or between APL and other languages.
  2. Typeset APL listings with ubiquitous \LaTeX packages like lstlisting.
  3. Post APL idioms on Twitter. Twitter’s 140 character limit is not such a big deal for APL.
  4. Print on arbitrary network printers! It’s the 21st century yet office printers are routinely limited by parochial IT policies to a small set of standard fonts.

APL’ers  almost relish these irritants after all problem solving is what APL’ers do! We don’t go crying to momma; we squash whatever is annoying us and get on with deeper problems.

Currently I am dealing with how to display APL functions on WordPress. WordPress supports a source code highlighting plug-in based on Alex Gorbatchev’s excellent Javascript SyntaxHighlighter. The WordPress plug-in produces wonderful results for well known languages like C# and by defining language specific Javascript classes you can highlight languages like APL. Eric Lescasse has done this for APL+WIN code so it is possible, (given Unicode APL fonts), to render highlighted web friendly APL code. Unfortunately WordPress does not support APL highlighting, (what a surprise), and has banned user Javascript classes on their freebie blogs. Apparently some programmers abuse JavaScript, (another surprise), and uncontrolled Javascript’ing might endanger the WordPress business model.

This leaves the users of peculiar programming languages with a problem. We can pester WordPress to support new languages or we can roll our own. I am starting a campaign to get APL and J on WordPress’s list of highlighted languages and while I am waiting for official support I will roll my own. Fortunately, highlighting code for blogs is not difficult. The much maligned MS Word (2007 and beyond) can crank out blog happy APL provided you have UTF-8 APL Unicode text to format. Getting UTF-8’ed APL is the tricky bit. Some APL systems like Dyalog directly support UTF-8 and others are planning to do so. APL+WIN cannot spit out UTF-8 but it’s not difficult to transform APL+WIN to UTF-8. The APL Wiki contains some slick APL+WIN functions to convert internal APL text to and from UTF-8.

To get a sense of why all this fuss is worthwhile consider the following APL function taken from Eugene McDonnell’s superb essay Life: Nasty, Brutish, and Short.

∇ z ← LifeKnuth w;v
v ← w
w ← w + (1 ⌽ w) + ¯1 ⌽ w
w ← w + (1 ⊖ w) + ¯1 ⊖ w
w ← w + w – v
z ← w ∊ 5 6 7

This little function is not the shortest APL Life function but in my opinion it’s the clearest and most concise description of Life’s generation rules out there. Tool of thought is not an empty APL marketing slogan. It’s the real deal!