Typesetting UTF8 APL code with the LaTeX lstlisting package

UTF8 APL characters rendered within a LaTeX lstlisting environment (*.tex source linked below).

Typesetting APL source code has always been a pain in the ass! In the dark ages (the 1970s), you had to fiddle with APL type-balls and live without luxuries like lowercase letters. With the advent of general outline fonts it became technically possible to render APL glyphs on standard display devices, provided you:

  1. Designed your own APL font.
  2. Mapped the atomic vector of your APL to whatever encoding your font demanded.
  3. Wrote WSFULLs of junk transliteration functions to dump your APL objects as font-encoded text.

It’s a testament to either the talent or the pig-headedness of APL programmers that many actually did this. We all hated it! We still hate it! But, like an abused spouse, we kept going back for more. It’s our fault; if we loved APL more it would stop hitting us!

When Unicode appeared, APLers cheered: our long ASCII nightmare was ending. The more politically astute worked to include the APL characters in the Unicode standard. Hey, if Klingon is there, why not APL? Everyone thought it was just a matter of time until APL vendors abandoned their nonstandard atomic vectors and fully embraced Unicode. With a few notable exceptions, we are still waiting. While we wait, the problem of typesetting APL source code festers.

My preferred source code listing tool is the \LaTeX listings package and its lstlisting environment. It works well for standard ANSI source code. I use it for J, C#, SQL, C, XML, OCaml, Mathematica, F#, shell scripts and \LaTeX source, i.e. everything except APL! listings is an eight-bit package; it will not handle arbitrary Unicode out of the box. I didn’t know how to get around this, so I handled APL by enclosing UTF8 APL text in plain \begin{verbatim} … \end{verbatim} environments. This works for XeLaTeX and LuaLaTeX, but you lose all the lstlisting goodies. Then I saw an interesting tex.stackexchange.com posting about the ‘listings’ package and UTF-8. One solution to the post’s “French ligature problem” showed how to force Unicode down lstlisting’s throat. I wondered if the same method would work for APL. It turns out that it does!

If you insert the following snippet of TeX code in your document preamble, LuaLaTeX and XeLaTeX will properly process UTF8 APL text in lstlisting environments. You will need to download and install the APL385 Unicode font if it’s not on your system. A test \LaTeX document illustrating this hack is available here, and the compiled PDF is available here. As always, these files can be accessed in the files sidebar.

% set lstlisting to accept UTF8 APL text
\makeatletter
\lst@InputCatcodes
\def\lst@DefEC{%
 \lst@CCECUse \lst@ProcessLetter
  ^^80^^81^^82^^83^^84^^85^^86^^87^^88^^89^^8a^^8b^^8c^^8d^^8e^^8f%
  ^^90^^91^^92^^93^^94^^95^^96^^97^^98^^99^^9a^^9b^^9c^^9d^^9e^^9f%
  ^^a0^^a1^^a2^^a3^^a4^^a5^^a6^^a7^^a8^^a9^^aa^^ab^^ac^^ad^^ae^^af%
  ^^b0^^b1^^b2^^b3^^b4^^b5^^b6^^b7^^b8^^b9^^ba^^bb^^bc^^bd^^be^^bf%
  ^^c0^^c1^^c2^^c3^^c4^^c5^^c6^^c7^^c8^^c9^^ca^^cb^^cc^^cd^^ce^^cf%
  ^^d0^^d1^^d2^^d3^^d4^^d5^^d6^^d7^^d8^^d9^^da^^db^^dc^^dd^^de^^df%
  ^^e0^^e1^^e2^^e3^^e4^^e5^^e6^^e7^^e8^^e9^^ea^^eb^^ec^^ed^^ee^^ef%
  ^^f0^^f1^^f2^^f3^^f4^^f5^^f6^^f7^^f8^^f9^^fa^^fb^^fc^^fd^^fe^^ff%
  ^^^^20ac^^^^0153^^^^0152%
  ^^^^20a7^^^^2190^^^^2191^^^^2192^^^^2193^^^^2206^^^^2207^^^^220a%
  ^^^^2218^^^^2228^^^^2229^^^^222a^^^^2235^^^^223c^^^^2260^^^^2261%
  ^^^^2262^^^^2264^^^^2265^^^^2282^^^^2283^^^^2296^^^^22a2^^^^22a3%
  ^^^^22a4^^^^22a5^^^^22c4^^^^2308^^^^230a^^^^2336^^^^2337^^^^2339%
  ^^^^233b^^^^233d^^^^233f^^^^2340^^^^2342^^^^2347^^^^2348^^^^2349%
  ^^^^234b^^^^234e^^^^2350^^^^2352^^^^2355^^^^2357^^^^2359^^^^235d%
  ^^^^235e^^^^235f^^^^2361^^^^2362^^^^2363^^^^2364^^^^2365^^^^2368%
  ^^^^236a^^^^236b^^^^236c^^^^2371^^^^2372^^^^2373^^^^2374^^^^2375%
  ^^^^2377^^^^2378^^^^237a^^^^2395^^^^25af^^^^25ca^^^^25cb%
  ^^00}
\lst@RestoreCatcodes
\makeatother
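
To see the hack end to end, here is a minimal sketch of a test document. The font setup, \lstset options, and the sample APL expression are my own illustration (not the original test document); it assumes the APL385 Unicode font is installed and that the catcode snippet above is pasted where indicated.

% minimal XeLaTeX/LuaLaTeX sketch (illustrative only)
\documentclass{article}
\usepackage{fontspec}
\usepackage{listings}

% hypothetical font setup: any Unicode APL font will do
\newfontfamily\aplfont{APL385 Unicode}
\lstset{basicstyle=\aplfont\small}

% paste the catcode snippet shown above here

\begin{document}
\begin{lstlisting}
avg←{(+⌿⍵)÷≢⍵}   ⍝ mean of a numeric vector
\end{lstlisting}
\end{document}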

More on Kindle Oriented LaTeX

I’ve been compiling \LaTeX PDFs for the Kindle. If you like \LaTeX typefaces, especially the mathematical fonts, you’ll love how they render on the Kindle. That’s a good thing, because you won’t like the Kindle’s cramped page dimensions. For simple flowable text this isn’t a big deal, but for complex \LaTeX documents it is!

There are two basic \LaTeX-to-Kindle workflows.

  1. Convert your \LaTeX to HTML and then convert the HTML to mobi.
  2. Compile your \LaTeX for Kindle page dimensions.

For simple documents free of math and figures, mobi is the best choice because it’s a native Kindle format: you will be able to re-flow text and change font sizes on the fly. There are many \LaTeX-to-HTML converters; this is a good summary of your options. You can also find a variety of HTML-to-mobi converters. I’ve used Auto Kindle; it’s slow but produces decent results.

Compiling \LaTeX for Kindle page dimensions is more work. First decide what works best for your document: landscape or portrait. Portrait is the Kindle default, but I’ve found that landscape is better for math- and figure-rich documents. You can flip back and forth between landscape and portrait on the Kindle, but it will not re-paginate PDFs. Of course with mobi this is no problemo!

After choosing a basic layout, expunge all hard-coded lengths from your source *.tex files and replace them with relative page lengths. For example, 4in might become 0.75\textwidth. If you have hundreds of figures and images to adjust, write a little program to do the replacement; I did this while preparing a Kindle version of Hilbert’s Foundations of Geometry.
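
In \includegraphics terms, that substitution looks something like this (the figure name is invented for illustration):

% before: hard-coded width that overflows the Kindle's small page
\includegraphics[width=4in]{hilbert-fig-01}

% after: relative width that scales with whatever page geometry is in force
\includegraphics[width=0.75\textwidth]{hilbert-fig-01}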

The next hurdle is the Kindle’s blasé attitude about length units. \LaTeX is extremely precise: an inch is an inch to six decimals. This is not the case on the Kindle! You will have to load your PDFs on the Kindle and inspect the margins for text overflows. Be prepared for a few rounds of page-dimension tweaking! For more details about preparing \LaTeX source, check out LaTeX Options for Kindle.

Finally, after you have compiled your PDF and loaded it on your Kindle, there are some Kindle options you should set to optimize your PDF reading experience. My next post will walk you through setting these options.

The following *.tex file loads packages that are useful for Kindle sizing. It also shows how to print out \LaTeX dimensions with the printlen package.

% A simple test document that displays some packages and settings
% that are useful when compiling LaTeX2e documents for the Kindle.
% Compile with pdflatex or xelatex.
%
% Tested on MikTeX 2.9
% July 22, 2011

\documentclass[12pt]{article}

% included graphics in immediate subdirectory
\usepackage{graphicx}
\graphicspath{{./image/}}

% extended coloring
\usepackage[usenames,dvipsnames]{color}

% hyperref link colors are chosen to display
% well on Kindle monochrome devices
\usepackage[colorlinks, linkcolor=OliveGreen, urlcolor=blue,
            pdfauthor={your name}, pdftitle={your title},
            pdfsubject={your subject},
            pdfcreator={MikTeX+LaTeX2e with hyperref package},
            pdfkeywords={your,key,words},
            ]{hyperref}

\usepackage{breqn}         % automatic equation breaking
\usepackage{microtype}     % microtypography, reduces hyphenation

% kindle page geometry (no page numbers)
%\usepackage[papersize={3.6in,4.8in},hmargin=0.1in,vmargin={0.1in,0.1in}]{geometry}

% portrait kindle page geometry space reserved for page numbers
\usepackage[papersize={3.6in,4.8in},hmargin=0.1in,vmargin={0.1in,0.255in}]{geometry}

% landscape geometry
%\usepackage[papersize={4.8in,3.6in},hmargin={0.1in,0.18in},vmargin={0.1in,0.255in}]{geometry}

% headers and footers
\usepackage{fancyhdr}
\pagestyle{fancy}
\fancyhead{}            % clear page header
\fancyfoot{}            % clear page footer

\setlength{\abovecaptionskip}{2pt} % space above captions
\setlength{\belowcaptionskip}{0pt} % space below captions
\setlength{\textfloatsep}{2pt}     % space between last top float or first bottom float and the text
\setlength{\floatsep}{2pt}         % space left between floats
\setlength{\intextsep}{2pt}        % space left on top and bottom of an in-text float

% print LaTeX dimensions
\usepackage{printlen}

% reduces footer text separation adjusted for page numbers
\setlength{\footskip}{14pt}

% scales down page number font size if document is at 12pt -> page numbers 10 pt
\renewcommand*{\thepage}{\footnotesize\arabic{page}}

\begin{document}

The \verb|\textwidth| is \printlength{\textwidth} which is also
\uselengthunit{in}\printlength{\textwidth} and
\uselengthunit{mm}\printlength{\textwidth}.

\uselengthunit{pt}
The \verb|\textheight| is \printlength{\textheight} which is also
\uselengthunit{in}\printlength{\textheight} and
\uselengthunit{mm}\printlength{\textheight}.

\end{document}

Open Source Hilbert for the Kindle

David Hilbert

While searching for free Kindle books I found Project Gutenberg. Project Gutenberg offers free Kindle books, but they also have something better: would you believe \LaTeX source code for some mathematical classics?

The best book I’ve found so far is an English translation of David Hilbert’s Foundations of Geometry. Hilbert’s Foundations exposed some flaws in the ancient treatment of Euclidean geometry and recast the subject with modern axioms. Because it is relatively easy to follow, compared to Hilbert’s more recondite publications, this little book exercised disproportionate influence on 20th century mathematics. We still see its style aped, but rarely matched, in mathematics texts today.

I couldn’t resist the temptation of compiling a mathematical classic so I eagerly downloaded the source and ran it through \LaTeX.  Foundations compiled without problems and generated a nice letter-sized PDF. Letter-size is fine but I was looking for free Kindle books! I decided to invest a little energy modifying the source to produce a Kindle version. Project Gutenberg makes it clear that we are free to modify the source. Isn’t open source wonderful!

Converting Foundations was simple. The main \LaTeX file included 52 *.png illustrations with hard-coded widths in \includegraphics commands. I wrote a J script that converted all these fixed widths to relative \textwidth’s. This lets \LaTeX automatically resize images for arbitrary page geometries. When compiled with Kindle page dimensions this fixed most of the illustrations. I had to tweak a few wrapfig’s to better typeset images surrounded by text (a sketch of the kind of change involved follows below). The result is a very readable, Kindle-oriented PDF version of Hilbert’s book. There are still a few problems: the Table of Contents is a plain tabular that does not wrap well, and one table rolls off the right Kindle margin. Neither of these deficiencies seriously impairs the readability of the text. If these defects annoy you, download the Project Gutenberg source with my modifications and build your own version.
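
A typical wrapfig adjustment might look like the following. The environment options and figure name are my own illustration, not taken from the Gutenberg source, and assume \usepackage{wrapfig} and \usepackage{graphicx} in the preamble:

% hypothetical wrapfig tweak: a narrow right-wrapped figure whose width
% tracks the page geometry instead of a hard-coded length
\begin{wrapfigure}{r}{0.4\textwidth}
  \centering
  \includegraphics[width=0.38\textwidth]{hilbert-fig-17}
  \caption{An illustration wrapped by the surrounding text.}
\end{wrapfigure}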

This little experiment convinced me that providing free classic books, in source code form, is a service to mankind. Not only does it allow you to “publish” classics on new media, it also fundamentally changes your attitude toward books. Hilbert was one of the great mathematical geniuses of the 19th and 20th centuries. It’s hard to suppress “we are not worthy” moments and maintain a sharp critical eye when reading his “printed” works. You don’t get the same vibe when reading raw \LaTeX. Source code puts you in an “it’s just another bug-infested program” frame of mind. You expect errors in code and you typically find them. This is exactly the hard-nosed attitude you need when reading mathematics.

Soon we will all be Software Archeologists

One of my pet peeves is the ridiculously short lifetime of digital media. I remember 9-track mainframe tapes and 5.25 inch floppies: technologies that thrived in an ancient bygone epoch known as the Eighties. Good luck trying to read 9-track tapes or 5.25 inch floppies today! You will have better luck with older paper punch cards. Punch card readers are hard to find these days, but you can see the damn card holes with your own eyes! In fact you don’t even need eyes to read punch cards. I once knew a blind mainframe programmer who banged out massive FORTRAN programs by feeling the holes on punch cards. Try that with a USB flash drive.

Of course I appreciate that you can stuff the data from an entire filing cabinet of 5.25 inch floppies onto one modern USB flash drive, but I am disturbed by the fact that all those gigabytes will soon be more unreadable than cuneiform. I am not the first to worry about our distressed digital data. Kevin Kelly considers the word “storage” a dangerous misnomer and advocates the use of “movage” instead. You had better move your data from old to new formats or you will lose it!

Rosetta Ball

Movage is one of the reasons I have not jumped on the eReader bandwagon. Replacing myriagrams of books with one lightweight tablet is appealing but iPads and Kindles are not stable! High quality books have shelf lives measured in centuries.  With digital media you’re lucky to get through a decade.  It’s a good bet you won’t be able to read what’s on your eReaders in ten short years!  You poor dumb suckers will have to repurchase your library just like you repurchased your record and movie collections. It’s not in Amazon’s or Apple’s interest to worry too much about media durability. Fortunately some people do worry about media stability.  Check out The Long Now’s Rosetta project for what I consider a stable medium.

To belabor this point: while I was unpacking boxes of old-fashioned books (we recently moved again), I came across a notebook I put together for a poster I presented at the 1994 APL conference in Antwerp. My notebook contained a paper version, still eminently readable, and four 3.5 inch disks. My oldest computer has a vestigial 3.5 inch disk drive, so I tried copying these sixteen-year-old disks. Some of the disks were unreadable (surprise, surprise), but I was able to recover a directory containing my poster’s source. Some of these files were old Microsoft Word documents. Word 2007 could not read them! Even when the bits survive, changes in software can render them useless. Fortunately I loathed Word in 1994, a sentiment I still maintain, and wrote my poster in \LaTeX.

\LaTeX source is dull ASCII text. Civilization will collapse before we lose the ability to read it! Of course \LaTeX, like Word, has changed since 1994, so, just for the hell of it, I decided to compile this old document with MikTeX 2.9. It didn’t compile; I was missing some old graphics macros and a key style file. It didn’t take me long to fix these problems. I replaced the graphics macros with standard \includegraphics{} commands and converted all the Windows *.bmp files to *.png files. Google even found the long-lost missing style file qqaaelba.sty in arxmliv. After making these trivial changes, pdflatex.exe gobbled my poster source and moved “Using FoxPro and DDE to Store J Words” into the 21st century.
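
The fix amounted to substitutions of roughly this shape; the old macro name and file names below are hypothetical stand-ins for illustration:

% 1994-era site-specific bitmap macro, replaced throughout:
%   \showbitmap{figure1.bmp}{4in}
% standard graphicx equivalent, pointing at the converted *.png
% (requires \usepackage{graphicx} in the preamble):
\includegraphics[width=0.75\textwidth]{figure1}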

Resume blues partly alleviated by LaTeX

Once again your fearless correspondent is seeking new consulting opportunities. One of the major drawbacks of consulting is the constant need to market yourself! When it comes to self-promotion, that old standby, the resume, is still one of your most effective tools. When you communicate with potential clients, their first question is: “Can you send me a resume?”

Resumes are a black art. There are many, mostly bogus, theories about what constitutes a good resume, and an entire cottage industry has sprung up to support resume creation. I am sure you have walked down the power-resume aisle in your local big-box bookstore marveling at how people can write entire books on composing three-page resumes. Maybe you have suffered through a corporate outplacement where well-dressed human-resource types earnestly criticize your use of bullets and personal pronouns. Whenever people go on about resumes I always think of Monty Python’s theory of Brontosauruses.

Here’s the nasty truth: a resume is an advertisement!  Do you honestly think anyone would dare to propose a theory of advertisements? A good ad gets noticed and helps sell the product.  The same holds for resumes.

I have a simple resume style that has worked well.  The only complaints I get relate to file types.  Some clients want plain text, some want Word documents, others want PDFs and most don’t care!

Lately I revised the \LaTeX version of my resume. \LaTeX is my preferred document format. \LaTeX source documents are simple text files; you can manipulate them with any text editor on any computer system. Hence \LaTeX documents cannot be held hostage by software vendors that encode your words in version-specific binary formats. If you have ever converted a Word document to an older or newer format you will know of what I speak. Because \LaTeX files are simple text, it’s easy to share \LaTeX on the web. My current resume borrows from a number of authors, and when I borrow I try to give back. The following links point to the \LaTeX source of my resume and the final PDF output. Help yourself, but be courteous and maintain the Creative Commons license block in the \LaTeX code.

WordPress to LaTeX Hack

This post is obsolete. Look here for details on converting WordPress to \LaTeX.

I have stumbled on another coding nuisance. Last weekend I spent a few moments exploring ways to export and print this blog in a nicely typeset fashion. I first tried Blurb’s book-making software. It has a nice feature that automatically downloads blog posts and formats them as ready-to-print books. Sounds great, eh? Download your precious blog rants, press a few buttons, generate a slick PDF, upload it to a site like Lulu, and then wait a few days for your blog to appear in hardcover. As usual, the devil is in the details.

Blurb’s book-making software makes a mess of:

  1. \LaTeX inclusions. All \LaTeX code is echoed verbatim as ASCII. I suppose we should be grateful that it’s not deleted.
  2. Source code listings. They are mangled beyond repair; your elegantly formatted code comes out worse than hex dumps.
  3. Embedded images. They are improperly placed. When it comes to blogs, restricting image inclusions to simple center-of-the-page layouts pays big dividends. Wrapping text around images may look fine on your blog, but RSS software like Google Reader will wreck it. Apparently even simple center-of-the-page layouts are too much for Blurb; the default layout stuffs square thumbnails in the margins: arghhhh…
  4. Web links. They are inserted as verbatim footnotes at the end of each posting. The link footnotes often sprawl onto extra pages that have only one or two lines. I can see why book publishers might not care about efficient page allocation ($$$) but I certainly do.

After hitting all these shortcomings on my first test, I abandoned Blurb and started searching for ways to export WordPress blogs as \LaTeX. There are a number of useful tools for converting \LaTeX to WordPress; some of them are used by Fields Medal winners (see Terence Tao’s blog). Unfortunately, going the other way does not appear to be well supported. Damn!

After failing to come up with an acceptable WordPress to \LaTeX freebie I downloaded this blog in WordPress’s XML export format and took a look.  To my surprise WordPress XML is what I call good XML.  Good XML is designed to be read and understood by human beings!  This contrasts with bad XML.  Bad XML is essentially a binary format that some idiot decided should be rendered as XML.  Bad XML is useful when you want to slow down computers.

Converting WordPress XML to \LaTeX looks simple enough to make a nice C# coding exercise.  When I have hacked up a converter that panders to my idiosyncratic tastes I will post the source code.