Category Archives: TeX/LaTeX

Using jodliterate

The JODSOURCE addon, (a part of the JOD system), contains a handy literate programming tool that enables the generation of beautiful J source code documents.

The Bible, Koran, and Bhagavad Gita of Literate Programming is Donald Knuth’s masterful tome of the same name.

Knuth applied Literate Programming to his \TeX systems and produced what many consider enduring masterpieces of program documentation.

jodliterate is certainly not worthy of \TeX level accolades but with a little work it’s possible to produce fine documents. This J kernel notebook outlines how you can install and use jodliterate. Jupyter notebooks are typically executed but to accommodate J users that do hot have Jupyter this notebook is also available on GitHub as a static PDF document.

Notebook Preliminaries

In [1]:
NB. show J kernel version
9!:14 ''
j901/j64avx/windows/release-e/commercial/www.jsoftware.com/2020-01-29T11:15:50
In [2]:
NB. load JOD in a clear base locale
load 'general/jod' [ clear ''

NB. The distributed JOD profile automatically RESETME's.
NB. To safely use dictionaries with many J tasks they must
NB. be READONLY. To prevent opening the same put dictionary
NB. READWRITE comment out (dpset) and restart this notebook.
dpset 'RESETME'

NB. Converting Jupyter notebooks to LaTeX is 
NB. simplified by ASCII box characters.
portchars ''

NB. Verb to show large boxed displays in
NB. the notebook without ugly wrapping.
sbx_ijod_=: ' ... ' ,"1~ 75&{."1@":

Installing jodliterate

To use jodliterate you need to:

  1. Install a current version of J.
  2. Install the J addons JOD, JODSOURCE, and JODDOCUMENT.
  3. Build the JOD development dictionaries from JODSOURCE.
  4. Install a current version of pandoc.
  5. Install a current version of \TeX and \LaTeX.
  6. Make the jodliterate J script.
  7. Run jodliterate on a JOD group with pandoc compatible document fragments.
  8. Compile the files of the previous step to produce a PDF

When presented with long lists of program prerequisites my impulse is to run! Life is too short for configuration wars. Everything should be easy. Installing jodliterate requires more work than phone apps but compared to enterprise installations setting up jodliterate is trivial. We’ll go through it step by step.

Step 1: Install a current version of J

J is freely available at jsoftware.com. J installation instructions can be found on the J Wiki on this page.

Follow the appropriate instructions for your OS.

Note: JOD runs on Windows, Linux, and MacOS versions of J, hence these are the only platforms that currently support jodliterate.

Step 2: Install the J addons JOD, JODSOURCE and JODDOCUMENT

After installing J install the J addons. J addons are installed with the J package manager pacman. Pacman has three IDE flavors: a command-line flavor and two GUI flavors. The GUI flavors depend on JQT or JHS. The GUI flavors of pacman are only available on some versions of J whereas the command line version is part of the base J install and is available on all platforms.

I install all the addons. I recommend that you do the same.

JOD depends on some J modules like jfiles, regex, and task that are sometimes distributed as addons. If you install all addons JOD’s modules and dependents are both installed.

Installing addons with command line pacman

Start J and do:

In [3]:
NB. install J addons with command-line pacman

load 'pacman'    NB. load pacman jpkg services
In [4]:
'help' jpkg ''   NB. what can you do for me?
Valid options are:
 history, install, manifest, remove, reinstall, search,
 show, showinstalled, shownotinstalled, showupgrade,
 status, update, upgrade

https://code.jsoftware.com/wiki/JAL/Package_Manager/jpkg

In [5]:
NB. install all addons
NB. see https://code.jsoftware.com/wiki/Pacman

NB. uncomment next line if addons not installed
NB. 'install' jpkg '*'  NB.
In [6]:
3 {. 'showinstalled' jpkg '' NB. first few installed addons
+---------+------+------+-----------------------------+
|api/expat|1.0.11|1.0.11|libexpat                     |
+---------+------+------+-----------------------------+
|api/gles |1.0.31|1.0.31|Modern OpenGL API            |
+---------+------+------+-----------------------------+
|api/java |1.0.2 |1.0.2 |api: Java to J shared library|
+---------+------+------+-----------------------------+
In [7]:
'showupgrade' jpkg ''  NB. list addon updates

Installing addons with JQT GUI pacman

I mostly use the Windows JQT version of pacman to install and maintain J addons. You can find pacman on the tools menu.

pacman shows all available addons and provides tools for installing, updating, and removing them.

The GUI version is easy to use. Press the Select All button and then press the Install button to install all the addons. To update addons select the Upgrades menu and select the addons you want to update.

Step 3: Build the JOD development dictionaries from JODSOURCE

JOD source code is distributed in the form of JOD dictionary dumps. Dictionary dumps are large J scripts that serialize JOD dictionaries. Dumps contain everything stored in dictionaries. You will find source code, binary data, test scripts, documentation, build macros, and more in typical JOD dictionaries.

jodliterate is stored as a JOD dictionary group. A dictionary group is simply a collection of J words with optional header and post-processor scripts. JOD generates J scripts from groups. Before we can make jodliterate we must load the JOD development dictionaries. The JODSOURCE addon includes a J script that loads development dictionaries.

Again, start J and do:

In [8]:
require 'general/jod'
In [9]:
NB. set a JODroot user folder 
NB. if not set /jod/ is the default

NB. use paths for your OS
UserFolders_j_=: UserFolders_j_ , 'JODroot';'c:/temp'

NB. show added folder
UserFolders_j_ {~ (0 {"1 UserFolders_j_) i. <'JODroot'
+-------+-------+
|JODroot|c:/temp|
+-------+-------+
In [10]:
NB. load JOD developement dictionaries
load_dev_tmp=: 3 : 0
if. +./ (;:'joddev jod utils') e. od '' do.
  'dev dictionaries exist'
else.
  0!:0<jpath'~addons/general/jodsource/jodsourcesetup.ijs'
end.
)

load_dev_tmp 0
dev dictionaries exist
In [11]:
NB. joddev, jod, utils should exist

erase 'load_dev_tmp'
(;:'joddev jod utils') e. od ''
1 1 1

Step 4: Install a current version of pandoc

pandoc is easily one of the most useful markup utilities on the intertubes. If you routinely deal with markup formats like markdown, XML, \LaTeX, json and you aren’t using pandoc you are working too hard.

Be lazy! Install pandoc.

jodliterate uses the task addon to shell out to pandoc. Versions of pandoc after 2.9.1.1 support J syntax high-lighting.

In [12]:
NB. show pandoc version from J - make sure you are running 
NB. a recent version of pandoc. There may be different
NB. versions in many locations on various systems.

ppath=: '"C:\Program Files\Pandoc\pandoc"'
THISPANDOC_ajodliterate_=: ppath
shell THISPANDOC_ajodliterate_,' --version'
pandoc 2.9.1.1
Compiled with pandoc-types 1.20, texmath 0.12, skylighting 0.8.3
Default user data directory: C:\Users\john\AppData\Roaming\pandoc
Copyright (C) 2006-2019 John MacFarlane
Web:  https://pandoc.org
This is free software; see the source for copying conditions.
There is no warranty, not even for merchantability or fitness
for a particular purpose.

In [13]:
NB. make sure your version of pandoc 
NB. supports J syntax-highlighting

NB. appends line feed character if necessary
tlf=:] , ((10{a.)"_ = {:) }. (10{a.)"_

NB. J is on the supported languages list
pcmd=: THISPANDOC_ajodliterate_,' --list-highlight-languages'
(<;._2 tlf (shell pcmd) -. CR) e.~ <,'j'
1

Step 5: Install a current version of LaTeX

jodliterate uses \LaTeX to compile PDF documents. When setjodliterate runs it sets an output directory and writes a \LaTeX preamble file JODLiteratePreamble.tex to it. It’s a good idea to review this file to get an idea of the \LaTeX packages jodliterate uses. It’s possible that some of these packages are not in your \LaTeX distribution and will have to be installed.

To ease the burden of \LaTeX package maintenance I use freely available \TeX versions that automatically install missing packages.

  1. On Windows I use MiKTeX
  2. On other platforms I use TeXLive

If your system automatically installs packages the first time you compile jodliterate output it may fetch missing packages from The Comprehensive \TeX Archive Network (CTAN). If new packages are installed reprocess your files a few times to insure all the required packages are downloaded and installed.

Step: 6 Make the jodliterate J script

Once the JOD development dictionaries are built (Step 3) making jodliterate is easy. Start J and do:

In [14]:
require 'general/jod'

NB. open dictionaries
od ;:'joddev jod utils' [ 3 od ''
+-+--------------------+------+---+-----+
|1|opened (rw/ro/ro) ->|joddev|jod|utils|
+-+--------------------+------+---+-----+
In [15]:
NB. generate jodliterate
sbx mls 'jodliterate'
+-+--------------------+------------------------------------+               ... 
|1|load script saved ->|c:/jod/joddev/script/jodliterate.ijs|               ... 
+-+--------------------+------------------------------------+               ... 

mls creates a standard J load script. Once generated this script can be loaded with the standard J load utility. You can test this by restarting J without JOD and loading jodliterate.

In [16]:
NB. load generated script
load 'jodliterate'
NB. (jodliterate) interface word(s):
NB. --------------------------------
NB. THISPANDOC      NB. full pandoc path - use (pandoc) if on shell path
NB. grplit          NB. make latex for group (y)
NB. ifacesection    NB. interface section summary string
NB. ifc             NB. format interface comment text
NB. setjodliterate  NB. prepare LaTeX processing - sets out directory writes preamble

NOTE: adjust pandoc path if version (pandoc 2.9.1.1) is not >= 2.9.1.1

Step 7: Run jodliterate on a JOD group with pandoc compatible document fragments

This sounds a lot worse than it is. There is a group in utils called sunmoon that has an interesting pandoc compatible document fragment.

Start J and do:

In [17]:
require 'general/jod'

od 'utils' [ 3 od ''
+-+--------------+-----+
|1|opened (ro) ->|utils|
+-+--------------+-----+
In [18]:
NB. display short explanations for (sunmoon) words
sbx hlpnl }. grp 'sunmoon'
+-----------------+-------------------------------------------------------- ... 
|IFACEWORDSsunmoon|interface words (IFACEWORDSsunmoon) group                ... 
|NORISESET        |indicates sun never rises or sets in (sunriseset0) and ( ... 
|ROOTWORDSsunmoon |root words (ROOTWORDSsunmoon) group                      ... 
|arctan           |arc tangent                                              ... 
|calmoons         |calendar dates of new and full moons                     ... 
|cos              |cosine radians                                           ... 
|fromjulian       |converts Julian day numbers to dates, converse (tojulian ... 
|moons            |times of new and full moons for n calendar years         ... 
|round            |round (y) to nearest (x) (e.g. 1000 round 12345)         ... 
|sin              |sine radians                                             ... 
|sunriseset0      |computes sun rise and set times - see group documentatio ... 
|sunriseset1      |computes sun rise and set times - see group documentatio ... 
|tabit            |promotes only atoms and lists to tables                  ... 
|tan              |tan radians                                              ... 
|today            |returns todays date                                      ... 
|yeardates        |returns all valid dates for n calendar years             ... 
+-----------------+-------------------------------------------------------- ... 
In [19]:
NB. display part of the (sunmoon) group document header
NB. this is pandoc compatible markdown - note the LaTeX
NB. commands - pandoc allows markdown/LaTeX mixtures
900 {. 2 9 disp 'sunmoon'
`sunmoon` is a collection of basic astronomical algorithms
The key verbs are `moons`, `sunriseset0` and `sunriseset1.`  
All of these verbs were derived from BASIC programs published
in *Sky & Telescope* magazine in the 1990's. The rest of
the verbs in `sunmoon` are mostly date and trigonometric
utilities.

\subsection{\texttt{sunmoon} Interface}

~~~~ { .j }
  calmoons      NB. calendar dates of new and full moons                     
  moons         NB. times of new and full moons for n calendar years         
  sunriseset0   NB. computes sun rise and set times - see group documentation
  sunriseset1   NB. computes sun rise and set times - see group documentation
~~~~

\subsection{\textbf\texttt{sunriseset0} \textsl{v--} sunrise and sunset times}

This  verb has been adapted from a BASIC program submitted by
Robin  G.  Stuart  *Sky & Telescope's*  shortest  sunrise/set
program  cont
In [20]:
NB. run jodliterate on (sunmoon)
require 'jodliterate'

NB. set the output directory - when 
NB. running in Jupyter use a subdirectory
NB. of your notebook directory.

ltxpath=: 'C:\Users\john\AnacondaProjects\testfolder\grplit\' 
setjodliterate ltxpath
+-+-------------------------------------------------+
|1|C:\Users\john\AnacondaProjects\testfolder\grplit\|
+-+-------------------------------------------------+
In [21]:
NB. (grplit) returns a list of generated 
NB. LaTeX and command files. The *.bat 
NB. file compiles the generated LaTeX

,. grplit 'sunmoon'
+-----------------------------------------------------------------+
|1                                                                |
+-----------------------------------------------------------------+
|C:\Users\john\AnacondaProjects\testfolder\grplit\sunmoon.tex     |
+-----------------------------------------------------------------+
|C:\Users\john\AnacondaProjects\testfolder\grplit\sunmoontitle.tex|
+-----------------------------------------------------------------+
|C:\Users\john\AnacondaProjects\testfolder\grplit\sunmoonoview.tex|
+-----------------------------------------------------------------+
|C:\Users\john\AnacondaProjects\testfolder\grplit\sunmooncode.tex |
+-----------------------------------------------------------------+
|C:\Users\john\AnacondaProjects\testfolder\grplit\sunmoon.bat     |
+-----------------------------------------------------------------+

Step 8: Compile the files of the previous step to produce a PDF

In [22]:
_250 {. shell ltxpath,'sunmoon.bat'
gular.otf><c:/program files/miktex 2.9/fonts/ope
ntype/public/lm/lmmono12-regular.otf>
Output written on sunmoon.pdf (22 pages, 107711 bytes).
Transcript written on sunmoon.log.

(base) C:\Users\john\AnacondaProjects\testfolder\grplit>endlocal

In [23]:
NB. uncomment to display generated PDF 
 NB. shell ltxpath,'sunmoon.pdf'

Storing jodliterate pandoc compatible document fragments in JOD

Effective use of jodliterate requires a melange of Markdown, \LaTeX, JOD, and J skills combined with a healthy attitude about experimentation. You have to try things and see if they work!

However, before you can try jodliterate document fragments you have put them in JOD dictionaries.

jodliterate uses two types of document fragments:

  1. markdown overview group documents.
  2. \LaTeX overview macros.

Markdown group documents are transformed by pandoc into \LaTeX but the overview macros are not altered in any way. This enables the use of arbitrarily complex \LaTeX. The following examples show how to insert document fragments.

Create a jodliterate Demo Dictionary

In [24]:
NB. create a demo dictionary - (didnum) insures new name
require 'general/jod'

NB. new dictionary in default JOD directory
sbx newd itslit_ijod_=: 'aaa',":didnum_ajod_ ''
+-+---------------------+------------------------------------------+------- ... 
|1|dictionary created ->|aaa327403631806685638405507439206657280913|c:/user ... 
+-+---------------------+------------------------------------------+------- ... 
In [25]:
NB. 1 if new dictionary created
(<itslit) e. od ''
1
In [26]:
od itslit [ 3 od '' NB. open only new dictionary
+-+--------------+------------------------------------------+
|1|opened (rw) ->|aaa327403631806685638405507439206657280913|
+-+--------------+------------------------------------------+
In [27]:
NB. define some words
freq=:~. ; #/.~
movmean=:-@[ (+/ % #)\ ]
geomean=:# %: */
bmi=: 704.5"_ * ] % [: *: [
polyprod=:+//.@(*/)

wlst=: ;:'freq movmean geomean bmi polyprod'

NB. put in dictionary
put wlst

NB. short word explanations
t=: ,:  'freq';'frequency distribution'
t=: t , 'movmean';'moving mean'
t=: t , 'geomean';'geometric mean of a list'
t=: t , 'bmi';'body mass index - (x) inches (y) lbs'
t=: t , 'polyprod';'polynomial product'

0 8 put t
+-+-------------------------------+------------------------------------------+
|1|5 word explanation(s) put in ->|aaa327403631806685638405507439206657280913|
+-+-------------------------------+------------------------------------------+
In [28]:
NB. make header and macro groups
grp 'litheader' ; wlst
grp 'litmacro'  ; wlst
+-+--------------------------+------------------------------------------+
|1|group <litmacro> put in ->|aaa327403631806685638405507439206657280913|
+-+--------------------------+------------------------------------------+
In [29]:
IFACEWORDSlitheader=: wlst
put 'IFACEWORDSlitheader'
+-+-------------------+------------------------------------------+
|1|1 word(s) put in ->|aaa327403631806685638405507439206657280913|
+-+-------------------+------------------------------------------+

Use Group Document Overview Markdown

In [30]:
NB. add group header markdown
litheader=: (0 : 0)
`litheader` is a markdown demo group. 

This markdown text will be 
[transmogrified](https://calvinandhobbes.fandom.com) 
by `pandoc` to \LaTeX. A group interface will be 
generated from the `IFACEWORDSlitheader`
list. Interface lists are usually, but 
not always, associated with a *class group*.

\subsection{\texttt{litheader} Interface}

`{~{insert_interface_md_}~}`
)

NB. store markdown as a JOD group document
2 9 put 'litheader';litheader
+-+-----------------------------+------------------------------------------+
|1|1 group document(s) put in ->|aaa327403631806685638405507439206657280913|
+-+-----------------------------+------------------------------------------+
In [31]:
NB. run jodliterate on group
ltxpath=: 'C:\Users\john\AnacondaProjects\testfolder\grplit\' 
setjodliterate ltxpath
{: grplit 'litheader'
+--------------------------------------------------------------+
|C:\Users\john\AnacondaProjects\testfolder\grplit\litheader.bat|
+--------------------------------------------------------------+
In [32]:
NB. compile latex
_250 {. shell ltxpath,'litheader.bat'
lar.otf><c:/program files/miktex 2.9/fonts/o
pentype/public/lm/lmmono12-regular.otf>
Output written on litheader.pdf (4 pages, 47726 bytes).
Transcript written on litheader.log.

(base) C:\Users\john\AnacondaProjects\testfolder\grplit>endlocal

In [33]:
NB. uncomment to show PDF
NB. shell ltxpath,'litheader.pdf'

Use Macro Overview LaTeX

In [34]:
NB. add a LaTeX overview - this code will not 
NB. be altered by jodliterate the suffix
NB. '_oview_tex' is required to associate 
NB. the overview with the group 'litmacro'

litmacro_oview_tex=: (0 : 0)

This \LaTeX\ code will not be 
touched by \texttt{jodliterate}. 

\subsection{Business Babel}

``Truth management is enabled.''

\emph{Excerpt from an actual business document!}
Obviously composed in an irony free zone.

\subsection{Some Complicated \LaTeX}

\medskip

\[
\frac{1}{\Bigl(\sqrt{\phi \sqrt{5}}-\phi\Bigr) e^{\frac25 \pi}} =
1+\frac{e^{-2\pi}} {1+\frac{e^{-4\pi}} {1+\frac{e^{-6\pi}}
{1+\frac{e^{-8\pi}} {1+\ldots} } } }
\]

)

NB. store LaTeX as JOD text macro 
4 put 'litmacro_oview_tex';LATEX_ajod_;litmacro_oview_tex
+-+--------------------+------------------------------------------+
|1|1 macro(s) put in ->|aaa327403631806685638405507439206657280913|
+-+--------------------+------------------------------------------+
In [35]:
NB. run jodliterate on group
{: grplit 'litmacro'
+-------------------------------------------------------------+
|C:\Users\john\AnacondaProjects\testfolder\grplit\litmacro.bat|
+-------------------------------------------------------------+
In [36]:
NB. compile latex
_250 {. shell ltxpath,'litmacro.bat'
e1/public/lm/lmsy6.pfb><C:/Program Files/MiKTeX 2.9/fonts/type1/public/lm/lms
y8.pfb>
Output written on litmacro.pdf (4 pages, 138976 bytes).
Transcript written on litmacro.log.

(base) C:\Users\john\AnacondaProjects\testfolder\grplit>endlocal

In [37]:
NB. display PDF
NB. shell ltxpath,'litmacro.pdf'

Using jodliterate with larger J systems

The main jodliterate verb grplit works with single JOD groups. Larger systems are typically made from many groups. JOD macro and test scripts are one way to work around this limitation. The JOD development dictionaries contain several macros that illustrate this approach.

In [38]:
od ;:'joddev jod utils' [ 3 od ''

NB. list macros with substring 'latex'
4 2 dnl 'latex'
+-+-------------+---------------------+
|1|buildjodlatex|buildjodliteratelatex|
+-+-------------+---------------------+
In [39]:
NB. display start of macro that 
NB. applies jodliterate to JOD code
250 {. 4 disp 'buildjodlatex'
NB.*buildjodlatex s--  generates syntax highlighted JOD source LaTeX.
NB.
NB. Files are written to the put dictionary's document directory.
NB.
NB. assumes: current versions of pandoc (pandoc 2.9.1.1 or later)
NB.          check noun (THISPANDOC

Final Remarks

jodliterate is an idiosyncratic anal-retentive software utility; it’s mainly for people that consider source code an art form. Nobody likes ugly undocumented art!

If you have any questions, suggestions, or complaints please leave a comment on this post. To include others join one of J discussion forums and post your queries there.

May the source be with you!

WordPress conversion from UsingJodliterate.ipynb by nb2wp v0.3.1

Semi-Literate JOD

JOD Logo

Click to view jodliterate.pdf

Despite seven decades of programming experience documenting software remains a challenge. There are many reasons for this sorry state of affairs with the most important being that programmers simply do not agree on the need for documentation. As pathetic as this sounds it’s not without merit. It all depends on what you call “documentation.”

Writing technical documents for management, marketing or users usually results in excruciating rounds of Dilbertian critiques. Everyone understands your code better than you do. If you provide too much detail, you get complaints. If you use unfamiliar words, you get complaints. If you point out limitations, assumptions or caveats, you get complaints. If you assume basic 8th grade reading levels, you get complaints. If you use nonstandard fonts or unauthorized style templates, you get complaints. No wonder many programmers hate “documentation” and blow off the entire problem by making ludicrous claims about “self documenting code.” The self documenting cabal may have fooled management but they’re not fooling the rest of us. The need for illuminating program documentation is as pressing today as it was for ENIAC coders in the 1940’s and, when in it comes to illuminating documentation, the best overall approach was pioneered by Donald Knuth over twenty-five years ago and goes by the moniker literate programming.

Providing basic literate programming support in JOD has been on my to-do list for ages. I’ve held off until recently because I have never been happy with my mark up options. JOD directly supports simple J scriptdoc compatible leading comment block formatting. For example many of my J verbs start with a comment block like:

betweenstrs=:4 : 0

NB.*betweenstrs v-- select sublists between  nonnested delimiters
NB. discarding delimiters.
NB.
NB. dyad:  blcl =. (clStart;clEnd) betweenstrs cl
NB.        blnl =. (nlStart;nlEnd) betweenstrs nl
NB.
NB.   ('start';'end') betweenstrs 'start yada yada end boo hoo start ahh end'
NB.
NB.   NB. also applies to numeric delimiters
NB.   (1 1;2 2) betweenstrs 1 1 66 666 2 2 7 87 1 1 0 2 2

's e'=. x
llst=. ((-#s) (|.!.0) s E. y) +. e E. y
mask=. ~:/\ llst
(mask#llst) <;.1 mask#y
)

Even if you can’t spell J I bet you have a good idea about what this “program” does and, if you doubt my claims, I’ve left you with some examples to try the next time you find yourself in J. Stupid comments may be for losers but telling comments, especially example laden ones, really help! And, if you really find comments distracting, JOD has a deal for you!

   ;1{compj 'betweenstrs' 
betweenstrs=:4 :0
's e'=.x
a=.((-#s )(|.!.0)s E.y)+.e E.y
b=.~:/\a
(b#a)<;.1 b#y
)

compj purges pesky comments and reduces tedious long identifiers like mask to pure compact J. Getting rid of comments is trivial, putting them back in: not so much! JOD’s simple comment block formatting has been very effective but it’s hardly literate programming.

Literate programming requires more muscle. Knuth used his own TeX. TeX and LaTeX are certainly up to the job, as are many HTML and XML approaches. Unfortunately, all these mark up formats suffer from “distracting taggyness.” I can tolerate LaTeX but HTML and XML drives me nuts. Yes, there are perfectly fine editors for all these formats, but remember, we are inserting the resulting text into code that we will be looking at for the rest of our miserable coding lives! We need a mark up format that’s stable, readable, versatile, easy to use and, this is very important, easy to ignore! Markdown is such a format. It’s almost ideal for program comments and is capable of much more. I’ve started using markdown in JOD and it’s already paying its way.

jodliterate.ijs is a J utility script that can generate semi-literate LaTeX documents directly from JOD groups. It uses a version of pandoc with J syntax highlighting, see Pandoc based J Syntax Highlighting for details. I consider jodliterate semi-literate because it’s completely at the mercy of the programmer. If you don’t store coherent markdown text fragments in JOD all you get is a nice syntax highlighted listing. But, if you actually write about your group, jodliterate can produce essential documents. jodliterate.pdf is an example of this tool being used on itself. Self reference always makes an excellent test case. jodliterate will be included in the next JOD release. Until then you can download the J script from this directory. As always referenced files are available in the files sidebar. Enjoy!

Pandoc based J Syntax Highlighting

John MacFarlane’s excellent command line utility Pandoc is a Haskell program that converts to and from various text markup languages. Pandoc’s help option lists its supported input and output formats.

The following examples are Linux bash shell commands. Windows shell commands are identical.

$ pandoc --help
pandoc [OPTIONS] [FILES]
Input formats:  native, json, markdown, markdown+lhs, rst, rst+lhs, docbook,
                textile, html, latex, latex+lhs
Output formats: native, json, html, html5, html+lhs, html5+lhs, s5, slidy,
                slideous, dzslides, docbook, opendocument, latex, latex+lhs,
                beamer, beamer+lhs, context, texinfo, man, markdown,
                markdown+lhs, plain, rst, rst+lhs, mediawiki, textile, rtf, org,
                asciidoc, odt, docx, epub

Some Pandoc conversions are better than others. Pandoc does a better job of turning markdown into LaTeX than LaTeX into markdown. It’s also better at converting HTML into LaTeX than LaTeX into HTML. Pandoc works best when converting markdown, the simplest of its inputs, to other formats. In fact Pandoc does such a good job of converting markdown to HTML, HTML+MathJax, LaTeX or PDF that many writers are now saving their source documents as markdown text knowing they can easily produce other formats as needed.

As handy as Pandoc’s markup conversions are this nifty tool also supports syntax highlighting for over a hundred programming languages. Unfortunately, my favorite language J is not on Pandoc’s list of highlighted languages. [1] Where have I run into this problem before? Luckily for me Pandoc is an open source tool and Pandoc’s author has made it easy to add new highlight languages.

Pandoc is a Haskell program. I’ve been aware of Haskell’s existence for years but until I decided to take on this specialized Pandoc hack I had never studied or used the language. Usually when you set out to modify a large program in an unfamiliar programming language you’re in for what can only be described as an f’ing educational experience. It’s a testament to the quality of the Haskell’s global libraries and standard tools that a complete Haskell novice can effectively tweak large Haskell programs. Here’s what you have to do.

  1. Install the Haskell Platform. The Haskell Platform is available for all the usual suspects. I’ve used both the Windows and Linux versions. I almost installed the Mac version on my wife’s Mac but resisted the urge.
  2. Get with the Cabal. Cabal is the main Haskell package distribution and build utility. Cabal comes with the Haskell Platform and is easily accessed from the command line. Type cabal --help in your favorite shell to view the program’s options.
  3. Spend sometime playing with Hackage. Hackage contains a large set of Haskell packages including all the source code required to build Pandoc.

After installing the Haskell Platform and familiarizing yourself with Cabal try building Pandoc. This will thoroughly exercise your Haskell system. Instructions for building Haskell packages are here. After reading the package build instructions run the following in your command shell:

$ cabal update
$ cabal install pandoc

This will download, compile and install a number of Haskell packages. Where Cabal puts the packages depends on your operating system. Cabal saves Linux packages in a hidden local directory. On my machine they ended up in:

/home/john/.cabal/lib

If you managed to build Pandoc you’re now ready to add a new highlighting language. Pandoc uses the highlighting-kate package for highlighting. highlighting-kate works by reading a directory of Kate editor xml language regex based definition files and generating custom language parsers. We want to generate a custom J parser so we need to download highlighting-kate source and add a Kate xml definition file for J.

You can find such a J Kate file on the J Wiki here. Download this file by cutting and pasting and save it as j.xml. Now do the following.

  1. Run the Pandoc version command pandoc --version of the Pandoc you just built to determine the version of the highlighting-kate package you need.
  2. Use Cabal to unpack the required highlighting-kate package. This downloads the required package and creates a temporary subdirectory in your current directory that contains package source code.
    $ cabal unpack highlighting-kate-0.5.3.2
    Unpacking to highlighting-kate-0.5.3.2/
  3. Move into the temporary subdirectory and copy the Kate j.xml file to the package’s xml subdirectory.
    $ cd highlighting-kate-0.5.3.2
    $ cp ~/pd/blog/j.xml ~/temp/highlighting-kate-0.5.3.2/xml/j.xml
  4. Configure the package.
    $ cabal configure
    Resolving dependencies...
    Configuring highlighting-kate-0.5.3.2...
  5. Build the highlighting-kate package.
    $ cabal build
    Resolving dependencies...
        ... (omitted) ...
  6. If highlighting-kate builds without problems run the command.
    $ runhaskell ParseSyntaxFiles.hs xml
    Writing Text/Highlighting/Kate/Syntax/SqlPostgresql.hs
    Writing Text/Highlighting/Kate/Syntax/Scala.hs
        ... (omitted) ...

    ParseSyntaxFiles scans the package’s xml subdirectory and generates language specific parsers. If all goes well you will find J.hs in this directory.

    ~/temp/highlighting-kate-0.5.3.2/Text/Highlighting/Kate/Syntax

    J.hs, like all the files referred to in this post, are available in the files sidebar in the Haskell/Pandoc subdirectory.

  7. Now rebuild the highlighting-kate package. This compiles your new J.hs parser file.
    $ cabal build
    Resolving dependencies...
        ... (omitted) ...
  8. After rebuilding the package run the Cabal copy command to put the modified package in the expected library location.
    $ cabal copy
    Installing library in
    /home/john/.cabal/lib/highlighting-kate-0.5.3.2/ghc-7.4.1

Now that the highlighting library is up to date we have to rebuild Pandoc. To do this mirror the steps taken to download and build the highlighting package.

  1. Use Cabal to unpack the Pandoc package.
    $ cd ~/temp
    $ cabal unpack pandoc-1.9.4.2
    Unpacking to pandoc-1.9.4.2/
  2. Switch to the Pandoc subdirectory and configure the package.
    $ cabal configure
    Resolving dependencies...
    [1 of 1] Compiling Main      ( Setup.hs, dist/setup/Main.o )
        ... (omitted) ...
  3. Rebuild Pandoc.
    $ cabal build 
    Building pandoc-1.9.4.2...
    Preprocessing executable 'pandoc' for pandoc-1.9.4.2...
        ... (omitted) ...

    If all goes well a Pandoc executable will be written to this subdirectory.

    ~/temp/pandoc-1.9.4.2/dist/build/pandoc
  4. You can check the new executable by running pandoc --version. The result should display J in the list of supported languages.

Now that we have a Pandoc that can highlight J we’re almost ready to blog gaudy J code. However before doing this we need to install some custom CSS. Custom CSS is not available on free WordPress.com blogs. To apply custom coloring schemes get the custom package and learn how to use WordPress’s custom CSS editor. As daunting as this sounds it’s no problemo for my limited purposes. To enable tango style Pandoc syntax highlighting on your WordPress blog paste tango.css into the custom CSS editor, check the “Add my CSS to CSS stylesheet” button and then press the “Save Stylesheet” button. Now your WordPress blog will be sensitive to the HTML span tags generated by Pandoc.

To show that all this hacking works as intended you can check out the Pandoc generated versions of this blog post. I’ve posted the original markdown source with PDF, LaTeX and HTML versions. All these files are available via the files sidebar. You can generate the HTML version with the command:

$ pandoc -s --highlight-style=tango PJHighlight.markdown -o PJHighlight.html

To get other versions simply change the file extension of the output -o file.

Bonebridge puzzle in MYST IV

Bonebridge puzzle in MYST IV
Click for “Haven Age” Walkthrough

Finally we are ready to post syntax highlighted J code. The following J verb bonebridge generates all “likely” lock combinations for the MYST IV Bonebridge puzzle in Pandoc’s tango style. At one time I was a big fan of MYST computer games. I always enjoyed being lost in a beautiful puzzle which, if you discard the beautiful bit, is a pretty accurate description of my programmer day job.

bonebridge=:3 : 0

NB.*bonebridge  v--  lists  totem  symbol  permutations for  bone
NB. bridge.
NB.
NB. The  solution to  this MYST IV puzzle is similiar to the book
NB. shelf puzzle in Tomanha but requires far more  exploration of
NB. the age.
NB.
NB. You are confronted with  5  bones on the lock.  All the bones
NB. move independently. You can see the settings for 4 bones. One
NB. bone  has a  broken display.  The four  visible bones  have 8
NB. symbols on them in the  same order.  The  5th bone also has 8
NB. symbols and you can "safely" infer they are in the same order
NB. as the visible bones.
NB.
NB. Four  bone  symbols   match  symbols  found  on  totem  poles
NB. distributed around the  age. There is a  5th  totem pole  but
NB. fruit eating mangrees  obscure  the  totem symbol and  I have
NB. never  seen it.  The  totem  poles are  associated  with  age
NB. animals. In addition to the totem poles  there is  a chart in
NB. the  mangree  observation  hut  that  displays  a  triangular
NB. pattern  of paw  prints.  The  paw  prints  define an  animal
NB. ordering. The order  seems to be how  dangerous a  particular
NB. animal is;  big scary animals  are at the top and vegetarians
NB. are at the bottom.
NB.
NB. Putting the clues together you infer:
NB.
NB. a)  the  bridge  combination  is  some  permutation  of  five
NB. different symbols
NB.
NB. b) two possible symbol orders are given by the paw chart
NB.
NB. c) you know 5 symbols and the 4th is one of the remaining 4
NB.
NB. If this is  the  case  the number of  possible  lock settings
NB. shrinks from 32768 to the ones listed by this verb.
NB.
NB. monad:  bonebridge uuIgnore
NB.
NB.   bonebridge 0

NB. known in paw order
known=.    s: ' square triangle hourglass yingyang'
unknown=.  s: ' clover cross xx yy'

NB. all possible lock permutations
settings=. ~. 5 {."1 tapl known,unknown
assert. ((!8)%!8-5) = #settings

NB. possible ordering - we don't know
NB. what the fifth symbol is but it
NB. occurs in the 3rd slot
order=. 8#s:<''
order=. known (0 1 6 7)} order
order=. unknown (2 3 4 5)} order

NB. keep unknown only in 3rd slot
settings=. settings #~ -. +./"1 (0 1 3 4{"1 settings) e. unknown
settings=. settings #~ (2 {"1 settings) e. unknown

NB. strict row sequence adverb
srsm=.  1 : '*./"1 u/&> 2 <\"1 y'

NB. retain strictly increasing and strictly decreasing rows
grade=. order i. settings
settings #~ ((< srsm)"1 grade) +. (> srsm)"1 grade
)

[1] J has its own syntax highlighting tools but they are not part of a document generation system. Pandoc’s highlighters elegantly feed into many output formats making them far more useful.

Turn your Blog into an eBook

If you have worked through the exhausting procedure of converting your blog to LaTeX: see posts (1), (2) and (3), you will be glad to hear that turning your blog into an image free eBook is almost effortless. In this post I will describe how I convert my blog into EPUB and MOBI eBooks.

eBooks how the cool kids are reading

eBook readers like Kindles, Nooks, iPads and many cell phones are optimized for plain old prose. They excel at displaying reflowable text in a variety of fonts, sizes and styles. One eBook reader feature, dear to my old fart eyes, is the ability to increase the size of text.  All eBooks are potentially large print editions. There are other advantages: most readers can store hundreds, if not thousands of books, making them portable libraries. It’s now technically possible to hand a kindergarten student a little tablet that holds every single book he will use from preschool to graduate school. The only obstacle is the rapacious textbook industry and their equally rapacious eBook publishing enablers. But fear not open source man will save the day. The days of overpriced digital goods are over! I will never pay more than a few bucks for an eBook because I can make my own and so can you! Let’s get together and kill off another industry that so has it coming!

PDFs, EPUBs and MOBIs

Native eBook file formats like EPUB and MOBI do not handle complex page layouts well. If your document contains a lot of mathematics, figures and well placed illustrations stick with PDF workflows.[1] You will save yourself and your readers a lot of grief.  But, if your document is a prose masterpiece, a veritable great American novel, then “publishing” it as an EPUB or MOBI is great way to target eBook readers. EPUBs and MOBIs can be compiled from many sources.  I start with the LaTeX files I created for the PDF version of this blog because I hate doing the same boring task twice. By far the most time-consuming part of converting WordPress export XML to LaTeX is editing the pandoc generated *.tex files to resolve figures and fix odd run-together-words and paragraphs. To preserve these edits I use pandoc to convert my edited *.tex to *.markdown files.

Markdown

Markdown is a very simple text oriented format. A markdown file is completely readable exactly the way it is. All you need is a text editor. Even text editors are overkill. You could compose markdown with early 20th century mechanical typewriters; it’s a low tech format for the ages: perfect for prose.

The J verb MarkdownFrLatex [2] calls pandoc and converts my *.tex files to *.markdown. I place my markdown in the directory

c:/pd/blog/wp2epub

and to track changes to my markdown files I GIT this directory. MarkdownFrLatex strips out image inclusions and removes typographic flourishes.  When it succeeds it writes a simple markdown file and when it fails it writes a *.baddown file. Baddown files are *.tex files that contain lstlistings and complex figure environments that are best resolved with manual edits. After removing such problematic LaTeX environments the J verb FixBaddown calls pandoc and turns baddown files into markdown files.

Generating EPUB and MOBI files

When the conversion to markdown is complete I run MainMarkdown to mash all my files into one large markdown file with an eBook header. The eBook header for this blog is:

% Analyze the Data not the Drivel
% John D. Baker

The first few lines of the consolidated bm.markdown file are:

% Analyze the Data not the Drivel
% John D. Baker

#[What’s In it for
Facebook?](https://bakerjd99.wordpress.com/2009/09/05/whats-in-it-for-facebook/)

-------------------------------------------------------------------------------------------------

*Posted: 05 Sep 2009 22:44:50*

[Facebook](http://www.facebook.com) is huge: they brag about a user
count well north of one hundred million. If only 0.5% of their users are
active that’s 500,000 *concurrent users.* How many expensive servers
does it take to support such a load? .....

Generating an EPUB from bm.markdown is a simple matter of opening up your favorite command line shell and issuing the pandoc command:

pandoc -S --epub-cover-image=bmcover.jpg -o bm.epub bm.markdown

You can read the resulting EPUB file bm.epub on any EPUB eBook reader. Here’s a screen shot of bm.epub on my iPhone.

iPhone loaded with my blog

iPhone loaded with my blog

The last step converts bm.epub to bm.mobi. MOBI is a native Kindle format. Pandoc can generate MOBI from bm.markdown but it inexplicably omits a table of contents. No problemo:  I use Calibre to convert bm.epub to bm.mobi. Calibre properly converts the embedded EPUB table of contents to MOBI.  Here’s bm.mobi on a Kindle.

Kindle loaded with my blog

Kindle loaded with my blog

All the “published” versions of this blog are available on the Download this Blog page so please help yourself!


[1] LaTeX is usually compiled to PDF making it one of hundreds of PDF workflows.

[2] All the J verbs referenced in this post are in the script TeXfrWpxml.ijs

WordPress to LaTeX with Pandoc and J: Using TeXfrWpxml.ijs (Part 3)

WordPress to LaTeX

WordPress to LaTeX

In this post I will describe how to use the J script TeXfrWpxml.ijs to generate LaTeX source from WordPress export XML.  I am assuming you have worked through (Part 1) and (Part 2) and have:

  1. Successfully installed and tested Pandoc.
  2. Installed and tested a version of J.
  3. Set up appropriate directories (Part 2).
  4. Know how to use LaTeX.

Item #4 is a big if.  Inexperienced LaTeX users will probably not enjoy a lot of success with this procedure as the source generated by TeXfrWpxml.ijs requires manual edits to produce good results.  However, if you’re not a LaTeX guru, do not get discouraged. It’s not difficult to create blog documents like bm.pdf.

Step 1: download WordPress Export XML

How to download WordPress export XML is described here.  Basically you go to your blog’s dashboard, select Tools, choose Export  and select the All content option.

Tools > Export > All Content

Tools > Export > All Content

When you press the Download Export File  button your browser will download a single XML file that contains all your posts and comments. Remember where you save this file. I put my export XML here.

c:/pd/blog/wordpress/analyzethedatanotthedrivel.wordpress.xml

Step 2: download TeXfrWpxml.ijs

Download TeXfrWpxml.ijs and remember where you save it.  I put this script here.

c:/pd/blog/TeXfrWpxml.ijs

Step 3: start J and load TeXfrWpxml.ijs

TeXfrWpxml.ijs was generated from JOD dictionaries. With JOD it’s easy to capture root word dependencies and produce complete standalone scripts. TeXfrWpxml.ijs needs only the standard J load profile to run.  It does not require any libraries or external references and should run on all Windows and Linux versions of J after 6.01.  Loading this script is a simple matter of executing:

load 'c:/pd/blog/TeXfrWpxml.ijs'

The following shows this script running in a J 7.01 console. The console is the most stripped down J runtime.

Step 4: review directories and necessary LaTeX files

The conversion script assumes proper directories are available up: see Part 2. The first time you run TeXfrWpxml.ijs it’s a good idea to check that the directories and files the script is expecting are the ones you want to process.  You can verify the settings by displaying TEXFRWPDIR, TEXINCLUSIONS, TEXROOTFILE and TEXPREAMBLE.

  TEXPREAMBLE
bmamble.tex
  TEXFRWPDIR
c:/pd/blog/wp2latex/
  TEXINCLUSIONS
inclusions
  TEXROOTFILE
bm.tex
  TEXPREAMBLE
bmamble.tex

If all these directories and files exist go to step (5).

Step 5: make sure you are online

The first time you run the converter it will attempt to download all the images referenced in your blog. This is where wget.exe gets executed.  Obviously to download anything you must be connected to the Internet.

Step 6: run LatexFrWordpress

Run the verb LatexFrWordpress.  The monadic version of this verb takes a single argument: the complete path and file name of the export XML file you downloaded in step (1).

xml=: 'c:/pd/blog/wordpress/analyzethedatanotthedrivel.wordpress.xml'

LatexFrWordpress xml

As the verb runs you will see output like:

   LatexFrWordpress xml
What's In it for Facebook?
downloading: c:/pd/blog/wp2latex/inclusions/demotivational-posters-facebook-you.jpg
1 downloaded; 0 not downloaded; 0 skipped
Fake Programming
downloading: c:/pd/blog/wp2latex/inclusions/672169130_vajvn-M.png
1 downloaded; 0 not downloaded; 0 skipped
Laws or Suggestions
downloading: c:/pd/blog/wp2latex/inclusions/i-B5mfdRF-M.jpg
1 downloaded; 0 not downloaded; 0 skipped
Lens Lust

... many lines omitted ...

downloading: c:/pd/blog/wp2latex/inclusions/i-mNK4RHL-M.png
1 downloaded; 0 not downloaded; 0 skipped
WordPress to LaTeX with Pandoc and J: LaTeX Directories (Part 2)
0 downloaded; 0 not downloaded; 1 skipped
+-++
|1||
+-++

When the verb terminates you should have a directory c:/pd/blog/wp2latex full of *.tex files:  one file for each blog post. Now the hard work starts.

Step 7: editing LaTeX posts

The conversion from WordPress XML to LaTeX produces files that require manual edits. The more images, video, tables and other elements in your posts the more demanding these edits will become.  My blog has about one image per post.  Most of these images are wrapped by text. LaTeX has a mind of its own when it comes to floating figures and getting illustrations to behave requires far more parameter tweaking than it should. This is a longstanding weakness of LaTeX that pretty much everyone bitches about. My advice is start at the front of your document and work through it post by post. The files generated by LatexFrWordpress do not attempt to place figures for you but they do bolt in ready-made figure templates as comments that you can experiment with.  Each post file is also set up for separate LaTeX compilation. You don’t have to compile your entire blog to tweak one post. The one good thing about this edit step is once you have sorted out your old posts you do not have to revisit them unless you make major global document changes. The next time you run LatexFrWordpress it will only bring down new posts and images.

Step 8: compile your LaTeX blog

I use batch files and shell scripts to drive LaTeX compilations.  I processed my blog with this batch file.

echo off
rem process blog posting (bm.tex) root file
title Running Blog Master/LaTeX ...

rem first pass for aux file needed by bibtex
lualatex bm

rem generate/reset bbl file
bibtex bm
makeindex bm

rem resolve all internal references - may
rem comment out when debugging entire document
lualatex bm
lualatex bm

rem display pdf - point to prefered PDF reader
title Blog Master/LaTeX complete displaying PDF ...
"C:\Program Files\SumatraPDF\SumatraPDF.exe" bm.pdf

The presence of Unicode APL, see this post, forced me to use lualatex. I needed some very nonstandard APL fonts.  See bm.pdf — also available on the Download this Blog page — to judge the effectiveness of my edits. Producing nice figure laden typeset blog documents is work but, as I will describe in the next post, producing image free eBooks is a simple and far less laborious variation on this process.

WordPress to LaTeX with Pandoc and J: LaTeX Directories (Part 2)

WordPress to LaTeX

WordPress to LaTeX

In this post I will describe the LaTeX directory structure the J script TeXfrWpxml.ijs is expecting. To convert WordPress export XML to LaTeX with this script you will have to set up similar directories.

LaTeX documents are built from *.tex[1] files. This makes LaTeX more like a compiled programming language than a word processing program. There are advantages and disadvantages to the LaTeX way. In LaTeX’s favor, the system is enormously adaptable, versatile and powerful. There is very little that LaTeX/TeX and associates cannot do.  Unfortunately, “with great power comes great responsibility.” LaTeX is demanding! You have to study LaTeX like any other programming language. It’s not for everyone but for experienced users it’s the best way to produce documents with the highest typographic standards.

LaTeX directory structure

To use LaTeX efficiently it’s wise to pick a document directory structure and stick with it. I use a simple directory layout. Each document has a root directory. The root directory used by TeXfrWpxml.ijs is:

Windows c:/pd/blog/wp2latex
Linux /home/john/pd/blog/wp2latex

I put my document specific *.tex, *.bib, *.sty and other LaTeX/TeX files in the root. To handle graphics I create an immediate subdirectory called inclusions.

c:/pd/blog/wp2latex/inclusions

The inclusions directory holds the document’s *.png, *.jpg, *.pdf, *.eps and other graphics files.  To reference files in the inclusions directory with the standard LaTeX graphicx package insert

\usepackage{color,graphicx,subfigure,sidecap}
\graphicspath{{./inclusions/}}

in your preamble. Finally, to track document changes I create a GIT repository in the root directory.

c:/pd/blog/wp2latex/.git

Self contained directories

I take care to keep my document directories self-contained. Zipping up the root and inclusions directory collects all the document’s files. This means that I sometimes have to copy files that are used in more than one document. Many LaTeX users maintain a common directory for such files but I’ve found that common directories complicate moving documents around. You’re always forgetting something in the damn common directory or you are copying a buttload of mostly irrelevant files from one big confusing common directory to another.

TeXfrWpxml.ijs files

The TeXfrWpxml.ijs script searches for these files in the root directory.

bm.tex Main LaTeX root file
bmamble.tex LaTeX preamble

bm.tex references bmtitlepage.tex.  I prefer a separate title page file; simply comment out this file if you create titles in other ways. The zip file wp2latex.zip contains a test directory in the format expected by TeXfrWpxml.ijs.  It also has a subset of my blog posts already converted to LaTeX. To get ready for WordPress to LaTeX with Pandoc and J: Using TeXfrWpxml.ijs (Part 3) download wp2latex.zip and attempt to compile bm.tex.  You might have to download a number of LaTeX packages.  Once you have successfully compiled bm.tex you are ready for the next step.


[1] LaTeX uses many other file types but key files are usually *.tex files.

WordPress to LaTeX with Pandoc and J: Prerequisites (Part 1)

There are no quick WordPress to LaTeX fixes

WordPress to LaTeX

WordPress to LaTeX

Over the next three posts I will describe how to convert WordPress’s export XML to LaTeX source code.  I know that many of you are looking for a quick WordPress to LaTeX fix; unfortunately there are no quick fixes. The two formats come from different worlds and are used in different ways.  Producing useful LaTeX source from WordPress export XML will require manual edits.  My goal here is to minimize manual edits, produce high quality LaTeX source and to outline what you will have to contend with. To get an idea of what you can expect download the LaTeX compiled version of this post.

Visual and Logical composition

WordPress and LaTeX are examples of the two basic approaches, visual and logical, taken by writing software.  Visual systems value appearance. It matters what things look like and no effort is spared to get the right look. Logical systems value content. What’s said is far more important than what it looks like. Logical systems impose order and structure and typically defer visual elements.  As you might expect there is no such thing as a pure visual or logical writing system. Successful systems use both approaches to a greater or lesser degree. Composing WordPress blog posts is roughly 35% visual and 65% logical.[1]  LaTeX composition is about 10% visual and 90% logical. The numbers do not line up; there is a basic mismatch here.

Many format X to LaTeX converters tackle this mismatch by attempting to maintain visual fidelity. This is a catastrophic error that renders the entire conversion useless.  Here’s a hint. If you’re using a predominantly logical system like LaTeX you don’t give a rodent’s posterior about visual fidelity. This method dispenses with all but the most basic of visual elements. No attempt is made to preserve fonts, type sizes, image scale, justification, hyphenation, text color and so forth.  The goal is to produce working LaTeX source that can be transformed to whatever final layout the author desires.

Prerequisite Software

I use two programs to transform WordPress export XML to LaTeX:  the J programming language and John MacFarlane’s Pandoc.  Pandoc is an excellent text mark-up to mark-up converter.  It wisely avoids attempting to convert entire complex documents and focuses on getting parts of documents right.  It does a particularly good job of converting HTML to LaTeX which is a crucial part of this process.  I use Pandoc to transform the HTML embedded in WordPress export XML CDATA elements to *.tex files and I use J to preprocess and post process Pandoc inputs and outputs and to stitch everything together into a set of LaTeX ready files.

Download Pandoc from here. I use the Windows command line version. There are Linux and Mac versions as well. Download J from here.  The easiest J install is the 32 bit Windows J 6.02 version. Other versions require additional steps to configure and deploy. If you are already a J user there is no need to install a particular system but you will need:

  1. The task library require 'task'
  2. The utility program wget.exe

Both of these components are typically part of the J distribution.

Install and check prerequisites

To continue download and install Pandoc and J and run the following tests; if you succeed you’re system is ready for WordPress to LaTeX with Pandoc and J: LaTeX Directories (Part 2).

Pandoc Test:

Download the test file: cdata.html and run Pandoc from the command line:

pandoc –o cdata.tex cdata.html

cdata.html is an example of the HTML code you find in WordPress export XML CDATA elements.  Note: required files are also available in the files sidebar in the WordPress to LaTeX directory.

J Test:

Start a J session and enter the following commands:

require 'task'

shell 'wget –help'

shell 'wget http://conceptcontrol.smugmug.com/photos/i-mNK4RHL/0/L/i-mNK4RHL-L.png'

If the shell command is properly loaded and wget.exe is found you will see help text. The second shell command downloads an image file.  Downloading post images is part of the overall conversion process.


[1] Actually this is not bad. Page layout systems are far worse. A typical layout system might be 90% visual and 10% logical making layout systems polar opposites of LaTeX.

Typesetting UTF8 APL code with the LaTeX lstlisting package

UTF8 APL characters within a LaTeX lstlisting environment. Click for *.tex source code

Typesetting APL source code has always been a pain in the ass! In the dark ages, (the 1970’s), you had to fiddle with APL type-balls and live without luxuries like lower case letters. With the advent of general outline fonts it became technically possible to render APL glyphs on standard display devices provided you:

  1. Designed your own APL font.
  2. Mapped the atomic vector of your APL to whatever encoding your font demanded.
  3. Wrote WSFULL‘s of junk transliteration functions to dump your APL objects as font encoded text.

It’s a testament to either the talent, or pig headedness of APL programmers, that many actually did this. We all hated it! We still hate it! But, like an abused spouse, we kept going back for more.  It’s our fault; if we loved APL more it would stop hitting us!

When Unicode appeared APL’ers cheered — our long ASCII nightmare was ending. The more politically astute worked to include the APL characters in the Unicode standard. Hey if Klingon is there why not APL? Everyone thought it was just a matter of time until APL vendors abandoned their nonstandard atomic vectors and fully embraced Unicode. With a few notable exceptions we are still waiting. While we wait the problem of typesetting APL source code festers.

My preferred source code listing tool is the \LaTeX lstlisting package. lstlisting works well for standard ANSI source code.  I use it for J, C#, SQL, C, XML, Ocaml, Mathematica, F#, shell scripts and \LaTeX source code, i.e. everything except APL! lstlisting is an eight bit package; it will not handle arbitrary Unicode out of the box.  I didn’t know how to get around this so I handled APL by enclosing UTF8 APL text in plain \begin{verbatim} … \end{verbatim} environments. This works for XeLaTeX and LuaLaTeX but you lose all the lstlisting goodies. Then I saw an interesting tex.stackexchange.com posting about The ‘listings’ package and UTF-8. One solution to the post’s “French ligature problem” showed how to force Unicode down lstlisting‘s throat. I wondered if the same method would work for APL. It turns out that it does!

If you insert the following snippet of TeX code in your document preamble LuaLaTeX and XeLaTeX will properly process UTF8 APL text in lstlisting environments. You will need to download and install the APL385 Unicode font if it’s not on your system.  A test \LaTeX document illustrating this hack is available here. The compiled PDF is available here. As always these files can be accessed in the files sidebar.

% set lstlisting to accept UTF8 APL text
\makeatletter
\lst@InputCatcodes
\def\lst@DefEC{%
 \lst@CCECUse \lst@ProcessLetter
  ^^80^^81^^82^^83^^84^^85^^86^^87^^88^^89^^8a^^8b^^8c^^8d^^8e^^8f%
  ^^90^^91^^92^^93^^94^^95^^96^^97^^98^^99^^9a^^9b^^9c^^9d^^9e^^9f%
  ^^a0^^a1^^a2^^a3^^a4^^a5^^a6^^a7^^a8^^a9^^aa^^ab^^ac^^ad^^ae^^af%
  ^^b0^^b1^^b2^^b3^^b4^^b5^^b6^^b7^^b8^^b9^^ba^^bb^^bc^^bd^^be^^bf%
  ^^c0^^c1^^c2^^c3^^c4^^c5^^c6^^c7^^c8^^c9^^ca^^cb^^cc^^cd^^ce^^cf%
  ^^d0^^d1^^d2^^d3^^d4^^d5^^d6^^d7^^d8^^d9^^da^^db^^dc^^dd^^de^^df%
  ^^e0^^e1^^e2^^e3^^e4^^e5^^e6^^e7^^e8^^e9^^ea^^eb^^ec^^ed^^ee^^ef%
  ^^f0^^f1^^f2^^f3^^f4^^f5^^f6^^f7^^f8^^f9^^fa^^fb^^fc^^fd^^fe^^ff%
  ^^^^20ac^^^^0153^^^^0152%
  ^^^^20a7^^^^2190^^^^2191^^^^2192^^^^2193^^^^2206^^^^2207^^^^220a%
  ^^^^2218^^^^2228^^^^2229^^^^222a^^^^2235^^^^223c^^^^2260^^^^2261%
  ^^^^2262^^^^2264^^^^2265^^^^2282^^^^2283^^^^2296^^^^22a2^^^^22a3%
  ^^^^22a4^^^^22a5^^^^22c4^^^^2308^^^^230a^^^^2336^^^^2337^^^^2339%
  ^^^^233b^^^^233d^^^^233f^^^^2340^^^^2342^^^^2347^^^^2348^^^^2349%
  ^^^^234b^^^^234e^^^^2350^^^^2352^^^^2355^^^^2357^^^^2359^^^^235d%
  ^^^^235e^^^^235f^^^^2361^^^^2362^^^^2363^^^^2364^^^^2365^^^^2368%
  ^^^^236a^^^^236b^^^^236c^^^^2371^^^^2372^^^^2373^^^^2374^^^^2375%
  ^^^^2377^^^^2378^^^^237a^^^^2395^^^^25af^^^^25ca^^^^25cb%
  ^^00}
\lst@RestoreCatcodes
\makeatother

More on Kindle Oriented LaTeX

I’ve been compiling \LaTeX PDFs for the Kindle. If you like \LaTeX typefaces, especially mathematical fonts, you’ll love how they render on the Kindle. It’s a good thing because you won’t like the Kindle’s cramped page dimensions. For simple flow-able text this isn’t a big deal but for complex \LaTeX documents it is!

There are two basic \LaTeX \Longrightarrow Kindle  workflows.

  1. Convert your \LaTeX to HTML and then convert the HTML to mobi.
  2. Compile your \LaTeX for Kindle page dimensions.

For simple math and figure free documents mobi is the best choice because it’s a native Kindle format. You will be able to re-flow text and change font sizes on the fly. There are many \LaTeX to HTML converters. This is a good summary of your options. You can also find a variety of HTML to mobi converters. I’ve used Auto Kindle; it’s slow but produces decent results.

Compiling \LaTeX for Kindle page dimensions is more work. First decide what works best for your document: landscape or portrait. Portrait is the Kindle default but I’ve found that landscape is better for math and figure rich documents. You can flip back and forth between landscape and portrait on the Kindle but it will not re-paginate PDFs. Of course with mobi this is no problemo!

After choosing a basic layout expunge all hard-coded lengths from your source *.tex files. Replace all fixed lengths with relative page lengths. For example, 4in might become 0.75\textwidth. If you have hundreds of figures and images to adjust write a little program to replace fixed lengths. I did this while preparing a Kindle version of Hilbert’s Foundations of Geometry.

The next hurdle to overcome is the Kindle’s blase attitude about length units. \LaTeX is extremely precise: an inch is an inch to six decimals. This is not the case on the Kindle! You will have to load your PDFs on the Kindle and inspect margins for text overflows. Be prepared for a few rounds of page dimension tweaking! For more details about preparing \LaTeX source check out LaTeX Options for Kindle.

Finally, after you have compiled your PDF and loaded it on your Kindle, there are some Kindle options you should set to optimize your PDF reading experience. My next post will walk you through setting these options.

The following *.tex file loads packages that are useful for Kindle sizing. It also shows how to print out \LaTeX dimensions with the printlen package.

% A simple test document that displays some packages and settings
% that are useful when compiling LaTeXe documents for the Kindle.
% Compile with pdflatex or xelatex.
%
% Tested on MikTeX 2.9
% July 22, 2011

\documentclass[12pt]{article}

% included graphics in immediate subdirectory
\usepackage{graphicx}
\graphicspath{{./image/}}

% extended coloring
\usepackage[usenames,dvipsnames]{color}

% hyperref link colors are chosen to display
% well on Kindle monochrome devices
\usepackage[colorlinks, linkcolor=OliveGreen, urlcolor=blue,
            pdfauthor={your name}, pdftitle={your title},
            pdfsubject={your subject},
            pdfcreator={MikTeX+LaTeXe with hyperref package},
            pdfkeywords={your,key,words},
            ]{hyperref}

\usepackage{breqn}         % automatic equation breaking
\usepackage{microtype}     % microtypography, reduces hyphenation

% kindle page geometry (no page numbers)
%\usepackage[papersize={3.6in,4.8in},hmargin=0.1in,vmargin={0.1in,0.1in}]{geometry}

% portrait kindle page geometry space reserved for page numbers
\usepackage[papersize={3.6in,4.8in},hmargin=0.1in,vmargin={0.1in,0.255in}]{geometry}

% landscape geometry
%\usepackage[papersize={4.8in,3.6in},hmargin={0.1in,0.18},vmargin={0.1in,0.255in}]{geometry}

% headers and footers
\usepackage{fancyhdr}
\pagestyle{fancy}
\fancyhead{}            % clear page header
\fancyfoot{}            % clear page footer

\setlength{\abovecaptionskip}{2pt} % space above captions
\setlength{\belowcaptionskip}{0pt} % space below captions
\setlength{\textfloatsep}{2pt}     % space between last top float or first bottom float and the text
\setlength{\floatsep}{2pt}         % space left between floats
\setlength{\intextsep}{2pt}        % space left on top and bottom of an in-text float

% print LaTeX dimensions
\usepackage{printlen}

% reduces footer text separation adjusted for page numbers
\setlength{\footskip}{14pt}

% scales down page number font size if document is at 12pt -> page numbers 10 pt
\renewcommand*{\thepage}{\footnotesize\arabic{page}}

\begin{document}

The \verb|\textwidth| is \printlength{\textwidth} which is also
\uselengthunit{in}\printlength{\textwidth} and
\uselengthunit{mm}\printlength{\textwidth}.

\uselengthunit{pt}
The \verb|\textheight| is \printlength{\textheight} which is also
\uselengthunit{in}\printlength{\textheight} and
\uselengthunit{mm}\printlength{\textheight}.

\end{document}