# More J Pandoc Syntax Highlighting

Syntax highlighting is essential for blogging program code. Many blog hosts recognize this and provide tools for highlighting programming languages. WordPress.com (this host) has a nifty highlighting tool that handles dozens of mainstream programming languages. Unfortunately, one of my favorite programming languages, J (yes, it’s a single-letter name), is way out of the mainstream and is not supported.

There are a few ways to deal with this problem.

1. Eschew J highlighting.
2. Upgrade[1] your WordPress.com subscription and install custom syntax highlighters that can handle arbitrary language definitions.
3. Find another blog host that freely supports custom highlighters.
4. Roll your own or customize an existing highlighter.

A few years ago I went with the fourth option and hacked the superb open-source tool pandoc. The grim details are described in this blog post. My hack produced a customized version of pandoc with J highlighting. I still use my hacked version and I’d probably stick with it if current pandoc versions had not introduced must-have features like converting Jupyter notebooks to Markdown, PDF, LaTeX and HTML. Jupyter is my default thinking-things-through programming environment. I’ve even taken to blogging with Jupyter notebooks. If you write and explain code you owe it to yourself to give Jupyter a try.

Unwilling to eschew J highlighting or forgo Jupyter, I was on the verge of re-hacking pandoc when I read the current pandoc (version 2.9.1.1) documentation and saw that J is now officially supported by pandoc. You can verify this with the shell commands:

pandoc --version
pandoc --list-highlight-languages

The pandoc developers made my day! I felt like Wayne meeting a rock star.

Highlighting J is now a simple matter of placing J code in markdown blocks like:

~~~~ { .j }
... code code code ...
~~~~

and issuing shell commands like:

pandoc --highlight-style tango --metadata title="J test" -s jpdh.md -o jpdh.html

The previous command generated the HTML of this post, which I pasted into the WordPress.com Classic Editor. Not only do I get J code highlighting on the cheap, I also get footnotes which, for god freaking sakes,[2] are not supported by the new WordPress block editor for low budget blogs.
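If you regenerate posts often, the pandoc call is easy to wrap in a script. The following is a minimal Python sketch, not part of my actual workflow: `pandoc_args` and `highlights_j` are hypothetical helper names, and the check assumes `pandoc --list-highlight-languages` prints one language per line with J listed as `j`.

```python
import subprocess


def pandoc_args(source, output, title, style="tango"):
    """Build the argument list for the pandoc command used for this post."""
    return [
        "pandoc",
        "--highlight-style", style,
        "--metadata", f"title={title}",
        "-s", source,
        "-o", output,
    ]


def highlights_j(pandoc="pandoc"):
    """Return True if this pandoc lists J among its highlight languages.

    Assumption: one language name per output line, with J listed as "j".
    """
    result = subprocess.run(
        [pandoc, "--list-highlight-languages"],
        capture_output=True, text=True, check=True,
    )
    return "j" in result.stdout.split()
```

With pandoc on the path, `subprocess.run(pandoc_args("jpdh.md", "jpdh.html", "J test"), check=True)` should run the same conversion as the shell command.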

The source markdown used for this post is available here – enjoy!

NB. Some J code I am currently using to test TAB

NB. read TAB delimited table files as symbols - see long document
readtd2s=:[: s:@<;._2&> (9{a.) ,&.>~ [: <;._2 [: (] , ((10{a.)"_ = {:) }. (10{a.)"_) (13{a.) -.~ 1!:1&(]`<@.(32&>@(3!:0)))

tdkeytest=:4 : 0

NB.*tdkeytest v-- test natural key columns  of TAB delimited text
NB. files.
NB.
NB. Many of the raw tables of the ETL process depend on  compound
NB. primary keys. This verb applies a basic  test of primary  key
NB. columns. Passing this test  makes it very  likely  the  table
NB. will load  without key constraint  violations.  Failures  are
NB. still possible depending  on how  text  data is converted  to
NB. other  datatypes. Failure of this test indicates  a very high
NB. chance of key constraint violations.
NB.
NB. dyad:  il =. blclColnames tdkeytest clFile
NB.
NB.   f0=. 'C:\temp\dailytsv\raw_ST_BU.txt'
NB.   k0=. ;:'BuId XMLFileDate'
NB.   k0 tdkeytest f0
NB.
NB.   f1=. 'C:\temp\dailytsv\raw_ST_Item.txt'
NB.   k1=. ;:'BuId ItemId XMLFileDate'
NB.   k1 tdkeytest f1

NB. key column positions
'header key column(s) missing' assert -.(#h) e. p=. h i. s: x

c=. #d=. }. d
b=. ~:p {"1 d

NB. columns unique, rowcnt, nonunique rowcnt
if. r=. c = +/b do.
r , c , 0
else.
NB. there are duplicates show some sorted duplicate keys
k=. p {"1 d
d=. d {~ I. k e. k #~ -.b
d=. (/: p {"1 d) { d
b=. ~:p {"1 d
m=. +/b
smoutput (":m),' duplicate key blocks'
n=. DUPSHOW <. m
smoutput 'first ',(":n),' duplicate row key blocks'
smoutput (<p { h) ,&.> n {. ,. b <;.1 p {"1 d
r , c , #d
end.
)
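For readers who don't speak J, the idea behind the key test can be sketched in Python. This is a loose, hypothetical re-expression rather than a translation of the verb: `key_test` is an invented name, the first line is assumed to be a header row, and the result mimics the J result of a unique flag, a row count and a count of rows with duplicated keys.

```python
from collections import Counter


def key_test(key_columns, tsv_text):
    """Test whether the named columns form a unique key in TAB delimited text.

    Returns (is_unique, row_count, duplicate_row_count).
    Assumes the first non-empty line is a header row.
    """
    rows = [line.split("\t") for line in tsv_text.splitlines() if line]
    header, data = rows[0], rows[1:]

    missing = [name for name in key_columns if name not in header]
    if missing:
        raise ValueError(f"header key column(s) missing: {missing}")

    # extract the key columns of every data row
    positions = [header.index(name) for name in key_columns]
    keys = [tuple(row[p] for p in positions) for row in data]

    # count every row whose key occurs more than once
    counts = Counter(keys)
    duplicate_rows = sum(n for n in counts.values() if n > 1)
    return (duplicate_rows == 0, len(data), duplicate_rows)


sample = (
    "BuId\tXMLFileDate\tQty\n"
    "1\t2020-01-01\t5\n"
    "1\t2020-01-02\t6\n"
    "1\t2020-01-01\t7\n"
)
print(key_test(["BuId", "XMLFileDate"], sample))         # (False, 3, 2)
print(key_test(["BuId", "XMLFileDate", "Qty"], sample))  # (True, 3, 0)
```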

1. The pay more option is always available.
2. WordPress.com is beginning to remind me of Adobe. Stop taking away longstanding features when upgrading!

# Turning JOD Dump Script Tricks

Have you ever wondered how extremely prolific bloggers do it? How is it possible to crank out thousands of blog entries per year without creating a giant stinking pile of mediocre doo-doo? Like most deep medium mysteries it’s not very deep and there are no mysteries. The spewers, people who post like teenage girls tweet, use two basic strategies:

1. Multiple authors: The heroic image of the lone blogger waging holy war against a sea of Internet idiocy is largely a myth. Many popular prolific blogs are the work of many hands. The editor at Analyze the Data not the Drivel eschews this tactic. Apparently he’s an incontinent and argumentative prima donna that sane people steer clear of.
2. Content recycling: In elementary school this was called copying. Now that we’re all grown up we use terms like “excerpting”, “abstracting”, and my favorite, “re-purposing.” The basic idea is simple. Take something you’ve written elsewhere and repackage it as something new. Hey, all the cool kids are doing it!

The following is a slightly edited new appendix I have just added to the JOD manual. I am working to properly publish the JOD manual mostly so I can say that I’ve written a legitimate, albeit strange and queer, book.

I created this post by running the $\LaTeX$ code of the manual appendix through the excellent utility pandoc, tweaking the resulting markdown, and then using pandoc again to generate html for this blog. pandoc is a great “re-purposing” tool!

Finally, re-purposing is not entirely cynical. The act of moving material from one medium to another exposes problems. I found a few editing errors while creating this post that eluded my $\LaTeX$ eyes. If you find more this is your chance to tell me what a moron I am.

# Turning JOD Dump Script Tricks

Dump script generation is my favorite JOD feature. Dump scripts serialize JOD dictionaries; they are mainly used to back up dictionaries and interact with version control systems. However, dump scripts are general J scripts and can do much more! Maintaining a stable of healthy JOD dictionaries is easier if you can turn a few dump script tricks.[1]

1. Flattening reference paths: Open JOD dictionaries define a reference path. For example, if you open the following dictionaries:
NB. open four dictionaries
od ;:'smugdev smug image utils'
+-+-----------------------+-------+----+-----+-----+
|1|opened (ro/ro/ro/ro) ->|smugdev|smug|image|utils|
+-+-----------------------+-------+----+-----+-----+

the reference path is /smugdev/smug/image/utils.

When objects are retrieved each dictionary on the path is searched in reference path order. If there are no compelling reasons to maintain separate dictionaries you can improve JOD retrieval performance and simplify dictionary maintenance by flattening all or part of the path.

To flatten the reference path do:

NB. reopen the first three dictionaries on the path
od ;:'smugdev smug image' [ 3 od ''
+-+--------------------+-------+----+-----+
|1|opened (ro/ro/ro) ->|smugdev|smug|image|
+-+--------------------+-------+----+-----+

NB. dump to a temporary file (df)
df=: {: showpass make jpath '~jodtemp/smugflat.ijs'
+-+---------------------------+-----------------------+
|1|object(s) on path dumped ->|c:/jodtemp/smugflat.ijs|
+-+---------------------------+-----------------------+

NB. create a new flat dictionary
newd 'smugflat';jpath '~jodtemp/smugflat' [ 3 od ''
+-+---------------------+--------+--------------------+
|1|dictionary created ->|smugflat|c:/jodtemp/smugflat/|
+-+---------------------+--------+--------------------+

NB. open the flat dictionary and (utils)
od ;:'smugflat utils'
+-+-----------------+--------+-----+
|1|opened (rw/ro) ->|smugflat|utils|
+-+-----------------+--------+-----+

NB. reload dump script ... output not shown ...
0!:0 df

The collapsed path /smugflat/utils will return the same objects as the longer path. It is important to understand that the collapsed dictionary smugflat does not necessarily contain the same objects found in the three original dictionaries smugdev, smug and image. If objects with the same name exist in the original dictionaries only the first one found will be in the collapsed dictionary.

2. Merging dictionaries: If two dictionaries contain no overlapping objects it might make sense to merge them. This is easily achieved with dump scripts. To merge two or more dictionaries do:
NB. open and dump first dictionary
od 'dict0' [ 3 od ''
+-+--------------+-----+
|1|opened (rw) ->|dict0|
+-+--------------+-----+
df0=: {: showpass make jpath '~jodtemp/dict0.ijs'
+-+---------------------------+--------------------+
|1|object(s) on path dumped ->|c:/jodtemp/dict0.ijs|
+-+---------------------------+--------------------+

NB. open and dump second dictionary
od 'dict1' [ 3 od ''
+-+--------------+-----+
|1|opened (rw) ->|dict1|
+-+--------------+-----+
df1=: {: showpass make jpath '~jodtemp/dict1.ijs'
+-+---------------------------+--------------------+
|1|object(s) on path dumped ->|c:/jodtemp/dict1.ijs|
+-+---------------------------+--------------------+

NB. create new merge dictionary
newd 'mergedict';jpath '~jodtemp/mergedict' [ 3 od ''
+-+---------------------+---------+---------------------+
|1|dictionary created ->|mergedict|c:/jodtemp/mergedict/|
+-+---------------------+---------+---------------------+

NB. open merge dictionary and run dump scripts
od 'mergedict'
+-+--------------+---------+
|1|opened (rw) ->|mergedict|
+-+--------------+---------+

NB. reload dump scripts ... output not shown ...
0!:0 df0
0!:0 df1

Be careful when merging dictionaries. If there are common objects the last object loaded is the one retained in the merged dictionary.

3. Updating master file parameters: When a new parameter is added to jodparms.ijs it will not be available in existing dictionaries. With dump scripts you can rebuild existing dictionaries and update parameters. To rebuild a dictionary with new or custom parameters do:
NB. save current dictionary registrations
(toHOST ; 1 { 5 od '') write_ajod_ jpath '~temp/jodregister.ijs'

NB. open dictionary requiring parameter update
od 'dict0' [ 3 od ''
+-+--------------+-----+
|1|opened (rw) ->|dict0|
+-+--------------+-----+

NB. dump dictionary and close
df=: {: showpass make jpath '~jodtemp/dict0.ijs'
+-+---------------------------+--------------------+
|1|object(s) on path dumped ->|c:/jodtemp/dict0.ijs|
+-+---------------------------+--------------------+

3 od ''
+-+---------+-----+
|1|closed ->|dict0|
+-+---------+-----+

NB. erase master file and JOD object id file
1
1

NB. recycle JOD - this recreates (jmaster.ijf) and (jod.ijn)
NB. using the new dictionary parameters defined in (jodparms.ijs)
(jodon , jodoff) 1
1 1

NB. re-register dictionaries

NB. create a new dictionary - it will have the new parameters
newd 'dict0new';jpath '~jodtemp/dict0new' [ 3 od ''
+-+---------------------+--------+--------------------+
|1|dictionary created ->|dict0new|c:/jodtemp/dict0new/|
+-+---------------------+--------+--------------------+

od 'dict0new'
+-+--------------+--------+
|1|opened (rw) ->|dict0new|
+-+--------------+--------+

NB. reload dump script ... output not shown ...
0!:0 df

Before executing complex dump script procedures back up your JOD dictionary folders and play with dump scripts on test dictionaries. Dump scripts are essential JOD dictionary maintenance tools but like most powerful tools they must be used with care.
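The difference between flattening a path and merging dictionaries comes down to which duplicate wins. Here is a toy Python model of that rule, with plain dicts standing in for JOD dictionaries; the function names and sample objects are invented for illustration and say nothing about how JOD stores anything.

```python
def flatten_path(path):
    """Dump of an open path: the first definition found on the path wins."""
    flat = {}
    for dictionary in path:
        for name, definition in dictionary.items():
            flat.setdefault(name, definition)
    return flat


def merge_dumps(dumps):
    """Dump scripts reloaded one after another: the last one loaded wins."""
    merged = {}
    for dictionary in dumps:
        merged.update(dictionary)
    return merged


smugdev = {"readfile": "smugdev version"}
smug = {"readfile": "smug version", "writefile": "smug version"}

print(flatten_path([smugdev, smug])["readfile"])  # smugdev version
print(merge_dumps([smugdev, smug])["readfile"])   # smug version
```

Both results hold the same names; only the surviving definition of the shared name differs, which is why it pays to check for overlapping objects before merging.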

1. Spicing up one’s rhetoric with a double entendre like “turning tricks” may be construed as a microaggression. The point of colored language is to memorably make a point. You are unlikely to forget turning dump script tricks.

# Semi-Literate JOD


Despite seven decades of programming experience, documenting software remains a challenge. There are many reasons for this sorry state of affairs, with the most important being that programmers simply do not agree on the need for documentation. As pathetic as this sounds, it’s not without merit. It all depends on what you call “documentation.”

Writing technical documents for management, marketing or users usually results in excruciating rounds of Dilbertian critiques. Everyone understands your code better than you do. If you provide too much detail, you get complaints. If you use unfamiliar words, you get complaints. If you point out limitations, assumptions or caveats, you get complaints. If you assume basic 8th grade reading levels, you get complaints. If you use nonstandard fonts or unauthorized style templates, you get complaints. No wonder many programmers hate “documentation” and blow off the entire problem by making ludicrous claims about “self documenting code.” The self documenting cabal may have fooled management but they’re not fooling the rest of us. The need for illuminating program documentation is as pressing today as it was for ENIAC coders in the 1940s and, when it comes to illuminating documentation, the best overall approach was pioneered by Donald Knuth over twenty-five years ago and goes by the moniker literate programming.

Providing basic literate programming support in JOD has been on my to-do list for ages. I’ve held off until recently because I have never been happy with my mark up options. JOD directly supports simple J scriptdoc compatible leading comment block formatting. For example, many of my J verbs start with a comment block like:

betweenstrs=:4 : 0

NB.*betweenstrs v-- select sublists between  nonnested delimiters
NB.
NB. dyad:  blcl =. (clStart;clEnd) betweenstrs cl
NB.        blnl =. (nlStart;nlEnd) betweenstrs nl
NB.
NB.   ('start';'end') betweenstrs 'start yada yada end boo hoo start ahh end'
NB.
NB.   NB. also applies to numeric delimiters
NB.   (1 1;2 2) betweenstrs 1 1 66 666 2 2 7 87 1 1 0 2 2

's e'=. x
llst=. ((-#s) (|.!.0) s E. y) +. e E. y
mask=. ~:/\llst
(mask#llst) <;.1 mask#y
)

Even if you can’t spell J I bet you have a good idea about what this “program” does and, if you doubt my claims, I’ve left you with some examples to try the next time you find yourself in J. Stupid comments may be for losers but telling comments, especially example laden ones, really help! And, if you really find comments distracting, JOD has a deal for you!

;1{compj 'betweenstrs'
betweenstrs=:4 :0
's e'=.x
a=.((-#s )(|.!.0)s E.y)+.e E.y
b=.~:/\a
(b#a)<;.1 b#y
)

compj purges pesky comments and reduces tedious long identifiers like mask to pure compact J. Getting rid of comments is trivial, putting them back in: not so much! JOD’s simple comment block formatting has been very effective but it’s hardly literate programming.
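If J still looks like line noise, here is roughly what betweenstrs does, sketched in Python. This is a hypothetical equivalent for character strings only, not a port of the mask-and-cut approach: `between_strs` is an invented name, and its handling of unterminated or overlapping delimiters is an assumption that may differ from the J verb.

```python
def between_strs(start, end, text):
    """Return the substrings that lie between non-nested start/end delimiters."""
    found = []
    position = 0
    while True:
        begin = text.find(start, position)
        if begin < 0:
            break
        begin += len(start)              # skip over the start delimiter
        finish = text.find(end, begin)
        if finish < 0:                   # unterminated start: ignore it (assumption)
            break
        found.append(text[begin:finish])
        position = finish + len(end)     # resume the search after the end delimiter
    return found


print(between_strs("start", "end", "start yada yada end boo hoo start ahh end"))
# [' yada yada ', ' ahh ']
```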

Literate programming requires more muscle. Knuth used his own TeX. TeX and LaTeX are certainly up to the job, as are many HTML and XML approaches. Unfortunately, all these mark up formats suffer from “distracting taggyness.” I can tolerate LaTeX but HTML and XML drive me nuts. Yes, there are perfectly fine editors for all these formats, but remember, we are inserting the resulting text into code that we will be looking at for the rest of our miserable coding lives! We need a mark up format that’s stable, readable, versatile, easy to use and, this is very important, easy to ignore! Markdown is such a format. It’s almost ideal for program comments and is capable of much more. I’ve started using markdown in JOD and it’s already paying its way.

jodliterate.ijs is a J utility script that can generate semi-literate LaTeX documents directly from JOD groups. It uses a version of pandoc with J syntax highlighting, see Pandoc based J Syntax Highlighting for details. I consider jodliterate semi-literate because it’s completely at the mercy of the programmer. If you don’t store coherent markdown text fragments in JOD all you get is a nice syntax highlighted listing. But, if you actually write about your group, jodliterate can produce essential documents. jodliterate.pdf is an example of this tool being used on itself. Self reference always makes an excellent test case. jodliterate will be included in the next JOD release. Until then you can download the J script from this directory. As always referenced files are available in the files sidebar. Enjoy!

# Pandoc based J Syntax Highlighting

John MacFarlane’s excellent command line utility Pandoc is a Haskell program that converts to and from various text markup languages. Pandoc’s help option lists its supported input and output formats.

The following examples are Linux bash shell commands. Windows shell commands are identical.

$ pandoc --help
pandoc [OPTIONS] [FILES]
Input formats: native, json, markdown, markdown+lhs, rst, rst+lhs, docbook, textile, html, latex, latex+lhs
Output formats: native, json, html, html5, html+lhs, html5+lhs, s5, slidy, slideous, dzslides, docbook, opendocument, latex, latex+lhs, beamer, beamer+lhs, context, texinfo, man, markdown, markdown+lhs, plain, rst, rst+lhs, mediawiki, textile, rtf, org, asciidoc, odt, docx, epub

Some Pandoc conversions are better than others. Pandoc does a better job of turning markdown into LaTeX than LaTeX into markdown. It’s also better at converting HTML into LaTeX than LaTeX into HTML. Pandoc works best when converting markdown, the simplest of its inputs, to other formats. In fact Pandoc does such a good job of converting markdown to HTML, HTML+MathJax, LaTeX or PDF that many writers are now saving their source documents as markdown text knowing they can easily produce other formats as needed.

As handy as Pandoc’s markup conversions are, this nifty tool also supports syntax highlighting for over a hundred programming languages. Unfortunately, my favorite language J is not on Pandoc’s list of highlighted languages. [1] Where have I run into this problem before?

Luckily for me Pandoc is an open source tool and Pandoc’s author has made it easy to add new highlight languages. Pandoc is a Haskell program. I’ve been aware of Haskell’s existence for years but until I decided to take on this specialized Pandoc hack I had never studied or used the language. Usually when you set out to modify a large program in an unfamiliar programming language you’re in for what can only be described as an f’ing educational experience. It’s a testament to the quality of Haskell’s global libraries and standard tools that a complete Haskell novice can effectively tweak large Haskell programs. Here’s what you have to do.

1. Install the Haskell Platform. The Haskell Platform is available for all the usual suspects. I’ve used both the Windows and Linux versions. I almost installed the Mac version on my wife’s Mac but resisted the urge.
2. Get with the Cabal. Cabal is the main Haskell package distribution and build utility. Cabal comes with the Haskell Platform and is easily accessed from the command line. Type cabal --help in your favorite shell to view the program’s options.
3. Spend some time playing with Hackage. Hackage contains a large set of Haskell packages including all the source code required to build Pandoc.

After installing the Haskell Platform and familiarizing yourself with Cabal try building Pandoc. This will thoroughly exercise your Haskell system. Instructions for building Haskell packages are here. After reading the package build instructions run the following in your command shell:

$ cabal update
$ cabal install pandoc

This will download, compile and install a number of Haskell packages. Where Cabal puts the packages depends on your operating system. Cabal saves Linux packages in a hidden local directory. On my machine they ended up in:

/home/john/.cabal/lib

If you managed to build Pandoc you’re now ready to add a new highlighting language. Pandoc uses the highlighting-kate package for highlighting. highlighting-kate works by reading a directory of Kate editor xml language regex based definition files and generating custom language parsers. We want to generate a custom J parser so we need to download highlighting-kate source and add a Kate xml definition file for J. You can find such a J Kate file on the J Wiki here. Download this file by cutting and pasting and save it as j.xml. Now do the following.

1. Run the Pandoc version command pandoc --version of the Pandoc you just built to determine the version of the highlighting-kate package you need.

2. Use Cabal to unpack the required highlighting-kate package. This downloads the required package and creates a temporary subdirectory in your current directory that contains package source code.

$ cabal unpack highlighting-kate-0.5.3.2
Unpacking to highlighting-kate-0.5.3.2/

3. Move into the temporary subdirectory and copy the Kate j.xml file to the package’s xml subdirectory.

$ cd highlighting-kate-0.5.3.2
$ cp ~/pd/blog/j.xml ~/temp/highlighting-kate-0.5.3.2/xml/j.xml

4. Configure the package.

$ cabal configure
Resolving dependencies...
Configuring highlighting-kate-0.5.3.2...

5. Build the highlighting-kate package.

$ cabal build
Resolving dependencies...
... (omitted) ...

6. If highlighting-kate builds without problems run the command.

$ runhaskell ParseSyntaxFiles.hs xml
Writing Text/Highlighting/Kate/Syntax/SqlPostgresql.hs
Writing Text/Highlighting/Kate/Syntax/Scala.hs
... (omitted) ...

ParseSyntaxFiles scans the package’s xml subdirectory and generates language specific parsers. If all goes well you will find J.hs in this directory.

~/temp/highlighting-kate-0.5.3.2/Text/Highlighting/Kate/Syntax

J.hs, like all the files referred to in this post, is available in the files sidebar in the Haskell/Pandoc subdirectory.

7. Now rebuild the highlighting-kate package. This compiles your new J.hs parser file.

$ cabal build
Resolving dependencies...
... (omitted) ...

8. After rebuilding the package run the Cabal copy command to put the modified package in the expected library location.

$ cabal copy
Installing library in /home/john/.cabal/lib/highlighting-kate-0.5.3.2/ghc-7.4.1

Now that the highlighting library is up to date we have to rebuild Pandoc. To do this mirror the steps taken to download and build the highlighting package.

1. Use Cabal to unpack the Pandoc package.

$ cd ~/temp
$ cabal unpack pandoc-1.9.4.2
Unpacking to pandoc-1.9.4.2/

2. Switch to the Pandoc subdirectory and configure the package.

$ cabal configure
Resolving dependencies...
[1 of 1] Compiling Main      ( Setup.hs, dist/setup/Main.o )
... (omitted) ...

3. Rebuild Pandoc.

$ cabal build
Building pandoc-1.9.4.2...
Preprocessing executable 'pandoc' for pandoc-1.9.4.2...
... (omitted) ...

If all goes well a Pandoc executable will be written to this subdirectory.

~/temp/pandoc-1.9.4.2/dist/build/pandoc

4. You can check the new executable by running pandoc --version. The result should display J in the list of supported languages.

Now that we have a Pandoc that can highlight J we’re almost ready to blog gaudy J code. However, before doing this we need to install some custom CSS. Custom CSS is not available on free WordPress.com blogs. To apply custom coloring schemes get the custom package and learn how to use WordPress’s custom CSS editor. As daunting as this sounds it’s no problemo for my limited purposes. To enable tango style Pandoc syntax highlighting on your WordPress blog paste tango.css into the custom CSS editor, check the “Add my CSS to CSS stylesheet” button and then press the “Save Stylesheet” button. Now your WordPress blog will be sensitive to the HTML span tags generated by Pandoc.

To show that all this hacking works as intended you can check out the Pandoc generated versions of this blog post. I’ve posted the original markdown source with PDF, LaTeX and HTML versions. All these files are available via the files sidebar. You can generate the HTML version with the command:

$ pandoc -s --highlight-style=tango PJHighlight.markdown -o PJHighlight.html

To get other versions simply change the file extension of the output -o file.

Bonebridge puzzle in MYST IV
Click for “Haven Age” Walkthrough

Finally we are ready to post syntax highlighted J code. The following J verb bonebridge generates all “likely” lock combinations for the MYST IV Bonebridge puzzle in Pandoc’s tango style. At one time I was a big fan of MYST computer games. I always enjoyed being lost in a beautiful puzzle which, if you discard the beautiful bit, is a pretty accurate description of my programmer day job.

bonebridge=:3 : 0

NB.*bonebridge  v--  lists  totem  symbol  permutations for  bone
NB. bridge.
NB.
NB. The  solution to  this MYST IV puzzle is similar to the book
NB. shelf puzzle in Tomanha but requires far more  exploration of
NB. the age.
NB.
NB. You are confronted with  5  bones on the lock.  All the bones
NB. move independently. You can see the settings for 4 bones. One
NB. bone  has a  broken display.  The four  visible bones  have 8
NB. symbols on them in the  same order.  The  5th bone also has 8
NB. symbols and you can "safely" infer they are in the same order
NB. as the visible bones.
NB.
NB. Four  bone  symbols   match  symbols  found  on  totem  poles
NB. distributed around the  age. There is a  5th  totem pole  but
NB. fruit eating mangrees  obscure  the  totem symbol and  I have
NB. never  seen it.  The  totem  poles are  associated  with  age
NB. animals. In addition to the totem poles  there is  a chart in
NB. the  mangree  observation  hut  that  displays  a  triangular
NB. pattern  of paw  prints.  The  paw  prints  define an  animal
NB. ordering. The order  seems to be how  dangerous a  particular
NB. animal is;  big scary animals  are at the top and vegetarians
NB. are at the bottom.
NB.
NB. Putting the clues together you infer:
NB.
NB. a)  the  bridge  combination  is  some  permutation  of  five
NB. different symbols
NB.
NB. b) two possible symbol orders are given by the paw chart
NB.
NB. c) you know 4 symbols and the 5th is one of the remaining 4
NB.
NB. If this is  the  case  the number of  possible  lock settings
NB. shrinks from 32768 to the ones listed by this verb.
NB.
NB.
NB.   bonebridge 0

NB. known in paw order
known=.    s: ' square triangle hourglass yingyang'
unknown=.  s: ' clover cross xx yy'

NB. all possible lock permutations
settings=. ~. 5 {."1 tapl known,unknown
assert. ((!8)%!8-5) = #settings

NB. possible ordering - we don't know
NB. what the fifth symbol is but it
NB. occurs in the 3rd slot
order=. 8#s:<''
order=. known (0 1 6 7)} order
order=. unknown (2 3 4 5)} order

NB. keep unknown only in 3rd slot
settings=. settings #~ -. +./"1 (0 1 3 4{"1 settings) e. unknown
settings=. settings #~ (2 {"1 settings) e. unknown

srsm=.  1 : '*./"1 u/&> 2 <\"1 y'

NB. retain strictly increasing and strictly decreasing rows
)
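The counting in the comments is easy to check outside of J. The Python sketch below is only an illustration of the arithmetic and of the "unknown symbol in the 3rd slot" filter; it reuses the symbol names from the verb, and it does not attempt the final ordering filter, which is cut off in the listing above.

```python
import math
from itertools import permutations

known = ["square", "triangle", "hourglass", "yingyang"]
unknown = ["clover", "cross", "xx", "yy"]

# five independent bones with 8 symbols each: 8^5 raw lock settings
raw_settings = 8 ** 5

# ordered selections of 5 different symbols out of 8: 8!/(8-5)!
settings = list(permutations(known + unknown, 5))
assert len(settings) == math.perm(8, 5)

# keep settings whose only unknown symbol sits in the 3rd slot
kept = [
    s for s in settings
    if s[2] in unknown and all(sym in known for i, sym in enumerate(s) if i != 2)
]

print(raw_settings, len(settings), len(kept))  # 32768 6720 96
```

The last count follows by hand too: 4 choices for the unknown symbol times the 4! arrangements of the known symbols in the other four slots.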

[1] J has its own syntax highlighting tools but they are not part of a document generation system. Pandoc’s highlighters elegantly feed into many output formats making them far more useful.

If you have worked through the exhausting procedure of converting your blog to LaTeX (see posts (1), (2) and (3)), you will be glad to hear that turning your blog into an image free eBook is almost effortless. In this post I will describe how I convert my blog into EPUB and MOBI eBooks.

#### eBooks how the cool kids are reading

eBook readers like Kindles, Nooks, iPads and many cell phones are optimized for plain old prose. They excel at displaying reflowable text in a variety of fonts, sizes and styles. One eBook reader feature, dear to my old fart eyes, is the ability to increase the size of text. All eBooks are potentially large print editions. There are other advantages: most readers can store hundreds, if not thousands of books, making them portable libraries. It’s now technically possible to hand a kindergarten student a little tablet that holds every single book he will use from preschool to graduate school. The only obstacle is the rapacious textbook industry and their equally rapacious eBook publishing enablers. But fear not, open source man will save the day. The days of overpriced digital goods are over! I will never pay more than a few bucks for an eBook because I can make my own and so can you! Let’s get together and kill off another industry that so has it coming!

#### PDFs, EPUBs and MOBIs

Native eBook file formats like EPUB and MOBI do not handle complex page layouts well. If your document contains a lot of mathematics, figures and well placed illustrations, stick with PDF workflows.[1] You will save yourself and your readers a lot of grief. But, if your document is a prose masterpiece, a veritable great American novel, then “publishing” it as an EPUB or MOBI is a great way to target eBook readers. EPUBs and MOBIs can be compiled from many sources. I start with the LaTeX files I created for the PDF version of this blog because I hate doing the same boring task twice. By far the most time-consuming part of converting WordPress export XML to LaTeX is editing the pandoc generated *.tex files to resolve figures and fix odd run-together-words and paragraphs. To preserve these edits I use pandoc to convert my edited *.tex to *.markdown files.

#### Markdown

Markdown is a very simple text oriented format. A markdown file is completely readable exactly the way it is. All you need is a text editor. Even text editors are overkill. You could compose markdown with early 20th century mechanical typewriters; it’s a low tech format for the ages: perfect for prose.

The J verb MarkdownFrLatex [2] calls pandoc and converts my *.tex files to *.markdown. I place my markdown in the directory

c:/pd/blog/wp2epub

and to track changes to my markdown files I GIT this directory. MarkdownFrLatex strips out image inclusions and removes typographic flourishes.  When it succeeds it writes a simple markdown file and when it fails it writes a *.baddown file. Baddown files are *.tex files that contain lstlistings and complex figure environments that are best resolved with manual edits. After removing such problematic LaTeX environments the J verb FixBaddown calls pandoc and turns baddown files into markdown files.
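The markdown versus baddown decision can be pictured with a few lines of Python. This is a guess at the spirit of the test, not the logic of MarkdownFrLatex: `target_extension` is a hypothetical name, and treating every lstlisting or figure environment as a reason for manual edits is an assumption.

```python
import re

# Assumption: these LaTeX environments are the ones that need manual edits.
PROBLEM_ENVIRONMENTS = ("lstlisting", "figure")


def target_extension(tex_source):
    """Return ".baddown" when the LaTeX needs manual edits, else ".markdown"."""
    for name in PROBLEM_ENVIRONMENTS:
        # match \begin{name} and the starred form \begin{name*}
        if re.search(r"\\begin\{" + re.escape(name) + r"\*?\}", tex_source):
            return ".baddown"
    return ".markdown"


print(target_extension(r"\begin{lstlisting} 1+1 \end{lstlisting}"))  # .baddown
print(target_extension("Just plain old prose."))                      # .markdown
```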

#### Generating EPUB and MOBI files

When the conversion to markdown is complete I run MainMarkdown to mash all my files into one large markdown file with an eBook header. The eBook header for this blog is:

% Analyze the Data not the Drivel
% John D. Baker

The first few lines of the consolidated bm.markdown file are:

% Analyze the Data not the Drivel
% John D. Baker

#[What’s In it for

-------------------------------------------------------------------------------------------------

*Posted: 05 Sep 2009 22:44:50*

count well north of one hundred million. If only 0.5% of their users are
active that’s 500,000 *concurrent users.* How many expensive servers
does it take to support such a load? .....
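The consolidation step is simple enough to sketch in Python. This is not the J verb MainMarkdown: `main_markdown` is a hypothetical stand-in, and the ordering of posts and the blank lines between them are assumptions. The two `%` lines it writes are the eBook header, which pandoc reads as a title block.

```python
from pathlib import Path


def main_markdown(title, author, sources, target):
    """Concatenate markdown files under a "% title / % author" header."""
    header = f"% {title}\n% {author}\n\n"
    posts = [Path(p).read_text(encoding="utf-8").strip() for p in sources]
    # separate posts with a blank line so headings start new blocks
    Path(target).write_text(header + "\n\n".join(posts) + "\n", encoding="utf-8")
    return Path(target)
```

A call such as `main_markdown("Analyze the Data not the Drivel", "John D. Baker", files, "bm.markdown")`, where `files` is a list of post markdown files, would write a consolidated file shaped like the excerpt above.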

Generating an EPUB from bm.markdown is a simple matter of opening up your favorite command line shell and issuing the pandoc command:

pandoc -S --epub-cover-image=bmcover.jpg -o bm.epub bm.markdown

You can read the resulting EPUB file bm.epub on any EPUB eBook reader. Here’s a screen shot of bm.epub on my iPhone.

The last step converts bm.epub to bm.mobi. MOBI is a native Kindle format. Pandoc can generate MOBI from bm.markdown but it inexplicably omits a table of contents. No problemo:  I use Calibre to convert bm.epub to bm.mobi. Calibre properly converts the embedded EPUB table of contents to MOBI.  Here’s bm.mobi on a Kindle.

[1] LaTeX is usually compiled to PDF making it one of hundreds of PDF workflows.

[2] All the J verbs referenced in this post are in the script TeXfrWpxml.ijs

# WordPress to LaTeX with Pandoc and J: Using TeXfrWpxml.ijs (Part 3)


In this post I will describe how to use the J script TeXfrWpxml.ijs to generate LaTeX source from WordPress export XML.  I am assuming you have worked through (Part 1) and (Part 2) and have:

1. Successfully installed and tested Pandoc.
2. Installed and tested a version of J.
3. Set up appropriate directories (Part 2).
4. Know how to use LaTeX.

Item #4 is a big if.  Inexperienced LaTeX users will probably not enjoy a lot of success with this procedure as the source generated by TeXfrWpxml.ijs requires manual edits to produce good results.  However, if you’re not a LaTeX guru, do not get discouraged. It’s not difficult to create blog documents like bm.pdf.

#### Step 1: download WordPress export XML

How to download WordPress export XML is described here. Basically you go to your blog’s dashboard, select Tools, choose Export and select the All content option.

Tools > Export > All Content

c:/pd/blog/wordpress/analyzethedatanotthedrivel.wordpress.xml

#### Step 2: download TeXfrWpxml.ijs

Download TeXfrWpxml.ijs and remember where you save it. I put this script here.

c:/pd/blog/TeXfrWpxml.ijs

#### Step 3: start J and load TeXfrWpxml.ijs

TeXfrWpxml.ijs was generated from JOD dictionaries. With JOD it’s easy to capture root word dependencies and produce complete standalone scripts. TeXfrWpxml.ijs needs only the standard J load profile to run. It does not require any libraries or external references and should run on all Windows and Linux versions of J after 6.01. Loading this script is a simple matter of executing:

load 'c:/pd/blog/TeXfrWpxml.ijs'

The following shows this script running in a J 7.01 console. The console is the most stripped down J runtime.

#### Step 4: review directories and necessary LaTeX files

The conversion script assumes the proper directories are set up: see Part 2. The first time you run TeXfrWpxml.ijs it’s a good idea to check that the directories and files the script is expecting are the ones you want to process. You can verify the settings by displaying TEXFRWPDIR, TEXINCLUSIONS, TEXROOTFILE and TEXPREAMBLE.

TEXFRWPDIR
c:/pd/blog/wp2latex/
TEXINCLUSIONS
inclusions
TEXROOTFILE
bm.tex
TEXPREAMBLE
bmamble.tex

If all these directories and files exist go to step (5).
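If you would rather check from a shell than by eye, something like the following could work. It is a sketch only: `check_layout` is a hypothetical helper, not part of TeXfrWpxml.ijs, and it assumes the default names shown above (`inclusions`, `bm.tex`, `bmamble.tex`).

```shell
# Sketch: confirm the directories and files the settings above name.
# check_layout is a hypothetical helper, not part of TeXfrWpxml.ijs.
check_layout() {
    root=$1
    status=0
    # directories: the root itself and its inclusions subdirectory
    for d in "$root" "$root/inclusions"; do
        [ -d "$d" ] || { echo "missing directory: $d"; status=1; }
    done
    # files: the LaTeX root file and the preamble
    for f in "$root/bm.tex" "$root/bmamble.tex"; do
        [ -f "$f" ] || { echo "missing file: $f"; status=1; }
    done
    return $status
}
```

Call it with your root, for example `check_layout c:/pd/blog/wp2latex`. It prints one line per missing item and returns a nonzero status if anything is absent.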

#### Step 5: make sure you are online

The first time you run the converter it will attempt to download all the images referenced in your blog. This is where wget.exe gets executed.  Obviously to download anything you must be connected to the Internet.

#### Step 6: run LatexFrWordpress

Run the verb LatexFrWordpress.  The monadic version of this verb takes a single argument: the complete path and file name of the export XML file you downloaded in step (1).

xml=: 'c:/pd/blog/wordpress/analyzethedatanotthedrivel.wordpress.xml'

LatexFrWordpress xml

As the verb runs you will see output like:

LatexFrWordpress xml
Fake Programming
Laws or Suggestions
Lens Lust

... many lines omitted ...

WordPress to LaTeX with Pandoc and J: LaTeX Directories (Part 2)
+-++
|1||
+-++

When the verb terminates you should have a directory c:/pd/blog/wp2latex full of *.tex files:  one file for each blog post. Now the hard work starts.
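A quick sanity check is to count the generated files. The sketch below uses a hypothetical helper, `count_tex`. Note that the total also includes any *.tex files you already keep in the root, such as bm.tex and bmamble.tex.

```shell
# Sketch: count the *.tex files in the output directory.
# count_tex is a hypothetical helper; it does not descend into subdirectories.
count_tex() {
    n=0
    for f in "$1"/*.tex; do
        [ -f "$f" ] && n=$((n + 1))
    done
    echo "$n"
}
```

For example, `count_tex c:/pd/blog/wp2latex` prints a single number you can compare with the number of posts on your blog.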

#### Step 8: compile your LaTeX blog

I use batch files and shell scripts to drive LaTeX compilations.  I processed my blog with this batch file.

echo off
rem process blog posting (bm.tex) root file
title Running Blog Master/LaTeX ...

rem first pass for aux file needed by bibtex
lualatex bm

rem generate/reset bbl file
bibtex bm
makeindex bm

rem resolve all internal references - may
rem comment out when debugging entire document
lualatex bm
lualatex bm

rem display pdf - point to preferred PDF reader
title Blog Master/LaTeX complete displaying PDF ...
"C:\Program Files\SumatraPDF\SumatraPDF.exe" bm.pdf
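On Linux the same sequence can be driven from a shell script. This is a rough equivalent rather than the script I actually use. It assumes `lualatex`, `bibtex` and `makeindex` are on your PATH, the function name `build_blog` is hypothetical, and unlike the batch file it stops at the first failing step and does not open a PDF viewer.

```shell
# Sketch of the batch file above as a POSIX shell function.
# Assumption: lualatex, bibtex and makeindex are on PATH.
build_blog() {
    root=${1:-bm}
    # first pass, bibliography, index, then two passes to settle references
    for step in lualatex bibtex makeindex lualatex lualatex; do
        echo "running: $step $root"
        "$step" "$root" || return 1
    done
}
```

Run `build_blog bm` from the document root and open bm.pdf in your preferred reader when it finishes.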

The presence of Unicode APL (see this post) forced me to use lualatex; I needed some very nonstandard APL fonts. See bm.pdf, also available on the Download this Blog page, to judge the effectiveness of my edits. Producing nice figure-laden typeset blog documents is work but, as I will describe in the next post, producing image-free eBooks is a simple and far less laborious variation on this process.

# WordPress to LaTeX with Pandoc and J: LaTeX Directories (Part 2)


In this post I will describe the LaTeX directory structure the J script TeXfrWpxml.ijs is expecting. To convert WordPress export XML to LaTeX with this script you will have to set up similar directories.

LaTeX documents are built from *.tex[1] files. This makes LaTeX more like a compiled programming language than a word processing program. There are advantages and disadvantages to the LaTeX way. In LaTeX’s favor, the system is enormously adaptable, versatile and powerful. There is very little that LaTeX/TeX and associates cannot do.  Unfortunately, “with great power comes great responsibility.” LaTeX is demanding! You have to study LaTeX like any other programming language. It’s not for everyone but for experienced users it’s the best way to produce documents with the highest typographic standards.

#### LaTeX directory structure

To use LaTeX efficiently it’s wise to pick a document directory structure and stick with it. I use a simple directory layout. Each document has a root directory. The root directory used by TeXfrWpxml.ijs is:

| Platform | Root directory |
|----------|----------------|
| Windows | c:/pd/blog/wp2latex |
| Linux | /home/john/pd/blog/wp2latex |

I put my document specific *.tex, *.bib, *.sty and other LaTeX/TeX files in the root. To handle graphics I create an immediate subdirectory called inclusions.

c:/pd/blog/wp2latex/inclusions

The inclusions directory holds the document’s *.png, *.jpg, *.pdf, *.eps and other graphics files.  To reference files in the inclusions directory with the standard LaTeX graphicx package insert

\usepackage{color,graphicx,subfigure,sidecap}
\graphicspath{{./inclusions/}}

in your preamble. Finally, to track document changes I create a Git repository in the root directory.

c:/pd/blog/wp2latex/.git
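The whole layout can be created in one go from a shell. This is a minimal sketch: `make_doc_root` is a hypothetical helper, and the Git step is skipped if `git` is not installed.

```shell
# Sketch: create the layout described above under a root of your choice.
# make_doc_root is a hypothetical helper.
make_doc_root() {
    root=$1
    mkdir -p "$root/inclusions" || return 1
    # start version control in the root if git is installed
    if command -v git >/dev/null 2>&1; then
        (cd "$root" && git init -q)
    fi
}
```

For the Linux root used in this post that would be `make_doc_root /home/john/pd/blog/wp2latex`.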

#### Self contained directories

I take care to keep my document directories self-contained. Zipping up the root and inclusions directory collects all the document’s files. This means that I sometimes have to copy files that are used in more than one document. Many LaTeX users maintain a common directory for such files but I’ve found that common directories complicate moving documents around. You’re always forgetting something in the damn common directory or you are copying a buttload of mostly irrelevant files from one big confusing common directory to another.
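Because everything lives under one root, bundling a document is a one-liner. The sketch below uses `tar` with gzip in place of a zip tool, which is my substitution for illustration. The function name `archive_doc` is hypothetical.

```shell
# Sketch: bundle a self-contained document directory into one archive.
# Assumption: tar with gzip support stands in for a zip tool here.
archive_doc() {
    root=${1%/}                       # strip a trailing slash
    parent=$(dirname "$root")
    name=$(basename "$root")
    # store paths relative to the parent so the archive unpacks as name/...
    tar -czf "$root.tar.gz" -C "$parent" "$name"
}
```

Calling `archive_doc /home/john/pd/blog/wp2latex` writes `wp2latex.tar.gz` beside the root, with the inclusions directory inside it.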

#### TeXfrWpxml.ijs files

The TeXfrWpxml.ijs script searches for these files in the root directory.

| File | Purpose |
|------|---------|
| bm.tex | Main LaTeX root file |
| bmamble.tex | LaTeX preamble |

bm.tex references bmtitlepage.tex.  I prefer a separate title page file; simply comment out this file if you create titles in other ways. The zip file wp2latex.zip contains a test directory in the format expected by TeXfrWpxml.ijs.  It also has a subset of my blog posts already converted to LaTeX. To get ready for WordPress to LaTeX with Pandoc and J: Using TeXfrWpxml.ijs (Part 3) download wp2latex.zip and attempt to compile bm.tex.  You might have to download a number of LaTeX packages.  Once you have successfully compiled bm.tex you are ready for the next step.

[1] LaTeX uses many other file types but key files are usually *.tex files.