NumPy another Iverson Ghost

During my recent SmugMug API and Python adventures I was haunted by an Iverson ghost: NumPy

An Iverson ghost is an embedding of APL like array programming features in nonAPL languages and tools.

You would be surprised at how often Iverson ghosts appear. Whenever programmers are challenged with processing large numeric arrays they rediscover bits of APL. Often they’re unaware of the rich heritage of array processing languages but in NumPy's case, they indirectly acknowledged the debt. In Numerical Python the authors wrote:

“The languages which were used to guide the development of NumPy include the infamous APL family of languages, Basis, MATLAB, FORTRAN, S and S+, and others.”

I consider “infamous” an upgrade from “a mistake carried through to perfection.”

Not only do developers frequently conjure up Iverson ghosts. They also invariably turn into little apostles of array programming that won’t shut up about how cutting down on all those goddamn loops clarifies and simplifies algorithms. How learning to think about operating on entire arrays, versus one dinky number at a time, frees the mind. Why it’s almost as if array programming is a tool of thought.

Where have I heard this before?

Ahh, I’ve got it, when I first encountered APL almost fifty years ago.

Yes, I am an old programmer, a fossil, a living relic. My brain is a putrid pool of punky programming languages. Python is just the latest in a longish line of languages. Some people collect stamps. I collect programming languages. And, just like stamp collectors have favorite stamps, I find some programming languages more attractive than others. For example, I recognize the undeniable utility of C/C++, for many tasks they are the only serious options, yet as useful and pervasive as C/C++ are they have never tickled my fancy. The notation is ugly! Yeah, I said it; suck on it C people. Similarly, the world’s most commonly used programming language JavaScript is equally ugly. Again, JavaScript is so damn useful that programmers put up with its many warts. Some have even made a few bucks writing books about its meager good parts.

I have similar inflammatory opinions about other widely used languages. The one that is making me miserable now is SQL, particularly Microsoft’s variant T-SQL. On purely aesthetic grounds I find well-formed SQL queries less appalling than your average C pointer fest. Core SQL is fairly elegant but the macro programming features that have grown up around it are depraved. I feel dirty when forced to use them which is just about every other day.

At the end of my programming day, I want to look on something that is beautiful. I don’t particularly care about how useful a chunk of code is or how much money it might make, or what silly little business problem it solves. If the damn code is ugly I don’t want to see it.

People keep rediscovering array programming, best described in Ken Iverson’s 1962 book A Programming Language, for two basic reasons:

  1. It’s an efficient way to handle an important class of problems.
  2. It’s a step away from the ugly and back towards the beautiful.

Both of these reasons manifest in NumPy‘s resounding success in the Python world.

As usual, efficiency led the way. The authors of Numerical Python note:

Why are these extensions needed? The core reason is a very prosaic one, and that is that manipulating a set of a million numbers in Python with the standard data structures such as lists, tuples or classes is much too slow and uses too much space.

Faced with a “does not compute” situation you can either try something else or fix what you have. The Python people fixed Python with NumPy. Pythonistas reluctantly embraced NumPy but quickly went apostolic! Now books like Elegant SciPy and the entire SciPy toolset that been built on NumPy take it for granted.

Is there anything in NumPy for programmers that have been drinking the array processing Kool-Aid for decades? The answer is yes! J programmers, in particular, are in for a treat with the new Python3 addon that’s been released with the latest J 8.07 beta. This addon directly supports NumPy arrays making it easy to swap data in and out of the J/Python environments. It’s one of those best of both worlds things.

The following NumPy examples are from the NumPy quick start tutorial. For each NumPy statement, I have provided a J equivalent. J is a descendant of APL. It was largely designed by the same man: Ken Iverson. A scumbag lawyer or greedy patent troll might consider suing NumPy‘s creators after looking at these examples. APL’s influence is obvious. Fortunately, Ken Iverson was more interested in promoting good ideas that profiting from them. I suspect he would be flattered that APL has mutated and colonized strange new worlds and I think even zealous Pythonistas will agree that Python is a delightfully strange world.

Some Numpy and J examples

Selected Examples from Output has been suppressed here. For a more detailed look at these examples browse the Jupyter notebook:  NumPy and J Make Sweet Array Love.

Creating simple arrays

 # numpy
 a = np.arange(15).reshape(3, 5)
 NB. J
 a =. 3 5 $ i. 15

 # numpy
 a = np.array([2,3,4])
 NB. J
 a =. 2 3 4
 # numpy
 b = np.array([(1.5,2,3), (4,5,6)])
 NB. J
 b =. 1.5 2 3 ,: 4 5 6

 # numpy
 c = np.array( [ [1,2], [3,4] ], dtype=complex )
 NB. J
 j. 1 2 ,: 3 4
 # numpy
 np.zeros( (3,4) )
 NB. J
 3 4 $ 0
 # numpy - allocates array with whatever is in memory
 np.empty( (2,3) )
 NB. J - uses fill - safer but slower than numpy's trust memory method
 2 3 $ 0.0001 

Basic operations

 # numpy
 a = np.array( [20,30,40,50] )
 b = np.arange( 4 )
 c = a - b
 NB. J
 a =. 20 30 40 50
 b =. i. 4
 c =. a - b
 # numpy - uses previously defined (b)
 b ** 2
 NB. J
 b ^ 2

 # numpy - uses previously defined (a)
 10 * np.sin(a)
 NB. J
 10 * 1 o. a
 # numpy - booleans are True and False
 a < 35
 NB. J - booleans are 1 and 0
 a < 35

Array processing

 # numpy
 a = np.array( [[1,1], [0,1]] )
 b = np.array( [[2,0], [3,4]] )
 # elementwise product
 a * b

 NB. J
 a =. 1 1 ,: 0 1
 b =. 2 0 ,: 3 4
 a * b

 # numpy - matrix product, b)

 NB. J - matrix product
 a +/ . * b  
 # numpy - uniform pseudo random
 a = np.random.random( (2,3) )
 NB. J - uniform pseudo random
 a =. ? 2 3 $ 0
 # numpy - sum all array elements - implicit ravel
 NB. J - sum all array elements - explicit ravel
 +/ , a
 # numpy
 b = np.arange(12).reshape(3,4)
 # sum of each column
 # min of each row
 # cumulative sum along each row
 # transpose

 NB. J 
 b =. 3 4 $ i. 12
 NB. sum of each column
 +/ b
 NB. min of each row
 <./"1 b
 NB. cumulative sum along each row
 +/\"0 1 b
 NB. transpose
 |: b

Indexing and slicing

 # numpy 
 a = np.arange(10) ** 3 
 a[ : :-1]   # reversal

 NB. J
 a =. (i. 10) ^ 3
 2 { a
 (2 + i. 3) { a
 |. a

APL Software Archaeology .dbi Edition


Have yourself a merry little APL Christmas.

I joke that my job title should be software archaeologist because I often find myself resurrecting, not refactoring, code that dates to primitive and primeval eras. The language I’m typically hired to resurrect is APL. APL, the language with funny symbols, is a software vampire. People keep paying us to kill it, but no matter how many stakes we pound through its heart it keeps coming back.

There are good reasons for this. APL embodies many timeless ideas and I’m confident that programming in the future will look a lot more like APL than many expect. If you doubt me just press the Siri button on your iPhone and ask, “Integrate X squared times sine X from 0 to 2.” What comes back has more of an APL than QWERTYUIOP flavor. Strange Unicode characters are creeping into many mainstream languages. This is a good thing because restricting programming to the miserly key sets of ancient typewriters was, is, and always will be a spectacularly bad idea. Ken Iverson deserves rich accolades for pointing this out more than fifty years ago and beating this drum incessantly during his lifetime. Iverson taught that notation is a tool of thought and that if you care about ideas you must care about how they are expressed. Why is this even remotely controversial?


Siri’s results use appropriate mathematical notations. As we move away from keyboards programming languages and mathematical notation will merge. APL was way ahead of its time in this respect.

The genius of APL continues to exert influence on many programming languages, but APL’s rise had little to do with its abstract notation and a lot to do with how it was implemented. APL was one of the first programming environments that nonprogrammers could use. It was the spreadsheet of the late 1960’s and 1970’s and just like spreadsheets of today a lot of utterly horrid, poorly structured, lame amateur messes were created with it. If you’ve ever cracked open a gigantic Excel model that looks like it was developed by a roomful of quarreling ADHD afflicted unionized chimpanzees then you know what the standard APL mess feels like. Many programmers blamed APL for this just like gun control advocates blame firearms for shootings. They argued that it would have been impossible to concoct such monsters in clean compiled languages like Pascal. “It wouldn’t even compile.” This is not even wrong. I’ve dealt with plenty of dreadful messes that do compile! The tool is always neutral; don’t blame the paintbrush for the painting.

Allowing rubes to code yields mountains of rubbish and the occasional ruby. It will shock many programmers to learn they are not the only smart people in the world. It turns out that nonprogrammers occasionally have good ideas and, miraculously, some of them can ably express their ideas in code. Before spreadsheets such user rubies congealed in APL where some still run. Part of my day job is extracting these precious stones from layers and layers of kluges, hacks, patch jobs, retro-fits and workarounds and recoding them in modern programming languages like C# and JavaScript.

Recently I recovered1 an ancient inverted file system embedded in the APL systems of my employer and rendered it in C#. This system uses the extension .dbi. I don’t know who created this system; the code is old. The most recent code comments date from the year 2000, but I am pretty sure that .dbi files predate component files in APL+WIN, formerly STSC APL, which pushes the design back to the 1980’s or earlier. I know many APL’ers check this blog. If any of you know who created the original .dbi APL code please leave a note.

Somehow this .dbi system survived unsupported, with few user complaints, for decades of daily use. How is this possible? Astonishingly, good ideas age well and the core .dbi idea is inverted data. Modern high-performance databases make heavy use of this method. Inversion is so effective that hoary old interpreted APL code still beats compiled and optimized ADO.Net when fetching large numeric vectors and tables.

Restoring the .dbi system was a two-step process.2 I first converted the APL system to J. I used J because it is a close relative of APL but not so close that you can cut and paste. Translating nontrivial APL to J forces you to understand the APL at the nit-bitty level. The translation to J also allowed me to fix the APL interface. The original system used global variables, rampant branches and other lamentable coding practices that C# will not abide. After matching the APL and J systems I then translated the J to C# and then rematched all three systems.

Comparing multiple systems is a very effective testing technique. I found bugs in all three systems. I fixed the J and C# bugs but left the original APL code unchanged. Software archaeology is a delicate field. You don’t “fix” old code just like you don’t correct errors in cuneiform tablets. Original and important program code belongs in museums with other significant cultural artifacts.

Original inverted file code probably belongs in a museum. This .dbi APL code is old, but it certainly derives from earlier programs so it’s not museum worthy. Even if it was the APL and C# .dbi systems belong to my employer. However, I am placing the J scaffold version, which matches the performance of the other systems, into the public domain. The script is available on GitHub and here. The .dbi system gets right down to bits in some cases and illustrates some J techniques for dealing with indexed binary inverted file data. Enjoy!

  1.  .dbi files held many gigabytes of actuarially tuned data. Dumping them was not an option. We either had to convert to a new store or produce a component that could read old data in new systems.
  2. Restoring old code is somewhat like restoring old pictures. When working on old pictures you’re always tempted to improve them. With pictures you usually have a choice. This may not hold for old code. Changes in software may force updates.

Typesetting UTF8 APL code with the LaTeX lstlisting package

UTF8 APL characters within a LaTeX lstlisting environment. Click for *.tex source code

Typesetting APL source code has always been a pain in the ass! In the dark ages, (the 1970’s), you had to fiddle with APL type-balls and live without luxuries like lower case letters. With the advent of general outline fonts it became technically possible to render APL glyphs on standard display devices provided you:

  1. Designed your own APL font.
  2. Mapped the atomic vector of your APL to whatever encoding your font demanded.
  3. Wrote WSFULL‘s of junk transliteration functions to dump your APL objects as font encoded text.

It’s a testament to either the talent, or pig headedness of APL programmers, that many actually did this. We all hated it! We still hate it! But, like an abused spouse, we kept going back for more.  It’s our fault; if we loved APL more it would stop hitting us!

When Unicode appeared APL’ers cheered — our long ASCII nightmare was ending. The more politically astute worked to include the APL characters in the Unicode standard. Hey if Klingon is there why not APL? Everyone thought it was just a matter of time until APL vendors abandoned their nonstandard atomic vectors and fully embraced Unicode. With a few notable exceptions we are still waiting. While we wait the problem of typesetting APL source code festers.

My preferred source code listing tool is the \LaTeX lstlisting package. lstlisting works well for standard ANSI source code.  I use it for J, C#, SQL, C, XML, Ocaml, Mathematica, F#, shell scripts and \LaTeX source code, i.e. everything except APL! lstlisting is an eight bit package; it will not handle arbitrary Unicode out of the box.  I didn’t know how to get around this so I handled APL by enclosing UTF8 APL text in plain \begin{verbatim} … \end{verbatim} environments. This works for XeLaTeX and LuaLaTeX but you lose all the lstlisting goodies. Then I saw an interesting posting about The ‘listings’ package and UTF-8. One solution to the post’s “French ligature problem” showed how to force Unicode down lstlisting‘s throat. I wondered if the same method would work for APL. It turns out that it does!

If you insert the following snippet of TeX code in your document preamble LuaLaTeX and XeLaTeX will properly process UTF8 APL text in lstlisting environments. You will need to download and install the APL385 Unicode font if it’s not on your system.  A test \LaTeX document illustrating this hack is available here. The compiled PDF is available here. As always these files can be accessed in the files sidebar.

% set lstlisting to accept UTF8 APL text
 \lst@CCECUse \lst@ProcessLetter

The Return of APL Fingers

APL typewriter ball (1970s)

APL typewriter ball (1970s)

I am programming in APL again after a six-year hiatus. My APL fingers are rusty but it’s amazing how deep muscle memory goes. I still know where all the beautiful APL glyphs’ hide on standard keyboards. I’ve programmed in almost a dozen programming languages but I maintain warm feelings for APL because I lost my coding virginity to her.

APL was gentle: abstract, clean, austere and so intoxicatingly elegant. I loved Iverson notation and how it manifested in what remains the most beautiful symbol set ever devised for a programming language. Blinded by passion I overlooked APL’s faults but as my ardor cooled I took notice of one glaring APL problem that persists to this day. Displaying, printing, emailing and now blogging with APL is a pain! Much has improved with the steady adoption of Unicode but even today handling APL imposes burdens. For example, without the APL385 Unicode font your browser will mangle the following APL characters.



By the way, if you have not installed APL385 Unicode I highly recommend downloading and installing it. You can get APL385 at VectorRegrettably installing a Unicode APL font will not fix all your APL character problems! In particular you cannot reliably:

  1. Copy and paste APL code between APL vendors or between APL and other languages.
  2. Typeset APL listings with ubiquitous \LaTeX packages like lstlisting.
  3. Post APL idioms on Twitter. Twitter’s 140 character limit is not such a big deal for APL.
  4. Print on arbitrary network printers! It’s the 21st century yet office printers are routinely limited by parochial IT policies to a small set of standard fonts.

APL’ers  almost relish these irritants after all problem solving is what APL’ers do! We don’t go crying to momma; we squash whatever is annoying us and get on with deeper problems.

Currently I am dealing with how to display APL functions on WordPress. WordPress supports a source code highlighting plug-in based on Alex Gorbatchev’s excellent Javascript SyntaxHighlighter. The WordPress plug-in produces wonderful results for well known languages like C# and by defining language specific Javascript classes you can highlight languages like APL. Eric Lescasse has done this for APL+WIN code so it is possible, (given Unicode APL fonts), to render highlighted web friendly APL code. Unfortunately WordPress does not support APL highlighting, (what a surprise), and has banned user Javascript classes on their freebie blogs. Apparently some programmers abuse JavaScript, (another surprise), and uncontrolled Javascript’ing might endanger the WordPress business model.

This leaves the users of peculiar programming languages with a problem. We can pester WordPress to support new languages or we can roll our own. I am starting a campaign to get APL and J on WordPress’s list of highlighted languages and while I am waiting for official support I will roll my own. Fortunately, highlighting code for blogs is not difficult. The much maligned MS Word (2007 and beyond) can crank out blog happy APL provided you have UTF-8 APL Unicode text to format. Getting UTF-8’ed APL is the tricky bit. Some APL systems like Dyalog directly support UTF-8 and others are planning to do so. APL+WIN cannot spit out UTF-8 but it’s not difficult to transform APL+WIN to UTF-8. The APL Wiki contains some slick APL+WIN functions to convert internal APL text to and from UTF-8.

To get a sense of why all this fuss is worthwhile consider the following APL function taken from Eugene McDonnell’s superb essay Life: Nasty, Brutish, and Short.

∇ z ← LifeKnuth w;v
v ← w
w ← w + (1 ⌽ w) + ¯1 ⌽ w
w ← w + (1 ⊖ w) + ¯1 ⊖ w
w ← w + w – v
z ← w ∊ 5 6 7

This little function is not the shortest APL Life function but in my opinion it’s the clearest and most concise description of Life’s generation rules out there. Tool of thought is not an empty APL marketing slogan. It’s the real deal!