APL Software Archaeology .dbi Edition

Have yourself a merry little APL Christmas.

I joke that my job title should be software archaeologist because I often find myself resurrecting, not refactoring, code that dates to primitive and primeval eras. The language I’m typically hired to resurrect is APL. APL, the language with funny symbols, is a software vampire. People keep paying us to kill it, but no matter how many stakes we pound through its heart it keeps coming back.

There are good reasons for this. APL embodies many timeless ideas and I’m confident that programming in the future will look a lot more like APL than many expect. If you doubt me just press the Siri button on your iPhone and ask, “Integrate X squared times sine X from 0 to 2.” What comes back has more of an APL than QWERTYUIOP flavor. Strange Unicode characters are creeping into many mainstream languages. This is a good thing because restricting programming to the miserly key sets of ancient typewriters was, is, and always will be a spectacularly bad idea. Ken Iverson deserves rich accolades for pointing this out more than fifty years ago and beating this drum incessantly during his lifetime. Iverson taught that notation is a tool of thought and that if you care about ideas you must care about how they are expressed. Why is this even remotely controversial?

Siri’s results use appropriate mathematical notations. As we move away from keyboards, programming languages and mathematical notation will merge. APL was way ahead of its time in this respect.

The genius of APL continues to exert influence on many programming languages, but APL’s rise had little to do with its abstract notation and a lot to do with how it was implemented. APL was one of the first programming environments that nonprogrammers could use. It was the spreadsheet of the late 1960s and 1970s and, just like spreadsheets of today, a lot of utterly horrid, poorly structured, lame amateur messes were created with it. If you’ve ever cracked open a gigantic Excel model that looks like it was developed by a roomful of quarreling, ADHD-afflicted, unionized chimpanzees then you know what the standard APL mess feels like. Many programmers blamed APL for this just like gun control advocates blame firearms for shootings. They argued that it would have been impossible to concoct such monsters in clean compiled languages like Pascal. “It wouldn’t even compile.” This is not even wrong. I’ve dealt with plenty of dreadful messes that do compile! The tool is always neutral; don’t blame the paintbrush for the painting.

Allowing rubes to code yields mountains of rubbish and the occasional ruby. It will shock many programmers to learn they are not the only smart people in the world. It turns out that nonprogrammers occasionally have good ideas and, miraculously, some of them can ably express their ideas in code. Before spreadsheets, such user rubies congealed in APL, where some still run. Part of my day job is extracting these precious stones from layers and layers of kluges, hacks, patch jobs, retrofits and workarounds and recoding them in modern programming languages like C# and JavaScript.

Recently I recovered[1] an ancient inverted file system embedded in the APL systems of my employer and rendered it in C#. This system uses the extension .dbi. I don’t know who created this system; the code is old. The most recent code comments date from the year 2000, but I am pretty sure that .dbi files predate component files in APL+WIN, formerly STSC APL, which pushes the design back to the 1980s or earlier. I know many APL’ers check this blog. If any of you know who created the original .dbi APL code please leave a note.

Somehow this .dbi system survived unsupported, with few user complaints, for decades of daily use. How is this possible? Astonishingly, good ideas age well and the core .dbi idea is inverted data. Modern high-performance databases make heavy use of this method. Inversion is so effective that hoary old interpreted APL code still beats compiled and optimized ADO.Net when fetching large numeric vectors and tables.
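To make "inverted" concrete, here is a minimal C# sketch of column-wise storage. It is only an illustration of the general idea: the class and field names are invented, and it says nothing about the actual .dbi file layout. Each field lives in its own contiguous array, so fetching a whole numeric vector is a single lookup and records are reassembled only on request.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration of "inverted" (column-wise) data.
// This is NOT the .dbi file format; all names are invented for the sketch.
public class InvertedTable
{
    // One contiguous array per field instead of one object per record.
    private readonly Dictionary<string, double[]> columns =
        new Dictionary<string, double[]>();

    public void AddColumn(string name, double[] values)
    {
        columns[name] = values;
    }

    // Fetching a whole numeric vector is a single lookup; no per-row work.
    public double[] GetColumn(string name)
    {
        return columns[name];
    }

    // A record is reassembled from the columns only when someone asks for it.
    public Dictionary<string, double> GetRow(int index)
    {
        var row = new Dictionary<string, double>();
        foreach (var pair in columns)
            row[pair.Key] = pair.Value[index];
        return row;
    }
}

public static class InvertedDemo
{
    public static void Main()
    {
        var table = new InvertedTable();
        table.AddColumn("age", new double[] { 35, 42, 58 });
        table.AddColumn("premium", new double[] { 120.5, 98.0, 143.25 });

        double[] premiums = table.GetColumn("premium"); // whole vector at once
        Console.WriteLine(premiums.Length);             // prints 3
        Console.WriteLine(table.GetRow(1)["age"]);      // prints 42
    }
}
```

The trade-off is visible in the two accessors: column reads are cheap, while row reads touch every column. That suits workloads dominated by large vector and table fetches.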

Restoring the .dbi system was a two-step process.[2] I first converted the APL system to J. I used J because it is a close relative of APL but not so close that you can cut and paste. Translating nontrivial APL to J forces you to understand the APL at the nitty-gritty level. The translation to J also allowed me to fix the APL interface. The original system used global variables, rampant branches and other lamentable coding practices that C# will not abide. After matching the APL and J systems I then translated the J to C# and then rematched all three systems.
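The matching step can be sketched in a few lines of C#. This is an illustration only: the three "systems" below are stand-in lambdas for a sort routine, not the real APL, J and C# .dbi readers, and `SystemMatcher` is a name invented for the sketch. The idea is to feed identical inputs to every implementation and flag any input on which they disagree.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch of matching several implementations of one routine.
// The implementations are stand-ins, not the actual .dbi code.
public static class SystemMatcher
{
    // Returns the indexes of the test inputs on which the implementations disagree.
    public static List<int> FindMismatches(
        IList<int[]> inputs,
        params Func<int[], int[]>[] implementations)
    {
        var mismatches = new List<int>();
        for (int i = 0; i < inputs.Count; i++)
        {
            // The first implementation serves as the reference result.
            int[] reference = implementations[0](inputs[i]);
            bool allAgree = implementations
                .Skip(1)
                .All(impl => impl(inputs[i]).SequenceEqual(reference));
            if (!allAgree) mismatches.Add(i);
        }
        return mismatches;
    }

    public static void Main()
    {
        Func<int[], int[]> viaArraySort = x =>
        {
            var copy = (int[])x.Clone();
            Array.Sort(copy);
            return copy;
        };
        Func<int[], int[]> viaLinq = x => x.OrderBy(v => v).ToArray();
        // Deliberately buggy: silently drops duplicate values.
        Func<int[], int[]> buggy = x => x.Distinct().OrderBy(v => v).ToArray();

        var inputs = new List<int[]>
        {
            new[] { 3, 1, 2 },
            new[] { 2, 2, 1 }
        };

        List<int> bad = FindMismatches(inputs, viaArraySort, viaLinq, buggy);
        Console.WriteLine(string.Join(", ", bad)); // prints 1
    }
}
```

A mismatch says only that at least one implementation is wrong, not which one, so each flagged input still needs to be examined by hand.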

Comparing multiple systems is a very effective testing technique. I found bugs in all three systems. I fixed the J and C# bugs but left the original APL code unchanged. Software archaeology is a delicate field. You don’t “fix” old code just like you don’t correct errors in cuneiform tablets. Original and important program code belongs in museums with other significant cultural artifacts.

Original inverted file code probably belongs in a museum. This .dbi APL code is old, but it certainly derives from earlier programs so it’s not museum worthy. Even if it were, the APL and C# .dbi systems belong to my employer. However, I am placing the J scaffold version, which matches the performance of the other systems, into the public domain. The script is available on GitHub and here. The .dbi system gets right down to bits in some cases and illustrates some J techniques for dealing with indexed binary inverted file data. Enjoy!


  1.  .dbi files held many gigabytes of actuarially tuned data. Dumping them was not an option. We either had to convert to a new store or produce a component that could read old data in new systems.
  2. Restoring old code is somewhat like restoring old pictures. When working on old pictures you’re always tempted to improve them. With pictures you usually have a choice. This may not hold for old code. Changes in software may force updates.

A C# .Net Class for calling J

One of my favorite programming tools is J. In skilled hands J is a spear in a world of bent spoons. In my day job I rarely encounter programming problems that cannot be brutally dispatched with a few dozen lines of J. Most accomplished J programmers laud the elegance and power of the language and frequently remark on how learning J changed the way they think about programming. If you are intrigued, please take a look, but a word of caution: learning J is like learning calculus. Don’t expect to progress beyond the trivial without substantial intellectual effort on your part.

J has many strengths but current implementations also have some serious shortcomings.

  1. J’s GUI user interface tools are primitive compared to what you find in Microsoft Visual Studio or Java Eclipse environments.
  2. It is difficult to use J in mixed language projects.  J can make C style API calls and the Windows version sports a COM interface.  Both of these call mechanisms are solid and work well but the C API struggles with many C++ libraries and COM is now considered a legacy technology in Microsoft .Net circles.
  3. .Net executables can call J but J cannot easily call .Net executables. 
  4. There are very few useful J libraries. Python programmers often find complete solutions to their problems in libraries. With J you often end up writing your own libraries. This fosters an independent frame of mind at the expense of productivity.
  5. Packaging J solutions is largely ad hoc and platform dependent.  It’s not like C# or Java where you get a nice self-contained install package.

To deal with J’s deficiencies I cheat and use other languages and tools. This is getting the best of both worlds or Miley Cyrus’ing it! Miley Cyrus’ing in Windows environments leads straight to .Net and the premier .Net programming language, C#. J is not a .Net language but J can be called from C# by COM or by C style API calls. This JServer class uses COM. JServer was inspired by Alex Rufon’s J Wiki essay but differs in that all JServer calls are strongly typed. There is no point in using strongly typed languages like C# if you are constantly casting objects. Use the type checking, Luke!

The following JServerTest code snippet shows JServer calls.

using System;
using System.Collections.Generic;
using System.Text;
using System.Data;
using JServerClass;  // add reference to JServer.exe

namespace JServerTest
{
    class Program
    {
        static void Main(string[] args)
        {
            // create new j exe server - load only the j profile
            JServer js = new JServer(JServer.JScriptType.OnlyProfile);

            // make server visible/invisible/visible
            js.jShowServer = true;
            System.Threading.Thread.Sleep(200);
            js.jShowServer = false;
            System.Threading.Thread.Sleep(200);
            js.jShowServer = true;

            // do tests - create j nouns that interface can fetch

            js.jDo("18!:5 ''"); // should be in base locale

            // atoms - rank 0
            js.jDo("byteAtom=. 'A'");
            js.jDo("boolAtom=. 1");
            js.jDo("intAtom=. 42");
            js.jDo("doubleAtom=. 1x1"); // e in j notation

            // arrays of rank 1 and 2 - higher rank arrays are not
            // explicitly supported by the C# interface
            js.jDo("boolArray=. ?50#2");
            js.jDo("intArray=. 10 10$?100#10");
            js.jDo("doubleArray=. 5 10$(?50#50) % ?50#50");
            js.jDo("byteArray=. 20 30$'goaheadbyteme'");
            js.jDo("stringArray=. ;:'not by the hair of my chinny chin chin'");
            js.jDo("stringArray2=. 11 7$stringArray");

            // get tests - fetch j nouns - get and set are C# overloads

            // rank 0 gets
            byte byteAtom;
            js.jGet("byteAtom", out byteAtom);
            bool boolAtom;
            js.jGet("boolAtom", out boolAtom);
            int intAtom;
            js.jGet("intAtom", out intAtom);
            double doubleAtom;
            js.jGet("doubleAtom", out doubleAtom);

            // rank 1 and/or 2 gets
            bool[] boolArray;
            js.jGet("boolArray", out boolArray);
            int[,] intArray;
            js.jGet("intArray", out intArray);
            double[,] doubleArray;
            js.jGet("doubleArray", out doubleArray);
            byte[,] byteArray;
            js.jGet("byteArray", out byteArray);
            string[] stringArray;
            js.jGet("stringArray", out stringArray);
            string[,] stringArray2;
            js.jGet("stringArray2", out stringArray2);

            // set tests - set copies of fetched nouns in j and test
            js.jSet("byteAtomC", byteAtom);
            js.jDo("byteAtom -: byteAtomC");   // should be identical - result 1
            js.jSet("boolAtomC", boolAtom);
            js.jDo("boolAtomC -: boolAtom");
            js.jSet("intAtomC", intAtom);
            js.jDo("intAtomC -: intAtom");
            js.jSet("doubleAtomC", doubleAtom);
            js.jDo("doubleAtomC -: doubleAtom");

            js.jSet("boolArrayC", boolArray);
            js.jDo("boolArrayC -: boolArray");
            js.jSet("intArrayC", intArray);
            js.jDo("intArrayC -: intArray");
            js.jSet("doubleArrayC", doubleArray);
            js.jDo("doubleArrayC -: doubleArray");
            js.jSet("byteArrayC", byteArray);
            js.jDo("byteArrayC -: byteArray");
            js.jSet("stringArrayC", stringArray);
            js.jDo("stringArrayC -: stringArray");

            // no overload for this case - it's not
            // as important as the rank 1 case
            //js.jSet("stringArray2C", stringArray2);

            // Datatable's are supported by the interface
            // as they can be quickly displayed and manipulated
            // in DataGridView objects
            DataTable dt = new DataTable();
            dt.Clear();

            // generate test j datatable representation - the interface
            // loads a support locale CSsrv that contains the necessary
            // j verbs to support these representations
            js.jDo("DTTEST=: testDataTable_CSsrv_ >:?100 10");

            // get the datatable
            dt = js.jGet("DTTEST");

            // set a copy of the datatable back in j and test equivalence
            // slight differences in floating number character formats
            // are reconciled with (testDataTableMatch)
            js.jSet("DTTESTC", dt);
            js.jDo("DTTESTC testDataTableMatch_CSsrv_ DTTEST");

            // wait five seconds before shutting
            // down so user can view the j exe server
            System.Threading.Thread.Sleep(5000);
        }
    }
}
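The typed get and set calls in the snippet above rely on ordinary C# overload resolution: the compiler selects an overload from the declared type of the `out` variable, so calling code never casts. The sketch below shows the pattern in isolation. It is not the JServer implementation; `TypedNounStore` is a name invented for the illustration, and a `Dictionary` stands in for the J session that the real class reaches through COM.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a typed wrapper over an untyped server.
// A Dictionary plays the role of the J session; the real JServer uses COM.
public class TypedNounStore
{
    private readonly Dictionary<string, object> nouns =
        new Dictionary<string, object>();

    // Set overloads: the compiler picks one from the argument's static type.
    public void Set(string name, int value) { nouns[name] = value; }
    public void Set(string name, double value) { nouns[name] = value; }
    public void Set(string name, int[] value) { nouns[name] = value; }

    // Get overloads: the compiler picks one from the type of the out variable,
    // so the single cast lives here and never in calling code.
    public void Get(string name, out int value) { value = (int)nouns[name]; }
    public void Get(string name, out double value) { value = (double)nouns[name]; }
    public void Get(string name, out int[] value) { value = (int[])nouns[name]; }
}

public static class TypedNounDemo
{
    public static void Main()
    {
        var store = new TypedNounStore();
        store.Set("intAtom", 42);
        store.Set("doubleAtom", Math.E);
        store.Set("intArray", new[] { 1, 2, 3 });

        int intAtom;
        store.Get("intAtom", out intAtom);       // resolves to the int overload
        double doubleAtom;
        store.Get("doubleAtom", out doubleAtom); // resolves to the double overload
        int[] intArray;
        store.Get("intArray", out intArray);     // resolves to the int[] overload

        Console.WriteLine(intAtom);              // prints 42
        Console.WriteLine(intArray.Length);      // prints 3
    }
}
```

In this sketch, fetching a noun into a variable of the wrong type fails inside the wrapper with an `InvalidCastException`, which keeps type errors in one place instead of scattering casts through client code.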

Command Line C# SmugMug API Metadata Download

I have a skeleton in my photographic closet!  I enjoy hacking pictures as much as I enjoy shooting them.  Before digital photography I got my jollies the old fashioned way with chemicals:  dark room chemicals.  I still get all emotional when I remember the scent of a fixer.   Ahhh — those were the days.

Now, instead of inhaling fumes in the dark, I hang out on picture sites: SmugMug is my current favorite. Over the last year I have uploaded thousands of carefully cataloged images: you can view them here. I may not be much of a photographer but when it comes to image metadata my anal analytic side shines. I can EXIF, IPTC and GEOTAG with the best of them.

Because I tweak metadata online, and I suffer from a retentive character flaw,  it’s only natural that I would seek to download my sacred metadata.  This is what SmugMug’s API is for!  When I started experimenting with the SmugMug API I made the mistake of reading the documentation.  SmugMug documentation is,  at best,  a “work in progress.”  It may help but probably not!  I found trolling the web looking for code examples more productive.

To help the next SmugMug API geek I am posting a fragment of a simple command line C# metadata dump utility I put together. The core of the program is shown below and all the C# source is available here. This program is too trivial to license so help yourself.

using System;
using System.Data;
using System.Xml;

// Arguments and SmugmugMetaData are defined elsewhere in the full source
namespace SmugMugMDDumper
{
    class Program
    {
        private const string xmlHeader = @"<?xml version=""1.0"" encoding=""UTF-8""?>";

        // defaults - insert your own SmugMug apikey, password, email here
        // defaults are used if corresponding command line arguments are missing
        private const string apiKey = "<YOUR SMUGMUG APIKEY>";
        private const string passWord = "<YOUR SMUGMUG PASSWORD>";
        private const string emailAddress = "<YOUR SMUGMUG EMAIL>";
        private const string outFile = @"c:\temp\smugmugdata.xml";

        static void Main(string[] args)
        {
            try
            {
                DataSet ds = new DataSet();
                XmlDocument doc = new XmlDocument();
                Arguments comline = new Arguments(args);
                SmugmugMetaData smugmd = new SmugmugMetaData();

                // parse and set any command line arguments
                if (comline["help"] != null)
                {
                    string __helpMsg = @"
Typical command line calls:

SmugMugMDDumper.exe -apikey:""xQDzWwLp2I1GUGli88g999VrQWN4Xz56"" -email:""youremail"" -password:""nimcompoop"" -output:""c:\test\smugdata.xml""
SmugMugMDDumper.exe -output:""d:\mystuff\smuggy.xml""
SmugMugMDDumper.exe -password:""newpassword"" -output:""c:\temp\out.xml""
SmugMugMDDumper.exe -help

";
                    Console.Write(__helpMsg);
                    return;
                }

                string __apiKey;
                if (comline["apikey"] != null) __apiKey = comline["apikey"];
                else __apiKey = apiKey;

                string __emailAddress;
                if (comline["email"] != null) __emailAddress = comline["email"];
                else __emailAddress = emailAddress;

                string __passWord;
                if (comline["password"] != null) __passWord = comline["password"];
                else __passWord = passWord;

                string __outputFile;
                if (comline["output"] != null) __outputFile = comline["output"];
                else __outputFile = outFile;

                // start output file
                smugmd.WriteToFile(xmlHeader + "<SmugMugData>", __outputFile);

                // open SmugMug session - uses https
                string __sessionID = smugmd.StartSMSession(__apiKey, __emailAddress, __passWord);

                // collect all galleries
                ds = smugmd.GetGalleries(__sessionID, __apiKey, __outputFile);
                DataTable myTable = ds.Tables[0];
                DataRow myRow;

                // image metadata for each gallery
                smugmd.AppendToFile("<GalleryImages>", __outputFile);
                int rowcnt = myTable.Rows.Count;
                string rowstr = "/" + rowcnt.ToString() + "]: ";
                for (int i = 0; i < rowcnt; i++)
                {
                    myRow = myTable.Rows[i];
                    Console.WriteLine("gallery [" + (i + 1).ToString() + rowstr + (string)myRow["Title"]);
                    doc = smugmd.GetGalleryImages(__sessionID, __apiKey, (int)myRow["id"], __outputFile);
                }
                smugmd.AppendToFile("</GalleryImages>", __outputFile);

                // complete output file - end SmugMug session
                smugmd.AppendToFile("</SmugMugData>", __outputFile);
                smugmd.EndSMSession(__sessionID, __apiKey);

                Console.WriteLine("[Complete] output file: " + __outputFile);
            }
            catch (Exception ex)
            {
                Console.WriteLine("[Fail] SmugMug Metadata Dumper Failure - error message: " + ex.Message);
            }
        }
    }
}
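The fragment above calls WriteToFile and AppendToFile without showing them. A minimal sketch of what such helpers might look like follows, assuming they do nothing more than create and then extend a text file. The real SmugmugMetaData methods may well differ; the `XmlDumpFile` class and the sample `<Image>` element are invented for the illustration. Only standard `System.IO.File` and `System.Xml` calls are used.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helpers; the real SmugmugMetaData methods are not shown
// in the fragment above and may work differently.
public static class XmlDumpFile
{
    // Start (or overwrite) the output file.
    public static void WriteToFile(string text, string path)
    {
        File.WriteAllText(path, text);
    }

    // Add text to the end of an existing output file.
    public static void AppendToFile(string text, string path)
    {
        File.AppendAllText(path, text);
    }

    public static void Main()
    {
        string path = Path.GetTempFileName();

        // Build the document piece by piece, as the dumper does.
        WriteToFile("<?xml version=\"1.0\" encoding=\"UTF-8\"?><SmugMugData>", path);
        AppendToFile("<GalleryImages>", path);
        AppendToFile("<Image id=\"1\" />", path); // made-up sample element
        AppendToFile("</GalleryImages>", path);
        AppendToFile("</SmugMugData>", path);

        // The pieces only form valid XML once the last closing tag is appended.
        var doc = new XmlDocument();
        doc.Load(path);
        Console.WriteLine(doc.DocumentElement.Name); // prints SmugMugData

        File.Delete(path);
    }
}
```

One consequence of building the file by appending: if a run fails partway through, the output is left without its closing tags and will not parse as XML until it is regenerated.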

Why Code when you can Steal

I am learning C#.

Two years ago I swore a blood oath not to learn any more programming languages. It’s been obvious for decades that you seldom find any new and important ideas in programming languages. What you typically find are old ideas renamed and wrapped in a new syntax. Virtually all key concepts in programming are over twenty years old and many are far older. My disgust with new languages started with a single word: refactoring!

When I met refactoring it seduced me with its sleek geeky’ness.  What could this wonderful word mean and what thrilling concept did it clothe? Well it basically means cleaning up your abhorrent code so that you can make some freaking sense of it!  All competent programmers, dating back to Ada Lovelace (1815-1852),  have been refactoring all their goddamn coding lives.  Refactoring is geek marketing: the same old shit in a glistening new package.

C# is as free of new concepts as I expected but the language has its strengths. C# has managed to inherit most of its predecessors’ gifts without introducing untested features. C#’s designers restrained themselves and it shows. The language is clean, easy to learn, and integrates elegantly with .Net libraries.

This is all good but what makes it better is that you can steal tons of C# code.  Google and Bing are my accomplices.  When I want to find out what a DataSet does I just pop a query and dredge up nuggets like:  Creating A Data Set From Scratch in C#.   In the old days you had to read  dense language documents like the J Dictionary and think for yourself.

Thinking for yourself is so 20th century;  why code when you can steal!