NumPy another Iverson Ghost

During my recent SmugMug API and Python adventures I was haunted by an Iverson ghost: NumPy

An Iverson ghost is an embedding of APL like array programming features in nonAPL languages and tools.

You would be surprised at how often Iverson ghosts appear. Whenever programmers are challenged with processing large numeric arrays they rediscover bits of APL. Often they’re unaware of the rich heritage of array processing languages but in NumPy's case, they indirectly acknowledged the debt. In Numerical Python the authors wrote:

“The languages which were used to guide the development of NumPy include the infamous APL family of languages, Basis, MATLAB, FORTRAN, S and S+, and others.”

I consider “infamous” an upgrade from “a mistake carried through to perfection.”

Not only do developers frequently conjure up Iverson ghosts. They also invariably turn into little apostles of array programming that won’t shut up about how cutting down on all those goddamn loops clarifies and simplifies algorithms. How learning to think about operating on entire arrays, versus one dinky number at a time, frees the mind. Why it’s almost as if array programming is a tool of thought.

Where have I heard this before?

Ahh, I’ve got it, when I first encountered APL almost fifty years ago.

Yes, I am an old programmer, a fossil, a living relic. My brain is a putrid pool of punky programming languages. Python is just the latest in a longish line of languages. Some people collect stamps. I collect programming languages. And, just like stamp collectors have favorite stamps, I find some programming languages more attractive than others. For example, I recognize the undeniable utility of C/C++, for many tasks they are the only serious options, yet as useful and pervasive as C/C++ are they have never tickled my fancy. The notation is ugly! Yeah, I said it; suck on it C people. Similarly, the world’s most commonly used programming language JavaScript is equally ugly. Again, JavaScript is so damn useful that programmers put up with its many warts. Some have even made a few bucks writing books about its meager good parts.

I have similar inflammatory opinions about other widely used languages. The one that is making me miserable now is SQL, particularly Microsoft’s variant T-SQL. On purely aesthetic grounds I find well-formed SQL queries less appalling than your average C pointer fest. Core SQL is fairly elegant but the macro programming features that have grown up around it are depraved. I feel dirty when forced to use them which is just about every other day.

At the end of my programming day, I want to look on something that is beautiful. I don’t particularly care about how useful a chunk of code is or how much money it might make, or what silly little business problem it solves. If the damn code is ugly I don’t want to see it.

People keep rediscovering array programming, best described in Ken Iverson’s 1962 book A Programming Language, for two basic reasons:

  1. It’s an efficient way to handle an important class of problems.
  2. It’s a step away from the ugly and back towards the beautiful.

Both of these reasons manifest in NumPy‘s resounding success in the Python world.

As usual, efficiency led the way. The authors of Numerical Python note:

Why are these extensions needed? The core reason is a very prosaic one, and that is that manipulating a set of a million numbers in Python with the standard data structures such as lists, tuples or classes is much too slow and uses too much space.

Faced with a “does not compute” situation you can either try something else or fix what you have. The Python people fixed Python with NumPy. Pythonistas reluctantly embraced NumPy but quickly went apostolic! Now books like Elegant SciPy and the entire SciPy toolset that been built on NumPy take it for granted.

Is there anything in NumPy for programmers that have been drinking the array processing Kool-Aid for decades? The answer is yes! J programmers, in particular, are in for a treat with the new Python3 addon that’s been released with the latest J 8.07 beta. This addon directly supports NumPy arrays making it easy to swap data in and out of the J/Python environments. It’s one of those best of both worlds things.

The following NumPy examples are from the SciPy.org NumPy quick start tutorial. For each NumPy statement, I have provided a J equivalent. J is a descendant of APL. It was largely designed by the same man: Ken Iverson. A scumbag lawyer or greedy patent troll might consider suing NumPy‘s creators after looking at these examples. APL’s influence is obvious. Fortunately, Ken Iverson was more interested in promoting good ideas that profiting from them. I suspect he would be flattered that APL has mutated and colonized strange new worlds and I think even zealous Pythonistas will agree that Python is a delightfully strange world.

Some Numpy and J examples

Selected Examples from https://docs.scipy.org/doc/numpy-dev/user/quickstart.html Output has been suppressed here. For a more detailed look at these examples browse the Jupyter notebook:  NumPy and J Make Sweet Array Love.

Creating simple arrays

 
 # numpy
 a = np.arange(15).reshape(3, 5)
 
 NB. J
 a =. 3 5 $ i. 15

 # numpy
 a = np.array([2,3,4])
 
 NB. J
 a =. 2 3 4
 
 # numpy
 b = np.array([(1.5,2,3), (4,5,6)])
 
 NB. J
 b =. 1.5 2 3 ,: 4 5 6

 # numpy
 c = np.array( [ [1,2], [3,4] ], dtype=complex )
 
 NB. J
 j. 1 2 ,: 3 4
 
 # numpy
 np.zeros( (3,4) )
 
 NB. J
 3 4 $ 0
 
 # numpy - allocates array with whatever is in memory
 np.empty( (2,3) )
 
 NB. J - uses fill - safer but slower than numpy's trust memory method
 2 3 $ 0.0001 

Basic operations

 
 # numpy
 a = np.array( [20,30,40,50] )
 b = np.arange( 4 )
 c = a - b
 
 NB. J
 a =. 20 30 40 50
 b =. i. 4
 c =. a - b
 
 # numpy - uses previously defined (b)
 b ** 2
 
 NB. J
 b ^ 2

 # numpy - uses previously defined (a)
 10 * np.sin(a)
 
 NB. J
 10 * 1 o. a
 
 # numpy - booleans are True and False
 a < 35
 
 NB. J - booleans are 1 and 0
 a < 35

Array processing

 
 # numpy
 a = np.array( [[1,1], [0,1]] )
 b = np.array( [[2,0], [3,4]] )
 # elementwise product
 a * b

 NB. J
 a =. 1 1 ,: 0 1
 b =. 2 0 ,: 3 4
 a * b

 # numpy - matrix product
 np.dot(a, b)

 NB. J - matrix product
 a +/ . * b  
 
 # numpy - uniform pseudo random
 a = np.random.random( (2,3) )
 
 NB. J - uniform pseudo random
 a =. ? 2 3 $ 0
 
 # numpy - sum all array elements - implicit ravel
 a.sum(a)
 
 NB. J - sum all array elements - explicit ravel
 +/ , a
 
 # numpy
 b = np.arange(12).reshape(3,4)
 # sum of each column
 b.sum(axis=0)
 # min of each row
 b.min(axis=1)
 # cumulative sum along each row
 b.cumsum(axis=1)
 # transpose
 b.T     

 NB. J 
 b =. 3 4 $ i. 12
 NB. sum of each column
 +/ b
 NB. min of each row
 <./"1 b
 NB. cumulative sum along each row
 +/\"0 1 b
 NB. transpose
 |: b

Indexing and slicing

 
 # numpy 
 a = np.arange(10) ** 3 
 a[2]
 a[2:5]
 a[ : :-1]   # reversal

 NB. J
 a =. (i. 10) ^ 3
 2 { a
 (2 + i. 3) { a
 |. a

SWAG a J/EXCEL/GIT Personal Cash Flow Forecasting Mob

While browsing in a favorite bookstore with my son, I spotted a display of horoscope themed Christmas tree ornaments. The ornaments were glass balls embossed with golden birth signs like Aquarius, Gemini, Cancer, et cetera, and a descriptive phrase that “summed up” the character of people born under a sign. Below my birth sign golden text declared, “Imaginative and Suspicious.”

I said to my son, “I hate it when astrological rubbish is right.”

I am imaginative and suspicious; it’s a curse. When it comes to money my “suspicious dial” is permanently set on eleven. I assume everyone is out to cheat and defraud me until there is overwhelming evidence to the contrary. Paranoia is generally crippling but when it comes to cold hard cash it’s a sound retention strategy.

Prompted by an eminent life move, I found myself in need of a cash flow forecasting tool. Normal people deal with forecasting problems by buying standard finance programs or cranking up spreadsheets; imaginative and suspicious people roll their own.

SWAG

SWAG, (Silly Wild Ass Guess), is a hybrid J/EXCEL/GIT mob1 that meets my eccentric needs. I wanted a tool that:

  1. Abstracted away accounting noise.
  2. Was general and flexible.
  3. Used highly portable, durable, and version control friendly inputs and outputs.
  4. Reflected what ordinary people, not tax accountants, actually do with money.
  5. Is open source and unencumbered by parasitic software patents.

Amazingly, my short list of no-brainer requirements eliminates most standard finance programs. Time to code!

SWAG Inputs

The bulk of SWAG is a JOD generated self-contained J script. You can peruse the script here. SWAG inputs and outputs are brain-dead simple TAB delimited text tables. Inputs consist of monthly, null-free, numeric time series tables, scenario tables, and name cross-reference tables. Outputs are simple, null-free, numeric time series tables. Input and output time series tables have identical formats.

A few examples will make this clear. The following is a typical SWAG input and output time series table.

 Date        E0      E1       E2      E3  E4     E5  E6  E7  E8  EC  EF  Etotal   I0       I1  I2  I3  I4  I5  IC  Itotal   R0         R1        R2  R3  Rtotal     D0  D1  D2  D3  D4  Dtotal  BB       NW         U0         U1  U2  U3
 2015-09-01  912.00  1650.00  100.00  0   50.00  0   0   0   0   0   0   2712.00  4800.00  0   0   0   0   0   0   4800.00  130000.00  25000.00  0   0   155000.00  0   0   0   0   0   0       2088.00  157088.00  155000.00  0   0   0 
 2015-10-01  912.00  1656.88  100.00  0   50.00  0   0   0   0   0   0   2718.88  4806.00  0   0   0   0   0   0   4806.00  130054.17  25062.50  0   0   155116.67  0   0   0   0   0   0       2087.13  159291.79  0          0   0   0 
 2015-11-01  912.00  1663.78  100.00  0   50.00  0   0   0   0   0   0   2725.78  4812.01  0   0   0   0   0   0   4812.01  130054.17  25062.50  0   0   155116.67  0   0   0   0   0   0       2086.23  161378.02  0          0   0   0 
 2015-12-01  912.00  1670.71  100.00  0   50.00  0   0   0   0   0   0   2732.71  4818.02  0   0   0   0   0   0   4818.02  130054.17  25062.50  0   0   155116.67  0   0   0   0   0   0       2085.31  163463.33  0          0   0   0 
 2016-01-01  912.00  1677.67  100.00  0   50.00  0   0   0   0   0   0   2739.67  4824.05  0   0   0   0   0   0   4824.05  130054.17  25062.50  0   0   155116.67  0   0   0   0   0   0       2084.37  165547.70  0          0   0   0 
 2016-02-01  912.00  1684.66  100.00  0   50.00  0   0   0   0   0   0   2746.66  4830.08  0   0   0   0   0   0   4830.08  130054.17  25062.50  0   0   155116.67  0   0   0   0   0   0       2083.41  167631.12  0          0   0   0 
 2016-03-01  912.00  1691.68  100.00  0   50.00  0   0   0   0   0   0   2753.68  4836.11  0   0   0   0   0   0   4836.11  130054.17  25062.50  0   0   155116.67  0   0   0   0   0   0       2082.43  169713.55  0          0   0   0 

The first header line is a simple list of names. The first name “Date” heads a column of first of month dates in YYYY-MM-DD format. The SWAG clock has month resolution and dates are the only nonnumeric items. Names beginning with “E” like E0, E1, …, are aggregated expenses. Names beginning with “I” like I0, I1, I2 … are income totals. “R” names are reserves: basically savings, investments, equity and so forth. “D” names are various debts. BB is basic period balance, NW is period net worth and “U” names are utility series. Utility series facilitate calculations. Remaining names are self-explanatory totals. Be aware that this table has been formatted for this blog. Examples of raw input and output tables can be found here.

The next ingredient in the SWAG stew is what many call a scenario. A scenario is a collection of prospective assumptions and actions. In one scenario you buy a Mercedes and assume interest rates remain low. In another, you take the bus and rates explode. When forecasting I evaluate five basic scenarios, grim, pessimistic, expected, optimistic, and exuberant. Being a negative Debbie Downer type I rarely invest time in exuberant scenarios. I concentrate on grim and pessimistic scenarios because once you are mentally prepared for the worst anything better feels like a lottery win.

The following is a typical SWAG scenario table. Scenario tables, like time series tables, are simple TAB delimited text files.

 Name         Scenario On Group       Value  OnDate     OffDate    Method   MethodArguments                                                Description                                                                     
 reservetotal s0          assumptions 0      2015-09-01 2015-10-01 assume   RSavings=. 0.5 [ RInvest=. 3 [ REquity=. 3 [ ROther=. 1        annual nominal percent reserve growth or decline during period                  
 car          s0                      50     2015-09-01 2035-08-01 history                                                                 annualized car maintenance until first death                                    
 house        s0                      912    2015-09-01 2016-08-01 history  BackPeriods=.1                                                 current rent until move                                                         
 insurance    s0                      100    2015-09-01 2035-08-01 history                                                                 car insurance                                                                   
 living       s0                      1650   2015-09-01 2044-01-01 history  YearInflate=.5                                                 normal monthly living expenses                                                  
 salary       s0                      4800   2015-09-01 2016-08-01 history  BackPeriods=.4 [ YearInflate=.1.5                              maintain net monthly income until move                                          
 reservetotal s0                      25000  2015-09-01 2015-10-01 reserve  Initial=.1 [ RInvest=. 1                                       stock value at model start                                                      
 reservetotal s0                      130000 2015-09-01 2015-10-01 reserve  Initial=.1                                                     savings at model start                                                          
 salary       s0                      4200   2016-08-01 2023-07-01 history  BackPeriods=.4 [ YearInflate=.1.5                              reduced net income after move until retirement                                  
 house        s0          move        2000   2016-08-01 2016-09-01 history                                                                 moving expenses                                                                 
 house        s0                      100    2016-08-01 2044-01-01 history                                                                 incidental housing expenses after move                                          
 house        s0                      100    2016-08-01 2044-01-01 history                                                                 home owners association payments                                                
 house        s0                      150    2016-08-01 2044-01-01 history                                                                 property taxes                                                                  
 reservetotal s0          buy house   110000 2016-08-01 2016-09-01 reserve  Initial=.1 [ REquity=. 1                                       down payment added to house equity initial setting prevents double spend        
 loan         s0          buy house   150000 2016-08-01 2023-02-01 borrow   Interest=. 4.5 [ YearTerm=. 30 [ DHouse=.1  [ LoanEquity=.1    30 year mortgage rate on house until inheritance                                
 reservetotal s0          buy house   110000 2016-08-01 2016-09-01 spend                                                                   down payment on house from savings                                              
 annuities    s0                      250    2018-07-01 2035-08-01 history                                                                 monthly retirement and other annuity payments end date is unknown               
 annuities    s0                      50     2018-07-01 2035-08-01 history                                                                 any government pension payments                                                 
 reservetotal s0          assumptions 0      2020-01-01 2044-01-01 assume   RSavings=. -2.5 [ RInvest=. -5.0 [ REquity=. -5.0 [ ROther=. 2 market tanks government introduces negative interest                            
 loan         s0          buy car     10000  2020-07-01 2025-07-01 borrow   Interest=. 5 [ YearTerm=. 5 [ DCar=.1                          pay balance of car at 5% for 5 years                                            
 reservetotal s0          buy car     7000   2020-07-01 2020-08-01 spend                                                                   car down payment from savings                                                   
 reservetotal s0          inherit     180000 2023-01-01 2023-02-01 reserve  Initial=.1                                                     inheritance to savings                                                          
 reservetotal s0          buy house   150000 2023-03-01 2023-04-01 transfer Fee=. 1500 [ DHouse=.1                                         pay off remaining mortgage balance after inheritance fee is closing cost        
 salary       s0                      1400   2023-07-01 2035-08-01 history                                                                 estimated social security payments spread over expected life                    
 insurance    s0                      700    2023-07-01 2024-12-01 history                                                                 medical insurance in the gap between retirement and spouse medicare eligibility 
 annuities    s0                      100    2024-12-01 2044-01-01 history                                                                 any retirement pension payments to spouse                                       
 annuities    s0                      100    2035-08-01 2044-01-01 history                                                                 any us social security survivor benefit after first death                       

Again the first header row is a simple list of names. Most scenario names are self-explanatory but four OnDate, OffDate, Method, and MethodArguments merit some explanation. SWAG series methods assume, history, reserve, transfer, borrow, and spend are modeled on what people typically do with cash.

  1. assume sets expected interest rates and other global assumptions for a given time period. SWAG series methods operate over a well-defined time period. The period is defined by OnDate and OffDate.
  2. history looks at past periods and estimates a numeric value that is projected into the future. Currently, history computes simple means but the underlying code can use arbitrary time series verbs.
  3. reserve manages savings, investments, equity and other cash-like instruments.
  4. borrow borrows money and sets future loan payments. borrow supports simple amortization loans but is also capable of reading an arbitrary payment schedule that can be used for exotic2 loans.
  5. transfer moves money between reserves, debts, expenses and income series.
  6. spend does just what you expect.

SWAG series methods adjust all the series affected by the method. As you might expect SWAG arguments methods are detailed. MethodArguments uses a restricted J syntax to set SWAG arguments. Argument order does not matter but only supported names are allowed. Many examples of SWAG MethodArguments can be found in the EXCEL spreadsheet tp.xlsx. I use EXCEL as a scenario editor. By setting EXCEL filters, you can manage many scenarios.

The final SWAG input is a name cross-reference table. It is another TAB delimited text file that defines SWAG names. You can inspect a typical cross-reference table here.

Running SWAG

To run SWAG you:

  1. Prepare input files.
  2. Start J, any front-end jconsole, JQT or JHS will do, and load the Swag script.
  3. Execute RunTheNumbers.
  4. Open the EXCEL spreadsheet swag.xlsx, click on the data ribbon and then press the “Refresh All” button.

Let’s work through the steps.

Prepare Input Files

By far the most difficult step is the first. Here you review your financial status which means checking bank balances, stock values, loan balances and so on. Depending on your holdings this could take anywhere from minutes to hours. I call this updating actuals. Not only is updating actuals the most difficult and time-consuming step it is also the most valuable. Money that is not closely watched leaks away.

I store my actuals in a simple tabbed spreadsheet. Each tab maintains an image of a text file. I enter my data and then cut and paste the sheets into a text editor where I apply final tweaks and then save the sheets as TAB delimited text files.

Monthly income, expenses and debts are easy to update but some of my holdings do not offer monthly statements. The verb RawReservesFromLast in Swag.ijs fills in missing months with the last known values. When I’m finished preparing input files I’m left with four actual TAB delimited files, RawIncome.txt, RawExpenses.txt, RawReserves.txt, and RawDebts.txt. You can inspect example actual files here.

Start J and load the Swag script.

The SWAG script is relatively self-contained. It can be run in any J session that loads the standard J profile. Load Swag with the standard load utility.

 load 'c:/pd/fd/swag/swag.ijs'

Here SWAG is loaded in JHS.

jhsswag

Execute RunTheNumbers

RunTheNumbers sets the SWAG configuration, loads scenarios, copies actuals to each scenario, and then evaluates each scenario. Scenarios are numbered. I use positive numbers for “production” scenarios and negative numbers for test scenarios. It sounds more complicated that it is. This is all you have to do to execute RunTheNumbers

 RunTheNumbers 0 1 2 3 4 

The code is simple and shows what’s going on.

RunTheNumbers=:3 : 0

NB.*RunTheNumbers v-- compute all scenarios on list (y).
NB.
NB. monad:  blclFiles =. RunTheNumbers ilScenarios
NB.
NB.   RunTheNumbers 0 1 2 3 4

NB. parameters sheet is the last config sheet
ModelConfiguration_Swag_=:MainConfiguration_Swag_
parms=. ".;{:LoadConfig 0
scfx=. ScenarioPrefix

ac=. toHOST fmttd ActualSheet 0
ac write TABSheetPath,'MainActuals',SheetExt

sf=. 0$a:
for_sn. y do.
  ac write TABSheetPath,scfx,(":sn),'Actuals',SheetExt
  sf=. sf , parms Swag sn [ LoadSheets sn
end.

sf 
)

RunTheNumbers writes a pair of TAB delimited forecast and statistics files for each scenario it evaluates.

Open swag.xlsx and press “Refresh All”

The spreadsheet swag.xlsx loads SWAG TAB delimited text files and plots results.3 I plot monthly cash flow, estimated net worth and debt/equity for each scenario. The following is a typical cash flow plot. It estimates mean monthly cash balance over the scenario time range.

meanbalance

The polynomial displayed on the graph is an estimated trend line. Things are looking bleak.

Here’s a typical net worth plot.

networth

In this happy scenario, we die broke and leave a giant bill for the government.4

So far SWAG has met my basic needs and forced me to pay more attention to the proverbial bottom line. As I use the system I will fix bugs, refine rough spots, and add strictly necessary features. Feel free to use or modify SWAG for your own purposes. If you find SWAG useful please leave a note on this blog or follow the SWAG repository on GitHub.


  1. What do you call dis-integrated collections of programs that you use to solve problems? Declaring such dog piles “systems” demeans the word “system” and gives the impression that everything has been planned. This is not how I roll. “Mob” is far more appropriate. It conveys a proper sense of disorder and danger.
  2. When borrowing money you should always plan on paying it all back. Insist on a complete iron clad repayment schedule. If such a schedule cannot be provided run like hell or prepare for the thick end of a baseball bat to be rammed up your financial ass.
  3. It may be necessary to adjust file paths on the EXCEL DATA ribbon to load SWAG TAB delimited text files.
  4. They can see me in Hell to collect.

Turning JOD Dump Script Tricks

Have you ever wondered how extremely prolific bloggers do it? How is it possible to crank out thousands of blog entries per year without creating a giant stinking pile of mediocre doo-doo? Like most deep medium mysteries it’s not very deep and there are no mysteries. The spewers, people who post like teenage girls tweet, use two basic strategies:

  1. Multiple authors: The heroic image of the lone blogger waging holy war against a sea of Internet idiocy is largely a myth. Many popular prolific blogs are the work of many hands. The editor at Analyze the Data not the Drivel eschews this tactic. Apparently he’s an incontinent and argumentative prima donna that sane people steer clear of.
  2. Content recycling: In elementary school this was called copying. Now that we’re all grown up we use terms like, “excerpting”, “abstracting”, and my favorite “re-purposing.” The basic idea is simple. Take something you’ve written elsewhere and repackage it as something new. Hey, all the cool kids are doing it!

The following is a slightly edited new appendix I have just added to the JOD manual. I am working to properly publish the JOD manual mostly so I can say that I’ve written a legitimate, albeit strange and queer, book.

I created this post by running the \LaTeX code of the manual appendix through the excellent utility pandoc, tweaking the resulting markdown, and then using pandoc again to generate html for this blog. pandoc is a great “re-purposing” tool!  

Finally, re-purposing is not entirely cynical. The act of moving material from one medium to another exposes problems. I found a few editing errors while creating this post that eluded my \LaTeX eyes. If you find more this is your chance to tell me what a moron I am.

Turning JOD Dump Script Tricks

Dump script generation is my favorite JOD feature. Dump scripts serialize JOD dictionaries; they are mainly used to back up dictionaries and interact with version control systems. However, dump scripts are general J scripts and can do much more! Maintaining a stable of healthy JOD dictionaries is easier if you can turn a few dump script tricks.1

  1. Flattening reference paths: Open JOD dictionaries define a reference path. For example, if you open the following dictionaries:
       NB. open four dictionaries
       od ;:'smugdev smug image utils'
    +-+-----------------------+-------+----+-----+-----+
    |1|opened (ro/ro/ro/ro) ->|smugdev|smug|image|utils|
    +-+-----------------------+-------+----+-----+-----+

    the reference path is /smugdev/smug/image/utils.

    When objects are retrieved each dictionary on the path is searched in reference path order. If there are no compelling reasons to maintain separate dictionaries you can improve JOD retrieval performance and simplify dictionary maintenance by flattening all or part of the path.

    To flatten the reference path do:

       NB. reopen the first three dictionaries on the path
       od ;:'smugdev smug image' [ 3 od ''
    +-+--------------------+-------+----+-----+
    |1|opened (ro/ro/ro) ->|smugdev|smug|image|
    +-+--------------------+-------+----+-----+
    
       NB. dump to a temporary file (df)
       df=: {: showpass make jpath '~jodtemp/smugflat.ijs'
    +-+---------------------------+-----------------------+
    |1|object(s) on path dumped ->|c:/jodtemp/smugflat.ijs|
    +-+---------------------------+-----------------------+
    
       NB. create a new flat dictionary
       newd 'smugflat';jpath '~jodtemp/smugflat' [ 3 od ''
    +-+---------------------+--------+--------------------+
    |1|dictionary created ->|smugflat|c:/jodtemp/smugflat/|
    +-+---------------------+--------+--------------------+
    
       NB. open the flat dictionary and (utils)
       od ;:'smugflat utils'
    +-+-----------------+--------+-----+
    |1|opened (rw/ro) ->|smugflat|utils|
    +-+-----------------+--------+-----+
    
       NB. reload dump script ... output not shown ...  
       0!:0 df

    The collapsed path /smugflat/utils will return the same objects as the longer path. It is important to understand that the collapsed dictionary smugflat does not necessarily contain the same objects found in the three original dictionaries smugdev, smug and image. If objects with the same name exist in the original dictionaries only the first one found will be in the collapsed dictionary.

  2. Merging dictionaries: If two dictionaries contain no overlapping objects it might make sense to merge them. This is easily achieved with dump scripts. To merge two or more dictionaries do:
       NB. open and dump first dictionary
       od 'dict0' [ 3 od ''
    +-+--------------+-----+
    |1|opened (rw) ->|dict0|
    +-+--------------+-----+
       df0=: {: showpass make jpath '~jodtemp/dict0.ijs'
    +-+---------------------------+--------------------+
    |1|object(s) on path dumped ->|c:/jodtemp/dict0.ijs|
    +-+---------------------------+--------------------+
    
       NB. open and dump second dictionary
       od 'dict1' [ 3 od ''
    +-+--------------+-----+
    |1|opened (rw) ->|dict1|
    +-+--------------+-----+
       df1=: {: showpass make jpath '~jodtemp/dict1.ijs'
    +-+---------------------------+--------------------+
    |1|object(s) on path dumped ->|c:/jodtemp/dict1.ijs|
    +-+---------------------------+--------------------+
    
       NB. create new merge dictionary
       newd 'mergedict';jpath '~jodtemp/mergedict' [ 3 od ''
    +-+---------------------+---------+---------------------+
    |1|dictionary created ->|mergedict|c:/jodtemp/mergedict/|
    +-+---------------------+---------+---------------------+
    
       NB. open merge dictionary and run dump scripts
       od 'mergedict'
    +-+--------------+---------+
    |1|opened (rw) ->|mergedict|
    +-+--------------+---------+
    
       NB. reload dump scripts ... output not shown ...  
       0!:0 df0  
       0!:0 df1

    Be careful when merging dictionaries. If there are common objects the last object loaded is the one retained in the merged dictionary.

  3. Updating master file parameters: When a new parameter is added to jodparms.ijs it will not be available in existing dictionaries. With dump scripts you can rebuild existing dictionaries and update parameters. To rebuild a dictionary with new or custom parameters do:
       NB. save current dictionary registrations
       (toHOST ; 1 { 5 od '') write_ajod_ jpath '~temp/jodregister.ijs'
    
       NB. open dictionary requiring parameter update 
       od 'dict0' [ 3 od ''
    +-+--------------+-----+
    |1|opened (rw) ->|dict0|
    +-+--------------+-----+
    
       NB. dump dictionary and close
       df=: {: showpass make jpath '~jodtemp/dict0.ijs'
    +-+---------------------------+--------------------+
    |1|object(s) on path dumped ->|c:/jodtemp/dict0.ijs|
    +-+---------------------------+--------------------+
    
       3 od ''
    +-+---------+-----+
    |1|closed ->|dict0|
    +-+---------+-----+
    
       NB. erase master file and JOD object id file
       ferase jpath '~addons/general/jod/jmaster.ijf'
    1
       ferase jpath '~addons/general/jod/jod.ijn'
    1
    
       NB. recycle JOD - this recreates (jmaster.ijf) and (jod.ijn) 
       NB. using the new dictionary parameters defined in (jodparms.ijs)   
       (jodon , jodoff) 1
    1 1
    
       NB. re-register dictionaries
       load jpath '~temp/jodregister.ijs'
    
       NB. create a new dictionary - it will have the new parameters
       newd 'dict0new';jpath '~jodtemp/dict0new' [ 3 od ''
    +-+---------------------+---------+-------------------+
    |1|dictionary created ->|dict0new|c:/jodtemp/dict0new/|
    +-+---------------------+---------+-------------------+
    
       od 'dict0new'
    +-+--------------+--------+
    |1|opened (rw) ->|dict0new|
    +-+--------------+--------+
    
       NB. reload dump script ... output not shown ...
       0!:0 df  

    Before executing complex dump script procedures back up your JOD dictionary folders and play with dump scripts on test dictionaries. Dump scripts are essential JOD dictionary maintenance tools but like most powerful tools they must be used with care.


  1. Spicing up one’s rhetoric with a double entendre like “turning tricks” may be construed as a microaggression. The point of colored language is to memorably make a point. You are unlikely to forget turning dump script tricks.

Cutting the Stinking Tauntaun and other Adventures in Software Archeology

The other day a software project that I had spent a year on was “put on the shelf.”  A year of effort was unceremoniously flushed down the software sewer.  A number of colleagues asked me, “How do you feel about this?” Would you believe relieved?

This is not false bravado or stupid sunny optimism. I am not naïve. I know this failure will hurt me. In modern corporations the parties that launch failures, usually management, seldom take the blame. Blame, like groundwater, sinks to the lowest levels. In software circles, it pools on the coding grunts doing the real work. It’s never said but always implied, that software failures are the exclusive result of programmer flaws.  If only we had worked harder and put in those sixty or eighty hour weeks for months at a time; if only we were a higher level of coding wizard that could crank out thousands of error free lines of code per hour, like the entirely fictional software geniuses on TV, things might have been different.

Indulging hypotheticals is like arguing with young Earth creationists about the distance of galaxies.

“Given the absolute constancy of the speed of light how is it possible to see the Andromeda galaxy, something we know is over two million light years away, in a six thousand-year-old universe?”

The inevitable reply to this young Earth killing query is always something to the effect that God could have made the universe with light en route.  God could have also made farts smell like Chanel #5 but sadly he[1] did not.  Similarly, God has also failed to staff corporations with legions of Spock level programmers ready and willing to satisfy whatever au courant notion sets up brain-keeping in management skulls.  The Andromeda galaxy is millions of light-years away, we don’t live on a six-thousand-year-old Earth, and software is not created by magic.  Projects work or flounder for very real reasons. I was not surprised by the termination of this project. I was surprised that it took so long to give up on something that was never going to work out as expected.

Working with bad legacy code always reminds me of the scene in The Empire Strikes Back when Han Solo rescues Luke on the ice world by cutting open his fallen Tauntaun with a light saber. When the Tauntaun’s guts spill out Solo says, “I thought they smelled bad on the outside.” Legacy code may look pretty bad on the outside, just wait until you cut into it. Only then do you release the full foul forceful stench.

The project, let’s call it the Stinking Tauntaun Project, was an exercise in legacy system reformation. Our task was adding major new features to a hoary old system written in a programming language that resembles hieroglyphics. I know and regularly use half a dozen programming languages. With rare exceptions, programming languages are more alike than different. From the time of Turing, we have known that all programming languages are essentially the same.  You can always implement one language in another; all programming systems exploit this fundamental equivalence. It’s theoretically possible to compile SQL into JavaScript and run the resulting code on an eight-bit interpreter written in ancient Batch I just wouldn’t recommend you start a SOX compliant software project to create such insanity. This goes double if only one programmer in your organization knows ancient Batch!  The Stinking Tauntaun Project wasn’t this crazy, but there were clear signs that we were setting out on what has been accurately described as a Death March.

Avoiding Death Marches, or finding ways to shift blame for them, is an essential survival skill for young corporate programmers. It’s not an exaggeration to say that a lot of modern software is built on the naiveté of the young. When you first learn to program you are seduced by its overpowering rationality. Here is a world founded on logic, mathematics and real engineering constraints. Things work and don’t work, for non-bullshit reasons! There aren’t any pointless arguments with ideological metrosexual pinheads about “privilege” or “gender.” Programs don’t give a crap about the skin color of the programmer or whether their mother was an abusive bull dyke. After a few years of inhaling rarefied logic fumes young programmers often make the gigantic and catastrophic mistake that the greater naked ape society in which they live also works on bullshit free principles. It’s at this delicate stage when you can drive the young programmer hard. There’s an old saying that salesmen like money and engineers like work so in most companies the salesmen get the money and the engineers get the work. As the great sage Homer Simpson would say, “It’s funny because it’s true!”

Death Marches shatter the naïve. Gaining a greater understanding of the nasty world we live in always does a naked ape good but there’s a cost; your career will suffer and you will suffer: enlightenment is proportional to pain! The school of hard knocks is a real thing and unlike that party school that you boozed through no one graduates alive. I’d strongly recommend a few Death Marches for every working programmer. To paraphrase Tolstoy, “All successful software projects are alike; every unsuccessful project fails in its own way!” You may consider what follows my modest contribution to the vast, comprehensively ignored, literature of software Death Marches.

Code Smell #1: A large mass of poorly written test free legacy code that does something important.

“Refactoring” is a word that I love to mock, but the refactoring literature showcases an astonishingly precise term: “code smell.” Code smells are exactly what you expect. Something about the code stinks. The Stinking Tauntaun code base didn’t just stink, it reeked like diuretic dog shit on choleric vomit. This was well-known throughout the company.  My general advice to young programmers that catch a whiff of rotting code is simple: flee, run, abort, get out of Dodge! Do not spill your precious bodily fluids cleaning up the messes of others.

When recruited a number of experienced hieroglyphic programmers asked me the pointed question, “How do you deal with giant stinking piles of legacy doo-doo?” Maybe it wasn’t quite phrased like that but the intention was clear. They were telling me that they were dealing with a seething mass of half-baked crappy code that somehow, despite its manifold flaws, was useful to the organization and couldn’t be put out of its well deserved misery.

I honestly answered. “Systems that have survived for many years, despite their problems and flaws, meet a need. You have to respect that.” I still believe this!  Every day I boot up ancient command shells that look pretty much like they did thirty years ago.  Why do I use these old tools? It’s simple; they do key tasks very efficiently. I try new tools all the time. I spent a few months playing with PowerShell.  It’s a very slick modern shell but it has one big problem. It takes far longer to load than ancient DOS or Bash shells. When I want to execute a few simple commands I find the few extra seconds it takes to get a PowerShell up and running highly annoying. Why should I wait when I don’t need the wonderful new features of PowerShell?  The Stinking Tauntaun System also met user needs and, just like I am horrified by the clunky ill-conceived incoherence of old shells, Stinking Tauntaun users held their noses and used the good bits of what’s overall a bad system.

Code Smell #2: No fully automatic version controlled test suite.

There is a tale, possibly apocryphal, from the dawn of programming that goes something like this: a young programmer of the ENIAC generation was having a bad day. He was trying to get some program running and was getting frustrated “re-wiring.” Our young programmer was experiencing one of the first “debugging” sessions back when real insects came into play. In a moment of life searing insight our young programmer realized that from now on most of his time would be consumed hunting down errors in faulty programs.  The tale goes dark here. I’ve often wondered what happened to the young programmer. Did he have a happy life or did he, like Sisyphus, push that damn program up the hill only to have it crash down again, and again, until management cried stop!

Debugging is a necessary evil. If you program — you debug. It’s impossible to eliminate debugging but at least you can approach it intelligently, and to this day, the only successful method for controlling program errors is the fully automatic version controlled ever-expanding test suite. Systems with such suites actually improve with time. Systems without such suites always degenerate and take down a few naïve programmers as they decay. The Stinking Tauntaun System lacked an automatic test suite. The Stinking Tauntaun System lacked a non-automatic test suite. The Stinking Tauntaun had no useful test cases what so ever! Testing The Stinking Tauntaun was more painful than dealing with its code. The primary tester had nightmares about The Stinking Tauntaun.

withouttests

My Power-Point-less illustration of Quality over Time for a system that lacks a fully automatic ever expanding version controlled test suite. Systems without well designed test rigs are almost without exception complete crap. Unless someone is paying you huge bucks it’s best to start sending out resumes when asked to fix such systems. You can ruin your health stirring your yummy ice cream into the shit but it’s almost certain the final mixture will taste more like shit than ice cream.

It’s the 21st century people! In ENIAC days you could overlook missing automatic test suites. Nowadays missing automatic test suites is a damning indictment of the organization that tolerates such heinous crimes against programming humanity!

Code Smell #3: Only one programmer is able to work with legacy code.

When I was hired there were a few hieroglyphic programmers on staff. No single programmer was saddled with the accumulated mistakes of prior decades. We could divide and deal with the pain. During this happy time our work was support oriented. We we’re not undertaking large projects. Then an inevitable bout of corporate reorganization occurred, followed by terminations and a stream of defectors fleeing the new order.  Eventually only one hieroglyphic programmer remained: moi!  Most programmers would have joined the exodus and not taken on the sins of other fathers but as I have already pointed out I am a software whore and proud of it.  Still even software sluts must avoid ending up like the, “there can be only one,” Highlander. Most of those sword duels ended rather badly as I recall.

Code Smell #4: Massive routines with far-ranging name scope.

With noxious fumes already cutting off our air supply we should have stopped in our tracks but we soldiered on because The Stinking Tauntaun System did something important and alternatives were not readily available. The corporation recognized this and started a number of parallel projects to explore New Stinking Tautaun’s.  I wasn’t lucky enough to work on any of the parallel projects. I had to get in the stall and shovel the Tauntaun manure.

We weren’t entirely clueless when we started Stinking Tauntaun work.  We recognized the precarious hieroglyphic programmer resource constraint and tried to divide the work into units that alleviated it. We decided to implement part of the new features in another programming language and call the new module from the old system. The hope was this would divide the work and possibly create something that might be used outside The Stinking Tauntaun System. This part of the project went reasonably well. The new module is entirely new code and works very well. Unfortunately, it was a small part of the greater effort of bolting new features into The Stinking Tauntaun System.

Stinking Tauntaun code is magisterial in its monolithic madness. The hieroglyphic programming language is famous for its concise coding style yet the main routine in the Stinking Tauntaun System is close to 10,000 lines long!  I had never seen hieroglyphic language routines of such length. I’ve only seen comparable programs a few times in my battle scared career. As a summer student I marveled at a 20,000 line, single routine, FORTRAN program.  At first I thought it was the output of some cross compiler but no, some insane, bat shit crazy government drone, (I was working for a government agency in western Canada), had coded it by hand — on freaking punch cards no less! Such stunning ineptitude is rare: most programmers have a passing acquaintance with modular design and know enough to reconsider things when code gets out of hand. We all have different thresholds for judging out-of-hand-ness, for me it’s any routine that’s longer than sixty lines: including the damn comments.

A 10,000 line routine is a monumental red flag but The Stinking Tautaun’s length was not the biggest problem. There is this thing programmer’s call “scope.”  Scope marks out name boundaries. It sets limits on where names have meaning or value. Ideally name scope is limited to the smallest feasible extent. You really don’t want global names! You absolutely don’t want global names like “X,” or “i,” or “T,” cavorting in your code! Most of the cryptic names in the Stinking Tauntaun System had far-ranging scope. Vast scopes make it difficult to safely make local changes. This makes it risky, dangerous and time-consuming to change code.  Large scopes also increase the burden on testers. If tiny changes have far-ranging consequences you have to retest large parts of the system every time you tweak it. All of this sucks up time and drains the pure bodily fluids of IT staff.

Code Smell #5: Basic assumptions about what is possible quickly prove wrong.

It was looking dark. Timid souls would have run but we plowed on. Our naïve hope rested on the theory that we could change a few touch points in The Stinking Tauntaun’s code base and then get out of Dodge before the dark kludge forest closed in. My first estimate of how many routines I would have to touch was around twenty. Two months into the project I had already altered over a hundred with no end in sight. The small orthogonal change we hoped to make was not possible because The Stinking Tauntaun System did not sensibly do one thing in one place. It did sort of the same thing in many places. You couldn’t make a small change and have things sensibly flow throughout the system.  I worked to isolate and modularize the mess but I should have just waved my hands and spent my days on LinkedIn looking for another job.

Code Smell #6: A development culture steeped in counterproductive ceremony.

On top of all these screeching sirens yelling stop there were other impediments. I work in a SOX saturated environment. Readers of my blog know that I consider SOX to be one of the biggest time-wasting piles of DC idiocies ever pushed on American companies. Like many “government solutions” SOX did not drain the intended swamp but forever saddled us with inane time-wasting ceremony and utterly stupid levels of micromanagement. Is Active Directory management really a concern of freaking government? Somehow it’s become one!  One of my favorite consequences of SOX is the religious ordering of development environments and the sacraments for promoting code changes. During the long bruising Stinking Tauntaun project many releases boiled down to me making changes to one file! Pushing a new version to test should have been a simple file copy.

Of course it couldn’t be that simple. Numerous “tickets” had to be approved. Another branch of IT, that was even more stressed and suffered higher levels of turnover than ours, was dragged in to execute a single trivial file copy. In many cases more bytes were generated by all this pointless time-wasting ceremony than I had changed in that single file. Of course all this took time, and as hard as I looked for a useless-time-wasting-bullshit category in our web-based hours tracking system, I couldn’t find one. There’s a management mantra: you can only improve what you measure.  Funny how companies never measure the bullshit.

In retrospect we should have junked The Stinking Tauntaun System or opted for a radical rewrite. It’s ironic but something like this is what’s going to happen. If I was young I would be bitter but I cashed in on this project. In my consulting days I was always on the lookout for Stinking Tauntauns: they were freaking gold mines for those of us that have acquired a taste for the bracing putrid fumes of rotting code.

[1] If you are the type of person that gets your panties in a knot about the gender of hypothetical supreme beings please go away.

JOD Update: Version 0.9.97*

JOD LogoIn the last year much has changed in the J world.

  1. There are new official J 8.0x builds for all supported platforms.
  2. The QT based IDE JDE has matured and is in widespread use.
  3. The column oriented J database JD is drawing new users to J and enticing J veterans to reconsider how we use databases.
  4. There is a small group of J system builders experimenting with additions, extensions and revisions of core J source code.

In short, there are have been enough changes to revisit and update JOD.

JOD version 0.9.97*1 is the first JOD update in many years that mocks the god of software compatibility. In particular:

  1. The syntax of the jodhelp verb has changed.
  2. The jodsource addon no longer uses a zip file to distribute JOD dump files.
  3. JOD online help will no longer be supported or updated.
  4. Volume size is no longer checked before creating new JOD dictionaries.
  5. There is a new version of jod.pdf.

jodhelp changes (#1, #3 and #5)

jodhelp has always been a kludge. In programmer speak a kludge is some half-baked facility added to a system after more essential features have stabilized. The original versions of jodhelp pointed at my rough notes. It was all the “documentation” I needed! Then others stared using JOD which resulted in an “evolved” online version of my notes. I originally thought that hosting my notes online would simultaneously serve user needs and cut the amount of time I spent maintaining documentation. In retrospect this wasn’t even wrong!

I used Google Documents to host my notes. If you’ve ever wondered why completely free Google Documents hasn’t obliterated expensive Microsoft Word or hoary old excellent \LaTeX I invite you to maintain a set of long-duration-documents with Google Documents. During jodhelp‘s online lifetime the basic internal format of Google Documents changed in a screw-your-old-documents upgrade which forced me to spend days repairing broken hyper-links and reformatting. I was not amused; you still get what you pay for!

I originally choose Google Documents because of its alleged global accessibility. Sadly, Google Documents is now often blocked by corporate and national firewalls. Even when it isn’t blocked it renders like a dog peeing on a fire hydrant. All these problems forced me to rewrite JOD documentation with a completely reliable tool: good old-fashioned \LaTeX. The result of my labors, jod.pdf, is now distributed by the joddocument addon and is easily browsed with jodhelp.

After jod.pdf‘s appearance another irritant surfaced: synchronizing jod.pdf and the online version. I tried using pandoc and markdown to generate both the online and PDF versions from the same source files but jod.pdf is too complex for not-to-fancy portable approaches. I was faced with a choice, lower my jod.pdf standards, or get rid of something I never really liked. I opted to drown a child and abandon online help. I don’t expect a lot of mourners at the funeral.

Using the new version of jodhelp requires installing the addon joddocument and configuring a J PDF reader. It’s also good idea to define a JQT PF key to pop up JOD help with a keystroke. To configure a J PDF reader edit the configuration file:

 ~config/base.cfg

this file is directly available from the JQT Edit\Configure menu. base.cfg defines a number of operating system dependent utilities. Make changes to the systems you use, save your changes, and restart J. The following example shows my Win64 system settings.

 case. 'Win' do.
   BoxForm=: 1
   Browser=: 'c:/Program Files (x86)/Google/Chrome/Application/chrome.exe'
   Browser_nox=: ''
   EPSReader=: 'c:/program files/ghostgum/gsview/gsview64.exe'
   PDFReader=: 'c:/uap/sumatra/SumatraPDF.exe'
   XDiff=: 'c:/uap/WinMerge-2.14.0-exe/winmergeu.exe'
   Editor=: 'c:/uap/notepad++/notepad++.exe %f'
   Editor_nox=: '' 

I use SumatraPDF to read PDF files on Windows. It’s a fast, lightweight, program that efficiently renders jod.pdf. Good PDF readers are available for all commonly used platforms.

To define JQT PK keys edit the configuration file:2

 ~config/userkeys.cfg
 

This file is also directly available from Edit\Configure menu. My JOD specific PF keys are:

 F3;1;Require JOD;require 'general/jod'
 Shift+F3;1;JOD Help;jodhelp 0
 F6;1;Dev Dicts;od cut 'joddev jod utils' [ 3 od ''
 Shift+F6;1;Fit Dev Dicts;od cut 'jodfit joddev jod utils' [ 3 od ''
 Ctrl+Shift+F6;1;Test Dev Dicts;od cut 'jodtest joddev jod utils' [ 3 od ''
 

Pressing Shift+F3 executes jodhelp 0 which pops up JOD help.

jodsource changes (#2)

The jodsource addon is a collection of JOD dump scripts. Dump scripts are serialized versions of binary JOD dictionaries. When executed they merge objects into the current JOD put dictionary. I use them primarily to move dictionaries around but they have other uses as well. Prior to this version I distributed the three main JOD development dump scripts, joddev, jod, and utils in one compressed zip file to reduce the size of JAL downloads.

The distributed script jodsourcesetup.ijs used the zfiles addon to extract these scripts and rebuild JOD development dictionaries. This worked on 32 bit Windows systems but failed elsewhere. J now runs on 32/64 bit Windows, Mac, Linux, IOS and Android systems. To better support all these variants I eliminated the zfiles dependency and pruned the JOD development dictionaries. The result is a more portable and smaller jodsource addon.

Bye bye volume sizing (#4)

Early versions of JOD ran in the now bygone era of floppy disks. It was possible to create many JOD dictionaries on a single standard 800 kilobyte 3.5 inch floppy. Compared to modern porcine-ware JOD, which many J’ers consider a huge system, is lithe and lean. In floppy days it was important to check if there was enough space on a floppy before creating another huge 48K empty JOD dictionary. This is a bit ridiculous today! If you don’t have 48K free on whatever device you are running you have far more serious problems than not being able to create JOD dictionaries.

Volume sizing code remained in JOD for years until it started giving me problems. Returning the size of very large network volumes can be time-consuming and there are serious portability issues. Every operating system calls different facilities to return volume sizes. Even worse, security settings on corporate networks and cloud architectures sometimes refuse to divulge national secrets like free byte counts.

To eliminate all these headaches this version of JOD no longer checks volume size when the FREESPACE noun is zero. To restore the previous behavior you have to edit the file

 ~addons/general/jod.ijs`

and change the line FREESPACE=:0 to whatever byte count you want. Alternatively, you could NGAF3 and just assume you have 48K free on your terabyte size volumes.

Still to come

You may have surmised from JOD’s version number that the system is still not feature complete.  The JOD manual lists a few words that I am planning to implement. I only develop JOD when I need something or I am bored out of my mind at work and need a break. Such intermittent motivators seldom insure project completion but I have found a new reason to finish JOD. To list a book on Goodreads or Amazon you need an ISBN number.  The hardcopy version of the JOD manual is a sort-of-published book. To complete the publishing process I need an ISBN. If I am going to bother with such formalities I might as well complete the system the manual describes. So there you have it a new software development motivator: vanity.


  1. The version number is *‘ed because you are always a point release from done!
  2. userkeys.cfg is only available for J 8.03 systems.
  3. Not Give a F%&k!