Parsing the Bitcoin Genesis Block with J

The genesis block is the first block on the Bitcoin blockchain. Satoshi Nakamoto, the mysterious entity that created Bitcoin, mined the genesis block on January 3, 2009. It’s been five years since the genesis block’s birth and Satoshi is still unknown, Bitcoin is bigger than ever, and the blockchain is longer than 300,000 blocks and growing.

One of the most important features of the blockchain is its immutability. After the Bitcoin network accepts a block and adds it to the blockchain it can never be altered. This makes Bitcoin blocks rare durable binary artifacts. The cryptographic hash algorithms that underpin the Bitcoin protocol enforce block immutability. If someone decides to tinker with a block, say maliciously flip a single bit, the block’s hash will change and the network will reject it. This is what makes it almost impossible to counterfeit Bitcoins. Bitcoins have been lost and stolen but they have never been successfully counterfeited. This sharply contrasts with funny money like the US dollar that is so routinely and brazenly counterfeited that many suspect the US government turns a blind eye.

The exceptional durability of Bitcoin blocks, coupled with the mysterious origins of Bitcoin, makes the genesis block one of the most intriguing and important byte runs in the world. This post was inspired by the now defunct post 285 bytes that changed the world. I would love to give you a link but this post has vanished. A secondary, but excellent reference is John Ratcliff’s How to Parse the Bitcoin BlockChain. I am adapting John’s nomenclature in what follows.

When programmers start exploring Bitcoin they often cut their teeth on parsing the genesis block. If you Google “blockchain parsing” you’ll find examples in dozens of programming languages. The most popular are C, C++, Java, PHP, C#, JavaScript, and the rest of the mainstream suspects. What you will not find, until now, are J examples.

So what does J bring to the table that makes yet another genesis block parser worth a look? Let’s take a look at Bitcoin addresses. The following is the Bitcoin address of this blog’s tip jar. Feel free to send as many Satoshis and full Bitcoins as you like to this address.

   tip=. '17MfYvFqSyeZcy7nKMbFrStFmmvaJ143fA'

There is nothing deep or mysterious about this funny string of letters; it’s just a plain old number in Bitcoin base 58 clothing. So, what is this number in standard format? Here’s how it’s calculated with J.

   BASE58=. '123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz'
 
   dfb58=. 58x #. BASE58 i. ]
 
   dfb58 tip
 1709618896654985460726422911112500711652231559804656492485

The second line that defines dfb58, (decimal from base 58), is the complete J program! That’s it folks. You can troll the internet for days looking at base 58 to big integer converters and it’s unlikely you will find a shorter or more elegant conversion program. Not only is the J version short and sweet it’s also fast and versatile. Suppose you wanted to convert ten thousand Bitcoin addresses. The following converts ten thousand copies of tip.

   dfb58 10000 # ,: tip
 1709618896654985460726422911112500711652231559804656492485 17096188966549854607264...

At this point fanboys of mainstream programming languages typically pipe up with something like, “changing number encodings is inherently trivial; what about something more demanding like going the other way, say converting Bitcoin public keys to the base 58 address format?”

The public key in the genesis block is encoded in what many call the “challenge script.” Here is the genesis block’s challenge script in hex.

41 04 67 8A FD B0 FE 55 48 27 19 67 F1 A6 71 30 B7 10 5C D6 
A8 28 E0 39 09 A6 79 62 E0 EA 1F 61 DE B6 49 F6 BC 3F 4C EF 
38 C4 F3 55 04 E5 1E C1 12 DE 5C 38 4D F7 BA 0B 8D 57 8A 4C 
70 2B 6B F1 1D 5F AC

Public keys take a number of forms in the blockchain. John Ratcliff’s post summarizes the many forms you will run into. The genesis block uses the 65 byte ECDSA form. Converting this form to base 58 requires taking SHA-256 and RIPEMD-160 hashes. These hashes are available in OpenSSL which is conveniently distributed with J 8.02 JQT. Here’s how to convert the genesis block’s public key to base 58 with J.

   load 'c:/bitjd/scripts/sslhash.ijs'

   Base58frKey65=:3 : 0

   NB.*Base58frKey65 v-- 65 byte public Bitcoin key bytes to base 58.
   NB.
   NB. monad:  clB58 =. Base58frKey65 clBytes

   ekey=. (0{a.) , sr160 s256 y
   csum=. 4 {. s256 s256 ekey
   Base58Check ekey,csum
   )

   Base58frKey65 }. }: ChallengeScript
 1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa

The ChallengeScript noun holds the bytes given in hex above. The verbs sr150, s256 and Base58Check are available in the J scripts sslhash and ParseGenesisBlock that I have put in the jacks repository on GitHub.

The following J verb ParseGenesisBlock reads the first full node Bitcoin block file and then extracts and checks the genesis block. ParseGenesisBlock tests the various verbs, (functions), it employs. As a side effect it clearly describes the layout of the genesis block and provides test data for anyone that’s interested.

If this post peeks your curiosity about J a good place to start learning about the language is the recently released New Dictionary of J. You can download a version of J for Windows, Linux, OS/X, IOS, and Android at Jsoftware’s main site.

ParseGenesisBlock=:3 : 0

NB.*ParseGenesisBlock v-- parse and check Bitcoin genesis block.
NB.
NB. monad:  clMsg =. ParseGenesisBlock clBlockFile
NB.
NB.   file=. 'c:/bitjd/blocks/blk00000.dat'
NB.   ParseGenesisBlock file

NB. fetch genesis block data
dat=. read y

NB. first 4 bytes are "sort of" block delimiters
MagicID=: (i. offset=. 4) { dat
'MagicID mismatch' assert 'F9BEB4D9' -: ,hfd a. i. MagicID

NB. next 4 bytes gives following block length
offset=. offset + 4 [ BlockLength=: _2 ic (offset + i. 4) { dat
'BlockLength mismatch' assert 285 = BlockLength

NB. next 4 bytes block format version - has changed
offset=. offset + 4 [ VersionNumber=: _2 ic (offset + i. 4) { dat

NB. next 32 bytes is previous block hash - genesis block
NB. has no previous hash and all bytes are set to 0
offset=. offset + 32 [ PreviousBlockHash=: (offset + i. 32) { dat
'PreviousBlockHash mismatch' assert (32#0) -: a. i. PreviousBlockHash

NB. next 32 bytes is the Merkle tree root hash
offset=. offset + 32 [ MerkleRoot=: (offset + i. 32) { dat
grh=. '3BA3EDFD7A7B12B27AC72C3E67768F617FC81BC3888A51323A9FB8AA4B1E5E4A'
'MerkleRoot mismatch' assert grh -: ,hfd a. i. MerkleRoot

NB. next 4 bytes is a unix epoch timestamp - rolls over 7th feb 2106
NB. there is no timezone information - it is interpreted as utc
offset=. offset + 4 [ TimeStamp=: _2 ic (offset + i. 4) { dat
'TimeStamp mismatch' assert 2009 1 3 18 15 5 -: ,tsfrunixsecs TimeStamp

NB. next 4 bytes represents block target difficulty
offset=. offset + 4 [ TargetDifficulty=: _2 ic (offset + i. 4) { dat
'TargetDifficulty mismatch' assert 486604799 = TargetDifficulty

NB. next 4 bytes is a random number nonce
offset=. offset + 4 [ Nonce=: (offset + i. 4) { dat
'Nonce mismatch' assert '1DAC2B7C' -: ,hfd a. i. Nonce

NB. next 1 to 9 bytes is the transaction count stored as a variable length integer
NB. see:  https://en.bitcoin.it/wiki/Protocol_specification#Variable_length_integer
offset=. offset + vlen [ 'vlen TransactionCount'=: vint (offset + i. 9) { dat
'TransactionCount mismatch' assert TransactionCount = 1  NB. (*)=. vlen

NB. next 4 bytes transaction version number
offset=. offset + 4 [ TransactionVersionNumber=: _2 ic (offset + i.4) { dat
'TransactionVersionNumber mismatch' assert 1 = TransactionVersionNumber

NB. next 1 to 9 bytes is the number of transaction inputs
offset=. offset + vlen [ 'vlen TransactionInputNumber'=: vint (offset + i. 9) { dat

NB. next 32 bytes is the hash of the input transaction
offset=. offset + 32 [ TransactionHash=: (offset + i. 32) { dat
'TransactionHash mismatch' assert (32#0) -: a. i. TransactionHash

NB. next 4 bytes is the input transaction index
offset=. offset + 4 [ TransactionIndex=: _2 ic (offset + i. 4) { dat
'TransactionIndex mismatch' assert _1 = TransactionIndex

NB. input script length is next
offset=. offset + vlen [ 'vlen InputScriptLength'=: vint (offset + i. 9) { dat
'InputScriptLength mismatch' assert 77 = InputScriptLength

NB. script data
offset=. offset + InputScriptLength [ InputScript=: (offset + i. InputScriptLength) { dat

NB. sequence number 4 bytes
offset=. offset + 4 [ SequenceNumber=: ,hfd a. i. (offset + i. 4) { dat
'SequenceNumber mismatch' assert 'FFFFFFFF' -: SequenceNumber

NB. output count 1 to 9 bytes
offset=. offset + vlen [ 'vlen OutputCount'=: vint (offset + i.9) { dat

NB. output value - number of satoshis sent
offset=. offset + 8 [ OutputSatoshis=: (offset + i.8) { dat  NB. 64 bit unsigned integer
'OutputSatoshis mismatch' assert '00F2052A01000000' -: ,hfd a. i. OutputSatoshis
OutputSatoshis=: ]`(_3&ic)@.IF64 OutputSatoshis

NB. challenge script length
offset=. offset + vlen [ 'vlen ChallengeScriptLength'=: vint (offset + i.9) { dat
'ChallengeScriptLength mismatch' assert 67 = ChallengeScriptLength

NB. challenge script - contains elliptic curve signatures
offset=. offset + ChallengeScriptLength [ ChallengeScript=: (offset + i. ChallengeScriptLength) { dat
'ChallengeScript mismatch' assert GenesisBlockChallengeScript -: ,hfd a. i. ChallengeScript

NB. challenge script is 67 bytes drop first and last byte to
NB. compute the familiar Bitcoin base 58 address - compare with block explorer
NB. http://blockexplorer.com/block/000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f
OutputAddress=: Base58frKey65 }. }: ChallengeScript
'Genesis Block address mismatch' assert GenesisBlockOutputAddress -: OutputAddress

NB. last 4 bytes lock time
TransactionLockTime=: (offset + i.4) { dat
'TransactionLockTime mismatch' assert 0 0 0 0 -: a. i. TransactionLockTime

'Genesis Block Parsed and Checked'
)

JOD Update: J 8.02 QT/JHS/64 bit Systems

JOD LogoI have pushed out a JOD update that makes it possible to run the addon on J 8.02 systems. In the last eight months a QT based J IDE has been developed that runs on Linux, Windows and Mac platforms. To maintain JOD’s compatibility across all versions of J from 6.02 on I had to tweak a few verbs.

The only significant changes are how JOD interacts with various J system editors.  I have tested the system on Windows  J 6.02, J 7.01, J 8.02, and Mac J 7.01 systems. I expect it will behave on 32 and 64 bit Linux systems from J 7.01 on, but I have yet to test these setups. My hardware budget limits my ability to run common variants of Windows, Linux and Mac systems.

JOD is still not complete; that’s why the version number has not been bumped past 1.0.0. The missing features are noted in the table of contents of jod.pdf, (also available in the joddocument addon), with the suffix “NIMP,” which means “not implemented.”  I will fill in these blanks as I need them. Most of the time JOD meets my needs so don’t hold your breath.

If you want to make your own additions to JOD the program and documentation source is available on GitHub. Just follow the links and enjoy.

As a last note: I will be at the J Conference in Toronto (July 24 and 25, 2014) where I will be giving a short presentation and handing out a few hardcopy versions of the JOD manual to one or two JOD fans.

APL Software Archaeology .dbi Edition

apltree

Have yourself a merry little APL Christmas.

I joke that my job title should be software archaeologist because I often find myself resurrecting, not refactoring, code that dates to primitive and primeval eras. The language I’m typically hired to resurrect is APL. APL, the language with funny symbols, is a software vampire. People keep paying us to kill it, but no matter how many stakes we pound through its heart it keeps coming back.

There are good reasons for this. APL embodies many timeless ideas and I’m confident that programming in the future will look a lot more like APL than many expect. If you doubt me just press the Siri button on your iPhone and ask, “Integrate X squared times sine X from 0 to 2.” What comes back has more of an APL than QWERTYUIOP flavor. Strange Unicode characters are creeping into many mainstream languages. This is a good thing because restricting programming to the miserly key sets of ancient typewriters was, is, and always will be a spectacularly bad idea. Ken Iverson deserves rich accolades for pointing this out more than fifty years ago and beating this drum incessantly during his lifetime. Iverson taught that notation is a tool of thought and that if you care about ideas you must care about how they are expressed. Why is this even remotely controversial?

siriintegral

Siri’s results use appropriate mathematical notations. As we move away from keyboards programming languages and mathematical notation will merge. APL was way ahead of its time in this respect.

The genius of APL continues to exert influence on many programming languages, but APL’s rise had little to do with its abstract notation and a lot to do with how it was implemented. APL was one of the first programming environments that nonprogrammers could use. It was the spreadsheet of the late 1960’s and 1970’s and just like spreadsheets of today a lot of utterly horrid, poorly structured, lame amateur messes were created with it. If you’ve ever cracked open a gigantic Excel model that looks like it was developed by a roomful of quarreling ADHD afflicted unionized chimpanzees then you know what the standard APL mess feels like. Many programmers blamed APL for this just like gun control advocates blame firearms for shootings. They argued that it would have been impossible to concoct such monsters in clean compiled languages like Pascal. “It wouldn’t even compile.” This is not even wrong. I’ve dealt with plenty of dreadful messes that do compile! The tool is always neutral; don’t blame the paintbrush for the painting.

Allowing rubes to code yields mountains of rubbish and the occasional ruby. It will shock many programmers to learn they are not the only smart people in the world. It turns out that nonprogrammers occasionally have good ideas and, miraculously, some of them can ably express their ideas in code. Before spreadsheets such user rubies congealed in APL where some still run. Part of my day job is extracting these precious stones from layers and layers of kluges, hacks, patch jobs, retro-fits and workarounds and recoding them in modern programming languages like C# and JavaScript.

Recently I recovered1 an ancient inverted file system embedded in the APL systems of my employer and rendered it in C#. This system uses the extension .dbi. I don’t know who created this system; the code is old. The most recent code comments date from the year 2000, but I am pretty sure that .dbi files predate component files in APL+WIN, formerly STSC APL, which pushes the design back to the 1980’s or earlier. I know many APL’ers check this blog. If any of you know who created the original .dbi APL code please leave a note.

Somehow this .dbi system survived unsupported, with few user complaints, for decades of daily use. How is this possible? Astonishingly, good ideas age well and the core .dbi idea is inverted data. Modern high-performance databases make heavy use of this method. Inversion is so effective that hoary old interpreted APL code still beats compiled and optimized ADO.Net when fetching large numeric vectors and tables.

Restoring the .dbi system was a two-step process.2 I first converted the APL system to J. I used J because it is a close relative of APL but not so close that you can cut and paste. Translating nontrivial APL to J forces you to understand the APL at the nit-bitty level. The translation to J also allowed me to fix the APL interface. The original system used global variables, rampant branches and other lamentable coding practices that C# will not abide. After matching the APL and J systems I then translated the J to C# and then rematched all three systems.

Comparing multiple systems is a very effective testing technique. I found bugs in all three systems. I fixed the J and C# bugs but left the original APL code unchanged. Software archaeology is a delicate field. You don’t “fix” old code just like you don’t correct errors in cuneiform tablets. Original and important program code belongs in museums with other significant cultural artifacts.

Original inverted file code probably belongs in a museum. This .dbi APL code is old, but it certainly derives from earlier programs so it’s not museum worthy. Even if it was the APL and C# .dbi systems belong to my employer. However, I am placing the J scaffold version, which matches the performance of the other systems, into the public domain. The script is available on GitHub and here. The .dbi system gets right down to bits in some cases and illustrates some J techniques for dealing with indexed binary inverted file data. Enjoy!


  1.  .dbi files held many gigabytes of actuarially tuned data. Dumping them was not an option. We either had to convert to a new store or produce a component that could read old data in new systems.
  2. Restoring old code is somewhat like restoring old pictures. When working on old pictures you’re always tempted to improve them. With pictures you usually have a choice. This may not hold for old code. Changes in software may force updates.

Jacks Repository

The other day I attempted to browse a J script described in an old blog post only to find that my employer’s network monkeys had blocked the file sharing service. I’ve railed about IT control freaks in the past. They will not rest until it’s impossible to do useful work. I fumed and grumbled until I perceived a bigger problem. I have so many references to program code in this blog that it’s getting tedious tracking them down. Wouldn’t it be nice if my hacks were neatly organized in one coherent repository?

Let me introduce jacks. jacks, or “J-hacks”, organizes the J related code referenced in this blog into a single GitHub repository. Most of the scripts in jacks are one-offs but some have proven so useful that it makes sense to store them in a repository and track changes. From now on jacks will be the first place to look for code from this blog. You pull the contents of jacks into a new Git repository with the commands:

git init
git remote add jacks https://github.com/bakerjd99/jacks.git
git pull jacks master

It took me a few moments to settle on the name “jacks.” I considered “jokes” because programmers often take their code too seriously and “jocks” because J programmers are wild out of control convention eschewing code jocks but jacks won out when I remembered the refrain “jack be nimble, jack be quick, jack jump over” whatever coding problem is pissing you off.

More about JHS with the DHTMLX Grid

I have resolved my DHTMLX standard edition row data extraction problem. The standard edition does not serialize grids or track user cell changes. You have to pay for such luxuries. Because I’m a foul software Grinch and this is just an exploratory hack I had to roll my own. I am posting the relevant JavaScript because I could not find similar examples. Here is how you can fetch rows from standard edition DHTMLX grids and save them as JSON in a hidden textarea element. Eric Iverson suggested hidden textareas and they work like a charm.

function ev_saveme_click(){

  if ('undefined' != typeof grid0){
  
    if (0 == grid0.getRowsNum()){
      jbyid("rerowcnt").innerHTML = "No rows to save"; 
      return;
    }

    var st = new Date().getTime(),  // start time  
        ids = grid0.getAllRowIds(","),
        ccnt = 1 + grid0.getColumnsNum();  // includes id
      
    ids = ids.split(",");  
    var rcnt = ids.length,
        tab = new Array(rcnt);
    
    // header row - tab[0][0] cell ignored
    tab[0] = new Array(ccnt);  
    for (var i = 1; i < ccnt; i++) {
      tab[0][i] = grid0.getColumnLabel(i-1,0); 
    }
     
    // cells with leading row id
    for (var i = 0 , si = 1 ; i < rcnt; i++ , si++) {
      tab[si] = new Array(ccnt);
      for (var j = 1; j < ccnt; j++) {
        tab[si][j] = grid0.cells((+ids[i]),j-1).getValue();
      }
      tab[si][0] = ids[i];
    }
  
    // prefix row column counts 
    var pfx = (rcnt+1) + " " + ccnt + "*";
    jbyid("gridchgs").innerHTML = pfx + JSON.stringify(tab);
    jdoajax(["gridchgs","tout"],"");
  
    var et = new Date().getTime() - st;  // end time    
    jbyid("rerowcnt").innerHTML= " row count= " + grid0.getRowsNum() +  
        ",  JavaScript ms= " + et; 
        
  } else {
  
    jbyid("rerowcnt").innerHTML= "Nothing to save";     
  }
}

Passing data back to J is fast but the J JSON addon convert\json burps on large datasets. For this demo I substituted a simple table oriented parser that is much faster.

JHS with the DHTMLX Grid

Grids are the most important GUI user object. It’s hard to think of a user-friendly data munching application that doesn’t have a grid beating at its heart. Consequently, any serious GUI interface contender must support grids. My previous post showed how to use MathJax with JHS. MathJax is an impressive and important JavaScript library; it clearly demonstrates the potential of CHJ1 GUI interfaces but let’s face it, mathematical typesetting will not win many consulting contracts. Grids won’t seal the deal either but their absence is a huge “next” signal. To support serious business and technical applications JHS needs grids.

Fortunately, the JavaScript world is grid saturated. The difficulty is not finding a grid but choosing among dozens of candidates. For this demo I Googled around and found DHTMLX. According to this probably biased article the DHTMLX grid performs well on large inputs and, more importantly, there is an open source version.

You have to start somewhere so I opted to use DHTMLX to build a simple CSV file editor. The CSV files I am going to edit are TAB delimited text files. Each file has a fixed number of columns with column names in the first row. Here is an example TAB delimited file. The idea is to load the file data into the grid. Tweak a few rows and save the result. By increasing the size of the CSV file we can gauge the performance of the grid. Let’s get started.

Using the DHTMLX grid requires some preparation.

  1. Create a local directory and edit J’s ~config/folders.cfg to reference the directory with the name GridDemo. jpath '~GridDemo' should return the full directory path.
  2. Download the files in the GridDemo folder and copy them to ~GridDemo.
  3. Download the Standard Edition (Version 3.5) of DHTMLX. The distribution file dhtmlxGrid.zip contains the grid source and supporting files.
  4. Extract the /dhtmlxGrid/codebase/ directory from dhtmlxGrid.zip and copy the entire directory tree to ~GridDemo.
  5. Also extract /dhtmlxGrid/samples/common from dhtmlxGrid.zip and copy the directory to ~GridDemo.

When you’re finished the top-level of ~GridDemo will look like the following where names without extensions are directories.

    calendar           dhtmlxgrid.js         GridDemo.ijs   t100rows.txt
    common             dhtmlxgrid_skins.css  imgs           t5000rows.txt
    dhtmlxcommon.js    excells               jodoval.png
    dhtmlxgridcell.js  ext                   skins
    dhtmlxgrid.css     favicon.ico           t1000rows.txt

The main J script is ~GridDemo\GridDemo.ijs. Start JHS and load this file.

    load '~GridDemo/GridDemo.ijs'

Then browse to this site.

    http://127.0.0.1:65001/GridDemo

If all goes well you will see the following GridDemo page after pressing the Edit Grid button.

Screenshot of GridDemo running on Chrome

Screenshot of GridDemo running on Chrome

To load and edit files enter their fully qualified names in the Input and Output boxes and press Edit Grid. To edit a cell double-click it. To save changes press Save Grid.2 There are more sophisticated ways to pick files on JavaScript pages. It’s easy to pop up standard host OS file dialogs but it’s not particularly easy to determine host directory paths. This post outlines the demons web programmers must slay to select host files. JHS circumvents these difficulties by asking the J server, which is a typically a local console process, to do the dirty work. JavaScript’s access to local files is limited for security reasons but J has no such restrictions. Use the force Luke!

Three test files t100rows.txt, t1000rows.txt, and t5000rows.txt are included with the demo. On my test machines load times vary from fractions of a second for the smaller files to nine seconds for the largest. This is competitive with the basic C# grid control and fast enough for serious work.

In subsequent posts I will explore JavaScript/JHS graphics options and start the process of integrating, grids, graphs and MathJax with JHS.


  1. CSS, HTML and JavaScript.
  2. The freebie version of DHTMLX does not support grid serialization. Here is how to roll your own.