Extracting SQL code from SSIS dtsx packages with Python lxml

Lately, I’ve been refactoring a sprawling SSIS (SQL Server Integration Services) package that ineffectually wrestles with large XML files. In this programmer’s opinion using SSIS for heavy-duty XML parsing is geeky self-abuse so I’ve opted to replace an eye-ball straining[1] SSIS package with half a dozen, “as simple as possible but no simpler”, Python scripts. If the Python is fast enough for production great! If not the scripts will serve as a clear model[2] for something faster.

I’m only refactoring[3] part of a larger ETL process so whatever I do it must mesh with the rest of the mess.

So where is the rest of the SSIS mess?

SSIS’s visual editor does a wonderful job of hiding the damn code!

This is a problem!

If only there was a simple way to troll through large sprawling SSIS spider-webby packages and extract the good bits. Fortunately, Python’s XML parsing tools can be easily applied to SSIS dtsx files. SSIS dtsx files are XML files. The following code snippets illustrate how to hack these files.

First import the required Python modules. lxml is not always included in Python distributions. Use the pip or conda tools to install this module.

# imports
import os
from lxml import etree

Set an output directory. I’m running on a Windows machine. If you’re on a Mac or Linux machine adjust the path.

# set sql output directory
sql_out = r"C:\temp\dtsxsql"
if not os.path.isdir(sql_out):
    os.makedirs(sql_out)

Point to the dtsx package you want to extract code from.

# dtsx files
dtsx_path = r'C:\Users\john\AnacondaProjects\testfolder\bixml'
ssis_dtsx = dtsx_path + r'\ParseXML.dtsx'
ssis_dtsx

Read and parse the SSIS package.

tree = etree.parse(ssis_dtsx)
root = tree.getroot()
root.tag
'{www.microsoft.com/SqlServer/Dts}Executable'

lxml renders XML namespace tags like <DTS:Executable as {www.microsoft.com/SqlServer/Dts}Executable. The following
shows all the transformed element tags in the dtsx package.

# collect unique element tags in dtsx
ele_set = set()
for ele in root.xpath(".//*"):
    ele_set.add(ele.tag)    
print(ele_set)
print(len(ele_set))
{'InnerObject', '{www.microsoft.com/SqlServer/Dts}PrecedenceConstraint', '{www.microsoft.com/SqlServer/Dts}ObjectData', '{www.microsoft.com/SqlServer/Dts}PackageParameter', '{www.microsoft.com/SqlServer/Dts}LogProvider', '{http://schemas.xmlsoap.org/soap/envelope/}Envelope', '{www.microsoft.com/SqlServer/Dts}Executable', 'ExpressionTask', '{http://schemas.xmlsoap.org/soap/envelope/}Body', '{www.microsoft.com/SqlServer/Dts}Variable', '{www.microsoft.com/SqlServer/Dts}ForEachVariableMapping', '{www.microsoft.com/SqlServer/Dts}PrecedenceConstraints', '{www.microsoft.com/SqlServer/Dts}SelectedLogProviders', 'ForEachFileEnumeratorProperties', '{www.microsoft.com/SqlServer/Dts}ForEachVariableMappings', '{www.microsoft.com/SqlServer/Dts}ForEachEnumerator', '{www.microsoft.com/SqlServer/Dts}SelectedLogProvider', '{www.microsoft.com/SqlServer/Dts}DesignTimeProperties', '{www.microsoft.com/SqlServer/Dts}LogProviders', '{www.microsoft.com/SqlServer/Dts}LoggingOptions', '{www.microsoft.com/SqlServer/Dts}Variables', 'FEFEProperty', 'FileSystemData', 'ProjectItem', '{www.microsoft.com/SqlServer/Dts}Property', '{http://www.w3.org/2001/XMLSchema}anyType', '{www.microsoft.com/SqlServer/Dts}Executables', '{www.microsoft.com/sqlserver/dts/tasks/sqltask}ParameterBinding', '{www.microsoft.com/sqlserver/dts/tasks/sqltask}ResultBinding', '{www.microsoft.com/SqlServer/Dts}VariableValue', 'FEEADO', 'BinaryItem', '{www.microsoft.com/SqlServer/Dts}PackageParameters', '{www.microsoft.com/SqlServer/Dts}PropertyExpression', '{www.microsoft.com/sqlserver/dts/tasks/sqltask}SqlTaskData', 'ScriptProject'}
36

Using transformed element tags of interest blast over the dtsx and suck out the bits of interest.

# extract sql code in source statements and write to *.sql files 
total_bytes = 0
package_name = root.attrib['{www.microsoft.com/SqlServer/Dts}ObjectName'].replace(" ","")
for cnt, ele in enumerate(root.xpath(".//*")):
    if ele.tag == "{www.microsoft.com/SqlServer/Dts}Executable":
        attr = ele.attrib
        for child0 in ele:
            if child0.tag == "{www.microsoft.com/SqlServer/Dts}ObjectData":
                for child1 in child0:
                    sql_comment = attr["{www.microsoft.com/SqlServer/Dts}ObjectName"].strip()
                    if child1.tag == "{www.microsoft.com/sqlserver/dts/tasks/sqltask}SqlTaskData":
                        dtsx_sql = child1.attrib["{www.microsoft.com/sqlserver/dts/tasks/sqltask}SqlStatementSource"]
                        dtsx_sql = "-- " + sql_comment + "\n" + dtsx_sql
                        sql_file = sql_out + "\\" + package_name + str(cnt) + ".sql"
                        total_bytes += len(dtsx_sql)
                        print((len(dtsx_sql), sql_comment, sql_file))
                        with open(sql_file, "w") as file:
                            file.write(dtsx_sql)
print(('total sql code bytes',total_bytes))
   (2817, 'Add Record to ZipProcessLog', 'C:\\temp\\dtsxsql\\2_ParseXML225.sql')
    (48, 'Dummy SQL - End Loop for ZipFileName', 'C:\\temp\\dtsxsql\\2_ParseXML268.sql')
    (1327, 'Add Record to XMLProcessLog', 'C:\\temp\\dtsxsql\\2_ParseXML293.sql')
    (546, 'Delete Prior Loads in XMLProcessLog Table for Looped XML', 'C:\\temp\\dtsxsql\\2_ParseXML304.sql')
    (759, 'Delete Prior Loads to Node Table for Looped XML', 'C:\\temp\\dtsxsql\\2_ParseXML312.sql')
    (48, 'Dummy SQL - End Loop for XMLFileName', 'C:\\temp\\dtsxsql\\2_ParseXML320.sql')
    (1862, 'Set Variable DeletePriorImportNodeFlag', 'C:\\temp\\dtsxsql\\2_ParseXML356.sql')
    (55, 'Shred XML to Node Table', 'C:\\temp\\dtsxsql\\2_ParseXML365.sql')
    (1011, 'Update LoadEndDatetime and XMLRecordCount in XMLProcessLog', 'C:\\temp\\dtsxsql\\2_ParseXML371.sql')
    (1060, 'Update LoadEndDatetime and XMLRecordCount in XMLProcessLog - Shred Failure', 'C:\\temp\\dtsxsql\\2_ParseXML382.sql')
    (675, 'Load object VariablesList (Nodes to process for each XML File Category)', 'C:\\temp\\dtsxsql\\2_ParseXML412.sql')
    (1175, 'Set Variable ZipProcessFlag - Has Zip Had A Prior Successful Run', 'C:\\temp\\dtsxsql\\2_ParseXML461.sql')
    (224, 'Set ZipProcessed Status (End Zip)', 'C:\\temp\\dtsxsql\\2_ParseXML474.sql')
    (238, 'Set ZipProcessed Status (Zip Already Processed)', 'C:\\temp\\dtsxsql\\2_ParseXML480.sql')
    (231, 'Set ZipProcessing Status (Zip Starting)', 'C:\\temp\\dtsxsql\\2_ParseXML486.sql')
    (609, 'Update LoadEndDatetime in ZipProcessLog', 'C:\\temp\\dtsxsql\\2_ParseXML506.sql')
    (613, 'Update ZipLog UnzipCompletedDateTime Equal to GETDATE', 'C:\\temp\\dtsxsql\\2_ParseXML514.sql')
    (1610, 'Update ZipProcessLog ExtractedFileCount', 'C:\\temp\\dtsxsql\\2_ParseXML522.sql')
    ('total sql code bytes', 14908)

The code snippets in this post are available in this Jupyter notebook: Extracting SQL code from SSIS dtsx packages with Python lxml. Download and tweak for your dtsx nightmare!



  1. I frequently run into SSIS packages that cannot be viewed on 4K monitors when fully zoomed out.
  2. Python’s readability is a major asset when disentangling mess-ware.
  3. Yes, I’ve railed about the word “refactoring” in the past but I’ve moved on and so should you. “A foolish consistency is the hobgoblin of little minds.”

Sudden Genocide

If genocide is sudden, painless, unexpected, complete and absolute, if a people simply vanish without screams, without fear, without anticipation, if one nanosecond they are and the next nanosecond they are not, and if somehow you are responsible, are you a war criminal or a savior? Ultimately we all vanish, usually with screams, usually with fear, usually with anticipation. For years I’ve longed for a sudden painless death: a dismembering that tears me apart faster than my nerves can relay pain. Imagine such an end for all of us – at once – now. Why linger? We will go extinct. There will be a last human. And if the last human dies with screams, with fear, with anticipation how is that better than sudden genocide?

Who Thought Blinking Windfarms was a Good Idea?

One night, a few weeks ago, I was driving west on I86 near American Falls when I spotted a long string of blinking red lights. The lights stretched over a large arc of the horizon. My first thought was “Jesus H. Christ now what?” As an amateur astronomer, I have climbed mountains to get away from light pollution. Now some jackwagon was ruining an entire rural horizon with a goddamn string of synchronized windfarm lights.

May I ask why?

Windfarms blink at night to warn planes they’re flying to low. Please! The towers are well under 400 meters. If you are flying a plane below 400 meters in mountainous locales like Idaho you have far more serious problems than running into wind turbines. This is another example of stupid regulation. There is absolutely no good reason for lighting up entire landscapes. It wrecks the view, distracts drivers, (cars on I86 were slowing down to get a better view), wastes energy, rapes the night sky, and reminds everyone what an environmental tax-subsidized eyesore windfarms are. Don’t even think of disagreeing. When was the last time you looked at a landscape littered with wind turbines and thought, “This is so much better than it was before.”

Yes, I know windfarms are saving us from global warming. If you believe that you are probably exactly the type of person that signed off on ruining an entire county’s nighttime view with goddamn blinking windfarms.

Trey and Kate: Review

This will be a completely biased review. I have a close relationship with the author so everything I say must be verified. Please buy Trey and Kate, read it, and make up your mind. With that caveat out of the way let’s get started.

Trey and Kate is a tale about an on and off again Millennial romance that plays out in Kingston Ontario. The two leads are not exactly star crossed lovers. They’re both partly broken and struggling with mental illness, past life hallucinations, deficient friends, and uncomprehending divorced families. Kate is bipolar and goes on and off her meds throughout. Trey is stuck in a dead-end barista job: remember Millennials. He’s mourning a deceased unloving father and has only two reliable relationships: his cat and his mother. Trey and Kate’s dreams hint at a shared past and promise a joint future but offer little practical guidance.

With destiny seemingly on their side, you would expect their romance to go smoothly; it does not. When Kate goes off her meds she’s impulsive and prone to risky behavior. The book’s best passages detail her sordid bouts of random sex with total strangers. It’s almost prostitution but Kate doesn’t have the business sense of a prostitute. Of course, this doesn’t help her relationship with Trey. To his credit or shame, he forgives her but we’re not sure if his forgiveness is self-pity. Trey’s self-esteem is so low he finds it almost comical than any woman could love him. Welcome to the club Trey. Trey and Kate’s interaction is both frustrating, satisfying, embarrassing, irritating and fulfilling.

Trey and Kate is the author’s first book. I know the author struggled to put the book together. Its best parts are purely descriptive and when the author shows us what the characters are seeing and feeling the prose tells. When the text ventures into rhetorical semi-poetic asides it hollows out. Trey and Kate feels like a screenplay disguised as a novel. This is partly due to the almost cinematic presence of the setting Kingston Ontario, a dull stone-filled town that would be unlivable without Lake Ontario, and Kingston’s wretched weather which is every bit as bad as it’s portrayed in Trey and Kate. I’d encourage the author to keep writing, rewriting and experimenting. There are good stories to tell here.

The Return of the Prodigal Blogger

It’s been ages since my last blog post. Yes, I’ve been a very, very bad blogger. Lesser men would throw themselves on the metaphorical feet of their readers and beg for forgiveness but if you’re expecting apologies you don’t know me! I write for myself; if you choose to read my ramblings, well that’s on you.

Since my last entry I have:

  1. Retired. I finally pulled the plug on being a so-called productive member of society. Now I’m an old Social Security draining parasite. Since I no longer pay net taxes I am effectively dead to the state and they would love it if I was actually dead. Dead people are easier to finance. Unfortunately, I’ve always been on to the deep state and my new mission in life is to claw back every single tax dollar I ever paid with butt loads of interest. When I snuff off this mortal coil I am going stick you with a giant uncollectable I OWE FUCKING YOU! If you create financially unsustainable systems that encourage abuse, well guess what, you’re going to get abused. God, I’m loving being a bitter old man; it’s what I was born to be.
  2. Continued to pursue my hobbies, especially photography. This year (2019) I set the mini-goals of shooting, on average, one picture per day and scanning at least three hundred prints and slides. This may not sound like a lot but it takes me time to select the best images, process RAW files, restore film scans, edit or hack pictures, write captions and compute keywords.  I treat every uploaded image as a milliblogging opportunity.  Some of my image captions are longer than some of my blog posts.
  3. Taken on some new family responsibilities and obligations.
  4. Taken some trips.
  5. Worked on various personal software projects.

Arthur C. Clarke once remarked that only unimaginative people get bored. I’ve always had something on my mind and I’ve always lived in my head. This has always been my problem and my strength. With retirement, I am casting off my shriveled shackles of pretense. I’m not even going to pretend to care about other people’s problems. I will think about what I find fascinating and do what I find worthwhile.

Call it retirement privilege snowflakes!

Now get back to work and pay your taxes you have old parasites to support.

iNap #1: Enough People are Scum

Intelligible systems are built on a few basic principles.1 While reducing my dour skepticism to the memorable maxims that codify Informed Naked Ape Protocol I repeatedly asked myself what animates unflagging skepticism? What turns naturally cheerful and optimistic people like myself into raging cynics? What motivates noble trolls to put down the pizza and grab the keyboard? Only one answer sprang to mind: other people.

Regardless of your politics, gender, education, race, age, nationality, or ethnicity, you have probably noticed there are a lot of scummy people out there.

Why is this?

The answer derives from our evolution. Evolutionary advantages often accrue to individuals that cheat. Cheating is a fundamental behavioral adaptation that has been observed in many animal, bird and plant species. Cheating is so common that cheating the system is the system! How cheaters might benefit was best illustrated in the hilarious Ricky Gervais movie The Invention of Lying. In Ricky’s world, everybody told the truth until one day he discovered that you could lie. The best scene in the film has Ricky running into a beautiful woman on the street. He tells her that the world is doomed unless she immediately agrees to have sex with him. Given the dire circumstances, she instantly agrees to save the world. Clearly, liars are going to enjoy immense reproductive success in a world of truth tellers. Similarly, scumbags will profit in a world of purely honorable people.

We don’t live in a world of purely honorable people or purely scummy people. Human scum density is complicated; it depends on more variables than the weather. Nevertheless, we can infer that human scum density is seldom zero and is often appreciable.

How big is appreciable?

My bitter sampling of humanity yields an estimate of approximately 0.05 for contemporary American society.2 It’s definitely bad news that 5% of the people around you cannot be trusted or depended on. It’s even worse news that human scum, like pond scum, often floats to the upper echelons of society. This is a nasty reality and I wish it wasn’t so but reality is often unpleasant and leaves few options: either adapt or be crushed.

The first step in adapting to scummy naked apes is acknowledging the most fundamental fact about them — enough people are scum!


  1. The human preference for systems with a few axioms is an artifact of our primitive intellects. There are few if any human beings that would be comfortable with axiomatic theories that depend on trillions of independent axioms yet we know that such systems exist and remain incomplete. Our drive to reduce things to a manageable set of rules, even when we know it is naïve and futile, amounts to little more than thumb-sucking. It makes our baby brains happy even if it will not solve our problems.
  2. Yes, human scum density varies with culture. Some societies are briefly more virtuous than others.

Informed Naked Ape Protocol

Many think we are living in a golden age of bullshit. That public discourse has never been gaudier or more demeaning. That respect for truth and decency has reached all-time lows. The mental pygmies that hold these opinions don’t read or think for themselves. Deceiving ourselves and others is the one thing our species excels at. Lies are the bedrock of art, economics, politics, and religion. Only two tiny slivers of human thought have ever breached the bullshit barrier and approached something that might be credibly labeled truth: hard science, and harder mathematics.

If you think I am going to praise brave men, (and bitches), of science for lighting a candle in the perpetual darkness, (Carl Sagan already wrote an entire fawning book on this self-aggrandizing theme), think again. Scientists are just as flawed and full of crap as the rest of us. Science occasionally succeeds because it has evolved protocols that correct for human bullshit. A good protocol protects us against our worst enemy – ourselves.

My Informed Naked Ape Protocol, or iNap for short, consists of eleven1 pithy maxims that force a hard ass skeptical view of things. I will manifest my maxims here and elaborate on each one in following posts.

Informed Naked Ape Protocol

  1. Enough people are scum.
  2. Trust is for imbeciles.
  3. “Belief” is a bullshit word.
  4. Assume corruption.
  5. Analyze the data, not the drivel.
  6. Demand full analytic disclosure.
  7. Practice relentless verification.
  8. Centralized systems are always corrupted.
  9. If you don’t control it you cannot trust it.
  10. Only scientific and mathematical arguments are admissible.
  11. Correct errors.

  1. Why eleven? The last time somebody tried to get the inhabitants of planet moron to follow ten simple rules it didn’t work out.