I’ve been blogging long enough to observe that your awesome posts are often ignored while your little “one-offs” sometimes strike hit gold. This is particularly true for “code example” posts. When I’m trolling for code — the geek equivalent of trolling for babes — I don’t want to read about the author’s programming philosophy or what the hell he thinks about the idiot in the White House. Just show me the code and shut up!
So, taking my own advice, the following is an example of a SQLSever Common Table Expression (CTE) query.
WITH cte1 AS (SELECT c.obj_id AS obj_id, a.created AS creation_date, Datediff(DAY, a.created, Getdate()) AS day_cnt FROM [HIST].[dbo].[History] a JOIN [HIST].[dbo].[HistorySummary] b ON a.rowid = b.rowid JOIN [HIST].[dbo].[ObjectNames] c ON c.obj_name = b.obj_name), cte2 AS (SELECT cte1.obj_id AS obj_id, CASE WHEN cte1.day_cnt <= 30 THEN 1 ELSE 0 END AS d0, CASE WHEN ( cte1.day_cnt > 30 ) AND ( cte1.day_cnt <= 60 ) THEN 1 ELSE 0 END AS d1, CASE WHEN ( cte1.day_cnt > 60 ) AND ( cte1.day_cnt <= 90 ) THEN 1 ELSE 0 END AS d2, CASE WHEN ( cte1.day_cnt > 90 ) THEN 1 ELSE 0 END AS d3 FROM cte1) SELECT obj_id, SUM(d0) AS OneMonth, SUM(d1) AS TwoMonths, SUM(d2) AS ThreeMonths, SUM(d3) AS Overdue FROM cte2 GROUP BY obj_id
CTE queries essentially create temporary virtual tables during query execution. They are similar to nested SQL queries but are easier to code, more general, (see recursive CTE queries), and often perform better than their convoluted equivalents. This example creates two virtual tables cte1 and cte2 that are then used to compute a quick histogram.
You can call CTE queries with the J ODBC interface. The following assumes a SQLServer ODBC connection dsn history.
NB. odbc interface require 'dd' NB. read CTE query HistoryAgeSQL=. read 'c:/temp/HistoryAge.sql' NB. connect sqlserver database ch =. ddcon 'dsn=history' NB. select with CTE query sh =. HistoryAgeSQL ddsel ch NB. fetch results data=. ddfet sh,_1
As a final note it’s worth comparing the SQL CTE histogram code with J equivalents. The two following J verbs taken from the J wiki, (here and here), compute histograms.
NB. computes histograms uses right open intervals histogram=:<:@(#/.~)@(i.@#@[ , I.) NB. variation on (histogram) uses left open intervals histogram2=:<:@(#/.~)@(i.@>:@#@[ , |.@[ (#@[ - I.) ])
They scale to tens of millions of data points; returning results in a few seconds on my laptop. The SQL CTE, shown above, takes about three seconds running on 180,000 row tables on my employer’s full warp servers. If I could convince the masses to adopt languages like J a large part of my job would disappear. Fortunately, the world is hostile to terse elegance!
You may be right about the timings if you consider the final
SELECT
the equivalent of the J expressions. Will you be at the J conference this summer?Given that I am currently in the Maritimes, not London, it will be more difficult getting to Toronto, but who knows?
having the WITH _ AS _ can prevent deep nesting of queries, actually a prime use for the ‘variable’ concept, although the setting of a variable that is only used once doesn’t really add much of a time savings.
I suspect measurements of the J verbs histogram and histogram2 do not include the creation of the arguments to each, something the SQL seems to include.
A =: createHistogramArg ” NB. perhaps this takes a lot of time
]histogram A NB. ‘a few seconds’ … maybe so