PIVOT in PowerShell

Words: 1151

Time to read: ~ 6 minutes

Apologies

I’m going to start this post off with an apology.

As Kevin Feasel ( Blog | Twitter ) mentioned about my last post Attempting SUM() OVER () in PowerShell:

It’d be a lot easier, though, with a properly normalized data model which includes date, attempt number, and push-ups in that attempt. Pivot those results at the end if you want this sort of report, but SQL is designed to work best with tables in first normal form or higher.

Kevin Feasel

I can’t very well give out to people for not doing the right thing first time, even if it’s more difficult, if I don’t do the right thing myself!

As Kevin mentioned, once the data was in a proper format, a format designed for SQL, the calculations were trivial.

However, outputting the results in the same way in PowerShell required a way to pivot results in PowerShell. Thanks to some heavy lifting from Joel Sallow ( Blog | Twitter ), I now know how to pivot in PowerShell!

Here’s hoping that this post will help explain it for you also.

Exploring our Data

SQL

First off, let’s check the current state of our table in SQL.

SELECT	POP.pushup_date,
		POP.attempt_number,
		POP.pushup_count,
		SUM(POP.pushup_count) OVER (PARTITION BY POP.pushup_date ORDER BY POP.pushup_date) AS total_per_date,
		SUM(POP.pushup_count) OVER () AS grand_total
FROM	dbo.PushupsOctoberProper AS POP;
SQL style!

Pivoting

I want to get all possible 8 attempts horizontal like the last post. I find this fairly easy when I have the documentation for PIVOTs open in another tab.

/* Can we pivot these? */
SELECT	PVT_01.pushup_date,
		[1] AS attempt_1,
		[2] AS attempt_2,
		[3] AS attempt_3,
		[4] AS attempt_4,
		[5] AS attempt_5,
		[6] AS attempt_6,
		[7] AS attempt_7,
		[8] AS attempt_8,
		PVT_01.total,
		PVT_01.total_so_far
FROM
(
	SELECT	POP.pushup_date,
			POP.attempt_number,
			POP.pushup_count,
			SUM(POP.pushup_count) OVER (PARTITION BY POP.pushup_date ORDER BY POP.pushup_date) AS total,
			SUM(POP.pushup_count) OVER () AS total_so_far
	FROM	dbo.PushupsOctoberProper AS POP
) AS SRC
PIVOT
(
	MAX(pushup_count) FOR attempt_number IN ([1], [2], [3], [4], [5], [6], [7], [8])
) AS PVT_01
ORDER BY	PVT_01.pushup_date;
Simples!

Simple, right? Once we have the data in the expected format then the above steps are the only steps necessary to calculate and show the data in the way that we want.

However, it becomes a bit more complicated in PowerShell.

PowerShell

Let’s grab the data from our SQL instance and take a look at it.

<# Populate our variable from the database #>
$invQueryParams = @{
    SqlInstance = $sqlInstance
    Database = 'LocalTesting'
    Query = 'SELECT * FROM dbo.PushupsOctoberProper;'
}
$data = Invoke-DbaQuery @invQueryParams

<# Show our data #>
$data | Format-Table -Autosize
So far, so good…

Grouping our Data

We have our data fetched, now we need to group it by the different dates. If only PowerShell had a way to group objects…what? Group-Object? oh!

<# Grouping our data #>
$dataGroups = $data | Group-Object -Property pushup_date
$dataGroups
Data.DataRow? * sigh* one of these days I’ll remember to use -AS PSObject with my Invoke-DbaQuery

Now that we have our data grouped by the different dates, we can loop through each date and pivot the data out horizontally.

Manual Pivot

The first way that came to mind was to manually list out all columns. I know that the maximum attempt_count that I have is 8 so let’s manually create 8 attempt columns.

<# Let's pivot this manually because it's the first way that came to mind #>
$ManualpivotedData = foreach ($dg in $dataGroups) {

    [PSCustomObject]@{
        pushup_date = ($dg.Group | Select-Object -ExpandProperty pushup_date -Unique).ToShortDateString()
        attempt01 = ($dg.Group | Where-Object attempt_number -eq 1).pushup_count
        attempt02 = ($dg.Group | Where-Object attempt_number -eq 2).pushup_count
        attempt03   = ($dg.Group | Where-Object attempt_number -eq 3).pushup_count
        attempt04   = ($dg.Group | Where-Object attempt_number -eq 4).pushup_count
        attempt05   = ($dg.Group | Where-Object attempt_number -eq 5).pushup_count
        attempt06   = ($dg.Group | Where-Object attempt_number -eq 6).pushup_count
        attempt07   = ($dg.Group | Where-Object attempt_number -eq 7).pushup_count
        attempt08   = ($dg.Group | Where-Object attempt_number -eq 8).pushup_count
        total = ($dg.Group | Measure-Object -Property pushup_count -Sum).Sum
        total_so_far = ($data | Measure-Object -Property pushup_count -Sum).Sum
    }
}

<# Let's make this pretty #>
$ManualpivotedData | Format-Table -Property pushup_date, @{ Expression = 'attempt*'; Width = 10 }, total, total_so_far
Seems to work

In case you’re wondering what @{ Expression = 'attempt*' ; Width = 10 } does, I use it to narrow the width of the columns named like attempt since they’re integers. Since they don’t need as much space, I can narrow them down and then Format-Table won’t cut-off my later columns!

Dynamic Pivot

I’m not against the manual way. I just find it too bulky and repetitve. It works! Please don’t get me wrong on that accout but as I recently heard someone say: “It works, now clean it up

Our main problem is the attempt columns and our manually typing them out. They seem like a perfect candidate for a ForEach loop. But, when we try to slot that in….

foreach ($dg in $dataGroups) {

    $props = @(
        @{ Name = 'pushup_date' ; Expression = { ($dg.Group | Select-Object -ExpandProperty pushup_date -Unique).ToShortDateString() }}
        foreach ($num in 1..8) {
            @{ 
                Name = "attempt_$num" 
                Expression = { $dg.Group |
                    Where-Object attempt_number -eq $num |
                    Select-Object -ExpandProperty pushup_count } 
            }
        }
        @{ Name = 'total' ; Expression = { ($dg.Group | Measure-Object -Property pushup_count -Sum).Sum } }
        @{ Name = 'total_so_far' ; Expression = { ($data | Measure-Object -Property pushup_count -Sum).Sum }}
    )

    $dg | Select-Object $props
}
Well that shouldn’t be empty!

Yeah, that’s a “no” from PowerShell. Why is this?

Why this is

Let’s investigate that $props variable. We’re creating a hashtable where the Key is our name and the Value is the expression we want. So let’s get the values.

$props | Format-List
Expression = $num

Do you see the way that each of the Expression keys have a value with the $num variable?

If you check $num now, you’ll see that it’s set to 8. It looks like we have found our problem, the $props variable isn’t keeping the value of $num when we define it!

Since only one date has a value for attempt 8, we should see some values there.

$pivotedData = foreach ($dg in $dataGroups) {

    $props = @(
        @{ Name = 'pushup_date' ; Expression = { ($dg.Group | Select-Object -ExpandProperty pushup_date -Unique).ToShortDateString() }}
        foreach ($num in 1..8) {
            @{ 
                Name = "attempt_$num" 
                Expression = { $dg.Group |
                    Where-Object attempt_number -eq $num |
                    Select-Object -ExpandProperty pushup_count } 
            }
        }
        @{ Name = 'total' ; Expression = { ($dg.Group | Measure-Object -Property pushup_count -Sum).Sum } }
        @{ Name = 'total_so_far' ; Expression = { ($data | Measure-Object -Property pushup_count -Sum).Sum }}
    )

    $dg | Select-Object $props
}

<# Let's check the 14th #>
$pivotedData | Where-Object pushup_date -eq '14/10/2020'
All filled but all with value for the 8th attempt!

Yeah…that’s not correct. I did 30 on the first attempt. Believe me, I remember the pain. Looks like it’s putting the value for attempt 8 into each of the attempts.

Not cool…

Closures

If only there was a way to keep the value of $num when we defined the $props variable. Well, thanks to Joel and his post ScriptBlocks and GetNewClosure(), I now know that there is!

$pivotedData = foreach ($dg in $dataGroups) {

    $props = @(
        @{ Name = 'pushup_date' ; Expression = { ($dg.Group | Select-Object -ExpandProperty pushup_date -Unique).ToShortDateString() }}
        foreach ($num in 1..8) {
            @{ 
                Name = "attempt_$num" 
                Expression = { $dg.Group |
                    Where-Object attempt_number -eq $num |
                    Select-Object -ExpandProperty pushup_count }.GetNewClosure()
            }
        }
        @{ Name = 'total' ; Expression = { ($dg.Group | Measure-Object -Property pushup_count -Sum).Sum } }
        @{ Name = 'total_so_far' ; Expression = { ($data | Measure-Object -Property pushup_count -Sum).Sum }}
    )

    $dg | Select-Object $props
}

<# Let's make this pretty #>
$pivotedData | Format-Table -Property pushup_date, @{ Expression = 'attempt*'; Width = 10 }, total, total_so_far
It’s alive!!!!!

Summary

There’s nothing wrong with making mistakes; as long as you learn from them.

Thanks to Kevin for reminding me how things should be stored, and thanks to Joel for this (vast) knowledge sharing, I’ve been able to learn how to dynamically pivot in PowerShell from my mistakes.

Review your mistakes, you never know what you may learn.

Table Column Differences with T-SQL and PowerShell

Where I start off with one idea, than veer widely towards another…much like me normally

As part of my post for the latest T-SQL Tuesday, I talked about community scripts. One of these scripts was by Jana Sattainathan ( blog | twitter ) and it was to do with finding tables with similar table structure. This is a great script for initially finding the tables and something that I am not sure that I would have thought of!

However, do me a favour and in your favourite search engine, do a search for “stackoverflow sql server wide tables” (Google search included for your benefit).

For some reason, people have a slight fascination with wwwwwiiiiiiiiddddeeeee tables!

So thanks to the excellent work done by Jana, you have now identified the tables with similar structure, but what about if we want to know which column names match exactly?

I recently had this problem with consolidating copies of tables of medium to wide length in a database and needed to know. So I created a little snippet of code to help me out and I present it to you, in all of it’s unashamed vulnerability, in the hope that it can help you out.

Be warned: Thar be PIVOTs ahead!

Ahoy, matey!

So we’ve got two tables that are slightly different, not completely different (hey, they could be) but different enough to be similar without matching… (differently similar?)

Our task is to find out which columns are in both tables, which columns are only in 1 table, and which columns are only in the other.

Now this isn’t so bad manually, especially if you only need to do this once, maybe twice. What happens though if you need to do it with multiple pairs of tables? Or multiple pairs of wide tables like our search engines showed us above?

So let us do what all DBAs should do when they have a repetitive, manual task stopping them from doing more fun important things: Automate it away!

Avast Ye!

Our two tables are as follows:

CREATE TABLE dbo.DifferenceTable01
(
    col1 int,
    col2 int,
    col4 int,
    col6 int
);
GO

CREATE TABLE dbo.DifferenceTable02
(
    col2 int,
    col3 int,
    col4 int,
    col5 int
);
GO

Now we can use the sys.columns table to check out the different columns in the table but the results are, well, meh

SELECT 
    OBJECT_NAME([object_id]) AS TableName,
    [name] AS ColumnName
FROM sys.columns 
WHERE [object_id] IN
(
    OBJECT_ID(N'dbo.DifferenceTable01', N'U'),
    OBJECT_ID(N'dbo.DifferenceTable02', N'U')
);
GO

sys_columns_results.PNG
Even if I ordered it, it would still be “meh”…

That is not really going to work out for us…
So I’m not liking the look of this, and going through the results, it seems to me that these results are just not useful. This isn’t the computers fault – it’s done exactly what I’ve told it to do – but a more useful result would be a list of columns and then either a simple ‘Yes’, or a ‘No’.

There’s syntax for this…PIVOT

Thar She Blows!

As anyone who has seen me dance can attest to, I can neither shake, rattle, nor roll. And I definitely do not normally PIVOT. However, as I’m trying to know my tools, I do know that this is the syntax that I need.

PIVOT rotates a table-valued expression by turning the unique values from one column in the expression into multiple columns in the output, and performs aggregations where they are required on any remaining column values that are wanted in the final output.

So after looking up the syntax for this once (ok, 5 times!) I managed to come out with a script that I’m reasonably happy with.


SELECT Pivot1.ColumnName,
Pivot1.[dbo.DifferenceTable01],
Pivot1.[dbo.DifferenceTable02],
CASE WHEN [dbo.DifferenceTable01] = 1 AND [dbo.DifferenceTable02] = 1
THEN 'Both'
WHEN [dbo.DifferenceTable01] = 1 AND [dbo.DifferenceTable02] IS NULL
THEN 'Table 1 only'
WHEN [dbo.DifferenceTable01] IS NULL AND [dbo.DifferenceTable02] = 1
THEN 'Table 2 only'
ELSE 'Eh…this should not really happen'
END AS HumanReadableFormat
FROM ( SELECT
c.[name] AS ColumnName,
tb.TableName,
1 AS ColumnExists
FROM sys.columns AS c
RIGHT JOIN ( VALUES
(OBJECT_ID(N'dbo.DifferenceTable01', N'U'), 'dbo.DifferenceTable01'),
(OBJECT_ID(N'dbo.DifferenceTable02', N'U'), 'dbo.DifferenceTable02')
) AS tb (ObjectID, TableName)
ON c.object_id = tb.ObjectID
) AS UnPivotedColumns
PIVOT (
MAX(ColumnExists) FOR TableName IN ([dbo.DifferenceTable01], [dbo.DifferenceTable02])
) AS Pivot1
ORDER BY Pivot1.ColumnName ASC;
GO

And the results are a lot easier to read 🙂

human_readable_format
Can also be extended to more than 2 tables!

So much better! This way, no matter how long the tables, I can easily figure out what columns are in what table(s) based on their names.

Shiver Me Timbers!

Isn’t it always the way after you’ve done something, you realise a much easier way to do it?

The Old Seadog!

I’ve talked about automation in this post and I have yet to mention PowerShell. I bow my head in shame.

Especially when it has a native command like Compare-Object.

help Compare-Object -ShowWindow

Synopsis
Compares two sets of objects.

Description
The Compare-Object cmdlet compares two sets of objects. One set of objects is the “reference set,” and the other set is the “difference set.”

The result of the comparison indicates whether a property value appeared only in the object from the reference set (indicated by the <= symbol), only in the object from the difference set (indicated by the => symbol) or, if the IncludeEqual parameter is specified, in both objects (indicated by the == symbol).

If the reference set or the difference set is null ($null), this cmdlet generates a terminating error.

So the question we have to ask ourselves now is “Can we do what we did with the PIVOTs easier?”

The Old Salt.

I’ve recently found out about splatting so, of course, I’ve started to use it EVERYWHERE!

Let’s “splat” our two parameters

$Table01Param = @{
ServerInstance = 'localhost\SQLDEV2K14'
Database = 'master'
Query = &quot;SELECT OBJECT_NAME([object_id]) AS TableName, [name] AS ColumnName FROM sys.columns WHERE [object_id] = OBJECT_ID(N'dbo.DifferenceTable01', N'U');&quot;
}

$Table02Param = @{
ServerInstance = 'localhost\SQLDEV2K14'
Database = 'master'
Query = &quot;SELECT OBJECT_NAME([object_id]) AS TableName, [name] AS ColumnName FROM sys.columns WHERE [object_id] = OBJECT_ID(N'dbo.DifferenceTable02', N'U');&quot;
}

And we now save ourselves the trouble of writing the parameters to the functions.

Invoke-Sqlcmd @Table01Param
Invoke-Sqlcmd @Table02Param

splat-attack
SPLAT-ATTACK!!

Since everything is now set up, we can just pass those results into 2 different variable holders and use our Compare-Object.

$Table01 = Invoke-Sqlcmd @Table01Param
$Table02 = Invoke-Sqlcmd @Table02Param

Compare-Object -ReferenceObject $Table01 -DifferenceObject $Table02 -Property ColumnName -IncludeEqual

Non-splat-attack
It’s annoyingly wide without splatting…

And for anyone saying “Yeah, but who knows what ‘SideIndicator’ means!”,  I advise you to read the help with PowerShell. It helps a-lot!

The result of the comparison indicates whether a property value appeared only in the object from the reference set (indicated by the <= symbol), only in the object from the difference set (indicated by the => symbol) or, if the IncludeEqual parameter is specified, in both objects (indicated by the == symbol).

If you are still complaining about it – “You can’t have the HumanReadableFormat column like you did in T-SQL” – then please stop. There are experts out there who can make you go “wow” with what they do. I’m still learning but even I have an answer to that.

Compare-Object -ReferenceObject $Table01 -DifferenceObject $Table02 -Property ColumnName -IncludeEqual |
Select-Object -Property *,@{N='HRF';e={switch ($_.SideIndicator)
{'==' {'Both'}
'=&gt;' {'Table 2 only'}
'&lt;=' {'Table 1 only'}
}
}}

hrf
Go on! Compare this with T-SQL

Land-ho!

I’m not trying to argue with who would win between T-SQL and PowerShell, that was definitely not my intention with this post. T-SQL is my first language love, PowerShell is very quickly becoming my second (and that’s not just because it’s the only other langauge I know).

They both accompolish the same thing!

It’s just a matter of preference, that’s all, not a matter of competition. You want to work in SSMS, PIVOT it up! You like PowerShell, hammer that Compare-Object nail!
Whatever works for you.

My first idea for this was T-SQL but it turns out for me that PowerShell is the easier option. Don’t rule out one, just because the other was the first thing to pop into your head.

Now I’m going to go before I make a joke about Pirates, SQL and the R language…