Table Column Differences with T-SQL and PowerShell – Part 2

If this was a horror movie, it would be called “The Differencing”…duh duh duh!

Time to read: ~ 5 minutes

Words: 1057

Update 2021-07-14: Marked code blocks as preformatted

The original post for this topic garnered the attention of a commenter who pointed out that the same result could be gathered using a couple of UNION ALLs and those lovely set-based EXCEPT and INTERSECT keywords.

I personally think that both options work and whatever you feel comfortable with, use that.

It did play on my mind though of what the performance differences would be…what would the difference in STATISTICS IO, TIME be? What would the difference in Execution Plans be? Would there even be any difference between the two or are they the same thing? How come it’s always the things I tell myself not to forget that I end up forgetting?

I have no idea about the last one but at least the other things we can check. I did mention to the commentor that I would find this an interesting blog topic if they wanted to give it a go and get back to me. All I can say is – Sorry, your mail must have got lost in transit. I’m sure it is a better blog post that mine anyway.

If you’re going to do it…

For this test, we’re not going to stop at a measely 4 columns per table. Oh no! For this one we’re going to go as wide as we can.

With a recent post by Kenneth Fisher ( blog | twitter ) out about T-SQL FizzBuzz, I’m going to create two tables, both of which will have incrementing column names i.e. col00001, col00002, …, col1024. Table1 will have all columns divisible by 3 removed while Table2 will have all columns divisible by 5 removed.

See, FizzBuzz can be useful!

So our table creation scripts…

SELECT TOP (1024)
CASE WHEN v.number = 0
-- Change this to 02 the second run through
THEN N'CREATE TABLE dbo.TableColumnDifference01 ('
ELSE N' col' + RIGHT(REPLICATE('0', 8) + CAST(v.number AS nvarchar(5)), 4) + N' int,'
END
FROM master.dbo.spt_values AS v
WHERE v.type = N'P'
AND (
-- Change this to '% 5' the second run through
v.number % 3 != 0
OR v.number = 0
)
FOR XML PATH('')

TableCreationScript
See Note

NOTE: When you copy and paste the results of this query into a new window to open it, it is going to fail. Why? Well the end of the script is going to be along the lines of colN int, and it needs to be colN int). Why is it like this? Well it was taking to damn long to script that out. Feel free to change this to work for you. Hey if you do, let me know!

Now, how I’m going to do test this, is run each method 3 times (PIVOT, UNION, and PowerShell), then measure the third run of each method. This is mainly as I want to get rid of any “cold cache” issues with SQL Server where the plan has to be compiled or the data brought into memory.

…do it Pivot

So first up is the Pivot method from the last blog post. In case you’re playing along at home (and go on, do! Why should kids get all the fun) here is the code that I’m running.

And here is our results:

PivotMethodGridResults
Yup, those be columns

What we are really after though is the stats, execution plan and time to complete for our 3rd execution. Now as much as I love reading the messages tab for the stats information, I feel with blog posts that aesthetics is king, so I’m going to be using the free tool by Richie Rump ( twitter ) “Statistics Parser

Stats:

PivotMethodGridStats
Elapsed time: 00:00:00.136

Execution Plan:

PivotMethodGridPlan
Probably the first plan I’ve seen where the SORT isn’t the most expensive!

..do it UNION

Secondly we have what I dubbed “the UNION method” (no points for figuring out why) and the only change I’ve made to this script is to add in PARSENAME() and that’s only so that the script would..you know…work.

Results be like:

UnionMethodGridResults
Yep, Yep, Yep, Yep, Nope, Yep…

Stats:

UnionMethodGridStats
Elapsed time: 00:00:00.624

hmm…less Scan Counts but 5 times the reads…also 5 times slower than the PIVOT method. Maybe the execution plan will be prettier?

Execution Plan:

UnionMethodGridPlan.png
ehh…WHAT!

Yeah…so…that’s…that’s different from the first plan! I was right in my comment though, there is a concatenation operator (there’s actually 2, you may need to zoom in to find them though)

…do it PowerShell

Finally we have the PowerShell method. No messing about here, let’s get straight to it! I’m going to lump all the code together in one gist and I’ll be wrapping it in Measure-Command to get the speed of the command.

Get-Results

PoSHMethodGridResults
Yeah I’m liking VS Code more and more…

Get-Stats:

PoSHMethodGridStats.png
Elapsed time: 00:00:00.249

help *execution*; help *plan*

Would you believe that I couldn’t figure out how to get an execution plan for PowerShell 🙂

If anybody knows, hit me up!

Finishing off

You know at the start of this, I was fully expecting the PowerShell to win out, followed by the UNION method, because it’s use of UNION, EXCEPT, and INTERSECT which are basically made for this kind of problem, and the PIVOT method bringing up a distant last since PIVOTs have this complexity stigma attached to them and what is complex is normally slow.

From a sheer speed point of view, the actual results are:

  1. Pivot
  2. PowerShell
  3. Union

Who knew!?

I don’t think this is the end of my use of PowerShell or Union operators though. I’m not going to replace all the stuff that I can with Pivots. For one I just think that PowerShell and the Union operators are just too cool!

I actually like this result for two reasons.

  1. There are multiple way to do something in SQL, there are good ways and better ways. The main point is whatever option you choose, make sure you know what it entails and can justify it.
    Whatever works for you, works for you!
  2. You don’t know something, test it and find out! What you think the outcome may be, may not be true.

Now if you’ll excuse me, I want to figure out if there’s a way to return execution plans with PowerShell.

My Function Won’t Accept Parameters? Get-Help!

Getting Get-Help Help

The following is a recounting of an issue that I had and how I went about resolving it. No computers were harmed in the making of this post.


Ask me for one PowerShell command that everyone should know and I can answer you: Get-Help.

Fairly descriptive name if you ask me. Today I’m focusing on using Get-Help selectively to help me figure out why my custom function just won’t accept parameters!

You say Test Case. I say Basket Case.

We are going to need a custom test function for the audience to play along with at home, luckily Shane’s got you covered.
This is a Tactical Estimation of Shane’s Test function – aka T.E.S.T. function; very simple but all the important parts are there.

Function Test-FunctionByParameter {
    [cmdletbinding()]
    Param(
        [Parameter(Mandatory = $true,
                   ValueFromPipelineByPropertyName = $true)]
        [string]$Parameter
    )
    process {
        "Success, I'm [$Parameter]"
    }
}

If I’ve done my maths right, and I always do my maths right (as far as you know), then this function should take input from the pipeline and output it in the string “Success, I’m …”

Do I do my maths right?

Get-Service -Name *sql* |
Select Name -first 1 |
Test-FunctionByParameter

FirstFailure
It’s the “carrying the 1” that always gets me!

Huh, parameter problem.

I thought this was supposed to work like this. You pipe in information, magic happens, and functions work, no?

Well, when in doubt, Get-Help.

Get-Help:

Before I go any further though, just so that everyone knows how to use Get-Help, I’m going to show you one of the secret techniques for using Get-Help.

Get-Help *help*

GetHelp_Help

Yup, I use dbatools

Why does help exist?

When you think about it, why is there even a function called help?
As far as I’m aware it’s basically the same as Get-Help except it automatically pipes the output to | more so we get pages rather than a wall of text.

Is there more that we can do with Get-Help though? Is there a way that we can return the examples only? Syntax only? Parameters only?

Is there not a way that we can do such things?!

Yessum, the Possums cousin

Okay I cheated on the first one; examples are pretty easy. PowerShell actually already takes care of that for you.

Get-Help Get-Help -examples

GetHelp_GetHelp_Examples
Help me if you can I’m feeling examples…I mean down!

The other two, while not laid out for you as pretty as that, are not that difficult to do. What needs to be remembered about Get-Help is that it is a cmdlet. And what do cmdlets normally output?…

What?! No! Objects!
They normally output Objects! Wow…next time just pipe it to Get-Member if you don’t know.

I Object!

Now I first saw this done in a blog post by Adam Bertram ( blog | twitter ) but I do believe that it warrants further highlighting.

If you did pipe Get-Help to | Get-Member you would have seen a NoteProperty called syntax, so if we want the syntax for a cmdlet, we can specify that using:

(Get-Help Get-Help).syntax

GetHelp_GetHelp_Syntax.PNG
Syntax, useful for all languages

So for parameters we need…yup .parameters.

(Get-Help Get-Help).parameters

GetHelp_GetHelp_Parameters
Parameters…languages use them as well I guess

Hmm, not as handy as I thought it would be. What happens if we pipe that to Get-Member (Alias gm as I’m getting lazy here)?

(Get-Help Get-Help).parameters | gm

GetHelp_GetHelp_Parameters_GM.PNG
Well lookie here, another NoteProperty!

Let’s try that and see what we get, shall we?

(Get-Help Get-Help).parameters.parameter

GetHelp_GetHelp_Parameters
…the exact same >:( Fine, have the same screenshot then!

It’s always brightest before the dawn

And the answer comes always before you smash your screen in rage.

If we pipe the above information to Get-Member again, we get more useful information this time (I’m not going to show it, you know how to pipe to gm by now).

GetHelp_GetHelp_Parameters_GM_Useful
This looks like something we can work with 🙂

I’m from a database background so can we make this pretty, all I care about is the name and the pipeline input.

(Get-Help Get-Help).parameters.parameter |
    Select-Object -Property name,pipelineinput

GetHelp_NamePipelineInput.PNG
ByPropertyName…what’s that?

By Odin’s Beard! I mean PropertyName

You know one of these days I should really read this help file (you should too) because half way down the results of the following code is some interesting info…

help about_pipelines

METHODS OF ACCEPTING PIPELINE INPUT

Cmdlets parameters can accept pipeline input in one of two different ways:

— ByValue: Parameters that accept input “by value” can accept piped objects
that have the same .NET type as their parameter value or objects that can be
converted to that type.

For example, the Name parameter of Start-Service accepts pipeline input
by value. It can accept string objects or objects that can be converted to
strings.

— ByPropertyName: Parameters that accept input “by property name” can accept piped
objects only when a property of the object has the same name as the parameter.

For example, the Name parameter of Start-Service can accept objects that have
a Name property.

(To list the properties of an object, pipe it to Get-Member.)

Some parameters can accept objects by value or by property name. These parameters are
designed to take input from the pipeline easily.

So that’s the problem?! The names need to match up! I can do that with Select-Object!

All I need to do is add a custom label using @{Label='<custom label>';Expression={'<custom expression>'}}

Try{}Catch{}Finally{}

Get-Service -Name *sql* |
Select-Object -First 1 -Property @{l='Parameter';e={$_.Name}} |
Test-FunctionByParameter

ParameterName_TestFunction
I always do my maths right!

So now when I run a command and get the crazy…

The input object cannot be bound to any parameters for the command either because the command does not take pipeline input or the input and its properties do not match any of the parameters that take pipeline input.

I can just run:

(Get-Help &lt;cmdlet name&gt;).parameters.parameter |
Select-Object Name,pipelineInput

And know exactly where to fix! 🙂

Understanding “Scan Counts 0, Logical Reads N”

So yesterday, the 30th of November, was Arun Sirpal’s (b | t) birthday and as a birthday present, he challenged me to write a blog post.

Now I’ll admit that I didn’t have any plan to write a blog post before I got the challenge as I’ve been steadily getting busier with work/study/other commitments, so apologies if it’s a bit long, rambling, and not thought through fully.

Anyway, happy birthday Arun, here’s your blog post:

Understanding “Scan Counts 0, Logical Reads N”

Have you ever run SET STATISTICS IO ON; ?

I’ll confess that I do it a lot, especially when I am performance tuning. Yet, like most things in SQL Server, I don’t fully understand it…yet!

Now don’t get me wrong, the little that I understand is extremely helpful, but recently I had a case where I didn’t understand the output of STATISTICS IO , and asking my Senior DBA got me the look from him that I come to think of as ‘The You_Are_Either_Joking_Or_You_Are_Being_Stupid_Again Look’.

So to document my stupidity, here’s the question for that look.

How come the Logical Reads from STATISTICS IO are so high when it says Scan count is 0?

tl;dr – They are related but not exactly a 1:1 relationship. Plus a scan count of 0 does not mean that the object wasn’t used at all.

Logical Reads:

Let us just get this definition out of the way as it’s very short, sweet, and to the point. Logical Reads are the…

Number of pages read from the data cache.

Right, great, got’cha. Logical reads, 8kb pages, read from the data cache. If your STATISTICS IO reports logical reads 112 then you’ve read 112 pages from the cache. Simples!

Scan Count:

This is the blighter that got me the look…well more like ‘my misunderstanding of what Scan Count means’ got me the look but it still holds my contempt at this moment in time.

My previous intuitions about this guy was…

“Scan count is the number of hits the table/index received”
(THIS IS NOT CORRECT! …and why it isn’t down as a full quote).

Let’s check out the definition again and see what it has to say for itself.
Scan count is the…

Number of seeks/scans started after reaching the leaf level in any direction to retrieve all the values to construct the final dataset for the output.

That’s a very specific definition isn’t it? It’s not all of the definition though, there’s more! And it is this “more” that I want to focus on.

Testing The Defintions

First, things first, let us set up our query environment.

USE [tempdb];
GO

SET STATISTICS IO ON;
GO

Next paragraph…

Scan count is 0 if the index used is a unique index or clustered index on a primary key and you are seeking for only one value. For example WHERE Primary_Key_Column = <value>.

Well let’s see about that!


CREATE TABLE dbo.Unique_DefinedUnique
(
 col1 INT NOT NULL PRIMARY KEY
);
INSERT INTO dbo.Unique_DefinedUnique (col1)
SELECT x.n FROM (VALUES (1), (2), (3), (4), (5)) AS x(n);

CREATE UNIQUE NONCLUSTERED INDEX uci_Unique_DefinedUnique_col1
ON dbo.Unique_DefinedUnique ( col1 )
GO

SELECT col1 FROM dbo.Unique_DefinedUnique WHERE col1 = 1;

scancount0
okay…maybe it does…

Never mind, the next paragraph please!

Scant count is 1 when you are searching for one value using a non-unique clustered index which is defined on a non-primary key column. This is done to check for duplicate values for the key value that you are searching for. For example WHERE Clustered_Index_Key_Column = <value>.

I don’t believe you!


CREATE TABLE dbo.Unique_NotDefinedUnique
(
col2 INT NOT NULL
);
GO
INSERT INTO dbo.Unique_NotDefinedUnique (col2) VALUES (1), (2), (3), (4), (5);

CREATE NONCLUSTERED INDEX nci_Unique_NotDefinedUnique_col2
ON dbo.Unique_NotDefinedUnique ( col2 )
GO

SELECT col2 FROM dbo.Unique_NotDefinedUnique WHERE col2 = 1;

scancount1
Fine…you got me there too…

Final bit!

Scan count is N when N is the number of different seek/scan started towards the left or right side at the leaf level after locating a key value using the index key.

Hmmm, so if we have duplicate values, then this will happen?…
Nah, ridiculous!


CREATE TABLE dbo.NotUnique
(
col3 CHAR(1) NOT NULL
);
GO

INSERT INTO dbo.NotUnique (col3)
VALUES
('A'), ('A'), ('B'), ('B'), ('C'), ('C'), ('D'), ('D'), ('E'), ('E'),
('F'), ('F'), ('G'), ('G'), ('H'), ('H'), ('I'), ('I'), ('J'), ('J');
GO 2

CREATE NONCLUSTERED INDEX nci_NotUnique_col3
ON dbo.NotUnique ( col3 )
GO

--Let's try it with 2 and then 3!!!

SELECT col3 FROM dbo.NotUnique
WHERE col3 = 'A' OR col3 = 'B'

SELECT col3 FROM dbo.NotUnique
WHERE col3 = 'A' OR col3 = 'B' OR col3 = 'C'

scancountn
Alright, alright! I was an idiot!

Putting away my toys…

DROP TABLE dbo.Unique_DefinedUnique, dbo.Unique_NotDefinedUnique, dbo.NotUnique;

Round Up:

I was confused about the Scan count being 0 but logical reads not being 0. How can the scan count not actually scan/seek anything?

But it is scanning/seeking!

Read the Scan Count definition again…I’ll capitalize the words that I glossed over

Number of seeks/scans STARTED AFTER REACHING THE LEAF LEVEL in any direction to retrieve all the values to construct the final dataset for the output.

Scan count of 0 occurs when there is a unique index or clustered index on a primary key and you are seeking for only one value. The word of the day is…”Unique”.

So because the unique index looking for a unique value in a column guaranteed to be unique, it’s not so much that the query isn’t looking for a value, it’s more that  once the query reaches the leaf level it already knows that it’s on the value it needs!
Since it doesn’t look any more “after reaching the leaf level”, the scan count is allowed to be 0.

This explains why, if the value is unique but not guaranteed to be so (either the index, value, or column is not guaranteed unique) the query has to do 1 scan/seek to check that the next value isn’t what it wants.
Therefore, Scan Count will be 1…

And I’ll leave it as an exercise to figure out why Scan Count N is Scan Count N.
(Hint: it’s because Scan Count N)

Take Away:

I sometimes find the documentation dry and not terribly interesting but I don’t like the “You_Are_Either_Joking_Or_You_Are_Being_Stupid_Again” look more.
So read the documentation, study what you are doing, know why things do what they do…

It’ll save you some funny looks off of people. 🙂