Normally, I would have skipped this month because I’ve talked about permissions before, and I feel like it’s one of the better posts I’ve written. But I’m trying to write more, so that can’t be an excuse anymore.
So, let’s talk about AGs, DistAGs, and SQL Server authentication login SIDs.
AGs and DistAGs
In work, we have Availability Groups (AGs) and Distributed Availability Groups (DistAGs), depending on the environment level, e.g. Prod, Pre-Prod, QA, Dev, etc.
When we plan failovers for maintenance, we run through a checklist to ensure we can safely complete the failover.
One of these checks ensures that the correct logins with the correct SIDs are on each replica. Otherwise, we could get flooded with “HEY! My application doesn’t connect anymore! Why you break my application?” when apps try and log into the new primary and the user SIDs don’t match the login SIDs.
While I don’t have a problem with doing a query across registered servers, I threw together a quick script in PowerShell with dbatools that does the job of checking for me. And, like most things that happen in business, this temporary solution became part of our playbook.
Who knows! Maybe this quick, throwaway script could also become part of other people’s playbooks!
I’m not sure how I feel about that…
Scripts
We’re using dbatools for this because I think it’s the best tool for interacting with databases from a PowerShell session regardless of what Carbon Black complains about.
Then, we can use the ever-respected Get-DbaLogin to get a list of logins and their SIDs per replica. If we have a mismatch, we will have an issue after we failover. So best nip that in the bud now (also, I had this phrase SO wrong before I looked it up!).
$InstanceName = 'test'
# $InstanceName = $null # Uncomment this line to get ALL
$Servers = Get-DbaRegServer
if ($InstanceName) {
# Find the server that ends in our instance name
$Servers = $Servers | Where-Object ServerName -match ('{0}$' -f $InstanceName)
}
# Get all replicas per Instance name that are part of AGs...
$DAGs = Get-DbaAvailabilityGroup -SqlInstance $Servers.ServerName |
Where-Object AvailabilityGroup -notlike 'DAG*' | # Exclude DAGs which just show our underlying AG names, per our naming conventions.
ForEach-Object -Process {
$instance = $null
$instance = $_.InstanceName
foreach ($r in $_.AvailabilityReplicas) { # I'd like 1 replica per line, please.
[PSCustomObject] @{
InstanceName = $instance
Replica = $r.Name
}
}
} |
Select-Object -Property InstanceName, Replica -Unique |
Group-Object -Property InstanceName -AsHashTable
# Get a list of Logins/SIDs that don't match the count of replicas, i.e. someone forgot to create with SIDs...
foreach ($d in $DAGs.Keys) {
Write-Verbose "Working on instance: $d" -Verbose
Get-DbaLogin -SqlInstance $DAGs[$d].Replica |
Group-Object -Property Name, Sid |
Where-Object Count -ne $DAGs[$d].Count |
Select-Object -Property @{ Name = 'Instance'; Expression = { $_.Group[0].InstanceName }},
@{ Name = 'Occurrences'; Expression = { $_.Count }},
@{ Name = 'Login/SID'; Expression = { $_.Name }}
}
I got 99 problems, and Login/SIDs are one! Or I have 5 Login/SID problems.
Cure
They say “prevention is better than cure”, and I’d love to get to a stage where we can “shift left” on these issues, catching them before someone creates a login with the same name but a different SID on a replica.
But we’re not there yet. At least, we can find the issues just before they impact us.
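In the meantime, the fix is straightforward enough: script the login out from the primary, SID and all, and recreate it on the offending replica. A minimal T-SQL sketch is below; the login name, password placeholder, and SID value are all made up, and dbatools’ Copy-DbaLogin will do the same legwork for you.

/* On the current primary: grab the SID for the login. */
SELECT [name], [sid]
FROM sys.server_principals
WHERE [name] = N'AppLogin'; -- made-up login name

/* On the out-of-sync replica: recreate the login with that exact SID. */
CREATE LOGIN [AppLogin]
    WITH PASSWORD = N'SamePasswordAsPrimary', -- placeholder
    SID = 0x91B24D5A68E6434D9ADB30D163F3C4E9; -- made-up 16-byte SID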
Welcome to T-SQL Tuesday, the monthly blogging party where we are given a topic and have to talk about it. Today, we have Rob Farley ( blog | bluesky ), talking about integrity.
I’ll admit that it’s been a while since I’ve written a blog post. It’s been a combination of either burnout or busyness, but let’s see if I still have the old chops. Plus, I’m currently sick and resting in bed, so I have nothing better to do.
I’m one of the few who haven’t had experience with corruption in our databases. Apart from Steve Stedman’s ( blog ) Database Corruption Challenge, that is.
With that being said, what do I have to say about this topic then? Well, let’s talk about the first version of corruption checking automation that we introduced in work.
Now, this is just the bare bones, and there have been many different iterations since then, but the essence is here.
Overview
Like many shops out there, we can’t run corruption checking on our main production database instance. So, then, what do we do? We take the backups, restore them to a test instance, and then run corruption checking on those restored databases.
At least this way, we also test that the backups we take can be restored.
But I don’t want to spend every day manually restoring and corruption checking these databases, so let’s automate this bit…
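The heart of the automation is nothing fancier than a restore followed by a DBCC CHECKDB. A bare-bones T-SQL sketch, with made-up database names and paths:

/* Restore the latest production backup over the staging copy... */
RESTORE DATABASE [StageDB]
FROM DISK = N'\\BackupShare\Prod\StageDB_FULL.bak' -- made-up path
WITH MOVE N'StageDB' TO N'D:\Data\StageDB.mdf',
     MOVE N'StageDB_log' TO N'L:\Logs\StageDB_log.ldf',
     REPLACE, RECOVERY;

/* ...then run the corruption check against the restored copy. */
DBCC CHECKDB (N'StageDB') WITH NO_INFOMSGS, ALL_ERRORMSGS;

For what it’s worth, dbatools’ Test-DbaLastBackup wraps this same restore-and-check pattern in a single command.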
Limitations
A quick scour of the interwebs brought back something from Madeira Data Solutions that was extremely close to what I wanted. It had some extras that I didn’t want, though.
More importantly, it used some functions that our dreaded antivirus software still screams false positives about, so it would stop our script from running if we even tried.
A while back, I was tasked to performance tune some code that brought me through all five stages of grief. It’s best to take you through the events from the developer’s viewpoint. There’s less cursing that way.
Denial
Hey, what’s up? You were asked to troubleshoot some code. Not a big deal; these requests come in from time to time.
You have code that keeps coming up as a high consumer of CPU on our systems? OK?
It had gotten to the stage where it kept appearing in sp_WhoIsActive? What’s that? And should it not be “whom is active”? Well, agree to disagree.
Let’s see the code so. Wow, that’s a small scroll bar! Yeah, that’s one of ours.
No, it is. I swear it is.
I’m not surprised you can’t find it in stored procedures; we dynamically create it. Here’s the procedure. Yeah, all 900 lines! Pretty cool, huh?
What do you mean, why? We had to! We had to react to the different parameters that get passed in.
Anger
Alright, alright! Calm down. I misspoke.
Yeah, that’s it, breathe. There aren’t different parameters. It’s a parameter. Singular.
No, we still need to generate the query dynamically. Cause the XML could be different. Huh? Yeah, the parameter is XML; it could be anything in there. So we pass the XML in as a nvarchar(max) string.
You spilt some of your coffee there when you banged the desk.
Then we use sp_xml_preparedocument at the start, but then we have to use sp_xml_removedocument at the end.
You’ve never heard of those before? I thought you were a DBA?
We use the “prepare” to get the XML into a state that’s easier to consume, and we use the “remove” so we don’t get memory leaks!
Your face is getting a bit red, did you know that?
It’s SQL Server; it uses enough memory it can share some!
Did we read the docs? No, why? It can use one-eighth of the total memory available for SQL Server. Ah, but that’s “can”, not “will”.
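(For anyone who hasn’t met this pair before, the pattern being described looks roughly like this; a minimal sketch with made-up XML:)

DECLARE @handle int,
        @xml nvarchar(max) = N'<ids><id>1</id><id>2</id></ids>';

/* "Prepare" parses the XML and hands back a document handle... */
EXECUTE sys.sp_xml_preparedocument @handle OUTPUT, @xml;

/* ...which OPENXML shreds into rows... */
SELECT [id]
FROM OPENXML(@handle, N'/ids/id', 2)
WITH ([id] int '.');

/* ...and "remove" frees the parsed document so it doesn't squat in memory. */
EXECUTE sys.sp_xml_removedocument @handle;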
Yes, yes, cursors upon cursors upon cursors. Why? We don’t know how many values are in each XML node in the params, so we have to split them out.
We then join them into a comma-delimited string, which is then used in IN clauses… woah! Is that the sound of your teeth grinding? You know that’s not good for your health. What do you mean neither am I?
Anyway, then we parse all of that down to a massive, what do you call it, swiss-army knife, lego-block, dynamic query built based on what’s passed in. You don’t call it that? I thought you did. What do you call it so? Wow, that’s not a word I had heard of before.
It’s not too bad, though! We pass in everything as literal values, so it’s faster that way. We read up on that parameter sniffing issue you said can happen. That isn’t going to catch us out here!
Modern problems require modern solutions, as they say. What’s that? It’s not a modern problem? I’m a what? Are you allowed to call me that?
Bargaining
You want me to re-write it? You’ll help? It’ll take a lot of time. It’s not really a priority.
Plus, it’s not even worth it. How much CPU? Is that an “illion” with an “m” or an “illion” with a “b”? Per run? Is that a lot? Well, I really wouldn’t know, now would I?
Yeah, good point; you’ll also have to check how much memory it uses. Hey! Probably one-eighth of the total memory available for SQL Server. Alright, it wasn’t that bad a joke; everyone’s a critic!
You’ll give me credit? But it’s fine in Dev; there’s no point. I suppose we could remove some of the WHERE OR IS NULL branches since we know if they’re empty when we create the query.
Nah, that would make the procedure a bit too big. Anything over 1000 lines is too much. Well, 985 is still less than 1000!
Depression
Why are you crying? No, we care, we do, we really do. No, not all the code is like this; this is old. Yes, we stopped using that XML parameter pattern. Yes, and the cursors. Well, most of them, at least.
Your tears are mixing with the coffee you spilt, you know that?
Here, there’s really no need for the fetal position in the middle of the office. You have yet to see the code for the busier databases.
You know what, why don’t you go home and come back in tomorrow, and I’ll go over that way, OK?
Acceptance
Hey! There’s our favourite DBA. You seem a lot cheerier.
Oh, you don’t have to perf-tune that query anymore? That’s great; see, it wasn’t so bad. What’s that? The latest release failed? It ran fine in Dev.
You’re permitted to set up a QA environment to test performance before code gets released? What kind of code? Any stored procedure?
Ah, that’s OK. No, really, it’s fine. We started using ORMs and embedding SQL into the app a while ago. This won’t affect us.
It has been quite a while since I have coded just for fun, so I’m thankful to Reitse for suggesting this. Unfortunately, I don’t have a pre-baked idea for this T-SQL Tuesday, so let’s see what we can come up with.
Echoes
Around December 2021, Wordle hit the virtual scenes. Yeah, nearly two years ago. How do you feel about that?
I got swept up in that wave for a while in the same way I got swept up in the other trends of my time, like Pokemon, Sudoku, and Angry Birds.
Eventually, I stopped when I found a PowerShell script by Kieran Walsh ( github | twitter ) where you could narrow down to the correct answer by putting in the results of your guess each round.
This hack led to me realising how much time I was spending on Wordle and that I should stop, much like I did with Pokemon, Sudoku, and Angry Birds.
So, what better thing to do than to try and recreate that PowerShell script in T-SQL?
Rules
I must recreate as much of the script as possible in T-SQL in only one hour. Yes, I’m aware that’s more of a rule than rules, but Wordle needs five letters dammit, and “rule” was crying out for that prosthetic “s”!
Total (code)
Don’t worry, you just have to fill in the variables on lines 19-26.
Split
A few things need to be taken care of right off the bat.
The potential answers have to be stored somewhere in the database. Thankfully, I had the answers in a text file, so creating a table and then inserting them was easy.
I could do the insert with flat files, but I already have PowerShell open so…
Next, we need the variables that we can create. If I can finish this before the 1-hour mark, I’ll turn this into a stored procedure with parameters and everything! Until then, it’s script and variable times.
DECLARE
@known_letters AS varchar(5),
@excluded_letters AS varchar(26),
@position1 AS char(1),
@position2 AS char(1),
@position3 AS char(1),
@position4 AS char(1),
@position5 AS char(1),
@correct_letters AS xml,
@all_answers_sql AS nvarchar(MAX);
/* region Enter Variables here */
SET @known_letters = '';
SET @excluded_letters = '%[]%';
SET @position1 = NULL;
SET @position2 = NULL;
SET @position3 = NULL;
SET @position4 = NULL;
SET @position5 = NULL;
/* endregion Enter Variables here */
The PowerShell code has known_letters, excluded_letters, positions, and wrong_positions.
I can do all these easily enough, except for wrong_positions. I can’t think of a way to do hashtables in SQL that doesn’t mean a secondary table or a user-defined table type, etc. I’ll leave that to the end if I have time.
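For the record, if I had gotten to it, the hashtable stand-in would probably have been just another table; a quick sketch with invented names:

/* One row per letter/position pair where we know the letter is
   in the word but NOT in that position. Never wired up, sadly. */
DECLARE @wrong_positions TABLE
(
    [letter]   char(1) NOT NULL,
    [position] tinyint NOT NULL
);

INSERT INTO @wrong_positions ([letter], [position])
VALUES ('a', 2), ('t', 5); -- hypothetical guesses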
known_letters is an array of strings. I haven’t updated the SQL Server version on my laptop in a while, so there is no string_split for me. Let’s do the XML way so.
/* region KnownLetters */
SELECT @correct_letters = CONCAT(
'<known_letters>',
REPLACE(@known_letters, ',', '</known_letters><known_letters>'),
'</known_letters>'
);
SELECT
[known] = [l].[y].value('.', 'char(1)')
INTO #KnownLetters
FROM
(
VALUES
(@correct_letters)
) AS [x] ([kl])
CROSS APPLY [kl].nodes('/known_letters') AS [l] (y);
/* endregion KnownLetters */
excluded_letters I can get away with by using some LIKE jiggery-pokery, where it will search for any characters between the square brackets.
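So if a round rules out, say, “q”, “x”, and “z”, they just go between the brackets. A quick illustration of the idea:

/* Anything matching a character inside the square brackets is out. */
SET @excluded_letters = '%[qxz]%';

SELECT [wordle_answers]
FROM [dbo].[WordleAnswers]
WHERE [wordle_answers] NOT LIKE @excluded_letters;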
positions I can split out into individual variables. I can more easily deal with them then, and it only ends up as an extra five variables this way.
Creating the table would have been handier if I had made a column for each character, but I didn’t, so it’s some SUBSTRING logic for me to get the individual characters out.
If we do know the positions of some of the letters, then I can strip out a lot of the potential answers straight away. I’m not a fan of Swiss-army-knife WHERE clauses, so I’ll do the dynamic SQL.
I’m also not a fan of WHERE 1=1 in my dynamic code, but I’m running low on time here, and it’s faster to add that in first and start everything else with an AND than it is to check if this is the first clause in the WHERE section or not.
Plus, I’m less against WHERE 1=1 than I am against Swiss-army-knife WHERE clauses.
/* region Known Positions */
CREATE TABLE #AllAnswers
(
[wordle_answers] char(5),
[char1] char(1),
[char2] char(1),
[char3] char(1),
[char4] char(1),
[char5] char(1)
);
SET @all_answers_sql = N'SELECT
[wa].[wordle_answers],
[g].[char1],
[g].[char2],
[g].[char3],
[g].[char4],
[g].[char5]
FROM [dbo].[WordleAnswers] AS [wa]
CROSS APPLY (
VALUES (
(SUBSTRING([wa].[wordle_answers], 1, 1)),
(SUBSTRING([wa].[wordle_answers], 2, 1)),
(SUBSTRING([wa].[wordle_answers], 3, 1)),
(SUBSTRING([wa].[wordle_answers], 4, 1)),
(SUBSTRING([wa].[wordle_answers], 5, 1))
)
) AS [g] ([char1], [char2], [char3], [char4], [char5])
WHERE 1=1';
IF @position1 IS NOT NULL SET @all_answers_sql = CONCAT(
@all_answers_sql,
N'
AND [g].[char1] = ',
QUOTENAME(@position1, '''')
);
IF @position2 IS NOT NULL SET @all_answers_sql = CONCAT(
@all_answers_sql,
N'
AND [g].[char2] = ',
QUOTENAME(@position2, '''')
);
IF @position3 IS NOT NULL SET @all_answers_sql = CONCAT(
@all_answers_sql,
N'
AND [g].[char3] = ',
QUOTENAME(@position3, '''')
);
IF @position4 IS NOT NULL SET @all_answers_sql = CONCAT(
@all_answers_sql,
N'
AND [g].[char4] = ',
QUOTENAME(@position4, '''')
);
IF @position5 IS NOT NULL SET @all_answers_sql = CONCAT(
@all_answers_sql,
N'
AND [g].[char5] = ',
QUOTENAME(@position5, '''')
);
SET @all_answers_sql = CONCAT(@all_answers_sql, N';')
PRINT @all_answers_sql;
INSERT INTO #AllAnswers
EXECUTE [sys].[sp_executesql] @stmt = @all_answers_sql;
/* endregion Known Positions */
Finally, we can UNPIVOT the individual characters of the words and join them with the known_letters to narrow down the answers, while also excluding characters that we know aren’t in the word.
Or else just return everything we have, minus excluded characters.
IF LEN(@known_letters) > 0 BEGIN
SELECT
*
FROM #AllAnswers AS [w]
UNPIVOT
(
[chars] FOR [chr2] IN ([char1], [char2], [char3], [char4], [char5])
) AS [unpvt]
JOIN #KnownLetters AS [kl]
ON [unpvt].[chars] = [kl].[known]
WHERE
[unpvt].[wordle_answers] NOT LIKE @excluded_letters
END
ELSE
BEGIN
SELECT
*
FROM #AllAnswers AS [a]
WHERE [a].[wordle_answers] NOT LIKE @excluded_letters;
END;
Guilt
In the PowerShell script, you can add characters in the excluded_letters parameter that exist in the known_letters parameter, and it will correctly ignore them.
Alas, tempus fugit, and I didn’t get to do the same for this T-SQL version. Maybe that has something to do with translating “time flies” into Latin and then looking up other sayings in Latin, but we can’t say for sure. Mea culpa!
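If I had found the minutes, I suspect the fix would have been a one-liner along these lines; untested, mind you:

/* Strip any known letter back out of the exclusion pattern
   so it can't disqualify otherwise-valid answers. */
SELECT @excluded_letters = REPLACE(@excluded_letters, [known], '')
FROM #KnownLetters;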
However, it’s been around 50 minutes with minor troubleshooting here and there, so time to test this bad boy.
I’ll take the first answer returned each time unless it is the answer we chose previously.
PowerShell
I’m not going to use the wrong_positions parameter here since I didn’t re-create that in T-SQL. Still, I got lucky and got the correct answer on the third guess.
T-SQL
The T-SQL method doesn’t show each iteration as nicely as the PowerShell does. And there’s more human brainpower required to make sure you don’t enter the same letter in both the known_letters and the excluded_letters variables.
Overall though, well done with a respectable four guesses.
Point
I’m not going to say that there is no point to these exercises.
Fun is as valid a point as any other. In a work world filled with more demands on our time than the number of Pokemon (there’s still only 150, right?), more technologies to learn than combinations in Sudoku puzzles, and more people demanding the seemingly impossible out of you so that you want to yeet them at solid objects … something something Angry Birds, it’s a welcome change to do something just for fun once in a while.
You have to do some funky stuff to get the driver for SQL Server; granted, “funky” basically means running go get <Github link>. I’ve done funkier…
There are also better ways to do this, but who cares? This is exploratory.
When setting up my SQL instances, I did some weird thing where I now have to specify a non-standard port when connecting. So I have to open up the error log in SQL Server, see what port it’s listening on, and put that into the code. Yes, I’m aware that I’m a DBA and I should be able to fix this, but it’s my personal time & my personal test instance. I’ll fix it when I want to.
Anyway, here’s the code.
Do with it what you will; I’m just happy it works, and I’m delighted that I’m excited about it!
And what happens when it fails, say cause the default port actively refuses connections?!
// changing `port = 60515` -> `port = 60514`
Happy out! Now to see what else we can do. Thankfully, I’ve somehow managed to get some ideas marinating in creative juices. Let’s see if I have the time and the energy to see them through!
I don’t know if this post is a rage against “database-ubiquitous” SQL or Object-Relational Mapping tools (ORMs)
I’m going to say that it’s a rant against the first because if you know what you are doing, then ORMs won’t be a performance concern
But people use ORMs instead of Stored Procedures because Them: “what happens if we need to change databases? We don’t have to worry about that now cause they’ll be baked in the application, and we won’t have to recreate the Stored Procedures in the new database cause ORMs just work!”
Everything just works until it doesn’t
I’ve worked with several companies and heard this argument many times from developers
Part of why I’m ranting about database-ubiquitousness SQL instead of ORMs is that I’ve also witnessed SQL code that has to work everywhere
I’m going to call this SQL “DUh SQL” from now on since I only have so many vowels on my keyboard and shouldn’t waste them re-writing the word “ubiquitousness”
Them: “Nope, we can’t use DATEFROMPARTS; it’s not available in other databases! Yeah, we use GETDATE() everywhere; what’s your point? WHAT!?!”
I’ve moved across different cloud providers more than I’ve moved databases, and guess what? Each cloud provider move, we’ve kept the same database
I know that knowledge sharing “in the wild” says that you should use ORMs or DUh SQL The route to a 10x engineer is paved in DRY, KISS, and SWALK Well, maybe not SWALK
Them: “But the leading minds out there say that you should keep it DUh since that means it’ll be a breeze if we have to move databases”
Sure, and I’ve been told that ice baths help in recovery I’m not going to take them, though, since I know there’s a difference between me trying to do a 5k and dedicated athletes!
I love arguing with people in my blog posts; I rarely lose
So, don’t use DUh SQL cause of reasons that may not apply to you Don’t refuse performance tuning efforts if they’ll add database-specific code And, please, learn to use your tools
Oh, and apologies for the lack of full-stops, I can’t use them in case I have to turn this post into regex
A Michal commented on the post, asking how to get a specific output from his search.
Hi,
Thanks for your sharing. What if I also want to compare case sensitively columns and the order of them (syncwindows). How can I presented it on powershell.
I mean that in the final table I want to show also something like: column_a, column_A -> case sensitive
AND
column_a, column_a -> different order in the table
Thanks in advance
Michal
I confess that I never got around to answering Michal until a few weeks ago when I found myself with some rare free time.
Since then, I’ve written a script, slapped it into a function, and threw it up on Github.
Here’s hoping that it does what you want this time, Michal; thanks for waiting.
Shall I Compare Thee to Another Table?
The first thing that we need to do is have a couple of SQL tables to compare.
So, I threw up a Docker container and created a couple of tables with nearly the same layout.
You can see that there are around three differences here:
Column orders, e.g. col9 has id 6 in dbo.DifferenceTable01 but id 5 in dbo.DifferenceTable02.
Column case sensitivity, e.g. col7 does not match COL7.
Column presence, e.g. col3 doesn’t exist in dbo.DifferenceTable01 at all.
While Compare-Object has the -CaseSensitive switch, I don’t think that it would be helpful in all these cases. Or else I didn’t want to use that command this time around.
So, I wrote a function to get the output we wanted, and yes, I now include myself among that list of people wishing for that output.
I’m allowed to be biased towards the things that I write 🙂
I’ve tried to include everything you could want in the function output, i.e. column names, column ids, and statuses.
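The function itself is on Github, but if you’d rather see the comparison logic it automates, the T-SQL boils down to something like this; a sketch that assumes both tables live in the same database under the default case-insensitive collation:

SELECT
    [d1].[name]      AS [Table01Column],
    [d1].[column_id] AS [Table01Id],
    [d2].[name]      AS [Table02Column],
    [d2].[column_id] AS [Table02Id],
    [Status] = CASE
                   WHEN [d2].[name] IS NULL THEN 'Missing from Table02'
                   WHEN [d1].[name] <> [d2].[name] COLLATE Latin1_General_CS_AS THEN 'Case difference'
                   WHEN [d1].[column_id] <> [d2].[column_id] THEN 'Different order'
                   ELSE 'Match'
               END
FROM sys.columns AS [d1]
LEFT JOIN sys.columns AS [d2]
    ON  [d2].[object_id] = OBJECT_ID(N'dbo.DifferenceTable02')
    AND [d2].[name] = [d1].[name] -- case-insensitive match under the default collation
WHERE [d1].[object_id] = OBJECT_ID(N'dbo.DifferenceTable01');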
Something I’ve started to do lately is wrapping a [Diagnostics.StopWatch] in my verbose statement to see where potential slow parts of the function are.
I’d like to think that 0.2 seconds for this example isn’t too bad.
Feel free to use and abuse this function to your heart’s content. I know that there are a few things that I’d add to it; comparing across different instances is an obvious one that I’d like to put in.
Hopefully though, someone out there will find it helpful.
I recently ran into a problem with the QUOTED_IDENTIFIERS option in SQL Server, and it got me to thinking about these SET options.
I mean, the fact that, on tables with filtered indexes or indexed computed columns, QUOTED_IDENTIFIER is required to be on before you can create any other indexes is just not intuitive. But if you can’t create indexes because of it, then I’d argue that it’s pretty damn important! I also found out that this problem is not just limited to QUOTED_IDENTIFIER but extends to ARITHABORT and ANSI_WARNINGS as well. The docs for ARITHABORT say:
SET ARITHABORT must be ON when you are creating or changing indexes on computed columns or indexed views. If SET ARITHABORT is OFF, CREATE, UPDATE, INSERT, and DELETE statements on tables with indexes on computed columns or indexed views will fail.
And for ANSI_WARNINGS it says:
SET ANSI_WARNINGS must be ON when you are creating or manipulating indexes on computed columns or indexed views. If SET ANSI_WARNINGS is OFF, CREATE, UPDATE, INSERT, and DELETE statements on tables with indexes on computed columns or indexed views will fail.
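A quick repro sketch of the index side of it, with a made-up table; with QUOTED_IDENTIFIER off, the filtered index refuses to create:

SET QUOTED_IDENTIFIER ON;
GO
CREATE TABLE [dbo].[SomeTable] ([SomeColumn] int NULL); -- made-up table
GO
SET QUOTED_IDENTIFIER OFF;
GO
/* Fails with an error along the lines of Msg 1934:
   "CREATE INDEX failed because the following SET options
    have incorrect settings: 'QUOTED_IDENTIFIER'..." */
CREATE NONCLUSTERED INDEX [IX_SomeTable_Filtered]
    ON [dbo].[SomeTable] ([SomeColumn])
    WHERE [SomeColumn] IS NOT NULL;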
It’s not just Indexes
So, like a dog when it sees a squirrel, when I found out about the problems with ARITHABORT and ANSI_WARNINGS, I got distracted and started checking out what else I could break with them. Reading through the docs (because I found that it does help, even if I have to force myself to do it sometimes), I found a little gem that I wanted to try and replicate. So here’s a reason why you should care about setting ARITHABORT and ANSI_WARNINGS on.
Default to on
At one stage or another, if you’re working with SQL Server, you’ve probably encountered the dreaded “Divide By 0” error:
Msg 8134, Level 16, State 1, Line 4
Divide by zero error encountered.
If you want to check this out, then here’s the code below for our table:
USE Pantheon;
-- Create our test table...
CREATE TABLE dbo.ArithAborting (
id tinyint NULL
);
GO
And our attempt at inserting a divide-by-zero result into the table:
SET ARITHABORT ON;
GO
SET ANSI_WARNINGS ON;
GO
-- Check can we insert a "divide by 0"...
BEGIN TRY
INSERT INTO dbo.ArithAborting (id) SELECT 1/0;
END TRY
BEGIN CATCH
PRINT 'NOPE!';
THROW;
END CATCH;
And we get our good, old, dreaded friend:
Terminate!
We check our ArithAborting table and nothing is there, like we expected!
SELECT *
FROM dbo.ArithAborting;
I got nothing…
What if we were to turn our ARITHABORT and ANSI_WARNINGS off, though? Well, that’s a simple thing to test: we just turn them off and run the script again:
--Turn ARITHABORT off;
SET ARITHABORT OFF;
GO
SET ANSI_WARNINGS OFF;
GO
-- ...insert into our table...
BEGIN TRY
INSERT INTO dbo.ArithAborting (id) SELECT 1/0;
END TRY
BEGIN CATCH
PRINT 'NOPE!';
THROW;
END CATCH;
Termin-wait…
Now, before I freak out and start thinking that I’ve finally divided by zero, let’s check the table: there’s a row in there, alright, with a NULL value in it. The docs explain why:
During expression evaluation when SET ARITHABORT is OFF, if an INSERT, DELETE or UPDATE statement encounters an arithmetic error, overflow, divide-by-zero, or a domain error, SQL Server inserts or updates a NULL value. If the target column is not nullable, the insert or update action fails and the user receives an error.
Do I like this?
Nope!
If I have a terminating error in my script, I quite like the fact that SQL Server is looking out for me and won’t let me put in bad data. But if you have these options turned off, even if you wrap your code in a TRY...CATCH block, it’s going to bypass it.
Plus if you are trying to divide by 0, please stop trying to break the universe. Thank you.
This is yet another time that a blog post has come about from a question by a developer. They’re good guys, I guess, they keep me on my toes.
This time it was with change logging. We didn’t have Change Data Capture (CDC), or Temporal Tables enabled (have you seen the YouTube videos by Bert Wagner ( blog | twitter ) on these?). What we did have was “manual logging” and no, I’m not even talking about Triggers.
What we had was INSERT statements, directly after a MERGE statement, that inserted into a table variable a hard-coded name of the column, the old value, and the new value.
Is that what I would do? Doesn’t matter, it was there before I got there, seems to work, and is low down on the list of priorities to change.
The problem was that every time they needed to add a column to a table, and change log it, they had to add multiple lines to the change tracking procedure, and the procedure was getting gross and hard to maintain.
You know the drill by now, I quite like to play along so let us facilitate that (from now on I’m going to use Gist, formatting with native WordPress is starting to annoy me).
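The setup script lives in the gist, but roughly it boils down to this; the table name and data types are invented here, while the column names match the ones used in the rest of the post:

/* Stand-in for the gist: a table where every column is "important". */
CREATE TABLE dbo.GotAnyChange
(
    GotAnyChangeId int IDENTITY(1, 1) PRIMARY KEY,
    Column1 varchar(50),
    Column2 varchar(50),
    Column3 varchar(50),
    Column4 varchar(50),
    Column5 varchar(50),
    Column6 varchar(50)
);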
This will create our table and, luckily, all of its columns are important enough to warrant capturing when they get changed.
Despite their looks, these values are “important”
Old, Way WHERE old=way
Let’s take a look at the code that they were using, shall we?
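It was something along these lines; a hedged reconstruction with invented names, since the real thing lives in a gist:

/* The legacy pattern, condensed: one hard-coded INSERT per tracked column. */
DECLARE @ChangeLogTemp TABLE (GotAnyChangeId int, ColumnName sysname, OldValue nvarchar(100), NewValue nvarchar(100));
DECLARE @Output TABLE (GotAnyChangeId int, OldColumn1 varchar(50), NewColumn1 varchar(50), OldColumn2 varchar(50), NewColumn2 varchar(50));

INSERT INTO @ChangeLogTemp (GotAnyChangeId, ColumnName, OldValue, NewValue)
SELECT GotAnyChangeId, N'Column1', OldColumn1, NewColumn1
FROM @Output
WHERE ISNULL(OldColumn1, '') != ISNULL(NewColumn1, '');

INSERT INTO @ChangeLogTemp (GotAnyChangeId, ColumnName, OldValue, NewValue)
SELECT GotAnyChangeId, N'Column2', OldColumn2, NewColumn2
FROM @Output
WHERE ISNULL(OldColumn2, '') != ISNULL(NewColumn2, '');

/* ...and on, and on, for every column worth tracking. */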
Hey! It’s legacy code, let’s focus on just one problem at a time!
The main issue that I was asked about was that every time a column was deemed important and needed to be added to the list, they had to insert another INSERT INTO @ChangeLogTemp... and they thought that it wasn’t sustainable in the long run.
Hmmm, it also comes across as very RBAR, doesn’t it? Every time we want to include another column in the change tracking, we have to add it row by agonizing row. The script is already big enough; if we keep adding more, it will get massive!
Set-based is the right way to go 90% of the time, but how do we do a set-based solution on the same table?
New JOIN Way ON new = way
The first thing I do is to change that table variable into a temp table. Stats, indexes (if necessary), and I can query the results as we go along. Much better!
Temp > Variable?
The second thing is that, whether by luck or by design, the legacy code has the same naming convention for the columns: new column values have the “New%” prefix in the column name, and old columns have the “Old%” prefix.
This works for us because we can now split the columns into two derived tables, New and Old, and that way we can get at the differences.
Potential problem here…
Have you ever tried to find the differences between two consecutive rows of data? It’s fiendishly difficult. WHERE Column1 on row1 != Column1 on row2 apparently just does not work, le sigh.
I’ve talked before about PIVOT, but now I’m going to introduce you to its little brother, UNPIVOT, which “rotates columns of a table-valued expression into column values”.
I say “little brother” because the whole document talks about PIVOT, with only brief mentions of UNPIVOT in the notes.
If you’re writing documentation like this, please stop.
With UNPIVOT we can create a table of our rows around our ID and Column names…
Potential problem averted!
… and with this, we can join on our ID and Column names and get to our more intuitive WHERE OldValue != NewValue.
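Put together, the shape of the query is roughly this; a sketch with two stand-in tracked columns and a stand-in temp table, since the full script is in the gist below:

SELECT
    New.GotAnyChangeId,
    New.ColumnName,
    Old.OldValue,
    New.NewValue
FROM
(
    SELECT Unpvt.GotAnyChangeId, Unpvt.ColumnName, Unpvt.NewValue
    FROM
    (
        SELECT GotAnyChangeId,
               /* The ISNULL wrapping matters: UNPIVOT silently drops NULLs. */
               Column1 = CAST(ISNULL(NewColumn1, '') AS nvarchar(100)),
               Column2 = CAST(ISNULL(NewColumn2, '') AS nvarchar(100))
        FROM #OutputTable
    ) AS DataSource
    UNPIVOT (NewValue FOR ColumnName IN (Column1, Column2)) AS Unpvt
) AS New
INNER JOIN
(
    SELECT Unpvt.GotAnyChangeId, Unpvt.ColumnName, Unpvt.OldValue
    FROM
    (
        SELECT GotAnyChangeId,
               Column1 = CAST(ISNULL(OldColumn1, '') AS nvarchar(100)),
               Column2 = CAST(ISNULL(OldColumn2, '') AS nvarchar(100))
        FROM #OutputTable
    ) AS DataSource
    UNPIVOT (OldValue FOR ColumnName IN (Column1, Column2)) AS Unpvt
) AS Old
    ON  Old.GotAnyChangeId = New.GotAnyChangeId
    AND Old.ColumnName = New.ColumnName
WHERE New.NewValue != Old.OldValue;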
Bringing it all together!
The whole point was to reduce the number of changes required when they need to include or exclude columns. All in all though, it’s just 6 lines fewer. Not exactly the great return that you’d expect.
Yeah, true: with the old way, for every column we want to add, we have to add an extra 6 lines, while the new way adds 2.
That means for 1,024 columns:
The old way could have at least 6,144 lines per table. (1024 * 6)
The new way could have at least 2,048 lines per table (not explaining this calculation >:( )
So, is there anything else that we can do?
Dynamically?
I’ve talked before about T-SQL automation with Dynamic SQL and this should be a good candidate for that.
What can we make dynamic here though? How about…
The new and old columns bit?
The FOR ColumnName IN ([Column1], [Column2], [Column3], [Column4], [Column5], [Column6]) bit?
The CAST(ISNULL([Old/NewColumn], '') AS nvarchar) bit?
This is basically the same as the above. Don’t be put off by needing to add CAST(ISNULL( before the column names; it’s not as complex as you’d think.
STUFF just doesn’t look as pretty… 🙂
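For the curious, those lists can be built from the temp table’s own metadata; a sketch, with the temp table name borrowed from the dynamic script below:

DECLARE @columns nvarchar(max);

/* Stitch the shared column names together from the "New%" columns. */
SELECT @columns = STUFF(
(
    SELECT N', ' + QUOTENAME(REPLACE([name], N'New', N''))
    FROM tempdb.sys.columns
    WHERE [object_id] = OBJECT_ID(N'tempdb..#OutputTableDynamic')
      AND [name] LIKE N'New%'
    ORDER BY [column_id]
    FOR XML PATH(N''), TYPE
).value('.', 'nvarchar(max)'), 1, 2, N'');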
Now that we have our dynamic bits, let’s create the full statements.
Full Dynamic Script
N'SELECT New.ColumnName AS CN, New.NewValue AS NV, Old.OldValue AS OV FROM (SELECT Unpvt.GotAnyChangeId, Unpvt.ColumnName, Unpvt.NewValue FROM (SELECT [GotAnyChangeId], ' + @NewColumns
+ N' FROM #OutputTableDynamic ) AS DataSource UNPIVOT (NewValue FOR ColumnName IN (' + @columns
+ N') ) AS Unpvt ) AS New INNER JOIN ( SELECT Unpvt.GotAnyChangeId, Unpvt.ColumnName, Unpvt.OldValue FROM (SELECT [GotAnyChangeId], ' + @OldColumns
+ N' FROM #OutputTableDynamic ) AS DataSource UNPIVOT (OldValue FOR ColumnName IN (' + @columns
+ N')) AS Unpvt) AS Old ON Old.ColumnName = New.ColumnName AND Old.GotAnyChangeId = New.GotAnyChangeId WHERE New.NewValue != Old.OldValue FOR XML PATH(''Change''), ROOT(''Changes'')';
Overall, the script is longer at nearly double the lines but where it shines is when adding new columns.
To include new columns, just add them to the table; to exclude them, just add in a filter clause.
So, potentially, if every column in this table is to be tracked and we add columns all the way up to 1,024, this code will not increase. Old way: at least 6,144 lines. New way: at least 2,048 lines. Dynamic way: no change.
Summary
Like the script, this was a massive post. Back at the start, I said that a developer came to me because they wanted to get more DRY (?) and stop needing to add more content to the stored procedure.
Do you think the developer used this?
Nope!
I can’t say that I blame them; it’s slightly ugly and unwieldy, and I wrote it, so I should love it.
Yet if something were to go wrong and the need was there to open the procedure and troubleshoot it, the first person to open this up is going to let out a groan of despair!
So this request turned into a proof of concept and nothing more. No skin off my back, I have a growing list of tasks to accomplish by 5 minutes ago. Better get back to them.