T-SQL Tuesday #125 – Unit Testing Databases.

Words: 1344

Time to read: ~ 7 minutes

Welcome to the April edition of T-SQL Tuesday. This month we have Hamish Watson ( Blog | Twitter ) asking us about how we unit test our databases.

Performance Tuning

At one stage or another, I’m sure that we’ve all been asked to performance tune database code. Sometimes this tuning can involve changing the code. How do we know that the changes we’re introducing aren’t going to introduce bugs? You got it – unit tests!

Set up

Let’s set up a contrived example – Contrived of course coming from the words “con” as in fake and “Tried V” as in I tried 5 times before I just gave up and used what I had.

USE [tempdb];
GO

/*
    Let's build this thang...
*/
CREATE TABLE [dbo].[Objects]
(
    [id] int NOT NULL CONSTRAINT [PK dbo.Objects(id)] PRIMARY KEY,
    [level] int NOT NULL,
    [object_type] int NOT NULL,
    [tag] nvarchar(256) NOT NULL,
    [description] nvarchar(max) NOT NULL,
    [parent_object_id] int NULL
);
GO

INSERT INTO [dbo].[Objects]
(
    [id],
    [level],
    [object_type],
    [tag],
    [description],
    [parent_object_id]
)
SELECT
    ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS [id],
    NTILE(10000) OVER (ORDER BY [S01].[alias]) AS [level],
    [M01].[severity] AS [object_type],
    [S01].[alias] AS [tag],
    [M01].[text] AS [description],
    NULLIF(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) -1, 0) AS [parent_object_id]
FROM [sys].[messages] AS [M01]
INNER JOIN [sys].[syslanguages] AS [S01] ON [M01].[language_id] = [S01].[lcid]
WHERE [S01].[alias] = N'English';
GO

SELECT *
FROM [dbo].[Objects];
GO

This will insert 11,466 records into the dbo.Objects table. The table has a column called parent_object_id that points to its … parent object id. Wow, you have been paying attention! Luckily, or unluckily, tuning the schema of the table is out of scope for this post.

Next we have the piece of code that we’ve been asked to performance tune. It’s …uhm … well it works.

/*
    RBAR SQL
*/
CREATE FUNCTION [dbo].[GetChildObjects]
(
    @ObjectId int
)
RETURNS @ChildObjects TABLE
(
    [object_id] int NOT NULL,
    [level] int NOT NULL,
    [object_type] int NOT NULL,
    [tag] nvarchar(256) NOT NULL,
    [description] nvarchar(max) NOT NULL
)
AS
BEGIN
    --
    -- Insert first object.
    --
    INSERT INTO @ChildObjects
    SELECT
        id,
        level,
        object_type,
        tag,
        description
    FROM dbo.Objects
    WHERE id = @ObjectId;

    --
    -- Stack
    --
    DECLARE @Stack TABLE
    (
        ObjectId int NOT NULL,
        lvl int NOT NULL
    );

    --
    -- Initialise level for stack
    --
    DECLARE @lvl int = 1,
            @CurrentObjectId INT = NULL;

    --
    -- Insert current object
    --
    INSERT INTO @Stack
    (
        ObjectId,
        lvl
    )
    VALUES
    (@ObjectId, @lvl);

    --
    -- Loop through stack and get child objects
    --
    WHILE (@lvl > 0)
    BEGIN
        IF EXISTS (SELECT * FROM @Stack WHERE lvl = @lvl)
        BEGIN

            --
            -- Get object
            --
            SELECT @CurrentObjectId = ObjectId
            FROM @Stack
            WHERE lvl = @lvl;

            --
            -- Insert child objects
            --
            INSERT INTO @ChildObjects
            SELECT 
                id,
                level,
                object_type,
                tag,
                description
            FROM dbo.Objects
            WHERE parent_object_id = @CurrentObjectId;

            --
            -- Delete object from stack
            --
            DELETE FROM @Stack
            WHERE lvl = @lvl
            AND ObjectId = @CurrentObjectId;

            --
            -- Insert the childnodes of the current node into the stack
            --
            INSERT INTO @Stack
            (
                ObjectId,
                lvl
            )
            SELECT
                id,
                @lvl + 1
            FROM dbo.Objects
            WHERE parent_object_id = @CurrentObjectId;

            IF (@@ROWCOUNT > 0) -- If the previous statement added one or more nodes, go down for its first child.
                SET @lvl = @lvl + 1; -- If no nodes are added, check its sibling nodes
        END;
        ELSE
            SET @lvl = @lvl - 1; -- Back to the level immediately above
        END;
    RETURN;
END;
GO

Timing is important

Let’s get some times for this function. There’s nothing fancy here, we’re just going to call it for different values, returning more and more results and see how long it takes to finish.

/*
    Simple test...
*/
SELECT * FROM [dbo].[GetChildObjects](11466) OPTION (RECOMPILE); -- 0 seconds.
SELECT * FROM [dbo].[GetChildObjects](11000) OPTION (RECOMPILE); -- 3 seconds.
SELECT * FROM [dbo].[GetChildObjects](10000) OPTION (RECOMPILE); -- 10 seconds.
SELECT * FROM [dbo].[GetChildObjects](9000) OPTION (RECOMPILE); -- 17 seconds.
SELECT * FROM [dbo].[GetChildObjects](8000) OPTION (RECOMPILE); -- 23 seconds.
SELECT * FROM [dbo].[GetChildObjects](7000) OPTION (RECOMPILE); -- 29 seconds.
SELECT * FROM [dbo].[GetChildObjects](6000) OPTION (RECOMPILE); -- 36 seconds.
SELECT * FROM [dbo].[GetChildObjects](5000) OPTION (RECOMPILE); -- 41 seconds.
SELECT * FROM [dbo].[GetChildObjects](4000) OPTION (RECOMPILE); -- 48 seconds.
SELECT * FROM [dbo].[GetChildObjects](3000) OPTION (RECOMPILE); -- 54 seconds.
SELECT * FROM [dbo].[GetChildObjects](2000) OPTION (RECOMPILE); -- 61 seconds.
SELECT * FROM [dbo].[GetChildObjects](1000) OPTION (RECOMPILE); -- 68 seconds.
SELECT * FROM [dbo].[GetChildObjects](10) OPTION (RECOMPILE); -- 76 seconds.

Taking from 0 seconds for 1 row to 76 seconds for 11,456 rows, we now have our timings.

As you can see, there have already been some performance tuning efforts by the code's creators. So impressed was I by the use of OPTION (RECOMPILE) that I asked why they had asked me to take a look.

The execution plan seems fine; doesn’t look like there’s anything we can do with it…

Well, let’s take a look at the execution plan, shall we?

Ah!

No surprises

Listen, we both know that this needs to be rewritten. Here is the crux of the post – how do we know that our re-write isn’t going to introduce unexpected side-effects? Little happy “unspecified-features“? You got it – Unit tests!

Creating Unit Tests

To keep this brief, I’ve installed tSQLt on my tempdb database. The documentation for installing it is very easy to follow on the tSQLt website so head there for instructions. We’re going to create some tests for this function.
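For completeness, and going from memory rather than from the original post: the gist of that install is enabling CLR and then running the tSQLt class script in the database you want to test in. The exact steps vary between SQL Server versions, so treat this as a rough shape only and follow the official docs.

/*
    Rough shape of a tSQLt install - check the tSQLt docs for your version.
*/
EXEC sp_configure 'clr enabled', 1;
RECONFIGURE;
GO

USE [tempdb];
GO
-- ...then run the tSQLt.class.sql script from the tSQLt download against this database.
GO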


/*
    tSQLt time!
*/
EXECUTE [tSQLt].[NewTestClass] @ClassName = N'GetChildObjects';
GO

CREATE PROCEDURE [GetChildObjects].[test returns nothing for nothing]
AS
BEGIN
    -- Arrange (new "fake" table).
    EXECUTE [tSQLt].[FakeTable] @TableName = N'[dbo].[Objects]';

    -- Act
    SELECT
        *
    INTO #Actual
    FROM [dbo].[GetChildObjects](1);

    -- Assert (empty is empty).
    EXECUTE [tSQLt].[AssertEmptyTable] @TableName = N'#Actual'
END;
GO

CREATE PROCEDURE [GetChildObjects].[test returns 1 row if no child objects]
AS
BEGIN
    -- Arrange (new "fake" table).
    EXECUTE [tSQLt].[FakeTable] @TableName = N'[dbo].[Objects]';

    INSERT INTO [dbo].[Objects]
    (
        [id],
        [level],
        [object_type],
        [tag],
        [description],
        [parent_object_id]
    )
    VALUES
    (
        1,
        1,
        1,
        N'Don''t care',
        N'Absolutely don''t care',
        NULL
    );

    -- Act
    SELECT
        *
    INTO #Actual
    FROM [dbo].[GetChildObjects](1);

    SELECT
        1 AS [object_id],
        1 AS [level],
        1 AS [object_type],
        N'Don''t care' AS [tag],
        N'Absolutely don''t care' AS [description]
    INTO #Expected
    
    EXECUTE [tSQLt].[AssertEqualsTable] @Expected = N'#Expected', @Actual = N'#Actual';
END;
GO


CREATE PROCEDURE [GetChildObjects].[test returns multiple rows...say 4]
AS
BEGIN
    -- Arrange (new "fake" table).
    EXECUTE [tSQLt].[FakeTable] @TableName = N'[dbo].[Objects]';

    INSERT INTO [dbo].[Objects]
    (
        [id],
        [level],
        [object_type],
        [tag],
        [description],
        [parent_object_id]
    )
    SELECT
        [V].[number],
        [V].[number],
        [V].[number],
        N'Don''t care '+ CONVERT(varchar(2), [V].[number]),
        N'Absolutely don''t care '+ CONVERT(varchar(2), [V].[number]),
        NULLIF([V].[number] -1, 0)
    FROM [master].[dbo].[spt_values] AS [V]
    WHERE [V].[type] = 'P'
      AND [V].[number] > 0
      AND [V].[number] <= 10
    ORDER BY [V].[number]

    -- Act
    SELECT
        *
    INTO #Actual
    FROM [dbo].[GetChildObjects](7);

    SELECT
        [id] AS [object_id],
        [level],
        [object_type],
        [tag],
        [description]
    INTO #Expected
    FROM [dbo].[Objects]
    WHERE [id] >= 7
    ORDER BY [id];
    
    EXECUTE [tSQLt].[AssertEqualsTable] @Expected = N'#Expected', @Actual = N'#Actual';
END;
GO

Now we have tests to see what happens when the table is empty, when only 1 row should be returned, and when multiple rows are supposed to be returned. Hey! 4 rows still counts as multiple!

Running these tests couldn’t be simpler:

EXECUTE [tSQLt].[Run] @TestName = N'GetChildObjects';
GO
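As an aside, and not something the workflow above needs, tSQLt will also happily run a single test by its fully qualified name, or every test class at once, which is handy while iterating on a fix:

EXECUTE [tSQLt].[Run] @TestName = N'GetChildObjects.[test returns 1 row if no child objects]';
GO

EXECUTE [tSQLt].[RunAll];
GO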

Re-Write time

The best piece of advice I heard for performance tuning, especially with a hectic schedule, is to

  1. Have a “good enough” goal and have a “time limit”

If the code has to run under a minute and you get it to 51 seconds after 30 minutes, great! Stop – I’m sure you’ve a to-do list that is only getting bigger.

/*
    All your bases are belonging to SET!!!
*/
DROP FUNCTION [dbo].[GetChildObjects]; --Don't worry, it's in source control.
GO
CREATE FUNCTION [dbo].[GetChildObjects]
(
    @ObjectId int
)
RETURNS TABLE
AS
RETURN
(
    WITH [Stack] AS
    (
        SELECT
            [O].[id] AS [object_id],
            [O].[parent_object_id]
        FROM [dbo].[Objects] AS [O]
        WHERE [O].[id] = @ObjectId


        UNION ALL

        SELECT
            [Obj].[id],
            [Obj].[parent_object_id]
        FROM [Stack]
        INNER JOIN [dbo].[Objects] AS [Obj] ON [Stack].[object_id] = [Obj].[parent_object_id]
    )
    SELECT
        [Det].[id] AS [object_id],
        [Det].[level],
        [Det].[object_type],
        [Det].[tag],
        [Det].[description]
    FROM [Stack] AS [S]
    INNER JOIN [dbo].[Objects] AS [Det] ON [S].[object_id] = [Det].[id]

);
GO

Wait! How do you… oh yeah. Unit Tests.

EXECUTE [tSQLt].[Run] @TestName = N'GetChildObjects';
GO

Perfect, now we can check out the timings!

SELECT * FROM [dbo].[GetChildObjects](11466) OPTION (RECOMPILE, MAXRECURSION 0); -- 0 seconds.
SELECT * FROM [dbo].[GetChildObjects](11000) OPTION (RECOMPILE, MAXRECURSION 0); -- 1 second.
SELECT * FROM [dbo].[GetChildObjects](10000) OPTION (RECOMPILE, MAXRECURSION 0); -- 3 seconds.
SELECT * FROM [dbo].[GetChildObjects](9000) OPTION (RECOMPILE, MAXRECURSION 0); -- 6 seconds.
SELECT * FROM [dbo].[GetChildObjects](8000) OPTION (RECOMPILE, MAXRECURSION 0); -- 8 seconds.
SELECT * FROM [dbo].[GetChildObjects](7000) OPTION (RECOMPILE, MAXRECURSION 0); -- 13 seconds.
SELECT * FROM [dbo].[GetChildObjects](6000) OPTION (RECOMPILE, MAXRECURSION 0); -- 15 seconds.
SELECT * FROM [dbo].[GetChildObjects](5000) OPTION (RECOMPILE, MAXRECURSION 0); -- 16 seconds.
SELECT * FROM [dbo].[GetChildObjects](4000) OPTION (RECOMPILE, MAXRECURSION 0); -- 18 seconds.
SELECT * FROM [dbo].[GetChildObjects](3000) OPTION (RECOMPILE, MAXRECURSION 0); -- 21 seconds.
SELECT * FROM [dbo].[GetChildObjects](2000) OPTION (RECOMPILE, MAXRECURSION 0); -- 23 seconds.
SELECT * FROM [dbo].[GetChildObjects](1000) OPTION (RECOMPILE, MAXRECURSION 0); -- 26 seconds.
SELECT * FROM [dbo].[GetChildObjects](10) OPTION (RECOMPILE, MAXRECURSION 0); -- 31 seconds.

Taking from 0 seconds for 1 row to 31 seconds for 11,456 rows, we now have our new timings. And while I’m not a fan of MAXRECURSION 0, I don’t have a major problem with adding it alongside the existing OPTION (RECOMPILE); recursive CTEs give up after 100 levels of recursion by default, and since the hint can’t live inside the inline table-valued function itself, the calling query has to supply it.

A great side effect is that we have more information in the execution plan!

A wild index suggestion appeared

There we have it!

Thanks to our unit tests, we have more performant code to send back, with confidence that it works the same as the old code…just faster.

Now if you’ll excuse me, my 30 minutes are up and my to-do list is growing.

Talking Data Bits

Words: 160

Time to read: ~ 1 minute.

Yes, I am aware that these are normally posted directly after the podcast.

I had the absolute pleasure of talking with Kevin Hill ( Blog | Twitter | Podcast ) on the first episode of his new podcast “Data Bits”.

You can check out Kevin’s blog post announcing it here and the actual podcast here.

It was an energising experience reliving past times, talking data tales, and cringing over shared scars received from working with data.

Next, Kevin had Andy Levy ( Blog | Twitter ) as his second guest on the show. It was uplifting hearing them talk shop and realising that maybe I should get more “aggressively friendly”.

You can check out the episode with Andy here.

I’m currently catching up with the next few episodes including Ginger Grant ( Blog | Twitter ), Jules Behrens ( Twitter ), and Louis Davidson ( Blog | Twitter ).

Thankfully, I’ve been able to scrounge some more time to listen to podcasts in this lock-down. There’s always an upside 😁!

Getting into Python by Referencing PowerShell

Words: 653

Time to read: ~ 3 minutes.

Same Trap, Different Language

I’ve been trying to improve my knowledge of Python for the past while. I’ve fallen into the same trap that I fell into when I started my DBA learnings though.

When I began my DBA learning, I was a Blog Post Warrior.

I read every blog post and DBA StackExchange post I could find. I had RSS feeds and alerts all set up and firing. I still have those RSS feeds and alerts; there was nothing wrong with them, the problem was the way I was using them. Reading was the extent of my learning. It was all theory and no practice!

Me: Oh, that’s how you do a page restore. Good to know.
My Brain: Do you want to try that?
Me: No, I read how to do it so I know it now.

Me and Me

I hope I don’t have to point out how sub-optimal this is. I changed how I learned about DBA topics. I built a lab and started doing some practical work alongside the theory.

Yes, it may mean a slowdown in learning throughput, but it also means that I only need to learn a topic once. By embracing the practical side, I’ve moulded the information into my brain. I’ve also learned more when I hit errors that aren’t in the blog posts and have to fix them.

Same with Python

I’ve set up RSS feeds. I’ve signed up to newsletters. I’m checking out questions online. But I was not practising.

That has to change now.

I have a dependency on PowerShell. I like it. I find it easier to think that way. So that is where I have to change. Anytime I find myself reaching for PowerShell and I have time in my deadline, I’ll try to do Python as well.
That’s where this blog post comes into play.

The Reason

We’re adding information to our internal wiki. We use TFS (Team Foundation Server) for this, and TFS allows a variation of markdown. If you want to create a table in this markdown, then you follow this syntax:

| column_01 | column_02 | column_N |
| --- | :--- | ---: |
| column_01 value | column_02_value | column_N_value |

• :--- means left-align the value
• ---: means right-align the value

The current work involves taking the contents of a file and outputting the content with | (pipes) between them, at the start of the line, and at the end of the line.

This is quite a manual process and one that I had done more than twice.

A perfect sign to start automating!

PowerShell

PowerShell came without any real effort or thought.

$file_path = '.\Documents\Versions.txt'
Get-Content -Path $file_path |
    ForEach-Object -Process {
        $login_date, $login_count = $_ -split '\t'
        " | $login_date | $login_count | "
    }

This seems pretty intuitive to me. Get the content and on each line split out the first part before the tab as the variable login_date and the second part as login_count.
After that, it’s a case of adding those variables into a string wrapped with pipes.
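If I also wanted the header and alignment rows from the markdown syntax above (an extra flourish I’m adding here, not part of the original snippet), it only takes a couple of extra lines:

$file_path = '.\Documents\Versions.txt'
'| login_date | login_count |'   # header row
'| :--- | ---: |'                # left-align the date, right-align the count
Get-Content -Path $file_path |
    ForEach-Object -Process {
        $login_date, $login_count = $_ -split '\t'
        "| $login_date | $login_count |"
    }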

Copy and Paste

Python

Python was a bit more difficult to put together and, in the end, I’m not sure I’ve done more than a literal translation of PowerShell to Python. But, hey, it works!

# Open the tab-separated file and print each line as a markdown table row.
with open(r'C:\Users\shane.oneill\Documents\Versions.txt', 'r') as f:
    for line in f:
        # split() with no arguments splits on any whitespace, which covers the tab here.
        lg_mth, db_cnt = line.split()
        print(f"| {lg_mth} | {db_cnt} |")

Getting my Pyth-ON!

Now I can easily copy and paste this into TFS.

Hey, maybe even create a script as I’ll more than likely need to update this regularly going forward. The most important aspect I take from this is the fact that I can. I have the ability and the practical experience to try this and other examples like this again in the future.
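If I do turn it into a script, a minimal sketch might look something like this (the function name and the argument parsing are my own additions here, nothing that exists yet):

import argparse

def to_markdown_rows(path):
    """Print each tab-separated line of the file as a markdown table row."""
    with open(path, 'r') as f:
        for line in f:
            login_date, login_count = line.split()
            print(f"| {login_date} | {login_count} |")

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Wrap a tab-separated file in markdown table pipes.')
    parser.add_argument('path', help='Path to the file, e.g. Versions.txt')
    args = parser.parse_args()
    to_markdown_rows(args.path)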

Sin é

That’s it. It wasn’t so much an attempt at a revolutionary post as it was an attempt to get practical experience with Python. Also a chance to get stuck in to writing blog posts again.

Hopefully there’s more to come. Python, PowerShell, and all.

T-SQL Tuesday #123: Life hacks to make your day easier

Words: 557

Time to read: ~ 3 minutes

T-SQL Tuesday has come back around! Another month without me getting a separate blog post out. I’ll get a non-T-SQL Tuesday blog post out eventually! This month, Jess Pomfret ( twitter | blog ) asks us about our life hacks.

The Humble Tomato

There is not enough time in the day. Until somebody figures out how to squish a few more hours into a day, that’s something that we’re going to have to accept.

In an effort to have some sort of time management, I try the Pomodoro technique.

It can be summarised by the points in this Wikipedia article:

• Decide on the task to be done.
• Set the pomodoro timer (traditionally to 25 minutes).
• Work on the task.
• End work when the timer rings and put a checkmark on a piece of paper.
• If you have fewer than four checkmarks, take a short break (3–5 minutes), then go to step 2.
• After four pomodoros, take a longer break (15–30 minutes), reset your checkmark count to zero, then go to step 1

Wikipedia

Two Problems

I have 2 problems with the above bullet points:

  1. It’s a very manual process, and
  2. The first one says I have to decide on the task to be done first.

In the immortal words of Homer Simpson:

Can’t someone else do it!

Homer Simpson

PowerShell

So, in an effort to resolve my problems, I created a PowerShell script.

You can find it here on my GitHub.

This script implements the Pomodoro technique for me, with blocks broken down into ~20/25 minutes of work and 5-minute breaks for reading blogs.

Knowing that there are so many topics to learn and stay on top of, every third work block is scheduled for learning.

After every 5 blocks, I get a longer break to grab a coffee, go for a walk, or interact with the Data/Automation family.

I even get a Toast notification when it’s time for me to change actions!
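The script itself lives on GitHub, but to give a flavour of the idea, here’s a minimal sketch of the loop (it assumes the BurntToast module for the notifications; the real script does more, like the learning blocks and the Mark column):

# A minimal sketch only - the real script is on GitHub and does more than this.
# Assumes the BurntToast module is installed for the toast notifications.
$WorkMinutes  = 25
$BreakMinutes = 5

foreach ($Block in 1..4) {
    New-BurntToastNotification -Text 'Work', "Block $Block - focus for $WorkMinutes minutes"
    Start-Sleep -Seconds ($WorkMinutes * 60)

    New-BurntToastNotification -Text 'Break', "Read some blogs for $BreakMinutes minutes"
    Start-Sleep -Seconds ($BreakMinutes * 60)
}

New-BurntToastNotification -Text 'Long break', 'Grab a coffee or go for a walk'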

Here’s how it’s worked out for me today.

FYI – I missed starting this script when I came in, thanks to back-to-back meetings from 08:00!

This solves the problems that I had with the Pomodoro technique for me.

  1. It’s no longer manual; it gives me an alert for each time block, and
  2. I no longer have to decide on what to do next.

The most I have to decide on is the -Random switch, which effectively means: do I want to try to get through all the items in my learning list in order, or just pick a random one each time?

Seeing as my learning list is currently…

[String[]]$Item = 'JavaScript','Research','PSKoans','Blog','Python','PowerShell','C#','dbachecks','DBAFundamentals','Entity Framework','Containers','R'

…with undoubtedly more coming, I should really make it -Random by default!

In case you’re wondering what the “Mark” column is for, I’m trying to improve the current documentation of the system. So every Work action with an entry in the Mark column means get documenting!

Caveats

This is not a law. 

If I’m working on something and I enter focus mode, I’ll ignore the suggested blocks and keep working. The same can be said for deadlines; I’ll take the 5-minute breaks and then get back to the task.

Also, I can’t think of anyone who’s at their desk all day (I’ve been double-booked for meetings today alone!). But it’s a nice way of coming back, glancing at the PowerShell console, and knowing I’ll be making the most of my time.

T-SQL Tuesday #122 – Imposter Syndrome

Imposter syndrome.

If there was ever a phrase that brings about the dreaded Writer’s Block it would be Imposter Syndrome. I’m going to fall back to what I know I can talk about while the back part of my brain ruminates on what to write.

T-SQL Tuesday

T-SQL Tuesday is the brainchild of Adam Machanic ( Twitter | Blog ). It is a monthly blogging call by the Data community. Each month a topic for the community is offered up and a blog post is written by all wishing to take part.

This month we have John Shaulis ( Twitter | Blog ) as the host and John has given the topic of Imposter Syndrome.

I want to read your stories about when you’ve experienced, seen, or overcome imposter syndrome!

John Shaulis

You can find the invitation to this month’s topic at John’s blog post here.

Imposter Syndrome

I’ve gone through 2 cups of coffee staring at a blank page while waiting for my brain to dredge up some relatable Imposter Syndrome moment. It’s taken this long for me to realise that it’s a futile exercise. Imposter Syndrome is, by the definition that John has kindly given us, a personal experience. 

One’s success […] result of one’s […] efforts or skills.

Imposter Syndrome Definition

So, with a third cup of coffee in hand, and a deep refusal to believe that I have a caffeine dependency, allow me to tell you about the times that I feel the Imposter Syndrome the most.

Unsurprisingly

Normal working days are not the days that I feel Imposter Syndrome.

Fielding questions at work doesn’t bother me.

There’s a sense of investigation, an aim to teach, and a feeling of joy that people are taking an active interest in SQL Server.

Investigating issues and researching requests don’t trigger it either.

There are set goals here; either fix the issue or can the request be done? Plus, there is an amazing community out there that can be called upon on Twitter, Slack, etc. for answers or guidance.

Unsurprisingly the days when I am the sole person interested in SQL Server are not the days when I feel Imposter Syndrome.

Surrounded but Isolated

Imposter Syndrome hits like a brick when I’m not alone. Conferences, User Groups, and meetups are the main culprits.

Probably stemming from being a Junior DBA going to my first User Group and realising the extent of the topics I didn’t even know that I didn’t know!

Trying to learn the basics of administrating databases and going to a meeting where people are arguing about single-thread CPU performance and NUMA nodes…

Talking to people who are dealing with columnstore partitioning problems and massive memory grants because they’re bulk-loading 3 trillion rows at once.

All the while I’m there thinking my main problem was telling developers “set-based = good, row-by-row = bad”.

The same is true for conferences. 

Can’t Stop, Won’t Stop

Please don’t get me wrong. I love going to conferences, I do! 

There are few other places where you can get curated knowledge to that extent! Where you can talk to people who get it when you say “Can you believe that they wanted sysadmin privileges just to take backups!?!”.

I love seeing what people are doing, what new technologies are coming up, and catching up with people who experience the same problems that I do.

And there’s no way I’m going to stop, the “fors” don’t just outweigh the “againsts”, they completely smother them!

But it fires off my Imposter Syndrome like nothing else does!

T-SQL Tuesday #121: Gifts received for this year

Words: 401

Time to read: ~ 2 minutes.

T-SQL Tuesday, the monthly blogging party created by Adam Machanic, is still going strong. This month’s T-SQL Tuesday is hosted by Malathi Mahadevan ( blog | twitter ) who asks us…

“[…] what are a few things would you consider as gifts, and why?”

Invitation Post

There are many items that I consider gifts. More than I can reasonably write down in a blog post. So, in the interest of brevity, I’m going to break it down into gifts of SQL past, the gifts of SQL present, and the gifts of SQL future.

Gifts of SQL Past

I have been gifted with being mentored by DBAs that were knowledgeable and passionate about the Data Community.

I have been gifted with mentors who encouraged me to go to User Groups, volunteer at conferences, and soak up as much data learning as I could.

I have been gifted with mistakes that I have made in the past that have allowed me to learn from them. (I have definitely been gifted with the knowledge of how important test labs are!)

Gifts of SQL Present

I am gifted with a job that seems to actively repel me from any sort of comfort zone.

I am gifted with an active Data Community on Twitter, Slack, and in person for any issues that I run into.

I am gifted with a PowerShell community on Discord and Slack who seem to be constantly battling for the right to call themselves the “most welcoming community”.

I am gifted with a shrinking of borders so that anyone anywhere can be considered a helpful resource to a data question.

Gifts of SQL Future

I’m being gifted with new technologies to help out with old and new problems.

I’m being gifted with a field that is ever-changing. Fluid, dynamic, and will never become stagnant nor boring.

I’m being gifted with putting names to welcome faces and meeting up with new Data Community friends.

Mostly though, I’m being gifted with the opportunity to help others out in the future. Whether that be with blog posts, tweets, talks, examples, or just encouragement.

As good as all the previous gifts are, I believe the best gift is the gifts that you can give back. So, here’s hoping that the next year sees us all help more than hinder, learn more than laze, and teach more than troll.

Those are what I consider gifts.

T-SQL Tuesday #120 – What were you thinking?

Words: 712
Time to read: ~ 3.5 minutes

10 Years

T-SQL Tuesday is upon us once more. T-SQL Tuesday number 120 means something else as well. 120 monthly posts equals 10 years that Adam Machanic’s ( twitter | blog ) blog post party has been going on.

Wayne Sheffield ( twitter | blog ) is hosting this month’s event. Wayne asks us something that I’m sure we’ve all thought at some stage.

What were you thinking?

https://blog.waynesheffield.com/wayne/archive/2019/11/t-sql-tuesday-120-what-were-you-thinking/

In the beginning…

The first exploration of a system leaves a lasting impact. When you first get a chance to delve into the database, you capture a snapshot of what the coding standards are like. You glean the past experiences of the developers.

I’m looking for instances of NOLOCK if I’m being honest.

…there are impressions.

This impression was a What were you thinking? experience.

  1. DEADLOCK_PRIORITY LOW on most procedures.
  2. A lot of hierarchical data types.
  3. VARCHAR(MAX) on most columns.
  4. Variables at the start of procedures used in equality WHERE clauses. e.g. DECLARE @Success int; Set @Success = 4; ... WHERE StatusId = @Success.
  5. Functions that return a single, deterministic value.
  6. Multi-statement Table-Valued Functions with WHILE statements.
  7. A plethora of indexes on the tables, all single-column indexes.

I’ve said enough.

If you had seen my face at that moment, you would have laughed. Imagine me staring, horrified, eyes darting around the screen mouthing What the…

A little thinking saves a lot of shouting

Granted it took getting a coffee and staring in disbelief at the code before I recovered. It took getting another coffee after the first before I could rationalise what I was seeing.

I took what I knew, which was that these developers were smart. I tried to match that with what I was seeing. And there was an answer.

Theoretical, not Physical

The codebase read like it had been written by developers who were not used to interacting with a database. Developers who thought of the database as a “place to shove data” and that’s all.

It was clear they had tried to follow the DRY (Don’t Repeat Yourself) approach (#4, #5).

They had read the documentation on hierarchical data types, with Microsoft saying…

The built-in hierarchyid data type makes it easier to store and query hierarchical data

https://docs.microsoft.com/en-us/sql/relational-databases/hierarchical-data-sql-server?view=sql-server-ver15

…instead of parent/child relationship tables. (#2)

They had tried to translate the .Net data type [string] into the database. Deciding that varchar(max) was its closest match. (#3)

They had tried to query the data in a row-by-row approach, instead of a set-based method (#6).

And, they had tried to deal with the consequences of these and other decisions. (#1, #7)

Understanding, not blame

It’s hard to stay annoyed at people when you can understand their motives. Understanding their mindset is the most effective deterrent to anger I can think of. There’s no blame, only understanding. You want to help them improve. And that’s where this on-going process is now.

To move away from multi-statement table-valued functions with WHILE statements: here are inline table-valued functions with a recursive CTE (Common Table Expression) instead.

To use variables when you have to, but be aware of the change in statistics that they bring.

The difference it can make to a query and a database when the data types are apt. How memory grants, logical page reads, and more are affected by blobs.

How DEADLOCK_PRIORITY LOW is not an option if every procedure has it! How indexes can be made up of more than a single column. That there is such a thing as an INCLUDE!
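To make a couple of those points concrete (the table and column names below are invented for illustration; they are not from the codebase in question), the variable pattern from #4 can at least be helped along with a recompile hint so the optimizer sees the actual value instead of a density-vector guess:

DECLARE @Success int = 4;

SELECT [T].[Id], [T].[StatusId]
FROM [dbo].[Tickets] AS [T]   -- hypothetical table
WHERE [T].[StatusId] = @Success
OPTION (RECOMPILE);           -- lets the optimizer sniff the variable's value

And an index can cover more than one column, with INCLUDE carrying the extra columns a query needs:

CREATE NONCLUSTERED INDEX [IX dbo.Tickets (StatusId, CreatedDate)]
    ON [dbo].[Tickets] ([StatusId], [CreatedDate])
    INCLUDE ([AssignedTo]);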

Seeing now that the driving force they feel is to create features, while the pain they feel is database performance, I can grok their choices and actions at the time.

Still, it didn’t stop me going What were you thinking? at the outset.

I’m no better

I’m trying to learn different languages and frameworks at the moment. If someone more knowledgeable was to come along and see my interactions with Linux. If they were to critique my Python files. Or attempt to suppress a groan at my PromQL. I’d appreciate an air of understanding, not blame at that time.

So well done to the people who dived in and attempted the work even if they didn’t know how at the time. To paraphrase: those whose faces are marred by dust and sweat and blood deserve the credit.

But don’t think I didn’t see those SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED statements that you’re using as NOLOCKs!

Retrospective: Speaking at DataScotland

Words: 315

Time to read: ~ 1.5 minutes

Data Scotland

On the 13th of September 2019, I spoke at DataScotland; my first time talking at a data conference.

My quasi-clickbait title was Feel Validated with dbachecks. If you guessed that I was talking about dbachecks then you’re right.

This is a brief retrospective of that time. Thinking back on that time still makes me relive the emotions that I felt. Nervousness, excitement, and panic.

Good times!

Appreciation

This was my first time speaking at a conference as well as my first time attending Data Scotland.

I recommend that you check it out. It’s an amazing conference created by passionate people and staffed by dedicated volunteers.

The Good

A better way to put this would be: what didn’t count as the good!

  • Amazing fellow speakers.
  • Getting to meet other first-time speakers.
  • Talking with the volunteers.
  • Speaking with attendees.
  • Seeing people who I hadn’t seen in a long time.

Thank you to Craig Porteous, Louise Paterson, Paul Broadwith, and Robert French for all your work and encouragement.

Thank you as well to Brent Miller, Andrew Pruski, David Alcock, and John McCormack for help with the presentation.

The Bad

Feeling drained.

I don’t put this down to DataScotland though.
You may not have heard from me for the last month. I felt drained and took time off from public exposure.

  • A full year of constant work on the day job and personal projects.
  • 2 conferences a month on average for the last year.
  • Spreading myself out over projects about SQL, PowerShell, Containers, Python, AWS, and Azure without rest.

It’s not something I could sustain without factoring in time for sharpening the axe.

Overall

I’m easing myself back into things again, with the caveat of planning ahead and making sure I don’t overwhelm myself.
First thing on my list: planning for DataScotland next year.

Whether it’s speaking, volunteering, or attending, I’ll be there.

T-SQL Tuesday #118 – Your fantasy SQL feature

Words: 865

Time to read: ~ 5 minutes

T-SQL Tuesday Time

Welcome back to another installment of T-SQL Tuesday, the monthly blog post call. This month we have Kevin Chant ( twitter ) who has asked us for…

[…] a post about a fantasy SQL Server feature you’ve got in mind.

Kevin Chant

It’s hard for me to believe that my last T-SQL Tuesday post was back in May 2019 but, when I look back over the list of my blog posts, that’s the last one.

I can only put it down to “what I want to do” being out of sync with “what I can do with the time I have”.

So, with that major gap in T-SQL Tuesday posts in place, I’d like to start writing these again.

Beginning with this one, and an apology.

An Apology

I’m starting with an apology for this post because, no matter how I phrase this in my head, I cannot make it seem like I am not complaining.

So I ask that you forgive me if this post comes across as me whining about the level of effort that is currently involved with this.

Fantasy SQL Server feature

My fantasy SQL Server feature is…

  • A performance rating.

I’m not talking about TPC ratings, nor am I talking about Sentry One’s Health Score (although I’ll admit that’s pretty close), nor Brent Ozar’s ( twitter ) sp_BlitzFirst.

What I would like is a performance rating, an X out of 100, a Low / Medium / High, a sub-par / on-par / above-par description of how your SQL Server is doing.

Why this?

I’m not whinging about this due to a misguided desire to compare my instances against others. Believe me, I know the state of my instances is not up there.

Nor is it a case of wanting to show that my instances are “in the top 10 in Ireland / Europe / the world”. Believe me, I realised a long time ago that, while I enjoy what I do, I do not want to make the sacrifices needed to get to that level.

DevOps is the union of people, process, and products to enable continuous delivery of value to our end users.

Donovan Brown

We are trying to take major steps with DevOps in our company. To be more transparent, to reduce silos, and to share knowledge so we can get releases out to customers faster. So we can get value to our customers out there faster.

So when a Pull Request (PR) gets sent to me and I respond with concerns, suggestions, and pull some data from our instances to show as an example, I’m really not expecting this response.

Thanks for this but we’re not quite sure what you mean. Could you give us a number please? Like, our SQL Server is doing an x out of 100?

Response

It wasn’t until I was asked this, and looked into how you could go about achieving it, that I realised how difficult it is.

First of all, are you talking query performance or SQL Server health?

If it’s the first, how are you going to measure that? Duration? CPU? IO? Sure Query Store would be a great help…

If it’s the latter, sure, include RPO and RTO. But how do you measure HA and DR? Do deadlocks come into play here, or query performance?

Is a failed statistics job going to affect the rating for SQL Server health? Because I know that it’s going to have an effect on query performance!

Fantasy Feature

So that’s my fantasy feature.

I want a performance rating built into SQL Server. One that you can measure against your own servers, or against telemetry gathered from other servers.

Break it down however you wish.

  • Your Rating per Query Duration is way up but your Rating per Memory is down.
  • Your Rating per Deadlock has become nearly nonexistent but your Rating per Dirty / Phantom Reads … I got some bad news there…
  • Your Batch Transactions Rating has gone up since the last version push, but that’s because you stopped using CURSORS and WHILE loops. Go you, we were thinking it was about time!

I don’t have an exact definition

I don’t know if I’d want this as a single rating. SQL Server is more than the sum of its parts.

I don’t know if I’d want this as multiple ratings summed up since I don’t know how you’d weight them. Different companies have different concerns.

I also know that we have tools for this

We have Query Store, we have AGs, we have Performance counters, we have sp_Blitz%, we have workload tools, we have git, and TFS, and Azure DevOps, and AWS CloudFormation, and docker containers “kubeterised” into a CI pipeline.

I’m fully aware that we have nearly everything at our disposal to make this happen. All we need is time, a plan, and the ability to progressively see this through.

Like I said at the start, I apologise if this comes across as me whining.

But that’s not what this T-SQL Tuesday asked. It asked for your Fantasy Feature.

Well my name is Shane O’Neill and right now, I want to know that my SQL Server instance is doing X out of 100.

You tell me that and I’ll work on improving it.

Splitting Functions from Scripts in bulk

Time to read: 2.5 minutes

Words: 504

Previously on…

I’ve talked before about a couple of topics that this blog post pertains to

That is the relevant information so you’re up to speed on where I am.

Bring on the stupid

The stupid thing that I was doing was that I was manually, visually scanning the script, copying out the function definitions, and pasting them into their own function files.

This was long, this was tedious, and this was not an efficient use of my time.

Especially since the scripts were not laid out as logically as I would have liked.

Personally if I were to have nested functions in a script, I would have them towards the beginning of the file. Together, maybe in a little region that I’ve called “functions”.

Actually, if I have to have a “functions” region, then I have too many functions and I’m going to split them out anyway.

The scripts I was looking at were not laid out this way.

Sure, there was what appeared to be a function region, but there were also functions further down the script, created just before they were needed.

Hence, manually scanning the whole script, taking a note and a copy of each function before moving on again.

Long, tedious, wasteful.

There is a way!

Like I mentioned at the start, in the “pertinent” region, Chris Dent has a function that we have availed of before and that we can use here.

Let’s take a look at what it gives us…

First of all, we get a list of the build scripts.

Get-ChildItem -Path .\Git\build-scripts\ -Filter *.ps1

So we now have a list of the scripts. Each one of these scripts may, or may not, have one or many functions defined within them.

How are we going to get these?

We pipe this list to our Get-FunctionInfo function.

Get-ChildItem -Path .\Git\build-scripts\ -Filter *.ps1 |
    Get-FunctionInfo -ErrorAction SilentlyContinue -IncludeNested

Perfect! Now to automate the final part of manual process. Can we grab the definition of these functions and split them out to a separate file per function?

First question is can we grab the function definitions?

Get-ChildItem -Path .\Git\build-scripts\ -Filter *.ps1 |
    Get-FunctionInfo -ErrorAction SilentlyContinue -IncludeNested |
    ForEach-Object {
        $_.Scriptblock.Ast.Parent.Extent.Text
    }

I’m going to ignore that GetCurrentDateFormat function

Final bit

Now that we know that we can grab the function definition, it’s a quick step to out the contents into a file.

Get-ChildItem -Path .\Git\build-scripts\ -Filter *.ps1 |
    Get-FunctionInfo -ErrorAction SilentlyContinue -IncludeNested |
    ForEach-Object {
        $_.Scriptblock.Ast.Parent.Extent.Text |
            Out-File -FilePath ".\Git\build-scripts\build\$($_.Name).ps1"
    }

And just to double check…

Lovely!

All the functions are split off into their own .ps1 files where they can be reviewed, have tests created for them, and/or be improved.

It’s nice to push the bottleneck down the pipeline. Now I’m wondering if there’s a way we can bulk introduce Pester tests…
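If I do go down that road, a rough first pass (a sketch only; it assumes a tests folder next to the build folder above, and the stub does nothing except mark the gap) might be to scaffold a placeholder test file per function:

Get-ChildItem -Path .\Git\build-scripts\build\ -Filter *.ps1 |
    ForEach-Object {
        $testPath = ".\Git\build-scripts\tests\$($_.BaseName).Tests.ps1"
        if (-not (Test-Path -Path $testPath)) {
            # Build the stub content; the here-string closing marker has to sit at the start of a line.
            $stub = @"
Describe '$($_.BaseName)' {
    It 'has real tests written for it' -Skip {
        # TODO: replace this placeholder with actual tests.
    }
}
"@
            Set-Content -Path $testPath -Value $stub
        }
    }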