T-SQL Tuesday #125 – Unit Testing Databases.

Words: 1344

Time to read: ~ 7 minutes

Welcome to the April edition of T-SQL Tuesday. This month we have Hamish Watson ( Blog | Twitter ) asking us about how we unit test our databases.

Performance Tuning

At one stage or another, I’m sure that we’ve all been asked to performance tune database code. Sometimes this tuning can involve changing the code. How do we know that the changes that we are introducing isn’t going to introduce bugs? You got it – Unit tests!

Set up

Let’s set up a contrived example – Contrived of course coming from the words “con” as in fake and “Tried V” as in I tried 5 times before I just gave up and used what I had.

USE [tempdb];
GO

/*
    Let's build this thang...
*/
CREATE TABLE [dbo].[Objects]
(
    [id] int NOT NULL CONSTRAINT [PK dbo.Objects(id)] PRIMARY KEY,
    [level] int NOT NULL,
    [object_type] int NOT NULL,
    [tag] nvarchar(256) NOT NULL,
    [description] nvarchar(max) NOT NULL,
    [parent_object_id] int NULL
);
GO

INSERT INTO [dbo].[Objects]
(
    [id],
    [level],
    [object_type],
    [tag],
    [description],
    [parent_object_id]
)
SELECT
    ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS [id],
    NTILE(10000) OVER (ORDER BY [S01].[alias]) AS [level],
    [M01].[severity] AS [object_type],
    [S01].[alias] AS [tag],
    [M01].[text] AS [description],
    NULLIF(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) -1, 0) AS [parent_object_id]
FROM [sys].[messages] AS [M01]
INNER JOIN [sys].[syslanguages] AS [S01] ON [M01].[language_id] = [S01].[lcid]
WHERE [S01].[alias] = N'English';
GO

SELECT *
FROM [dbo].[Objects];
GO

This will insert 11,466 records into the dbo.Objects table. The table has a column called parent_object_id that points to its … parent object id, wow you have been paying attention! Luckily, or unluckily, tuning the schema of the table is out of scope for this post

Next we have the piece of code that we’ve been asked to performance tune. It’s …uhm … well it works.

/*
    RBAR SQL
*/
CREATE FUNCTION [dbo].[GetChildObjects]
(
    @ObjectId int
)
RETURNS @ChildObjects TABLE
(
    [object_id] int NOT NULL,
    [level] int NOT NULL,
    [object_type] int NOT NULL,
    [tag] nvarchar(256) NOT NULL,
    [description] nvarchar(max) NOT NULL
)
AS
BEGIN
    --
    -- Insert first object.
    --
    INSERT INTO @ChildObjects
    SELECT
        id,
        level,
        object_type,
        tag,
        description
    FROM dbo.Objects
    WHERE id = @ObjectId;

    --
    -- Stack
    --
    DECLARE @Stack TABLE
    (
        ObjectId int NOT NULL,
        lvl int NOT NULL
    );

    --
    -- Initialise level for stack
    --
    DECLARE @lvl int = 1,
            @CurrentObjectId INT = NULL;

    --
    -- Insert current object
    --
    INSERT INTO @Stack
    (
        ObjectId,
        lvl
    )
    VALUES
    (@ObjectId, @lvl);

    --
    -- Loop through stack and get child objects
    --
    WHILE (@lvl > 0)
    BEGIN
        IF EXISTS (SELECT * FROM @Stack WHERE lvl = @lvl)
        BEGIN

            --
            -- Get object
            --
            SELECT @CurrentObjectId = ObjectId
            FROM @Stack
            WHERE lvl = @lvl;

            --
            -- Insert child objects
            --
            INSERT INTO @ChildObjects
            SELECT 
                id,
                level,
                object_type,
                tag,
                description
            FROM dbo.Objects
            WHERE parent_object_id = @CurrentObjectId;

            --
            -- Delete object from stack
            --
            DELETE FROM @Stack
            WHERE lvl = @lvl
            AND ObjectId = @CurrentObjectId;

            --
            -- Insert the childnodes of the current node into the stack
            --
            INSERT INTO @Stack
            (
                ObjectId,
                lvl
            )
            SELECT
                id,
                @lvl + 1
            FROM dbo.Objects
            WHERE parent_object_id = @CurrentObjectId;

            IF (@@ROWCOUNT > 0) -- If the previous statement added one or more nodes, go down for its first child.
                SET @lvl = @lvl + 1; -- If no nodes are added, check its sibling nodes
        END;
        ELSE
            SET @lvl = @lvl - 1; -- Back to the level immediately above
        END;
    RETURN;
END;
GO

Timing is important

Let’s get some times for this function. There’s nothing fancy here, we’re just going to call it for different values, returning more and more results and see how long it takes to finish.

/*
    Simple tezt...
*/
SELECT * FROM [dbo].[GetChildObjects](11466) OPTION (RECOMPILE); -- 0 seconds.
SELECT * FROM [dbo].[GetChildObjects](11000) OPTION (RECOMPILE); -- 3 seconds.
SELECT * FROM [dbo].[GetChildObjects](10000) OPTION (RECOMPILE); -- 10 seconds.
SELECT * FROM [dbo].[GetChildObjects](9000) OPTION (RECOMPILE); -- 17 seconds.
SELECT * FROM [dbo].[GetChildObjects](8000) OPTION (RECOMPILE); -- 23 seconds.
SELECT * FROM [dbo].[GetChildObjects](7000) OPTION (RECOMPILE); -- 29 seconds.
SELECT * FROM [dbo].[GetChildObjects](6000) OPTION (RECOMPILE); -- 36 seconds.
SELECT * FROM [dbo].[GetChildObjects](5000) OPTION (RECOMPILE); -- 41 seconds.
SELECT * FROM [dbo].[GetChildObjects](4000) OPTION (RECOMPILE); -- 48 seconds.
SELECT * FROM [dbo].[GetChildObjects](3000) OPTION (RECOMPILE); -- 54 seconds.
SELECT * FROM [dbo].[GetChildObjects](2000) OPTION (RECOMPILE); -- 61 seconds.
SELECT * FROM [dbo].[GetChildObjects](1000) OPTION (RECOMPILE); -- 68 seconds.
SELECT * FROM [dbo].[GetChildObjects](10) OPTION (RECOMPILE); -- 76 seconds.

Taking from 0 seconds for 1 row to 76 seconds for 11,456 rows, we now our timings.

As you can see there has been some performance tuning efforts by the code creators already. So impressed was I by the use of OPTION (RECOMPILE) that I asked why they asked me to take a look.

The execution plan seems fine; doesn’t look like there’s anything we can do with it…

Well, let’s take a look at the execution plan, shall we?

Ah!

No surprises

Listen, we both know that this needs to be rewritten. Here is the crux of the post – how do we know that our re-write isn’t going to introduce unexpected side-effects? Little happy “unspecified-features“? You got it – Unit tests!

Creating Unit Tests

To keep this brief, I’ve installed tSQLt on my tempdb database. The documentation for installing it is very easy to follow on the tSQLt website so head there for instructions. We’re going to create some tests for this function.


/*
    tSQLt time!
*/
EXECUTE [tSQLt].[NewTestClass] @ClassName = N'GetChildObjects';
GO

CREATE PROCEDURE [GetChildObjects].[test returns nothing for nothing]
AS
BEGIN
    -- Arrange (new "fake" table).
    EXECUTE [tSQLt].[FakeTable] @TableName = N'[dbo].[Objects]';

    -- Act
    SELECT
        *
    INTO #Actual
    FROM [dbo].[GetChildObjects](1);

    -- Assert (empty is empty).
    EXECUTE [tSQLt].[AssertEmptyTable] @TableName = N'#Actual'
END;
GO

CREATE PROCEDURE [GetChildObjects].[test returns 1 row if no child objects]
AS
BEGIN
    -- Arrange (new "fake" table).
    EXECUTE [tSQLt].[FakeTable] @TableName = N'[dbo].[Objects]';

    INSERT INTO [dbo].[Objects]
    (
        [id],
        [level],
        [object_type],
        [tag],
        [description],
        [parent_object_id]
    )
    VALUES
    (
        1,
        1,
        1,
        N'Don''t care',
        N'Absolutely don''t care',
        NULL
    );

    -- Act
    SELECT
        *
    INTO #Actual
    FROM [dbo].[GetChildObjects](1);

    SELECT
        1 AS [object_id],
        1 AS [level],
        1 AS [object_type],
        N'Don''t care' AS [tag],
        N'Absolutely don''t care' AS [description]
    INTO #Expected
    
    EXECUTE [tSQLt].[AssertEqualsTable] @Expected = N'#Expected', @Actual = N'#Actual';
END;
GO


CREATE PROCEDURE [GetChildObjects].[test returns multiple rows...say 4]
AS
BEGIN
    -- Arrange (new "fake" table).
    EXECUTE [tSQLt].[FakeTable] @TableName = N'[dbo].[Objects]';

    INSERT INTO [dbo].[Objects]
    (
        [id],
        [level],
        [object_type],
        [tag],
        [description],
        [parent_object_id]
    )
    SELECT
        [V].[number],
        [V].[number],
        [V].[number],
        N'Don''t care '+ CONVERT(varchar(2), [V].[number]),
        N'Absolutely don''t care '+ CONVERT(varchar(2), [V].[number]),
        NULLIF([V].[number] -1, 0)
    FROM [master].[dbo].[spt_values] AS [V]
    WHERE [V].[type] = 'P'
      AND [V].[number] > 0
      AND [V].[number] <= 10
    ORDER BY [V].[number]

    -- Act
    SELECT
        *
    INTO #Actual
    FROM [dbo].[GetChildObjects](7);

    SELECT
        [id] AS [object_id],
        [level],
        [object_type],
        [tag],
        [description]
    INTO #Expected
    FROM [dbo].[Objects]
    WHERE [id] >= 7
    ORDER BY [id];
    
    EXECUTE [tSQLt].[AssertEqualsTable] @Expected = N'#Expected', @Actual = N'#Actual';
END;
GO

Now we have tests to see what happens when the table is empty, when only 1 row should be returned, and when multiple rows are supposed to be returned. Hey! 4 rows still counts as multiple!

Running these tests couldn’t be simpler:

EXECUTE [tSQLt].[Run] @TestName = N'GetChildObjects';
GO

Re-Write time

The best piece of advice I heard for performance tuning, especially with a hectic schedule, is to

  1. Have a “good enough” goal and have a “time limit”

If the code has to run under a minute and you get it to 51 seconds after 30 minutes, great! Stop – I’m sure you’ve a to-do list that is only getting bigger.

/*
    All your bases are belonging to SET!!!
*/
DROP FUNCTION [dbo].[GetChildObjects]; --Don't worry, it's in source control.
GO
CREATE FUNCTION [dbo].[GetChildObjects]
(
    @ObjectId int
)
RETURNS TABLE
AS
RETURN
(
    WITH [Stack] AS
    (
        SELECT
            [O].[id] AS [object_id],
            [O].[parent_object_id]
        FROM [dbo].[Objects] AS [O]
        WHERE [O].[id] = @ObjectId


        UNION ALL

        SELECT
            [Obj].[id],
            [Obj].[parent_object_id]
        FROM [Stack]
        INNER JOIN [dbo].[Objects] AS [Obj] ON [Stack].[object_id] = [Obj].[parent_object_id]
    )
    SELECT
        [Det].[id] AS [object_id],
        [Det].[level],
        [Det].[object_type],
        [Det].[tag],
        [Det].[description]
    FROM [Stack] AS [S]
    INNER JOIN [dbo].[Objects] AS [Det] ON [S].[object_id] = [Det].[id]

);
GO

Wait! How do you… oh yeah. Unit Tests.

EXECUTE [tSQLt].[Run] @TestName = N'GetChildObjects';
GO

Perfect, now we can check out the timings!

SELECT * FROM [dbo].[GetChildObjects](11466) OPTION (RECOMPILE, MAXRECURSION 0); -- 0 seconds.
SELECT * FROM [dbo].[GetChildObjects](11000) OPTION (RECOMPILE, MAXRECURSION 0); -- 1 second.
SELECT * FROM [dbo].[GetChildObjects](10000) OPTION (RECOMPILE, MAXRECURSION 0); -- 3 seconds.
SELECT * FROM [dbo].[GetChildObjects](9000) OPTION (RECOMPILE, MAXRECURSION 0); -- 6 seconds.
SELECT * FROM [dbo].[GetChildObjects](8000) OPTION (RECOMPILE, MAXRECURSION 0); -- 8 seconds.
SELECT * FROM [dbo].[GetChildObjects](7000) OPTION (RECOMPILE, MAXRECURSION 0); -- 13 seconds.
SELECT * FROM [dbo].[GetChildObjects](6000) OPTION (RECOMPILE, MAXRECURSION 0); -- 15 seconds.
SELECT * FROM [dbo].[GetChildObjects](5000) OPTION (RECOMPILE, MAXRECURSION 0); -- 16 seconds.
SELECT * FROM [dbo].[GetChildObjects](4000) OPTION (RECOMPILE, MAXRECURSION 0); -- 18 seconds.
SELECT * FROM [dbo].[GetChildObjects](3000) OPTION (RECOMPILE, MAXRECURSION 0); -- 21 seconds.
SELECT * FROM [dbo].[GetChildObjects](2000) OPTION (RECOMPILE, MAXRECURSION 0); -- 23 seconds.
SELECT * FROM [dbo].[GetChildObjects](1000) OPTION (RECOMPILE, MAXRECURSION 0); -- 26 seconds.
SELECT * FROM [dbo].[GetChildObjects](10) OPTION (RECOMPILE, MAXRECURSION 0); -- 31 seconds.

Taking from 0 seconds for 1 row to 31 seconds for 11,456 rows, we now our new timings. And while I’m not a fan of the MAXRECURSION 0 I don’t have a major problem with adding it to an existing RECOMPILE statement.

A great side effect is that we have more information in the execution plan!

A wild index suggestion appeared

There we have it!

Thanks to our unit tests we have a more performant code to send back with confidence that it works the same as the old code…just faster.

Now if you’ll excuse me, my 30 minutes are up and my to-do list is growing.

T-SQL Tuesday #120 – What were you thinking?

Words: 712
Time to read: ~ 3.5 minutes

10 Years

T-SQL Tuesday is upon us once more. T-SQL Tuesday number 120 means something else as well. 120 monthly posts equals 10 years that Adam Machanic’s ( twitter | blog ) blog post party has been going on.

Wayne Sheffield ( twitter | blog ) is hosting this month’s event. Wayne asks us something that I’m sure we’ve all thought at some stage.

What were you thinking?

https://blog.waynesheffield.com/wayne/archive/2019/11/t-sql-tuesday-120-what-were-you-thinking/

In the beginning…

The first exploration of a system leaves a lasting impact. When you first get a chance to delve into the database, you capture a shot of what the coding standards are like. You gleam the past experiences of the developers.

I’m looking for instances of NOLOCK if I’m being honest.

…there are impressions.

This impression was a What were you thinking? experience.

  1. DEADLOCK_PRIORITY LOW on most procedures.
  2. A lot of hierarchial data types.
  3. VARCHAR(MAX) on most columns
  4. Variables at the start of procedures used in equality WHERE clauses. e.g. DECLARE @Success int; Set @Success = 4; ... WHERE StatusId = @Success.
  5. Functions that return a single, deterministic value.
  6. Multi-statement Table-Valued Functions with WHILE statements.
  7. A plethora of indexes on the tables, all single-column indexes.

I’ve said enough.

If you had seen my face at that moment, you would have laughed. Imagine me staring, horrified, eyes darting around the screen mouthing What the…

A little thinking saves a lot of shouting

Granted it took getting a coffee and staring in disbelief at the code before I recovered. It took getting another coffee after the first before I could rationalise what I was seeing.

I took what I knew, which was these developers were smart. I tried to match that with what I was seeing. And there was an answer.

Theoretical, not Physical

The codebase read like developers who were not used to interacting with a database. Developers who thought of the database as a “place to shove data” and that’s all.

It was clear they had tried to follow the DRY (Don’t Repeat Yourself) approach (#4, #5).

They had read the documentation on hierarchical data types and Microsoft’s saying:…

The built-in hierarchyid data type makes it easier to store and query hierarchical data

https://docs.microsoft.com/en-us/sql/relational-databases/hierarchical-data-sql-server?view=sql-server-ver15

…instead of a parent/child relationship tables. (#2)

They had tried to translate the .Net data type [string] into the database. Deciding that varchar(max) was its closest match. (#3)

They had tried to query the data in a row-by-row approach, instead of a set-based method (#6).

And, they had tried to deal with the consequences of these and other decisions. (#1, #7)

Understanding, not blame

It’s hard to stay annoyed at people when you can understand their motives. Their mindset is the most effective deterrent to anger I can think of. There’s no blame but understanding. You want to help them improve. And that’s where this on-going process is now.

To move away from multi-statement Table-Valued functions with WHILE statements. Here’s inline Table-Valued functions with a recursive CTE (Common Table Expression) instead.

To use variables when you have to but be aware of the change in statistics that it brings.

The difference it can make to a query and a database when the data types are apt. How memory grants, logical page reads, and more are affecting by blobs.

How DEADLOCK_PRIORITY LOW is not an option if every procedure has it! How indexes can be of more than a single column. That there is such a thing as an INCLUDES!

Seeing now that the driving force they have is to create features. But the pain force they feel is database performance. I can grok their choices and actions at the time.

Still, it didn’t stop me going What were you thinking? at the outset.

I’m no better

I’m trying to learn different languages and frameworks at the moment. If someone more knowledgeable was to come along and see my interactions with Linux. If they were to critique my Python files. Or attempt to suppress a groan at my PromQL. I’d appreciate an air of understanding, not blame at that time.

So well done to the people who dived in and attempted the work even if they didn’t know how at the time. To paraphrase; those whos face is marred by dust and sweat and blood deserve the credit.

But don’t think I didn’t see those TRANSACTION LEVEL READ UNCOMMITTED that you’re using as NOLOCKS!