T-SQL Tuesday #125 – Unit Testing Databases.

Words: 1344

Time to read: ~ 7 minutes

Welcome to the April edition of T-SQL Tuesday. This month we have Hamish Watson ( Blog | Twitter ) asking us about how we unit test our databases.

Performance Tuning

At one stage or another, I’m sure that we’ve all been asked to performance tune database code. Sometimes this tuning can involve changing the code. How do we know that the changes that we are introducing isn’t going to introduce bugs? You got it – Unit tests!

Set up

Let’s set up a contrived example – Contrived of course coming from the words “con” as in fake and “Tried V” as in I tried 5 times before I just gave up and used what I had.

USE [tempdb];
GO

/*
    Let's build this thang...
*/
CREATE TABLE [dbo].[Objects]
(
    [id] int NOT NULL CONSTRAINT [PK dbo.Objects(id)] PRIMARY KEY,
    [level] int NOT NULL,
    [object_type] int NOT NULL,
    [tag] nvarchar(256) NOT NULL,
    [description] nvarchar(max) NOT NULL,
    [parent_object_id] int NULL
);
GO

INSERT INTO [dbo].[Objects]
(
    [id],
    [level],
    [object_type],
    [tag],
    [description],
    [parent_object_id]
)
SELECT
    ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS [id],
    NTILE(10000) OVER (ORDER BY [S01].[alias]) AS [level],
    [M01].[severity] AS [object_type],
    [S01].[alias] AS [tag],
    [M01].[text] AS [description],
    NULLIF(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) -1, 0) AS [parent_object_id]
FROM [sys].[messages] AS [M01]
INNER JOIN [sys].[syslanguages] AS [S01] ON [M01].[language_id] = [S01].[lcid]
WHERE [S01].[alias] = N'English';
GO

SELECT *
FROM [dbo].[Objects];
GO

This will insert 11,466 records into the dbo.Objects table. The table has a column called parent_object_id that points to its … parent object id, wow you have been paying attention! Luckily, or unluckily, tuning the schema of the table is out of scope for this post

Next we have the piece of code that we’ve been asked to performance tune. It’s …uhm … well it works.

/*
    RBAR SQL
*/
CREATE FUNCTION [dbo].[GetChildObjects]
(
    @ObjectId int
)
RETURNS @ChildObjects TABLE
(
    [object_id] int NOT NULL,
    [level] int NOT NULL,
    [object_type] int NOT NULL,
    [tag] nvarchar(256) NOT NULL,
    [description] nvarchar(max) NOT NULL
)
AS
BEGIN
    --
    -- Insert first object.
    --
    INSERT INTO @ChildObjects
    SELECT
        id,
        level,
        object_type,
        tag,
        description
    FROM dbo.Objects
    WHERE id = @ObjectId;

    --
    -- Stack
    --
    DECLARE @Stack TABLE
    (
        ObjectId int NOT NULL,
        lvl int NOT NULL
    );

    --
    -- Initialise level for stack
    --
    DECLARE @lvl int = 1,
            @CurrentObjectId INT = NULL;

    --
    -- Insert current object
    --
    INSERT INTO @Stack
    (
        ObjectId,
        lvl
    )
    VALUES
    (@ObjectId, @lvl);

    --
    -- Loop through stack and get child objects
    --
    WHILE (@lvl > 0)
    BEGIN
        IF EXISTS (SELECT * FROM @Stack WHERE lvl = @lvl)
        BEGIN

            --
            -- Get object
            --
            SELECT @CurrentObjectId = ObjectId
            FROM @Stack
            WHERE lvl = @lvl;

            --
            -- Insert child objects
            --
            INSERT INTO @ChildObjects
            SELECT 
                id,
                level,
                object_type,
                tag,
                description
            FROM dbo.Objects
            WHERE parent_object_id = @CurrentObjectId;

            --
            -- Delete object from stack
            --
            DELETE FROM @Stack
            WHERE lvl = @lvl
            AND ObjectId = @CurrentObjectId;

            --
            -- Insert the childnodes of the current node into the stack
            --
            INSERT INTO @Stack
            (
                ObjectId,
                lvl
            )
            SELECT
                id,
                @lvl + 1
            FROM dbo.Objects
            WHERE parent_object_id = @CurrentObjectId;

            IF (@@ROWCOUNT > 0) -- If the previous statement added one or more nodes, go down for its first child.
                SET @lvl = @lvl + 1; -- If no nodes are added, check its sibling nodes
        END;
        ELSE
            SET @lvl = @lvl - 1; -- Back to the level immediately above
        END;
    RETURN;
END;
GO

Timing is important

Let’s get some times for this function. There’s nothing fancy here, we’re just going to call it for different values, returning more and more results and see how long it takes to finish.

/*
    Simple tezt...
*/
SELECT * FROM [dbo].[GetChildObjects](11466) OPTION (RECOMPILE); -- 0 seconds.
SELECT * FROM [dbo].[GetChildObjects](11000) OPTION (RECOMPILE); -- 3 seconds.
SELECT * FROM [dbo].[GetChildObjects](10000) OPTION (RECOMPILE); -- 10 seconds.
SELECT * FROM [dbo].[GetChildObjects](9000) OPTION (RECOMPILE); -- 17 seconds.
SELECT * FROM [dbo].[GetChildObjects](8000) OPTION (RECOMPILE); -- 23 seconds.
SELECT * FROM [dbo].[GetChildObjects](7000) OPTION (RECOMPILE); -- 29 seconds.
SELECT * FROM [dbo].[GetChildObjects](6000) OPTION (RECOMPILE); -- 36 seconds.
SELECT * FROM [dbo].[GetChildObjects](5000) OPTION (RECOMPILE); -- 41 seconds.
SELECT * FROM [dbo].[GetChildObjects](4000) OPTION (RECOMPILE); -- 48 seconds.
SELECT * FROM [dbo].[GetChildObjects](3000) OPTION (RECOMPILE); -- 54 seconds.
SELECT * FROM [dbo].[GetChildObjects](2000) OPTION (RECOMPILE); -- 61 seconds.
SELECT * FROM [dbo].[GetChildObjects](1000) OPTION (RECOMPILE); -- 68 seconds.
SELECT * FROM [dbo].[GetChildObjects](10) OPTION (RECOMPILE); -- 76 seconds.

Taking from 0 seconds for 1 row to 76 seconds for 11,456 rows, we now our timings.

As you can see there has been some performance tuning efforts by the code creators already. So impressed was I by the use of OPTION (RECOMPILE) that I asked why they asked me to take a look.

The execution plan seems fine; doesn’t look like there’s anything we can do with it…

Well, let’s take a look at the execution plan, shall we?

Ah!

No surprises

Listen, we both know that this needs to be rewritten. Here is the crux of the post – how do we know that our re-write isn’t going to introduce unexpected side-effects? Little happy “unspecified-features“? You got it – Unit tests!

Creating Unit Tests

To keep this brief, I’ve installed tSQLt on my tempdb database. The documentation for installing it is very easy to follow on the tSQLt website so head there for instructions. We’re going to create some tests for this function.


/*
    tSQLt time!
*/
EXECUTE [tSQLt].[NewTestClass] @ClassName = N'GetChildObjects';
GO

CREATE PROCEDURE [GetChildObjects].[test returns nothing for nothing]
AS
BEGIN
    -- Arrange (new "fake" table).
    EXECUTE [tSQLt].[FakeTable] @TableName = N'[dbo].[Objects]';

    -- Act
    SELECT
        *
    INTO #Actual
    FROM [dbo].[GetChildObjects](1);

    -- Assert (empty is empty).
    EXECUTE [tSQLt].[AssertEmptyTable] @TableName = N'#Actual'
END;
GO

CREATE PROCEDURE [GetChildObjects].[test returns 1 row if no child objects]
AS
BEGIN
    -- Arrange (new "fake" table).
    EXECUTE [tSQLt].[FakeTable] @TableName = N'[dbo].[Objects]';

    INSERT INTO [dbo].[Objects]
    (
        [id],
        [level],
        [object_type],
        [tag],
        [description],
        [parent_object_id]
    )
    VALUES
    (
        1,
        1,
        1,
        N'Don''t care',
        N'Absolutely don''t care',
        NULL
    );

    -- Act
    SELECT
        *
    INTO #Actual
    FROM [dbo].[GetChildObjects](1);

    SELECT
        1 AS [object_id],
        1 AS [level],
        1 AS [object_type],
        N'Don''t care' AS [tag],
        N'Absolutely don''t care' AS [description]
    INTO #Expected
    
    EXECUTE [tSQLt].[AssertEqualsTable] @Expected = N'#Expected', @Actual = N'#Actual';
END;
GO


CREATE PROCEDURE [GetChildObjects].[test returns multiple rows...say 4]
AS
BEGIN
    -- Arrange (new "fake" table).
    EXECUTE [tSQLt].[FakeTable] @TableName = N'[dbo].[Objects]';

    INSERT INTO [dbo].[Objects]
    (
        [id],
        [level],
        [object_type],
        [tag],
        [description],
        [parent_object_id]
    )
    SELECT
        [V].[number],
        [V].[number],
        [V].[number],
        N'Don''t care '+ CONVERT(varchar(2), [V].[number]),
        N'Absolutely don''t care '+ CONVERT(varchar(2), [V].[number]),
        NULLIF([V].[number] -1, 0)
    FROM [master].[dbo].[spt_values] AS [V]
    WHERE [V].[type] = 'P'
      AND [V].[number] > 0
      AND [V].[number] <= 10
    ORDER BY [V].[number]

    -- Act
    SELECT
        *
    INTO #Actual
    FROM [dbo].[GetChildObjects](7);

    SELECT
        [id] AS [object_id],
        [level],
        [object_type],
        [tag],
        [description]
    INTO #Expected
    FROM [dbo].[Objects]
    WHERE [id] >= 7
    ORDER BY [id];
    
    EXECUTE [tSQLt].[AssertEqualsTable] @Expected = N'#Expected', @Actual = N'#Actual';
END;
GO

Now we have tests to see what happens when the table is empty, when only 1 row should be returned, and when multiple rows are supposed to be returned. Hey! 4 rows still counts as multiple!

Running these tests couldn’t be simpler:

EXECUTE [tSQLt].[Run] @TestName = N'GetChildObjects';
GO

Re-Write time

The best piece of advice I heard for performance tuning, especially with a hectic schedule, is to

  1. Have a “good enough” goal and have a “time limit”

If the code has to run under a minute and you get it to 51 seconds after 30 minutes, great! Stop – I’m sure you’ve a to-do list that is only getting bigger.

/*
    All your bases are belonging to SET!!!
*/
DROP FUNCTION [dbo].[GetChildObjects]; --Don't worry, it's in source control.
GO
CREATE FUNCTION [dbo].[GetChildObjects]
(
    @ObjectId int
)
RETURNS TABLE
AS
RETURN
(
    WITH [Stack] AS
    (
        SELECT
            [O].[id] AS [object_id],
            [O].[parent_object_id]
        FROM [dbo].[Objects] AS [O]
        WHERE [O].[id] = @ObjectId


        UNION ALL

        SELECT
            [Obj].[id],
            [Obj].[parent_object_id]
        FROM [Stack]
        INNER JOIN [dbo].[Objects] AS [Obj] ON [Stack].[object_id] = [Obj].[parent_object_id]
    )
    SELECT
        [Det].[id] AS [object_id],
        [Det].[level],
        [Det].[object_type],
        [Det].[tag],
        [Det].[description]
    FROM [Stack] AS [S]
    INNER JOIN [dbo].[Objects] AS [Det] ON [S].[object_id] = [Det].[id]

);
GO

Wait! How do you… oh yeah. Unit Tests.

EXECUTE [tSQLt].[Run] @TestName = N'GetChildObjects';
GO

Perfect, now we can check out the timings!

SELECT * FROM [dbo].[GetChildObjects](11466) OPTION (RECOMPILE, MAXRECURSION 0); -- 0 seconds.
SELECT * FROM [dbo].[GetChildObjects](11000) OPTION (RECOMPILE, MAXRECURSION 0); -- 1 second.
SELECT * FROM [dbo].[GetChildObjects](10000) OPTION (RECOMPILE, MAXRECURSION 0); -- 3 seconds.
SELECT * FROM [dbo].[GetChildObjects](9000) OPTION (RECOMPILE, MAXRECURSION 0); -- 6 seconds.
SELECT * FROM [dbo].[GetChildObjects](8000) OPTION (RECOMPILE, MAXRECURSION 0); -- 8 seconds.
SELECT * FROM [dbo].[GetChildObjects](7000) OPTION (RECOMPILE, MAXRECURSION 0); -- 13 seconds.
SELECT * FROM [dbo].[GetChildObjects](6000) OPTION (RECOMPILE, MAXRECURSION 0); -- 15 seconds.
SELECT * FROM [dbo].[GetChildObjects](5000) OPTION (RECOMPILE, MAXRECURSION 0); -- 16 seconds.
SELECT * FROM [dbo].[GetChildObjects](4000) OPTION (RECOMPILE, MAXRECURSION 0); -- 18 seconds.
SELECT * FROM [dbo].[GetChildObjects](3000) OPTION (RECOMPILE, MAXRECURSION 0); -- 21 seconds.
SELECT * FROM [dbo].[GetChildObjects](2000) OPTION (RECOMPILE, MAXRECURSION 0); -- 23 seconds.
SELECT * FROM [dbo].[GetChildObjects](1000) OPTION (RECOMPILE, MAXRECURSION 0); -- 26 seconds.
SELECT * FROM [dbo].[GetChildObjects](10) OPTION (RECOMPILE, MAXRECURSION 0); -- 31 seconds.

Taking from 0 seconds for 1 row to 31 seconds for 11,456 rows, we now our new timings. And while I’m not a fan of the MAXRECURSION 0 I don’t have a major problem with adding it to an existing RECOMPILE statement.

A great side effect is that we have more information in the execution plan!

A wild index suggestion appeared

There we have it!

Thanks to our unit tests we have a more performant code to send back with confidence that it works the same as the old code…just faster.

Now if you’ll excuse me, my 30 minutes are up and my to-do list is growing.

Talking Data Bits

Words: 160

Time to read: ~ 1 minute.

Yes I am aware that these are normally posted directly after the podcast

I had the absolute pleasure of talking with Kevin Hill ( Blog | Twitter | Podcast ) on the first episode of his new podcast “Data Bits”.

You can check out Kevin’s blog post announcing it here and the actual podcast here.

It was an energising experience reliving past times, talking data tales, and cringing over shared scars received from working with data.

Next, Kevin had Andy Levy ( Blog | Twitter ) as his second guest on the show. It was uplifting hearing them talk shop and realising that maybe I should get more “aggressively friendly”.

You can check out the episode with Andy here.

I’m currently catching up with the next few episodes including Ginger Grant ( Blog | Twitter ), Jules Behrens ( Twitter ), and Louis Davidson ( Blog | Twitter ).

Thankfully, I’ve been able to scrounge some more time to listen to podcasts in this lock-down. There’s always an upside 😁!