When you NEED to Declare a Table Using the OUTPUT Clause

I learned a thing myself at the end!

Words: 532

Time to read: ~ 2.5 minutes

Update: 20180417 – Added explanation of SQL Script rather than random statements.

Continue reading “When you NEED to Declare a Table Using the OUTPUT Clause”

Output over cursors

I am not pro-cursor. I am not, however, extremely anti-cursor.

I think that if there is a cursor used for a certain problem it just means we don’t know enough about the problem, the data or the code to think of a better solution.

But I’ve had an experience with cursors and, for my cases, found a better way.

That is what this blog post is going to be about, a case where we were using a cursor until we realised a better solution for it.


Background:

We have a weird table.

It’s our fault though as we’ve made it weird, we haven’t properly normalised it and it’s basically just a massive catch-all for records and for some strange reason, only 1 in 10 records have a ticket number.

So let’s create it!

-- Create the test table.
CREATE TABLE
    [dbo].[ProductActions]
    (
    [product_action_id]    INT IDENTITY(0, 1) PRIMARY KEY,
    [product_id]           INT,
    [allowed_action_id]    INT,
    [ticket_number]        VARCHAR(20),
    [date_actioned]        DATE
    );

-- Populate it.
WITH
    L0    AS(SELECT 1 AS c UNION ALL SELECT 1),
    L1    AS(SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B),
    L2    AS(SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B),
    L3    AS(SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B),
    L4    AS(SELECT 1 AS c FROM L3 AS A CROSS JOIN L3 AS B),
    L5    AS(SELECT 1 AS c FROM L4 AS A CROSS JOIN L4 AS B)
INSERT dbo.ProductActions
    (product_id, allowed_action_id, date_actioned)
SELECT TOP (10000000)
    product_id = ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) % 5,
    allowed_action_id = (ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) + 5) / 5,
    date_actioned = CAST(DATEADD(DAY, (SELECT ABS(CHECKSUM(NEWID())) % 50000), 0) AS DATE)
FROM L5;

UPDATE dbo.ProductActions
SET ticket_number = (CASE
                     WHEN product_action_id % 10 = 0
                     THEN 'TCKT'
                           + REPLACE(QUOTENAME(product_action_id, ''''),'''','')
                     ELSE NULL
                     END);

 

 

…that took around 1 minute for 10,000,000 records, including the create and update statements…

Now say something diabolical happens, a bad delete statement, a disgruntled employee who just won the lottery, a chimpanzee riding on a segway, whatever but for some reason all the records holding our ticket numbers get deleted.
However, thankfully (not sure if that’s the right word) we have Excel sheets of the records that were dropped.

ticketNumbersDeleted

… 9 seconds! 9 seconds for that wealthy employee/chimpanzee to cause havoc…

However, we don’t have everything; we have the product_id and the allowed_action_id.
So the business tells us that the dates don’t matter, it just needs those records back and those ticket numbers!


What we did:

Cursors. (I know, I’m sorry.)

However, we had problems.
Our table had a lot of traffic so we weren’t guaranteed an unbroken sequential range of identifiers that we could update.

And we couldn’t remove these foreign inserts from our range as we had no way of identifying which was our records and which were foreign records.

So we created a cursor. Not a performance tuned cursor since we were relatively new to SQL at the time but a run-of-the-mill, Google what it is and grab the template off the internet cursor.

Steps:

  1. Import the records into a holding table
  2. Give the table a sequential identifier for ordering
  3. Get the first record
  4. Insert it into the original table
  5. Grab the identity of the insert we just inserted
  6. Update it to have the correct ticket number
  7. Grab the next record and repeat steps 4 – 6
  8. Finish

All together 8 steps (although steps 3 – 7 are done around 1 million times) but how fast is it?

Step 2:

ALTER TABLE
dbo.DeletedRecordsPaper
ADD [ID] INT IDENTITY(0, 1);
GO

…around 34 seconds, surprisingly long…sign of things to come?…

Steps 3 – 7:

DECLARE @prodID INT,
@allowed INT,
@entered INT;
-- Start of Step 3

DECLARE FillMissingRecords
CURSOR FOR
SELECT product_id, allowed_action_id
FROM dbo.DeletedRecordsPaper
ORDER BY ID ASC;

OPEN FillMissingRecords

FETCH NEXT
FROM FillMissingRecords
INTO @prodID,
@allowed

WHILE @@FETCH_STATUS = 0
BEGIN  -- Step 4
INSERT dbo.ProductActions
(product_id, allowed_action_id, date_actioned)
SELECT @prodID,
@allowed,
GETDATE();
-- Step 5
SELECT @entered = SCOPE_IDENTITY();
-- Step 6
UPDATE dbo.ProductActions
SET ticket_number = 'TCKT' + CAST(@entered AS varchar(10))
WHERE product_action_id = @entered;
-- Step 7
FETCH NEXT
FROM FillMissingRecords
INTO @prodID,
@allowed

END

CLOSE FillMissingRecords

DEALLOCATE FillMissingRecords;

How’d it do? Well it worked; 1 million records took 9 minutes and 35 seconds though.

…9 minutes and 35 seconds…9 minutes and 35 seconds?!?…I could have recreated the table 9 and a half times in that time! I knew it was going to be slow, I even went out and got a coffee while I was waiting but C’MON!!!!

So altogether, with the adding of the sequential identifier, it took 10 minutes to run.

Now I can’t remember why exactly but this happened a few times so we had to run this a few times.


What we learned:

1). Eventually monkies improve with using keyboards, so too do humans and 2). it’s SQL Server. Use Set Theory!

Enter OUTPUT

Looking at this issue the main problem that we could see was that we needed to know what records we had inserted to update them with a ticket number.

We thought that we couldn’t figure out what we had inserted without using SCOPE_IDENTITY, we didn’t know about the inserted table, we didn’t know about variable tables, essentially we didn’t know about OUTPUT.

Steps:

  1. Import the records into a holding table
  2. Declare a holding table for the identities
  3. Insert all the records into the original table and output the values to our holding table
  4. Update all the records inserted to have the correct ticket number
  5. Finish

All together 5 steps, already an improvement, but that’s just steps, not performance. Is it an actual performance improvement?

…to be honest, if it was anyways faster than 9 minutes, I’d be happy…

-- Step 2
DECLARE @entered
TABLE
(
entered_id INT
);

-- Step 3
INSERT dbo.ProductActions
(product_id, allowed_action_id, date_actioned)
OUTPUT inserted.product_action_id
INTO @entered
SELECT product_id,
allowed_action_id,
GETDATE()
FROM dbo.DeletedRecordsPaper;

-- Step 4
UPDATE pa
SET ticket_number = 'TCKT' + CAST(entered_id AS varchar(10))
FROM dbo.ProductActions AS [pa]
JOIN @entered AS e
ON pa.product_action_id = e.entered_id;

 

29 SECONDS!!!! 1 million rows in 29 seconds!

So we’ve gone from 1 million rows in 9 minutes 35 seconds (~1,730 rows a second) to 1 million in 29 seconds (~34,482 rows a second).

…those results flustered me so much that I forgot maths so if anybody wants to work out that improvement percentage, please let me know. I want to write that one down!…


Conclusion:

The most dangerous phrase in the English language is probably “what does this button do?”
The second most dangerous is “we’ve always done it that way”.
There’s always room for improvement testing (on the right system i.e. NOT PRODUCTION) and OUTPUT over CURSORS any day.