This blog post is not what I was intending to write, at least not so early in my list of blog postings. Mainly due to the fact that it’s main issue is around SARG-ability which I think is a bit more advanced than normal.
…by ‘a bit more advanced’ I mean any conversation involving me and SARG-ability would carry along these lines…
> Do you know what SARG-ability is?
>> Me? Yeah, totally!
> Great! What is it so?
Which isn’t totally wrong but since SARG-ability is a main part of this post, it’s best I give you a definition and an example (BONUS: a lovely video!)
So a query is SARGable if “the DBMS engine can take advantage of an index to speed up the execution of the query.”
This SARG-ability is what we are going to be searching for today because we’re going to be searching a table of 15,000,000 rows.
…why 15,000,000? Well I first wanted to populate as many rows as I could but thanks to the joys that are Cartesian products I ended up crashing my SSMS! So 15,000,000 will do us…
Now if this was a book and we had to search every single 15,000,000 pages of that book for the page with the result we want OR if we had the option to go to the back of the book, look up on the index where the the result is and go directly to that page, which do you think would be faster? Easier?
…I’m really hoping you went with the Index option…
Hence, in our example SARG-ability is good and it’s what we want to achieve today.
So with that, on to the actual post…
One of the most common things to search for in SQL Server is dates. Nice and simple and everyone has (presumably) done it.
SELECT * FROM [<my_schema>].[<my_table>] WHERE [<date_column>] = '<date>';
Fine, grand, no problem…if your table is small but remember, we’ve got a table with 15,000,000 rows. Checking every single row is just not going to cut it here.
Unfortunately that is what happened in this situation.
A requirement was given to Development to get all the rows where the date was this year, and Development were smart enough to know that there is a SQL Server function for that.
DECLARE @Year INT = 2016; SELECT [Test_Date] FROM [dbo].[DateTest] WHERE YEAR([Test_Date]) = @Year;
So they sent it onto me to approve and apply to the QA environment and I did what I have come to learn is what most DBA’s do to Developers, I said no.
Why? Here’s why?
First lets create a test table to run this against…this insert took around 2 minutes on my machine.
-- First, create our table CREATE TABLE [dbo].[DateTest] ([Date_Test_Id] INT IDENTITY(1, 1), [Test_Date] datetime2(3)); -- Populate it with 15,000,000 random rows -- from 1st Jan 1900 to 1st Jan 2017. INSERT INTO [dbo].[DateTest] ([Test_Date]) SELECT TOP (15000000) DATEADD(DAY, 0, ABS(CHECKSUM(NEWID())) % 42734) FROM [sys].[messages] AS [m1] CROSS JOIN [sys].[messages] AS [m2]; -- Create an index we can use for our tests CREATE NONCLUSTERED INDEX [nci_DateTest_Test_Date] ON [dbo].[DateTest] ([Test_Date]); -- Show the data SELECT * FROM [DateTest];
…who else knew that there was 42734 days from the 1st of Jan 1900 to the 1st of Jan 2017? I didn’t!…
So now that 1). we have our table and b). we have an index we can use, we can run the developer’s query and be SARGable right?
DECLARE @Year INT = 2016; SELECT [Test_Date] FROM [dbo].[DateTest] WHERE YEAR([Test_Date]) = @Year; GO
Nope! Table scan, ignores our Index and reads all 15M (too lazy for all the zeros) for a measely 127,782 rows! It’s not the slowest, taking around 3.960 seconds but still, we want SARGable!!!
…your row results may differ, random is random after all…
So being the kind, gentle, caring soul that I am (as well as being on my fourth coffee) I went back with a SARGable method.
DECLARE @Year INT = 2016; SELECT [Test_Date] FROM [dbo].[DateTest] WHERE [Test_Date] < DATEADD(YEAR, (@Year - 1900) + 1, 0) -- Start of Next Year AND [Test_Date] > DATEADD(YEAR, (@Year - 1900), - 1); -- End of Last Year GO
Look at that beautiful, beautiful index seek. 127,782 rows in 1.807 seconds, over twice as fast!
But as beautiful as that index seek is, it is not what the developer saw.
> That’s an ugly query, it’s going to be horrible to troubleshoot that.
And off he went.
What he came back with was pleasantly surprising and reminds me that one of the best thing about SQL Server is, that for all it’s restrictive syntax and rules, there is no 1 way to do anything.
What he came back with was the beauty of DATEFROMPARTS.
DECLARE @Year INT = 2016; SELECT [Test_Date] FROM [dbo].[DateTest] WHERE [Test_Date] BETWEEN DATEFROMPARTS(@year, 1, 1) AND DATEFROMPARTS(@year, 12, 31); GO
Boom! Index seek of 127,782 rows in 1.807 seconds.
I was happy cause it was SARG-able and he was happy as it was easier to troubleshoot.
So he asked if it could be applied to the QA environment and I did what I have come to learn is what most DBA’s do to Developers, I said no.
In case you’re wondering why, consider two things: the data type of the [Test_Date] column and this example.
SELECT * FROM [dbo].[DateTest] WHERE [Test_Date] = '2016-12-31';
378 row(s) affected.
UPDATE TOP (3) [dbo].[DateTest] SET [Test_Date] = '2016-12-31 12:34:56' WHERE [Test_Date] = '2016-12-31';
3 row(s) affected.
3 rows difference…I’ll let you figure out why.