Wednesday, October 11, 2017

Sargable Queries...

It has been a while since I wrote some of my best practices posts. I decided to revisit these posts again to see if anything has changed, I also wanted to see if I could add some additional info.

Today we are going to look at sargable queries. You might ask yourself, what is this weird term sargable. Sargable comes from searchable argument, sometimes also referred as Search ARGument ABLE. What that means is that the query will be able to use an index, a seek will be performed instead of a scan. In general any time you have a function wrapped around a column, an index won't be used


Some examples that are not sargable
WHERE LEFT(Name,1) = 'S'
WHERE Year(SomeDate) = 2012
WHERE OrderID * 3 = 33000


Those three should be rewritten like this in order to become sargable

WHERE Name LIKE 'S%'
WHERE SomeDate >= '20120101' AND SomeDate < '20130101'
WHERE OrderID = 33000/3


Let's create a table, insert some data so that we can look at the execution plan
Create this simple table

CREATE TABLE Test(SomeID varchar(100))


Let's insert some data that will start with a letter followed by some digits

INSERT Test
SELECT LEFT(v2.type,1) +RIGHT('0000' + CONVERT(varchar(4),v1.number),4) 
FROM master..spt_values v1
CROSS JOIN (SELECT DISTINCT LEFT(type,1) AS type 
FROM master..spt_values) v2
WHERE v1.type = 'p'


That insert should have generated 32768 rows


Now create this index on that table

CREATE CLUSTERED INDEX cx_test ON Test(SomeID)

Let's take a look at the execution plan, hit CTRL + M, this will add the execution plan once the query is done running

SELECT * FROM Test
WHERE SomeID LIKE 's%'

SELECT * FROM Test
WHERE LEFT(SomeID,1) = 's'

Here is what the plans looks like


As you can see it is 9% versus 91% between the two queries, that is a big difference
Hit CTRL + M again to disable the inclusion of the plan

Run this codeblock, it will give you the plans in a text format

SET SHOWPLAN_TEXT ON
GO

SELECT * FROM Test
WHERE SomeID LIKE 's%'

SELECT * FROM Test
WHERE LEFT(SomeID,1) = 's'
GO

SET SHOWPLAN_TEXT OFF
GO

Here are the two plans
|--Clustered Index Seek(OBJECT:([master].[dbo].[Test].[cx_test]),
SEEK:([master].[dbo].[Test].[SomeID] >= 'RĂ¾' AND [master].[dbo].[Test].[SomeID] < 'T'),
WHERE:([master].[dbo].[Test].[SomeID] like 's%') ORDERED FORWARD)
|--Clustered Index Scan(OBJECT:([master].[dbo].[Test].[cx_test]),
WHERE:(substring([master].[dbo].[Test].[SomeID],(1),(1))='s'))
As you can see the top one while looking more complicated is actually giving you a seek

Making a case sensitive search sargable

Now let's take a look at how we can make a case sensitive search sargable as well
In order to do a search and make it case sensitive, you have to have a case sensitive collation, if your table is not created with a case sensitive collation then you can supply it as part of the query
Here is an example to demonstrate what I mean


This is a simple table created without a collation

CREATE TABLE TempCase1 (Val CHAR(1))
INSERT TempCase1 VALUES('A')
INSERT TempCase1 VALUES('a')

Running this select statement will return both rows

SELECT * FROM TempCase1
WHERE Val = 'A' 

Val
-----
A
a

Now create the same kind of table but with a case sensitive collation


CREATE TABLE TempCase2 (Val CHAR(1) COLLATE SQL_Latin1_General_CP1_CS_AS)
INSERT TempCase2 VALUES('A')
INSERT TempCase2 VALUES('a')
Run the same query

SELECT * FROM TempCase2
WHERE Val = 'A' 

Val
-----
A


As you can see you only get the one row now that matches the case

To return both rows, you can supply the case insensitive collation in the query itself

SELECT * FROM TempCase1
WHERE Val = 'A' COLLATE SQL_Latin1_General_CP1_CI_AS


Val
-----
A
a


Now let's take a look at how we can make the case sensitive search sargable
First create this table and insert some data


CREATE TABLE TempCase (Val CHAR(1))
 
INSERT TempCase VALUES('A')
INSERT TempCase VALUES('B')
INSERT TempCase VALUES('C')
INSERT TempCase VALUES('D')
INSERT TempCase VALUES('E')
INSERT TempCase VALUES('F')
INSERT TempCase VALUES('G')
INSERT TempCase VALUES('H')


Now we will insert some lowercase characters

INSERT TempCase
SELECT LOWER(Val) FROM TempCase


Now we will create our real table which will have 65536 rows

CREATE TABLE CaseSensitiveSearch (Val VARCHAR(50))

We will do a couple of cross joins to generate the data for our queries

INSERT CaseSensitiveSearch
SELECT t1.val + t2.val + t3.val + t4.val
FROM TempCase t1
CROSS JOIN TempCase t2
CROSS JOIN TempCase t3
CROSS JOIN TempCase t4

Create an index on the table

CREATE INDEX IX_SearchVal ON CaseSensitiveSearch(Val)

Just like before, if we run this we will get back the exact value we passed in and also all the upper case and lower case variations

SELECT * FROM CaseSensitiveSearch
WHERE Val = 'ABCD' 

Here are the results of that query
Val
-----
AbCd
ABcd
Abcd
ABCd
aBCd
abCd
aBcd
abcd
abCD
aBcD
abcD
aBCD
ABCD
AbCD
ABcD
AbcD


If you add the case sensitive collation to the query, you will get only what matches your value

SELECT * FROM CaseSensitiveSearch
WHERE Val = 'ABCD' COLLATE SQL_Latin1_General_CP1_CS_AS

Here is the result, it maches what was passed in
Val
---
ABCD


The problem with the query above is that it will cause a scan. So what can we do, how can we make it perform better? It is simple... combine the two queries

First grab all case sensitive and case insensitive values and then after that filter out the case insensitive values
Here is what that query will look like

SELECT * FROM CaseSensitiveSearch
WHERE Val = 'ABCD' COLLATE SQL_Latin1_General_CP1_CS_AS
AND Val LIKE 'ABCD'

AND Val LIKE 'ABCD' will result in a seek, but now when it also does the Val = 'ABCD' COLLATE SQL_Latin1_General_CP1_CS_AS part, it only returns the row that matches your value

If you run both queries, you can look at the plan difference (hit CTRL + M so that the plan is included)

SELECT * FROM CaseSensitiveSearch
WHERE Val = 'ABCD' COLLATE SQL_Latin1_General_CP1_CS_AS



SELECT * FROM CaseSensitiveSearch
WHERE Val = 'ABCD' COLLATE SQL_Latin1_General_CP1_CS_AS
AND Val LIKE 'ABCD'

Here is the plan


As you can see, there is a big difference between the two
Here is the plan in text as well

SET SHOWPLAN_TEXT ON
GO
 
SELECT * FROM CaseSensitiveSearch
WHERE Val = 'ABCD' COLLATE SQL_Latin1_General_CP1_CS_AS



SELECT * FROM CaseSensitiveSearch
WHERE Val = 'ABCD' COLLATE SQL_Latin1_General_CP1_CS_AS
AND Val LIKE 'ABCD'

GO
 
SET SHOWPLAN_TEXT OFF
GO
|--Table Scan(OBJECT:([tempdb].[dbo].[CaseSensitiveSearch]),
WHERE:(CONVERT_IMPLICIT(varchar(50),[tempdb].[dbo].[CaseSensitiveSearch].[Val],0)=CONVERT(varchar(8000),[@1],0)))
|--Index Seek(OBJECT:([tempdb].[dbo].[CaseSensitiveSearch].[IX_SearchVal]), SEEK:([tempdb].[dbo].[CaseSensitiveSearch].[Val] >= 'ABCD'
AND [tempdb].[dbo].[CaseSensitiveSearch].[Val] <= 'ABCD'),
WHERE:(CONVERT_IMPLICIT(varchar(50),[tempdb].[dbo].[CaseSensitiveSearch].[Val],0)='ABCD' AND [tempdb].[dbo].[CaseSensitiveSearch].[Val] like 'ABCD') ORDERED FORWARD)

I really wish Microsoft would take the time to internally rewrite these two queries when it hits the optimizer


WHERE LEFT(Name,1) = 'S'
WHERE Year(SomeDate) = 2012



It should create these and then performance would be much better

WHERE Name LIKE 'S%'
WHERE SomeDate >= '20120101' AND SomeDate < '20130101'

I think there are probably some SQL Server consultants cursing me now for even trying to suggest this :-)

No comments: