String functions in the WHERE clause and bad estimates

Posted by Barac in SQL Server on Apr 19th, 2018 | 2 comments

Recently we had really nice meetup gathering here in Auckland NZ. Was a beautiful evening, there were 20 people talking about SQL server, was perfect. One of the topics was “String functions in the where clause and bad estimates”

For this demo I used SQL Server 2014 Developer Edition.

As a database I used StackOverflow, you can download this DB from Brent Ozar web page, just follow the instructions

How to Download the Stack Overflow Database via BitTorrent

Using string function in the WHERE clause can cause bad estimates.

I will query TAB Badges (has just clustered index on ID column). In the WHERE clause I will use STRING FUNCTION LEFT. LEFT Returns the left part of a character string with the specified number of characters.

SET STATISTICS TIME, IO ON
DBCC FREEPROCCACHE
SELECT [Name]
,[UserId]
FROM [StackOverflow].[dbo].[Badges] b
WHERE Left(Name,4) like 'Beta%'
SET STATISTICS IO OFF

We can see that we have 155202 logical reads, and query returning 2511 rows

(2511 row(s) affected)
Table ‘Badges’. Scan count 1, logical reads 155202, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Since we do not have Non-Clustered index on that table optimizer use Clustered index to scan whole table

If you want to check size of your indexes you can use following query

USE StackOverflow
SELECT i.[name] AS IndexName, SUM(s.[used_page_count]) * 8 AS IndexSizeKB, SUM(s.[used_page_count]) PageCountNumber
FROM sys.dm_db_partition_stats AS s
INNER JOIN sys.indexes AS i ON s.[object_id] = i.[object_id]
AND s.[index_id] = i.[index_id]
WHERE i.[name] in ('PK_Badges__Id')
GROUP BY i.[name]
ORDER BY i.[name]
GO

Lets now check the estimates for that Clustered Index Scan

For 2511 Actual Number of Rows we have Estimated 2261260, over two million rows, and since we have a scan we are reading every single row in that table 25125061, over 25 million rows

Exact number of rows for Badges table.

Lets try now next query

It is the same query but this time without LEFT function within WHERE clause

DBCC FREEPROCCACHE
SET STATISTICS TIME, IO ON
SELECT [Name]
,[UserId]
FROM [StackOverflow].[dbo].[Badges] b
WHERE Name like 'Beta%'
SET STATISTICS IO OFF

Execution plan is still the same.

But estimates are more accurate now. Estimated number of rows is now 6748, actual number of rows is the same as before as well as number of rows read, since we still do not have non-clustered index on that table.

If we change our WHERE clause little bit more and use ‘=’ instead of ‘like’ we will have equal number of estimates and actual rows number. Number of rows read stay the same until we cover this query with non-clustered index

DBCC FREEPROCCACHE
SET STATISTICS TIME, IO ON
SELECT [Name]
,[UserId]
FROM [StackOverflow].[dbo].[Badges] b
WHERE Name= 'Beta'
SET STATISTICS IO OFF

So lets add that index

CREATE NONCLUSTERED INDEX [NCI_Name] ON [dbo].[Badges]
(
[Name] ASC,
[Id] ASC
)
INCLUDE ( [UserId]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
GO

If we run the same query again we will have following plan.

We can see that query optimizer using non-clustered index seek to retrieve data from the table and our estimates, actual number of rows and actual number of rows read are the same, 2511.

Now we have just 15 logical reads, instead of 155202 logical reads like in the first query.

(2511 row(s) affected)
Table ‘Badges’. Scan count 1, logical reads 15, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

In the following example you can see how the bad estimates can cause excessive memory grant.

2 Responses to “ “SQL Server / Query Optimization / String functions in the WHERE clause and bad estimates”

Jason says:

May 29, 2018 at 9:48 am

with column statistics, Sql server uses column histogram to estimate number of rows

Reply
- Barac says:
  
  May 29, 2018 at 10:09 am
  
  I agree with you, this was just an example how estimated number of rows can be off with an actual.
  
  Reply

About Me

My name is Zoran, currently living in Auckland, New Zealand. I am a data specialist with more than 15 years of hands-on experience in database administration and optimisation. I am also, Certified Microsoft Trainer (MCT) and Microsoft Certified Solutions Expert (MCSE) with a Master‘s degree in Information Technology. Sharing knowledge and contributing to the SQL Server community is my passion. I am also an organiser of the Auckland SQL User Meetup Group. As well I am active blogger and speaker at different SQL events such as SQL Saturdays, Meetups etc.

SQL Server / Query Optimization / String functions in the WHERE clause and bad estimates

Similar Posts:

2 Responses to “ “SQL Server / Query Optimization / String functions in the WHERE clause and bad estimates”

Leave a Reply to Barac Cancel reply

About Me

Email Subscription

Index Usage Report

Auckland SQL Group

MSDN PROFILE

Archives

SQL SERVER UPDATES

Recent Posts

Tag Cloud