SQL Server / Query Optimization / String functions in the WHERE clause and bad estimates

Recently we had really nice meetup gathering here in Auckland NZ. Was a beautiful evening, there were 20 people talking about SQL server, was perfect. One of the topics was “String functions in the where clause and bad estimates”

For this demo I used SQL Server 2014 Developer Edition.

As a database I used StackOverflow, you can download this DB from Brent Ozar web page, just follow the instructions

How to Download the Stack Overflow Database via BitTorrent

 

Using string function in the WHERE clause can cause bad estimates.

I will query TAB Badges (has just clustered index on ID column). In the WHERE clause I will use STRING FUNCTION LEFT. LEFT Returns the left part of a character string with the specified number of characters.

 

We can see that we have 155202 logical reads, and query returning 2511 rows

(2511 row(s) affected)
Table ‘Badges’. Scan count 1, logical reads 155202, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 

Since we do not have Non-Clustered index on that table optimizer use Clustered index to scan whole table

 

If you want to check size of your indexes you can use following query

 

 

Lets now check the estimates for that Clustered Index Scan

 

For 2511 Actual Number of Rows we have Estimated 2261260, over two million rows, and since we have a scan we are reading every single row in that table 25125061, over 25 million rows

 

Exact number of rows for Badges table.

 

Lets try now next query

It is the same query but this time without LEFT function within WHERE clause

 

Execution plan is still the same.

But estimates are more accurate now. Estimated number of rows is now 6748, actual number of rows is the same as before as well as number of rows read, since we still do not have non-clustered index on that table.

 

If we change our WHERE clause little bit more and use ‘=’ instead of ‘like’ we will have equal number of estimates and actual rows number. Number of rows read stay the same until we cover this query with non-clustered index

 

So lets add that index

 

If we run the same query again we will have following plan.

We can see that query optimizer using non-clustered index seek to retrieve data from the table and our estimates, actual number of rows and actual number of rows read are the same, 2511.

 

Now we have just 15 logical reads, instead of 155202 logical reads like in the first query.

(2511 row(s) affected)
Table ‘Badges’. Scan count 1, logical reads 15, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 

In the following example you can see how the bad estimates can cause excessive memory grant.

 

 

Similar Posts:

2 Responses to “ “SQL Server / Query Optimization / String functions in the WHERE clause and bad estimates”

  1. Jason says:

    with column statistics, Sql server uses column histogram to estimate number of rows

Leave a Reply

Your email address will not be published. Required fields are marked *