In part 5 of this series I started working through a list of queries designed to demonstrate ways in which the optimizer can produce bad cardinality estimates. In this installment I’m going work through the remainder of the list. The outstanding queries are as follows:

select count(*) from t1 where trunc_5 > (
	select max(trunc_5) from t1 where mod_200 = 0
);

select count(*) from t1 where five_per_sec > sysdate - 60/86400;

execute dbms_lock.sleep(10)
select count(*) from t1 where five_per_sec > sysdate - 10/86400;


variable b1 number
exec :b1 := 10
select count(*) from t1 where skewed = :b1;

exec :b1 := 100
select count(*) from t1 where skewed = :b1;

The first query demonstrates the re-appearance of one of the optimizer’s favourite guesses – 5% – which is frequently used as an estimate of the effect of a subquery. The next pair (with the 10 second sleep in between) demonstrates the potentially catastrophic side effects of statistics going out of date. The final pair introduces us to some of the problems of bind variables and skewed data distributions.

Remember that I’m running all these demonstrations on 11.2.0.4 and, so far, haven’t generated histograms on any of the columns referenced by the queries.

Subquery effects

Here’s the default execution plan from the first query in our list:

-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |       |       |  1308 (100)|          |
|   1 |  SORT AGGREGATE      |      |     1 |     5 |            |          |
|*  2 |   TABLE ACCESS FULL  | T1   | 50000 |   244K|   658   (9)| 00:00:04 |
|   3 |    SORT AGGREGATE    |      |     1 |     9 |            |          |
|*  4 |     TABLE ACCESS FULL| T1   |  5000 | 45000 |   650   (8)| 00:00:04 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("TRUNC_5">)
   4 - filter("MOD_200"=0)

The query identified 195 rows, but the prediction was 50,000 – which is 5% of the starting 1,000,000.

The error doesn’t really matter in this simple example, but you can imagine that in a more complex query – one that joins the result set from line 2 to another table – an estimate of 50,000 would encourage the optimizer to do a hash join to the next table, while an estimate that was closer to the actual 195 might encourage the optimizer to do a nested loop.

Although there are a few options available there isn’t a nice way to address this problem because that 5% is hard coded into the optimizer.

First, of course, you could simply work out an optimum plan and create a complete set of hints to enforce that plan (perhaps improving the robustness of the hinting by generating the equivalent SQL Plan Baseline after getting the plan you want). If the query were simple enough you might be able to get away with a suitably populated /*+ leading() */ hint in the main query block and the /*+ no_unnest push_subq */ pair in the subquery.

Alternatively, if you’re licensed for the Performance Pack, you could use the dbms_sqltune package (perhaps through the graphic OEM interface) to create an SQL Profile for the query. This will exercise the query and discover that the default 5% is a bad estimate and can create a set of “opt_estimate()” hints to help the optimizer calculate better estimates, which ought to result in a better plan being selected.

Finally, you may be lucky enough to be able to add a “simple” cardinality() hint to the SQL that gets the optimizer started with a suitable estimate and allows it to generate reasonable estimates for the rest of the plan. Unfortunately the cardinality() hint is not quite supported, not really documented, and is much more subtle (and therefore a lot harder to use) than the few notes available on the Internet might suggest.

In my trivial case I could simply add the hint /*+ cardinality(t1 195) */ to the SQL:

select
        /*+ cardinality(t1 195) */
        count(*)
from    t1
where   trunc_5 > (
                select max(trunc_5) from t1 where mod_200 = 0
        )
;
-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |       |       |  1308 (100)|          |
|   1 |  SORT AGGREGATE      |      |     1 |     5 |            |          |
|*  2 |   TABLE ACCESS FULL  | T1   |   195 |   975 |   658   (9)| 00:00:04 |
|   3 |    SORT AGGREGATE    |      |     1 |     9 |            |          |
|*  4 |     TABLE ACCESS FULL| T1   |  5000 | 45000 |   650   (8)| 00:00:04 |
-----------------------------------------------------------------------------

This hasn’t changed the cost of the query, but it has changed the cardinality estimate at line 2 as required. Oracle doesn’t use the cardinality() internally – if you check the optimizer’s trace file (event 10053) you’ll see that the hint has been translated to an opt_estimate() hint of the form:

opt_estimate(table t1@sel$1 rows=195.000000)

Like all “micro-management” hinting, though, you typically need far more cardinality hints than you might first imagine. As it stands this hint is a “single table access” hint but in more complex cases you may have to include lots of hints to say things like “if you start at t1 it will give you 195 rows, but if you start at table t2 it will give you 5000 rows”, then you will probably have to use the “join cardinality” version of the hint to say things like: “if you start t1 -> t2 that join will give you 800 rows, but t1 -> t3 will give you 650 rows”. Generally speaking it’s a bit of a pain to try and work out all the cardinality hints you need to ensure that the plan you need will appear. It’s best to avoid the technique and stick with more “traditional” hinting.

The passage of time

To run my test I created the data, gathered stats, then ran various queries and pulled their execution plans from memory. By the time I ran the query that compared column five_per_sec with “sysdate – 60 seconds” a few seconds had already passed between creating the data and running the query. The time component on the highest value of five_per_sec was 17:15:17, while the time I ran the query had moved on to 17:15:31, a gap of 14 seconds – which meant there were 46 seconds of data in the table matching the predicate “five_per_sec > sysdate – 60/86400”. Here’s the execution I got when running the query:

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |   752 (100)|          |
|   1 |  SORT AGGREGATE    |      |     1 |     8 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   |   230 |  1840 |   752  (21)| 00:00:04 |
---------------------------------------------------------------------------

As you can see the optimizer’s estimate is 230 rows – a perfect match for 46 seconds at five rows per second.

What happens, though if I wait a further 10 seconds (time is now 17:15:41) and then query for sysdate – 10 / 86400? I’m querying for data later than 17:15:31, and there is no data in the table matching that predicate, but what does the optimizer expect to find:

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |   752 (100)|          |
|   1 |  SORT AGGREGATE    |      |     1 |     8 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   |     5 |    40 |   752  (21)| 00:00:04 |
---------------------------------------------------------------------------

The optimizer’s estimate is 5 rows. A few extra tests will show you that this is the number of rows that match “column = constant” – as soon as you execute a query with a range-based predicate that the optimizer can see is asking for data that is entirely outside the known low/high range the arithmetic falls back to the arithmetic for “column = constant”.

The passage of time – case study

In my model the error was not too bad – if the data had carried on arriving at the same rate while I was doing my tests there would have been 50 rows in the last 10 seconds, so 5 wouldn’t (necessarily) be a terrible estimate and might not have caused problems. But the basic behaviour can have a terrible impact, as the following production example shows.

Data arrives at the rate of roughly 300 rows per minute (5 per second) and a very important polling job runs a query every few minutes to identify any data that has arrived in the last 15 minutes (and is still in a certain state) – typically identifying about 2,000 to 4,000 rows. The query is fairly simple and includes the time-critical predicate:

where arrival_time > sysdate – 1/96

Because of the rate of data change the automatic stats collection job runs every night on this table so the statistics are as accurate as a simple, default, setup could make them.

Every few days – apparently completely at random – the polling job hits a terrible performance problem and takes several minutes to run instead of a couple of seconds. Not surprisingly this has an impact on the business operation and makes a big difference to how the users feel about the system. What’s going on?

Just 15 minutes after the stats have been collected the optimizer’s estimate for this query is going to drop from a few thousand to a mere 5 (because, on average, there are 5 rows for any give value of arrival_time). If this happens the plan changes catastrophically because the optimizer decides that it can find a really good strategy for dealing with 5 rows that turns out to be a very bad strategy for the several thousand rows that are really there.

The query is executed every few minutes, so it usually gets re-optimised – producing a good execution plan – very shortly after the stats have just been collected. Most days the cursor then stays in memory and is constantly re-used for the rest of the day; so the optimizer never needs to re-optimize the statement so never has a chance to create the bad plan. Unfortunately something happens occasionally to create a huge demand for memory in the shared pool and the good plan is flushed from memory and the query is re-optimised several hours after the stats have been collected – producing the bad “only 5 rows” plan – and the query then runs badly until the next day or (as the client discovered one day) they gather stats on the table.

Columns that are time (or sequence) based can introduce all sorts of odd problems – though you may have to be a little unlucky to suffer badly as a consequence; but if you have any code which is always looking for “the last few rows” this problem is one to watch out for. There are three basic strategies you could follow if you see the problem:

  • Hard-code the good execution plan with hints (or an SQL Plan Baseline / Stored Outline).
  • Set up a job to gather stats on the critical column and invalidate all related cursors as frequently as necessary (every 5 minutes would be appropriate for the case study).
  • Write some code that calls dbms_stats.set_column_stats() to set the high-value on the column stats to a future value that is far enough forward to avoid the catastrophic change in plan (but not so far that it causes the optimizer to err in the opposite direction). This is really just a variant of the second option, but (a) it’s cheaper to execute and (b) you may be able to execute it far less frequently than you would the job to gather stats.

Bind Variables

The last two queries in the list introduce the problems of bind variables. We will be visiting this topic in more detail in a future installment in the series, the only point I want to make here is that skewed data patterns and bind variables do not mix – you have to help the optimizer.

Looking at the way the data was generated we can work out that the first query of the pair should return 21 rows and the second should return 201 rows (there are 2N + 1 rows for “skewed = N”), but the execution plan in both cases predicts 1,000 rows:

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |   666 (100)|          |
|   1 |  SORT AGGREGATE    |      |     1 |     4 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   |  1000 |  4000 |   666  (11)| 00:00:04 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("SKEWED"=:B1)

This is because we did nothing to let the optimizer know that the distribution of the data values was hugely uneven; so the available statistics simply tell it that there are 1,000,000 rows in total with 1,000 distinct values, so any one value will return 1,000,000 / 1,000 = 1,000 rows. We would have got the same result even if we have used the literal values 10 and 100 in the queries rather than using bind variables.

To help Oracle understand the skew we have to create a histogram on the column. Unfortunately I have 1,000 distinct values in the column and in 11g a histogram consists of a maximum of 255 rows (equating to 254 buckets) so Oracle is constrained to generating a height-balanced histogram on the column. In fact, although we can see that that variation between the scarcest data and the most common data is enormous (1 row for skewed = 0 and 1999 rows for skewed = 999) left to “auto-detect” the need for a histogram Oracle doesn’t even create one. And if we force Oracle to construct a histogram of 254 buckets it doesn’t capture any popular values (with 1M rows and 254 buckets a value would have to appear in about 4,000 rows before it could register as popular and, if you’re unlucky, might fail to be noticed even when it appeared in around 8,000 rows). So in 11g a “gathered” histogram won’t help.

To continue the demonstration I’ve switched to 12c. Again if I tell Oracle to use “size auto” to gather a histogram on the column Oracle decides to use 254 buckets then decides it’s not worth it. However in 12c I can have up to 2048 rows in the histogram and when I tell Oracle to use the maximum it creates a frequency histogram with one row for each of the 1,000 values in the column. This is the plan I then get for the two queries:

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |   665 (100)|          |
|   1 |  SORT AGGREGATE    |      |     1 |     4 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   |    21 |    84 |   665  (11)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("SKEWED"=:B1)

It’s the same plan for both queries, but at least it predicts the correct number of rows for the first execution. The optimizer manages to do this because on the first execution the query has to be optimized and the optimizer “peeks” at the actual value of the bind variable to help it choose the plan, so the query is optimized as if the predicate read: “skewed = 10”, and the histogram has the information that there are 21 rows for the value 10.

Unfortunately when we execute the query with a new value for the bind variable the optimizer recognizes that it has seen the exact same text in the very recent past and discovers that it has an execution plan in memory that is legal for the query, and simply re-uses it.

Bind peeking and re-using execution plans is a benefit because we don’t compete for and use up resources to optimize every single query execution; but when the data distribution is so uneven that different input values produce dramatically different sizes of result set, plan re-use becomes a liability and we may need to re-optimize statements that are visibly identical but mechanically very different.

This conflict between bind variables and histograms is a long-standing issue, and Oracle has introduced many strategies over the last 20 years to limit the damaging side effects of the conflict; ultimately, though, the application developer needs to be aware of the threat and identify the special cases where it is necessary to get the database and the client code co-operating to deal with extremely skewed data distributions.

Summary

I’ve covered three “harder” problems of optimization. The first is that subqueries introduce a “fixed guess” of 5% that is hard to workaround unless you’re prepared to hint an actual execution path, use a poorly documented cardinality() hint, or use the tuning tool to generate an SQL Profile (set of opt_estimate() hints).

The second is that a column based on time or sequence values can introduce random instability if the most important queries are to find “the most recent data”. The algorithm that the optimizer uses when a query uses an “out of range” value to optimize a range-based predicate could result in a very bad cardinality estimate, leading to a bad execution plan. Either you hint (or create an SQL Plan Baseline for) the execution plan you need, or you take action to keep the high_value recorded for the critical column at a suitably recent (or future) value that the optimizer always gets the right plan when re-optimising the query.

Finally we have to recognize that bind variables and skewed data patterns do not mix – even in the presence of histograms. We will say more about this in a future installment, but for the moment we note that if the same query text could produce extreme variations in the volume of data reported then you probably need to create a histogram on the critical column but you then need to ensure that your code co-operates with the database (perhaps by using literals for the critical predicates) to make most effective use of that histogram.

 

–> Catalogue of current articles in CBO series.

 

Tags: ,