Basics of the Cost Based Optimizer – Part 2

In the first installment of this series I gave an informal description of how the optimizer would consider the possibilities when choosing an execution path for a simple two-table join between an orders table and an order_lines table that I had described. In that installment I supplied the DDL for the tables and its indexes; in this installment I’m going

In the first installment of this series I gave an informal description of how the optimizer would consider the possibilities when choosing an execution path for a simple two-table join between an orders table and an order_lines table that I had described.

In that installment I supplied the DDL for the tables and its indexes; in this installment I’m going to take a look at the data I created, show you a few of the relevant numbers, and then show how the numbers affect the optimizer’s choice of path. The arguments will still be fairly informal, though.

The key point of this installment is that though the optimizer can do some very clever things there are still flaws in the default model which you may be able to identify through simple queries and resolve through precisely targeted changes to configuration.

The data

Here’s the code I used to populate the two tables:

I created 18 concurrent sessions and executed the code 100,000 times from each of the sessions. This should have given me exactly 1.8M rows but I found a few SQL*Plus sessions crashing randomly so launched a couple of extra sessions and ended up with a few more rows than originally planned. The number of rows ended up at 1,810,995 rows in orders and 2,166,518 rows in order_lines for an average of 1.196 order lines per order, with the following breakdown:

It’s worth making a couple of points about the insert statements. The case statement for orders models the number of orders doubling each year over the last 3 years; the ln() (natural logarithm) call in the level for order_lines is a convenient way of getting a typical “real-world” pattern of order lines per order (and minutes per phone-call, cars per family, etc.). The final point to bear in mind is that the tables are located in a tablespace defined to use automatic (bitmap) segment space management – the tablespace is also using system managed extent allocation, but that doesn’t really have a significant impact in this case.

The fact that I inserted a pause of 1/100th of a second between each order helped to reduce the problems of the “ITL explosion”, though the pattern of activity still managed to produce some space inefficiencies on index leaf block splits. The run-time to load the data was about 20 minutes – but if you don’t have a large SGA and fast discs you may find you lose a lot of extra time on “db file sequential read” waits as you maintain the index order_lines(product_id), and “log file parallel write” waits (and associated session “log file sync” waits) as the redo streams out to disc.

Getting Started

We’ll be addressing the main query in the next installment, but I want to start by running a couple of simpler queries to see how the optimizer thinks our data should be handled. Our original query was for product 101234 (an attribute of order_lines) over the last seven days (an attribute of orders), but I’ll start with a query against just the order lines – let’s just sum the value of all order lines for that product – a query that has to visit exactly the data we’re interested in, but conveniently returns only one row:

From a business perspective we were expecting something like 1,000 order lines for each product and that (within reasonable limits) is what the optimizer predicted. Based on our knowledge of the business we were also expecting those rows to be spread fairly evenly across the three years which, moving a little closer to the database technology tells us that we might typically expect every row to be in a separate block – and that (approximately) is what the optimizer’s estimated cost is telling us.

The optimizer has decided that the resources needed to get the data are approximately equivalent to physically reading 1,064 blocks from the database files (4 for the index plus a further 1060 for the table). Given that the execution plan is an index range scan following by a table access by index rowid we know that this equivalence is a direct representation of the fact that the optimizer really does expect the run-time engine to do 1,064 single block reads.

Now let’s look at the other end of our original query – orders for the last 7 days. To execute the original query we would need the order_id to join to the order_lines table, so we’ll select the order_id in this test but use it to get an approximate count of the number of orders in the date range by finding the difference between the min and max ids in the range and adding 1.

(The query identified 20,000 rows while the estimated rowcount was 11,500 – but that discrepancy is consistent with the prediction we made in part 1 about the effect of the number of orders doubling each year.)

The big surprise though – showing simultaneously how smart and how ignorant the optimizer can be – is the strange execution plan that does a hash join between two indexes to get the result. It’s very clever that the optimizer has found a plan that doesn’t need to visit the table, but it’s surprising that the optimizer thinks this plan is more efficient than visiting the table through a simple index range scan; after all we know that the last 7 days of orders are packed into the last “few” blocks in the table – in fact we can check that this is true:

The data we want is scattered across just 447 blocks and we might find similar figures if we sample a few other week-long intervals, so it seems reasonable to expect the optimizer to work out that it should take something in the order of 500 blocks to get all the data it needs – so why does it do a fast full scan on the primary key index at a cost of 5,083 (and don’t ask why 5,083 + 35 seems to sum to 4,115 – sometimes the optimizer does strange things that you don’t need to investigate right away – even in 11.2.0.4) which is about 10 times the cost that the “obvious” plan should be.

We can answer the critical question by reminding ourselves of two things – the tablespace is bitmap managed (ASSM) and I had 18 concurrent processes inserting data. The point of ASSM is to reduce contention on DML, particularly on concurrent inserts, by pointing different sessions at different blocks that are typically, though not necessarily, spread over a batch of 16 consecutive blocks. This is a very good strategy for avoiding “buffer busy waits” (and the more extreme “gc buffer busy waits” that you would get using RAC) but it does mean that at a very fine level of detail the data appears (to the optimizer) to be far more scattered than it really is. We can show the effect with another simple query – reporting the file and block id for a batch of consecutive orders.

As you can see I’ve picked 20 orders ordered by order_date which has given me (not coincidentally) 20 consecutive order IDs; but the data is scattered over 13 different blocks in a randomized fashion. This means that as Oracle walks the index to fetch the data it often jumps from one table block to another – and a measure of this “jumping” activity is used as an important detail when the optimizer decides how efficient a particular indexed access path might be.

When you gather index stats, Oracle uses a function called sys_op_countchg() to count the number of jumps to “a different table block” that would occur as it moves in order through the entire index. This number appears in view user_indexes as the clustering_factor. The higher the clustering_factor the more random (physical) reads the optimizer thinks it would have to do to access the table data in index order.

Historically, though, Oracle would fail to notice that a jump to “the next required block from the table” was taking it back to a table block that had appeared in its very recent history so, walking the sample above, it would see 20 different table blocks where we see only 13. In 12c, with a backport to 11.2.0.4, we can set a “table preference” to tell Oracle to remember recent history as it is calculating the clustering_factor. For example, if we set the preference to 8, Oracle wouldn’t remember that it had recently seen block 29360 (labelled a) the second time it reached it, so it would count it a second time; on the other hand it would remember visiting blocks 29272 and 27952 and count them only once each. Block 29207 (labelled c) is a particularly useful example – when Oracle reaches it the 2nd time it will have forgotten the first visit, so it will count the block a second time, but when it reaches it the 3rd time it will remember the previous (2nd) visit so won’t increment the count.

ASSM tends to scatter inserts over 16 blocks (which makes 16 a good default value for this option), but in my case I connected 18 sessions to the database and kept those sessions alive for a long time doing single row inserts with commits continuously; so I might expect to see a pattern that’s a little worse than typical – a scattering effect that’s broadly spread over 18 blocks. Because of this I’m going to set the table preference to tell Oracle to remember 18 steps of history then gather index stats again and see what happens

Here are the details of the clustering factors before and after adjusting the “history” setting:

As you can see, three of the indexes report a much smaller clustering factor once we instruct Oracle to allow for the effects of concurrency and ASSM scattering of data. The absence of change for the orl_fk_prd index shouldn’t come as a surprise since this is the index on the product ID and we know that individual products really are very randomly scattered throughout the entire history of the table.

So what does the reduced (and more realistic) clustering_factor do to the execution plan of our simple query against the orders table? Here’s the new plan:

The plan is the one we might have expected based on our knowledge of the data. We have to access a “small” number of table blocks by single block read after the index range scan; the optimizer’s estimate is approximately 691 – 32 = 659 single block reads, which doesn’t quite match our precise knowledge for the specific set of data, but with the adjusted clustering_factor the optimizer has got a much more realistic estimate of the work involved.

For comparative purposes we could reset the table_blocks_cached preference to 1, gather stats again, and see what we get if we force Oracle to take this path with the /*+ index(orders (date_ordered)) */ hint:

Without fixing the clustering factor, the optimizer’s estimate for the plan involving a simple index range scan was 11,109 – indicating an estimate of 11,074 distinct table blocks (11,109 – 35) needing to be read. If you compare 11,074 with 659 (the costs / estimates of table block reads) and 1,738,979 with 103,293 (the two clustering_factors) you’ll see that in both cases the ratio is about 16.8 (with some variation due to different sample sizes)

Summary

We’ve examined a couple of queries that will eventually lead us to the table join that we’re interested in. Because they are single-table queries we have a good idea of how many rows are involved, how many blocks are relevant, and how effective the available indexes should be.

Since we’re expecting a simple table access by index range scan we know that the optimizer’s estimated cost for each query should be close to the number of distinct blocks it thinks it will have to visit (which it equates with single block physical reads), and we have a good idea of how many blocks that should be.

However, the optimizer chose an unexpected path for one of the queries with a much higher cost estimate than our well-informed estimate and, since we can use some simple SQL to prove that our estimation method was appropriate, we know that the optimizer is making an invalid assumption somewhere in its calculations.

For index range scans and index full scans a significant contributor to the cost calculation for visiting the table is the index’s clustering_factor and we knew that our orders and order_lines data are fairly well clustered by date; however our knowledge of ASSM and the application (particularly the degree of concurrency) tells us that while the big picture shows well-clustered data, the very fine detail shows a localized scattering effect which Oracle’s traditional algorithm has exaggerated enormously.

Fortunately a recent version of Oracle allowed us to configure a “table preference” to address this particular problem and, after we set this preference to match our pattern of concurrent activity, we can see after the next call to gather stats that this has affected the clustering_factor of all the indexes in exactly the way we might hope – i.e. reduced it dramatically for 3 of them and left the fourth unchanged.

With the corrected clustering_factor the optimizer has switched to the path we expect, with a cost that is a good match for our expectation.

In the next installment we’ll move on to looking at the join that we’re really interested in, and see another of the problems that the optimizer runs into when it doesn’t understand our data as well as we do.

–> Catalogue of current articles in CBO series.