09 July 2015

5072 views

4 0

Basics of the Cost Based Optimizer – Part 4

In the first three installments of this series I outlined the mechanisms used by Oracle to analyze a very simple query. Sticking with the informal approach, I’m going to look at the high-level strategy the optimizer takes when handling a more complex query. To do this I’ll be looking at query blocks, scratching the surface of query transformation, explaining why it’s a good idea to name all your query blocks and know how to find the names of any query blocks generated by the optimizer as it transforms your original query into something it would prefer to handle.

Query Blocks

The “unit of optimization” – the thing that the optimizer bases its arithmetic on – is the query block. Every time you see one of the words “select”, “insert”, “update”, “delete”, “merge” in an SQL statement it’s introducing a query block. Our query from the previous installments was a simple select statement consisting of a single query block; here is another statement, which consists of two query blocks:

select

/*+ qb_name(main) */

from emp emp1

where

sal > (

select

/*+ qb_name(subq) */

avg(emp2.sal)

from emp emp2

where emp2.deptno = emp1.deptno

)

;

The emp table comes from the demo scott schema and this query reports any employee who earns more than the average salary for their department and, as you can see, the statement includes a correlated subquery to calculate the average of each employee’s department.

You’ll notice the qb_name() hint that I’ve used to give a specific name to each of the query blocks in the statement – as a former C programmer I can’t resist naming the main query block “main”, and I’ve named the correlated subquery “subq”. For reasons we’ll examine later it can be very useful to name all query blocks, particularly when you’re trying to debug or hint a complex query. If you don’t name your query blocks explicitly then Oracle will name them with names like sel$1, sel$2, and so on, for select statements with a similar convention for updates, deletes, etc.

So what does it mean to say that the “unit of optimization” is the query block when we have a query with two query blocks? From a high-level perspective this really means the optimizer doesn’t want to work out a plan for anything more complex than a very simple query structure that looks like this:

select {list of expressions}

from {list of tables}

where {list of predicates}

[group by / having / order by etc …]

which means that things like correlated subqueries are too complicated, which means that Oracle has to transform our query into the desired form (if that can be done legally) or it has to optimize the two query blocks separately then work out how to stitch the two pieces together. It’s particularly in the area of query transformation that the optimizer has evolved and become more sophisticated in recent years.

Option 1: Separate optimization.

If the optimizer decides to handle our two query blocks independently this means, ignoring a lot of the detailed working, that it has to optimise the following two pieces of SQL:

Query block main:

select *

from emp

where sal > :b1

;

Query block subq

select

dept_no, avg(sal)

from emp

where deptno = :b2

;

The first thing to note about these two query blocks is that we are optimizing for “unknown values”. Optimising for query block main then gives us the cost of execution and an estimate of the number of rows that it will return. Optimising for query block subq gives us the cost of executing the subquery once. With these figures the optimizer can then estimate how many times it will have to run the subquery, and the number of times that the sal test will be satisfied.

Here’s an execution plan that shows this happening. I’ve had to put a no_unnest hint into the subquery itself (though I could have put the hint no_unnest (@subq) in the main body of the query) to make this plan appear:

----------------------------------------------------------------------------

----------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 1 | 38 | 12 (0)| 00:00:01 |

|* 1 | FILTER | | | | | |

| 2 | TABLE ACCESS FULL | EMP | 14 | 532 | 3 (0)| 00:00:01 |

| 3 | SORT AGGREGATE | | 1 | 7 | | |

|* 4 | TABLE ACCESS FULL| EMP | 5 | 35 | 3 (0)| 00:00:01 |

----------------------------------------------------------------------------

In this plan we can see the full tablescan needed for query block main appearing at operation 2, and the plan needed for query block subq appearing at operations 3 and 4. The FILTER operation combines the two sub-plans, in principle executing query block subq once for every row returned by query block main.

There are a couple of numbers worth mentioning.

The 5 rows predicted at operation 4 comes from (number of rows in emp / number of distinct deptno in emp – viz: 14 / 3).
The total cost of the query is 12 because the optimizer has calculated that the subquery (with a cost of 3) will have to be executed 3 times (number of distinct deptno values likely to be returned in the tablescan) for a total of 9, plus the cost of the original tablescan.
The final number of rows ought to be derived as 5% of the rows found in the tablescan (the standard “guess” that the optimizer uses for a filter like “column > {unknown}”. In this “filter subquery” case, though, and for many recent versions of Oracle, there’s a bug that reduces the final estimate to 1 irrespective of how much larger I make the emp table.

My two original query blocks are fairly easy to see in the plan but if we add the ‘alias’ formatting option to the call to dbms_xplan.display() we can produce an explicit report of the query block names optimised (and the query blocks associated with each object). Here’s what you would get with this example:

Query Block Name / Object Alias (identified by operation id):

-------------------------------------------------------------

1 - MAIN

2 - MAIN / EMP1@MAIN

3 - SUBQ

4 - SUBQ / EMP2@SUBQ

You will find that some lines in more complex plans won’t report a query block name, but in out simple example every line is labelled with the query block it belongs to. Also, every object is now reported using its alias (emp1, emp2) which has been qualified with the query block it originally appeared in. It was easy in my example to work out from the plan which appearance of emp went with which part of my original statement, but that’s not always the case, so the clear labelling here can be very helpful when you have to deal with complicated SQL.

Option 2: Unnesting.

Rather than leave our query in two pieces, the optimizer may decide to turn the subquery into a “table” (in fact an inline view) that could be moved into the main query to produce a join – and in many cases with newer versions of Oracle if such a move is legal it usually happens, which is why I had to put in a hint to stop this “unnesting” in my original query. Without the hint, this is the plan I got:

--------------------------------------------------------------------------------

--------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 1 | 64 | 7 (15)| 00:00:01 |

|* 1 | HASH JOIN | | 1 | 64 | 7 (15)| 00:00:01 |

| 2 | VIEW | VW_SQ_1 | 3 | 78 | 4 (25)| 00:00:01 |

| 3 | HASH GROUP BY | | 3 | 21 | 4 (25)| 00:00:01 |

| 4 | TABLE ACCESS FULL| EMP | 14 | 98 | 3 (0)| 00:00:01 |

| 5 | TABLE ACCESS FULL | EMP | 14 | 532 | 3 (0)| 00:00:01 |

--------------------------------------------------------------------------------

It’s still easy to see our subquery here in the VIEW operator which references an object called VW_SQ_1 (one of several transformation-specific names that Oracle uses for views generated internally). The VW_ prefix appears very commonly, and the SQ bit appears fairly commonly for subquery unnesting. (For a list of many other generated view names see https://jonathanlewis.wordpress.com/2011/02/27/internal-views/)

In this case Oracle still has to optimize the subq query block separately, but this time it has optimized it without the correlation predicate and bind variable; in effect it has optimized the following:

select

/*+ qb_name (subq) */

avg(emp2.sal) avg_sal,

emp2.deptno item_1

from

test_user.emp emp2

group by

emp2.deptno

;

After which it has used its estimates of this query block’s cardinality and distinct values to move the view inline and treat it as if it were a table in the following two-“table” join:

select

/*+ qb_name (main) */

emp1.*

from (

select /*+ qb_name (subq) */

avg(emp2.sal) avgsal,

emp2.deptno item_1

from

test_user.emp emp2

group by emp2.deptno

) vw_sq_1,

test_user.emp emp1

where

emp1.deptno = vw_sq_1.item_1

and emp1.sal > vw_sq_1.avgsal

;

Once the optimizer starts transforming our queries our orginal query blocks start to disappear – and this can make it harder to sort out what’s gone wrong with an execution plan and what we can do to change it. Let’s take a look at the query block / alias section of this unnesting execution plan:

Query Block Name / Object Alias (identified by operation id):

-------------------------------------------------------------

1 - SEL$B4BE209F

2 - SEL$AA0D0E02 / VW_SQ_1@SEL$D6166863

3 - SEL$AA0D0E02

4 - SEL$AA0D0E02 / EMP2@SUBQ

5 - SEL$B4BE209F / EMP1@MAIN

None of the original query block names remain – although we can see still see that emp1 originally came from main and emp2 from subq. We now have three more query block names (which, I have to say, is one more than I would have expected).

As the subquery unnested and turned into an inline view it got a new query block name (based, I believe, on a hash function applied to the text “subq”), strangely when this inline view was folded inline in the main query it seemed to get another layer wrapped around it that produced another query block name (based on the hash function applied to the text “sel$d8166863”), and because the main query had been modified to include a second object a new query block name was generated (based on the hash function being applied to the two texts “main” and “sel$aa0d0e02”).

Being able to find these query block names can be very important when you decide it’s necessary to use hints to control the way that Oracle transforms a query – sometimes the query block you need to hint doesn’t exist until after some transformation has taken place, so you can’t put a simple hint into the base SQL, you have to use a globally qualified hint i.e. direct the hint to a specifically named query block . (See: https://jonathanlewis.wordpress.com/2007/06/25/qb_name/)

Although I’ve said the optimizer likes to turn complex queries into a simple list of tables with list of predicates our example so far is still in a state where we have to optimise two query blocks. We can see this fairly explicitly in the query block / alias section of the plan, but it becomes even clearer in the outline section of the plan (generated by adding ‘outline’ to the formatting options of dbms_xplan):

Outline Data

-------------

/*+

BEGIN_OUTLINE_DATA

USE_HASH_AGGREGATION(@"SEL$9F3C0132")

FULL(@"SEL$9F3C0132" "EMP2"@"SUBQ1")

USE_HASH(@"SEL$FB2CCE48" "EMP1"@"MAIN")

LEADING(@"SEL$FB2CCE48" "VW_SQ_1"@"SEL$B3936C35" "EMP1"@"MAIN")

FULL(@"SEL$FB2CCE48" "EMP1"@"MAIN")

NO_ACCESS(@"SEL$FB2CCE48" "VW_SQ_1"@"SEL$B3936C35")

OUTLINE(@"MAIN")

OUTLINE(@"SEL$B3936C35")

OUTLINE(@"SUBQ1")

UNNEST(@"SUBQ1")

OUTLINE_LEAF(@"SEL$FB2CCE48")

OUTLINE_LEAF(@"SEL$9F3C0132")

ALL_ROWS

DB_VERSION('11.2.0.4')

OPTIMIZER_FEATURES_ENABLE('11.2.0.4')

IGNORE_OPTIM_EMBEDDED_HINTS

END_OUTLINE_DATA

If you read through this outline you can see two lines using the hint outline_leaf(). An outline leaf identifies a query block that was finally selected by the optimizer as part of the optimum plan. You can also see various outline() hints – the original query blocks are there, of course, plus the intermediate query block SEL$B3936C35 that was used as a wrapper to the unnested subquery.

Option 2b: Unnesting and merging

Can we (or the optimizer) get closer to its ideal of a single query block with a simple set of joins ? The answer (at least in this case) is yes. The plan the optimizer gave us aggregated the emp table by department, then joined back to emp on deptno. In principle we could use “complex view merging” after the initial transformation to rewrite the query to join emp to itself before aggregating. Rather than rewriting it in our source SQL we could tell the optimizer to adopt that strategy by giving it a merge() hint – here’s how:

select

/*+ qb_name(main) merge(@SEL$AA0D0E02)*/

from emp emp1

etc …

I’ve directed this hint (note the “@” sign) to the query block sel$aa0d0e02 – the query block that we saw in the previous section labelling operations 2, 3, and 4 of the execution plan. I’ve told Oracle that that query block (the inline view) should be merged into the main body of the plan rather than being optimized and (effectively) run as an independent part of the whole query. Here’s the plan this gives me:

-----------------------------------------------------------------------------

-----------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 4 | 180 | 7 (15)| 00:00:01 |

|* 1 | FILTER | | | | | |

| 2 | HASH GROUP BY | | 4 | 180 | 7 (15)| 00:00:01 |

|* 3 | HASH JOIN | | 65 | 2925 | 6 (0)| 00:00:01 |

| 4 | TABLE ACCESS FULL| EMP | 14 | 98 | 3 (0)| 00:00:01 |

| 5 | TABLE ACCESS FULL| EMP | 14 | 532 | 3 (0)| 00:00:01 |

-----------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):

-------------------------------------------------------------

1 - SEL$9E18A930

4 - SEL$9E18A930 / EMP2@SUBQ

5 - SEL$9E18A930 / EMP1@MAIN

Predicate Information (identified by operation id):

---------------------------------------------------

1 - filter("EMP1"."SAL">AVG("EMP2"."SAL"))

3 - access("EMP2"."DEPTNO"="EMP1"."DEPTNO")

We’ve joined on deptno, aggregated on something, then applied the filter emp1.sal > avg(emp2.sal). The transformed query looks like this – exactly the type of thing that the optimizer loves to optimize:

select

emp1.*

from

emp emp2,

emp emp1

where

emp2.deptno = emp1.deptno

group by

emp2.deptno,

emp1.rowid, emp1.deptno, emp1.comm, emp1.sal, emp1.hiredate,

emp1.mgr, emp1.job, emp1.ename, emp1.empno

having

emp1.sal > avg(emp2.sal)

If we examine the optimizer trace file (event 10053) we would find that the optimizer had actually considered and rejected this path (just as it had considered and rejected the first filter subquery path). Checking back to the previous plan you’ll see that the latest plan appears to have the same cost as the previous one, so you might ask why the optimizer chose the one it did – in fact, to 4 decimal places, the cost of the previous plan was 7.0712 compared to 7.0735 for this plan, so it was a close call, but the previous plan was a little cheaper.

Summary

Oracle has a concept of the query block which, effectively, is the optimizer’s basic unit of optimisation, and takes the form of a simple join between tables. If we write a complex statement involving subqueries, inline aggregates, etc. we should recognise this concept and name each query block.

We’ve examined a statement consisting of 2 query blocks and noted that Oracle will attempt to transform the statement in various ways in an attempt to reduce the number of query blocks involved. Whatever transformation it chose, it had to optimise (do the optimisation arithmetic) for each query block individually, with some final arithmetic to model its strategy for connecting together the final set of query blocks.

Our first transformation in fact did nothing and left us with two unchanged query blocks that had to be connected through a FILTER operation – where the final arithmetic was remarkably similar to that of a nested loop join. Our second transformation also left us with two query blocks to optimise, but one of those query blocks was designed to generate an inline view that could be treated arithmetically as part of a simple table join – the transformations, however, lost our original query block names and introduced some new ones. The third transformation reduced the query to a single query block which is the ideal from the perspective of the optimizer, but it turned out to have a more costly execution plan than one of the previous options.

Because the optimizer makes mistakes and because the optimizer has to make some guesses about the selectivity of complex predicates, the final transformation picked by the optimizer may not be the best, so we need to be able to block a bad choice and redirect the optimizer to a good choice. We can do this with hints, but sometimes we need to direct a hint to a specific query block, and it may be a query block that doesn’t exist in our original query, so being aware of the option for seeing the generated query block names in the execution plans is a useful step in manually optimising complex SQL statements.

–> Catalogue of current articles in CBO series.

Basics of the Cost Based Optimizer – Part 4

Query Blocks

Option 1: Separate optimization.

Option 2: Unnesting.

Option 2b: Unnesting and merging

Summary

Subscribe for more articles

Rate this article

Jonathan Lewis

Related articles

Execution Plans Part 7: Query Blocks and Inline Views

Managing Query Store Hints

A Guide to INSERT, UPDATE, and DELETE Statements in Oracle

Tags