In part 3 of this series we used a simple hash join to introduce a basic guideline for reading execution plans – the “first child first, recursive descent” method. This allowed us to work out the order in which Oracle produced rowsources and (implicitly) the order in which it visited the different physical objects in the query.

At the start of this series I emphasized the point that this rule doesn’t work in all cases and in part 5 we will be looking at some of the cases where we have to be more subtle; but for part 4 we’re going to stick with the simpler cases to look at some details of the timing and use of predicates as we apply the rule.

Basics

Here’s the basic execution plan for our two-table hash join again:

--------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |    10 |   300 |    22   (0)| 00:00:01 |
|*  1 |  HASH JOIN                   |       |    10 |   300 |    22   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1    |    10 |   150 |    11   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | T1_I1 |    10 |       |     1   (0)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID| T2    |    10 |   150 |    11   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN          | T2_I1 |    10 |       |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T2"."ID"="T1"."ID")
   3 - access("T1"."N_1000"=1)
   5 - access("T2"."N_1000"=100)

By applying the “first child first” rule we decided that the run-time engine would start with an index range scan of t1_i1 at line 3, using the rowids it acquired there to select columns from t1 at line 2 and build an in-memory hash table at line 1 from that data. Then it would do an index range scan of t2_i1 at line 5, using the rowids it acquired there to select columns from t2 at line 4 and probe the in-memory hash table to find matches, constructing a new rowsource at line 1 that would be passed upwards to the client program.

Thinking about this description we can appreciate that there is a difference in the way Oracle handles the two child operations to the hash join. The second child (access to t2) cannot start until the first child (access to t1) has completed – the hash join is an example of a “blocking” operation. Once the hash table has been built, however, Oracle can call the second child to return one row at a time, probe the hash table, and pass each surviving row in turn up to its parent operation – so there is a piecewise flow of data from that moment onwards.
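As a toy illustration of this build-then-probe behaviour – a sketch only, not Oracle’s actual implementation – here is the shape of the algorithm in Python, where `t1_rows` and `t2_rows` are stand-ins for the two child rowsources:

```python
# Toy sketch of a hash join; illustrative only, not Oracle internals.

def hash_join(t1_rows, t2_rows, key):
    # Blocking phase: consume ALL of the first child before anything
    # can be returned - this is why the hash join "blocks".
    build = {}
    for row in t1_rows:
        build.setdefault(row[key], []).append(row)

    # Streaming phase: fetch one row at a time from the second child,
    # probe the hash table, and pass each surviving row straight up.
    for row in t2_rows:
        for match in build.get(row[key], []):
            yield (match, row)

t1 = [{"id": 1, "v1": "a"}, {"id": 2, "v1": "b"}]
t2 = [{"id": 2, "v1": "x"}, {"id": 3, "v1": "y"}]
print(list(hash_join(t1, t2, "id")))   # the single pair joined on id = 2
```

Note that the generator yields nothing at all until the first loop has run to completion – the “blocking” half – after which each probe row flows through piecewise.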

I have occasionally seen the suggestion that because of this blocking operation the optimizer can’t do a hash join when using the first_rows(n) – or even the old first_rows – optimization mode. This isn’t true – if the optimizer thinks that it can build the hash table very quickly and find the first row from the second table very cheaply then a hash join may still turn out to be the lowest cost path it can find to return the first N rows.

We can compare the blocking effects of a hash join with a different join option by hinting the optimizer into a merge join – which gives us this plan:

---------------------------------------------------------------------------------------
| Id  | Operation                     | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |       |    10 |   300 |    24   (9)| 00:00:01 |
|   1 |  MERGE JOIN                   |       |    10 |   300 |    24   (9)| 00:00:01 |
|   2 |   SORT JOIN                   |       |    10 |   150 |    12   (9)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID| T1    |    10 |   150 |    11   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN          | T1_I1 |    10 |       |     1   (0)| 00:00:01 |
|*  5 |   SORT JOIN                   |       |    10 |   150 |    12   (9)| 00:00:01 |
|   6 |    TABLE ACCESS BY INDEX ROWID| T2    |    10 |   150 |    11   (0)| 00:00:01 |
|*  7 |     INDEX RANGE SCAN          | T2_I1 |    10 |       |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("T1"."N_1000"=1)
   5 - access("T2"."ID"="T1"."ID")
       filter("T2"."ID"="T1"."ID")
   7 - access("T2"."N_1000"=100)

In this plan we see a merge join operation in line 1 which has two child operations: the sort join in line 2 and a second sort join in line 5. Applying “first child first” recursively we infer that Oracle starts with an index range scan on t1_i1, accesses the data we might want from t1, and sorts it (on the id column, since that’s the join column) at line 2. If we’re lucky, the sorted data set will fit in memory (in the session’s PGA). The first child is a blocking operation – we can’t start the second child until that sort is complete.

Then we go to the second child and, following “first child first” again, we see that we do an index range scan of t2_i1, visit t2, and sort the resulting data set (again on the id column and again, we hope, into memory): the second child is also a blocking operation – the merge join itself can’t get started until the sort completes.

When the two sorted rowsources are available the merge join can walk through the first rowsource one row at a time, probing the second rowsource for matches and constructing result rows to pass upwards to its parent. Since the second rowsource is sorted, the amount of work that Oracle has to do to find each set of matching rows is at worst O(log(N)) – where N is the number of rows in the second rowsource: Oracle can do a binary chop (using roughly log2(N) checks) to find the first matching row and then scan sequentially onwards from there. In fact the code is smarter than that: it takes advantage of the fact that the probe values themselves arrive in sorted order, which means it can reduce the work by “remembering” where the previous probe started (see, for example, this blog post of mine on merge joins).

In fact this strategy for a merge join gives us our very first case of an execution plan that doesn’t quite describe exactly what’s going on, and it’s a case where calling on “rowsource execution statistics” gives us some interesting new information. Let’s run the query and use a slightly more sophisticated call to the dbms_xplan package to see how often the different steps of the plan are called. (Note: the following output came from 11.2.0.4, but I’ve edited out the Buffers, A-Time and Memory columns.)

alter session set statistics_level = all;
set linesize 156
set trimspool on
set pagesize 60
set serveroutput off

select
        /*+
                leading(t1, t2)
                use_merge(t2)
        */
        t1.v1, t2.v1
from
        t1, t2
where
        t1.n_1000 = 1
and     t2.id     = t1.id
and     t2.n_1000 = 100
;

select * from table(dbms_xplan.display_cursor(null,null,'iostats last'));

--------------------------------------------------------------------------
| Id  | Operation                     | Name  | Starts | E-Rows | A-Rows |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |       |      1 |        |      0 |
|   1 |  MERGE JOIN                   |       |      1 |     10 |      0 |
|   2 |   SORT JOIN                   |       |      1 |     10 |     10 |
|   3 |    TABLE ACCESS BY INDEX ROWID| T1    |      1 |     10 |     10 |
|*  4 |     INDEX RANGE SCAN          | T1_I1 |      1 |     10 |     10 |
|*  5 |   SORT JOIN                   |       |     10 |     10 |      0 |
|   6 |    TABLE ACCESS BY INDEX ROWID| T2    |      1 |     10 |     10 |
|*  7 |     INDEX RANGE SCAN          | T2_I1 |      1 |     10 |     10 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("T1"."N_1000"=1)
   5 - access("T2"."ID"="T1"."ID")
       filter("T2"."ID"="T1"."ID")
   7 - access("T2"."N_1000"=100)

The thing to look at in this output is the Starts column – where the little detail that might raise a question is the number of starts of line 5: the second sort. Apparently Oracle sorted the data from the second table 10 times; but since the developers at Oracle are clearly rather clever people we can be fairly sure that that’s NOT what really happened: we need a better interpretation of the sort join operation and a clearer understanding of how it fits into the execution plan.

The second sort join operation really comprises two components – one component probes a sorted data set based on a supplied value, while the other really does sort a data set. Perhaps a more appropriate name for the operation should be “probe an in-memory sorted data set, but acquire it and sort it if it’s not already in memory”. It’s worth remembering that a line in an execution plan can include higher level logic of the form “if condition X is met do A else do B” – the ability of a sort join to do, or not do, sorting is just one example of this.

Checking the A-rows value for line 2 (the number of rows coming back from the first child) we can see that it’s 10 – which explains why Oracle called the second child 10 times: just like a nested loop, the second child is called once for each row from the first child. We acquire and sort the entire data set once, then reuse the sorted data, ultimately probing it a total of 10 times.

It’s at this point that we can make an initial comment about interpreting the Predicate Information. If you examine line 5 you will see that it uses both an “access” predicate and a “filter” predicate – and both predicates use exactly the same expression.

The difference between the two types of predicate is, loosely speaking, that an access predicate tells us how to “go and find” items (rows) of data, and a filter predicate tells us how to check – after we’ve done the work of finding it – whether it was the item (row) we really wanted.

In the case of the second sort join operation, the access predicate tells us how to find the position of the first matching row in the sorted data set, and the filter predicate tells us how to check each row as we walk along the sorted data set in order so that we can stop when eventually we get to a row that fails the filter predicate test.

As a very informal guideline – if you think you’re accessing all the tables in the right order using the right indexes, but still seem to be doing too much work while executing a query, it’s possible that you’re accessing too much data and then using a filter predicate to throw a lot of it away.

We can take our examination of the merge join and the business of blocking and timing a little further if we have suitable indexes in place, specifically indexes on the join columns. In this case (possibly through hinting, because the optimizer seems fairly resistant to using merge joins) our execution plan could look like this:

--------------------------------------------------------------------------
| Id  | Operation                     | Name  | Starts | E-Rows | A-Rows |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |       |      1 |        |      0 |
|   1 |  MERGE JOIN                   |       |      1 |     10 |      0 |
|*  2 |   TABLE ACCESS BY INDEX ROWID | T1    |      1 |     10 |     10 |
|   3 |    INDEX FULL SCAN            | T1_PK |      1 |  10000 |  10000 |
|*  4 |   SORT JOIN                   |       |     10 |     10 |      0 |
|*  5 |    TABLE ACCESS BY INDEX ROWID| T2    |      1 |     10 |     10 |
|   6 |     INDEX FULL SCAN           | T2_PK |      1 |  10000 |  10000 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("T1"."N_1000"=1)
   4 - access("T2"."ID"="T1"."ID")
       filter("T2"."ID"="T1"."ID")
   5 - filter("T2"."N_1000"=100)

In this example the “PK” indexes are based on the id column, which means the two index full scan operations at lines 3 and 6 are accessing the data in exactly the order we need to avoid sorting. We can see from the plan that Oracle doesn’t sort the first (t1) data set: it simply gets the rows in order, and uses a filter predicate at line 2 to eliminate all the rows it doesn’t need.

Looking at the A-rows in line 3 and comparing it with the A-rows in line 2 we can see that we have generated a rowsource of 10,000 items (rowids, basically) at line 3 and used the filter predicate in line 2 to discard all but 10 rows after visiting the table (10,000 times – oops!) to acquire the corresponding rows – potentially a hugely inefficient thing to do.

The same level of inefficiency is visible in lines 5 and 6 – we do an index full scan, returning 10,000 rowids in line 6, and then use a filter predicate after visiting table t2 in line 5, discarding 9,990 of the rows. The strangest thing about the treatment of table t2, though, is that we sort the resulting rowsource in line 4 – even though we should know that it is already in the correct sorted order for the merge join. The explanation for this apparently redundant sort is that it’s a convenient way of getting the data out of the buffer cache and into a private work area; it’s not really being done to re-arrange the data into the right order.

This plan shows another variation in the way that blocking operations can appear. The way we access the first table is not a blocking operation; it’s only as we access the second table that a blocking operation (the sort) appears. The pattern of execution in this plan is:

  1. Start the index full scan for t1, find the first eligible row for the join
  2. Do an index full scan for t2, find all eligible rows for the join, transfer them (through a redundant sort) into the private work area; probe the work area for the first matching row and return subsequent matches for the merge
  3. Fetch the second candidate row from t1 (stepping along the index, visiting the table and discarding as necessary)
  4. Probe the work area for the starting match for the second row etc.
  5. Repeat from (3)

We could move on to nested loops to see more examples of how the shape of the plan affects our interpretation of what Oracle is doing, and how the predicate timing can have a significant effect on how the work is done; however, there are several different execution plans for a simple nested loop, and we could easily spend too much time on the topic, so I’ll refer you to an item I wrote recently on nested loops through the ages and pick up a few details in passing in future articles in this series.

The key points to take away from this article are: timing is important; the operations in an execution plan don’t always describe exactly what’s going on; the predicate section is an important aid to understanding the work that the optimizer is expecting to happen, and the rowsource (run-time) execution statistics can be a very great aid in seeing what’s really happening.

 
