Oracle’s optimizer has many strategies for acquiring the data you want with the minimum of work. Some of these strategies are really quite subtle and sophisticated – think of the Star Transformation, for example, which uses a two-phase approach to dimension tables; or the Index Join which does hash joins between indexes to avoid visiting tables.

Despite all the clever things that it can do the optimizer is still only a program, and human ingenuity can still find useful execution paths that are not currently coded into the optimizer. This article describes one such generic path.

The requirement.

I’m going to take a highly simplified order-processing query to demonstrate an important principle. Imagine I work for the HMV store and want to report all orders placed for classical CDs – which account for about 1 in every 1,000 of the items ordered. The query might look something like this:

select
	ord.*
from
	orders		ord,
	products		prd
where
	ord.date_placed   > sysdate - 1
and	prd.id            = ord.id_product
and	prd.product_group = 'CLASSICAL CD'
;

Conveniently there is an index on the orders table that starts with the date_placed column so we have an method of accessing recent orders fairly efficiently, and we get the following execution plan (you may recognize from the “doubled” nested loop operation that I’m using 11g for this example):

-----------------------------------------------------
| Id  | Operation                     | Name        |
-----------------------------------------------------
|   0 | SELECT STATEMENT              |             |
|   1 |  NESTED LOOPS                 |             |
|   2 |   NESTED LOOPS                |             |
|   3 |    TABLE ACCESS BY INDEX ROWID| ORDERS      |
|*  4 |     INDEX RANGE SCAN          | ORD_DAT_PRD |
|*  5 |    INDEX UNIQUE SCAN          | PRD_PK      |
|*  6 |   TABLE ACCESS BY INDEX ROWID | PRODUCTS    |
-----------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("ORD"."DATE_PLACED">SYSDATE@!-1)
   5 - access("ORD"."ID_PRODUCT"="PRD"."ID")
   6 - filter("PRD"."PRODUCT_GROUP"='CLASSICAL CD')

This plan seems to be as efficient as anything we could do; but there is a bit of a problem here – we’re after a fairly small fraction of the orders but we have to visit the orders table to read every single order for the last 24 hours before joining to the products table to identify the classical CDs.

Now, as you might guess from the name of the index I’ve used to get to the orders table (ord_dat_prd), the product ID which we use to join to the products table is already part of the index so, in principle, I could get to the products table without first visiting the orders table – but my select list includes all the columns from the orders table, which is why the optimizer will force me to visit the orders table before going to the products table. This query is actually one I met in a product system very recently – the number of orders per day was in the millions, the number of orders for the “interesting” product group was typically a couple of thousand. Oracle was picking a sub-optimal path (not catastrophically sub-optimal, but there was a fairly large amount of redundant work being done).

First Enhancement

I can make the query more efficient by making Oracle visit the orders table twice; or rather, I can write some SQL that looks as if it will visit the table twice but take advantage of the fact that an index may hold all the information you need to allow you to avoid visiting the table. Here’s my query:

select
	ord2.*
from	(
	select
		ord.rowid
	from
		orders		ord,
		products	prd
	where
		ord.date_placed  > sysdate - 1
	and	prd.id            = ord.id_product
	and	prd.product_group = 'CLASSICAL CD'
	) 		ordv,
	orders	ord2
where
	ord2.rowid = ordv.rowid
;

In an inline view I’ve requested the rowids of all the orders for the date range where the product group is classical CDs. Since all the columns I want (including the rowid) can be found in the ord_dat_prd index Oracle can produce this result set with a query that doesn’t visit the orders table – avoiding (for my client) a few million buffer gets. But once the inline view has produced exactly the rowids I need, I can then join back to the orders table to collect the small number of rows I really want. Here’s the resulting execution plan:

-----------------------------------------------------
| Id  | Operation                     | Name        |
-----------------------------------------------------
|   0 | SELECT STATEMENT              |             |
|   1 |  NESTED LOOPS                 |             |
|   2 |   NESTED LOOPS                |             |
|*  3 |    INDEX RANGE SCAN           | ORD_DAT_PRD |
|*  4 |    TABLE ACCESS BY INDEX ROWID| PRODUCTS    |
|*  5 |     INDEX UNIQUE SCAN         | PRD_PK      |
|   6 |   TABLE ACCESS BY USER ROWID  | ORDERS      |
-----------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("ORD"."DATE_PLACED">SYSDATE@!-1)
   4 - filter("PRD"."PRODUCT_GROUP"='CLASSICAL CD')
   5 - access("ORD"."ID_PRODUCT"="PRD"."ID")

At line 3 I do an index range scan of the ord_dat_prd index getting the product ids and rowids for all the orders in the relevant time period. The nested loop at line 2 is how I discard rowids for the products I don’t want – by joining on product id to the products table and returning only classical CDs. Finally I access the orders table by “user” rowid (rather than “index” rowid).

Of course, I have made my query more complex, and you should only take this type of approach if you decide that the performance benefit merits the increased complexity (hence future maintenance cost) of the query. In this case, for example, you might decide against making the change since all you are saving is some buffer gets and CPU (it’s likely, after all, that orders place in the last 24 hours will stay in the buffer cache for the next two or three days). In another case you may be able to determine that by postponing visits to the table, and then visiting only the blocks you really need, you are eliminating buffer gets that would have turned into a large number of disk reads.

Second Enhancement

There is a common mantra that if a table doesn’t appear in the select list it shouldn’t appear in the from clause. There’s a lot to be said for this idea – but be careful, it can lead to SQL that is a little hard to comprehend, performs badly, and gives the wrong result. Nevertheless we might want to look at the strategy in this case if the efficiency of the query is really important.

Although our first rewrite no longer visits the orders table unless we really want the row, we still have to visit the products table for every order in the date range. Of course, as a trivial efficiency aid we might make the products table an IOT (index organized table), or add the product_group to the index we use to support the primary key – but we still make the visit once for every relevant order. Can we reduce that workload ? The answer is “maybe”. Here’s another version of the query – which is logically  equivalent to the join version of the query only because of the uniqueness of products.id:

select
	ord2.*
from	(
	select
		ord.rowid
	from
		orders		ord
	where
		ord.date_placed  > sysdate - 1
	and	exists (
			select
				/*+ no_unnest */
				null
			from
				products	prd
			where
				prd.id            = ord.id_product
			and	prd.product_group = 'CLASSICAL CD'
		)
	) 		ordv,
	orders	ord2
where
	ord2.rowid = ordv.rowid
;

What I’ve done here is change the inline view to execute an existence subquery after picking an entry from the ord_dat_prd index. I’ve explicitly stated that I do not want Oracle to unnest this subquery because I want it to operate as a “filter” subquery with the following execution plan:

-----------------------------------------------------
| Id  | Operation                     | Name        |
-----------------------------------------------------
|   0 | SELECT STATEMENT              |             |
|   1 |  NESTED LOOPS                 |             |
|*  2 |   INDEX RANGE SCAN            | ORD_DAT_PRD |
|*  3 |    TABLE ACCESS BY INDEX ROWID| PRODUCTS    |
|*  4 |     INDEX UNIQUE SCAN         | PRD_PK      |
|   5 |   TABLE ACCESS BY USER ROWID  | ORDERS      |
-----------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("ORD"."DATE_PLACED">SYSDATE@!-1)
       filter(EXISTS (SELECT /*+ NO_UNNEST */ 0 FROM "PRODUCTS" "PRD"
          WHERE "PRD"."ID"=:B1 AND "PRD"."PRODUCT_GROUP"='CLASSICAL CD'))
   3 - filter("PRD"."PRODUCT_GROUP"='CLASSICAL CD')
   4 - access("PRD"."ID"=:B1)

In this plan we see the same index range scan at line 2 – but the filter predicate on line 2 is a call to the subquery. If we are lucky (and if we don’t have many different products – which is unlikely in this case) we may find that our subquery benefits from “scalar subquery caching”, which means that under perfect conditions we will run the subquery just once per product sold in the date range, rather than once per row.

This version of the query is even harder to understand than the previous version, so it’s very important to think carefully before you do a rewrite of this kind – especially when there are more than two tables involved. Nevertheless once you adopt the idea that an index can be used as a “skinny table”, and then move on to the idea that you can postpone visiting a table until you have to, it becomes easier to find more efficient ways of avoiding redundant visits to data.

Footnote

In 11g Oracle Corp. introduced an interesting “doubled” nested loop join; all I’ve done in my first strategy is to separate the two steps of the loop to make the table access explicitly dependent on a “user rowid” rather than an “index rowid”. Before rushing off to rewrite lots of your SQL in this form, you might like to wait for Oracle 12 – you never know, the process I’ve just described might just be the next step in the evolution of the nested loop.

Acknowledgements

This article was first published (with some minor variations) in Select, the magazine of the IOUG, and is republished here with their permission.