Indexes are expensive to maintain, so we want to get the greatest benefit we can from them and make sure that Oracle uses them whenever it’s appropriate. Sometimes, though, the optimizer seems to ignore indexes that we think would be very effective, and when this happens it’s often because the statistics that Oracle has gathered do not reflect the quality of the index. In this article we review the ways in which we can help the optimizer recognize the quality of the indexes that exist in our system. To keep things simple we’re going to restrict the discussion to B-tree indexes and avoid all reference to bitmap operations, index-organized tables and partitioning.

The Statistics

The key statistics about B-tree indexes are the number of rows (index entries), the number of distinct keys, the blevel (or height, which is equal to blevel + 1), the number of leaf blocks, and the clustering factor. Broadly speaking the first two statistics allow the optimizer to estimate the number of table rows the query is likely to visit, and the last three allow it to estimate the number of block visits it will have to make to access those rows. Both estimates can be subject to huge errors that result in the optimizer picking a very bad execution plan, so we need to know about methods we can use to give the optimizer the maximum amount of help without spending a huge amount of our time constantly fiddling with figures. Even when both estimates are sound we have to realize that “the number of blocks Oracle has to visit” is not the same thing as “the number of visits that Oracle has to make” – Oracle may have to visit the same block many times and fail to recognize that a cached (previously visited) block is likely to be cheaper to visit than a block it has not previously visited.

There are three basic mechanisms that can help us help Oracle – but all three require us to understand the data and put in a little effort to create a wrapper around the default mechanism that Oracle uses to collect index statistics. The three mechanisms are (a) setting the “table_cached_blocks” preference to help Oracle get a better estimate of the clustering_factor, (b) using the packaged procedure dbms_stats.set_index_stats() to bypass or enhance Oracle’s estimates, and (c) defining column group statistics to help the optimizer deal with partial use of indexes.

Table_Cached_Blocks

The clustering_factor is a measure of how randomly you have to jump around the table as you walk through the index in key order. If the optimizer has estimated that it has to collect just 50 rows from a table but that those rows require jumping to 50 separate locations in the table it may choose to do a tablescan, while for another table it may decide it has to collect 200 rows but estimate that it need only jump to 20 separate locations to find them – the (apparent) location of the data has as big an impact on the choice of index as the (estimated) quantity of data.

The problem with the clustering_factor is the way that Oracle calculates it. The code walks the index in key order (or key order per leaf_block when sampling) extracting the block address of each table row in turn and counting the number of times the block address changes during the walk. At one extreme the count could be as small as the number of blocks in the table (if the order of the rows in the table blocks is a good match for the order that index entries appear in the index), at the other extreme the count could be as large as the number of rows in the table (if each table row is in a different block from the row identified by the previous index entry).
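To make the walk concrete, here’s a hypothetical sketch in Python (purely a model of the counting described above, not Oracle’s actual code) that takes the table block number of each row in index key order and counts the block-address changes:

```python
# Hypothetical model of the clustering_factor calculation described above:
# walk the index in key order and count how often the table block address
# changes from one index entry to the next. This is NOT Oracle's code,
# just a sketch of the arithmetic.

def clustering_factor(block_addresses):
    """block_addresses: the table block of each row, in index key order."""
    changes = 0
    previous = None
    for block in block_addresses:
        if block != previous:
            changes += 1
            previous = block
    return changes

# Best case: 100 sequential keys neatly split across 2 table blocks.
clustered = [1] * 50 + [2] * 50
print(clustering_factor(clustered))     # 2 - as small as the block count

# Worst case: two processes inserting alternately into two blocks, so every
# index entry points at a different block from its predecessor.
scattered = [1, 2] * 50
print(clustering_factor(scattered))     # 100 - as large as the row count
```

The two extremes in the comments match the limits described above: the number of blocks holding data at best, the number of rows at worst.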

Think carefully about the way I’ve stated that last requirement. If you have a table with just two blocks and 100 rows in it with a key generated by a sequence number then the clustering_factor could be as small as two (numbers 1 – 50 are in block 1, numbers 51 – 100 are in block 2) or it could be as big as 100 if, for some reason, two processes had been inserting rows into alternate blocks. In the first case Oracle would think an index on the sequential key was fantastic, in the second it would consider it to be terrible.

So why would two processes be inserting “consecutive” rows into different blocks? Because that’s what ASSM (automatic segment space management) is supposed to do to minimize “buffer busy waits” appearing for the table. I created a little demo of the principle, starting with a simple table and sequence:

create table t1 (n1 number, v1 varchar2(140));
create sequence s1 cache 10000;
lock table t1 in exclusive mode;

Then I’ve written a little SQL script that looks like this:

lock table t1 in share mode;
commit;

begin
	for i in 1..1000 loop
		insert into t1 values (s1.nextval, rpad(i,140));
		commit;
	end loop;
end;
/

I’ve then started this script from 4 different sessions. Of course they all end up waiting on a TM lock because the first session has an exclusive lock on the table – but when I commit from the first session they all acquire and release their share lock and start inserting as fast as they can. After the brief moment of frantic activity from the database I created an index on t1(n1), gathered stats on the table, then ran the following query:

select table_name, num_rows, blocks from user_tables
union all
select index_name, num_rows, clustering_factor from user_indexes
;

TABLE_NAME             NUM_ROWS     BLOCKS
-------------------- ---------- ----------
T1                         4000        124
T1_I1                      4000       2509

2 rows selected.

The table holds just 124 data blocks, but the optimizer thinks that it would have to do 2,509 block visits if you told it to collect every row from the table using an index full scan, as we can see (approximately) from the following autotrace output:

SQL> select /*+ index(t1) */ v1 from t1 where n1 >0;

Execution Plan
----------------------------------------------------------
Plan hash value: 1429545322

-------------------------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |  4000 |   566K|  2523   (1)| 00:00:13 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1    |  4000 |   566K|  2523   (1)| 00:00:13 |
|*  2 |   INDEX RANGE SCAN          | T1_I1 |  4000 |       |    10   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("N1">0)

The cost of the query is a little over 2,509 because it includes the cost of walking down the index (user_indexes.blevel) and then along all the leaf blocks (user_indexes.leaf_blocks) and a small extra cost to allow for the CPU time needed.

We know that the data is actually very well clustered – with just a little bit of “local” scattering due to ASSM, but Oracle tends to interpret a little bit of scattering as a huge amount of scatter. Fortunately 11.2.0.4 has some code backported from 12c that allows us to tell Oracle to “remember” a little of the history of its walk through the index as it calculates the clustering_factor. We can set a table-level preference called “table_cached_blocks” to tell Oracle not to increment its block-change counter if the latest table block address matches one from the recent past. Setting the value to 16 (which would be a fairly sensible value to choose as the default) means Oracle will remember the previous 16 table block addresses it had recorded and only increment the counter if the latest block address doesn’t match any of them.

begin
	dbms_stats.set_table_prefs(user,'t1','table_cached_blocks',16);
	dbms_stats.gather_table_stats(user,'t1', cascade=>true);
end;
/

select index_name, num_rows, clustering_factor from user_indexes;

INDEX_NAME             NUM_ROWS CLUSTERING_FACTOR
-------------------- ---------- -----------------
T1_I1                      4000                86

1 row selected.

In this case the clustering_factor is even smaller than the number of blocks in the table – there are actually only 86 blocks that hold rows, the excess 38 previously reported by user_tables.blocks have been formatted (thanks to the side effects of ASSM and concurrency) but not yet used.

As a side note, when I set the table_cached_blocks to just 4 (because I knew there were only going to be 4 concurrent processes) the clustering_factor dropped to 122. Either figure would be suitable for giving Oracle a good enough idea of the quality of this index to give us the best possible chance that it would use the index when it was appropriate.

The maximum value allowed for table_cached_blocks is 255, the default is 1, and if you set it to zero Oracle will choose an “automatic” value – in my example it seemed to pick the value 2 which led to a clustering_factor of 244. My view is that the value should probably be the maximum of (16, expected degree of concurrent inserts, 16 * number of RAC nodes).

It’s worth mentioning that setting the value to something other than one does increase the CPU time spent calculating the clustering_factor – Oracle has to walk the array of previous block addresses all the time, and keep rolling the latest value into the array rather than just copying and checking against “the previous value”.
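The rolling history described in the last few paragraphs can be modelled with one small change to the counting logic – again a hypothetical Python sketch of the principle, not Oracle’s implementation:

```python
# Hypothetical model of the table_cached_blocks idea described above: keep a
# small rolling history of recent table block addresses and only count a
# "change" when the latest address is not in that history. Not Oracle's
# actual code - just a sketch of the principle.
from collections import deque

def clustering_factor(block_addresses, table_cached_blocks=1):
    recent = deque(maxlen=table_cached_blocks)   # rolling history
    changes = 0
    for block in block_addresses:
        if block not in recent:
            changes += 1
        recent.append(block)                     # roll the latest value in
    return changes

# Four sessions round-robin inserting across four blocks, 1,000 rows each:
interleaved = [1, 2, 3, 4] * 1000
print(clustering_factor(interleaved))                         # history of 1: 4000
print(clustering_factor(interleaved, table_cached_blocks=4))  # history of 4: 4
```

With a history of one block the data looks as scattered as it possibly could be; remembering just the last four block addresses is enough for the model to see that the rows are really packed into four blocks.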

Set_Index_Stats

One of the drawbacks to the table_cached_blocks preference is that it is set at the table level, so it is applied to all the indexes on the table: you may find that there are indexes where you want to set a different value – or even create the effect of a value larger than the limit of 255. In a similar vein there are likely to be cases where you want to adjust one of the other statistics of an index individually – or even create some stats rather than let Oracle gather them. One of the problems of gathering stats is that 11g does a wonderful job of gathering basic table stats, using the auto_sample_size and approximate NDV mechanism to examine 100% of the data very quickly, but for large indexes it then aims to sample about 1,140 leaf blocks to estimate the number of distinct keys (distinct_keys), the leaf_blocks and the clustering_factor; this leaves scope for some very bad estimates.

If you run into a problem of very bad index stats you can always put some procedures in place to use a larger sample size on the indexes – but there are a couple of “count(distinct)” aggregations in the SQL that does the job and they can be very resource-intensive for large sample sizes (CPU, memory, and temporary tablespace). So you might simply decide to overwrite whatever Oracle has gathered with some statistics you think are more suitable. Oracle supplies the procedure dbms_stats.set_index_stats() to do this; so if Oracle has already gathered some index stats you can then run some code like the following:

declare
    m_numrows       number;
    m_numlblks      number;
    m_numdist       number;
    m_avglblk       number;
    m_avgdblk       number;
    m_clstfct       number;
    m_indlevel      number;
begin
    dbms_stats.get_index_stats(
        ownname     => 'INDEX_OWNER',
        indname     => 'INDEX_NAME',
        numrows     => m_numrows,
        numlblks    => m_numlblks,
        numdist     => m_numdist,
        avglblk     => m_avglblk,
        avgdblk     => m_avgdblk,
        clstfct     => m_clstfct,
        indlevel    => m_indlevel
    );
--
--  Something here to change a few variables
--  e.g. m_numrows := 1000000;
--       m_numdist := 20000;
--
    dbms_stats.set_index_stats(
        ownname     => 'INDEX_OWNER',
        indname     => 'INDEX_NAME',
        numrows     => m_numrows,
        numlblks    => m_numlblks,
        numdist     => m_numdist,
        avglblk     => m_avglblk,
        avgdblk     => m_avgdblk,
        clstfct     => m_clstfct,
        indlevel    => m_indlevel
    );
end;
/

For self-consistency reasons don’t forget to set avglblk to ceiling(numlblks/numdist), and avgdblk to ceiling(clstfct/numdist).
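As a worked example of that self-consistency advice (the input figures are made up for illustration, not taken from any real index):

```python
# Worked example of the derived statistics suggested above, using
# hypothetical values for an imaginary one-million-row index.
import math

m_numrows  = 1_000_000   # index entries (assumed)
m_numdist  = 20_000      # distinct keys (assumed)
m_numlblks = 4_000       # leaf blocks (assumed)
m_clstfct  = 250_000     # clustering factor (assumed)

m_avglblk = math.ceil(m_numlblks / m_numdist)   # avg leaf blocks per key
m_avgdblk = math.ceil(m_clstfct / m_numdist)    # avg data blocks per key

print(m_avglblk, m_avgdblk)   # 1 13
```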

Column Groups

The final problem we often get with indexes (especially in projects to rationalize the choice of indexes and minimize their number) is poor cardinality estimates – both from partial use of multi-column indexes and from the side effects that index definitions have on join estimates.

The optimizer knows (approximately) how many distinct keys there are in an index but if you supply a predicate referencing only the first few columns of an index the optimizer tends to fall back on the column stats and fudge factors. Similarly if you join two tables using predicates that are an exact match for indexes on the tables the optimizer can use the distinct_keys from the index to help it estimate join cardinalities – even if it doesn’t use the index. Drop an index, or add a column to the index, and the cardinality estimates may change, resulting in a change of plan.

Here’s a simplistic model to give you the flavour of the problem:

create table t1 as
select
        mod(rownum,100) id1,
        mod(rownum,100) id2,
        mod(rownum,17)  n1,
        rpad('x',100)   padding
from
        all_objects
where
        rownum <= 10000
;

create index t1_i1 on t1(id1,id2);

execute dbms_stats.gather_table_stats(user,'t1');

set autotrace traceonly explain
select padding from t1 where id1 = 0 and id2 = 0;

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   100 | 10700 |    23   (5)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |   100 | 10700 |    23   (5)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("ID1"=0 AND "ID2"=0)

From the index statistics the optimizer could see it was due to collect 100 very scattered rows from a small table, and sensibly chose a tablescan. Now imagine that we decide to create an index t1_i2 which extends the original index to (id1, id2, n1); naturally we drop the index t1_i1; we might have done this to make a critical query faster, we might have done it as part of a project to minimize the set of indexes on this table. Here’s the new execution plan:

-------------------------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |     6 |   642 |     8   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1    |     6 |   642 |     8   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T1_I2 |     6 |       |     2   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("ID1"=0 AND "ID2"=0)

The optimizer has lost the information about the number of distinct combinations of id1 and id2 so (initially) has decided that the number of combinations is 100 * 100 = 10,000. But this is clearly wrong because it also knows that the total number of distinct keys in the new index is only 1,700, so it has used that value as the number of distinct combinations, predicted a cardinality of 6, and used the index.
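For anyone who wants to check the arithmetic, here’s a hypothetical sketch of the estimate just described – the naive product of the column NDVs, capped by the distinct keys of the covering index:

```python
# Hypothetical model of the cardinality arithmetic described above - a
# sketch of the reasoning, not the optimizer's actual code.

num_rows       = 10_000
ndv_id1        = 100
ndv_id2        = 100
index_distinct = 1_700    # distinct keys in the new (id1, id2, n1) index

naive_combinations = ndv_id1 * ndv_id2                  # 10,000
capped = min(naive_combinations, index_distinct)        # capped at 1,700

print(round(num_rows / capped))   # 6, matching the plan above

# With the original (id1, id2) index - or a column group on (id1, id2) -
# the known number of distinct combinations is 100, restoring the estimate:
print(round(num_rows / 100))      # 100
```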

The decision to use the index could be a bad idea, and the enormous underestimate could be even worse. So as we drop the index we create a column group representing the index that used to exist:

begin
        dbms_stats.gather_table_stats(
                ownname    => user, 
                tabname    => 't1',
                method_opt => 'for columns (id1, id2) size 1'
        );
end;
/

With this column group in place the optimizer gets back to the original cardinality estimate and full tablescan.

Any time you drop a multi-column index it might be a good idea to create column group stats on the columns that were used in the index definitions. Any time you “add a column” to an existing multi-column index, you probably ought to create column group stats for the previous definition. And any time you create any multi-column index you should consider creating a column group for any “leading edge” subsets of its columns if you know of important queries that might make partial use of the index.

Unfortunately there’s a basic limit of 20 column groups allowed per table (technically the higher of 20 and the total number of columns divided by 10, but not many tables go over 200 columns), so you will probably get to the point where you need to make critical choices. If you do, bear in mind that (e.g.) two column groups of two columns each might be “good enough” to represent a single column group of four columns – you might be able to avoid creating some of the column groups you first think of.

Conclusion

Oracle gives us three significant mechanisms for manipulating statistics in a way that helps the optimizer pick a good execution plan from a minimal set of indexes. Although it can be an irritation to have to implement some of these mechanisms – especially finding a moment at which to inject calls to dbms_stats.set_index_stats() – the combination of good statistics with a minimal set of indexes is an important step to a stable and efficient system.

Footnote:

Eliminating “redundant” indexes is a really good idea in 12c where the time spent evaluating adaptive execution plans and acquiring dynamic statistics can escalate dramatically if the optimizer is given a large choice of similar indexes.
