In the first part of this post I provided a table that summarizes the Dynamic Sampling levels on session / system level. In the “Blocks Sampled” column there were numbers in brackets for the level 3 and 4 descriptions. In this second part I’ll explain why.
In the presence of statistics Dynamic Sampling behaves differently: for levels 3 and 4 the number of blocks sampled will be halved. But what is more significant, and also explains the odd result at the end of the “Introduction” post: in the presence of statistics Dynamic Sampling by default only evaluates the selectivity of predicates, but does not provide the basic table size and cardinality – these values will be trusted and taken from the existing statistics.
So if for whatever reason the underlying table statistics exist but are outdated or plain wrong, the resulting estimates based on Dynamic Sampling will be wrong, too, even if the selectivity estimate is improved, because the final cardinality is based on filter selectivity * base cardinality.
If you want to override this default behaviour you can do so by using a hint called DYNAMIC_SAMPLING_EST_CDN that is only half-way documented: it was mentioned in the 9.2 manuals but has since disappeared from the documentation. It is a table level hint, which means that it requires a valid alias as parameter and needs to be specified multiple times if it is supposed to be applied to more than a single table.
By doing so Dynamic Sampling will be used to override the table blocks and base cardinality information that otherwise would be taken from the existing statistics.
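Based on the description above, a minimal invocation could look like this (a sketch, not taken verbatim from the post, assuming a table aliased T and a cursor level DYNAMIC_SAMPLING hint to trigger the sampling):

```sql
-- Sample table T, and additionally trust the sampled table size
-- and base cardinality over the existing statistics
select /*+ dynamic_sampling(4) dynamic_sampling_est_cdn(t) */
       count(*)
from   t
where  attr1 = 1;
```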
This is the modified test case that allows reproducing the odd result from the previous part of the series:
set echo on timing on time on

drop table t;

purge table t;

-- Create a sample table
-- with only ten rows
create table t
as
select
        rownum as id
      , mod(rownum, 10) + 1 as attr1
      , rpad('x', 100) as filler
from
        dual
connect by
        level <= 10
;

-- Deliberately gather statistics now
exec dbms_stats.gather_table_stats(null, 't')

-- Add another bunch of rows
-- but don't update statistics
insert /*+ append */ into t (id, attr1, filler)
select
        rownum as id
      , mod(rownum, 10) + 1 as attr1
      , rpad('x', 100) as filler
from
        dual
connect by
        level <= 1000000 - 10
;

commit;

-- Use level 10 dynamic sampling to activate it in the presence of statistics
-- and to deliberately sample the full data set
alter session set optimizer_dynamic_sampling = 10;

-- alter session set "_optimizer_dyn_smp_blks" = 16;

-- Set a tracefile identifier for easier identification
alter session set tracefile_identifier = 'dyn_sample2';

-- Enable the optimizer trace file in a 10g compatible way
alter session set events '10053 trace name context forever, level 1';

-- This is the 11g way
-- alter session set events 'trace[rdbms.SQL_Optimizer.*]';

-- Enable SQL*Plus AUTOTRACE to get an EXPLAIN PLAN
-- in addition to the actual execution
-- set autotrace on

-- Although we sampled all data
-- and the results passed the checks
-- the cardinality estimate is still way off
select /* dynamic_sampling_est_cdn(t) */
        count(*) as cnt
from
        t
where
        attr1 = 1
and     id > 0
;

-- set autotrace off

-- Disable the optimizer trace
alter session set events '10053 trace name context off';

-- The 11g way of disabling it
-- alter session set events 'trace[rdbms.SQL_Optimizer.*] off';

select * from table(dbms_xplan.display_cursor(null, null));

-- This tells you where the trace file is
column tracefile format a200

select
        value
     || case when instr(dbms_utility.port_string, 'WIN_NT') > 0 then '\' else '/' end
     || (select instance_name from v$instance)
     || '_ora_'
     || (
          select
                  spid || case when traceid is not null then '_' || traceid else null end
          from
                  v$process
          where
                  addr = (
                           select paddr from v$session
                           where  sid = (select sid from v$mystat where rownum = 1)
                         )
        )
     || '.trc' as tracefile
from
        v$parameter
where
        name = 'user_dump_dest'
;
So this test case deliberately generates non-representative statistics. As you can see, by default the optimizer uses the cardinality and size information from the statistics, therefore even processing the full segment with Dynamic Sampling doesn’t produce a reasonable final cardinality estimate.
And the explanations from part one of this post also provide information on why the original query had to be modified for this test case. With statistics in place Dynamic Sampling will only be triggered if the query fulfils at least one of the criteria that correspond to level 3 or level 4, so it needs for example two predicates on the same table (level 4 definition).
If you change the commented DYNAMIC_SAMPLING_EST_CDN hint into a real hint you’ll notice that the cardinality estimates will be spot on, even with the wrong statistics in place, because the optimizer will use the Dynamic Sampling results instead of the existing statistics for the table size and base cardinality estimates.
Table Level DYNAMIC_SAMPLING
Let’s have a look at the table level Dynamic Sampling hint. This hint requires two parameters: in contrast to the DYNAMIC_SAMPLING cursor level hint, which takes just the level, the alias of the table to be sampled needs to be passed as first parameter, followed by the level.
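A hypothetical example, assuming a table aliased T:

```sql
-- Table level form of the hint: alias first, then the level
select /*+ dynamic_sampling(t 4) */
       count(*)
from   t
where  attr1 = 1;
```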
Now the funny thing is that this form of Dynamic Sampling behaves quite differently from the cursor level hint.
First of all it basically tells Oracle to perform Dynamic Sampling unconditionally on the specified table. Second, the levels now have a different meaning: they simply determine the number of blocks to be sampled. There is a simple formula that determines the number of blocks sampled:
2^(level - 1)*_optimizer_dyn_smp_blks (default: 32)
So for level 1 we end up with:
2^0*32 = 1*32 = 32
Level 4 for example means
2^3*32 = 8 * 32 = 256
and so on, up to level 9:
2^8*32 = 256 * 32 = 8192
Level 10 then again means 4 billion blocks (2^32) to sample. Notice how the numbers of blocks sampled differ from those used for the statement/cursor levels described in part one of this post.
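The formula above can be cross-checked with a simple query (a sketch assuming the default _optimizer_dyn_smp_blks value of 32):

```sql
-- Number of blocks sampled per table level hint level, levels 1 to 9
select level                     as sampling_level
     , power(2, level - 1) * 32  as blocks_sampled
from   dual
connect by level <= 9;
```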
The interesting point about this form of Dynamic Sampling is that although I’ve said that the sampling is performed unconditionally, the sanity checks are still in place. This means that the sampling will always be performed, but the results might still be rejected.
The simplest way to see this happening is to run a query without any predicates against a table that has statistics gathered, but request Dynamic Sampling on table level via the hint.
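Such a query could look like this (a sketch reusing the test table T from above, assuming statistics are in place):

```sql
-- No predicate on T, but table level Dynamic Sampling requested
select /*+ dynamic_sampling(t 4) */
       count(*)
from   t;
```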
The resulting execution plan won’t mention that Dynamic Sampling has been used, but when looking into the optimizer trace file it becomes obvious that Dynamic Sampling was performed as requested by the table level hint, but the results have not been used:
** Performing dynamic sampling initial checks. **
** Dynamic sampling initial checks returning TRUE (level = 2).
*** 2012-02-06 18:43:16.405
** Generated dynamic sampling query:
    query text :
SELECT /* OPT_DYN_SAMP */ /*+ ALL_ROWS IGNORE_WHERE_CLAUSE NO_PARALLEL(SAMPLESUB) opt_param('parallel_execution_enabled', 'false') NO_PARALLEL_INDEX(SAMPLESUB) NO_SQL_TUNE */ NVL(SUM(C1),0), NVL(SUM(C2),0) FROM (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ 1 AS C1, 1 AS C2 FROM "T" SAMPLE BLOCK (0.396925 , 1) SEED (1) "T") SAMPLESUB
*** 2012-02-06 18:43:16.452
** Executed dynamic sampling query:
    level : 2
    sample pct. : 0.396925
    actual sample size : 4536
    filtered sample card. : 4536
    orig. card. : 10
    block cnt. table stat. : 1
    block cnt. for sampling: 15872
    max. sample block cnt. : 64
    sample block cnt. : 63
    min. sel. est. : -1.00000000
** Not using dynamic sampling for single table sel. or cardinality.
  DS Failed for : ----- Current SQL Statement for this session (sql_id=0xtga7j29q9x7) -----
The results weren’t used because statistics exist and no predicate is applied for which the selectivity could have been determined using Dynamic Sampling. Since by default only the selectivity is taken from Dynamic Sampling, the result of the sampling will be ignored. Now one might argue that performing Dynamic Sampling in such a case is pointless in the first place, but that is the way the code currently works.
If a predicate is applied then Dynamic Sampling results will be used even in the presence of statistics, because now the selectivity determined can be used.
Again this default behaviour can be modified by using the DYNAMIC_SAMPLING_EST_CDN hint. When specified the Dynamic Sampling results will always be used because the table size and base cardinality will be based on the Dynamic Sampling results no matter whether statistics exist or not.
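Again as a sketch, combining both table level hints on T, this time without any predicate, to demonstrate that the sampling results are used nevertheless:

```sql
-- Sample T unconditionally and use the sampled table size and
-- base cardinality even though statistics exist and no predicate applies
select /*+ dynamic_sampling(t 4) dynamic_sampling_est_cdn(t) */
       count(*)
from   t;
```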
11.2 And Parallel Execution
Since release 11gR2 (11.2) there is a new functionality: In case the default system/session level Dynamic Sampling level 2 is in use and a statement makes use of the Enterprise Edition Parallel Execution feature the Dynamic Sampling level might be auto-adjusted to higher levels between 4 and 8 (session / cursor level).
The rationale behind this new behaviour is the idea that Parallel Execution is expected to take some significant time to execute anyway, hence the overhead of Dynamic Sampling at optimization time is hopefully negligible. The potentially increased precision of the cardinality estimates is assumed to outweigh the additional activity performed at optimization time.
The auto-adjustment with Parallel Execution at present seems to be dependent on the segment size in blocks, starting at a size of 64,000 blocks (adjusts to level 4) and increasing up to 65,535,000 blocks (level 8). If multiple segments are Dynamic Sampling candidates, the largest segment dictates the auto-adjustment level.
If any non-default system/session optimizer level setting is used, or an explicit DYNAMIC_SAMPLING cursor/table level hint gets used, the auto-adjustment doesn’t take place and the specified level is obeyed.
Note: The recursive Dynamic Sampling query is always performed using serial execution, no matter what parallel degree is assigned to an object. This probably makes sense since it is assumed that only a rather limited number of blocks will be sampled, currently up to 8,192, except of course for the level 10 setting.
There are various bugs and edge cases that relate to Dynamic Sampling depending on the version and patch set used.
Here are two important points to consider:
- If user defined statistics (for example set using DBMS_STATS.SET_TABLE_STATS or SET_COLUMN_STATS) have been defined, which can be seen from the USER_STATS column available in various dictionary views like USER/ALL/DBA_TAB_STATISTICS, USER/ALL/DBA_TAB_COL_STATISTICS etc., then Dynamic Sampling is not supposed to override them, which kind of makes sense because the user defined statistics have hopefully been set with a good reason in mind. Table level user defined statistics in fact disable Dynamic Sampling completely for that table, no matter how Dynamic Sampling is triggered. The same applies to SQL Profiles by the way, because these are supposed to provide additional information about selectivities and cardinalities to the optimizer and therefore disable Dynamic Sampling, too. Note that SQL Profiles require an Enterprise Edition plus Diagnostics and Tuning Pack license
Note that there are bugs where Oracle doesn’t adhere to this rule: for example, when using user defined column statistics, in some versions Oracle still might override them with the Dynamic Sampling results, which is in particular relevant for joins as we will see in later parts of this series.
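Whether user defined statistics are in place can be checked like this (a sketch for the current schema and the test table T):

```sql
-- YES in the USER_STATS column flags user defined (set) statistics
select table_name
     , user_stats
from   user_tab_statistics
where  table_name = 'T';
```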
- If statistics have been gathered on a table and these statistics tell Oracle that the table has 0 rows, this effectively disables default Dynamic Sampling. If you think about what I’ve described so far then this makes sense and is the consequence of the behaviour witnessed: since by default Dynamic Sampling doesn’t override the table size and base cardinality, there is no point in determining any selectivity because the resulting cardinality will always be 0. By specifying the DYNAMIC_SAMPLING_EST_CDN hint the Dynamic Sampling will become effective because now the information from the statistics will be overridden.
Still I believe the side effects of this behaviour could be dire: One can easily end up with a volatile table that is empty when the default statistics job (from 10g on) runs during the maintenance windows and therefore gets a 0 rows statistics update, but is supposed to benefit from Dynamic Sampling at optimization time. This is the way the feature works at present, and as I said the behaviour is in line with what has been observed so far. One has to be aware of such a table and act accordingly, either by taking care of good statistics when necessary (if possible and applicable) or removing the table statistics and locking them to prevent them from being modified accidentally.
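The last option mentioned could look like this as a sketch, using the documented DBMS_STATS calls against the hypothetical volatile table T in the current schema:

```sql
-- Remove the misleading 0-rows statistics ...
exec dbms_stats.delete_table_stats(null, 't')

-- ... and lock the (now absent) statistics so the default
-- maintenance job doesn't gather them again accidentally
exec dbms_stats.lock_table_stats(null, 't')
```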
So in summary the following information is important to understand how to control Dynamic Sampling activity:
- There are two different forms of Dynamic Sampling invocations. One is defined on instance/session level (OPTIMIZER_DYNAMIC_SAMPLING) or cursor level (DYNAMIC_SAMPLING hint only specifying the level), the other one on table level (DYNAMIC_SAMPLING hint specifying both alias and level)
- The levels have different meanings for the two flavours. For the session / cursor level different rules identify when Dynamic Sampling will be triggered. For the table level Dynamic Sampling the level only determines the number of blocks to sample, sampling will always be performed (but the results not necessarily used) except for edge cases just outlined. The number of blocks to sample is different between the two forms – the same level can mean different number of blocks depending on which flavour of Dynamic Sampling gets used
- By default Dynamic Sampling is only used to determine the selectivity of predicates in the presence of statistics. Only if statistics are missing will the table size and the base cardinality be determined, too. This applies to both flavours
- The DYNAMIC_SAMPLING_EST_CDN table level hint can be used to change this default and have Dynamic Sampling override existing statistics
In the next installment of this series I’ll have a look at some real-world scenarios and how good (or bad) Dynamic Sampling copes with them…