If you follow my blog you might wonder whether I would let you get away without an actual test case, and of course I won’t. Here is a simple test case that demonstrates Dynamic Sampling in action:

set echo on timing on time on
drop table t;
purge table t;
-- Create a sample table
create table t
as
select
        rownum as id
        , mod(rownum, 10) + 1 as attr1
        , rpad('x', 100) as filler
from
          dual
connect by
          level <= 1000000
;

-- Use the default value (from 10g on, in 9i it used to be 1)
alter session set optimizer_dynamic_sampling = 2;
-- Set a tracefile identifier for easier identification
alter session set tracefile_identifier = 'dyn_sample1';

-- Enable the optimizer trace file in a 10g compatible way
alter session set events '10053 trace name context forever, level 1';
-- This is the 11g way
-- alter session set events 'trace[rdbms.SQL_Optimizer.*]';

-- Enable SQL*Plus AUTOTRACE to get an EXPLAIN PLAN
-- in addition to the actual execution
set autotrace on

select
          count(*) as cnt
from
          t
where
          attr1 = 1
;

set autotrace off

-- Disable the optimizer trace
alter session set events '10053 trace name context off';
-- The 11g way of disabling it

-- alter session set events 'trace[rdbms.SQL_Optimizer.*] off';

-- This tells you where the trace file is
-- Note that from 11g on this query can be simplified to
-- select tracefile from v$process where addr = (select paddr from
-- v$session where sid = userenv('sid'))
column tracefile format a200

select
        value ||
        case
        when instr(dbms_utility.port_string, 'WIN_NT') > 0
        then '\'
        else '/'
        end ||
        (
          select instance_name from v$instance
        ) ||
        '_ora_' ||
        (
          select
                  spid ||
                  case
                  when traceid is not null
                  then '_' || traceid
                  else null
                  end
          from
                  v$process
          where
                  addr = (
                           select
                                   paddr
                           from
                                   v$session
                           where
                                   sid = (
                                           select
                                                   sid
                                           from
                                                   v$mystat
                                           where
                                                   rownum = 1
                                         )
                         )
        ) ||
        '.trc' as tracefile
from
        v$parameter
where
        name = 'user_dump_dest'
;

Note: Although for the sake of simplicity this test case uses the SQL*Plus AUTOTRACE feature to display the execution plan along with the actual execution of a SQL statement, it should be mentioned that AUTOTRACE has its quirks and is not always reliable, in particular when dealing with bind variables. Although it seems to be a handy feature for a quick comparison of SQL statement performance, I strongly recommend using DBMS_XPLAN.DISPLAY_CURSOR (or Real-Time SQL Monitoring if the corresponding license is available) instead for such purposes, as it offers a completely different level of insight into SQL statement performance.
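For reference, here is a minimal sketch of the DBMS_XPLAN.DISPLAY_CURSOR approach against the same test table. The GATHER_PLAN_STATISTICS hint is only needed to populate the actual row source statistics for the cursor:

```sql
-- Run the statement itself first (no AUTOTRACE involved); the hint
-- collects actual row source statistics for this execution
select /*+ gather_plan_statistics */
        count(*) as cnt
from
        t
where
        attr1 = 1
;

-- Fetch the actual execution plan of the last cursor executed in this
-- session, including estimated (E-Rows) vs. actual (A-Rows) cardinalities
select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));
```

Comparing the E-Rows and A-Rows columns of this output is exactly the kind of cardinality sanity check this series is about.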

Output snippet:

19:18:01 SQL> -- Enable SQL*Plus AUTOTRACE to get an EXPLAIN PLAN
19:18:01 SQL> -- in addition to the actual execution
19:18:01 SQL> set autotrace on
19:18:01 SQL>
19:18:01 SQL> select
19:18:01   2         count(*) as cnt
19:18:01   3  from
19:18:01   4         t
19:18:01   5  where
19:18:01   6         attr1 = 1
19:18:01   7  ;
       CNT
----------
    100000
Elapsed: 00:00:01.88
Execution Plan
----------------------------------------------------------
Plan hash value: 2966233522
-------------------------------------------------------------------------
| Id | Operation          | Name | Rows | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------
|  0 | SELECT STATEMENT   |      |    1 |    13 |   4324  (1)| 00:00:52 |
|  1 |  SORT AGGREGATE    |      |    1 |    13 |            |          |
|* 2 |   TABLE ACCESS FULL| T    |  108K|  1372K|   4324  (1)| 00:00:52 |
-------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("ATTR1"=1)
Note
-----
 - dynamic sampling used for this statement

Looking at the relevant part of the output generated (here taken from a 10.2.0.4 version) it becomes obvious that:

  • We have a confirmation telling us that Dynamic Sampling results have been used for optimization
  • The dynamic sampling result is reasonable: approximately 108,000 rows estimated vs. 100,000 actual rows, although we didn’t gather any statistics

Note: For your entertainment you can repeat the experiment and set the
Dynamic Sampling level to 0 to see what the result would look like if no
Dynamic Sampling were used. In that case the Cost-Based Optimizer reverts to hard-coded defaults that might be way off from the actual statistics.
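Such a re-run could look like this (a sketch based on the test case above; remember to restore the default afterwards):

```sql
-- Disable Dynamic Sampling for this session
alter session set optimizer_dynamic_sampling = 0;

set autotrace on

-- Without object statistics and without Dynamic Sampling the optimizer
-- now has to fall back to hard-coded defaults
select
        count(*) as cnt
from
        t
where
        attr1 = 1
;

set autotrace off

-- Restore the 10g+ default
alter session set optimizer_dynamic_sampling = 2;
```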

It’s important to understand that one of the most critical components of the cost calculation is the cardinality of the different row sources and joins. In general you can assume: if the cardinality estimates are in the right ballpark, the Cost-Based Optimizer will most of the time come up with a reasonable execution plan. Or, putting it the other way around: it’s very likely that the resulting execution plan will be suboptimal if the cardinality estimates are not in the right ballpark.

Checking the resulting trace file and searching for “/* OPT_DYN_SAMP */” we can identify the following snippet that shows the Dynamic Sampling details:

  ----------------------------------------- 
  BEGIN Single Table Cardinality Estimation 
  ----------------------------------------- 
*** 2012-01-24 19:18:01.297 
** Performing dynamic sampling initial checks. ** 
  Column (#2): ATTR1(NUMBER) NO STATISTICS (using defaults) 
    AvgLen: 13.00 NDV: 40511 Nulls: 0 Density: 2.4685e-005 
** Dynamic sampling initial checks returning TRUE (level = 2). 
** Dynamic sampling updated table stats.: blocks=15871 
*** 2012-01-24 19:18:01.297 
** Generated dynamic sampling query: 
    query text : 
SELECT /* OPT_DYN_SAMP */ /*+ ALL_ROWS IGNORE_WHERE_CLAUSE 
NO_PARALLEL(SAMPLESUB) opt_param('parallel_execution_enabled', 'false') 
NO_PARALLEL_INDEX(SAMPLESUB) NO_SQL_TUNE */ NVL(SUM(C1),0), NVL(SUM(C2),0) 
FROM (SELECT /*+ IGNORE_WHERE_CLAUSE NO_PARALLEL("T") FULL("T") 
NO_PARALLEL_INDEX("T") */ 1 AS C1, CASE WHEN "T"."ATTR1"=1 THEN 1 ELSE 0 
END AS C2 FROM "T" SAMPLE BLOCK (0.396950 , 1) SEED (1) "T") SAMPLESUB 
*** 2012-01-24 19:18:01.453 
** Executed dynamic sampling query: 
    level : 2 
    sample pct. : 0.396950 
    actual sample size : 4284 
    filtered sample card. : 429 
    orig. card. : 1296344
    block cnt. table stat. : 15871
    block cnt. for sampling: 15871
    max. sample block cnt. : 64
    sample block cnt. : 63
    min. sel. est. : 0.01000000
** Using dynamic sampling card. : 1079228
** Dynamic sampling updated table card.
** Using single table dynamic sel. est. : 0.10014006
   Table: T Alias: T
     Card: Original: 1079228 Rounded: 108074 Computed: 108073.95 Non Adjusted: 108073.95

I won’t go into too many details in this part of the series but you can see from these lines:

** Dynamic sampling updated table stats.: blocks=15871
** Using dynamic sampling card. : 1079228
** Using single table dynamic sel. est. : 0.10014006
    Card: Original: 1079228 Rounded: 108074 Computed: 108073.95 Non Adjusted: 108073.95

that Dynamic Sampling has been used to determine the size of the object in blocks (something the optimizer would have derived anyway even without Dynamic Sampling, by the way), the original cardinality of the segment before applying the filter predicate, and the selectivity of the filter predicate, arriving at the final cardinality estimate of 108,074.
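The final estimate can be reproduced from the trace values above: the single table selectivity is the filtered sample cardinality divided by the actual sample size, and applying that selectivity to the dynamic sampling cardinality yields the rounded estimate:

```sql
-- Reproducing the optimizer's arithmetic from the trace snippet:
-- sel.  = filtered sample card. / actual sample size = 429 / 4284
-- card. = dynamic sampling card. * sel.
select
        429 / 4284                  as dyn_sel_est  -- ~0.10014006
      , round(1079228 * 429 / 4284) as final_card   -- 108074
from
        dual
;
```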

Note: This example was explicitly crafted to be Dynamic Sampling “friendly”, because we have a uniform data distribution across the table and the filter predicate applies to many rows (10 percent). We’ll see in one of the upcoming parts of the series what happens if the data distribution is more realistic and potentially less friendly to the optimizer.

I won’t leave you without the following odd result, which can be obtained by slightly modifying the above test case:

 
17:41:02 SQL> -- Enable SQL*Plus AUTOTRACE to get an EXPLAIN PLAN
17:41:02 SQL> -- in addition to the actual execution
17:41:02 SQL> set autotrace on
17:41:02 SQL>
17:41:02 SQL> -- Although we sampled all data
17:41:02 SQL> -- and the results passed the checks
17:41:02 SQL> -- the cardinality estimate is still way off
17:41:02 SQL> select
17:41:02   2             count(*) as cnt
17:41:02   3  from
17:41:02   4             t
17:41:02   5  where
17:41:02   6             attr1 = 1
17:41:02   7  and        id > 0
17:41:02   8  ;

       CNT
----------
    100000

Elapsed: 00:00:02.80

Execution Plan
----------------------------------------------------------
Plan hash value: 2966233522

---------------------------------------------------------------------------
|  Id | Operation          | Name | Rows | Bytes | Cost (%CPU)| Time      |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |    1 |     6 |       2 (0)|  00:00:01 |
|   1 |  SORT AGGREGATE    |      |    1 |     6 |            |           |
|*  2 |   TABLE ACCESS FULL| T    |   10 |    60 |       2 (0)|  00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("ATTR1"=1 AND "ID">0)

Note
-----
 - dynamic sampling used for this statement (level=10)

Now isn’t that an astonishing result? Dynamic Sampling has been performed at the highest available level, which can be told from the “Note” section (as outlined above, level 10 means that the whole segment has been read during sampling; and since the level is mentioned, this also indicates that this time an 11g version was used), and the results haven’t been rejected (otherwise we wouldn’t see it mentioned in the Note section). Yet the cardinality estimate of 10 shown in the plan is way off in comparison to the actual one (still 100,000 as before; the modified where clause didn’t change the outcome). So although Dynamic Sampling should help the optimizer come up with better estimates, it didn’t quite work out in this particular case.

Note: I don’t show you yet how I modified the test case; I only show you the result, which is based on a slightly modified query as well. It will become obvious in a later part of the series why the query had to be modified to arrive at these odd results of Dynamic Sampling.

Clearly something is going on that I haven’t explained yet, and more analysis is necessary. As with almost any feature, you can see that it doesn’t always work as expected, so it is important to gain a better understanding of how it works and why it might not produce the expected results.

To be continued…

Go to Dynamic Sampling (I), an Introduction – Part 1