If you follow my blog you might wonder whether I would let you get away without an actual test case, and of course I won’t. Here is a simple test case that demonstrates Dynamic Sampling in action:
set echo on timing on time on

drop table t;

purge table t;

-- Create a sample table
create table t
as
select
        rownum as id
      , mod(rownum, 10) + 1 as attr1
      , rpad('x', 100) as filler
from
        dual
connect by
        level <= 1000000
;

-- Use the default value (from 10g on, in 9i it used to be 1)
alter session set optimizer_dynamic_sampling = 2;

-- Set a tracefile identifier for easier identification
alter session set tracefile_identifier = 'dyn_sample1';

-- Enable the optimizer trace file in a 10g compatible way
alter session set events '10053 trace name context forever, level 1';

-- This is the 11g way
-- alter session set events 'trace[rdbms.SQL_Optimizer.*]';

-- Enable SQL*Plus AUTOTRACE to get an EXPLAIN PLAN
-- in addition to the actual execution
set autotrace on

select
        count(*) as cnt
from
        t
where
        attr1 = 1
;

set autotrace off

-- Disable the optimizer trace
alter session set events '10053 trace name context off';

-- The 11g way of disabling it
-- alter session set events 'trace[rdbms.SQL_Optimizer.*] off';

-- This tells you where the trace file is
-- Note that from 11g on this query can be simplified to
-- select tracefile from v$process where addr = (select paddr from v$session where sid = userenv('sid'))
column tracefile format a200

select
        value
        || case when instr(dbms_utility.port_string, 'WIN_NT') > 0 then '\' else '/' end
        || (
             select instance_name from v$instance
           )
        || '_ora_'
        || (
             select
                     spid || case when traceid is not null then '_' || traceid else null end
             from
                     v$process
             where
                     addr = (
                              select paddr
                              from   v$session
                              where  sid = (
                                             select sid from v$mystat where rownum = 1
                                           )
                            )
           )
        || '.trc' as tracefile
from
        v$parameter
where
        name = 'user_dump_dest'
;
Note: Although for the sake of simplicity this test case uses the SQL*Plus AUTOTRACE feature to display the execution plan along with the actual execution of a SQL statement, it should be mentioned that AUTOTRACE has its quirks and is not always reliable, in particular when dealing with bind variables. Although it seems to be a handy feature for a quick comparison of SQL statement performance, I strongly recommend using DBMS_XPLAN.DISPLAY_CURSOR (or Real-Time SQL Monitoring if the corresponding license is available) instead for such purposes, as it offers a completely different level of insight into SQL statement performance.
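If you want to follow that advice, here is a minimal sketch of the DBMS_XPLAN.DISPLAY_CURSOR approach: the GATHER_PLAN_STATISTICS hint enables rowsource statistics so that the 'ALLSTATS LAST' format option can show estimated (E-Rows) versus actual (A-Rows) cardinalities side by side:

-- Execute the statement with rowsource statistics enabled
select /*+ gather_plan_statistics */
        count(*) as cnt
from
        t
where
        attr1 = 1
;

-- Display the actual execution plan of the last statement
-- executed in this session, including E-Rows vs. A-Rows
select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));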
Output snippet:
19:18:01 SQL> -- Enable SQL*Plus AUTOTRACE to get an EXPLAIN PLAN
19:18:01 SQL> -- in addition to the actual execution
19:18:01 SQL> set autotrace on
19:18:01 SQL>
19:18:01 SQL> select
19:18:01   2          count(*) as cnt
19:18:01   3  from
19:18:01   4          t
19:18:01   5  where
19:18:01   6          attr1 = 1
19:18:01   7  ;

       CNT
----------
    100000

Elapsed: 00:00:01.88

Execution Plan
----------------------------------------------------------
Plan hash value: 2966233522

--------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |    13 |  4324   (1)| 00:00:52 |
|   1 |  SORT AGGREGATE    |      |     1 |    13 |            |          |
|*  2 |   TABLE ACCESS FULL| T    |   108K |  1372K|  4324   (1)| 00:00:52 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("ATTR1"=1)

Note
-----
   - dynamic sampling used for this statement
Looking at the relevant part of the generated output (here taken from a 10.2.0.4 version), it becomes obvious that:
- We have a confirmation telling us that Dynamic Sampling results have been used for optimization
- The Dynamic Sampling result is reasonable: 108,000 rows estimated vs. 100,000 actual rows, although we didn’t gather any statistics
Note: For your entertainment you can repeat the experiment and set the Dynamic Sampling level to 0 to see what the result looks like if no Dynamic Sampling is used (see the sketch after this note). In such cases the Cost-Based Optimizer reverts to hard-coded defaults that might be way off from the actual statistics.

It’s important to understand that one of the most critical components of the cost calculation is the cardinality of the different row sources and joins. In general you can assume: if the cardinality estimates are in the right ballpark, the Cost-Based Optimizer will most of the time come up with a reasonable execution plan. Or, putting it the other way around: it’s very likely that the resulting execution plan will be suboptimal if the cardinality estimates are not in the right ballpark.
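Here is a minimal sketch of that level-0 experiment, re-using the table T from above:

-- Disable Dynamic Sampling for this session
alter session set optimizer_dynamic_sampling = 0;

set autotrace on

-- Without object statistics and without sampling the optimizer
-- has to fall back to hard-coded defaults for its estimates
select
        count(*) as cnt
from
        t
where
        attr1 = 1
;

set autotrace off

-- Restore the 10g default
alter session set optimizer_dynamic_sampling = 2;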
Checking the resulting trace file and searching for “/* OPT_DYN_SAMP */”, we can identify the following snippet, which shows the Dynamic Sampling details:
-----------------------------------------
BEGIN Single Table Cardinality Estimation
-----------------------------------------
*** 2012-01-24 19:18:01.297
** Performing dynamic sampling initial checks. **
  Column (#2): ATTR1(NUMBER)  NO STATISTICS (using defaults)
    AvgLen: 13.00 NDV: 40511 Nulls: 0 Density: 2.4685e-005
** Dynamic sampling initial checks returning TRUE (level = 2).
** Dynamic sampling updated table stats.: blocks=15871
*** 2012-01-24 19:18:01.297
** Generated dynamic sampling query:
    query text :
SELECT /* OPT_DYN_SAMP */ /*+ ALL_ROWS IGNORE_WHERE_CLAUSE
  NO_PARALLEL(SAMPLESUB) opt_param('parallel_execution_enabled', 'false')
  NO_PARALLEL_INDEX(SAMPLESUB) NO_SQL_TUNE */ NVL(SUM(C1),0), NVL(SUM(C2),0)
  FROM (SELECT /*+ IGNORE_WHERE_CLAUSE NO_PARALLEL("T") FULL("T")
  NO_PARALLEL_INDEX("T") */ 1 AS C1, CASE WHEN "T"."ATTR1"=1 THEN 1 ELSE 0
  END AS C2 FROM "T" SAMPLE BLOCK (0.396950 , 1) SEED (1) "T") SAMPLESUB
*** 2012-01-24 19:18:01.453
** Executed dynamic sampling query:
    level : 2
    sample pct. : 0.396950
    actual sample size : 4284
    filtered sample card. : 429
    orig. card. : 1296344
    block cnt. table stat. : 15871
    block cnt. for sampling: 15871
    max. sample block cnt. : 64
    sample block cnt. : 63
    min. sel. est. : 0.01000000
** Using dynamic sampling card. : 1079228
** Dynamic sampling updated table card.
** Using single table dynamic sel. est. : 0.10014006
  Table: T  Alias: T
    Card: Original: 1079228  Rounded: 108074  Computed: 108073.95  Non Adjusted: 108073.95
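Out of curiosity you can run a query along the lines of the generated OPT_DYN_SAMP query yourself to get a feeling for what the optimizer saw. This is just a simplified sketch using the sample percentage and seed from the trace above; the exact counts depend on the physical block layout, so your results may differ:

-- Mimic the Dynamic Sampling query: C1 counts all sampled rows,
-- C2 counts the sampled rows that match the filter predicate
select
        count(*)                                   as c1
      , sum(case when attr1 = 1 then 1 else 0 end) as c2
from
        t sample block (0.396950) seed (1)
;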
I won’t go into too many details in this part of the series, but you can see from these lines:
** Dynamic sampling updated table stats.: blocks=15871
** Using dynamic sampling card. : 1079228
** Using single table dynamic sel. est. : 0.10014006
    Card: Original: 1079228  Rounded: 108074  Computed: 108073.95  Non Adjusted: 108073.95
that Dynamic Sampling has been used to determine the size of the object in blocks (something the optimizer would have obtained anyway even without Dynamic Sampling, by the way), the original cardinality of the segment before applying the filter predicate, and the selectivity of the filter predicate, arriving at the final cardinality estimate of 108,074.
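For reference, these trace numbers tie up nicely if you redo the arithmetic yourself. Here is a minimal sketch recomputing them, with all inputs taken verbatim from the trace above (the scaling logic is my interpretation of the trace output, not documented behaviour):

select
        -- Scale the sampled row count up by the sample percentage
        -- to arrive at the estimated table cardinality
        round(4284 / (0.396950 / 100))  as est_table_card  -- ~1079228 in the trace
        -- Selectivity of the filter predicate: matching sampled rows
        -- divided by all sampled rows
      , round(429 / 4284, 8)            as est_filter_sel  -- 0.10014006
        -- Final estimate: table cardinality times filter selectivity
      , round(1079228 * 429 / 4284)     as est_final_card  -- 108074
from
        dual
;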
Note: This example was explicitly crafted to be Dynamic Sampling “friendly”, because we have a uniform data distribution across the table and the filter predicate applies to many rows (10 percent). We’ll see in one of the upcoming parts of the series what happens if the data distribution is more realistic and potentially less friendly to the optimizer.
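By the way, the uniform distribution is easy to verify. Given how ATTR1 was generated above (MOD(ROWNUM, 10) + 1), each of the ten distinct values should cover exactly 100,000 of the 1,000,000 rows:

-- Each of the ten distinct ATTR1 values should show up 100,000 times
select
        attr1
      , count(*) as cnt
from
        t
group by
        attr1
order by
        attr1
;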
I won’t leave you without the following odd result, which can be obtained by slightly modifying the above test case:
17:41:02 SQL> -- Enable SQL*Plus AUTOTRACE to get an EXPLAIN PLAN
17:41:02 SQL> -- in addition to the actual execution
17:41:02 SQL> set autotrace on
17:41:02 SQL>
17:41:02 SQL> -- Although we sampled all data
17:41:02 SQL> -- and the results passed the checks
17:41:02 SQL> -- the cardinality estimate is still way off
17:41:02 SQL> select
17:41:02   2          count(*) as cnt
17:41:02   3  from
17:41:02   4          t
17:41:02   5  where
17:41:02   6          attr1 = 1
17:41:02   7  and     id > 0
17:41:02   8  ;

       CNT
----------
    100000

Elapsed: 00:00:02.80

Execution Plan
----------------------------------------------------------
Plan hash value: 2966233522

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     6 |     2   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE    |      |     1 |     6 |            |          |
|*  2 |   TABLE ACCESS FULL| T    |    10 |    60 |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("ATTR1"=1 AND "ID">0)

Note
-----
   - dynamic sampling used for this statement (level=10)
Now isn’t that an astonishing result? Dynamic Sampling has been performed at the highest available level, which can be told from the “Note” section (as outlined above, level 10 means that the whole segment has been read during sampling, and the fact that the level is mentioned at all also indicates that this time an 11g version was used). The results haven’t been rejected either, otherwise we wouldn’t see them mentioned in the “Note” section. Yet the cardinality estimate of 10 shown in the plan is way off in comparison to the actual one (still 100,000 as before; the modified where clause didn’t change the outcome). So although Dynamic Sampling should help the optimizer to come up with better estimates, it didn’t quite work out in this particular case.
Note: I don’t show you yet how I modified the test case; I only show you the result, which also uses a slightly modified query. It will become obvious in a later part of the series why the query had to be modified to arrive at these odd Dynamic Sampling results.
Clearly there is something going on that I haven’t explained yet, and more analysis is necessary. So, as with almost any feature, you can see that it doesn’t always work as expected, and it is therefore important to gain a better understanding of how it works and why it might not produce the expected results.
To be continued…