If you follow my blog you might wonder whether I would let you get away without an actual test case, and of course I won’t. Here is a simple test case that demonstrates Dynamic Sampling in action:
set echo on timing on time on

drop table t;

purge table t;

-- Create a sample table
create table t
as
select
        rownum as id
      , mod(rownum, 10) + 1 as attr1
      , rpad('x', 100) as filler
from
        dual
connect by
        level <= 1000000
;

-- Use the default value (from 10g on, in 9i it used to be 1)
alter session set optimizer_dynamic_sampling = 2;

-- Set a tracefile identifier for easier identification
alter session set tracefile_identifier = 'dyn_sample1';

-- Enable the optimizer trace file in a 10g compatible way
alter session set events '10053 trace name context forever, level 1';

-- This is the 11g way
-- alter session set events 'trace[rdbms.SQL_Optimizer.*]';

-- Enable SQL*Plus AUTOTRACE to get an EXPLAIN PLAN
-- in addition to the actual execution
set autotrace on

select
        count(*) as cnt
from
        t
where
        attr1 = 1
;

set autotrace off

-- Disable the optimizer trace
alter session set events '10053 trace name context off';

-- The 11g way of disabling it
-- alter session set events 'trace[rdbms.SQL_Optimizer.*] off';

-- This tells you where the trace file is
-- Note that from 11g on this query can be simplified to
-- select tracefile from v$process where addr = (select paddr from v$session where sid = userenv('sid'))
column tracefile format a200

select
        value
        || case when instr(dbms_utility.port_string, 'WIN_NT') > 0 then '\' else '/' end
        || (
             select instance_name from v$instance
           )
        || '_ora_'
        || (
             select
                     spid || case when traceid is not null then '_' || traceid else null end
             from
                     v$process
             where
                     addr = (
                              select paddr
                              from   v$session
                              where  sid = (
                                             select sid from v$mystat where rownum = 1
                                           )
                            )
           )
        || '.trc' as tracefile
from
        v$parameter
where
        name = 'user_dump_dest'
;
Note: Although for the sake of simplicity this test case uses the SQL*Plus AUTOTRACE feature to display the execution plan along with the actual execution of a SQL statement, it should be mentioned that AUTOTRACE has its quirks and is not always reliable, in particular when dealing with bind variables. Although it seems to be a handy feature for a quick comparison of SQL statement performance, I strongly recommend using DBMS_XPLAN.DISPLAY_CURSOR (or Real-Time SQL Monitoring if the corresponding license is available) instead for such purposes, as it offers a completely different level of insight into SQL statement performance.
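If you want to follow that advice, here is a minimal sketch of the DBMS_XPLAN.DISPLAY_CURSOR approach: the GATHER_PLAN_STATISTICS hint enables rowsource statistics so that the 'ALLSTATS LAST' format option can show estimated (E-Rows) versus actual (A-Rows) cardinalities side by side:

-- Execute the statement with rowsource statistics enabled
select /*+ gather_plan_statistics */
        count(*) as cnt
from
        t
where
        attr1 = 1
;

-- Display the actual execution plan of the last statement
-- executed in this session, including E-Rows vs. A-Rows
select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));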
Output snippet:
19:18:01 SQL> -- Enable SQL*Plus AUTOTRACE to get an EXPLAIN PLAN
19:18:01 SQL> -- in addition to the actual execution
19:18:01 SQL> set autotrace on
19:18:01 SQL>
19:18:01 SQL> select
19:18:01   2          count(*) as cnt
19:18:01   3  from
19:18:01   4          t
19:18:01   5  where
19:18:01   6          attr1 = 1
19:18:01   7  ;

       CNT
----------
    100000

Elapsed: 00:00:01.88

Execution Plan
----------------------------------------------------------
Plan hash value: 2966233522

--------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |    13 |  4324   (1)| 00:00:52 |
|   1 |  SORT AGGREGATE    |      |     1 |    13 |            |          |
|*  2 |   TABLE ACCESS FULL| T    |   108K |  1372K|  4324   (1)| 00:00:52 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("ATTR1"=1)

Note
-----
   - dynamic sampling used for this statement
Looking at the relevant part of the generated output (here taken from a 10.2.0.4 version), it becomes obvious that:
- We have a confirmation telling us that Dynamic Sampling results have been used for optimization
- The Dynamic Sampling result is reasonable: 108,000 rows estimated vs. 100,000 actual rows, although we didn’t gather any statistics
Note: For your entertainment you can repeat the experiment and set the Dynamic Sampling level to 0 to see what the result looks like if no Dynamic Sampling is used (see the sketch after this note). In such cases the Cost-Based Optimizer reverts to hard-coded defaults that might be way off from the actual statistics.

It’s important to understand that one of the most critical components of the cost calculation is the cardinality of the different row sources and joins. In general you can assume: if the cardinality estimates are in the right ballpark, the Cost-Based Optimizer will most of the time come up with a reasonable execution plan. Or, putting it the other way around: it’s very likely that the resulting execution plan will be suboptimal if the cardinality estimates are not in the right ballpark.
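Here is a minimal sketch of that level-0 experiment, re-using the table T from above:

-- Disable Dynamic Sampling for this session
alter session set optimizer_dynamic_sampling = 0;

set autotrace on

-- Without object statistics and without sampling the optimizer
-- has to fall back to hard-coded defaults for its estimates
select
        count(*) as cnt
from
        t
where
        attr1 = 1
;

set autotrace off

-- Restore the 10g default
alter session set optimizer_dynamic_sampling = 2;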
Checking the resulting trace file and searching for “/* OPT_DYN_SAMP */”, we can identify the following snippet, which shows the Dynamic Sampling details:
-----------------------------------------
BEGIN Single Table Cardinality Estimation
-----------------------------------------
*** 2012-01-24 19:18:01.297
** Performing dynamic sampling initial checks. **
  Column (#2): ATTR1(NUMBER)  NO STATISTICS (using defaults)
    AvgLen: 13.00 NDV: 40511 Nulls: 0 Density: 2.4685e-005
** Dynamic sampling initial checks returning TRUE (level = 2).
** Dynamic sampling updated table stats.: blocks=15871
*** 2012-01-24 19:18:01.297
** Generated dynamic sampling query:
    query text :
SELECT /* OPT_DYN_SAMP */ /*+ ALL_ROWS IGNORE_WHERE_CLAUSE
  NO_PARALLEL(SAMPLESUB) opt_param('parallel_execution_enabled', 'false')
  NO_PARALLEL_INDEX(SAMPLESUB) NO_SQL_TUNE */ NVL(SUM(C1),0), NVL(SUM(C2),0)
  FROM (SELECT /*+ IGNORE_WHERE_CLAUSE NO_PARALLEL("T") FULL("T")
  NO_PARALLEL_INDEX("T") */ 1 AS C1, CASE WHEN "T"."ATTR1"=1 THEN 1 ELSE 0
  END AS C2 FROM "T" SAMPLE BLOCK (0.396950 , 1) SEED (1) "T") SAMPLESUB
*** 2012-01-24 19:18:01.453
** Executed dynamic sampling query:
    level : 2
    sample pct. : 0.396950
    actual sample size : 4284
    filtered sample card. : 429
    orig. card. : 1296344
    block cnt. table stat. : 15871
    block cnt. for sampling: 15871
    max. sample block cnt. : 64
    sample block cnt. : 63
    min. sel. est. : 0.01000000
** Using dynamic sampling card. : 1079228
** Dynamic sampling updated table card.
** Using single table dynamic sel. est. : 0.10014006
  Table: T  Alias: T
    Card: Original: 1079228  Rounded: 108074  Computed: 108073.95  Non Adjusted: 108073.95
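Out of curiosity you can run a query along the lines of the generated OPT_DYN_SAMP query yourself to get a feeling for what the optimizer saw. This is just a simplified sketch using the sample percentage and seed from the trace above; the exact counts depend on the physical block layout, so your results may differ:

-- Mimic the Dynamic Sampling query: C1 counts all sampled rows,
-- C2 counts the sampled rows that match the filter predicate
select
        count(*)                                   as c1
      , sum(case when attr1 = 1 then 1 else 0 end) as c2
from
        t sample block (0.396950) seed (1)
;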
I won’t go into too many details in this part of the series, but you can see from these lines:
** Dynamic sampling updated table stats.: blocks=15871
** Using dynamic sampling card. : 1079228
** Using single table dynamic sel. est. : 0.10014006
    Card: Original: 1079228  Rounded: 108074  Computed: 108073.95  Non Adjusted: 108073.95
that Dynamic Sampling has been used to determine the size of the object in blocks (something the optimizer would have obtained anyway even without Dynamic Sampling, by the way), the original cardinality of the segment before applying the filter predicate, and the selectivity of the filter predicate, arriving at the final cardinality estimate of 108,074.
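For reference, these trace numbers tie up nicely if you redo the arithmetic yourself. Here is a minimal sketch recomputing them, with all inputs taken verbatim from the trace above (the scaling logic is my interpretation of the trace output, not documented behaviour):

select
        -- Scale the sampled row count up by the sample percentage
        -- to arrive at the estimated table cardinality
        round(4284 / (0.396950 / 100))  as est_table_card  -- ~1079228 in the trace
        -- Selectivity of the filter predicate: matching sampled rows
        -- divided by all sampled rows
      , round(429 / 4284, 8)            as est_filter_sel  -- 0.10014006
        -- Final estimate: table cardinality times filter selectivity
      , round(1079228 * 429 / 4284)     as est_final_card  -- 108074
from
        dual
;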
Note: This example was explicitly crafted to be Dynamic Sampling “friendly”, because we have a uniform data distribution across the table and the filter predicate applies to many rows (10 percent). We’ll see in one of the upcoming parts of the series what happens if the data distribution is more realistic and potentially less friendly to the optimizer.
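By the way, the uniform distribution is easy to verify. Given how ATTR1 was generated above (MOD(ROWNUM, 10) + 1), each of the ten distinct values should cover exactly 100,000 of the 1,000,000 rows:

-- Each of the ten distinct ATTR1 values should show up 100,000 times
select
        attr1
      , count(*) as cnt
from
        t
group by
        attr1
order by
        attr1
;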
I won’t leave you without the following odd result, which can be obtained by slightly modifying the above test case:
17:41:02 SQL> -- Enable SQL*Plus AUTOTRACE to get an EXPLAIN PLAN
17:41:02 SQL> -- in addition to the actual execution
17:41:02 SQL> set autotrace on
17:41:02 SQL>
17:41:02 SQL> -- Although we sampled all data
17:41:02 SQL> -- and the results passed the checks
17:41:02 SQL> -- the cardinality estimate is still way off
17:41:02 SQL> select
17:41:02   2          count(*) as cnt
17:41:02   3  from
17:41:02   4          t
17:41:02   5  where
17:41:02   6          attr1 = 1
17:41:02   7  and     id > 0
17:41:02   8  ;

       CNT
----------
    100000

Elapsed: 00:00:02.80

Execution Plan
----------------------------------------------------------
Plan hash value: 2966233522

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     6 |     2   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE    |      |     1 |     6 |            |          |
|*  2 |   TABLE ACCESS FULL| T    |    10 |    60 |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("ATTR1"=1 AND "ID">0)

Note
-----
   - dynamic sampling used for this statement (level=10)
Now isn’t that an astonishing result? Dynamic Sampling has been performed at the highest available level, which can be told from the “Note” section (as outlined above, level 10 means that the whole segment has been read during sampling, and the fact that the level is mentioned at all also indicates that this time an 11g version was used). The results haven’t been rejected either, otherwise we wouldn’t see them mentioned in the “Note” section. Yet the cardinality estimate of 10 shown in the plan is way off in comparison to the actual one (still 100,000 as before; the modified where clause didn’t change the outcome). So although Dynamic Sampling should help the optimizer to come up with better estimates, it didn’t quite work out in this particular case.
Note: I don’t show you yet how I modified the test case; I only show you the result, which also uses a slightly modified query. It will become obvious in a later part of the series why the query had to be modified to arrive at these odd Dynamic Sampling results.
Clearly there is something going on that I haven’t explained yet, and more analysis is necessary. So, as with almost any feature, you can see that it doesn’t always work as expected, and it is therefore important to gain a better understanding of how it works and why it might not produce the expected results.
To be continued…