Dynamic Sampling (I), an Introduction – Part 1

Welcome to my first post on AllThingsOracle.com! This is the initial part of a series highlighting different aspects of “Dynamic Sampling”.

So what is Dynamic Sampling? Most of you have probably already heard of it as a feature that deals with tables that don't have any statistics gathered for them, so it's a feature of the Cost-Based Optimizer that was introduced quite a while ago with the Oracle 9i release. However, and this is probably something that most are not aware of, Dynamic Sampling offers much more than just a way of dealing with missing statistics, as we'll see in the course of this series.

Let's have a look at what exactly Dynamic Sampling is: whenever Oracle needs to determine the execution plan of a given SQL statement, it ("it" in this case being the Cost-Based Optimizer component of the Oracle database server software, usually abbreviated as CBO) might decide, depending on the settings, that it could be beneficial to look at the actual data while doing so.

So Dynamic Sampling actually executes parts of the given SQL statement while the execution plan is being generated, usually only against a smaller subset of the data (a sample set, hence the name), and looks at the results. If the results are deemed helpful then, depending on the settings, they will be used as part of the calculations that determine the execution plan.
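To make that idea a bit more concrete, here is a simplified illustration of the principle. This is not the exact recursive SQL the optimizer generates internally, just a sketch of what a sample-based estimate looks like, assuming a demo table T created as a copy of ALL_OBJECTS without gathering statistics on it:

    -- Demo table without statistics (illustrative only; adjust to your environment)
    CREATE TABLE t AS SELECT * FROM all_objects;

    -- Conceptually, a block sample answers two questions at once:
    -- roughly how many rows are there, and what fraction passes the predicate?
    SELECT COUNT(*)                                  AS sampled_rows,
           COUNT(CASE WHEN owner = 'SYS' THEN 1 END) AS rows_matching_predicate
    FROM   t SAMPLE BLOCK (5);  -- read roughly 5 percent of the table's blocks

The ratio of the two counts gives a selectivity estimate for the predicate, and scaling the sampled row count up by the segment's total number of blocks gives a cardinality estimate.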

That alone already tells us some fundamentals about the feature:

  • It adds overhead to the optimization phase, since Oracle actually runs recursive SQL statements. Note: An optimization is not necessarily triggered only by an application PARSE call; it can also be triggered by other events, like the invalidation of Shared Pool contents (for example caused by DDL activity) or simply pressure on the Shared Pool memory that leads to information being flushed from the cache. Also, features like Adaptive Cursor Sharing, introduced in 11g, can deliberately cause a re-optimization at execution time
  • The exact behaviour depends on settings and the available statistics information, something that will be covered in more depth in upcoming parts of this series
  • There are some built-in sanity checks, which means that sometimes, although Dynamic Sampling has been performed, its results are rejected and will not be used by the optimizer
  • If the sanity checks are passed, the results will be used by the Cost-Based Optimizer to determine the execution plan. Note that, again, several things can influence to what extent the information will be used

In general Dynamic Sampling can provide the following information to the CBO:

  • Size: Size of segments in number of database blocks
  • Cardinality: Total number of rows in a segment
  • Selectivity: A numeric value between 0 and 1 representing how selective a particular filter predicate is (if there is any). 0 means no rows pass the predicate, 1 means all rows do
  • Column Statistics: Number of Distinct Values (NDV), number of rows with nulls

Furthermore, Dynamic Sampling can take advantage of existing indexes; in other words, you can improve the Dynamic Sampling results by supporting it with corresponding indexes.

If you compare the above with the information provided by conventional statistics, it becomes obvious that Dynamic Sampling doesn't provide the same level of detail. For example, column value histograms and the minimum and maximum column values (information that the optimizer can take advantage of, for instance in the case of joins) are currently not determined by Dynamic Sampling.

Note: It's important to understand that the main point of Dynamic Sampling is not to replace the conventional statistics that can be gathered using DBMS_STATS, but to allow dealing with situations that cannot (yet) be properly modelled using the statistics otherwise available (some of these can now be covered using extended statistics in 11g). So you could think of Dynamic Sampling as an additional feature that can be very helpful in some specific situations. Certainly you want to have sound statistics management in place and only resort to Dynamic Sampling where appropriate. As I will show in later parts of the series, Dynamic Sampling surprisingly might not be as beneficial as expected in situations where it seems to be most appropriate, for example when dealing with very volatile data.
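As a simple illustration of requesting Dynamic Sampling only where appropriate, it can be asked for at statement level via the documented DYNAMIC_SAMPLING hint. Here is a minimal sketch, reusing the demo table T from above:

    -- Request Dynamic Sampling at level 4 for table T in this statement only
    SELECT /*+ dynamic_sampling(t 4) */ COUNT(*)
    FROM   t
    WHERE  owner = 'SYS'
    AND    object_type = 'VIEW';

The number after the table name is the Dynamic Sampling level used for that table, a topic for an upcoming part of this series.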

If you want to see whether Dynamic Sampling has been used or not, the first place to check is the execution plan. If you use DBMS_XPLAN to format the output (which I highly recommend), you'll find a "Note" section at the bottom of the output.

Note: It's possible to use DBMS_XPLAN format options that suppress that "Note" section. I recommend getting familiar with the various formatting options offered by DBMS_XPLAN. Unfortunately some of the most useful options for troubleshooting are not documented, but information about them can be found. In particular the OUTLINE and ADVANCED formatting options are very valuable but undocumented to date.

In this "Note" section you'll find the message "dynamic sampling used for this statement" if the statistics of at least one of the objects involved in the query have been amended by Dynamic Sampling. In more recent releases the note also includes the level of Dynamic Sampling used, something like "dynamic sampling used for this statement (level=2)". Again, the level is something that will be covered in an upcoming post; here I only want to mention that there are 11 levels from 0 to 10, where each level can mean different things depending on the context. In general, though, it can be assumed that level 0 disables Dynamic Sampling, whereas level 10 means that the whole segment will be read during Dynamic Sampling, a setting that very likely only makes sense in very specific circumstances.
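Here is a minimal sketch of what checking for the note looks like, again using the demo table T from above (the exact wording of the note depends on the release):

    EXPLAIN PLAN FOR
    SELECT COUNT(*) FROM t WHERE owner = 'SYS';

    SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

    -- At the bottom of the formatted output you should find something like:
    -- Note
    -- -----
    --    - dynamic sampling used for this statement (level=2)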

If you use your own custom scripts to evaluate execution plan information, you'll find the corresponding information in the OTHER_XML column of the plan table, where it is represented by an XML fragment like '<info type="dynamic_sampling">…</info>'. The OTHER_XML column is normally only populated for the operation with ID = 1. Now this only tells you whether Dynamic Sampling results have been used by the optimizer or not; it doesn't provide any details about which objects exactly were subject to Dynamic Sampling and to what extent. If you think you need to find out more details, it's time to resort to the optimizer trace file, which will be written to one of the dump directories (UDUMP usually) defined on the server side. In previous releases this trace file could be enabled and disabled using the 10053 event, for example in its most common form like this:
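(A minimal sketch; level 1 is the level commonly used, and the EXPLAIN PLAN in the middle merely stands in for whatever statement you want to have optimized while the event is active.)

    ALTER SESSION SET EVENTS '10053 trace name context forever, level 1';

    -- Trigger an optimization while the event is active, for example:
    EXPLAIN PLAN FOR SELECT COUNT(*) FROM t WHERE owner = 'SYS';

    ALTER SESSION SET EVENTS '10053 trace name context off';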

Since Oracle 11g you can take advantage of the new diagnostic framework that supports enhanced events syntax to enable or disable the optimizer trace file along with more options that weren’t available previously:
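(A sketch of the corresponding commands; the component name and the optional restriction to a single SQL_ID follow the enhanced syntax described at the references mentioned below.)

    -- Enable the optimizer trace for the current session
    ALTER SESSION SET EVENTS 'trace[RDBMS.SQL_Optimizer.*]';

    -- ... have the statement of interest optimized here ...

    -- Disable it again
    ALTER SESSION SET EVENTS 'trace[RDBMS.SQL_Optimizer.*] off';

One of the options not available with the old 10053 event is restricting the trace to a particular statement, along the lines of 'trace[RDBMS.SQL_Optimizer.*][sql:<sql_id>]'.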

You can find more details about the new diagnostic framework at Tanel Poder’s site and on the Optimizer Development team blog.

Note: An optimization of the statement is required in order to have the expected output written to the trace file. If you don't use EXPLAIN PLAN but actually execute or parse a statement, and an already cached execution plan can be re-used, this will prevent the optimization phase and hence no details about Dynamic Sampling will be found in the trace file. EXPLAIN PLAN will always cause an optimization, no matter whether a plan for the same statement is already available in the cache or not.

Since Oracle 11.2 you can even get the optimizer trace file without the need for an optimization by using the DBMS_SQLDIAG.DUMP_TRACE procedure, which unfortunately didn't make it into the official documentation although it seems to be officially supported. More information can be found on the Optimizer Development team blog and here. In the resulting trace file you'll find details about the Dynamic Sampling activity, including the exact recursive SQL(s) used to gather the information. By looking into the trace file you will also be able to see which objects, and whether multiple objects, were subject to Dynamic Sampling.
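A sketch of a typical call follows. Since the procedure is undocumented, the parameter names shown here are the ones commonly quoted and may differ between versions; the SQL_ID placeholder has to be replaced with the SQL_ID of a statement in the cursor cache:

    BEGIN
      DBMS_SQLDIAG.DUMP_TRACE(
        p_sql_id       => '<sql_id>',   -- SQL_ID of the statement to be traced
        p_child_number => 0,            -- child cursor number
        p_component    => 'Compiler',   -- dump the compiler/optimizer trace
        p_file_id      => 'DS_TRACE'    -- identifier that becomes part of the trace file name
      );
    END;
    /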

Go to Dynamic Sampling (I), an Introduction – Part 2