Snapshots

While looking at the results from a query against the dynamic performance view v$active_session_history during a recent trouble-shooting assignment I noticed a result that was so unusual that I decided I needed to get some information from the v$ash_info view over an interval of about 3 minutes – and it didn’t matter which three minutes in particular, any three (consecutive)

While looking at the results from a query against the dynamic performance view v$active_session_history during a recent trouble-shooting assignment I noticed a result that was so unusual that I decided I needed to get some information from the v$ash_info view over an interval of about 3 minutes – and it didn’t matter which three minutes in particular, any three (consecutive) minutes would do.

The drawback, of course, is that v$ash_info is one of those views that shows a summary since instance startup, so I needed to take a couple of snapshots and find changes in values. Just to make the task a little more awkward, I didn’t have access to the system so I needed to send in a script for someone else to run.

Luckily this was 12c and there’s a feature in 12c that makes it easy to write code that takes timed deltas of any particular dataset so, as a first step, I wrote a little bit code that took 20 consecutive snapshots with a 10 seconds gap between snapshots; this is what it looked like:

There are a few details worth noting in this query but the important 12c one is the feature of creating a PL/SQL function in a “WITH” clause of an SQL statement. Historically if you wanted to create and use a PL/SQL function from SQL the function had to be a standalone object (possibly packaged) that you called from SQL. One of the drawbacks to this was that you could run into privilege problems. In the above example my function calls the dbms_lock package. I have received all my privileges from a role (called test_role) that has been granted execute on the dbms_lock package, and the SQL above works (note: the role test_role has also been granted select on v_$ash_info). Look what happens, though, if I try to create a standalone PL/SQL function to replace the “factored subquery” function:

I can’t use the dbms_lock package in a permanent object in my schema if I’ve only got the privilege to execute it granted through a role. If I connect as SYS and grant execute on dbms_lock to my schema (called test_user) I can reconnect as test_user and the create procedure call will succeed.

Inevitably there are pros and cons to embedding PL/SQL functions withing SQL so don’t take this note as a suggestion that you should always inline any PL/SQL functions you want to use – it just happens to be very convenient in this particular case.

The other details worth a little mention in this script are the hinting and the use of the slash character to execute the statement.

The basic reason the script does what I want is because the dynamic performance views don’t produce read-consistent results. My query against dual produces N rows, one every M seconds; if v$ashinfo were a normal Oracle table (or view) it would produce the same result for every one of those N rows if I used it as the second table in a nested loop join, but because it’s a dynamic performance view what I actually get is a different result each time I visit it. In the absence of the two hints Oracle could take an path that scanned v$ashinfo once and did a merge join with the N rows I get from dual, giving me N copies of the same values. Both the leading() hint and the use_nl() hint are critical to getting the results I want.

I’ve used the “/” to execute the query – a semi-colon doesn’t work. Technically a semi-colon is a separator, but historically SQL*Plus has used it as an alternative “run” command; once you introduce a “with function” clause – which will contain semi-colons for the PL/SQL component of the text – the alternative meaning of the semi-colon ceases to be used and you have to use the “/” to run the statement. If you change the “/” to a semi-colon you’ll find that SQL*Plus treats the next bit of text as if it were part of the statement (and you’ll probably see error “ORA-00933: SQL command not properly ended” as Oracle tries to interpret it).

Interestingly, but unsurprisingly, blank lines in the SQL are automatically acceptable once you’ve introduced the “with function” clause irrespective of the setting the SQL*Plus environment variable SQLBLANKLINES.

Deltas

My first cut of the script simply lists all the absolute values from the view. As a starting point that’s probably good enough for what I wanted. (In fact, if I had had open access to the system I wouldn’t have bothered with the script, I’d simply have run “select * from V$ash_info” a few times while spooling to a file – after setting a suitable linesize and a few column formats, of course.) From the viewpoint of future convenience, though, it would be nice to have a script that got the data but showed the change from snapshot to snapshot. I had to increase the complexity of the SQL a little bit by using one of the simplest analytic functions to achieve this then, after a little stress testing, I decided to add a few enhancements to the first version of the code so currently my code looks like this:

Again there are a couple of little details to note. For demonstration purposes I’ve left a few blank lines in the middle of the code just to demonstrate the principle. More importantly I’ve got a “with function” and a “with subquery” in the same statement – you’ll notice that the function bit ends with a semi-colon (though technically that should be “is separated from the next block by” rather than “ends with”. If I wanted another function I could start it’s definition immediately before the definition of base_data factored subquery. It’s a quirky little detail but multiple functions are separated by semi-colons while multiple subqueries are separated by commas.

You can also see a less well-known detail of the lag() analytic function: I’ve set the 2nd (i.e. offset) parameter explicitly to 1 (the default) in each case and the third parameter to the value I want Oracle to supply if the offset takes Oracle outside the current window. I’ve supplied zero for most of the columns and this conveniently means that the first row of output shows the absolute starting values, but I’ve repeated the original column name in two places so that the output for the first row shows zeros.

An important enhancement to the output from the query is the time at which each query against v$ash_info was actually executed. You’ll see why I did this when we review the results. I’ve changed my function to return systimestamp as at the moment the function was called then included both the time of the call and the change in time since the previous call in my output. After including this detail I had to materialize the subquery because Oracle seemed to be copying the function call into the select list rather than calling the function once per row and then re-using the result – which may be a change in behaviour that could affect some other pieces of SQL that I’ve written in the past.

Here’s an example of the output:

I was trying to overload the CPUs by running up a set of CPU intensive queries on a small instance running on a VM when I ran this report. This introduced two problems for my report; the first is that when the CPU is overloaded (by about 400% in my case) my session might find it hard to get onto the CPU in a timely fashion, so the time between lines of the report might be more than the 10 seconds I requested. The second problem is the same problem – but for mmnl, the process that constructs the ASH data; it’s supposed to take a snapshot once per second but it, too, might not get onto the CPU when it needs to.

Looking at the output above we can see that my session had less of a problem than mmnl. My snapshot time is drifting by a few hundredths of a second on each sleep; on the other hand we know that mmnl should be taking samples once per second so with our 10 second snapshot we should see the sample count going up by 10 and the sample time going up by 10 seconds on each snapshot – but when we see 10 samples we see the sample time going up by around 10.4 seconds, and there’s a “self-correcting” element in place that means we sometimes see only 9 samples taken in our 10 second window. 

On the plus side, even though the system is under some pressure due to excess CPU utilisation we can see that we have no “dropped samples”. The column name is a little misleading – it’s not a count of been dropped samples, it’s a count of the number active sessions for a sample that were omitted because the sampler couldn’t get a self-consistent image of that session. Ironically, when the CPU is overloaded it’s probably the sessions that can’t be captured that are the most interesting and important sessions.

Conclusion

I’ve demonstrated a method of using a new 12c feature to report change over time from a dynamic performance view that holds only information accumulated since instance startup. A key point of the new feature is that it allows a user with role-based privileges to do things that were previously only possible to a user with schema-based privileges.

The method is generic, but the specific example I chose also allowed me to make a couple of points about performance monitoring, effects of CPU overload, and the need for care in cross-checking figures.