While looking at the results of a query against the dynamic performance view v$active_session_history during a recent troubleshooting assignment I noticed a result that was so unusual that I decided I needed to get some information from the v$ash_info view over an interval of about 3 minutes – and it didn’t matter which three minutes in particular, any three (consecutive) minutes would do.

The drawback, of course, is that v$ash_info is one of those views that shows a summary since instance startup, so I needed to take a couple of snapshots and find changes in values. Just to make the task a little more awkward, I didn’t have access to the system so I needed to send in a script for someone else to run.

Luckily this was 12c, and there’s a feature in 12c that makes it easy to write code that takes timed deltas of any particular dataset, so as a first step I wrote a little bit of code that took 20 consecutive snapshots with a 10 second gap between snapshots; this is what it looked like:

rem
rem     Script:         ash_info.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jan 2017
rem

define m_snapshot_time  = 10
define m_snapshot_count = 20

set linesize 180
set pagesize 60
set trimspool on

set arraysize 60

prompt  ===========================================
prompt  Reporting v$ash_info &m_snapshot_count times
prompt  with a snapshot every &m_snapshot_time seconds
prompt  Please wait
prompt  ===========================================

with
        function wait_row_n (
                i_secs          number,
                i_return        number default -1
        ) return number
        is
        begin
                dbms_lock.sleep(i_secs);
                return i_return;
        end wait_row_n;
select
        /*+
                leading(looper aio)
                use_nl(aio)
        */
        wait_row_n(&m_snapshot_time, looper.ct) sleeper,
        aio.*
from
        (
        select
                rownum  ct
        from
                dual
        connect by
                level <= &m_snapshot_count
        )       looper,
        v$ash_info aio
/

There are a few details worth noting in this query but the important 12c one is the feature of creating a PL/SQL function in the “WITH” clause of an SQL statement. Historically, if you wanted to create and use a PL/SQL function from SQL, the function had to be a standalone object (possibly packaged) that you called from SQL, and one of the drawbacks to this was that you could run into privilege problems. In the example above my function calls the dbms_lock package; I have received all my privileges from a role (called test_role) that has been granted execute on the dbms_lock package, and the SQL above works (note: the role test_role has also been granted select on v_$ash_info). Look what happens, though, if I try to create a standalone PL/SQL function to replace the “factored subquery” function:

SQL> create or replace procedure my_sleep(n in number) as
  2  begin
  3          sys.dbms_lock.sleep(n);
  4  end;
  5  /

Warning: Procedure created with compilation errors.

SQL> show errors
Errors for PROCEDURE MY_SLEEP:

LINE/COL ERROR
-------- -----------------------------------------------------------------
3/9      PL/SQL: Statement ignored
3/9      PLS-00201: identifier 'SYS.DBMS_LOCK' must be declared

I can’t use the dbms_lock package in a permanent object in my schema if I’ve only got the privilege to execute it granted through a role. If I connect as SYS and grant execute on dbms_lock to my schema (called test_user) I can reconnect as test_user and the create procedure call will succeed.
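Just to make the two situations concrete, here’s a sketch (run as SYS) of the grants involved – the role and schema names match the ones mentioned above, but the exact statements are an illustration rather than a copy of the original setup:

rem     Role-based privileges: sufficient for the "with function" version
create role test_role;
grant create session, create procedure to test_role;
grant execute on dbms_lock   to test_role;
grant select  on v_$ash_info to test_role;
grant test_role to test_user;

rem     Direct (schema-based) grant: needed before "create procedure my_sleep" compiles
grant execute on dbms_lock to test_user;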

Inevitably there are pros and cons to embedding PL/SQL functions within SQL so don’t take this note as a suggestion that you should always inline any PL/SQL functions you want to use – it just happens to be very convenient in this particular case.

The other details worth a little mention in this script are the hinting and the use of the slash character to execute the statement.

The basic reason the script does what I want is that the dynamic performance views don’t produce read-consistent results. My query against dual produces N rows, one every M seconds; if v$ash_info were a normal Oracle table (or view) it would produce the same result for every one of those N rows if I used it as the second table in a nested loop join, but because it’s a dynamic performance view what I actually get is a different result each time I visit it. In the absence of the two hints Oracle could take a path that scanned v$ash_info once and did a merge join with the N rows I get from dual, giving me N copies of the same values. Both the leading() hint and the use_nl() hint are critical to getting the results I want.

I’ve used the “/” to execute the query – a semi-colon doesn’t work. Technically a semi-colon is a separator, but historically SQL*Plus has used it as an alternative “run” command; once you introduce a “with function” clause – which will contain semi-colons for the PL/SQL component of the text – the alternative meaning of the semi-colon ceases to be used and you have to use the “/” to run the statement. If you change the “/” to a semi-colon you’ll find that SQL*Plus treats the next bit of text as if it were part of the statement (and you’ll probably see error “ORA-00933: SQL command not properly ended” as Oracle tries to interpret it).

Interestingly, but unsurprisingly, blank lines in the SQL are automatically acceptable once you’ve introduced the “with function” clause, irrespective of the setting of the SQL*Plus environment variable SQLBLANKLINES.
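Without a “with function” clause you would normally have to allow blank lines explicitly; as a reminder, the standard SQL*Plus command to do that is:

set sqlblanklines on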

Deltas

My first cut of the script simply lists all the absolute values from the view. As a starting point that’s probably good enough for what I wanted. (In fact, if I had had open access to the system I wouldn’t have bothered with the script, I’d simply have run “select * from v$ash_info” a few times while spooling to a file – after setting a suitable linesize and a few column formats, of course.) From the viewpoint of future convenience, though, it would be nice to have a script that got the data but showed the change from snapshot to snapshot. I had to increase the complexity of the SQL a little by using one of the simplest analytic functions to achieve this and then, after a little stress testing, I decided to add a few enhancements to the first version of the code, so currently my code looks like this:

column delta_seconds format 999.999
column snap_time     format a19
column sample_time   format a19


alter session set statistics_level = all;

with
        function wait_row(
                i_secs          number,
                i_counter       number default -1
        ) return timestamp
        is
                m_time  timestamp;
        begin
                m_time := systimestamp;
                dbms_lock.sleep(i_secs);
                return m_time;
        end wait_row;

base_data as (
        select
                /*+
                        leading(looper aio)
                        use_nl(aio)
                        materialize
                */
                wait_row(&m_snapshot_time, looper.ct)   timer,
                aio.*
        from
                (
                select
                        rownum  ct
                from
                        dual
                connect by
                        level <= &m_snapshot_count
                )       looper,
                v$ash_info aio
)
select
        to_char(timer,'hh24:mi:ssxff')                                  snap_time,
        extract(second from
                        (timer - lag(timer,1,timer) over (order by timer))
        )                                                               delta_seconds,
        latest_sample_id                                                sample_id,
        latest_sample_id -
                lag(latest_sample_id,1,latest_sample_id) over (order by timer)  delta,
        to_char(latest_sample_time,'hh24:mi:ssxff')                     sample_time,
        extract(second from
                        (
                        latest_sample_time -
                                lag(latest_sample_time,1,latest_sample_time) over (order by timer)
                        )
        )                                                               delta_seconds,
        sample_count -
                lag(sample_count,1,0) over (order by timer)             samples,
        dropped_sample_count -
                lag(dropped_sample_count,1,0) over (order by timer)     dropped_samples,
        sampled_bytes -
                lag(sampled_bytes,1,0) over (order by timer)            sampled_bytes,
        sampler_elapsed_time -
                lag(sampler_elapsed_time,1,0) over (order by timer)     sampling_musec
from
        base_data
order by
        1
/


Again there are a couple of little details to note. For demonstration purposes I’ve left a few blank lines in the middle of the code just to demonstrate the principle. More importantly I’ve got a “with function” and a “with subquery” in the same statement – you’ll notice that the function bit ends with a semi-colon (though technically that should be “is separated from the next block by” rather than “ends with”). If I wanted another function I could start its definition immediately before the definition of the base_data factored subquery. It’s a quirky little detail but multiple functions are separated by semi-colons while multiple subqueries are separated by commas.
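Here’s a purely illustrative skeleton of the pattern (the function and subquery names are invented):

with
        function f1(i_n number) return number is
        begin
                return i_n;
        end f1;
        function f2(i_n number) return number is
        begin
                return i_n * 2;
        end f2;                                 -- semi-colon separates the functions
v1 as (
        select f1(rownum) n from dual
),                                              -- comma separates the subqueries
v2 as (
        select f2(n) n2 from v1
)
select * from v2
/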

You can also see a less well-known detail of the lag() analytic function: I’ve set the second (i.e. offset) parameter explicitly to 1 (the default) in each case, and the third parameter to the value I want Oracle to supply if the offset takes Oracle outside the current window. I’ve supplied zero for most of the columns, which conveniently means that the first row of output shows the absolute starting values, but I’ve repeated the original column name in three places so that the output for the first row shows zeros for those deltas.
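A tiny stand-alone example (against invented data, not v$ash_info) may make the effect of that third parameter clearer:

select
        val,
        val - lag(val, 1, 0)   over (order by val)  delta_from_zero,  -- first row shows the absolute value
        val - lag(val, 1, val) over (order by val)  delta_from_self   -- first row shows zero
from    (
        select rownum * 10 val from dual connect by level <= 3
        )
/

With data values 10, 20, 30 the first row reports 10 for delta_from_zero but 0 for delta_from_self; every subsequent row reports 10 in both columns.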

An important enhancement to the output from the query is the time at which each query against v$ash_info was actually executed – you’ll see why I did this when we review the results. I’ve changed my function to return systimestamp as at the moment the function was called, then included both the time of the call and the change in time since the previous call in my output. After including this detail I had to materialize the subquery because Oracle seemed to be copying the function call into the select list rather than calling the function once per row and re-using the result – which may be a change in behaviour that could affect some other pieces of SQL that I’ve written in the past.
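Since the script sets statistics_level to all, one quick way to check that the materialize hint has actually taken effect is to pull the last execution plan and look for a TEMP TABLE TRANSFORMATION operation. This is a suggested cross-check rather than part of the original script:

select * from table(dbms_xplan.display_cursor(format=>'allstats last'));

(Run this immediately after the main query, with serveroutput off, so that display_cursor() picks up the right statement.)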

Here’s an example of the output:

===========================================
Reporting v$ash_info 20 times
with a snapshot every 10 seconds
Please wait
===========================================

SNAP_TIME           DELTA_SECONDS  SAMPLE_ID      DELTA SAMPLE_TIME         DELTA_SECONDS    SAMPLES DROPPED_SAMPLES SAMPLED_BYTES SAMPLING_MUSEC
------------------- ------------- ---------- ---------- ------------------- ------------- ---------- --------------- ------------- --------------
12:14:58.556150000           .000    4745434          0 12:14:57.825000000           .000      16432               3       7016168        4963931
12:15:08.593213000         10.037    4745444         10 12:15:08.215000000         10.390         10               0         14024           1327
12:15:18.619130000         10.026    4745454         10 12:15:18.605000000         10.390         10               0         15712           1393
12:15:28.634358000         10.015    4745463          9 12:15:27.885000000          9.280          9               0         14368           1552
12:15:38.722004000         10.088    4745473         10 12:15:38.255000000         10.370         10               0         20240           1408
12:15:48.725934000         10.004    4745483         10 12:15:48.676000000         10.421         10               0         20280           1185
12:15:58.768118000         10.042    4745492          9 12:15:58.036000000          9.360          9               0         18280            653
12:16:08.774180000         10.006    4745502         10 12:16:08.436000000         10.400         10               0         20432            959
12:16:18.776470000         10.002    4745511          9 12:16:17.796000000          9.360          9               0         15208            954
12:16:28.824398000         10.048    4745521         10 12:16:28.196000000         10.400         10               0         20832           1229
12:16:38.828178000         10.004    4745531         10 12:16:38.596000000         10.400         10               0         16816           1021
12:16:48.833060000         10.005    4745540          9 12:16:47.956000000          9.360          9               0         16112           1179
12:16:58.880889000         10.048    4745550         10 12:16:58.346000000         10.390         10               0         17888           1274
12:17:08.884090000         10.003    4745560         10 12:17:08.736000000         10.390         10               0         17904           1085
12:17:18.888500000         10.004    4745569          9 12:17:18.096000000          9.360          9               0         16128           1207
12:17:28.938052000         10.050    4745579         10 12:17:28.496000000         10.400         10               0         16224           1294
12:17:38.940824000         10.003    4745588          9 12:17:37.896000000          9.400          9               0         10544            830
12:17:49.009051000         10.068    4745598         10 12:17:48.292000000         10.396         10               0         10584           1136
12:17:59.030062000         10.021    4745608         10 12:17:58.612000000         10.320         10               0          8888           1286
12:18:09.039469000         10.009    4745618         10 12:18:08.822000000         10.210         10               0          6976           1065

20 rows selected.

I was trying to overload the CPUs by running up a set of CPU-intensive queries against a small instance on a VM when I generated this report. This introduced two problems for the report. The first is that when the CPU is overloaded (by about 400% in my case) my session might find it hard to get onto the CPU in a timely fashion, so the time between lines of the report might be more than the 10 seconds I requested. The second problem is the same problem – but for mmnl, the process that constructs the ASH data; it’s supposed to take a snapshot once per second but it, too, might not get onto the CPU when it needs to.

Looking at the output above we can see that my session had less of a problem than mmnl. My snapshot time is drifting by a few hundredths of a second on each sleep; on the other hand we know that mmnl should be taking samples once per second so with our 10 second snapshot we should see the sample count going up by 10 and the sample time going up by 10 seconds on each snapshot – but when we see 10 samples we see the sample time going up by around 10.4 seconds, and there’s a “self-correcting” element in place that means we sometimes see only 9 samples taken in our 10 second window. 

On the plus side, even though the system is under some pressure due to excess CPU utilisation we can see that we have no “dropped samples”. The column name is a little misleading – it’s not a count of samples that have been dropped, it’s a count of the active sessions that were omitted from a sample because the sampler couldn’t get a self-consistent image of those sessions. Ironically, when the CPU is overloaded it’s probably the sessions that can’t be captured that are the most interesting and important sessions.

Conclusion

I’ve demonstrated a method of using a new 12c feature to report changes over time from a dynamic performance view that holds only information accumulated since instance startup. A key point of the new feature is that it allows a user with role-based privileges to do things that were previously only possible for a user with directly granted (schema-based) privileges.

The method is generic, but the specific example I chose also allowed me to make a couple of points about performance monitoring, effects of CPU overload, and the need for care in cross-checking figures.