Since Parallel Execution Skew is a common problem that can severely limit the scalability of Parallel Execution (see my previous article for a demonstration), Oracle has introduced a new feature in 12c that can address the problem automatically. That’s great, but unfortunately – at least in the initial 12c release – it only applies to a limited number of scenarios, so we still need to look at ways to address the problem manually. Nevertheless it’s a great feature and a good starting point; maybe in future releases Parallel Execution Skew will be addressed automatically by the database in all the most common scenarios. Until that happens, we’ll have to resort to workarounds.

For the feature to be triggered automatically, Oracle needs a histogram on the join expression. Oracle uses the histogram information to check for popular values – if the histogram shows that such popular values exist, a special code path in the optimizer is triggered that runs a recursive query on the table during optimization to generate a list of those popular values. These values are actually hard-coded into the cursor information, and at run time the HYBRID HASH distribution of “the other” row source checks the data to distribute against this list of popular values in the cursor. If there is a match, the value is selectively distributed via BROADCAST to all receiving Parallel Execution Servers (all other values still use the HASH distribution), and for the table with the skewed, popular values the matching data isn’t distributed via HASH, but simply in a random / round-robin fashion.

This way, any data having the popular values is evenly distributed among the receiving Parallel Execution Servers, and thanks to the BROADCAST of the popular values in the other row source the join is still guaranteed to produce correct results, as the join will find a match no matter which worker process the values are assigned to.
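Conceptually, the mechanics can be sketched in a few lines of Python. This is a simplified illustration, not Oracle’s actual implementation – the function name, the bucket model and the use of a plain hash function are my own assumptions:

```python
def skew_aware_distribute(build_rows, probe_rows, build_key, probe_key,
                          n_workers, popular_values):
    """Simplified model of the HYBRID HASH (SKEW) distribution.

    build_rows     - the "other" row source (T_1 in our example)
    probe_rows     - the row source with the skewed join column (T_2)
    popular_values - the popular values hard-coded into the cursor
    """
    build_buckets = [[] for _ in range(n_workers)]
    probe_buckets = [[] for _ in range(n_workers)]
    rr = 0  # round-robin counter for the skewed rows

    for row in build_rows:
        if build_key(row) in popular_values:
            # popular build rows are BROADCAST to every worker ...
            for bucket in build_buckets:
                bucket.append(row)
        else:
            # ... all other rows are distributed by HASH as usual
            build_buckets[hash(build_key(row)) % n_workers].append(row)

    for row in probe_rows:
        if probe_key(row) in popular_values:
            # popular probe rows go round-robin instead of by hash,
            # spreading the skew evenly across all workers
            probe_buckets[rr % n_workers].append(row)
            rr += 1
        else:
            probe_buckets[hash(probe_key(row)) % n_workers].append(row)

    return build_buckets, probe_buckets
```

Because every popular build row ends up in every worker’s bucket, each worker can join its local probe rows correctly, and the round-robin placement of the skewed probe rows is what restores the even workload.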

Note: If you’re looking for more details on how this feature works, see this accompanying post on my blog.

Let’s see this in action using our previous example. For the feature to be triggered we need to add a histogram on the corresponding column:

exec dbms_stats.gather_table_stats(null, 't_2', method_opt=>'for all columns size 1 for columns fk_id_skew size 254', no_invalidate=>false)

and re-execute the parallel variant of our query:

select count(t_2_filler) from (
select  /*+ monitor
            leading(t_1 t_2)
            use_hash(t_2)
            no_swap_join_inputs(t_2)
            pq_distribute(t_2 hash hash) */
        t_1.id as t_1_id
      , t_1.filler as t_1_filler
      , t_2.id as t_2_id
      , t_2.filler as t_2_filler
from    t_1
      , t_2
where
        t_2.fk_id_skew = t_1.id
and     regexp_replace(t_2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') >= regexp_replace(t_1.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c')
);

On my test system the query is back to 15 seconds, so that’s a great improvement compared to the 45 seconds it took without the new feature.

The even distribution is confirmed by looking at the output of V$PQ_TQSTAT again. The computed sub-totals also confirm that Oracle has actually selectively distributed the popular value for the “other” row source T_1, as indicated by 2000003 rows being produced / consumed rather than the actual 2000000 rows the row source consists of – so the single row with the popular value “1” was broadcast to all four worker processes:

break on dfo_number nodup on tq_id nodup on server_type skip 1 nodup on instance nodup

compute sum label Total of num_rows on server_type

select
        /*dfo_number
      , */tq_id
      , cast(server_type as varchar2(10)) as server_type
      , instance
      , cast(process as varchar2(8)) as process
      , num_rows
      , round(ratio_to_report(num_rows) over (partition by dfo_number, tq_id, server_type) * 100) as "%"
      , cast(rpad('#', round(num_rows * 10 / nullif(max(num_rows) over (partition by dfo_number, tq_id, server_type), 0)), '#') as varchar2(10)) as graph
      , round(bytes / 1024 / 1024) as mb
      , round(bytes / nullif(num_rows, 0)) as "bytes/row"
from
        v$pq_tqstat
order by
        dfo_number
      , tq_id
      , server_type desc
      , instance
      , process
;

     TQ_ID SERVER_TYP   INSTANCE PROCESS    NUM_ROWS          % GRAPH              MB  bytes/row
---------- ---------- ---------- -------- ---------- ---------- ---------- ---------- ----------
         0 Producer            1 P004         493866         25 ##########         51        109
                                 P005         489961         24 ##########         51        109
                                 P006         508088         25 ##########         53        109
                                 P007         508088         25 ##########         53        109
           ********** **********          ----------
           Total                             2000003

           Consumer            1 P000         499615         25 ##########         52        109
                                 P001         500735         25 ##########         52        109
                                 P002         499522         25 ##########         52        109
                                 P003         500131         25 ##########         52        109
           ********** **********          ----------
           Total                             2000003

         1 Producer            1 P004         482877         24 #########          49        107
                                 P005         497862         25 ##########         51        106
                                 P006         509934         25 ##########         52        107
                                 P007         509327         25 ##########         52        106
           ********** **********          ----------
           Total                             2000000

           Consumer            1 P000         500094         25 ##########         51        107
                                 P001         499886         25 ##########         51        107
                                 P002         499484         25 ##########         51        107
                                 P003         500536         25 ##########         51        107
           ********** **********          ----------
           Total                             2000000

         2 Producer            1 P000              1         25 ##########          0         36
                                 P001              1         25 ##########          0         36
                                 P002              1         25 ##########          0         36
                                 P003              1         25 ##########          0         36
           ********** **********          ----------
           Total                                   4

           Consumer            1 QC                4        100 ##########          0         36
           ********** **********          ----------
           Total                                   4
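The sub-total of 2000003 can be verified with a bit of arithmetic: T_1 contains each ID exactly once, so broadcasting the one row carrying the popular value to four consumers instead of one produces three extra copies. A back-of-the-envelope check using the numbers from this example:

```python
t_1_rows = 2000000      # total rows in T_1, each ID unique
popular_rows = 1        # rows in T_1 matching the popular value "1"
workers = 4             # receiving Parallel Execution Servers

# every popular row is sent to all workers instead of just one,
# so each one accounts for (workers - 1) additional copies
rows_distributed = t_1_rows + popular_rows * (workers - 1)
print(rows_distributed)  # 2000003, matching the V$PQ_TQSTAT sub-total
```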

Here is the corresponding execution plan from 12c:

----------------------------------------------------------------------------------
| Id  | Operation                        | Name     |    TQ  |IN-OUT| PQ Distrib |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                 |          |        |      |            |
|   1 |  SORT AGGREGATE                  |          |        |      |            |
|   2 |   PX COORDINATOR                 |          |        |      |            |
|   3 |    PX SEND QC (RANDOM)           | :TQ10002 |  Q1,02 | P->S | QC (RAND)  |
|   4 |     SORT AGGREGATE               |          |  Q1,02 | PCWP |            |
|*  5 |      HASH JOIN                   |          |  Q1,02 | PCWP |            |
|   6 |       PX RECEIVE                 |          |  Q1,02 | PCWP |            |
|   7 |        PX SEND HYBRID HASH       | :TQ10000 |  Q1,00 | P->P | HYBRID HASH|
|   8 |         STATISTICS COLLECTOR     |          |  Q1,00 | PCWC |            |
|   9 |          PX BLOCK ITERATOR       |          |  Q1,00 | PCWC |            |
|* 10 |           TABLE ACCESS FULL      | T_1      |  Q1,00 | PCWP |            |
|  11 |       PX RECEIVE                 |          |  Q1,02 | PCWP |            |
|  12 |        PX SEND HYBRID HASH (SKEW)| :TQ10001 |  Q1,01 | P->P | HYBRID HASH|
|  13 |         PX BLOCK ITERATOR        |          |  Q1,01 | PCWC |            |
|* 14 |          TABLE ACCESS FULL       | T_2      |  Q1,01 | PCWP |            |
----------------------------------------------------------------------------------

The new feature is indicated in operation ID = 12 by the keyword “(SKEW)” in the “PX SEND HYBRID HASH (SKEW)” operation. By the way, I believe the “PQ Distrib” column is supposed to show the same “HYBRID HASH (SKEW)” distribution but is simply truncated in the output. According to my description above, the other PX SEND HYBRID HASH operation (the distribution of T_1) should in my opinion also be flagged with “(SKEW)”, because it doesn’t perform a simple hash distribution, but a mixed BROADCAST / HASH distribution.

But as pointed out before, the skew-aware distribution feature at present only works in certain scenarios. For example, if you make the join an outer join, or if you replace the full table scan on T_2 with a simple unmerged view, or make T_2 the build row source rather than the probe row source, then the feature isn’t triggered:

Outer Join example:

select count(t_1_filler) from (
select  /*+ monitor 
            leading(t_1 t_2)
            use_hash(t_2)
            no_swap_join_inputs(t_2)
            pq_distribute(t_2 hash hash) */
        t_1.id as t_1_id
      , t_1.filler as t_1_filler
      , t_2.id as t_2_id
      , t_2.filler as t_2_filler
from    t_1
      , t_2
where
        t_2.fk_id_skew (+) = t_1.id
and     regexp_replace(t_2.filler (+), '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') >= regexp_replace(t_1.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c')
);

T_2 as build row source:

select count(t_2_filler) from (
select  /*+ monitor 
            leading(t_1 t_2)
            use_hash(t_2)
            swap_join_inputs(t_2)
            pq_distribute(t_2 hash hash) */
        t_1.id as t_1_id
      , t_1.filler as t_1_filler
      , t_2.id as t_2_id
      , t_2.filler as t_2_filler
from    t_1
      , t_2
where
        t_2.fk_id_skew = t_1.id
and     regexp_replace(t_2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') >= regexp_replace(t_1.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c')
);

T_2 as simple view row source:

select count(t_2_filler) from (
select  /*+ monitor 
            leading(t_1 t_2)
            use_hash(t_2)
            no_swap_join_inputs(t_2)
            pq_distribute(t_2 hash hash) */
        t_1.id as t_1_id
      , t_1.filler as t_1_filler
      , t_2.id as t_2_id
      , t_2.filler as t_2_filler
from    t_1
      , (select /*+ no_merge */ * from t_2) t_2
where
        t_2.fk_id_skew = t_1.id
and     regexp_replace(t_2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') >= regexp_replace(t_1.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c')
);

All these queries are back to 45 seconds on my test system, and the execution plan doesn’t show any sign of the SKEW detection feature.

It’s also worth pointing out that the special skew-aware distribution is only supported with hash joins – so, for example, using a parallel MERGE JOIN instead in the previous example won’t trigger the skew-aware distribution either.

Furthermore, if you happen to join more than two tables, the join order dictates whether the feature gets used or not.

For example this three table join makes use of the feature:

select count(t_2_filler) from (
select  /*+ monitor
            leading(a b c)
            use_hash(b c)
            no_swap_join_inputs(b)
            no_swap_join_inputs(c)
            pq_distribute(b hash hash)
            pq_distribute(c hash hash)
            */
        a.id as t_1_id
      , a.filler as t_1_filler
      , c.id as t_2_id
      , c.filler as t_2_filler
from    t_1 a
      , t_1 b
      , t_2 c
where
        c.fk_id_skew = b.id
and     regexp_replace(c.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') >= regexp_replace(b.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c')
and     a.id = b.id
and     regexp_replace(a.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') >= regexp_replace(b.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c')
);

But this one uses slightly different join conditions and a different join order that joins T_2 to another instance of T_1 first, so it doesn’t make use of the feature and therefore suffers from skew again:

select count(t_2_filler) from (
select  /*+ monitor
            leading(a c b)
            use_hash(b c)
            swap_join_inputs(b)
            no_swap_join_inputs(c)
            pq_distribute(b hash hash)
            pq_distribute(c hash hash)
            */
        a.id as t_1_id
      , a.filler as t_1_filler
      , c.id as t_2_id
      , c.filler as t_2_filler
from    t_1 a
      , t_1 b
      , t_2 c
where
        c.fk_id_skew = b.id
and     regexp_replace(c.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') >= regexp_replace(b.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c')
and     a.id = c.id
and     regexp_replace(a.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') >= regexp_replace(c.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c')
);

So what can we do to address Parallel Execution Skew manually, in cases where the new skew-aware distribution in 12c doesn’t work yet, or in versions prior to 12c? We’ll have a look at that in the next part of the series.
