In the final part of this series I want to cover one last source of Parallel Execution Skew: Skew caused by Outer Joins.

If you have Outer Joins where the majority of foreign key values are NULL, this can lead to the following symptom: from a processing point of view, a parallel outer join treats the NULL value in the join column just like any other popular value, so all rows with a NULL in the join column will be sent to the same Parallel Execution Server, and the whole Parallel Execution will again suffer from a bad work distribution.

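If you don’t have the table scripts from the previous installments at hand, a setup along the following lines reproduces the symptom. The row counts, the 1e4 distinct key values and the one percent non-NULL ratio are assumptions for illustration only, not the exact scripts behind the timings quoted below:

-- Hypothetical re-creation of the two-table setup: T_1 is the parent table,
-- T_2 the much larger child table where the vast majority of FK_ID_SPARSE values is NULL
create table t_1
as
select
        rownum as id
      , rpad('x', 100) as filler
from
        dual
connect by level <= 1e4;

create table t_2
as
with generator as (
        select rownum as id from dual connect by level <= 1e3
)
select
        rownum as id
        -- only every hundredth row gets a non-NULL foreign key, spread evenly over T_1.ID
      , case when mod(rownum, 100) = 0 then rownum / 100 else null end as fk_id_sparse
      , rpad('x', 100) as filler
from
        generator, generator;

alter table t_1 parallel 4;
alter table t_2 parallel 4;

exec dbms_stats.gather_table_stats(null, 't_1')

exec dbms_stats.gather_table_stats(null, 't_2')
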
Consider the following sample query:

select count(t_2_filler) from (
select  /*+ monitor
            leading(t_2 t_1)
            use_hash(t_1)
            swap_join_inputs(t_1)
            pq_distribute(t_1 hash hash) */
        t_1.id as t_1_id
      , t_1.filler as t_1_filler
      , t_2.id as t_2_id
      , regexp_replace(t_2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') || regexp_replace(t_2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'i') as t_2_filler
from    t_1
      , t_2
where
        t_2.fk_id_sparse = t_1.id (+)
);

This query is quite similar to my previous sample query for demonstrating skew, but it now uses the FK_ID_SPARSE column of T_2 to join. Due to the outer join I had to move the CPU-intensive expression to the projection instead of making it part of the join condition.

If I run that query on my test system I get a run time of 45 seconds, which is not much faster than the serial variant, which in this case runs just a couple of seconds longer (52 seconds on my test system).
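
If you want to see the skew for yourself, querying V$PQ_TQSTAT right after the parallel run (in the same session) shows how many rows each Parallel Execution Server received per table queue; with all NULL foreign keys hashing to the same value, a single consumer should stand out with the vast majority of the rows:

-- Row distribution per Parallel Execution Server for the last parallel statement
-- executed in this session; one consumer handling most of the rows indicates skew
select
        dfo_number
      , tq_id
      , server_type
      , process
      , num_rows
from
        v$pq_tqstat
order by
        dfo_number
      , tq_id
      , server_type
      , process;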

As mentioned in the previous installments of this series, this outer join skew problem isn’t covered by the new Oracle Database 12c automatic skew detection / handling code, so we need to address it manually in any version of Oracle currently available.

Similar to the initial approach used for the popular values in the previous part of the series we could come up with a solution like the following:

select count(t_2_filler) from (
-- The first part of the query inner joins the non-null FK values
select  /*+ monitor
            leading(t_2 t_1)
            use_hash(t_1)
            swap_join_inputs(t_1)
            pq_distribute(t_1 hash hash) */
        t_1.id as t_1_id
      , t_1.filler as t_1_filler
      , t_2.id as t_2_id
      , regexp_replace(t_2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') || regexp_replace(t_2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'i') as t_2_filler
from    t_1
      , t_2
where
        t_2.fk_id_sparse = t_1.id
and     t_2.fk_id_sparse is not null
---------
---------
union all
---------
---------
-- The second part processes the null FK values - since they don't find a match no join is required
select
        NULL as t_1_id
      , NULL as t_1_filler
      , t_2.id as t_2_id
      , regexp_replace(t_2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') || regexp_replace(t_2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'i') as t_2_filler
from    t_2
where
        t_2.fk_id_sparse is null
);

Since the NULL foreign keys won’t find a match in the other row source anyway, the first part of the UNION ALL processes only those rows with a non-null foreign key in T_2, whereas the second part doesn’t join at all and simply substitutes the T_1 values with NULLs.

Running that query I’m back to my desired 15 seconds run time, but the solution has a familiar disadvantage: We need to access T_2 twice, and the whole thing is a bit clumsy.

So, can we come up with an improved solution? Yes, we can: Since we don’t need to find a match for the NULL foreign keys, an obvious approach could be to remap the NULLs to a range of values that are guaranteed to be non-matching, similar to the remapping we did in the previous post:

define distrib_size = 1e4

select count(t_2_filler) from (
select  /*+ monitor
            leading(t_2 t_1)
            use_hash(t_1)
            swap_join_inputs(t_1)
            pq_distribute(t_1 hash hash) */
        t_1.id as t_1_id
      , t_1.filler as t_1_filler
      , t_2.id as t_2_id
      , regexp_replace(t_2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') || regexp_replace(t_2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'i') as t_2_filler
from    t_1
      , t_2
where
        -- this case expression maps FK_ID_SPARSE NULLs to a range of negative integers based on the primary key of T_2 that are guaranteed to be non-matching to T_1.ID
        case when t_2.fk_id_sparse is null then -mod(t_2.id, &distrib_size) - 1 else t_2.fk_id_sparse end = t_1.id (+)
);

Remapping the NULLs to a range of values that are guaranteed to be non-matching doesn’t change the result of the query, so semantically this query is equivalent to the original one. We don’t need an additional remapping table here, since we just want to spread the NULLs across some other non-matching values.
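
The guarantee rests on the assumption that T_1.ID only holds positive values, so the remapped negative keys can never find a match. If you want to verify that before applying such a rewrite, a trivial check like the following is enough (assuming, as in the earlier parts of the series, that ID is a positive surrogate key):

-- If this returns 0, the negative remapping range cannot collide with any T_1.ID value
select count(*) from t_1 where id <= 0;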

Running this query I’m back to my optimal 15 seconds, and I don’t need to access T_2 twice, so we have already addressed one of the disadvantages of the above approach.

And again, from 11g on we could make use of virtual columns so we don’t have to write the complex join expression:

alter table t_2 add (new_fk_id_sparse as (case when fk_id_sparse is null then -mod(id, &distrib_size) - 1 else fk_id_sparse end));

exec dbms_stats.gather_table_stats(null, 't_2', method_opt=>'for all columns size 1 for columns fk_id_skew size 254, new_fk_id_skew size 254, new_fk_id_sparse size 254', no_invalidate=>false)
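
To double-check that the column really was created as a virtual column and to see the stored expression, a quick look at USER_TAB_COLS is sufficient:

-- Virtual columns show up in USER_TAB_COLS with VIRTUAL_COLUMN = 'YES'
-- and the CASE expression stored in DATA_DEFAULT
select
        column_name
      , virtual_column
      , data_default
from
        user_tab_cols
where
        table_name = 'T_2'
and     column_name = 'NEW_FK_ID_SPARSE';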

And our query now looks like this:

select count(t_2_filler) from (
select  /*+ monitor
            leading(t_2 t_1)
            use_hash(t_1)
            swap_join_inputs(t_1)
            pq_distribute(t_1 hash hash) */
        t_1.id as t_1_id
      , t_1.filler as t_1_filler
      , t_2.id as t_2_id
      , regexp_replace(t_2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'c') || regexp_replace(t_2.filler, '^\s+([[:alnum:]]+)\s+$', lpad('\1', 10), 1, 1, 'i') as t_2_filler
from    t_1
      , t_2
where
        t_2.new_fk_id_sparse = t_1.id (+)
);

This looks pretty similar to our original query but performs much, much better.
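
If you want to confirm that the join really operates on the virtual column and that the requested hash distribution is used, pulling the actual execution plan of the last statement with DBMS_XPLAN is a quick way to check:

-- Plan of the last statement executed in this session, including the
-- Predicate Information section and the PX distribution methods
select * from table(dbms_xplan.display_cursor);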

It’s a wrap

I hope this series has given you some ideas about how you might be able to speed up Parallel Execution in case you encounter Parallel Execution Skew, but keep in mind that I’ve focused here on joins only; there are other operations that can be affected by skew, too, like sorts and aggregations.

For such operations things look a bit different, as we don’t have two row sources to process, so what I’ve outlined here cannot simply be applied one to one. However, some of the ideas presented here will hopefully serve as a starting point.
