Massive Deletes – Part 4

In the previous installment of this series I produced some figures highlighting the main differences between doing a large delete by tablescan and doing a large delete by index range scan. Depending on the data patterns involved the correct choice of strategy could make a significant difference to the number of random I/Os, the volume of redo generated, and the amount of CPU required for sorting – all of which could affect the time required to do the delete.

The trouble with a nice simple demonstration, though, is that production systems are often more complex than the models that we first investigate; so if you’re faced with a big job you need to think carefully about how you can model something that really represents the system you have to handle. There are actually two different scenarios that make this important.

  • When you’re looking at a very large one-off task you need to get it right first time and not discover some critical special case too late – especially if you’re not allowed to take the production system off-line to get the job done and you’re working to a tight deadline.
  • When you’ve got a regular, but infrequent, very large job it’s worth knowing what small, apparently unrelated, actions could make a big difference to the run-time; and it’s worth knowing what might go wrong on the next upgrade so that you can pre-empt any problems.

A simple example of the latter, of course, is the brief comment I made about 12c and its ability to drive a delete through an index fast full scan – a feature that didn’t work in earlier versions of Oracle. In my tiny example one test changed its execution plan from an index full scan in 11g to an index fast full scan in 12c, taking twice as much time to complete.
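To put that comment in context, the statement involved was nothing more exotic than a delete driven through one of the indexes; the object names, predicate and hint below are purely illustrative:

  -- Illustrative sketch only: t1 and its primary key index t1_pk are assumed names.
  -- In 11g a statement like this would typically run as an index full scan; in 12c
  -- the optimizer is able to use an index fast full scan to drive the delete.
  delete /*+ index_ffs(t1 t1_pk) */
  from   t1
  where  id between 1000000 and 6000000;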

Following up that thought – how many things can you think of that might make a difference when you’re trying to get Oracle to delete by index range scan ?

Here’s one thought which sounds like a good idea for a busy production system. Sometimes you find that a long-running DML statement slows down as it runs because eventually it gets to the part of the data that is fairly recent and therefore subject to current change; from this point onwards Oracle finds it has to read the undo segment to get the undo data that make it possible to create read-consistent versions of data blocks – which it needs to do so that it can check that the current and read-consistent versions of the block agree on which rows should be deleted.

One of my examples was deleting through the “date_open” index – so how about forcing a descending range scan of the index so that the newest data is checked first before it has had much (or any) time to suffer collateral damage from other DML?
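As a sketch of the idea (my own index name and predicate, and just one of the ways you could dictate the path):

  -- Sketch only: t1_dt_open is an assumed name for the index on (date_open),
  -- and the predicate is purely illustrative.
  -- The index_desc() hint asks for a descending index range scan, so the delete
  -- visits the most recent date_open values in its range first.
  delete /*+ index_desc(t1 t1_dt_open) */
  from   t1
  where  date_open < add_months(sysdate, -36);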

There’s a very quick way of checking the effectiveness of this idea. We saw that Oracle’s optimization strategy as it deleted table rows by index range scan was to accumulate all the relevant index entries as it went, then sort them in the closing stages of the delete before doing array modifications to the indexes; so all we have to do is check the number of rows sorted against the number of rows deleted and we will be able to tell whether or not the optimization took place.
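The check itself needs nothing more than a quick look at the session statistics immediately after the delete – something like this sketch:

  -- Run in the same session, immediately after the delete (and before doing any other work).
  -- If the optimization has applied to all four indexes then "sorts (rows)" should
  -- have grown by roughly 4 * (number of rows deleted).
  select  sn.name, ms.value
  from    v$statname sn, v$mystat ms
  where   ms.statistic# = sn.statistic#
  and     sn.name = 'sorts (rows)';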

My test data set had 10M rows and 4 indexes (the primary key, client_ref, date_open, and date_closed indexes) so in the best possible case I should see: “sorts (rows)” = 4 * rows deleted. Here’s the summary of a test I did to find out what happens:

We deleted 5,000,000 rows and sorted (effectively) none. The optimization simply didn’t apply when we walked the index in descending order – and I did check that the execution plan showed the “index range scan descending” that I had dictated.

Given that walking the index in descending order causes problems we might then be prompted to check what happens if we drive the delete through a descending index. Let’s try it: drop the index on date_open and recreate it as a descending index.
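The change is trivial – something like the following sketch, using my own name for the index:

  -- Assumed index name: recreate the date_open index as a descending index.
  drop index t1_dt_open;
  create index t1_dt_open on t1(date_open desc);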

Allowing for a little noise, “sorts (rows)” = 4 * rows deleted: so it’s okay to use a descending index to try to delete newer data first – that’s nice to know, and could be useful to remember as a general feature.

So let’s imagine a few other things that might go wrong.

  • I’ve got a primary key defined on this table – but you can make constraints deferrable; alternatively you could simply create a non-unique index to protect a unique (or primary key) constraint. What effect might that have if we try to delete through the primary key index ?
  • If we’re going to worry about constraints we might consider the effect of foreign key constraints – we have a client_ref column that, in a production system, would probably be a foreign key reference to a clients table. Let’s create that table and add the foreign key constraint.
  • While we’re making things harder – there’s a well known feature that turns array processing into “row by row” processing – triggers. If we add a row-level trigger to the table what effect does that have ? Does it make any difference what type of trigger (before/after, insert/update/delete) ?

Here are a few results – first the non-unique index on primary key constraint:
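Before looking at the figures it’s worth being clear about the setup – there are two easy ways of ending up with a non-unique index protecting the primary key (the names and the id column below are my assumptions):

  -- Two alternative ways of getting a non-unique index behind the primary key - pick one, not both.

  -- Option 1: make the constraint deferrable - Oracle then creates a non-unique index for it.
  alter table t1 add constraint t1_pk primary key (id) deferrable;

  -- Option 2: create a non-unique index first, then tell Oracle to use it
  -- when enforcing the (non-deferrable) constraint.
  create index t1_i_pk on t1(id);
  alter table t1 add constraint t1_pk primary key (id) using index t1_i_pk;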

In this example Oracle has made my primary key index non-unique as a side effect of the constraint being deferrable, but the effect is the same even if the constraint isn’t deferrable and you simply create the index as non-unique. The stats tell us that we’ve applied the optimization to three of our four indexes – and a quick check of v$segment_statistics tells us that it’s the primary key index that wasn’t given the special treatment: it was subject to a little over 5 million “db block changes”. It’s worth a quick check at this point to see whether driving through one of the other indexes might change this – but it doesn’t; it’s a side effect of the combination of a unique constraint with a non-unique index.
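For reference, the v$segment_statistics check is just a query like the following sketch – the filter assumes a naming convention where all the relevant object names start with T1:

  -- Which of my objects has seen (roughly) one "db block change" per row deleted ?
  select  object_name, value
  from    v$segment_statistics
  where   owner          = user
  and     object_name    like 'T1%'
  and     statistic_name = 'db block changes'
  order by value desc;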

Next the effect of referential integrity when the big table is the “child table”:
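For reference, the setup is along the lines of the following sketch – the table name t2 appears later in the test, but the rest of the names and the exact DDL are my assumptions:

  -- Build a parent "clients" table keyed on client_ref and make t1 the child.
  create table t2 as
  select distinct client_ref from t1;

  alter table t2 add constraint t2_pk primary key (client_ref);

  alter table t1 add constraint t1_fk_t2
        foreign key (client_ref) references t2;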

Again we’ve sorted roughly 15M rows when we would normally expect to sort 20M rows. Again we can check v$segment_statistics to find out which index has suffered 5M “db block changes”, and you probably won’t be surprised to hear that it’s the “foreign key” index that has been maintained row by row – and we might guess that this is some sort of pre-emptive code that Oracle has to have to deal with the “foreign key locking” threat.

A follow-on from this particular test, where we deleted through the primary key, is to consider what would happen if we deleted through the foreign key index itself, or even modified the constraint to be “on delete cascade” and deleted some of the parent rows. Driving a delete on t1 through the client_ref index still optimizes on the other three indexes, but when you try to take advantage of the “on delete cascade” mechanism there is no opportunity for the trick to have a large-scale effect at all. Behind the scenes you find things like this:

This output is the consequence of a bulk delete of 3,000 rows from t2 – because of the “on delete cascade”, the delete operated row by row on t2 and for each row Oracle executed a separate delete statement against t1.

Technically the array-based optimization is working and giving us a tiny benefit because of the index range scan, but the scatter of the data is so great that each call gives us virtually no benefit. At some stage we will have to explore this parent/child relationship further.
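If you want to repeat that experiment the change is small – drop and re-create the foreign key with the cascade option, then delete some parent rows; the names follow the earlier sketch and the 3,000-row predicate is arbitrary:

  -- Re-create the foreign key with the cascade option (constraint names as in the earlier sketch).
  alter table t1 drop constraint t1_fk_t2;
  alter table t1 add constraint t1_fk_t2
        foreign key (client_ref) references t2
        on delete cascade;

  -- Deleting parent rows now results in a recursive, single-row, delete against t1
  -- for each row deleted from t2.
  delete from t2 where rownum <= 3000;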

Finally we come to triggers. It’s fairly well known that row-level triggers can turn array processing into single row processing – does the same thing happen to Oracle’s index maintenance optimization ?
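The kind of trigger in question can be as simple as the following sketch – it’s only the row-by-row firing that matters, not what the trigger body does:

  -- A minimal row-level trigger: it's the "for each row" firing, not the body, that matters.
  create or replace trigger t1_brd
  before delete on t1
  for each row
  begin
          null;
  end;
  /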

The optimization has completely disappeared. The same thing happens with an “after delete for each row” trigger, though not if the triggers are insert or update (row-level) triggers. It’s worth mentioning that the index optimization also takes place on updates where the values of indexed columns change (see this blog post) so it’s left as an exercise to the interested reader to see which (if any) of the trigger types allows the optimization to survive an array update.

Conclusion

Although there is a mechanism that can be very effective at index maintenance in the face of massive deletes, there are currently several features of Oracle that might (fully or partially) disable the optimization. So if you have written some code with the explicit intention of pushing Oracle into taking advantage of this optimization, make sure you document what you’ve done and why you’ve done it, so that everyone is reminded to check that your code still works as expected when they modify the structures supporting the data, or when they do an upgrade that might change some detail of how Oracle selects an execution plan.

In future articles not only will we be looking at other strategies for eliminating large volumes of data, we’ll also consider various points of infrastructure that become relevant as you move back from the data and take in the overview of the database.