Compression in Oracle – Part 2: Read-Only Data

In part one of this series we saw that the mechanics of basic table compression came into play only for direct path inserts, CTAS, and “alter table move”. We also saw that Oracle would, by default, set the free space percentage (pctfree) for a table to zero if the table were defined as compressed – giving us a hint that basic compression should be viewed as a read-only strategy.
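For reference, here’s a minimal sketch of that behaviour – the table name and data are mine, not taken from the original tests – creating a compressed table with a CTAS and then checking the defaults Oracle has recorded for it:

create table t1 compress
as
select
        rownum          id,
        rpad('x',10)    padding
from    dual
connect by
        level <= 1000
;

select  pct_free, compression, compress_for
from    user_tables
where   table_name = 'T1'
;

On 11.2 you should see PCT_FREE = 0, COMPRESSION = ENABLED and COMPRESS_FOR = BASIC.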

When we examined a block dump we found that Oracle didn’t “compress” data, it “deduplicated” the data on a block-by-block basis, creating a list of the duplicated values within each block and replacing duplicated items with tokens representing those items. Moreover, Oracle could rearrange the column order for each block to improve its chances of being able to use a single token to represent multiple adjacent column values. This told us that Oracle didn’t need to “decompress” blocks when reading them, it merely needed to reconstruct rows by following pointers – which, of course, is a CPU-intensive task.
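If you want to produce this kind of block dump for yourself, the approach is roughly as follows (privileges permitting, and again with a hypothetical table name) – find the file and block number of a row of interest, then dump that block to the session trace file:

select  dbms_rowid.rowid_relative_fno(rowid)    file_no,
        dbms_rowid.rowid_block_number(rowid)    block_no
from    t1
where   rownum = 1
;

alter system dump datafile &file_no block &block_no;

The trace file will contain the block’s token table and the per-row binary dumps (the “bindmp” lines) that appear in the extracts discussed below.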

In this article we look at what happens if you don’t stick to the “read-only” guideline, and from this we can move on in the third article to the (separately licensed) “OLTP” compression option. As before, all examples come from an instance of Oracle 11.2.0.3.

Deduplication and Deletes

You will recall that in the last article I picked a row from a data block dump that consisted entirely of a single token, then found that Oracle had recursively applied deduplication to reduce that token to a combination of two tokens and two extra column values. Here is the row we examined:

And here is what we found when we looked up the value of the single token referenced:

The first five bytes of the bindmp tell us that the token is used 8 times in the block (00 08), is made up of 4 columns, and requires us to look up tokens 54 (0x36) and 64 (0x40), which are as follows:

With this picture in front of us, we can now recognize the extra work that has to be introduced if we delete the original row. Two things have to happen – the row has to be marked as deleted (in the normal way), and the “usage count” on token 49 has to decrease by 1. Here’s a tiny extract from the block immediately after deleting our row, first the row entry itself:

And here’s the binary dump from token 49 – note, particularly, the second byte:

So one thing we can see is that the work done to maintain the block increases – even on a simple delete. But this token is used in seven other rows in the block, so what happens if I delete those rows as well? The answer depends on how many concurrent sessions do the deletes. If I use a single process to delete all eight rows Oracle deletes the token as I delete the eighth row – at which point tokens 54 and 64 have to be updated to show that they have one less dependency. If I repeat the test using multiple sessions to delete the rows and don’t commit after each delete, I can get to a situation where a token shows zero usage but doesn’t disappear. (It’s possible that some later block tidying operation that I haven’t yet observed will get rid of tokens in this state.)
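If you want to repeat the single-session version of that test, the shape of it is very simple – the predicate below is purely hypothetical; the point is just to delete the rows that share the token, one statement (or one row) at a time, and re-dump the block before committing:

delete from t1
where   v1 = 'AAAA'     -- hypothetical predicate identifying the rows that share the token
;

-- re-dump the block (as before) to see the reference count drop and,
-- once the eighth row is gone, the token entry itself disappear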

Until I mentioned the concurrency test I hadn’t said anything about commits (or rollbacks); the changes to the tokens take place on the delete, and don’t wait for a commit – so what happens if I commit or roll back? On a commit the usual commit cleanout may take place, updating the transaction’s ITL slot with the commit SCN (in other words, nothing new or interesting happens). On a rollback the data is restored – the row is recreated from the undo information, any tokens that have been deleted will also be recreated, and any relevant token usage counts are incremented.

An important point to note is that on the rollback, compression is preserved. There will be a difference between the original block and the block after the rollback, though – the rows (and tokens) that are restored will be written back into the block’s free space – which may require the block to go through a process that coalesces the free space. So if you repeat a block dump after the rollback you may see that the contents of the block have moved. In my case (after deleting the 8 rows that referenced token 49 and rolling back) I saw the following difference:

Compression and Free Space

The business of rows “moving” if you delete and roll back introduces an interesting point about free space. When you declare a table with basic compression, it defaults to pctfree 0 – no free space, and yet there was free space in my block for rows to move into when a rollback took place.

There is a minor complexity that can confuse the issue when using small rows (and “compressed” rows can be very small), but even allowing for this, I found that Oracle seemed to leave a little free space (a few tens of bytes, enough for roughly two full rows in my small test case). This may simply be an error in the estimates that Oracle used for space requirements while using the direct path mechanism to build blocks – or it may have been deliberate.

This small amount of space allowed Oracle to restore the deleted rows, pushing down and tidying up the block to reclaim the little holes left from the prior deletes as it did so. In some cases you may find that this free space could even allow you to survive a tiny number of updates – so let’s look at a few numbers.

I’m going to change my starting data set slightly so that a typical row looks like this: (1000001, ‘AAAA’, ’AAAAAAAAAA’, ’         1’). The first column is a sequence number, the second column cycles from As to Es (5 distinct values), the third column from As to Js (10 distinct values), and the last column, space-padded to 10 characters, cycles from 1 to 50; and I’ve generated 800 rows of data. Because of the way I’ve created the data, there happen to be eleven rows in the first block where the 2nd and 3rd columns are all A’s, so I’m going to run the following SQL statement and dump the first block of the table to see what has happened.
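In outline, the data set and the update look something like this – the column names (v1, v2, v3), the exact generating expressions, the restriction to a handful of rows, and the new value in the update are illustrative assumptions rather than the exact statements from my test:

create table t1 compress
as
select
        1000000 + rownum                                                        id,
        rpad(chr(65 + mod(rownum - 1,  5)),  4, chr(65 + mod(rownum - 1,  5)))  v1,     -- 'AAAA' to 'EEEE'
        rpad(chr(65 + mod(rownum - 1, 10)), 10, chr(65 + mod(rownum - 1, 10)))  v2,     -- 'AAAAAAAAAA' to 'JJJJJJJJJJ'
        lpad(mod(rownum - 1, 50) + 1, 10)                                       v3      -- '         1' to '        50'
from    dual
connect by
        level <= 800
;

update t1
set     v3 = rpad('x',10)               -- hypothetical new value for a tokenized column
where   v1 = 'AAAA'
and     v2 = 'AAAAAAAAAA'
and     id <= 1000040                   -- hypothetical restriction to a few of the "all A" rows
;

The block dump is then taken exactly as before, without committing the update.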

It turned out that there was enough space in the block for two rows to be updated and still fit in the block, but two of my rows had to be migrated. Here are the before and after images of one of the rows that stayed in the block; take a close look at the binary dumps:

On the update, Oracle has expanded the row to the full four columns before modifying it and writing it back to the block in the free space (note the change of length (tl: ) and row address). You’ll have to take my word for it that there were two tokens in the token table that could have been used to substitute for the first two columns of this updated row, but Oracle didn’t try to find them and use them – so at first sight it looks as if updates to compressed data are likely to cause total chaos: a highly compressed row could end up being expanded to a huge row that won’t fit in the tiny amount of free space and therefore has to be migrated.

In passing, although we now have a bit of a mess, with a couple of expanded rows and a couple of migrated rows, when I issued a rollback, Oracle cleaned up all the mess and (apart from the physical rearrangement of rows in the block) left all the rows in their original compressed, un-migrated state.

So how bad can updates get? Before getting too alarmed it’s worth looking more closely at what I did with my update. I modified a column that was part of a token which covered most of the row. What if my update had changed a column which hadn’t been tokenized – would Oracle still expand the row to deal with the update? The answer is no. The last column in the block dumps above is reporting the id column; here are the “before and after” dumps of a row when I repeated my test but updated just the id column:

The update took place in situ (and the total row length dropped from 11 bytes to 10 bytes) because the new value was actually smaller than the old value – but notice how the binary dumps show us that the row consists, in both cases, of token 1b (fifth byte) which represents the first three columns (fourth byte) reported, and one actual stored value. Row expansion has NOT taken place on this update.
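For completeness, the second test is nothing more elaborate than a single-row update of the untokenized column – the values below are illustrative, chosen so that the new id happens to need one byte less storage than the old one:

update t1
set     id = 999999                     -- hypothetical new value, one byte shorter internally
where   id = 1000001
;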

It gets even better; my initial test rows happened to be rows where virtually the whole row was covered by a single token – what happens if I have a row that is represented by several tokens and my update affects only one of the columns that is tokenized? Whatever you may have heard about compressed data and expansion, Oracle doesn’t expand the whole row – it only expands the tokens affected by the update. Again, here’s a “before and after” pair of dumps to make the point:

At the start of this test, the binary dump shows that this row consisted of three separate tokens (0x32, 0x37 and 0x45) followed by a stored value. I updated “col 1” which held the value ‘XXXXXXXXXX’ to ‘YYYYYYYYYY’ and, as you can see, the final binary dump shows that this row still includes tokens 0x32 and 0x45, but token 0x37 has been replaced by the actual value. You can also see that the total length of the row has increased by 10 bytes, from 13 to 23, which means Oracle had to move it into the (small amount of) free space and, during a subsequent repacking of the rows in the block, the final address of the row has changed.

So, when you update data that has been subject to basic compression, Oracle may have to do some expansion of tokens into column values, but it keeps this expansion to a minimum (and if the update is a “no-change” update, expansion simply doesn’t take place). Since there seems to be a tiny amount of free space left in a block after compression – even when pctfree has defaulted to 0 – you may be able to do a tiny number of updates to a compressed table without causing dramatic expansion and row migration, but you may find the effects impossible to predict.

If you do need to do a little data maintenance on compressed data, you really need to do some careful testing with real data to see if you can find a suitable setting for pctfree that will keep row migrations to an acceptable level.
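The kind of thing you would be testing is simply a rebuild of the table with an explicit pctfree – a sketch, using my hypothetical table again:

alter table t1 move pctfree 10 compress;

select  pct_free
from    user_tables
where   table_name = 'T1'
;

Remember that a later “alter table t1 move compress” that doesn’t mention pctfree will silently put it back to zero.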

Summary

When you delete rows from a table defined with basic compression, there may be a little extra CPU usage as Oracle maintains the token table to reduce the reference counts on related tokens, and deletes tokens when their count reaches zero; but apart from this, and apart from a little space wastage when tokens get to zero usage but don’t get deleted, the overheads on deletion don’t seem to present a threat.

When you update rows in a table defined with basic compression, you have to remember that Oracle will have set the pctfree to zero by default when you first defined the table as compressed (and will keep resetting it every time you move the table), so there will be very little space for rows to grow unless you explicitly set pctfree to something larger.

To make things more difficult, if your update modifies columns that have been tokenized, Oracle will work on a copy of the row with those modified tokens expanded to their full value – and it will not attempt to recompress the modified row afterwards even if there are suitable tokens already in existence in the block. As a side effect, you are likely to find that updates to compressed tables result in significant increases in row length and excessive row migration. Ironically, the smarter Oracle has been in compressing your data (resulting, typically, in multiple consecutive columns turning into a single token), the worse the explosion is likely to be when you start to update the data.

As a basic guideline – unless you are extremely well-informed about your data and its usage, basic compression should be used only for read-only data. In the next article we’ll be looking at “OLTP” compression to see what Oracle does (if anything) to improve this situation.

There is a catalogue of all five items in this series (and a few others) at this URL