Compression in Oracle – Part 3: OLTP Compression

In part two of this series we examined the effect of updates on compressed data when using basic compression, and saw how Oracle will do some “decompression” of a row before updating it with the effect that highly compressed rows might easily become so much larger that even a small amount of change could lead to a large number of rows migrating to different blocks. On the positive side we saw that Oracle kept this decompression to a minimum – expanding only those tokens that contained the columns being updated.

There is even a special case where a “no-change” update doesn’t do any decompression, although a comment from Randolf Geist on the previous article pointed out that this special case wasn’t implemented until 11.2.0.3; and a follow-up from “DBA” showed that the special-case code seems to fail sometimes – possibly if you update a large number of rows in a block.

We also saw that Oracle made no attempt to “re-compress” the row after an update even if there were other pre-existing tokens that could be used to reduce the size of the row.

In this article we move on to (separately licensed) “OLTP” compression (originally known as “compress for all operations”). We will re-run some of our tests to see how differently Oracle can behave with this option in play. I’ll be using 11.2.0.3 in my tests, with a locally managed tablespace, uniform 1MB extents, and (for reasons that I will mention later) freelist management rather than automatic segment space management (ASSM).
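
For anyone who wants to reproduce the environment, a tablespace of the kind I describe could be created with something like the following sketch – the datafile name and size are just placeholders:

    create tablespace test_8k_freelists
        datafile '/u01/oradata/TEST/test_8k_freelists_01.dbf' size 256m
        extent management local uniform size 1m
        segment space management manual    -- freelist management rather than ASSM
    ;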

PCTFREE

Our very first test of basic compression involved defining a table with compression, and seeing what the data looked like on an initial load. We’ll do the same again, but use the option “compress for OLTP” (or “compress for all operations” for those using a slightly older version of Oracle). We’ll start by creating a table with 50,000 rows selected from view all_objects, and check to see if different strategies produce different results. The strategies I used in the original article were as follows (a brief SQL sketch of a couple of them appears just after the list):

  • Create table as select from all_objects
  • Create table with compression as select from all_objects
  • Create empty table with compression enabled, insert rows
  • Create empty table with compression enabled, insert rows with append hint
  • Create table (no compression) as select; enable compression; move table
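
As a rough illustration, strategies 2 and 4 (followed by the check against user_tables) might look something like the following sketch – the table names are mine, and the 50,000 row limit assumes your all_objects view holds at least that many rows:

    -- strategy 2: create table with compression as select
    create table t_ctas
        compress for oltp
    as
    select  *
    from    all_objects
    where   rownum <= 50000
    ;

    -- strategy 4: create an empty compressed table, then insert with the append hint
    create table t_append
        compress for oltp
    as
    select  *
    from    all_objects
    where   1 = 0
    ;

    insert  /*+ append */ into t_append
    select  *
    from    all_objects
    where   rownum <= 50000
    ;
    commit;

    begin
        dbms_stats.gather_table_stats(user, 'T_CTAS');
        dbms_stats.gather_table_stats(user, 'T_APPEND');
    end;
    /

    select  table_name, blocks, pct_free, compression, compress_for
    from    user_tables
    where   table_name in ('T_CTAS', 'T_APPEND')
    ;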

To this I’ve added one more test, creating the table then inserting one row at a time, with commits, from a PL/SQL loop. The results we get through a select from view user_tables after each test are as follows:

    Test                    BLOCKS  PCT_FREE  COMPRESSION  COMPRESS_FOR
    1  (CTAS)                  714        10  DISABLED
    2  (CTAS compress)         211        10  ENABLED      OLTP
    3  Insert                  227        10  ENABLED      OLTP
    4  Insert append           211        10  ENABLED      OLTP
    5a Compress enabled        714        10  ENABLED      OLTP
    5b Move                    211        10  ENABLED      OLTP
    6  Insert (row by row)     230        10  ENABLED      OLTP

The most obvious point we can see here is that for OLTP compression pctfree defaults to the same 10 that it does for non-compressed tables (which explains why we see 211 blocks in many cases compared to the 189 blocks we saw with basic compression). We can also see that even normal inserts result in data being compressed on arrival – recall that for basic compression we had to use direct path loading; however, we also note that the compression we get with normal inserts isn’t quite as good as the compression we get with direct path loading – and that’s something worth reviewing. It’s also worth reminding ourselves that simply enabling compression doesn’t change existing data (test 5a); we do have to recreate the table to compress the data. The single row insert strategy produced results that were very close to the array insert level of compression, so I didn’t pursue that method in subsequent tests.

There are various statistics relating to compression that appeared in 11g that can be helpful when we’re trying to understand what’s happening (or has happened) in some type of batch process, and I took a snapshot of the session statistics during the simple insert test (number 3), with the following significant results:

    heap block compress            1,521
    HSC OLTP Compressed Blocks       227
    HSC OLTP inline compression    1,521
    HSC OLTP Space Saving         ~3.5MB
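
If you want to check figures like these for yourself, the relevant session statistics can be picked out of v$mystat with a query along the following lines – take a snapshot before and after the insert and look at the difference (you’ll need select privileges on the dynamic performance views):

    select  sn.name, ms.value
    from    v$mystat    ms,
            v$statname  sn
    where   sn.statistic# = ms.statistic#
    and     (   sn.name like 'HSC%'
             or sn.name = 'heap block compress'
            )
    order by
            sn.name
    ;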

Some of these numbers are easy to explain – our final table has 227 blocks, which accounts for the “HSC OLTP Compressed Blocks” (it’s a reasonable guess that HSC is “heap segment compression”); and the table would have been 714 blocks if we hadn’t been using compression which, at 8KB * (714 – 227), equates (very roughly, and allowing for the pctfree) to the 3.5MB of “HSC OLTP Space Saving”.

We then have to explain the appearance of the “heap block compress” and “HSC OLTP inline compression” statistics, which both record values of 1,521. It’s not very well known, but “heap block compress” (which first appeared in 10g) has nothing to do with compression – it simply counts the number of times that a block has been “tidied” to move all the available free space up into the free space gap by pushing all the row pieces down to the bottom of the block.

If you delete a few rows from a block, or update a few rows so that they become longer and have to be moved into the free space gap, you leave holes in the row heap. If Oracle needs to do something that requires more space than is currently in the free space gap it can re-arrange the contents of the block, moving the rows downwards to the end of the block (adjusting the row directory as it does so) so that all the holes “bubble up” into the free space gap. This is the action recorded as a “heap block compress”. It also explains why a block dump shows two measures of free space, the “tosp” (total space free in the block) and the “avsp” (available space in the free space gap) – ignoring a couple of anomalies, the tosp is the avsp plus the sum of all the little holes. (See also: jonathanlewis.wordpress.com/2010/03/30/heap-block-compress)
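
If you want to see the avsp and tosp figures for yourself you can dump a block, using the rowid of one of its rows to identify the file and block numbers – something like the following sketch, which assumes a table called t1 and the privilege to issue alter system (the dump appears in your session’s trace file):

    select
            dbms_rowid.rowid_relative_fno(rowid)    file_no,
            dbms_rowid.rowid_block_number(rowid)    block_no
    from    t1
    where   rownum = 1
    ;

    -- substitute the file and block numbers reported by the query above
    alter system dump datafile &file_no block &block_no;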

So our bulk insert has caused Oracle to tidy a block 1,521 times – and since we have only 227 blocks in total this is an important clue about how OLTP compression works. There are (or, at least, ought to be) two mechanisms to consider – inserting rows and updating rows so that they increase in size.

While inserting rows, Oracle doesn’t worry about compressing them until an insert would push the block’s space usage beyond the pctfree limit; at this point Oracle “pauses” to run its compression algorithm on the data currently in the block (recording the second statistic, “HSC OLTP inline compression”). With a little luck this will reduce the volume of data, leaving enough space below the pctfree mark for the new row to be inserted. (Note – existing data is compressed before the new row is inserted: the new row (or rows, on an array insert) won’t be compressed until compression is triggered by another insert that takes the block over the limit.) This is what we see in the statistics – we get a few rows into a block, then compress it, add a few more, then re-compress it, add a few more, and so on.
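
This mechanism is also the reason why the row-by-row insert (test 6) still ended up compressed. A minimal sketch of that test, assuming an empty copy of the table created with OLTP compression, might look like this:

    create table t_row_by_row
        compress for oltp
    as
    select  *
    from    all_objects
    where   1 = 0
    ;

    begin
        for r in (select * from all_objects where rownum <= 50000) loop
            insert into t_row_by_row values r;    -- PL/SQL insert by record
            commit;
        end loop;
    end;
    /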

We might expect the mechanism for updates to be similar in principle, though with a difference in its target. The SQL Reference manual (E10592-04 p16-34), under “Create Table”, tells us: “When you enable table compression by specifying COMPRESS FOR OLTP, you enable OLTP table compression. Oracle Database compresses data during all DML operations on the table.” The question then is what triggers compression on updates (or deletes). The answer seems to be “nothing” – it looks as if OLTP compression is only triggered by inserts crossing the pctfree threshold. (This observation is also noted by Randolf Geist on his blog here: oracle-randolf.blogspot.co.uk/2011/05/assm-bug-reprise-part-2).

Of the various attempts I made to trigger compression on updates, the final test I tried was to create a block with two rows holding a column of 100 ‘Y’s (giving me one token), and 10 rows holding a column of 60 ‘X’s (giving me another token), and various other rows to fill the block to the limit of pctfree. The numbers of rows and sizes of columns were carefully calculated to test several options. I then tried the following – recreating the data for each test (a rough sketch of the set-up appears after the list):

  • update all the X rows to the Y value
  • update 9 of the X rows, commit, update the last X row
  • update 9 of the X rows, commit, delete 100 “spare” rows, commit, update the last X row
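
The general shape of the set-up (though not the carefully calculated detail) might be sketched as follows – the table name, column size, and filler comment are mine:

    create table t_tokens (v1 varchar2(100)) compress for oltp;

    insert into t_tokens select rpad('Y', 100, 'Y') from dual connect by level <= 2;
    insert into t_tokens select rpad('X',  60, 'X') from dual connect by level <= 10;
    -- ... plus enough short "filler" rows to take the block up to its pctfree limit
    commit;

    -- test (a): update all the X rows to the Y value
    update t_tokens set v1 = rpad('Y', 100, 'Y') where v1 = rpad('X', 60, 'X');
    commit;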

In the first two cases the 10th “X” row migrated as it was updated – Oracle didn’t recompress to save space, even though there was a suitable token that it could have re-used. In the last case a “heap block compress” took place, tidying the block so that all the free space bubbled up to the free space gap so that Oracle could use it for the updated row, but again, recompression didn’t take place.

Sadly it seems that “compress for OLTP” (formerly “compress for all operations”) doesn’t compress for all operations: it compresses only for inserts, and the benefits it has over basic compression are that (a) it leaves 10% of the block free for updates, and (b) it doesn’t require direct path inserts to trigger compression. Given the limitations on how it works, you may find that the problems it brings make it something you want to avoid.

Problems

As we have seen, compress for OLTP doesn’t work for any operation other than inserts. But even then it doesn’t seem to work effectively. Our simple example of “insert as select” resulted in a table of 227 blocks, compared to 211 blocks when we used a direct path insert. When I dumped the first few blocks of the table, I found that the last 7 or 8 rows in each block hadn’t been compressed, and that the free space in the block was actually larger than the 10% indicated by the pctfree – somewhere along the line the logic wasn’t compressing quite as much as we would hope. (It’s possible that Oracle has an algorithm that says something like “If I recompress the block the space saving is likely to be less than x% so I won’t do it”; you could imagine that when you have 8 rows uncompressed in a block currently holding 240 rows then the extra space gained by compression would probably appear to be quite small, especially when you also factor in the CPU time required to apply the compression algorithm.)

You may recall, however, that I started this article with the comment that I was using freelist management rather than ASSM – when I repeated my simple “insert 50,000 rows from all_objects” using ASSM, the size of the resulting table jumped from 227 blocks to 250 blocks. Of these 250 blocks, it looked as if 33 had had no compression applied at all, and a further 15 had stopped applying compression after they were roughly half full. Compression and ASSM don’t seem to work particularly well together – of course, 250 blocks is still a lot better than the 714 blocks needed without compression, but that’s not really the point.

Since updates result in token expansion, and recompression doesn’t happen on updates, you have to worry about how well your data has compressed, and how much it will be updated. Taking my copy of all_objects again, a typical block of non-compressed data held somewhere between 66 and 70 rows; but when compressed (for OLTP) the blocks held anything between 156 and 301 rows with a little over half of them holding 220 to 230 rows.
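
The rows-per-block figures are easy to derive with dbms_rowid – a query like the following sketch (again assuming a table called t1) shows the distribution:

    select  rows_per_block, count(*) blocks
    from    (
            select
                    dbms_rowid.rowid_relative_fno(rowid)   file_no,
                    dbms_rowid.rowid_block_number(rowid)   block_no,
                    count(*)                               rows_per_block
            from    t1
            group by
                    dbms_rowid.rowid_relative_fno(rowid),
                    dbms_rowid.rowid_block_number(rowid)
            )
    group by
            rows_per_block
    order by
            rows_per_block
    ;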

Looking on the pessimistic side, you have three times as many rows per block, which means you are three times as likely to do an update to a compressed block compared to an uncompressed block – and the better the compression you’ve achieved the worse these odds get. However, the real threat appears when you think about the compression method and the update strategy. In the first block of my table I had nineteen tokens that covered 11 consecutive columns – that means one byte in a “real” row representing 11 columns of data – and if you update just one of those columns Oracle expands the one byte to the full eleven columns! Checking the usage counts on the tokens I could see that of the 242 rows in that particular block, 182 of them referenced one of these tokens – that’s a lot of rows where a “single column update” could turn into an 11-column expansion, leading to a large amount of row migration. Because you could have far more processes updating data than inserting data, you may also find that the small number of target blocks that the migrated rows move into end up suffering from more collisions, showing up as increased time spent on buffer busy waits.

Another unexpected consequence of compression is that when a row migrates out of a block it’s likely to increase the block’s free space by only a very small amount (because it was a row that had been compressed to a few tokens) so, unlike “normal” migration, you’re unlikely to find one row migration protecting the next few updates from migrating as well.
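
If you want to check how much migration a table has suffered, the old-fashioned analyze command still populates chain_cnt (which counts migrated rows as well as genuinely chained rows), and the session statistic “table fetch continued row” tells you how often your queries are paying the price – a sketch, again assuming a table called t1:

    analyze table t1 compute statistics;

    select  chain_cnt
    from    user_tables
    where   table_name = 'T1'
    ;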

To date I’ve only heard complaints about OLTP compression (there’s an element of self-selection there, as no-one ever calls me to look at a system because it’s running so well and has no problems). A common thread in the complaints I have heard, though, is the significant amount of row migration (once it has been noticed), the extra CPU, and the “buffer busy waits”.

You can minimize the migration, of course, by setting a suitable value for pctfree – but with the cunning tricks that Oracle uses to maximize compression it’s very hard to decide on a good value unless you know the data and the business operation very well.
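
As a purely illustrative example (the value 30 is not a recommendation – the right number has to come from knowing the data and the update patterns), the mechanics are nothing more complicated than:

    create table t_oltp
        pctfree 30
        compress for oltp
    as
    select  *
    from    all_objects
    where   rownum <= 50000
    ;

    -- or, for an existing table (affects only blocks used from now on):
    alter table t_oltp pctfree 30;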

The thing you need to hope for is that the data that’s not going to change is very repetitive and that the columns that are subject to updates are virtually unique so that they don’t end up sharing tokens with long collections of columns – but you probably won’t be able to see that until after you’ve tried compressing and analyzing a large volume of data. Unfortunately I’ve seen a lot of applications where every table has a column with a name like last_updated_by that is very repetitious but very likely to change over time; it is also quite likely to get into a multi-column token, so that a significant expansion takes place even when the “real” data change was against columns that you weren’t expecting to be compressed. (Of course, as with basic compression, if the column is “updated” but doesn’t actually change then token expansion won’t occur.)

Summary

Compression for OLTP is (according to the manuals) supposed to be able to compress during updates – but it doesn’t (at least, as far as I can tell); this means that you can easily end up suffering a large number of row migrations on updates, which can result in extra random I/Os, buffer busy waits, and increased CPU and latch activity.

If you’re going to use OLTP compression you need to work out (probably by trial and error) a suitable value of pctfree for each table that keeps row migration to an acceptable level.

Since OLTP compression does allow normal inserts to trigger compression, though, you might work out a strategy for partitioned tables that uses OLTP compression and a large setting for pctfree for “new” partitions and then rebuilds older partitions using basic compression. Whatever strategy you come up with for using OLTP compression, think carefully about making a choice between freelist management and ASSM – there seem to be some undesirable side effects that appear when you mix OLTP compression with ASSM.
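
A sketch of the sort of thing I mean – the table and partition names are hypothetical, and you should check the syntax details against your own version:

    -- new data arrives in a partition using OLTP compression and a generous pctfree
    alter table sales
        add partition p201302 values less than (to_date('01-03-2013','dd-mm-yyyy'))
        pctfree 25
        compress for oltp
    ;

    -- once a partition has gone "read-mostly", rebuild it with basic compression
    alter table sales
        move partition p201212
        pctfree 0
        compress basic
    ;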

There is a catalogue of all five items in this series (and a few others) at this URL