Divide and conquer is what Oracle was thinking with this one. Table partitioning is about optimizing “medium selectivity queries”. The Oracle database has optimization techniques for high selectivity queries, with the use of indexes. If we need to process all data in a big table, we have to live with the fact it will take a while, but the engine will process the data as fast as possible. However, a medium selectivity query needs just a portion of the data, for instance a tenth. This is to much data for indexes, to little data for full table scans so the processing time might become rather long regarding the outcome.

Let me give you a brief explanation of how Oracle collects its data, and how table partitioning can help for these queries. Do note this is not for everyone; “Oracle Partitioning” is an extra cost option, for Enterprise Edition only.

Access methods in a nutshell

Oracle has two commonly used table access methods, “full table access” and “access by rowid”. For the first method, Oracle reads all the blocks in a table, and applies filters afterwards. This might look like a lot of overhead, but multiblock reads are used for reading large amounts of data in bulk. The access by rowid is mostly used in conjunction with indexes. The index scan returns a rowid, and a single block read is used to get the block we’re interested in. This looks like an interesting method, because we read a lot less data. But when we need a lot of records, this means a lot of small operations that need to be set up, which brings a lot of overhead. Depending on your data, you’re better off with multi block reads when you’re fetching more than 5% to 10% of the table.

Screenshot: Oracle multi block and single block reads

Table partitioning to the rescue

When we store our data in a non-partitioned table, without indexes, we will always have a full table scan. Consider following code:

CREATE TABLE big_t AS 
  SELECT ROWNUM AS n_uniq 
       , MOD(ROWNUM,10) AS n10
       , MOD(ROWNUM,100000) AS n100k
       , RPAD(rownum,1000,'A') AS filler 
    FROM all_source, all_source 
   WHERE rownum<= 100000; 

SELECT COUNT(1) FROM big_t;

COUNT(1)
--------
  100000

Task completed in 2,249 seconds

SELECT COUNT(1)
  FROM big_t
 WHERE n10 = 1;

COUNT(1)
--------
   10000

Task completed in 2,328 seconds

With table partitioning, a table can be physically divided into multiple smaller tables, called partitions, while logically it stays one table. This means your code stays the same, but full partition scans will be executed instead of a full table scan.

These partitions are created based on a key. Depending on which value a certain column has, the record will be stored in a certain partition. It’s important to choose a column that is often used in queries as our key.

CREATE TABLE big_t_list 
 PARTITION BY LIST(n10) 
   (partition part1 VALUES (1)
   ,partition part2 VALUES (2,3,4)
   ,partition part3 VALUES (DEFAULT))
  AS SELECT * 
       FROM big_t;

SELECT table_name, tablespace_name, blocks, num_rows
  FROM user_tables
 WHERE table_name LIKE 'BIG_T%';

TABLE_NAME TABLESPACE_NAME BLOCKS NUM_ROWS
---------- --------------- ------ --------
BIG_T      USERS            14449   100000
BIG_T_LIST                  14536   100000

SELECT table_name, partition_name, high_value, tablespace_name, blocks, num_rows
  FROM user_tab_partitions
 WHERE table_name LIKE 'BIG_T%';

TABLE_NAME PARTITION_NAME  HIGH_VALUE TABLESPACE_NAME BLOCKS NUM_ROWS
---------- --------------- ---------- --------------- ------ --------
BIG_T_LIST PART1           1          USERS             1461    10000
BIG_T_LIST PART2           2, 3, 4    USERS             4364    30000
BIG_T_LIST PART3           DEFAULT    USERS             8711    60000

The big advantage of partitioning is the possibility for “partition pruning”. When we look for a value of “1” in column “N10”, we know we will only find this in partition “PART1”, so we don’t need to access the other partitions, and our query will execute ten times as fast. When we need all values, it will scan all partition, and there will only be a small overhead.

SELECT COUNT(1) FROM big_t_list; 

COUNT(1)
--------
  100000

Task completed in 3,601 seconds

SELECT COUNT(1) FROM big_t_list WHERE n10 = 1;

COUNT(1)
--------
   10000

Task completed in 0,537 seconds

As you can see, the partition pruning makes an enormous difference, going from 2.3 seconds to 0.5 seconds.

Table partitioning methods

Table partitioning already exists since 8i, but a lot of new possibilities were added since 11g. The different types of partitioning are:

Range partitioning

The table is divided in ranges, typically used for date ranges. This is beneficial when the filters using inbetween, greater than or less than. It is able to skip all partitions not in the range.

CREATE TABLE orders_range(order_id NUMBER 
                         ,client_id NUMBER 
                         ,order_date DATE) 
  PARTITION BY RANGE(order_date) 
    (PARTITION orders2011 VALUES LESS THAN (to_date('1/1/2012','dd/mm/yyyy')) 
    ,PARTITION orders2012 VALUES LESS THAN (to_date('1/1/2013','dd/mm/yyyy')) 
    ,PARTITION orders2013 VALUES LESS THAN (MAXVALUE));

When inserting a record that does not belong to any of the partitions, an ORA-14400 error will be raised. If you create a partition with “MAXVALUE”, it will be used as default partition. New partitions can be added or dropped manually when needed, or they can be created automatically with interval partitioning.

CREATE TABLE orders_range(order_id NUMBER 
                         ,client_id NUMBER 
                         ,order_date DATE)  
  PARTITION BY RANGE(order_date)
    INTERVAL (NUMTOYMINTERVAL(1,'year'))
    (PARTITION orders2011 VALUES LESS THAN (to_date('1/1/2012','dd/mm/yyyy'))
    ,PARTITION orders2012 VALUES LESS THAN (to_date('1/1/2013','dd/mm/yyyy')));

List partitioning

A limited set of possible values is given; rows containing the same value are grouped. This can be used for columns with few distinct values, when all values are known, like department or country. It can be beneficial to group values together that are combined in filters.

To avoid the ORA-14400 error, one can use the “DEFAULT”-keyword.

CREATE TABLE clients_list(client_id NUMBER
                         ,name VARCHAR2(50)
                         ,country VARCHAR2(2))
  PARTITION BY LIST(country)
    (PARTITION clients_benelux VALUES ('BE','NE','LU')
    ,PARTITION clients_uk      VALUES ('UK')
    ,PARTITION clients_other   VALUES (DEFAULT));

Hash partitioning

A value is hashed, and random distribution occurs. This is used with many distinct values, when there are no searches on ranges. The advantages can be a more evenly distribution than with range partitioning.

CREATE TABLE clients_hash(client_id NUMBER
                         ,name VARCHAR2(50)
                         ,country VARCHAR2(2))
  PARTITION BY HASH(name)
  PARTITIONS 5;

Composite partitioning

A partition can be subpartitioned by any of the previous methods, in any combination since 11g. This will allow more queries to benefit from table partitioning. Partition pruning will occur when filtering on both keys or only one of the keys.

CREATE TABLE clients_hl(client_id NUMBER
                       ,name      VARCHAR2(50)
                       ,country   VARCHAR2(2))
  PARTITION BY LIST(country)
  SUBPARTITION BY HASH(name)
    SUBPARTITIONS 5
    (PARTITION clients_benelux VALUES ('BE','NE','LU')
    ,PARTITION clients_uk      VALUES ('UK')
    ,PARTITION clients_other   VALUES (DEFAULT));

Refpartitioning

Partitioning is also possible on parent and child relations. When the parent table is partitioned, the child table can be partitioned based on the foreign key. For instance when the “ORDERS” table is partitioned on “ORDER_DATE”, the “ORDER_LINES” table can be partitioned on the order date as well, without storing the actual value in the “ORDER_LINES” table. This is very beneficial when joining the two tables on the foreign key.

CREATE TABLE orders_range(order_id NUMBER PRIMARY KEY
                         ,order_date DATE)  
  PARTITION BY RANGE(order_date) 
    (PARTITION orders2011 VALUES LESS THAN (to_date('1/1/2012','dd/mm/yyyy')) 
    ,PARTITION orders2012 VALUES LESS THAN (to_date('1/1/2013','dd/mm/yyyy')) 
    ,PARTITION orders2013 VALUES LESS THAN (MAXVALUE));

CREATE TABLE order_lines (order_line_id NUMBER PRIMARY KEY
                         ,order_id NUMBER NOT NULL
                         ,line VARCHAR2(50)
                         ,CONSTRAINTo_ol_fk FOREIGN KEY (order_id) REFERENCES orders_range(order_id))
  PARTITION BY REFERENCE (o_ol_fk);

Partitioned indexes

Not only tables can be partitioned, but indexes can become very large objects as well. The indexes for partitioned tables can be stored on three different ways:

  • Non partitioned: No partitioning applied
  • Globally partitioned: The index is not partitioned on the same key as the table
  • Locally partitioned: The index is partitioned on the same key as the partitioned table

Which type of index partitioning type you chose is depended on the query it needs to support. If the filter always contains the key, local indexes can be used, which are easier to manage. Globally partitioned indexes offer higher flexibility.

Now let’s start partitioning!

If you can afford this option, it will speed up your queries dramatically, without changing anything to your code. Since 11g partitions have been greatly enhanced, but they can be used as from Oracle 8i.
If you cannot use this option, the concept might still be a life saver, dividing your big tables into smaller ones. This will be very handy for recent data and archived data, but your code will become very complex, very quickly.