Oracle Data Pump in RAC

Oracle Data Pump was introduced in Oracle Database 10g to enable very high-speed transfer of data and metadata between databases. A salient feature of Data Pump is that it can parallelize the export and import jobs for maximum performance. However, prior to Oracle Database 11g release 2 (11.2), if a Data Pump job is executed in parallel on an instance

Oracle Data Pump was introduced in Oracle Database 10g to enable very high-speed transfer of data and metadata between databases. A salient feature of Data Pump is that it can parallelize the export and import jobs for maximum performance.

However, prior to Oracle Database 11g release 2 (11.2), if a Data Pump job is executed in parallel on an instance of an Real Application Cluster (RAC) database, the parallelism is employed only on that instance without utilizing the potentially valuable idle resources of other instances.

From Oracle Database 11.2 onwards, parallelism is no longer confined to a single instance. Rather worker processes can be distributed across multiple Oracle RAC instances to better utilize Oracle RAC resources. The newly-introduced client parameter CLUSTER can be employed to specify whether you want to use the cluster resources. When CLUSTER=Y (default) is specified, Data Pump attempts to use all active Oracle RAC instances.

To exercise more control over the instances on which the job should run, the SERVICE_NAME parameter can be mentioned where SERVICE_NAME must be a valid known service of the database. In this case, the Data Pump job will run only on those instance where the service is available.

If any Oracle RAC instance where the job that is running dies or leaves the cluster, the job aborts and can be restarted at some future time.

In this article, I will demonstrate:

  • Case-I: Parallelization of an export job across all active Oracle RAC instances
  • Case-II: Parallelization of an export job across a subset of Oracle RAC instances

Current scenario:

  • Name of the cluster: cluster01
  • Number of nodes: 3 (host01, host02, host03)
  • RAC Database version: 11.2.0.3
  • Name of RAC database: orcl
  • Number of instances: 3
  • Names of instances: orcl1, orcl2, orcl3

In order that the Data Pump job is distributed across multiple instances of an RAC database, the following prerequisites should be met:

  • The directory object should point to shared storage that is accessible by any Oracle RAC instances designated to run Data Pump worker processes.
  • The PARALLEL parameter should be more than 1 and less than or equal to the number of files in the dump file set.
  • To specify enough dump files by specifying DUMPFILE parameter, the preferred approach is to use the wildcard option by using a substitution variable (%U) in the filename.
  • LOGFILE parameter should point to a location on the local file system. Log file is created on the node hosting the instance where master process is running.

Demonstration

Case-I: Export job distributed across all active Oracle RAC instances

  • First, let’s create a directory object which points to shared storage and is accessible by all the three instances of the database:

  • Issue the command to export SH schema in parallel across all active Oracle RAC instances:
  • Degree of parallelism = 6
  • Dumpfile – Create in the newly created directory DP_SHARED_DIR with names expsh%U.dmp beginning with 01 for %U, then using 02, 03, and so on.
  • Logfile – Create in the location on local file system pointed to by the directory object DATA_PUMP_DIR

  • Verify that the Data Pump job is distributed across 6 worker processes on all the three instances of the RAC database:

  • While the export is running, let us abort the database instance on host01 to verify that the Data Pump job can be restarted in case any instance where the job is running aborts.

As the instance orcl1 has crashed, the Data Pump job aborts.

  • Let us restart the job by ATTACHing to the job using the name of the job (as identified in expdp command output) as a parameter to the expdp command:

  • We will restart the job with a different degree of parallelism, say 4 (earlier it was 6):

It can be seen that four worker processes have been started on instances 2 and 3 as instance 1 has been aborted.

To show progress again, let us issue the CONTINUE_CLIENT command:

Thus, a Data Pump job:

  • Can be parallelized across all the available instances of a RAC database.
  • Restarted in future if any instance, where the job is running, aborts.

Case-II: Parallelization of export job across a subset of Oracle RAC instances

In case we want to run the job against a subset of instances, we can specify the SERVICE_NAME parameter which points to a service of the database.

  • In order to demonstrate this, let us first create and start a service called orcls for the database with preferred instances orcl1, orcl2 and available instance orcl3. It can be observed that the newly-created service orcls is currently available on instances orcl1 and orcl2.

  • Let us once again export the SH schema with a parallelism of 6 as earlier, but this time I am going to specify an additional parameter of SERVICE_NAME on the command line with points to service orcls, which is currently running on orcl1 and orcl2.

  • While the job is running, it can be verified that this time parallelism has been constrained across instances orcl1 and orcl2 only where the service is currently available.

  • Abort the RAC database instance on one of the nodes, say host01, where Data Pump export is currently running:

  • As the instance aborts, the export also gets terminated:

    • Restart the export by ATTACHing to the job by specifying its name:

 

Thus, a Data Pump job:

  • Can be parallelized across a subset of the available instances of a RAC database by specifying the SERVICE_NAME parameter.
  • Restarted in future if any instance, where the job is running, aborts.

Note: Since there is some overhead involved with distributing Data Pump processes across an Oracle RAC environment, there will be cases where there is no performance benefit. In those cases, you can

specify CLUSTER=N to force all Data Pump processes to run only on the instance where the job is started.

Summary:

  • Prior to Oracle Database 11g release 2 (11.2), if a Data Pump job is executed in parallel on an instance of a RAC database, the parallelism is employed only on that instance without utilizing the potentially valuable idle resources of other instances.
  • From Oracle Database 11.2 onwards, CLUSTER parameter can be employed to distribute the worker processes across multiple Oracle RAC instances to better utilize Oracle RAC resources.
  • To exercise more control over the instances on which the job should run, SERVICE_NAME parameter can be specified so that the Data Pump job runs only on those instances where the service is available.
  • If any Oracle RAC instance where the job is running dies or leaves the cluster, the job aborts and can be restarted at some future time.
  • For more information, Oracle have a white paper on the subject.