19 August 2011

4394 views

1 0

Can PL/SQL be Faster than SQL?

PL/SQL and SQL both are incredibly powerful languages that allow us to do some pretty amazing things with data. If you’ve been around the Oracle community for some time you’ve probably realized, either through your own experience or by reading articles, if you are able to accomplish a task using a single SQL statement then that is preferred over using procedural PL/SQL code that would run slower. I fully endorse this approach and both use and reiterate it daily to my teams. Here are but a few of the benefits of this brand of thinking:

It forces developers to think in “sets”. The Oracle database works incredibly efficiently with sets of data and learning how to harness that power will benefit developers and customers alike.
It promotes an exploration and understanding of the huge wealth of SQL functions available in Oracle.
It allows developers to get creative with their solutions while at the same time writing elegant and simple code that is easily understood and maintained.

So, should we always use SQL over PL/SQL if the SQL language permits us to do so? As you can probably guess, there are situations where using PL/SQL can be the better option. Such as in life, in technology the only guarantee is that there are no guarantees.

The following is an example of a real-world scenario that I have encountered a few times already in my career and expect to see a few times more before I hang up the keyboard and mouse for a set of golf clubs.

Consider the following two tables:

SQL> DESC EMPLOYEES

Name Null? Type

--------------------------------------- -------- ----------------

EMPLOYEE_ID NOT NULL NUMBER(22)

EMPLOYEE_SALARY NUMBER(15,2)

SQL> DESC NEW_SALARY_INFO

Name Null? Type

--------------------------------------- -------- ----------------

EMPLOYEE_ID NOT NULL NUMBER(22)

SALARY NUMBER(15,2)

Some important facts about these tables:

There is a primary key on the EMPLOYEE_ID column of the EMPLOYEES table
There are no keys or indexes on the NEW_SALARY_INFO table.
The NEW_SALARY_INFO table is in 3rd party vendor system that you cannot change as it would result in the loss of support from the vendor
The EMPLOYEES table and NEW_SALARY_INFO tables are “big”, having 1,000,000 rows and approximately 4,000,000 rows respectively
Not all EMPLOYEE_ID values in the EMPLOYEES table exist in the NEW_SALARY_INFO table

You have been tasked with updating the employee salaries in the EMPLOYEES table with the maximum salary for the same employee, based on EMPLOYEE_ID, in the NEW_SALARY_INFO table. If no record exists in the NEW_SALARY_INFO table for an employee, that employee’s data should remain the same.

At first glance, this seems relatively simple. We’ll just use an UPDATE statement to modify the rows in the EMPLOYEES table:

UPDATE employees e

SET employee_salary = (

SELECT MAX(salary)

FROM new_salary_info nsi

WHERE nsi.employee_id = e.employee_id

)

WHERE EXISTS (

SELECT NULL

FROM new_salary_info nsi2

WHERE nsi2.employee_id = e.employee_id

);

That should do it. Let’s have a look at the Explain Plan for this statement.

PLAN_TABLE_OUTPUT

--------------------------------------------------------------------------------------------

--------------------------------------------------------------------------------------------

| 0 | UPDATE STATEMENT | | 999K| 17M| | 2772M (2)| 999:59:59|

|* 2 | HASH JOIN SEMI | | 999K| 17M| 23M| 7887 (1)| 00:01:35|

| 3 | TABLE ACCESS FULL| EMPLOYEES | 1000K| 12M| | 690 (2)| 00:00:09|

| 4 | TABLE ACCESS FULL| NEW_SALARY_INFO| 3996K| 19M| | 2764 (2)| 00:00:34|

| 5 | SORT AGGREGATE | | 1 | 13 | | | |

|* 6 | TABLE ACCESS FULL| NEW_SALARY_INFO| 4 | 52 | | 2772 (2)| 00:00:34|

--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

2 - access("NSI2"."EMPLOYEE_ID"="E"."EMPLOYEE_ID")

6 - filter("NSI"."EMPLOYEE_ID"=:B1)

Hmm, that doesn’t look very promising at all. Look at the estimated cost and time. It makes sense though. There is no index on the NEW_SALARY_INFO table so the database will have to full scan a four million row table for each of the one million employee records to find the maximum salary.

Well, SQL is pretty awesome, so there must be another way to do this. Maybe we can update a view?

UPDATE

(

SELECT e.employee_id,

e.employee_salary,

NVL((

SELECT MAX(nsi.salary)

FROM new_salary_info nsi

WHERE e.employee_id = nsi.employee_id

e.employee_salary) AS new_sal

FROM employees e

)

SET employee_salary = new_sal;

Again, let’s have a look at the Explain Plan prior to executing this statement.

PLAN_TABLE_OUTPUT

-------------------------------------------------------------------------------------

-------------------------------------------------------------------------------------

| 0 | UPDATE STATEMENT | | 1000K| 12M| 2772M (2)|999:59:59 |

| 2 | TABLE ACCESS FULL | EMPLOYEES | 1000K| 12M| 690 (2)| 00:00:09 |

| 3 | SORT AGGREGATE | | 1 | 13 | | |

|* 4 | TABLE ACCESS FULL| NEW_SALARY_INFO | 4 | 52 | 2772 (2)| 00:00:34 |

-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

4 - filter("NSI"."EMPLOYEE_ID"=:B1)

Once again we’ve got a very high estimated cost and time, for the same reason as the previous update. Surely we are not out of SQL options yet? No, we’re not. Let’s try a MERGE statement.

MERGE INTO employees e USING

(

SELECT e.rowid AS e_rowid,

MAX(nsi.salary) AS new_sal

FROM employees e,

new_salary_info nsi

WHERE e.employee_id = nsi.employee_id

GROUP BY e.rowid

) new_salaries

ON (e.rowid = new_salaries.e_rowid)

WHEN MATCHED THEN UPDATE

SET e.employee_salary = new_salaries.new_sal;

But when I try to view the Explain Plan for the MERGE, I receive the following error

1	ORA-28132: The MERGE INTO syntax does not support the security policy.

Curses! Foiled again! It turns out there is a VPD security policy on the EMPLOYEES table that prevents us from using the MERGE. So what now? All of our attempts at using SQL have been stymied.

PL/SQL to the rescue! By making use of bulk processing in PL/SQL we can perform our task rather quickly and with minimal code. The Oracle bulk collect tells the SQL engine to fetch multiple rows at once and store them in a collection defined by the developer. With the addition of the use of the FORALL statement, this greatly reduces the amount of context switching between PL/SQL and SQL by sending DML statements from PL/SQL to SQL in batches rather than one at a time, resulting in dramatic performance improvements over row-by-row processing.

The first thing we’ll need to do is write a SQL query that returns all of the data we need to update every row in the EMPLOYEES table that has a matching record(s) in the the NEW_SALARY_INFO table, keeping in mind that we need the maximum salary. This is rather straightforward.

SELECT e.employee_id employee_id,

MAX(nsi.salary) new_salary

FROM employees e,

new_salary_info nsi

WHERE e.employee_id = nsi.employee_id

GROUP BY e.employee_id;

That’s it. That gives us exactly what we are looking for. Now we can turn that query into a PL/SQL cursor, process the results in bulk, and perform the required update using a FORALL statement. Incidentally, the above query returns 999,000 rows.

SQL> DECLARE

2 CURSOR c_new_salary_info IS

3 SELECT e.employee_id employee_id,

4 MAX(nsi.salary) new_salary

5 FROM employees e,

6 new_salary_info nsi

7 WHERE e.employee_id = nsi.employee_id

8 GROUP BY e.employee_id;

10 TYPE t_new_salary_info IS TABLE OF c_new_salary_info%rowtype

11 INDEX BY PLS_INTEGER;

12 lt_new_salary_info t_new_salary_info;

14 v_rowcount PLS_INTEGER := 0;

15 BEGIN

16 OPEN c_new_salary_info;

18 LOOP

19 FETCH c_new_salary_info BULK COLLECT

20 INTO lt_new_salary_info LIMIT 1000;

22 EXIT WHEN lt_new_salary_info.COUNT = 0;

24 FORALL i IN 1..lt_new_salary_info.COUNT

25 UPDATE employees

26 SET employee_salary = lt_new_salary_info(i).new_salary

27 WHERE employee_id = lt_new_salary_info(i).employee_id;

29 v_rowcount := v_rowcount + SQL%ROWCOUNT;

30 END LOOP;

32 CLOSE c_new_salary_info;

34 DBMS_OUTPUT.PUT_LINE(v_rowcount||' rows updated.');

35 END;

36 /

999000 rows updated.

PL/SQL procedure successfully completed.

Elapsed: 00:00:29.91

SQL>

In approximately 30 seconds we’ve updated nearly one million rows with data from a four million row table. I’d say that’s pretty good, if you asked me.

So why does this work so well? The first reason has already been noted, and that is the use of bulk operations and FORALL to reduce context switching between PL/SQL and SQL. The second is how the cursor query is executed. The optimizer no longer has to worry about performing an update as part of the query and is able to simply join the two tables in a very efficient manner. We’ve gone from a series of SQL update statements that would either have taken hours to complete or produced an Oracle error, to a small PL/SQL block that does exactly we need in a very reasonable amount of time.

To satisfy our curiosity however, let’s see how the PL/SQL block stacks up against the UPDATE and MERGE statements if we were allowed to add indexes to the NEW_SALARY_INFO table and if the EMPLOYEES table did not have a security policy against it.

SQL> CREATE INDEX test_idx ON new_salary_info (employee_id);

Index created.

Elapsed: 00:00:06.04

SQL> BEGIN

2 DBMS_RLS.DROP_POLICY

3 (

4 object_name => 'EMPLOYEES',

5 policy_name => 'EMP_VIEW_POLICY'

6 );

7 END;

8 /

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.01

SQL>

Now let’s look at the respective plans of our previous UPDATE and MERGE statements.

UPDATE employees e

SET employee_salary = (

SELECT MAX(salary)

FROM new_salary_info nsi

WHERE nsi.employee_id = e.employee_id

)

WHERE EXISTS (

SELECT NULL

FROM new_salary_info nsi2

WHERE nsi2.employee_id = e.employee_id

);

PLAN_TABLE_OUTPUT

--------------------------------------------------------------------------------------------------------

--------------------------------------------------------------------------------------------------------

| 0 | UPDATE STATEMENT | | 999K| 17M| | 8007K (13)| 26:41:31 |

|* 2 | HASH JOIN SEMI | | 999K| 17M| 23M| 7557 (1)| 00:01:31 |

| 3 | TABLE ACCESS FULL | EMPLOYEES | 1000K| 12M| | 690 (2)| 00:00:09 |

| 4 | INDEX FAST FULL SCAN | TEST_IDX | 3996K| 19M| | 2433 (1)| 00:00:30 |

| 5 | SORT AGGREGATE | | 1 | 13 | | | |

| 6 | TABLE ACCESS BY INDEX ROWID| NEW_SALARY_INFO | 4 | 52 | | 7 (0)| 00:00:01 |

|* 7 | INDEX RANGE SCAN | TEST_IDX | 4 | | | 3 (0)| 00:00:01 |

--------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

2 - access("NSI2"."EMPLOYEE_ID"="E"."EMPLOYEE_ID")

7 - access("NSI"."EMPLOYEE_ID"=:B1)

UPDATE

(

SELECT e.employee_id,

e.employee_salary,

NVL((SELECT MAX(nsi.salary) FROM new_salary_info nsi WHERE e.employee_id =

nsi.employee_id), e.employee_salary) AS new_sal

FROM employees e

)

SET employee_salary = new_sal;

PLAN_TABLE_OUTPUT

-----------------------------------------------------------------------------------------------

-----------------------------------------------------------------------------------------------

| 0 | UPDATE STATEMENT | | 1000K| 12M| 8000K (13)| 26:40:09 |

| 2 | TABLE ACCESS FULL | EMPLOYEES | 1000K| 12M| 690 (2)| 00:00:09 |

| 3 | SORT AGGREGATE | | 1 | 13 | | |

| 4 | TABLE ACCESS BY INDEX ROWID| NEW_SALARY_INFO | 4 | 52 | 7 (0)| 00:00:01 |

|* 5 | INDEX RANGE SCAN | TEST_IDX | 4 | | 3 (0)| 00:00:01 |

-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

5 - access("NSI"."EMPLOYEE_ID"=:B1)

MERGE INTO employees e USING

(

SELECT e.rowid AS e_rowid,

MAX(nsi.salary) AS new_sal

FROM employees e,

new_salary_info nsi

WHERE e.employee_id = nsi.employee_id

GROUP BY e.rowid

) new_salaries

ON (e.rowid = new_salaries.e_rowid)

WHEN MATCHED THEN UPDATE

SET e.employee_salary = new_salaries.new_sal;

PLAN_TABLE_OUTPUT

------------------------------------------------------------------------------------------------------

------------------------------------------------------------------------------------------------------

| 0 | MERGE STATEMENT | | 3991K| 79M| | 14844 (2)| 00:02:59 |

| 2 | VIEW | | | | | | |

| 3 | MERGE JOIN | | 3991K| 144M| | 14844 (2)| 00:02:59 |

| 4 | SORT JOIN | | 3991K| 95M| | 9392 (3)| 00:01:53 |

| 5 | VIEW | | 3991K| 95M| | 9392 (3)| 00:01:53 |

| 6 | SORT GROUP BY | | 3991K| 98M| | 9392 (3)| 00:01:53 |

|* 7 | HASH JOIN | | 3991K| 98M| 23M| 9228 (1)| 00:01:51 |

| 8 | INDEX FAST FULL SCAN| SYS_C008302 | 1000K| 12M| | 516 (2)| 00:00:07 |

| 9 | TABLE ACCESS FULL | NEW_SALARY_INFO | 3996K| 49M| | 2767 (2)| 00:00:34 |

|* 10 | SORT JOIN | | 1000K| 12M| 45M| 5452 (1)| 00:01:06 |

| 11 | TABLE ACCESS FULL | EMPLOYEES | 1000K| 12M| | 690 (2)| 00:00:09 |

------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

7 - access("E"."EMPLOYEE_ID"="NSI"."EMPLOYEE_ID")

10 - access("E".ROWID="NEW_SALARIES"."E_ROWID")

filter("E".ROWID="NEW_SALARIES"."E_ROWID")

Those look much better. The optimizer is making use of the new index when it should, and the MERGE statement is now permissible. The following table summarizes the average of 10 executions of each of the UPDATE statements, the MERGE statement, and the PL/SQL block. The results are rather interesting:

	Update	Update View	Merge	PL/SQL
Average time (seconds) over 10 executions	56.753	40.924	27.013	31.108

The PL/SQL block out-performed both of the updates and was nearly on par with the MERGE statement, even in an ideal world.

What does this teach us? I think it is a reminder that we should be both aware of and willing to explore all of our options as developers. PL/SQL has the ability to work very efficiently for you if you understand what is available within the language. It could be your best option.

Can PL/SQL be Faster than SQL?

Subscribe for more articles

Rate this article

Chris Hritzuk

Related articles

A Guide to INSERT, UPDATE, and DELETE Statements in Oracle