30 September 2015

9149 views

6 0

Introduction to User-Defined Aggregate Functions

OK, let’s speed past the easy bits, the parts we all already know: standard aggregate functions. Aggregate functions, unlike regular functions, take values from multiple rows as their input. The category includes those aggregate functions that are so ordinary they’re almost invisible – SUM, COUNT, MAX – and a couple that most of us never use – such as APPROX_COUNT_DISTINCT

In spite of such arcane functions there are always gaps that need plugging (indeed functions as un-extraordinary as median and mode were only added at 10g). I’m sure the guys at Oracle are all coffee-addled geniuses who work 24 and a half hours every day, but they’ll never be able to add every aggregate function to meet every single need.

User-Defined Aggregate Functions

Which is why, with 9i, the ability to create our own aggregate functions was added to Oracle. Before this, you’d have needed to write PL/SQL to replicate the capabilities of an aggregate function. Now you can create your own aggregate function, and call it directly from your SQL.

However, before we go crazy and dive right in, let’s agree on a problem to solve with our user-defined aggregate function. Let’s imagine that the standard EMP and DEPT tables belong to a small company that we run together, Scott Tiger Enterprises. Let’s also imagine that the government has just introduced a new law stating that the tax our company will pay is equal to the sum of the three largest salaries we pay our employees. An aggregate function will be perfect for determining the figure.

The ODCI Aggregate Routines

To write a user-defined aggregate function, we must have a passing acquaintance with Oracle Data Cartridges. Data cartridges, using the Oracle Extensibility Architecture framework, extend the abilities of the Oracle server. Don’t worry too much about it at this point; just know that we’ll be using an API from the data cartridge (ODCI) arsenal to build our aggregate function.

The 4 ODCI functions we will use are as follows:

ODCIAggregateInitialize(ctx IN OUT our_aggregate_object_type) RETURN NUMBER
This is the function within which we initialize our persistent variables. It executes once for each distinct member of the group by field (or just once, if your eventual query does not contain a group by).
ODCIAggregateIterate(self IN OUT our_aggregate_object_type, our_parameter_to_aggregate IN some_data_type) RETURN NUMBER
This function executes once for each row returned by the query. This is where the meat of our aggregate function will live.
ODCIAggregateMerge(self IN OUT our_aggregate_object_type, ctx2 IN our_aggregate_object_type) RETURN NUMBER
For our purposes here, let’s just say that the merge function simply returns ODCIconst.success (a numeric constant). There’s more to it, but it’s outside the scope of what we’re talking about today. You can build your aggregate function without knowing it.
ODCIAggregateTerminate(self IN our_aggregate_object_type, returnvalue OUT some_data_type, flags IN NUMBER) RETURN NUMBER
This function executes once for each distinct member of the group by field.

To use these functions, we will need to create an object type to implement them. And then, finally, we will create our user-defined aggregate function using our object.

It might all seem a little overwhelming at first, but, in truth, the steps are simple:

Create an object spec containing any persistent variables we might need, and declaring ODCIAggregateInitialize, ODCIAggregateIterate, ODCIAggregateMerge and ODCIAggregateTerminate.
Create an object body containing the bodies of your ODCI functions. Bear in mind that:
- ODCIAggregateInitialize will run once at the start of the process;
- ODCIAggregateIterate runs for each row, and is therefore where you will do all of your work;
- and ODCIAggregateTerminate runs once at the end of the whole process.
Create your aggregate function based on your object.

Example User-Defined Aggregate Function

An example will probably help. Let’s create an aggregate function, Taxbill, to find out how much Scott Tiger Enterprises will have to pay. Remember it’s the sum of the three highest salaries on our books.

First we need to create our object spec. Let’s call it newTaxFncs

CREATE OR REPLACE TYPE newTaxFncs AS OBJECT (

vTaxBill NUMBER,

vHighest NUMBER,

v2ndHigh NUMBER,

v3rdHigh NUMBER,

STATIC FUNCTION ODCIAggregateInitialize (ctx IN OUT newTaxFncs) RETURN NUMBER,

MEMBER FUNCTION ODCIAggregateIterate (self IN OUT newTaxFncs, pSal IN NUMBER) RETURN NUMBER,

MEMBER FUNCTION ODCIAggregateMerge(self IN OUT newTaxFncs, ctx2 newTaxFncs) RETURN NUMBER,

MEMBER FUNCTION ODCIAggregateTerminate (self IN newTaxFncs, taxVal OUT NUMBER, flags IN NUMBER) RETURN NUMBER

);

Now we need to write our object type body. Remember that ODCIAggregateInitialize initializes our variables, vTaxBill, vHighest, v2ndHigh and v3rdHigh; ODCIAggregateIterate does all the legwork; and ODCIAggregateTerminate returns the final figure.

CREATE OR REPLACE TYPE BODY newTaxFncs IS

STATIC FUNCTION ODCIAggregateInitialize (ctx IN OUT newTaxFncs) RETURN NUMBER IS

BEGIN

-- Initialize our variables: vTaxBill, vHighest, v2ndHigh, v3rdHigh

ctx := newTaxFncs(0,0,0,0);

RETURN ODCIConst.success;

END ODCIAggregateInitialize;

MEMBER FUNCTION ODCIAggregateIterate (self IN OUT newTaxFncs,

pSal IN NUMBER)

RETURN NUMBER IS

BEGIN

-- If this is one of the top 3 salaries, save it in one of the variables.

IF pSal >= self.vHighest THEN

self.v3rdHigh := self.v2ndHigh;

self.v2ndHigh := self.vHighest;

self.vHighest := pSal;

ELSIF pSal >= self.v2ndHigh THEN

self.v3rdHigh := self.v2ndHigh;

self.v2ndHigh := pSal;

ELSIF pSal >= self.v3rdHigh THEN

self.v3rdHigh := pSal;

END IF;

RETURN ODCIconst.success;

END ODCIAggregateIterate;

MEMBER FUNCTION ODCIAggregateMerge (self IN OUT newTaxFncs,

ctx2 IN newTaxFncs)

RETURN NUMBER IS

BEGIN

RETURN ODCIconst.success;

END ODCIAggregateMerge;

MEMBER FUNCTION ODCIAggregateTerminate (self IN newTaxFncs,

taxVal OUT NUMBER,

flags IN NUMBER)

RETURN NUMBER IS

BEGIN

-- The variable returned OUT by Terminate is what the aggregate function will use.

taxVal := self.vHighest + self.v2ndHigh + self.v3rdHigh;

RETURN ODCIconst.success;

END ODCIAggregateTerminate;

END;

Finally, we need to create our aggregate function using the object that we’ve just created. Pay attention to the syntax ‘cos it’s slightly different from that of a standard function.

CREATE OR REPLACE FUNCTION taxbill (pSal NUMBER) AS "Taxbill"

RETURN NUMBER

AGGREGATE USING newTaxFncs;

Now we can use it. Just like any other aggregate function. How cool is that?

1 2	SELECT taxbill(sal) FROM emp;

Taxbill

11000

SELECT deptno, taxbill(sal) AS "Taxbill"

FROM emp

GROUP BY deptno;

DEPTNO	Taxbill
10	8750
20	8975
30	5950

Finally

I feel there are a couple of things I need to mention before we move on.

Our example user-defined aggregate function was number-based, but Oracle allows them accept VARCHAR2, dates, and other scalar data types, and complex ones too.
Our object type is built on PL/SQL; however, we could have used any supported language, such as C, C++ or Java. I’m illiterate in those languages, so it was an easy decision for me.

Analytic Functions

Remember back in the day when all new televisions had stickers that read HD-ready plastered all over them? Well, I feel that user-defined aggregate functions should come with an “Analytic Function-ready” sticker on them. That is because you can use any aggregate functions you build as analytic functions.

Imagine we wanted to see how our tax bill has changed over the years with each new member of staff.

SELECT ename, job, hiredate, sal, taxbill(sal) OVER (order by hiredate) AS "Taxbill"

FROM emp

ORDER BY hiredate;

ENAME	JOB	HIREDATE	SAL	Taxbill
SMITH	CLERK	12/17/1980	800	800
ALLEN	SALESMAN	02/20/1981	1600	2400
WARD	SALESMAN	02/22/1981	1250	3650
JONES	MANAGER	04/02/1981	2975	5825
BLAKE	MANAGER	05/01/1981	2850	7425
CLARK	MANAGER	06/09/1981	2450	8275
TURNER	SALESMAN	09/08/1981	1500	8275
MARTIN	SALESMAN	09/28/1981	1250	8275
KING	PRESIDENT	11/17/1981	5000	10825
FORD	ANALYST	12/03/1981	3000	10975

Or imagine we wanted to pay our tax department by department.

SELECT ename, job, sal, deptno, taxbill(sal) OVER (PARTITION BY deptno) AS "Taxbill"

FROM emp

ORDER BY deptno;

ENAME	JOB	SAL	DEPTNO	Taxbill
CLARK	MANAGER	2450	10	8750
MILLER	CLERK	1300	10	8750
KING	PRESIDENT	5000	10	8750
FORD	ANALYST	3000	20	8975
SCOTT	ANALYST	3000	20	8975
JONES	MANAGER	2975	20	8975
SMITH	CLERK	800	20	8975
ADAMS	CLERK	1100	20	8975
WARD	SALESMAN	1250	30	5950
MARTIN	SALESMAN	1250	30	5950

Conclusion

So that’s it.

Once created, your aggregate functions will slide in right beside Oracle’s standard ones, and enjoy all the advantages – and disadvantages – that they have. Try running one against an unindexed column in a very large table, and you’ll suffer a performance hit. Unsurprisingly. Use one to condense a very complicated process into a single line of SQL, and you’ll enjoy a readability bonus. Again, unsurprisingly.

So, go forth, and use your new knowledge wisely.

Introduction to User-Defined Aggregate Functions

User-Defined Aggregate Functions

The ODCI Aggregate Routines

Example User-Defined Aggregate Function

Finally

Analytic Functions

Conclusion

Subscribe for more articles

Rate this article

David Njoku

Related articles

A Guide to INSERT, UPDATE, and DELETE Statements in Oracle

Tags