Introduction to User-Defined Aggregate Functions

OK, let’s speed past the easy bits, the parts we all already know: standard aggregate functions. Aggregate functions, unlike regular functions, take values from multiple rows as their input. The category includes those aggregate functions that are so ordinary they’re almost invisible – SUM, COUNT, MAX – and a couple that most of us never use – such as APPROX_COUNT_DISTINCT

OK, let’s speed past the easy bits, the parts we all already know: standard aggregate functions. Aggregate functions, unlike regular functions, take values from multiple rows as their input. The category includes those aggregate functions that are so ordinary they’re almost invisible – SUM, COUNT, MAX – and a couple that most of us never use – such as APPROX_COUNT_DISTINCT.

In spite of such arcane functions there are always gaps that need plugging (indeed functions as un-extraordinary as median and mode were only added at 10g). I’m sure the guys at Oracle are all coffee-addled geniuses who work 24 and a half hours every day, but they’ll never be able to add every aggregate function to meet every single need.

User-Defined Aggregate Functions

Which is why, with 9i, the ability to create our own aggregate functions was added to Oracle. Before this, you’d have needed to write PL/SQL to replicate the capabilities of an aggregate function. Now you can create your own aggregate function, and call it directly from your SQL.

However, before we go crazy and dive right in, let’s agree on a problem to solve with our user-defined aggregate function. Let’s imagine that the standard EMP and DEPT tables belong to a small company that we run together, Scott Tiger Enterprises. Let’s also imagine that the government has just introduced a new law stating that the tax our company will pay is equal to the sum of the three largest salaries we pay our employees. An aggregate function will be perfect for determining the figure.

The ODCI Aggregate Routines

To write a user-defined aggregate function, we must have a passing acquaintance with Oracle Data Cartridges. Data cartridges, using the Oracle Extensibility Architecture framework, extend the abilities of the Oracle server.  Don’t worry too much about it at this point; just know that we’ll be using an API from the data cartridge (ODCI) arsenal to build our aggregate function.

The 4 ODCI functions we will use are as follows:

  • ODCIAggregateInitialize(ctx IN OUT our_aggregate_object_type) RETURN NUMBER
    This is the function within which we initialize our persistent variables.  It executes once for each distinct member of the group by field (or just once, if your eventual query does not contain a group by).
  • ODCIAggregateIterate(self IN OUT our_aggregate_object_type, our_parameter_to_aggregate IN some_data_type) RETURN NUMBER
    This function executes once for each row returned by the query. This is where the meat of our aggregate function will live.
  • ODCIAggregateMerge(self IN OUT our_aggregate_object_type, ctx2 IN our_aggregate_object_type) RETURN NUMBER
    For our purposes here, let’s just say that the merge function simply returns ODCIconst.success (a numeric constant). There’s more to it, but it’s outside the scope of what we’re talking about today. You can build your aggregate function without knowing it.
  • ODCIAggregateTerminate(self IN our_aggregate_object_type, returnvalue OUT some_data_type, flags IN NUMBER) RETURN NUMBER
    This function executes once for each distinct member of the group by field.

To use these functions, we will need to create an object type to implement them.  And then, finally, we will create our user-defined aggregate function using our object.

It might all seem a little overwhelming at first, but, in truth, the steps are simple:

  1. Create an object spec containing any persistent variables we might need, and declaring ODCIAggregateInitialize, ODCIAggregateIterate, ODCIAggregateMerge and ODCIAggregateTerminate.
  2. Create an object body containing the bodies of your ODCI functions. Bear in mind that:
    • ODCIAggregateInitialize will run once at the start of the process;
    • ODCIAggregateIterate runs for each row, and is therefore where you will do all of your work;
    • and ODCIAggregateTerminate runs once at the end of the whole process.
  3. Create your aggregate function based on your object.
Example User-Defined Aggregate Function

An example will probably help. Let’s create an aggregate function, Taxbill, to find out how much  Scott Tiger Enterprises will have to pay. Remember it’s the sum of the three highest salaries on our books.

First we need to create our object spec. Let’s call it newTaxFncs

Now we need to write our object type body. Remember that ODCIAggregateInitialize initializes our variables, vTaxBill, vHighest, v2ndHigh and v3rdHigh; ODCIAggregateIterate does all the legwork; and ODCIAggregateTerminate returns the final figure.

Finally, we need to create our aggregate function using the object that we’ve just created. Pay attention to the syntax ‘cos it’s slightly different from that of a standard function.

Now we can use it. Just like any other aggregate function. How cool is that?

Taxbill
11000

DEPTNO Taxbill
10 8750
20 8975
30 5950
Finally

I feel there are a couple of things I need to mention before we move on.

  1. Our example user-defined aggregate function was number-based, but Oracle allows them accept VARCHAR2, dates, and other scalar data types, and complex ones too.
  2. Our object type is built on PL/SQL; however, we could have used any supported language, such as C, C++ or Java. I’m illiterate in those languages, so it was an easy decision for me.
Analytic Functions

Remember back in the day when all new televisions had stickers that read HD-ready plastered all over them? Well, I feel that user-defined aggregate functions should come with an “Analytic Function-ready” sticker on them. That is because you can use any aggregate functions you build as analytic functions.

Imagine we wanted to see how our tax bill has changed over the years with each new member of staff.

ENAME JOB HIREDATE SAL Taxbill
SMITH CLERK 12/17/1980 800 800
ALLEN SALESMAN 02/20/1981 1600 2400
WARD SALESMAN 02/22/1981 1250 3650
JONES MANAGER 04/02/1981 2975 5825
BLAKE MANAGER 05/01/1981 2850 7425
CLARK MANAGER 06/09/1981 2450 8275
TURNER SALESMAN 09/08/1981 1500 8275
MARTIN SALESMAN 09/28/1981 1250 8275
KING PRESIDENT 11/17/1981 5000 10825
FORD ANALYST 12/03/1981 3000 10975

Or imagine we wanted to pay our tax department by department.

ENAME JOB SAL DEPTNO Taxbill
CLARK MANAGER 2450 10 8750
MILLER CLERK 1300 10 8750
KING PRESIDENT 5000 10 8750
FORD ANALYST 3000 20 8975
SCOTT ANALYST 3000 20 8975
JONES MANAGER 2975 20 8975
SMITH CLERK 800 20 8975
ADAMS CLERK 1100 20 8975
WARD SALESMAN 1250 30 5950
MARTIN SALESMAN 1250 30 5950
Conclusion

So that’s it.

Once created, your aggregate functions will slide in right beside Oracle’s standard ones, and enjoy all the advantages – and disadvantages – that they have.  Try running one against an unindexed column in a very large table, and you’ll suffer a performance hit. Unsurprisingly.  Use one to condense a very complicated process into a single line of SQL, and you’ll enjoy a readability bonus. Again, unsurprisingly.

So, go forth, and use your new knowledge wisely.