[Chapter 17] 17.8 Examples of Embedded PL/SQL

17.8 Examples of Embedded PL/SQL

The more you think about stored functions in SQL, the more you come up with ways to put them to use in every single one of your applications. To prod your creativity and get you started, here are a number of examples of the ways stored functions in SQL can change the way you build Oracle-based systems.

17.8.1 Encapsulating Calculations

In just about any and every application, you will need to perform the same calculations over and over again. Whether it is a computation of net present value, mortgage balance, the distance between two points on a Cartesian plane, or a statistical variance, with native SQL you have to recode those computations in each of the SQL statements in which they are needed.

You can pay a big price for this kind of redundancy. The code that implements your business rules is repeated throughout the application. Even if the business rule doesn't change, the way you should implement the rule is almost sure to require modification. Worse than that, the business rule itself might evolve, which could necessitate fairly significant alterations.

To solve this problem, you can hide or encapsulate all of your formulas and calculations into stored functions. These functions can then be called from within both SQL statements and also PL/SQL programs.

One fine example of the value of encapsulated calculations arose when an insurance company needed to perform date-based analyses on its accounts. The last day of the month is, of course, a very important date for most financial institutions. To manipulate dates, the company's IS department planned to make use of the built-in LAST_DAY function to obtain the last day of the month, and ADD_MONTHS to move from one month to the next. It soon uncovered a very interesting nuance to the way ADD_MONTHS worked: if you pass a day to ADD_MONTHS which is the last day in the month, SQL always returns the last day in the resulting month, regardless of the number of actual days in each of the months.

In other words:

ADD_MONTHS ('28-FEB-1994', 2) ==> 30-APR-1993

This approach might make sense for some applications and queries. The requirement at the insurance company, however, was that when you move a month, you must always land on the same day of the month (or the last day, if the original month's day was past the last day of the target month). Without stored functions, the SQL required to perform this calculation is as follows:

SELECT
   DECODE (payment_date,
           LAST_DAY (payment_date),
               LEAST (ADD_MONTHS (payment_date, 1),
                      TO_DATE (TO_CHAR (ADD_MONTHS (payment_date, 1),
                                        'MMYYYY') ||
                               TO_CHAR (payment, 'DD'),
                              'MMYYYYDD')),
           ADD_MONTHS (payment_date, 1))
  FROM premium_payments;

which may be read as, "If the last payment date falls on the last day of the month, then return as the next payment date the earliest of either the result of adding one month to payment date (using ADD_MONTHS) or the same day in the new month as the day in the month of the last payment date. If the last payment was not made on the last day of the month, simply use ADD_MONTHS to get the next payment date."

Not only is that difficult to understand, but it required three different calls to the ADD_MONTHS built-in. And remember that this complex SQL would have to be repeated in every SELECT list where ADD_MONTHS was used to increment or decrement dates. You can well imagine how happy the programmers in this company became when they installed Oracle Server Version 7.1 and were able to use the following function inside their SQL statements (for a full explanation of the function's logic, see Chapter 12, Date Functions):

FUNCTION new_add_months (date_in IN DATE, months_shift IN NUMBER)
   RETURN DATE
IS
   return_value DATE;
BEGIN
   return_value := ADD_MONTHS (date_in, months_shift);
   IF date_in = LAST_DAY (date_in)
   THEN
      return_value :=
         LEAST (return_value,
                TO_DATE (TO_CHAR (return_value, 'MMYYYY') ||
                         TO_CHAR (date_in, 'DD') ,
                         'MMYYYYDD'));
   END IF;
   RETURN return_value;
END new_add_months;

With the stored function, the SELECT statement to obtain the next payment date is simply:

SELECT new_add_months (payment_date,1)
  FROM premium_payments;

The more you look through your SQL statements, the more opportunities you find for stored functions that can hide calculations and therefore improve the long-term stability of your code. While it is unlikely that you will have the time and resources to go back and rewrite wholesale the SQL underpinnings of your application, you can at least build the functions and roll them into any new product development.

17.8.2 Combining Scalar and Aggregate Values

This simple question is hard to answer in SQL: "Show me the name and salary of the employee who has the highest salary in each department, as well as the total salary for that person's department."

Broken into two separate queries, this question poses no problem. Here is the first part:

SELECT department_id, last_name, salary
  FROM employee E1
 WHERE salary = (SELECT MAX (salary)
                   FROM employee E2
                  WHERE E.department_id = E2.department_id)
GROUP BY department_id;

and here is the second part:

SELECT department_id, SUM (salary)
  FROM employee
ORDER BY department_id;

However, I cannot very easily combine them since that would require listing and obtaining both scalar (single row) and aggregate (across multiple rows) values from the same table. The following SELECT list contains the information I want to display. How could I construct my FROM, WHERE, and GROUP BY clauses to show both an individual's salary and the departmental total?

SELECT department_id, last_name, salary, SUM (salary)
  FROM ...?
 WHERE ...?
GROUP BY ...?

The most straightforward solution prior to Release 2.1 of PL/SQL was to create a view that "presummarized" the salary for each department:

CREATE VIEW dept_salary
AS
   SELECT department_id, SUM (salary) total_salary
     FROM employee
    GROUP BY department_id;

Now, with this view, I can get at my answer with a single SQL statement as follows:

SELECT E.department_id, last_name, salary, total_salary
  FROM employee E, dept_salary DS
 WHERE E.department_id = DS.department_id
   AND salary = (SELECT MAX (salary)
                   FROM employee E2
                  WHERE E.department_id = E2.department_id);

This doesn't seem like such a bad solution, except that you have to create a customized view each time you want to perform this kind of calculation. In addition, this SQL is far less than straightforward for many programmers.

A better solution is to make use of a stored function in SQL. Instead of creating a view, create a function that performs exactly the same calculation, but this time only for the specified department:

FUNCTION total_salary (dept_id_in IN department.department_id%TYPE)
   RETURN NUMBER
IS
   CURSOR grp_cur
   IS
      SELECT SUM (salary)
        FROM employee
       WHERE department_id = dept_id_in;
   return_value NUMBER;
BEGIN
   OPEN grp_cur;
   FETCH grp_cur INTO return_value;
   CLOSE grp_cur;
   RETURN return_value;
END;

In this version I outer-join the department and employee tables. This way if a department does not exist (bad department ID number), I return NULL. If a department has no employees, I return 0; otherwise, I return the total salary in that department.

Now my query does not need to join to a view containing a GROUP BY. Instead, it calls the total_salary function, passing the employee's department ID number as a parameter:

SELECT E.department_id, last_name, salary, total_salary (E.department_id)
  FROM employee E
 WHERE salary = (SELECT MAX (salary)
                   FROM employee E2
                  WHERE E.department_id = E2.department_id);

The resulting SQL statement is not only easier to read; it also executes faster, especially for larger tables. I could simplify the SQL statement further by creating a function that returns the maximum salary in a particular department. The above SELECT would then become simply:

SELECT department_id, last_name, salary, total_salary (department_id)
  FROM employee E
 WHERE salary = max_sal_in_dept (department_id);

17.8.3 Replacing Correlated Subqueries

You can also use stored functions in a SQL statement to replace correlated subqueries. A correlated subquery is a SELECT statement inside the WHERE clause of a SQL statement (SELECT, INSERT, or DELETE) which is correlated (or makes reference) to one or more columns in the enclosing SQL statement. In the preceding section I used a correlated subquery to determine the employee who receives the highest salary in each department:

SELECT E.department_id, last_name, salary, total_salary (E.department_id)
  FROM employee E
 WHERE salary = (SELECT MAX (salary)
                   FROM employee E2
                  WHERE E.department_id = E2.department_id);

The last three lines in the query contain a SELECT statement matching the department ID number for the "inner" employee (E2) to the department ID number for the "outer" employee table (E1). The inner query is executed once for every row retrieved in the outer query.

The correlated subquery is a very powerful feature in SQL, since it offers the equivalent of a procedural language's nested loop capability, as in:

LOOP
   LOOP
   END LOOP;
END LOOP;

Two drawbacks with a correlated subquery are:

The logic can become fairly complicated
The resulting SQL statement can be difficult to understand and follow

You can use a stored function in place of a correlated subquery to address these drawbacks; in the above example, I would want a function that calculates the highest salary in a given department:

FUNCTION max_salary (dept_id_in IN department.department_id%TYPE)
   RETURN NUMBER
IS
   CURSOR grp_cur
   IS
      SELECT MAX (salary)
        FROM employee
       WHERE department_id = dept_id_in;
   return_value NUMBER;
BEGIN
   OPEN grp_cur;
   FETCH grp_cur INTO return_value;
   CLOSE grp_cur;
   RETURN return_value;
END;

I can now use both total_salary and max_salary in my SELECT statement that says, "Show me the name and salary of the employee who has the highest salary in each department, as well as the total salary for that person's department."

SELECT E.department_id, last_name, salary, total_salary (department_id)
  FROM employee
 WHERE salary = max_salary (department_id);

Compare that simple, self-documenting piece of SQL to the version requiring a view and correlated subquery:

CREATE VIEW dept_salary
AS
   SELECT department_id, SUM (salary) total_salary
     FROM employee
    GROUP BY department_id;

SELECT E.department_id, last_name, salary, total_salary
  FROM employee E, dept_salary DS
 WHERE E.department_id = DS.department_id
   AND salary = (SELECT MAX (salary)
                   FROM employee E2
                  WHERE E.department_id = E2.department_id);

and I am sure you will agree that stored functions in SQL can make your life much easier.

You may have noticed that the total_salary function from the previous section and the max_salary from this section look very similar. The only difference between the two is that the cursor in total_salary uses the SUM group function and the cursor in max_salary uses the MAX group function. If you are as fanatical about consolidating your code into the smallest possible number of distinct "moving parts," you might consider a single function that returns a different group-level statistic for a department based on a second parameter, as follows:

FUNCTION salary_stat
   (dept_id_in IN department.department_id%TYPE,
    stat_type_in IN VARCHAR2)
   RETURN NUMBER
IS
   v_stat_type VARCHAR2(20) := UPPER (stat_type_in);

   CURSOR grp_cur
   IS
      SELECT SUM (salary) sumsal,
             MAX (salary) maxsal,
             MIN (salary) minsal,
             AVG (salary) avgsal,
             COUNT (DISTINCT salary) countsal,
        FROM employee
       WHERE department_id = dept_id_in;

   grp_rec grp_cur%ROWTYPE;
   retval NUMBER;
BEGIN
   OPEN grp_cur;
   FETCH grp_cur INTO grp_rec;
   CLOSE grp_cur;

   IF v_stat_type = 'SUM'
   THEN
      retval := grp_rec.sumsal;

   ELSIF v_stat_type = 'MAX'
   THEN
      retval := grp_rec.maxsal;

   ELSIF v_stat_type = 'MIN'
   THEN
      retval := grp_rec.minsal;

   ELSIF v_stat_type = 'COUNT'
   THEN
      retval := grp_rec.countsal;

   ELSIF v_stat_type = 'AVG'
   THEN
      retval := grp_rec.avgsal;
   END IF;

   RETURN retval;
END;

The overhead of adding these additional expressions in the SELECT list -- and the processing of the IF statement -- is negligible. With this new, generic utility, my request for salary analysis shown above now becomes:

SELECT E.department_id, last_name, salary, salary_stat (department_id, 'sum')
  FROM employee
 WHERE salary = salary_stat (department_id, 'max');

If I ever have to change the SQL required to obtain departmental-level statistics for salary, I have to upgrade only this single function.

17.8.4 Replacing DECODEs with IF Statements

The DECODE function offers IF-like capabilities in the nonprocedural SQL environment provided by the Oracle Server. You can use the DECODE syntax to create matrix reports with a fixed number of columns and also perform complex IF-THEN-ELSE logic within a query. The downside to DECODE is that it can be difficult to write and very difficult to maintain. Consider the following example of DECODE to determine whether a date is within the prescribed range and, if it is, add to the count of rows that fulfill this requirement:

SELECT FC.year_number,
       SUM (DECODE (GREATEST (ship_date, FC.q1_sdate),
                    ship_date,
                    DECODE (LEAST (ship_date, FC.q1_edate),
                            ship_date, 1,
                            0),
                    0)) Q1_results,
       SUM (DECODE (GREATEST (ship_date, FC.q2_sdate),
                    ship_date,
                    DECODE (LEAST (ship_date, FC.q2_edate),
                            ship_date, 1,
                            0),
                    0)) Q2_results,
       SUM (DECODE (GREATEST (ship_date, FC.q3_sdate),
                    ship_date,
                    DECODE (LEAST (ship_date, FC.q3_edate),
                            ship_date, 1,
                            0),
                    0)) Q3_results,
       SUM (DECODE (GREATEST (ship_date, FC.q4_sdate),
                    ship_date,
                    DECODE (LEAST (ship_date, FC.q4_edate),
                            ship_date, 1,
                            0),
                    0)) Q4_results
  FROM orders O,
       fiscal_calendar FC
 GROUP BY year_number;

The result set for this query might look like this:

YEAR NUMBER     Q1 RESULTS    Q2 RESULTS    Q3 RESULTS    Q4 RESULTS
------------    ----------    ----------    ----------    ----------
1993            12000         14005         22000         40000
1994            10000         15000         21000         55004

While it is very handy to use DECODE to produce such a report, the SQL required to accomplish the task is more than a little frightening. Here is how you might try to interpret the Q1 RESULTS nested DECODE: "If the ship date is greater than or equal to the first quarter start date and the shipdate is less than or equal to the first quarter end date, then add one to the sum of the total number of orders shipped in that quarter. Otherwise, add zero."

Unfortunately, unless you are experienced in interpreting DECODE statements, you will find it difficult to glean this understanding from that convoluted SQL statement. The repetition in that single SELECT also cries out for modularization, which we can supply with the following stored function (incr_in_range means "increment if in the range"):

FUNCTION incr_in_range
   (ship_date_in IN DATE, sdate_in IN DATE, edate_in IN DATE)
   RETURN INTEGER
IS
BEGIN
   IF ship_date_in BETWEEN sdate_in AND edate_in
   THEN
      RETURN 1;
   ELSE
      RETURN 0;
   END IF;
END;

Yep, that's all there is to it! With the incr_in_range function, that long and winding SELECT statement simply becomes:

SELECT FC.year_number,
       SUM (incr_in_range (ship_date, q1_sdate, q1_edate)) Q1_results,
       SUM (incr_in_range (ship_date, q2_sdate, q2_edate)) Q2_results,
       SUM (incr_in_range (ship_date, q3_sdate, q3_edate)) Q3_results,
       SUM (incr_in_range (ship_date, q4_sdate, q4_edate)) Q4_results
  FROM orders O,
       fiscal_calendar FC
 GROUP BY year_number;

This stored function gets rid of the code redundancy and makes the SELECT statement much more readable. In addition, this function could be used in other SQL statements to perform the same logic.

17.8.5 GROUP BY Partial Column Values

The GROUP BY clause in a SELECT statement allows you to collect data across multiple records and group them by one or more columns. The following SELECT statement, for example, calculates the total amount of salary paid to all workers with the same job description (or "title"):

  SELECT job_title_desc, SUM (salary)
    FROM employee E, job_title JT
   WHERE E.job_title_id = JT.job_title_id
GROUP BY job_title_desc;

A portion of the result set from this query might look like this:

ACCOUNT MANAGER              32000
OFFICE CLERK                 4000
PROJECT MANAGER              10006
SHIPPING CLERK               4200

What if you wanted to see a total amount of salary for each type or category of job title? How much do all the various kinds of clerks in the company earn? What about all the different types of vice presidents and managers?

With native SQL, this is very difficult to write. In effect, you want GROUP BY to assess all the different categories, but those categories do not exist in a separate table. They are denormalized into the job title descriptions, and the GROUP BY clause can be applied only to whole column values. Fortunately, stored functions in SQL provide a straightforward solution to the problem. The following function, job_category, takes as its single parameter the primary key to the title table and returns the job category. It does so by performing a LIKE operation on the text for that job title -- something you cannot do directly in the GROUP BY clause.

FUNCTION job_category (title_id_in IN job_title.job_title_id%TYPE)
   RETURN VARCHAR2
IS
   CURSOR title_cur IS
      SELECT job_title_desc FROM job_title
       WHERE job_title_id = title_id_in;
   title_rec title_cur%ROWTYPE;
BEGIN
   OPEN title_cur; FETCH title_cur INTO title_rec;
   IF title_cur%NOTFOUND
   THEN
      CLOSE title_cur;
      RETURN NULL;
   ELSE
      CLOSE title_cur;
      IF title_rec.job_title_desc LIKE '%CLERK%'
      THEN
         RETURN 'CLERK';
      ELSIF title_rec.job_title_desc LIKE '%VICE PRESIDENT%'
      THEN
         RETURN 'VICE PRESIDENT';
      ELSIF title_rec.job_title_desc LIKE '%MANAGER%'
      THEN
         RETURN 'MANAGER';
      END IF;
   END IF;
END;

I can now rewrite my query using the job_category function and cut through all the confusion of my denormalized job title descriptions:

SELECT job_category (job_title_id) as title, SUM (salary)
  FROM employee
GROUP BY title;

I use the function in both the SELECT list and the GROUP BY. This query now produces a result set that looks like the following:

CLERK                8200
MANAGER              42006
VICE PRESIDENT       75000

17.8.6 Sequential Processing Against a Column's Value

It is very easy for me to find out if a particular word or set of characters appears in a string with the INSTR function. It is also straightforward in SQL to determine which rows have a column that contains a certain word. It is much more difficult within native SQL to determine the number of times a particular word or set of characters appears in a string in a single row. The built-in set-at-a-time processing of SQL does not provide iterative or looping behavior within a row -- just across rows.

If I want to perform iterative analysis or computation on a particular value in a single row, I need to apply a PL/SQL function. Let's see how this would come into play.

Here is the SQL that shows me all lines of text containing the word "the":

SELECT text
  FROM notes
 WHERE INSTR (text, 'the') > 0;

I can even use INSTR to find all the lines of text containing the word "the" at least three times:

SELECT text
  FROM notes
 WHERE INSTR (text, 'the', 1, 3) > 0;

I could, if pressed, also display all lines of text that containing the word "the" precisely three times:

SELECT text
  FROM notes
 WHERE INSTR (text, 'the', 1, 3) != 0
   AND INSTR (text, 'the', 1, 4) = 0;

In other words, if INSTR returns a nonzero number or location for a third occurrence of "the", but also returns a zero for a fourth occurrence of "the", that means that the string occurs precisely three times in the string.

If, however, I need to see a report that displays the number of occurrences of "the" in each of my notes, or if I want to GROUP BY the number of occurrences of "the", SQL comes up short. The best I can do is pseudocode to express my wish:

SELECT number-of-thes-in-text, text
  FROM notes
 GROUP BY number-of-thes-in-text;

As a nonprocedural language that deals with a set (multiple rows) of data at a time, SQL doesn't give me any way to programmatically analyze the contents of a particular column. PL/SQL, on the other hand, is perfectly equipped to solve these kinds of problems. The ps.parse package included on the disk demonstrates how to build a function that will return the number of atomics (words and/or delimiters) in a string. I will not repeat the implementation here, but will show how the function would be used in SQL:

Show the number of occurrences of "the" in each line of text:

SELECT text,
       ps_parse.number_of_atomics (text, 'the')
  FROM notes;

Show the distribution of the use of the word "urgent" across the days of the week for all text in which the word urgent appears at least once:
```
SELECT TO_CHAR (note_date, 'DAY'),
       ps_parse.number_of_atomics (text, 'urgent') urgent_count
  FROM notes
 WHERE urgent_count > 0;
```

17.8.7 Recursive Processing in a SQL Statement

The SQL language does not support recursion, yet this powerful programming method is at times crucial to solving a problem. PL/SQL does allow for recursive execution of function calls, however, so you can put it to use inside a SQL statement where recursion is needed.

Suppose your application needs to print checks. A check contains both the numeric version of the amount of the check (say, $99.70) and the written version of the amount of the check (ninety-nine dollars and seventy cents). You will not be able to use SQL to convert the numeric check amount (stored in the database) into the written version. You can, however, put PL/SQL to use, along with some of its more interesting features like recursion, global data, and PL/SQL tables, to provide a very elegant solution.

The package below offers a checks package to provide this conversion capability.[2]

[2] This implementation of "number-to-words" was first presented by Mike Burnside of Oracle Australia at the International Oracle User's Week in San Francisco in September 1994. I have made some minor modifications.

/* Filename on companion disk: checks.spp */
PACKAGE checks
IS
   /* Convert number to words */
   FUNCTION num2words (number_in IN NUMBER) RETURN VARCHAR2;

   /* Since a package, must assert my purity level. */
   PRAGMA RESTRICT_REFERENCES (num2words, WNDS);

END checks;

PACKAGE BODY checks
IS
   /* Table structure to hold names of numeric components. */
   TYPE numwords_tabtype IS TABLE OF VARCHAR2(20)
      INDEX BY BINARY_INTEGER;
   words_table numwords_tabtype;

   /* Used in initialization section. */
   v_date DATE;

   FUNCTION num2words (number_in IN NUMBER) RETURN VARCHAR2
   IS
      my_number NUMBER;
   BEGIN
      /* Sorry, I don't do cents in this program! */
      my_number := FLOOR (number_in);

      /* $1000+ */
      IF my_number >= 1000
      THEN
         /* Break up into two recursive calls to num2words. */
         RETURN num2words (my_number/1000) ||
                ' Thousand ' ||
                num2words (MOD (my_number, 1000));
      END IF;

      /* $100-999 */
      IF my_number >= 100
      THEN
         /* Break up into two recursive calls to num2words. */
         RETURN num2words (my_number/100) ||
                ' Hundred ' ||
                num2words (MOD (my_number, 100));
      END IF;

      /* $20-$99 */
      IF my_number >= 20
      THEN
         /* Break up into tens word and then final dollar amount. */
         RETURN words_table (FLOOR (my_number/10)) ||
                ' ' ||
                num2words (MOD (my_number, 10));
      END IF;

      /* Down to 19 or less. Get word from "upper register" of table. */
      RETURN words_table (my_number + 10);

   END num2words;

BEGIN
   /* Initialization section of package. Run just once per session. */

   /* Manually construct the tens names. */
   words_table (1) := 'Ten';
   words_table (2) := 'Twenty';
   words_table (3) := 'Thirty';
   words_table (4) := 'Forty';
   words_table (5) := 'Fifty';
   words_table (6) := 'Sixty';
   words_table (7) := 'Seventy';
   words_table (8) := 'Eighty';
   words_table (9) := 'Ninety';

   /* Return NULL for zero. */
   words_table (10) := NULL;

   /* Construct number names for one through nineteen. */
   FOR day_index IN 1 .. 19
   LOOP
      v_date := TO_DATE (to_char(day_index) || '-JAN-94');
      words_table (day_index+10) :=
         INITCAP (TO_CHAR (v_date, 'DDSP'));
   END LOOP;

END checks;

Here are some examples of conversion from whole numbers to words:

checks.num2words (99)    ==> Ninety Nine
checks.num2words (12345) ==> Twelve Thousand Three Hundred Forty Five
checks.num2words (5)     ==> Five

I can also put this packaged function to use inside my SQL statement to query up all the information I need to print a check:

SELECT TO_CHAR (SYSDATE, 'Month DD, YYYY'),
       payee,
       amount,
       checks.num2words (amount),
       comment


  FROM bill






 WHERE bill_status = 'UNPAID'


17.7 Realities: Calling PL/SQL Functions in SQL		V. New PL/SQL8 Features