Channel: Obsessed with Oracle PL/SQL

Table Functions, Part 5c: Another use case for Pipelined Table Functions (and simple example)

From Oracle Help Center (a.k.a., documentation), we read:
Data is said to be pipelined if it is consumed by a consumer (transformation) as soon as the producer (transformation) produces it, without being staged in tables or a cache before being input to the next transformation.  
Pipelining enables a table function to return rows faster and can reduce the memory required to cache a table function's results.  
A pipelined table function can return the table function's result collection in subsets. The returned collection behaves like a stream that can be fetched from on demand. This makes it possible to use a table function like a virtual table.
In a nutshell, this means that the calling query can put to use the rows returned by the pipelined table function (PTF) before the function has returned all rows.
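
To see the "fetched from on demand" behavior in isolation, here is a minimal sketch. The names demo_nums_t and demo_stream_ptf are hypothetical, invented for this illustration and not part of the script later in this post. The function is written to pipe a million rows, but the query asks for only three:

```sql
-- Hypothetical names: demo_nums_t and demo_stream_ptf exist only for this sketch.
CREATE OR REPLACE TYPE demo_nums_t IS TABLE OF NUMBER;
/

CREATE OR REPLACE FUNCTION demo_stream_ptf
   RETURN demo_nums_t
   PIPELINED AUTHID DEFINER
IS
BEGIN
   FOR indx IN 1 .. 1000000
   LOOP
      -- Each row is handed to the calling query as it is produced.
      PIPE ROW (indx);
   END LOOP;

   RETURN;
END;
/

-- The query needs only three rows, so it can stop fetching
-- long before the loop reaches 1000000.
SELECT COLUMN_VALUE
  FROM TABLE (demo_stream_ptf)
 WHERE ROWNUM <= 3
/
```

Because the query stops fetching after three rows, the function should not need to run its loop to completion. With a non-pipelined function, the whole collection would have to be built before the first row came back.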

A simple way to demonstrate this (and highlight another nice use case for PTFs) is with the SQL IN operator.

IN is used in the WHERE clause to determine if a column or expression is in the specified list. The list can be a literal list:

SELECT * FROM my_table WHERE my_col IN (1,2,3,4)

or the list can be a subquery, which is really important, since you cannot have more than 1,000 literals in a list.

There is no such limit with this kind of IN list:

SELECT * FROM my_table 
WHERE my_col IN (SELECT num_col FROM my_list)

Hey, that's a FROM clause! That means we could call a table function there! And yes, we can:


SELECT * FROM my_table 
WHERE my_col IN (
   SELECT num_col FROM TABLE (my_tf))
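
One detail worth spelling out: num_col above is a placeholder. If the table function returns a nested table of scalars (a table of NUMBER, for instance), Oracle exposes the single column under the name COLUMN_VALUE. A sketch using the same placeholder names my_table, my_col and my_tf, none of which are created in this post:

```sql
-- my_table, my_col and my_tf are placeholders, as in the query above.
-- COLUMN_VALUE is the name Oracle gives the single column of a
-- collection of scalars queried through TABLE().
SELECT *
  FROM my_table
 WHERE my_col IN (SELECT COLUMN_VALUE FROM TABLE (my_tf))
```

SELECT * works as well for a single-column collection, which is the form used in the test block later on.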

To evaluate whether or not a value is in the IN list, the SQL engine needs to go through the list, looking for a match. When it finds a match, it stops searching.

Gee, so that should mean that if the table function is pipelined, and the function is returning rows as it generates them, the SQL engine should be able to get to an answer faster with pipelining than without. 

Shall we test that? Yes, we shall!

First, I create a table, a nested table type, and two functions, the second of which is pipelined:

CREATE TABLE plch_data (n NUMBER)
/

CREATE OR REPLACE TYPE numbers_t IS TABLE OF NUMBER;
/

CREATE OR REPLACE FUNCTION my_list_tf
   RETURN numbers_t AUTHID DEFINER
IS
   ns   numbers_t := numbers_t ();
BEGIN
   ns.EXTEND (1000000);

   FOR indx IN 1 .. 1000000
   LOOP
      ns (indx) := indx;
   END LOOP;

   RETURN ns;
END;
/

CREATE OR REPLACE FUNCTION my_list_ptf
   RETURN numbers_t
   PIPELINED AUTHID DEFINER
IS
BEGIN
   FOR indx IN 1 .. 1000000
   LOOP
      PIPE ROW (indx);
   END LOOP;
   RETURN;
END;
/

Let's take a closer look at the pipelined version. There are just three items to note:

  1. I add the PIPELINED keyword.
  2. Instead of populating a nested table to return, I pipe the row directly out of the function.
  3. I return nothing but control at the end of my function.

Oh, and here's an odd thing to note: if you leave off the RETURN; statement in a pipelined table function, you will get no complaint from the SQL or PL/SQL engines. Since no data is being returned by RETURN, the function simply terminates and returns control.
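
A minimal sketch of that behavior, assuming the numbers_t type created above already exists. The function name my_list_ptf_no_return is hypothetical and the loop is kept short so the result is easy to inspect:

```sql
-- Hypothetical function: identical in shape to my_list_ptf,
-- but with the final RETURN; deliberately left out.
CREATE OR REPLACE FUNCTION my_list_ptf_no_return
   RETURN numbers_t
   PIPELINED AUTHID DEFINER
IS
BEGIN
   FOR indx IN 1 .. 5
   LOOP
      PIPE ROW (indx);
   END LOOP;
   -- No RETURN; here. Execution simply falls off the end of the block.
END;
/

SELECT COLUMN_VALUE FROM TABLE (my_list_ptf_no_return)
/

DROP FUNCTION my_list_ptf_no_return
/
```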

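But I suggest you include the RETURN anyway. It looks better and is less confusing to someone maintaining the code later.

Now for the test. The block below times each query with DBMS_UTILITY.get_cpu_time, first when the match is on the first value in the list and then when it is on the last: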


DECLARE
   l_count   INTEGER;
   l_start   PLS_INTEGER;

   PROCEDURE mark_start
   IS
   BEGIN
      l_start := DBMS_UTILITY.get_cpu_time;
   END mark_start;

   PROCEDURE show_elapsed (NAME_IN IN VARCHAR2)
   IS
   BEGIN
      DBMS_OUTPUT.put_line (
            '"'
         || NAME_IN
         || '" elapsed CPU time: '
         || TO_CHAR (DBMS_UTILITY.get_cpu_time - l_start)
         || ' centiseconds');
      mark_start;
   END show_elapsed;
BEGIN
   INSERT INTO plch_data VALUES (1);

   COMMIT;
   
   mark_start;

   SELECT COUNT (*)
     INTO l_count
     FROM plch_data
    WHERE n IN (SELECT * FROM TABLE (my_list_tf));

   show_elapsed ('TF match on first');

   SELECT COUNT (*)
     INTO l_count
     FROM plch_data
    WHERE n IN (SELECT * FROM TABLE (my_list_ptf));

   show_elapsed ('PTF match on first');

   UPDATE plch_data
      SET n = 1000000;

   -- Restart the timer so the UPDATE is not counted in the next measurement.
   mark_start;

   SELECT COUNT (*)
     INTO l_count
     FROM plch_data
    WHERE n IN (SELECT * FROM TABLE (my_list_tf));

   show_elapsed ('TF match on last');

   SELECT COUNT (*)
     INTO l_count
     FROM plch_data
    WHERE n IN (SELECT * FROM TABLE (my_list_ptf));

   show_elapsed ('PTF match on last');
END;
/

And when I run this block with server output turned on, I see:

"TF match on first" elapsed CPU time: 11 centiseconds
"PTF match on first" elapsed CPU time: 1 centiseconds
"TF match on last" elapsed CPU time: 13 centiseconds
"PTF match on last" elapsed CPU time: 5 centiseconds

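Yep. Pipelining resulted in a significant boost in performance. The gap is widest when the match is on the first value (11 vs. 1 centiseconds), since the query can stop consuming rows almost immediately, but the pipelined version also comes out ahead when the match is on the last value (13 vs. 5 centiseconds). Go, PTF, go!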

By the way, here are the clean-up steps for the above script:

DROP TYPE numbers_t
/

DROP TABLE plch_data
/

DROP FUNCTION my_list_tf
/

DROP FUNCTION my_list_ptf
/

