I use a chain of pipelined functions to separate an entire task into a
sequence of processing stages - like this:
(abstract, just pseudo-code; I'll try to post some working PL/SQL code
next, but I hope you get the point)
procedure fConsumer
is
cursor cur is select * from table(f1);
rec cur%rowtype;
begin
open cur;
loop
fetch cur into rec;
exit when cur%NOTFOUND;
-- do something with rec
end loop;
close cur;
end;
procedure Calc1(rec in out nocopy ctx)
is
begin
-- do some calculation on ctx
-- or enrich with additional data
-- (from other tables, etc)
end;
function f1
return ctx_list pipelined
is
cursor cur is select value(t) from table(f2) t;
rec ctx;
begin
open cur;
loop
fetch cur into rec;
exit when cur%NOTFOUND;
Calc1(rec);
pipe row(rec);
end loop;
close cur;
return;
end;
procedure Calc2(rec in out nocopy ctx)
is
begin
-- do some calculation on ctx
-- or enrich with additional data
-- (from other tables, etc)
end;
function f2
return ctx_list pipelined
is
cursor cur is select value(t) from table(f3) t;
rec ctx;
begin
open cur;
loop
fetch cur into rec;
exit when cur%NOTFOUND;
Calc2(rec);
pipe row(rec);
end loop;
close cur;
return;
end;
[...] f3, f4, f5, ...
procedure CalcX(rec in out nocopy ctx)
is
begin
-- do some calculation on ctx
-- or enrich with additional data
-- (from other tables, etc)
end;
function fX
return ctx_list pipelined
is
cursor cur is select value(t) from table(fDataSource) t;
rec ctx;
begin
open cur;
loop
fetch cur into rec;
exit when cur%NOTFOUND;
CalcX(rec);
pipe row(rec);
end loop;
close cur;
return;
end;
function fDataSource
return ctx_list pipelined
is
rec ctx := ctx(null, null);
begin
for i in 1..10000 loop
rec.x := i;
rec.y := 2*i;
pipe row(rec);
end loop;
return;
end;
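For the pseudo-code above to compile, ctx and ctx_list would have to be SQL-level types, because a pipelined function must return a collection of a SQL object type. A minimal sketch, with the attribute names x and y taken from fDataSource:

```sql
-- Assumed SQL-level type definitions backing the pseudo-code above
create type ctx as object (
  x number,
  y number
);
/
create type ctx_list as table of ctx;
/
```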
The sequence of processing stages starts with fDataSource, then runs
through fX, fX-1, fX-2, ... f3, f2, f1, and ends up in fConsumer.
(Don't get confused by the reverse order from X down to 1.)
What I expected was that Oracle would be able to use separate worker
threads for the individual stages (f1, f2, f3, ..., fX). And I thought
it was quite a good model to localize access to different resources
(tables) in the fX/CalcX function pairs - so they don't interfere.
Essentially a setup similar to a message-passing interface like CSP
(communicating sequential processes). But it seems Oracle does not
work that way. It doesn't scale with this design; in fact it slows
down significantly for each additional fX/CalcX pair that I introduce!
It speeds up again if I collapse the functions together into this:
procedure fConsumer
is
cursor cur is select value(t) from table(fDataSource) t;
rec ctx;
begin
open cur;
loop
fetch cur into rec;
exit when cur%NOTFOUND;
CalcX(rec);
[...]
Calc2(rec);
Calc1(rec);
-- do something with rec
end loop;
close cur;
end;
Could someone confirm this?
Am I missing some additional option/flag/parameter for setting up
pipelined functions?
Or is this something not supported by 10g but that will be in releases
after 10g?
Or is this just the wrong approach(?!)
- many thanks!
cheers,
Frank
> i use a chain of pipelined functions to separate an entire task into
> a sequence of processing stages - like this: (abstract, just pseudo-
> code, i'll try to post some working pl/sql code next - but i hope you
> get the point)
Sorry, I skipped looking at your code so far.
What version of 10g exactly? Are you patched up (somewhat? ... not at
all?)?
What platform etc.? Have you looked for existing bugs on Metalink?
On 8 Oct., 23:51, John Hurley <hurleyjo...@yahoo.com> wrote:
> [...]
>
> What version exactly of 10g? Are you patched up ( somewhat? ... not
> at all ? )?
Here it is:
SQL> select * from v$version;
BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bi
PL/SQL Release 10.2.0.4.0 - Production
CORE 10.2.0.4.0 Production
TNS for Solaris: Version 10.2.0.4.0 - Production
NLSRTL Version 10.2.0.4.0 - Production
Unfortunately I have no rights to execute 'opatch' on the system.
Is there another way to get a list of patches - e.g. via 'connect
sys as sysdba'?
(However, I assume that our IT department is up to date with Oracle
patches for the given Oracle version.)
>
> What platform etc? Have you looked for existing bugs on metalink?
The platform is:
host> uname -a
SunOS padsol40 5.10 Generic_141444-09 sun4u sparc SUNW,SPARC-
Enterprise Solaris
But what do you mean by 'bugs on Metalink'?
- thanks!
best regards,
Frank
I think your slow-down could simply be caused by the fact that each of
your pipelined table functions uses single-row processing via the
OPEN / FETCH / CLOSE loop pattern, while your collapsed version does
this only once. You may want to try the same thing using more
efficient bulk processing (FETCH ... BULK COLLECT INTO ...) when
fetching from the cursor.
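A sketch of what one stage could look like with bulk fetching; the ctx/ctx_list types and the f2/Calc1 names are taken from the original post, and the LIMIT of 100 is an arbitrary choice:

```sql
function f1
  return ctx_list pipelined
is
  -- value(t) yields the ctx object instance rather than individual columns
  cursor cur is select value(t) from table(f2) t;
  type ctx_tab is table of ctx;
  l_rows ctx_tab;
begin
  open cur;
  loop
    -- fetch up to 100 rows per round trip instead of one at a time
    fetch cur bulk collect into l_rows limit 100;
    exit when l_rows.count = 0;
    for i in 1 .. l_rows.count loop
      Calc1(l_rows(i));
      pipe row (l_rows(i));
    end loop;
  end loop;
  close cur;
  return;
end;
```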
By the way, the recommended way to design cascaded pipelined table
functions is to pass a cursor as a parameter into the function - that
way you can write something like
select * from table(f(cursor(select * from table(g(cursor(select *
from table(h(cursor(some_query)))))))))
where each pipelined table function produces the output that is
consumed by the next table function as input. This can even be
performed in parallel, using parallel execution and a corresponding
declaration of the table function and its input cursor
(parallel_enable partition by ...)
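A sketch of such a cursor-driven stage - all names here are assumptions, and the record type mirrors the x/y attributes from the earlier pseudo-code. The strongly typed ref cursor also allows the parallel_enable declaration:

```sql
create or replace package stages is
  type ctx_rec is record (x number, y number);
  type ctx_rc  is ref cursor return ctx_rec;

  function f (p ctx_rc) return ctx_list pipelined
    parallel_enable (partition p by any);
end stages;
/
create or replace package body stages is
  function f (p ctx_rc) return ctx_list pipelined
    parallel_enable (partition p by any)
  is
    r ctx_rec;
  begin
    loop
      fetch p into r;
      exit when p%notfound;
      -- Calc-style processing on r would go here
      pipe row (ctx(r.x, r.y));
    end loop;
    close p;
    return;
  end f;
end stages;
/
-- chained invocation; each stage consumes the previous stage's output
select *
  from table(stages.f(cursor(
         select x, y from table(stages.f(cursor(
           select x, y from some_source_query))))));
```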
For more information about the concepts, see e.g. here:
http://download.oracle.com/docs/cd/E11882_01/appdev.112/e17126/tuning.htm#LNPLS915
More details on parallel processing:
http://download.oracle.com/docs/cd/E11882_01/appdev.112/e10765/pipe_paral_tbl.htm#i1004978
This is taken from the 11.2 docs but most of it applies to your 10.2
version as well.
Regards,
Randolf
Oracle related stuff blog:
http://oracle-randolf.blogspot.com/
Co-author of the "OakTable Expert Oracle Practices" book:
http://www.apress.com/book/view/1430226684
http://www.amazon.com/Expert-Oracle-Practices-Database-Administration/dp/1430226684
On 9 Oct., 22:58, Randolf Geist <mah...@web.de> wrote:
>
> I think your slow-down could simply be caused by the fact that each
> of your pipelined table functions uses single-row processing via the
> OPEN / FETCH / CLOSE loop pattern, while your collapsed version does
> this only once. You may want to try the same thing using more
> efficient bulk processing (FETCH ... BULK COLLECT INTO ...) when
> fetching from the cursor.
>
> By the way, the recommended way to design cascaded pipelined table
> functions is to pass a cursor as a parameter into the function - that
> way you can write something like
>
> select * from table(f(cursor(select * from table(g(cursor(select *
> from table(h(cursor(some_query)))))))))
>
> where each pipelined table function produces the output that is
> consumed by the next table function as input. This can even be
> performed in parallel, using parallel execution and a corresponding
> declaration of the table function and its input cursor
> (parallel_enable partition by ...)
>
> For more information about the concepts, see e.g. here:
> [...]
Many thanks for these hints!
I am pretty sure they will speed up my implementation.
However, I am still wondering whether Oracle sets up separate
processing instances (what I called 'worker threads') for each stage
of the chained pipelined functions, i.e. for each pipelined function.
I understand now that the OPEN / FETCH / CLOSE loop pattern is slow.
But the chain should not get slower again just by adding more stages
of this kind. Overall, it should only be as slow as its slowest stage
- IF Oracle used a separate processing instance for each pipelined
function.
About the 'parallel execution' you mention above: I have the feeling
this is a different kind of parallelism. It is about partitioning the
entire data stream from the source function to the consuming function,
but still using a single processing instance to execute the entire
chain for each set of the partitioned data - correct?
My point here is that functional decomposition may be closely related
to decomposition of the accessed resources. From that point of view,
separate processing units per (chained) pipelined function would be
quite useful in terms of performance.
But I have the feeling that this kind of parallelism - separate
processing units per pipelined function - is not supported(?!)
best regards,
Frank
Correct - as far as I know, Oracle doesn't support this kind of
processing model out of the box. You can either have the whole chain
processed by a serial execution (a single process), or use parallel
processing to have the whole chain processed in multiple "partitions",
where each process takes care of a part of the whole set - see the
images found by following the documentation links provided. However,
you need to understand that the model you seem to have in mind
potentially suffers from the inter-process communication required to
stream the data between the different "worker threads" - the same
issue Oracle faces with its own parallel processing model, where a
similar "producer - consumer" model is possible, but only one
producer/consumer set is supported at a time.
I will. To be honest, I am a bit disappointed by this.
It breaks my approach of scaling up a multi-stage processing chain
that deals with different resources (tables) in the different
processing stages. So I will have to change it the way you described
before and set up parallelism by data partitioning.
> However you need to understand that the model that you
> seem to have in mind potentially suffers from the required inter-
> process communication that needs to stream the data between the
> different "worker threads" - the same issue that Oracle has when using
> its parallel processing model where a similar "producer - consumer"
> model is possible, however it only supports one producer and consumer
> set at the same time.
Actually, I am not generally concerned about inter-process
communication here. The context data (records) would have to be
buffered in queues (a kind of table) again anyway - but that could
even be "in memory". And when passing a UNIFORM context structure
through all the stages: 1) only one stage will access a given context
record at a time, so there is no need for access synchronization
(e.g. locking). Moreover, 2) the inter-process messages could be cut
down to transferring only a tracking_id for a context record, not the
entire context record itself.
Isn't BEA MessageQ in oracle's portfolio? *smile*
best regards,
Frank
Do you have evidence that Oracle's approach breaks your processing
chain - that is, have you traced your execution to understand where
excess work potentially happens, or where most of the time is spent?
It is very fortunate that Oracle is instrumented so well: you don't
have to assume anything, but can get hard facts about where most of
the time is spent.
If you use the approach outlined above (the cascaded table function
model mentioned, not your original approach), then all the pipelined
table functions are really executed in a cascaded way, which looks to
me like quite an efficient approach. You would potentially be
CPU-bound, since only one process works on the whole processing chain,
but you need to be sure that CPU really is the problem here and not
something else, like I/O operations.
> the inter-process messages could be cut down to
> transfer only a tracking_id for a context record, not the entire
> context record itself.
> Isn't BEA MessageQ in oracle's portfolio? *smile*
Hm, and where is the actual data then stored that is pipelined between
the different processing worker threads? I think this is much more
complex than your simple description here: either you have to pipeline
the whole data stream, or you have to keep that data separate, which
means additional overhead for maintaining and accessing this data
again at each processing stage.
No need to go for Enterprise Manager necessarily - you can trace your
session using extended SQL trace (event 10046, or the DBMS_MONITOR
package from 10g on). If you need a detailed break-down of your PL/SQL
activity, there is the hierarchical PL/SQL profiler DBMS_HPROF
introduced with 11g. It generates almost no overhead and is very
easy to use, see http://download.oracle.com/docs/cd/E11882_01/appdev.112/e17125/adfns_profiler.htm#ADFNS02302.
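For reference, a minimal DBMS_HPROF session could look like this (11g onwards; the directory object and file names are assumptions, and the DBMSHP_* analysis tables must have been created via $ORACLE_HOME/rdbms/admin/dbmshptab.sql):

```sql
-- start profiling into a server-side file (directory object name assumed)
exec dbms_hprof.start_profiling(location => 'PROF_DIR', filename => 'pipeline.trc');

-- run the workload to be profiled
select count(*) from table(f1);

exec dbms_hprof.stop_profiling;

-- load the raw profile into the DBMSHP_* tables; returns a run id
declare
  l_runid number;
begin
  l_runid := dbms_hprof.analyze(location => 'PROF_DIR',
                                filename => 'pipeline.trc');
  dbms_output.put_line('run id = ' || l_runid);
end;
/
```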
> Most important for me was to get a _confirmation_ that there is no
> assignment of chained pipelined functions to different CPUs, threads
> or the like, but that one "process" executes the entire chain.
> So I don't go further in the wrong direction.
If you want to build something similar to what you have in mind, with
different worker threads working on the different chain steps, you
could use Advanced Queuing together with DBMS_JOB / DBMS_SCHEDULER to
submit multiple worker threads. AQ offers "buffered in-memory queues"
that might come in handy for that purpose. I have used such AQ worker
threads quite successfully in the past to parallelize tasks that
otherwise would run serially and therefore didn't make efficient use
of the available hardware.
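A very rough sketch of that AQ / scheduler pattern - all queue, job and procedure names are made up, and error handling, stop conditions and the buffered-message options are omitted:

```sql
-- one queue per stage; payload is the ctx object type from the original post
begin
  dbms_aqadm.create_queue_table(queue_table        => 'stage1_qt',
                                queue_payload_type => 'CTX');
  dbms_aqadm.create_queue(queue_name => 'stage1_q', queue_table => 'stage1_qt');
  dbms_aqadm.start_queue(queue_name => 'stage1_q');
end;
/
-- a worker: dequeue, process, pass on to the next stage's queue
create or replace procedure stage1_worker is
  deq_opts dbms_aq.dequeue_options_t;
  props    dbms_aq.message_properties_t;
  msgid    raw(16);
  rec      ctx;
begin
  loop
    -- blocks until a message arrives (default wait is FOREVER)
    dbms_aq.dequeue(queue_name         => 'stage1_q',
                    dequeue_options    => deq_opts,
                    message_properties => props,
                    payload            => rec,
                    msgid              => msgid);
    Calc1(rec);
    -- enqueue rec onto the next stage's queue here
    commit;
  end loop;
end;
/
-- submit two concurrent workers for this stage
begin
  for i in 1 .. 2 loop
    dbms_scheduler.create_job(job_name   => 'stage1_worker_' || i,
                              job_type   => 'STORED_PROCEDURE',
                              job_action => 'STAGE1_WORKER',
                              enabled    => true);
  end loop;
end;
/
```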