Sub-optimal performance of a UDF

Mahesh S

unread,

Dec 4, 2006, 11:52:36 AM12/4/06

to

Hi

I recently wrote a user defined function in SQl to run in db2 and it
works fine. But I seem to get a warning saying that the performance is
sub-optimal.

Here is the warning message:

SQL0437W Performance of this complex query may be sub-optimal. Reason
code:
"2". SQLSTATE=01602

I have performed run statistics operation but am still getting the
warning.

Is there a list of things that I need to look into gain best
performance?

Thanks
Mahesh

Mahesh S

unread,

Dec 4, 2006, 11:56:02 AM12/4/06

to

I am producing the UDF code as it may help you guys in pointing out
what I am doing wrong. Here is the UDF code:

CREATE FUNCTION RISK_EQUATION_1( @patient_id VARCHAR(36) )
RETURNS double
------------------------------------------------------------------------
-- SQL UDF (Scalar)
------------------------------------------------------------------------
F1: BEGIN ATOMIC
DECLARE @x varchar(36);
DECLARE @z char;
DECLARE @hdl, @family_history, @smoke, @waist double;
DECLARE x1, x2, x3, x4, x5, x6, GmaxScore, answer double;
DECLARE start_time, end_time, diff_time timestamp ;

FOR myrow1 AS select PATIENT_VISIT_ID from
HEALTHCAREDB.PATIENT_VISIT_INFO_VIEW where PATIENT_ID = @patient_id DO
SET @x = myrow1.PATIENT_VISIT_ID;
END FOR;

FOR myrow2 AS select RC_7_13_1_DFH from HEALTHCAREDB.FAMILY_HISTORY
where PATIENT_ID = @patient_id DO
SET @z = myrow2.RC_7_13_1_DFH;
END FOR;

FOR myrow3 AS select RCO_4_01_0_2_W, RLD_5_06_3_HDLCL,
RL_14_02_1_SS from HEALTHCAREDB.PATIENT_VISIT_INFO_VIEW where
PATIENT_VISIT_ID = @x DO
SET @waist = myrow3.RCO_4_01_0_2_W;
SET @hdl = myrow3.RLD_5_06_3_HDLCL;
SET @smoke = myrow3.RL_14_02_1_SS;
END FOR;

IF @z = 'Y'
THEN SET @family_history = 1;
ELSE SET @family_history = 0;
END IF;

set x1 = @family_history;
set x2 = @family_history;
set x3 = @smoke;
set x4 = 0;
set x5 = 0;

set x2 = x3 - x2;
set x1 = HEALTHCAREDB.MAXIMUM(x1, x2);
set x2 = @hdl;
set x1 = HEALTHCAREDB.MAXIMUM(x1, x2);

set x2 = 1.735981;
set x3 = @family_history;
set x2 = x3 - x2;

if @waist = 104.0 then
set x3 = 1;
else set x3 = 0;
end if;

set x4 = @smoke;
if @waist = 103 then
set x5 = 1;
else set x5 = 0;
end if;

set x6 = 1.15732;
if x6 >= x5 then
set x5 = 1;
else set x5 = 0;
end if;

set x4 = HEALTHCAREDB.MAXIMUM(x4, x5);
set x3 = x4 - x3;
set x2 = x3 - x2;

if (x1 != 0) then
set x1 = x2 / x1;
else set x1 = 1;
end if;

if (@waist = 103.0) then
set x2 = 1;
else set x2 = 0;
end if;

set x1 = x2 - x1;
set x2 = @hdl;

set x1 = x2 - x1;

if (@waist = 104) then
set x2 = 1;
else set x2 = 0;
end if;

set x3 = 1.374318;

if (@waist = 104) then
set x4 = 1;
else set x4 = 0;
end if;

set x3 = x3 + x4;
set x2 = x2 + x3;
set x1 = HEALTHCAREDB.MAXIMUM(x1, x2);

set GmaxScore = x1;

set x1 = -0.4324524 * GmaxScore + 3.230582;

SET answer = ( 1 / (1 + x1));

return ( answer );

END

Knut Stolze

unread,

Dec 4, 2006, 12:17:42 PM12/4/06

to

Mahesh S wrote:

> I am producing the UDF code as it may help you guys in pointing out
> what I am doing wrong. Here is the UDF code:
>
> CREATE FUNCTION RISK_EQUATION_1( @patient_id VARCHAR(36) )
> RETURNS double
> ------------------------------------------------------------------------
> -- SQL UDF (Scalar)
> ------------------------------------------------------------------------
> F1: BEGIN ATOMIC
> DECLARE @x varchar(36);
> DECLARE @z char;
> DECLARE @hdl, @family_history, @smoke, @waist double;
> DECLARE x1, x2, x3, x4, x5, x6, GmaxScore, answer double;
> DECLARE start_time, end_time, diff_time timestamp ;
>
> FOR myrow1 AS select PATIENT_VISIT_ID from
> HEALTHCAREDB.PATIENT_VISIT_INFO_VIEW where PATIENT_ID = @patient_id DO
> SET @x = myrow1.PATIENT_VISIT_ID;
> END FOR;

Stupid question: why do you use a loop here? A simple

SET @x = ( SELECT ... FROM ... WHERE ... FETCH FIRST 1 ROWS ONLY )

would accomplish the same task as your loop.

> FOR myrow2 AS select RC_7_13_1_DFH from HEALTHCAREDB.FAMILY_HISTORY
> where PATIENT_ID = @patient_id DO
> SET @z = myrow2.RC_7_13_1_DFH;
> END FOR;

see above.

> FOR myrow3 AS select RCO_4_01_0_2_W, RLD_5_06_3_HDLCL,
> RL_14_02_1_SS from HEALTHCAREDB.PATIENT_VISIT_INFO_VIEW where
> PATIENT_VISIT_ID = @x DO
> SET @waist = myrow3.RCO_4_01_0_2_W;
> SET @hdl = myrow3.RLD_5_06_3_HDLCL;
> SET @smoke = myrow3.RL_14_02_1_SS;
> END FOR;

see above.

> IF @z = 'Y'
> THEN SET @family_history = 1;
> ELSE SET @family_history = 0;
> END IF;
>
> set x1 = @family_history;
> set x2 = @family_history;
> set x3 = @smoke;
> set x4 = 0;
> set x5 = 0;
>
> set x2 = x3 - x2;
> set x1 = HEALTHCAREDB.MAXIMUM(x1, x2);
> set x2 = @hdl;
> set x1 = HEALTHCAREDB.MAXIMUM(x1, x2);

How exactly is the function HEALTHCAREDB.MAXIMUM implemented?

> set x2 = 1.735981;
> set x3 = @family_history;
> set x2 = x3 - x2;
>
> if @waist = 104.0 then
> set x3 = 1;
> else set x3 = 0;
> end if;

You should consider using CASE expressions (here and in a few other places):

SET x3 = CASE
WHEN @waist = 104.0
THEN 1
ELSE 0
END;

Maybe you could use NULLs instead of 0 (if it makes sense).

--
Knut Stolze
DB2 Information Integration Development
IBM Germany

Serge Rielau

unread,

Dec 4, 2006, 1:21:28 PM12/4/06

to

as Knut notes

>>
>> set x1 = @family_history;
>> set x2 = @family_history;
>> set x3 = @smoke;
>> set x4 = 0;
>> set x5 = 0;

SET x1 = @family_history,
x2 = @family_history,
x3 = @smoke,
x4 = 0,
x5 = 0;

Note that DB2 says performance MAY be suboptimal.
It doesn't say it is.

In general try to avoid IF THEN ELSE in SQL UDF or TRIGGERS.
If your logic is complex use a stored procedure and CALL it from
the UDF.
Keep in mind that SQL UDF (and Triggers) are inlined into the execution
plan.
My rule of thumb is that any logic thing that doesn't fit onto a screen
belongs into a stored proc.

Cheers
Serge
--
Serge Rielau
DB2 Solutions Development
IBM Toronto Lab

WAIUG Conference
http://www.iiug.org/waiug/present/Forum2006/Forum2006.html

Mahesh S

unread,

Dec 5, 2006, 10:42:29 AM12/5/06

to

Hi

Thanks for the suggestions.

I have a few more questions regarding the performance of the UDF.

If I run the above UDF as a java function (call the database for each
query wriiten above in the UDF and do the computation within java
code), it runs faster than the actual execution of the UDF which is
very strange. The time taken for the java code happens to be around 20
ms. When I execute the UDF (select function(col) from table hwre
condition), it takes roughly 300 ms. Why is there so much disparity?
I assumed that the execution of the UDF will be faster as the querying
and computation is taking place closer to where the data resides.

Any pointers?

Cheers
Mahesh

Serge Rielau

unread,

Dec 5, 2006, 11:18:43 AM12/5/06

to

Execute the same statement a second time. I suspect you are spending
your time in compile. Running it twice will show the runtime cost.
SQL UDF are inlined into the plan while your Java UDf is pre-compiled.

Mahesh S

unread,

Dec 5, 2006, 11:23:17 AM12/5/06

to

Hi Serge

Well, I have a for loop running and I execute the function with a
different parameter each time. I have run the UDF in this manner
hundred times and the average execution time hovers around 300 ms.

When completely implemented in java as a function (the function makes 4
database calls for querying and then computes the result), the average
execution time hovers around 20 ms.

Firstly, am I correct in my assumption that the execution will be
faster using a UDF rather than querying the database over a network and
then computing?

Thanks
Mahesh

Mahesh S

unread,

Dec 5, 2006, 11:28:58 AM12/5/06

to

Just to clarify this bit in the above post

"Well, I have a for loop running and I execute the function with a
different parameter each time. I have run the UDF in this manner
hundred times and the average execution time hovers around 300 ms."

This is how I am calling the UDF from within java code:

------------------------------------------------------
Date start_date = new Date();

String sql1 = "VALUES HEALTHCAREDB.RISK_EQUATION_1('" +
rs.getString("PATIENT_ID") + "')";

Statement stmt1 = DB2Connection.createStatement();
stmt1.execute(sql1);
Date end_date = new Date();
long diff = end_date.getTime() - start_date.getTime();

System.out.println(diff);
------------------------------------------------------

Thats how I call the UDF.

Knut Stolze

unread,

Dec 5, 2006, 11:55:19 AM12/5/06

to

Mahesh S wrote:

> Just to clarify this bit in the above post
>
> "Well, I have a for loop running and I execute the function with a
> different parameter each time. I have run the UDF in this manner
> hundred times and the average execution time hovers around 300 ms."
>
> This is how I am calling the UDF from within java code:
>
> ------------------------------------------------------
> Date start_date = new Date();
>
> String sql1 = "VALUES HEALTHCAREDB.RISK_EQUATION_1('" +
> rs.getString("PATIENT_ID") + "')";

Switch to parameter markers. Here you have a different value for the
parameter given to the UDF. Thus, DB2 will see a different statement text
and re-compile the statement each time.

Besides, parameter markers are safer because the string in PATIENT_ID is not
part of the SQL statement and, therefore, cannot suddenly change the
statement itself. For example, let's assume that the string is:

' || ( SELECT password FROM yourtable WHERE ... ) || '

The overall statement would now be:

VALUES HEALTHCAREDB.RISK_EQUATION_1('...' ||
( SELECT password FROM yourtable WHERE ... ) || '')

Mahesh S

unread,

Dec 5, 2006, 12:14:07 PM12/5/06

to

Hi Knut

This is what I did after reading your mail but am getting an exception
and am not able to figure out whats wrong.

This is the code:

PreparedStatement pstmt =
DB2Connection.getConnection().prepareStatement("VALUES
HEALTHCAREDB.RISK_EQUATION_1(?)");
pstmt.setString(1,rs.getString("PATIENT_ID"));

pstmt.execute();

This is the exception I get:

com.ibm.db2.jcc.a.SqlException: DB2 SQL error: SQLCODE: -418, SQLSTATE:
42610, SQLERRMC: null
at com.ibm.db2.jcc.a.rf.e(rf.java:1680)
at com.ibm.db2.jcc.a.rf.a(rf.java:1239)
at com.ibm.db2.jcc.b.jb.h(jb.java:139)
at com.ibm.db2.jcc.b.jb.a(jb.java:43)
at com.ibm.db2.jcc.b.w.a(w.java:30)
at com.ibm.db2.jcc.b.cc.g(cc.java:160)
at com.ibm.db2.jcc.a.rf.n(rf.java:1219)
at com.ibm.db2.jcc.a.sf.gb(sf.java:1790)
at com.ibm.db2.jcc.a.sf.d(sf.java:2266)
at com.ibm.db2.jcc.a.sf.Z(sf.java:1298)
at com.ibm.db2.jcc.a.sf.execute(sf.java:1282)
at Test.main(Test.java:41)

Any pointers?

Thanks
Mahesh

Serge Rielau

unread,

Dec 5, 2006, 12:47:34 PM12/5/06

to

DB2 supports function overloading. If it doesn't knwo teh typ eit gets lost.
Try VALUES HEALTHCAREDB.RISK_EQUATION_1(CAST(? AS <sometype>))

Mahesh S

unread,

Dec 6, 2006, 6:35:13 AM12/6/06

to

yes, that works fine now.. great.. thanks a lot..