This is causing overflow on our data warehouse queries that are
automatically generated by a point and click tool (such as Cognos), and we
cannot ask end users to customize the SQL to cast the result as BIGINT,
because it is all point and click and they never see the SQL.
Any chance of getting DB2 to automatically make SUM function result into
BIGINT if the table column is INT? How do I request such an enhancement from
IBM?
Having to define the detail columns as BIGINT just because a column may be
SUM'ed will waste a lot space in our warehouse since we have billions of
rows.
According to some googling, i found that DB2 functions can be overloaded
So you should be able (but i dont know how) to create your own SUM()
function that always returns a BIGINT
--
Luuk
to clarify more,
there is an "How to Create User-Defined Functions" in COGNOS
(i found it at: http://apsdwh.vlaanderen.be/cognos/help/hwtcrtudfs.pdf)
i think this is a problem for COGNOS, or any reporting tool...?
--
Luuk
I tried it and successed.
Here are examples.
(Please see data type of result column educated_manpower.)
------------------------- Commands Entered -------------------------
CREATE FUNCTION db2admin.SUM( INTEGER )
SOURCE sysibm.SUM( BIGINT )
RETURNS BIGINT
;
--------------------------------------------------------------------
DB20000I The SQL command completed successfully.
------------------------- Commands Entered -------------------------
VALUES current path;
--------------------------------------------------------------------
1
--------------------------------------------------
"SYSIBM","SYSFUN","SYSPROC","SYSIBMADM","DB2ADMIN"
1 record(s) selected.
------------------------- Commands Entered -------------------------
SET PATH = "DB2ADMIN" , "SYSIBM","SYSFUN","SYSPROC","SYSIBMADM";
--------------------------------------------------------------------
DB20000I The SQL command completed successfully.
------------------------- Commands Entered -------------------------
DESCRIBE
SELECT workdept
, SUM( INT(edlevel) ) AS educated_manpower
, SUM(salary + bonus) AS yearly_payment
FROM employee
GROUP BY
workdept
;
--------------------------------------------------------------------
Column Information
Number of columns: 3
SQL type Type length Column name
Name length
-------------------- ----------- ------------------------------
-----------
453 CHARACTER 3
WORKDEPT 8
493 BIGINT 8
EDUCATED_MANPOWER 17
485 DECIMAL 31, 2
YEARLY_PAYMENT 14
------------------------- Commands Entered -------------------------
SELECT workdept
, SUM( INT(edlevel) ) AS educated_manpower
, SUM(salary + bonus) AS yearly_payment
FROM employee
GROUP BY
workdept
;
--------------------------------------------------------------------
WORKDEPT EDUCATED_MANPOWER YEARLY_PAYMENT
-------- -------------------- ---------------------------------
A00 51 271000.00
B01 18 95050.00
C01 54 242370.00
D11 149 516500.00
D21 93 323820.00
E01 16 80975.00
E11 71 237090.00
E21 60 217210.00
8 record(s) selected.
/**************************************************
********** Return CURRENT PATH ********************
**************************************************/
------------------------- Commands Entered -------------------------
SET PATH = "SYSIBM","SYSFUN","SYSPROC","SYSIBMADM","DB2ADMIN";
--------------------------------------------------------------------
DB20000I The SQL command completed successfully.
------------------------- Commands Entered -------------------------
DESCRIBE
SELECT workdept
, SUM( INT(edlevel) ) AS educated_manpower
, SUM(salary + bonus) AS yearly_payment
FROM employee
GROUP BY
workdept
;
--------------------------------------------------------------------
Column Information
Number of columns: 3
SQL type Type length Column name
Name length
-------------------- ----------- ------------------------------
-----------
453 CHARACTER 3
WORKDEPT 8
497 INTEGER 4
EDUCATED_MANPOWER 17
485 DECIMAL 31, 2
YEARLY_PAYMENT 14
I don't understand how a custom function is going to be used by a
point-and-click report generator.
I don't understand how a point-and-click report writer (which only
understands standard SQL functions) is going to generate the above
expression. Maybe I am missing something.
BTW, you don't need to cast a SUM as INT, since that is the default of doing
a SUM, even if the base column is a SMALLINT. The problem is that a SUM of
INT also returns INT (instead of BIGINT).
If current_path was SYSIBM","SYSFUN","SYSPROC","SYSIBMADM","DB2ADMIN",
sysibm.SUM was used for SUM( INT(edlevel) ) in same query.
So, it is possible to let DB2 use db2admin.SUM instead of sysibm.SUM
without modifying queries(wheather the queries were generated by tools
or constructed by users/programmers/administrators), by updating
current_path.
>
> BTW, you don't need to cast a SUM as INT, since that is the default of doing
> a SUM, even if the base column is a SMALLINT. The problem is that a SUM of
> INT also returns INT (instead of BIGINT).
I used INT(edlevel), because I couldn't find INT column in DB2
supplied SAMPLE tables.
"Tonkuma" explained that in another thread....
(if you dont see it, look here: http://tinyurl.com/2w54s7x)
Basically you define a new function for the user of this report-regenerator,
give it the name SUM
and make it return a BIGINT (which contains, of course, the SUM)
--
Luuk
I did the following:
1. created the new function as DB2INST1.SUM that converts SUM to BIGINT
2. db2 SET PATH = "DB2INST1","SYSIBM","SYSFUN","SYSPROC","SYSIBMADM"
3. db2 select sum(edlevel) from emp -- this returns an INT
4. db2 select db2inst1.sum(edlevel) from emp -- this returns a BIGINT
I need number 3 above to return a BIGINT. I cannot have the query tool
customize the SQL in any way.
I guessed that data type of edlevel in emp is SMALLINT.
Another my guess was that you created the new function as
DB2INST1.SUM( INT ).
If so, best fit for sum(edlevel) in number 3 above is sysibm.sum.
Because sysibm.sum accept SMALLINT directly, while db2inst1.sum need
to convert SMALLINT to INT.
If all my guess looks appropriate, you can confirm my guess by using
sum( INT(edlevel) ) in number 3.
Or change data type of edlevel in emp to INT.
Yes it is SMALLINT. So what? The IBM supplied SUM function on a SMALLINT
returns a INT. The SUM funciton on an INT returns an INT, and a SUM function
on a BIGINT returns a BIGINT. That's they way DB2 LUW 9.7 FP2 works.
> Another my guess was that you created the new function
> as DB2INST1.SUM( INT ).
> If so, best fit for sum(edlevel) in number 3 above is sysibm.sum.
> Because sysibm.sum accept SMALLINT directly, while db2inst1.sum need
> to convert SMALLINT to INT.
>
> If all my guess looks appropriate, you can confirm my guess by using
> sum( INT(edlevel) ) in number 3.
> Or change data type of edlevel in emp to INT.
I have no idea what you are talking about. The problem is that, even though
I created a UDF called DB2INST1.SUM that does always return a BIGINT as I
want, it will not get invoked by using the SUM(COL-NAME), unless I specify
DB2INST1.SUM(COL_NAME). But I can't do that since the select statemnet is
created by the query tool and is not hand-coded.
Such a change would break a large number of existing applications.
So your request will yield a near "over my dead body" answer ;-)
The behavior of SUM is well documented and when a tool is summing INTs
and there is a fear that may overflow then it shall cast to a higher
type. And that may not even be BIGINT, but DECFLOAT(34).
To fix an existing behavior the proposal to overload is a good one.
Not that in DB2 9.7 FP3 there is also a "connect procedure" which allows
you to force e.g. a SET PATH statement onto any session.
Cheers
Serge
--
Serge Rielau
SQL Architect DB2 for LUW
IBM Toronto Lab
How about a environment variable to override the current SUM function that
would casue the SUM to return a BIGINT. This would be useful for Data
Warehouse apps where query tools generate the SQL and we cannot cast the
columns becasue the tool generates the SQL on the fly for ad-hoc users.
I tried to create my own SUM funciton (DB2INST1.SUM) that works fine, but
cannot override the search order with the PATH statement that would cause it
to be invoked with just referencing SUM (as I mentioned in my previous
post). Am I missing something?
ok, i finnally tries it out meself... ;-)
i'm using database "TEST"
i did create 2 functions:
CONNECT TO TEST;
CREATE FUNCTION luuk.SUM( SMALLINT )
SOURCE sysibm.SUM( BIGINT )
RETURNS BIGINT
;
CONNECT TO TEST;
CREATE FUNCTION luuk.SUM( INTEGER )
SOURCE sysibm.SUM( BIGINT )
RETURNS BIGINT
;
DESCRIBE SELECT * FROM TEST
returns:
SQL type;Type length; Column name; Name length
497 INTEGER; 4; TESTINT; 7
501 SMALLINT; 2; TESTSMALL; 9
after:
set path = "LUUK","SYSIBM","SYSFUN","SYSPROC","SYSIBMADM"
this:
DESCRIBE SELECT SUM(TESTINT) AS sum FROM TEST
returned:
SQL type; Type length; Column name; Name length
493 BIGINT; 8; SUM; 3
this:
DESCRIBE SELECT SUM(TESTSMALL) AS sum FROM TEST
returned:
SQL type; Type length; Column name; Name length
493 BIGINT; 8; SUM; 3
My conclusion is that this works,
The only thing you have to find out
(and my DB2 knowlede is too limited to know)
it this default PATH can be changed.....
--
Luuk
Yes, we already know that creating and executing the functions work if the
schema name of the new function is specified. That was absolutely clear from
the previous posts if you read them.
The problem is in getting the newly defined function called <schema>.SUM to
be invoked when SUM is specified in the SQL. To say that this is "the only
thing ...." is a gross understatement.
sorry, it was not that clear to /me
cause will be the combination of my English, and my knowledge of DB2
> The problem is in getting the newly defined function called<schema>.SUM to
> be invoked when SUM is specified in the SQL. To say that this is "the only
> thing ...." is a gross understatement.
>
>
Do you expect any other problems?
I do mean, when only the PATH of this 1 user changes, all other users of
the same database should not notice anything.
--
Luuk
This is what Martin posted:
I did the following:
1. created the new function as DB2INST1.SUM that converts SUM to BIGINT
2. db2 SET PATH = "DB2INST1","SYSIBM","SYSFUN","SYSPROC","SYSIBMADM"
3. db2 select sum(edlevel) from emp -- this returns an INT
4. db2 select db2inst1.sum(edlevel) from emp -- this returns a BIGINT
Notice that even after setting the path in step 2, that step 3 does not work
as desired (returns INT).
Step 4 works (returns BIGINT) when explicitly naming the new UDF (including
schema name).
You can't say "SET PATH not worked as expected" from Martin's test
steps.
(Matching data type is precedent than schema selection using
current_path.)
DB2 will choose db2inst1.SUM for sum( INT ).
You can test this by changing data type of edlevel to INT or by using
INT column(or expression) for sum(expression) in your query.
If you couldn't understand my explanations(including my previous
post),
you should read carefully "Function resolution", "Determining the set
of candidate functions" and "Determining the best fit" in Chapter 2.
Language elements in the manual "DB2 SQL Reference, Volume 1".
lucky /me.... my first reading make me think this,
thats why i started testing tooo.
my tests shows changing PATH works....
>
> DB2 will choose db2inst1.SUM for sum( INT ).
>
> You can test this by changing data type of edlevel to INT or by using
> INT column(or expression) for sum(expression) in your query.
>
> If you couldn't understand my explanations(including my previous
> post),
> you should read carefully "Function resolution", "Determining the set
> of candidate functions" and "Determining the best fit" in Chapter 2.
> Language elements in the manual "DB2 SQL Reference, Volume 1".
>
>
nah, too much IBM documents..... ;-)
--
Luuk
Is this a client feature or server feature? Do you know where it is
documented?
OK, my apologies. It does work when doing a SUM on INT column with my own
UDF and the set path does work. I didn't understand (or I didn't believe)
that there was a distiction between SUMing a SMALLINT and INT, but there
obviously is.
Now I just need to to figure out how to automatically set the path for
ad-hoc users (they will not do it manually). Looks like 9.7 PF 3 may do
that, but waiting on FP3 for the Linux 64-bit version for AMD/Intel to be
released.
This is a server feature, implemented as a db configuration parameter.
See: http://bit.ly/c6T8ay
--
Helmut K. C. Tessarek
DB2 Performance and Development
/*
Thou shalt not follow the NULL pointer for chaos and madness
await thee at its end.
*/
db2 => CONNECT RESET;
DB20000I The SQL command completed successfully.
db2 => CONNECT TO TEST;
Database Connection Information
Database server = DB2/NT 9.7.3
SQL authorization ID = SERGE
Local database alias = TEST
db2 => VALUES CURRENT PATH;
1
---------------------------------------------------------
"MYSTUFF","SYSIBM","SYSFUN","SYSPROC","SYSIBMADM","SERGE"
1 record(s) selected.
db2 =>