Script to find any open old transactions

frit...@gmail.com

unread,

Jun 8, 2015, 12:53:10 PM6/8/15

to

Hello all.

I recently had a issue where one of my databases had a transaction log full state and after some poking around, I discovered that it was because there was some really old uncommitted transaction out there hogging a bunch of log. So I was able to terminate the connection and then everything recovered nicely.

So, I decided to incorporate a new bit into my monitoring scripts to check for the oldest transaction, and if the connection was more than a day old, to e-mail myself an alert. Here's what the bit of script looks like:

AP=`db2 get snapshot for db on $DBNAME | grep -i oldest | sed "s/.*= //"`
if [ -n "$AP" ] ; then
ID=`db2 get snapshot for application agentid $AP | grep -c "idle.*day"`
if [ $ID -eq 1 ] ; then
echo "Idle application $AP has open transaction exceeding 24 hours" | mail -s "Idle application
s on $DBNAME" -r "$EMFROM" "$EMAILTO"
fi
fi

The bit of script as I see it should check to see if there is an oldest transaction, then get the snapshot for the application having the oldest transaction and if it as been idle for more than a day, it should e-mail the alert.

The bit of script generally works just fine except for one thing. Every once in a while, I get a single alert. Note that the script runs every 5 minutes. So, I go to the db in question and look at the snapshot for the offending application. I can see that it has indeed been idle for a very long time. But the question is, how could that connection have been idle for such a long time and have just had the oldest transaction -- when the same connection did not have the oldest transaction 5 minutes ago? I'm not sure if I'm explaining the situation well, so let me try to give an example:

1) At noon, I check the database for oldest tranaction using the bit of script above. There is no oldest transaction given in the snapshot.
2) At 12:05, I check again. I get the alert for a certain connection say agentid 100.
3) I go in and do a snapshot for application agentid 100. It says it's been idle for 10 days.

So, the question is if it's been idle for 10 days, how could that same application be holding the oldest transaction when there was no oldest transaction at all as of 5 minutes ago? The weird thing about it is that when this happens, it is always for one of the db2 system connections... "db2fw##" (where the ## represents some number). This seems kinda strange, I was just wondering if anyone else had run into this.

Oh, also, this runs on AIX, DB2 version 10.1.2.

Thanks for the help.
Greig Wise

Lennart Jonsson

unread,

Jun 8, 2015, 3:08:38 PM6/8/15

to

Not sure why your script fails from time to time, but I think there are
better ways to detect "old" transactions. I use the following to monitor
old transactions and also get some information about them:

select
x.agent_id,
substr(z.corr_TOKEN,1,12) as ip,
cast(y.stmt_text as varchar(800)) as stmt,
x.locks_held,
x.APPL_IDLE_TIME,
x.UOW_START_TIME
from sysibmadm.SNAPAPPL x
join sysibmadm.snapstmt y
on x.agent_id = y.agent_id
join sysibmadm.SNAPAPPL_INFO z
on x.agent_id = z.agent_id
where UOW_STOP_TIME is null
and TIMESTAMPDIFF(4,CHAR(current_timestamp - UOW_START_TIME)) > 2
order by UOW_START_TIME

This will detect transactions older than 2 minutes (4 is minutes in
timestampdiff, exchange for other value as needed)

If you are only interested in whether there exists transactions or not
you could do something like:

db2 connect to $DBNAME
db2 "select 1 from sysibmadm.SNAPAPPL where UOW_STOP_TIME is null and
TIMESTAMPDIFF(8,CHAR(current_timestamp - UOW_START_TIME)) > 24" > /dev/null
if [ $? -ne 1 ]; then
# rows found, i.e. transactions exist

echo "Idle application $AP has open transaction exceeding 24 hours"
| mail -s "Idle application
s on $DBNAME" -r "$EMFROM" "$EMAILTO"
fi

db2 connect reset

$? from db2 is bit encoded as:

0 success
1 no rows found
2 warning such as creating a primary key reused an existing index
4 db2 error
8 os error

So you might want to handle additional return values such as 3.
Everything >= 4 means that something is really wrong so you might want
to send an error alert for those cases as well

You might also want to have a look at the newer GET_MON_... functions

Just some thoughts that might give you some ideas.

Cheers
/Lennart

Lennart Jonsson

unread,

Jun 8, 2015, 3:15:22 PM6/8/15

to

On 06/08/2015 09:08 PM, Lennart Jonsson wrote:
[...]

>
> db2 connect to $DBNAME
> db2 "select 1 from sysibmadm.SNAPAPPL where UOW_STOP_TIME is null and
> TIMESTAMPDIFF(8,CHAR(current_timestamp - UOW_START_TIME)) > 24" > /dev/null
> if [ $? -ne 1 ]; then
> # rows found, i.e. transactions exist
> echo "Idle application $AP has open transaction exceeding 24 hours"
> | mail -s "Idle application
> s on $DBNAME" -r "$EMFROM" "$EMAILTO"
> fi
>

Alternative:

AP=$(db2 -x "select agent_id from sysibmadm.SNAPAPPL where UOW_STOP_TIME
is null and TIMESTAMPDIFF(2,current_timestamp - UOW_START_TIME) > 1")

[...]