The diag log is:
2010-06-04-16.21.19.718000+120 I415102H474 LEVEL: Info
PID : 3892 TID : 1428 PROC : db2syscs.exe
INSTANCE: DB2 NODE : 000 DB : SYSTG001
APPHDL : 0-737 APPID: *LOCAL.DB2.100604142108
AUTHID : MAJITEL
EDUID : 1428 EDUNAME: db2agent (SYSTG001)
FUNCTION: DB2 UDB, database application extension for utili, transport, probe:7588
MESSAGE : Transport:Begin Extract DDL phase
2010-06-04-16.21.31.250000+120 E415578H636 LEVEL: Warning
PID : 3892 TID : 1428 PROC : db2syscs.exe
INSTANCE: DB2 NODE : 000 DB : SYSTG001
APPHDL : 0-737 APPID: *LOCAL.DB2.100604142108
AUTHID : MAJITEL
EDUID : 1428 EDUNAME: db2agent (SYSTG001)
FUNCTION: DB2 UDB, data management, sqldEndNoLogList, probe:1
MESSAGE : ADM5530W The COMMIT processing of table "SYSTOOLS.DB2LOOK_INFO" that
used NOT LOGGED INITIALLY has been initiated. It is recommended that
you take a backup of this table's table space(s).
2010-06-04-16.21.33.468000+120 E416216H736 LEVEL: Warning
PID : 3892 TID : 1428 PROC : db2syscs.exe
INSTANCE: DB2 NODE : 000 DB : SYSTG001
APPHDL : 0-737 APPID: *LOCAL.DB2.100604142108
AUTHID : MAJITEL
EDUID : 1428 EDUNAME: db2agent (SYSTG001)
FUNCTION: DB2 UDB, catcache support, sqlrlc_check_available_memory, probe:100
MESSAGE : ADM4000W A catalog cache overflow condition has occurred. There is
no error but this indicates that the catalog cache has exceeded the
configured maximum size. If this condition persists, you may want to
adjust the CATALOGCACHE_SZ DB configuration parameter.
2010-06-04-16.21.33.796000+120 I416954H529 LEVEL: Error
PID : 3892 TID : 1428 PROC : db2syscs.exe
INSTANCE: DB2 NODE : 000 DB : SYSTG001
APPHDL : 0-737 APPID: *LOCAL.DB2.100604142108
AUTHID : MAJITEL
EDUID : 1428 EDUNAME: db2agent (SYSTG001)
FUNCTION: DB2 UDB, database application extension for utili, transport_extractDDLtoFile, probe:6083
MESSAGE : Transport:Invalid command type
DATA #1 : unsigned integer, 4 bytes
0
2010-06-04-16.21.33.812000+120 I417485H472 LEVEL: Info
PID : 3892 TID : 1428 PROC : db2syscs.exe
INSTANCE: DB2 NODE : 000 DB : SYSTG001
APPHDL : 0-737 APPID: *LOCAL.DB2.100604142108
AUTHID : MAJITEL
EDUID : 1428 EDUNAME: db2agent (SYSTG001)
FUNCTION: DB2 UDB, database application extension for utili, transport, probe:7695
MESSAGE : Transport:End Extract DDL phase
2010-06-04-16.21.33.812000+120 I417959H594 LEVEL: Error
PID : 3892 TID : 1428 PROC : db2syscs.exe
INSTANCE: DB2 NODE : 000 DB : SYSTG001
APPHDL : 0-737 APPID: *LOCAL.DB2.100604142108
AUTHID : MAJITEL
EDUID : 1428 EDUNAME: db2agent (SYSTG001)
FUNCTION: DB2 UDB, database utilities, sqludExtractAndSaveDDLForTransport, probe:2743
MESSAGE : SQL10007N Message "2146303891" could not be retrieved. Reason code: "4".
DATA #1 : String, 43 bytes
Transport: Error calling TRANSPORT(EXTRACT)
2010-06-04-16.21.33.812000+120 E418555H850 LEVEL: Severe
PID : 1832 TID : 1448 PROC : db2bp.exe
INSTANCE: DB2 NODE : 000
APPID : *LOCAL.DB2.100604142108
EDUID : 1448
FUNCTION: DB2 UDB, database utilities, sqludTransportExtractDDLStagingDB, probe:1183
MESSAGE : ZRC=0xFFFFF5E2=-2590
DATA #1 : String, 44 bytes
Error during transport DDL extraction phase.
DATA #2 : SQLCA, PD_DB2_TYPE_SQLCA, 136 bytes
sqlcaid : SQLCA sqlcabc: 136 sqlcode: -2590 sqlerrml: 4
sqlerrmc: 16
sqlerrp : SQL09072
sqlerrd : (1) 0x00000000 (2) 0x00000000 (3) 0x00000000
(4) 0x00000000 (5) 0x00000000 (6) 0x00000000
sqlwarn : (1) (2) (3) (4) (5) (6)
(7) (8) (9) (10) (11)
sqlstate:
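For anyone reading along, the record headers and the ZRC line above can be picked apart mechanically. Below is a minimal Python sketch, assuming the header layout seen in these records (timestamp, record ID, LEVEL). The names HEADER_RE, parse_header and zrc_to_signed are made up for illustration and are not part of any DB2 tool. It also shows why ZRC=0xFFFFF5E2 reads as -2590: the hex value is a 32-bit two's-complement number.

```python
import re

# Matches a record header such as:
#   2010-06-04-16.21.33.812000+120 E418555H850 LEVEL: Severe
# (layout assumed from the records posted in this thread)
HEADER_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}-\d{2}\.\d{2}\.\d{2}\.\d{6}[+-]\d{3})\s+"
    r"(?P<recid>\S+)\s+LEVEL:\s*(?P<level>\w+)"
)


def parse_header(line):
    """Return a dict with ts, recid and level, or None if line is not a header."""
    match = HEADER_RE.match(line)
    return match.groupdict() if match else None


def zrc_to_signed(hex_text):
    """Interpret a hex string as a signed 32-bit integer."""
    value = int(hex_text, 16) & 0xFFFFFFFF
    # Values with the top bit set are negative in two's complement.
    return value - 0x100000000 if value & 0x80000000 else value


if __name__ == "__main__":
    print(parse_header("2010-06-04-16.21.33.812000+120 E418555H850 LEVEL: Severe"))
    print(zrc_to_signed("0xFFFFF5E2"))
```

Filtering a saved db2diag.log for records at level Error or Severe is then just a matter of calling parse_header on each line and checking the "level" field.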
If this does not help, please set DIAGLEVEL to 4 and run 'db2diag -A' to
archive the current log so that you start with a clean one.
Run the transport operation again, zip the db2diag.log and attach it to your
next post. You might have to open a PMR, but let's decide that after I have
checked your next db2diag.log.
On 4.6.2010 10:47, hsn_ wrote:
> 2010-06-04-16.21.33.468000+120 E416216H736 LEVEL: Warning
> PID : 3892 TID : 1428 PROC : db2syscs.exe
> INSTANCE: DB2 NODE : 000 DB : SYSTG001
> APPHDL : 0-737 APPID: *LOCAL.DB2.100604142108
> AUTHID : MAJITEL
> EDUID : 1428 EDUNAME: db2agent (SYSTG001)
> FUNCTION: DB2 UDB, catcache support, sqlrlc_check_available_memory, probe:100
> MESSAGE : ADM4000W A catalog cache overflow condition has occurred. There is
> no error but this indicates that the catalog cache has exceeded the
> configured maximum size. If this condition persists, you may want to
> adjust the CATALOGCACHE_SZ DB configuration parameter.
--
Helmut K. C. Tessarek
DB2 Performance and Development
/*
Thou shalt not follow the NULL pointer for chaos and madness
await thee at its end.
*/
After setting DIAGLEVEL to 4 and deleting the old diag log files, db2 does
not start anymore. I am attaching a trace of db2start. It looks like
there is a lock conflict on the db2 diag log file. The funny thing is that
db2start conflicts with itself (the PID is the same, maybe the thread is
different): the file is exclusively locked by the same process that is
trying to grab the lock. OS: Windows XP
> After setting DIAGLEVEL to 4 and deleting the old diag log files, db2 does
> not start anymore. I am attaching a trace of db2start. It looks like
> there is a lock conflict on the db2 diag log file. The funny thing is that
> db2start conflicts with itself (the PID is the same, maybe the thread is
> different): the file is exclusively locked by the same process that is
> trying to grab the lock. OS: Windows XP
I was not able to reproduce your problem. I set diaglevel to 4 and did a
'db2diag -A'. db2stop and db2start worked without a problem.
I also deleted the db2diag.log. db2stop and db2start worked as well.
> http://rapidshare.com/files/396595597/db2start.zip
What kind of file is this? An export from the event log? A trace is done via
the db2trc program.
Anyway, I think you should open a PMR. Your problem does not make any sense to
me and I'm not a Windows person.
I looked into our bug database and one customer had this problem too:
with diaglevel 4 on Windows, db2 does not start anymore.
It's a timing-dependent race condition - most likely 2 threads in
db2start fight over the lock on the diag file. It appears more often at
diaglevel 4 because the diag lock has to be grabbed sooner due to the more
extensive log output.
>
> Anyway, I think you should open a PMR. Your problem does not make any sense to
> me and I'm not a Windows person.
This error should be in the Unix version too, unless on Unix locks are
process-wide instead of file-descriptor-wide or thread-wide.
Ok, looked a little bit more detailed than the Event Log anyway. :-)
Unfortunately I can't see what is going on in the DB2 code from this trace.
I would need a db2trc dump, formatted as format and flow trace.
> I looked into our bug database and one customer had this problem too:
> with diaglevel 4 on Windows, db2 does not start anymore.
> It's a timing-dependent race condition - most likely 2 threads in
> db2start fight over the lock on the diag file. It appears more often at
> diaglevel 4 because the diag lock has to be grabbed sooner due to the more
> extensive log output.
According to your procmon output, the locking issue has something to do with
rotating the logs. It does not seem to be a problem with db2diag.log itself.
If you change DIAGLEVEL to 3 again, db2start works? If you set it then to 4,
db2start fails again? How does it fail? Does it hang? If yes, a stack trace
would be nice. db2pd -stack all
This is a problem that should be handled via a PMR. If you have a PMR, please
let me know the number.
It contains db2trace output and a procmon syscall trace dump.
You need procmon from http://technet.microsoft.com/cs-cz/sysinternals/bb896645%28en-us%29.aspx
to view it.
db2pd -stack -all hangs producing no output.
> If you change DIAGLEVEL to 3 again, db2start works?
yes
> If you set it then to 4, db2start fails again?
no
> How does it fail? Does it hang?
yes
I think I found a way to re-create this issue: set diaglevel to 4, then
db2stop, then delete .db2diag.rotate.lck and db2diag.*.log.
db2start then hangs. Let me know if it works for you too.
I am not aware that we have opened a PMR for it; we adopted "don't
run db2 with diaglevel 4 on Windows" as part of our best practices.
Thank you for the data.
> db2pd -stack -all hangs producing no output.
Hmm, this should not happen either. Very odd.
>> If you change DIAGLEVEL to 3 again, db2start works?
> yes
At least it is not a permanent error. :-)
>> If you set it then to 4, db2start fails again?
> no
Now, I'm confused. When does db2start fail? Only, if you set DIAGLEVEL to 4
and delete the .db2diag.rotate.lck and db2diag.*.log files?
It does not fail, if you don't delete the files?
So you can actually set it to 4 and start db2 somehow?
> I think I found a way to re-create this issue: set diaglevel to 4, then
> db2stop, then delete .db2diag.rotate.lck and db2diag.*.log.
> db2start then hangs. Let me know if it works for you too.
I tried it, but I'm still not able to reproduce it. But I tried it on Win32.
What is your 'db2level' output?
Can you please also post the output of 'db2 get dbm cfg'?
As soon as I'm able to reproduce the problem, I'll send the data and the
problem description to the owner of this component.
> I am not aware that we have opened a PMR for it; we adopted "don't
> run db2 with diaglevel 4 on Windows" as part of our best practices.
Usually DIAGLEVEL 3 is more than enough, but for deeper problem analysis it
makes sense to change it to 4.
C:\IBM\SQLLIB\BIN>db2pd -stack all
Unable to attach to database manager. Please ensure db2start has been
run.
But db2start hangs before the instance is fully started, so db2pd probably
waits until db2 finishes its startup sequence. If you want, I can
procmon db2pd to see what it is waiting for.
> Now, I'm confused. When does db2start fail? Only, if you set DIAGLEVEL to 4
> and delete the .db2diag.rotate.lck and db2diag.*.log files?
Yes.
I have now tested it again, and it also fails if db2diag.0.log exists but
is zero bytes long, so there is no need to delete it or the rotate lock file.
> It does not fail, if you don't delete the files?
Yes.
> So you can actually set it to 4 and start db2 somehow?
If db2diag.0.log is longer than 0 bytes, then it starts successfully.
Otherwise you need to kill the hanging db2start and then run
db2 update dbm cfg using diaglevel 3
which sets diaglevel back to 3 but takes a very long time to finish,
about 15 minutes. Then you can start db2 again without a problem.
> I tried it, but I'm still not able to reproduce it. But I tried it on Win32.
My OS is Windows XP 32-bit.
> What is your 'db2level' output?
My db2 is 9.7.2, but the error is in 9.5 too. Our customer report is
from a person running 9.5.3.
> Can you please also post the output of 'db2 get dbm cfg'?
C:\IBM\SQLLIB\BIN>db2 get dbm cfg
Database Manager Configuration
Node type = Database Server with local and remote clients
Database manager configuration release level = 0x0d00
Maximum total of files open (MAXTOTFILOP) = 16000
CPU speed (millisec/instruction) (CPUSPEED) = 2.519169e-007
Max number of concurrently active databases (NUMDB) = 8
Federated Database System Support (FEDERATED) = NO
Transaction processor monitor name (TP_MON_NAME) =
Default charge-back account (DFT_ACCOUNT_STR) =
Java Development Kit installation path (JDK_PATH) = C:\IBM\SQLLIB\java\jdk
Diagnostic error capture level (DIAGLEVEL) = 3
Notify Level (NOTIFYLEVEL) = 3
Diagnostic data directory path (DIAGPATH) =
Size of rotating db2diag & notify logs (MB) (DIAGSIZE) = 20
Default database monitor switches
Buffer pool (DFT_MON_BUFPOOL) = OFF
Lock (DFT_MON_LOCK) = ON
Sort (DFT_MON_SORT) = OFF
Statement (DFT_MON_STMT) = OFF
Table (DFT_MON_TABLE) = OFF
Timestamp (DFT_MON_TIMESTAMP) = ON
Unit of work (DFT_MON_UOW) = OFF
Monitor health of instance and databases (HEALTH_MON) = ON
SYSADM group name (SYSADM_GROUP) =
SYSCTRL group name (SYSCTRL_GROUP) =
SYSMAINT group name (SYSMAINT_GROUP) =
SYSMON group name (SYSMON_GROUP) =
Client Userid-Password Plugin (CLNT_PW_PLUGIN) =
Client Kerberos Plugin (CLNT_KRB_PLUGIN) = IBMkrb5
Group Plugin (GROUP_PLUGIN) =
GSS Plugin for Local Authorization (LOCAL_GSSPLUGIN) =
Server Plugin Mode (SRV_PLUGIN_MODE) = UNFENCED
Server List of GSS Plugins (SRVCON_GSSPLUGIN_LIST) =
Server Userid-Password Plugin (SRVCON_PW_PLUGIN) =
Server Connection Authentication (SRVCON_AUTH) = NOT_SPECIFIED
Cluster manager (CLUSTER_MGR) =
Database manager authentication (AUTHENTICATION) = SERVER
Alternate authentication (ALTERNATE_AUTH_ENC) = NOT_SPECIFIED
Cataloging allowed without authority (CATALOG_NOAUTH) = NO
Trust all clients (TRUST_ALLCLNTS) = YES
Trusted client authentication (TRUST_CLNTAUTH) = CLIENT
Bypass federated authentication (FED_NOAUTH) = NO
Default database path (DFTDBPATH) = C:
Database monitor heap size (4KB) (MON_HEAP_SZ) = AUTOMATIC(66)
Java Virtual Machine heap size (4KB) (JAVA_HEAP_SZ) = 2048
Audit buffer size (4KB) (AUDIT_BUF_SZ) = 0
Size of instance shared memory (4KB) (INSTANCE_MEMORY) = AUTOMATIC(399591)
Backup buffer default size (4KB) (BACKBUFSZ) = 1024
Restore buffer default size (4KB) (RESTBUFSZ) = 1024
Agent stack size (AGENT_STACK_SZ) = 128
Minimum committed private memory (4KB) (MIN_PRIV_MEM) = 32
Private memory threshold (4KB) (PRIV_MEM_THRESH) = 20000
Sort heap threshold (4KB) (SHEAPTHRES) = 0
Directory cache support (DIR_CACHE) = YES
Application support layer heap size (4KB) (ASLHEAPSZ) = 15
Max requester I/O block size (bytes) (RQRIOBLK) = 32767
Query heap size (4KB) (QUERY_HEAP_SZ) = 1000
Workload impact by throttled utilities (UTIL_IMPACT_LIM) = 10
Priority of agents (AGENTPRI) = SYSTEM
Agent pool size (NUM_POOLAGENTS) = AUTOMATIC(100)
Initial number of agents in pool (NUM_INITAGENTS) = 0
Max number of coordinating agents (MAX_COORDAGENTS) = AUTOMATIC(200)
Max number of client connections (MAX_CONNECTIONS) = AUTOMATIC(MAX_COORDAGENTS)
Keep fenced process (KEEPFENCED) = YES
Number of pooled fenced processes (FENCED_POOL) = AUTOMATIC(MAX_COORDAGENTS)
Initial number of fenced processes (NUM_INITFENCED) = 0
Index re-creation time and redo index build (INDEXREC) = RESTART
Transaction manager database name (TM_DATABASE) = 1ST_CONN
Transaction resync interval (sec) (RESYNC_INTERVAL) = 180
SPM name (SPM_NAME) = RADIM
SPM log size (SPM_LOG_FILE_SZ) = 256
SPM resync agent limit (SPM_MAX_RESYNC) = 20
SPM log path (SPM_LOG_PATH) =
NetBIOS Workstation name (NNAME) =
TCP/IP Service name (SVCENAME) = 50000
Discovery mode (DISCOVER) = SEARCH
Discover server instance (DISCOVER_INST) = ENABLE
SSL server keydb file (SSL_SVR_KEYDB) =
SSL server stash file (SSL_SVR_STASH) =
SSL server certificate label (SSL_SVR_LABEL) =
SSL service name (SSL_SVCENAME) =
SSL cipher specs (SSL_CIPHERSPECS) =
SSL versions (SSL_VERSIONS) =
SSL client keydb file (SSL_CLNT_KEYDB) =
SSL client stash file (SSL_CLNT_STASH) =
Maximum query degree of parallelism (MAX_QUERYDEGREE) = 1
Enable intra-partition parallelism (INTRA_PARALLEL) = NO
No. of int. communication buffers (4KB) (FCM_NUM_BUFFERS) = AUTOMATIC(1024)
No. of int. communication channels (FCM_NUM_CHANNELS) = AUTOMATIC(512)
db2start/db2stop timeout (min) (START_STOP_TIME) = 15
> As soon as I'm able to reproduce the problem, I'll send the data and the
> problem description to the owner of this component.
You can send him this report anyway; he might be able to find out what is
going on. The hang depends on the log file size being 0 (it starts fine when
the size is >0). It is not tied to the log rotation function because it
hangs with diagsize 0 too.
Yes, you are right. Totally forgot about that.
> But db2start hangs before the instance is fully started, so db2pd probably
> waits until db2 finishes its startup sequence. If you want, I can
> procmon db2pd to see what it is waiting for.
No, that's ok.
> If db2diag.0.log is longer than 0 bytes, then it starts successfully.
At least there is a workaround. You can always add some characters to the
file... :-)
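A minimal sketch of that workaround in Python, to be run before db2start. The helper name ensure_nonempty and the single newline used as filler are my own assumptions; the only thing taken from this thread is that db2start was reported to start fine once db2diag.0.log is longer than 0 bytes.

```python
import os


def ensure_nonempty(path, filler=b"\n"):
    """Append filler if path is missing or zero bytes long.

    Returns True if something was written, False if the file
    already had content and was left untouched.
    """
    if os.path.exists(path) and os.path.getsize(path) > 0:
        return False
    # "ab" creates the file if it does not exist and never truncates it.
    with open(path, "ab") as handle:
        handle.write(filler)
    return True
```

Call it with the full path to db2diag.0.log in your diagnostic directory, for example ensure_nonempty(r"d:\db2dump\db2diag.0.log") if that is where your DIAGPATH points.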
>> I tried it, but I'm still not able to reproduce it. But I tried it on Win32.
> my os is windows xp 32 bit
I also tried it on WinXP 32bit with DB2 9.7.2.
> Diagnostic error capture level (DIAGLEVEL) = 3
> Notify Level (NOTIFYLEVEL) = 3
> Diagnostic data directory path (DIAGPATH) =
> Size of rotating db2diag & notify logs (MB) (DIAGSIZE) = 20
Hmm, I am using my own diagpath (d:\db2dump). I also removed my diagpath value
and tried to reproduce the problem again. Without success.
> You can send him this report anyway; he might be able to find out what is
> going on. The hang depends on the log file size being 0 (it starts fine when
> the size is >0). It is not tied to the log rotation function because it
> hangs with diagsize 0 too.
I can send him this report, but if he is not able to reproduce the problem,
then I doubt that he can do something. I've been trying it now on 3 different
OS with 4 different DB2 versions/releases.
I have not been able to reproduce the problem even once.
queryopen
queryopen
createFile - creates a new file that is 0 bytes long
lockFile - offset 0, length 1, exclusive, dontwait. It fails because the
file is zero-sized and you can't lock a 1-byte range on a zero-sized file.
The procedure for diag file locking needs to be changed. Probably the best
way would be to use file-wide locks instead of range locks, or to query the
file size before using a range lock.
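To make the second suggestion concrete, here is a minimal sketch of "query the file size before using a range lock". This is not DB2's code. It uses POSIX fcntl.lockf from Python purely to show the control flow (it will not run on Windows), and lock_diag_file is a hypothetical name. Whether the 1-byte lock really fails on an empty file on a given platform is taken from the trace above and is not verified here.

```python
import fcntl
import os


def lock_diag_file(fd):
    """Take an exclusive, non-blocking lock on an open file descriptor.

    Checks the file size first: an empty file gets a whole-file lock,
    a non-empty file gets the 1-byte range lock at offset 0.
    Returns the name of the strategy that was used.
    """
    size = os.fstat(fd).st_size
    if size == 0:
        # With lockf, a length of 0 means "from start to end of file",
        # so this covers the whole file however large it grows.
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB, 0, 0, os.SEEK_SET)
        return "whole-file"
    # The byte at offset 0 exists, so the original 1-byte range is safe.
    fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB, 1, 0, os.SEEK_SET)
    return "range"
```

The file descriptor must be open for writing, since an exclusive lockf lock needs write access. If another process already holds a conflicting lock, the non-blocking call raises OSError instead of hanging.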