I'm running RepServer 12.6 warm standby to replicate two databases
between two ASE 12.5.4 (ESD 8) servers.
RepServer is running on the standby side, and I have a 1 GB stable
queue. At some point yesterday a transaction caused replication to
stop for one database: adding a user on the primary side of one of the
two databases (tmg460 in the output below) didn't replicate properly.
This didn't get caught until today (the monitoring scripts weren't
working properly), and in the meantime the stable queue filled
completely. Once the stable queue filled, the connections for both
databases failed. Luckily the monitoring on the second database
caught that failure, which alerted me to the original one.
So, I fixed the transaction that caused the first DSI thread to shut
down on tmg460, and resumed that connection. Now I can see the stable
queue gradually going down (it's at around 7% now after about an
hour), and the DSI thread and RepAgent on that database are both
"Active" in admin who.
However, the RepAgent thread on the other database (hgigrp) is "Down".
I've tried doing an sp_stop_rep_agent and sp_start_rep_agent on the
thread and it didn't come back up.
[sa] SYB_ST2SUXFN09.hgigrp.1> sp_stop_rep_agent hgigrp;
The Replication Agent thread for database 'hgigrp' is being stopped.
(return status = 0)
[sa] SYB_ST2SUXFN09.hgigrp.1> sp_stop_rep_agent hgigrp;
Msg 18422, Level 16, State 1
Server 'SYB_ST2SUXFN09', Procedure 'sp_stop_rep_agent', Line 158
The Replication Agent thread for database 'hgigrp' is not currently running.
(return status = 1)
[sa] SYB_ST2SUXFN09.hgigrp.1> sp_start_rep_agent hgigrp;
Replication Agent thread is started for database 'hgigrp'.
(return status = 0)
If I look in the rep server log now:
I. 2009/06/12 18:00:09. Replication Agent for SYB_ST2SUXFN09.hgigrp connected in passthru mode.
E. 2009/06/12 18:00:09. ERROR #14067 USER(REP_PH1SUXFN02_ra) - /execint.c(1810)
        Replication Agent for 'SYB_ST2SUXFN09.hgigrp' is in the process of disconnecting. Possibly wait and try again.
12Jun--Fri
6:00:24pm
This continues to repeat every few minutes, indefinitely.
[sa] REP_PH1SUXFN02.asproot.1> admin who;
Spid Name       State                Info
---- ---------- -------------------- ----------------------------------------
  19 DIST       Awaiting Wakeup      105 SYB_ST2SUXFN08.tmg460
  24 SQT        Awaiting Wakeup      105:1 DIST SYB_ST2SUXFN08.tmg460
  12 SQM        Awaiting Message     105:1 SYB_ST2SUXFN08.tmg460
  11 SQM        Awaiting Message     105:0 SYB_ST2SUXFN08.tmg460
  20 DIST       Awaiting Wakeup      102 SYB_ST2SUXFN09.hgigrp
  23 SQT        Awaiting Wakeup      102:1 DIST SYB_ST2SUXFN09.hgigrp
  10 SQM        Awaiting Wakeup      102:1 SYB_ST2SUXFN09.hgigrp
   9 SQM        Awaiting Message     102:0 SYB_ST2SUXFN09.hgigrp
  25 DSI EXEC   Awaiting Command     101(1) SYB_PH1SUXFN02.REP_PH1SUXFN02_RSS
  13 DSI        Awaiting Message     101 SYB_PH1SUXFN02.REP_PH1SUXFN02_RSSD
   8 SQM        Awaiting Message     101:0 SYB_PH1SUXFN02.REP_PH1SUXFN02_RSSD
  27 DSI EXEC   Awaiting Command     104(1) SYB_PH1SUXFN02.hgigrp
  14 DSI        Awaiting Message     104 SYB_PH1SUXFN02.hgigrp
  67 DSI EXEC   Active               108(1) SYB_PH1SUXFN02.tmg460
  66 DSI        Awaiting Command     108 SYB_PH1SUXFN02.tmg460
  28 DSI EXEC   Awaiting Command     106(1) SYB_ST2SUXFN08.tmg460
  16 DSI        Awaiting Message     106 SYB_ST2SUXFN08.tmg460
  68 REP AGENT  Awaiting Command     SYB_ST2SUXFN08.tmg460
  29 DSI EXEC   Awaiting Command     103(1) SYB_ST2SUXFN09.hgigrp
  17 DSI        Awaiting Message     103 SYB_ST2SUXFN09.hgigrp
     REP AGENT  Down                 SYB_ST2SUXFN09.hgigrp
  18 dSUB       Sleeping
   6 dCM        Awaiting Message
   7 dAIO       Awaiting Message
  21 dREC       Sleeping             dREC
  22 dSTATS     Sleeping
 110 USER       Awaiting Command     REP_PH1SUXFN02_ra
 111 USER       Active               sa
   5 dALARM     Awaiting Wakeup
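Since my monitoring missed the first failure, here is roughly the kind of check I have in mind for this output. It is only a minimal Python sketch, not my actual monitoring script. It assumes each thread's row from admin who arrives on a single line (not wrapped the way my terminal pasted it), and the function name and the choice of "Down" and "Suspended" as the states to flag are my own assumptions.

```python
# Hypothetical helper: scan captured `admin who` text for stopped threads.
# Assumes one thread per line; wrapped rows would need to be rejoined first.
BAD_STATES = ("Down", "Suspended")  # assumed set of states worth alerting on


def find_down_threads(admin_who_text):
    """Return (name, state, info) tuples for rows whose State is in BAD_STATES."""
    hits = []
    for line in admin_who_text.splitlines():
        tokens = line.split()
        for state in BAD_STATES:
            if state in tokens:
                i = tokens.index(state)
                # Drop the numeric spid, if any (a Down RepAgent has none).
                name = " ".join(t for t in tokens[:i] if not t.isdigit())
                info = " ".join(tokens[i + 1:])
                hits.append((name, state, info))
                break
    return hits


if __name__ == "__main__":
    sample = (
        "  68 REP AGENT  Awaiting Command     SYB_ST2SUXFN08.tmg460\n"
        "     REP AGENT  Down                 SYB_ST2SUXFN09.hgigrp\n"
    )
    for name, state, info in find_down_threads(sample):
        print(name, state, info)
```

Run against the two REP AGENT rows above, this would flag only the hgigrp RepAgent.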
[sa] REP_PH1SUXFN02.asproot.1> admin disk_space;
Partition                    Logical      Part.Id  Total Segs  Used Segs  State
---------------------------- ------------ -------- ----------- ---------- ---------
/syb_data1/stablequeue.dat   stablequeue  101      1000        73         ON-LINE//
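For reference, the "around 7%" figure I quoted is just Used Segs divided by Total Segs from the output above. A check like the following could have warned me before the queue filled. It is a minimal Python sketch only: the function names and the 80% threshold are hypothetical, and it assumes the two segment counts have already been pulled out of admin disk_space.

```python
# Hypothetical stable-queue usage check based on admin disk_space columns.
def stable_queue_usage(total_segs, used_segs):
    """Return the percentage of stable queue segments currently in use."""
    if total_segs <= 0:
        raise ValueError("total_segs must be positive")
    return 100.0 * used_segs / total_segs


def queue_alert(total_segs, used_segs, threshold_pct=80.0):
    """True when usage is at or above threshold_pct (an assumed threshold)."""
    return stable_queue_usage(total_segs, used_segs) >= threshold_pct


if __name__ == "__main__":
    # Values from the admin disk_space output above: 1000 total, 73 used.
    print(round(stable_queue_usage(1000, 73), 1))
    print(queue_alert(1000, 73))
```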
Any ideas?
So, I ended up just restarting rep server and everything recovered
fine...but I'd love any ideas as to what might have caused this or
things I can try next time.