Rep Server is stuck

Fredrik Berglin

Jul 17, 2003, 2:45:54 AM

Cindy Hodgins wrote:
>
> And one more thing: replication is actually working for two of the
> replicate databases, but not for the others. Now what's that about!

Hi.

I've had problems similar to this one, but unfortunately I don't think there is any one solution to it. There are too many factors involved. In my case replication was actually working, but so slowly that it seemed stuck.

I would take a good long look at what the RepServer is trying to accomplish in the replicates, using sp_showplan. Perhaps it's getting stuck due to bad indexes, which causes a table scan, which in turn causes it to block other processes.

Have you run update stats lately?

Are the replicates identical in all respects? (Hardware/OS/ASE ... etc)

Are all replicated databases Warm Standby setups or are you using definitions/subscriptions?

Regards.
//Fredrik Berglin

--
------------------------------------------------------
Read the source, Luke.
------------------------------------------------------
Fredrik Berglin fredrik...@commentor.nospam.se
Commentor AB Tel: 031 - 701 19 00
Pusterviksgatan 3-9 Cell: 0707 - 48 64 08
413 01 Göteborg Fax: 031 - 711 51 25
------------------------------------------------------

Michael Peppler

Jul 17, 2003, 5:50:07 AM

On Thu, 17 Jul 2003 08:45:54 +0200, Fredrik Berglin wrote:

> Cindy Hodgins wrote:
>> [quoted text muted]

>
> Hi.
>
> I've had problems similar to this one but unfortunately I don't think
> there is any one solution to it. There are too many factors involved.
> Replication was actually working but working so slow it seemed stuck.
>
> I would take a good long look at what the RepServer is trying to
> accomplish in the replicates by sp_showplan. Perhaps it's getting stuck
> due to bad indexes, which causes a table scan, which causes it to block
> other processes.

Yep - I've had the same thing, a delete on an unindexed table (yes, stupid
of me) that I ran in a loop on the primary, and which just filled up the
stable queue because the standby needed to do a table scan for each row
that was being deleted.

Michael
--
Michael Peppler Data Migrations, Inc.
mpep...@peppler.org http://www.mbay.net/~mpeppler
Sybase T-SQL/OpenClient/OpenServer/C/Perl developer available for short or
long term contract positions - http://www.mbay.net/~mpeppler/resume.html

Jason

Jul 17, 2003, 6:33:12 AM

Might be a good idea to do a sp_showplan <spid>,null,null,null on the
replicate ASE server to see if the maint user is doing table scans.
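
If the plan output is long, a small script can pick out the table scans. Here is a minimal sketch in Python. It assumes the sp_showplan text has been saved to a string, and that it follows the usual ASE layout where a "FROM TABLE" line is followed by the table name and later by a "Table Scan." line. The function name is made up for illustration, and the layout is worth checking against your own server's output.

```python
def tables_with_table_scans(showplan_text):
    """Return the table names that the plan text reads with a table scan.

    Assumed layout (verify on your server): a line "FROM TABLE", then the
    table name on the next non-blank line, then the access method lines.
    """
    scanned = []
    current_table = None
    expect_table_name = False
    for raw in showplan_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if expect_table_name:
            # The first non-blank line after "FROM TABLE" names the table.
            current_table = line
            expect_table_name = False
        elif line == "FROM TABLE":
            expect_table_name = True
        elif line.startswith("Table Scan") and current_table is not None:
            if current_table not in scanned:
                scanned.append(current_table)
    return scanned
```

Any table this returns for the maint user's spid is a candidate for a missing or unusable index on the replicate.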

Peter Simandl

Jul 16, 2003, 11:56:12 PM

Hello,
check in ASE what is blocking your rep server spid. It may even be some
zombie process. BTW, I suppose you have checked admin who_is_down just to
be sure.
HTH,
Peter

Cindy Hodgins wrote:
> I have a Rep Server 12.0 that replicates one small database to multiple
> servers, all ASE 12.0. No problems till today. Replication appears to
> be stuck for all but two replicate databases, and replication is flowing
> for those two. admin who,sqt returns the same numbers so I know it's
> not working. At least two of the replicate servers (of course it's the
> two most important servers, one being production, the other our major
> test system) are showing blk_spid with the repsrvr id. It's causing
> problems on that test system, and I'm already worried that I'll find out
> it's causing problems on production. I can find no errors in any sybase
> logs, and bouncing the entire repserver machine didn't help. I even
> tried bouncing our major test system machine, to no avail. Any ideas
> are GREATLY appreciated. Thanks.
>
> Cindy
>

Cindy Hodgins

Jul 17, 2003, 3:54:40 PM

Jason wrote:
> Michael Peppler wrote:
>
>> On Thu, 17 Jul 2003 08:45:54 +0200, Fredrik Berglin wrote:
>>
>>
>>> Cindy Hodgins wrote:
>>>
>>>> [quoted text muted]
>>>
>>>
>>> Hi.
>>>
>>> I've had problems similar to this one but unfortunately I don't think
>>> there is any one solution to it. There are too many factors involved.
>>> Replication was actually working but working so slow it seemed stuck.
>>>
>>> I would take a good long look at what the RepServer is trying to
>>> accomplish in the replicates by sp_showplan. Perhaps it's getting stuck
>>> due to bad indexes, which causes a table scan, which causes it to block
>>> other processes.
>>
>>

Hadn't thought about that. I'll definitely check.

>>
>> Yep - I've had the same thing, a delete on an unindexed table (yes,
>> stupid
>> of me) that I ran in a loop on the primary, and which just filled up the
>> stable queue because the standby needed to do a table scan for each row
>> that was being deleted.
>>
>> Michael
>> --
>> Michael Peppler Data Migrations, Inc.
>> mpep...@peppler.org http://www.mbay.net/~mpeppler
>> Sybase T-SQL/OpenClient/OpenServer/C/Perl developer available for
>> short or long term contract positions -
>> http://www.mbay.net/~mpeppler/resume.html
>>
> Might be a good idea to do a sp_showplan <spid>,null,null,null on the
> replicate ASE server to see if the maint user is doing table scans.
>

Will do. And it somehow got through its bottleneck last night after I
posted this. And the problem hasn't reoccurred, though I'm of course
concerned it will.

Also, ASE and the OS are the same for primary and all replicates, but
the hardware is vastly different from one to another.

Cindy

Cindy Hodgins

Jul 16, 2003, 11:29:41 PM

Cindy Hodgins

Jul 16, 2003, 11:29:09 PM

Cindy Hodgins

Jul 17, 2003, 12:11:08 AM

I'm not sure where you mean. On the replicate ASE? Nothing's blocking
rep server spid - the rep server itself is doing the blocking to other
spids.
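
To see which spid sits at the head of a blocking chain, the spid/blk_spid pairs can be walked mechanically. Here is a rough sketch in Python. It assumes the (spid, blk_spid) pairs have already been pulled out of sp_who by whatever means is handy, with 0 meaning "not blocked". The function name and the input shape are hypothetical.

```python
def root_blockers(rows):
    """Map each head-of-chain spid to the spids it blocks, directly or not.

    rows: iterable of (spid, blk_spid) pairs; blk_spid == 0 means the spid
    is not blocked. This input shape is an assumption for illustration.
    """
    blocked_by = {spid: blk for spid, blk in rows if blk}
    chains = {}
    for spid in blocked_by:
        seen = {spid}
        head = blocked_by[spid]
        # Follow the chain upward until we reach a spid that is not blocked.
        while head in blocked_by and head not in seen:
            seen.add(head)
            head = blocked_by[head]
        if head in seen:
            # A cycle has no single root, so skip it.
            continue
        chains.setdefault(head, []).append(spid)
    return {head: sorted(victims) for head, victims in chains.items()}
```

If the only key that comes back is the maint user's spid, that matches what I'm seeing: the rep server connection is the head blocker, not a victim.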

admin who_is_down has shown the rep agent to be down a couple of times
during this mess, but I don't know why. I didn't stop the rep agent,
and there are no messages in the logs. But it will start back up with a
sp_start_rep_agent.

admin who,sqt also returns a 1 for SQM Blocked for the primary database.
Trying to do a suspend connection, for example, just hangs. But if I
bounce replication server, then when it comes back up the suspend
connection will have worked. And when it first comes up, admin who,sqt
returns all zeros, but when you run it again about 30 seconds later, it
shows no progress, and shows st:C on all the replicates.
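
The "same numbers twice" check can be written down as a comparison of two snapshots. Here is a minimal sketch in Python. It assumes each admin who,sqt run has been turned into a dict keyed by queue, holding that queue's counters. The function name and the dict layout are hypothetical, and the counter names are whatever your parsing produces. Unchanged counters on a quiet system can also just mean there is nothing to replicate, so this is a hint and not proof.

```python
def stalled_queues(before, after):
    """Return the queues whose counters did not change between snapshots.

    before, after: {queue_info: {counter_name: value}} taken some seconds
    apart. Queues with all-zero counters are ignored, since a freshly
    started server reports zeros before any work has been counted.
    """
    return sorted(
        queue
        for queue, counters in after.items()
        if queue in before
        and before[queue] == counters
        and any(counters.values())
    )
```

Run it on two snapshots taken 30 seconds or so apart; queues that keep showing up across several runs are the ones to look at on the replicate side.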

Thanks.

Cindy

Cindy Hodgins

Jul 17, 2003, 12:13:48 AM

And one more thing: replication is actually working for two of the
replicate databases, but not for the others. Now what's that about!

Cindy

A cornell

Jul 17, 2003, 7:05:06 PM

Another possibility: the replicate may be running out of locks. If it is, it
will roll back the transaction and then retry it, all day long. Check the ASE
error log for an error message about running out of locks.
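
A quick way to scan the log is sketched below in Python. It assumes the lock exhaustion shows up as error number 1204 and/or with wording along the lines of "run out of LOCKS", which is how I remember ASE reporting it. Treat both patterns, and the function name, as assumptions to check against your own error log.

```python
import re

# Assumed patterns: an "Error: 1204" header line and/or the message text.
LOCK_PATTERN = re.compile(r"Error:\s*1204\b|run out of locks", re.IGNORECASE)


def lock_exhaustion_lines(log_lines):
    """Return the log lines that look like 'out of locks' errors."""
    return [line.rstrip("\n") for line in log_lines if LOCK_PATTERN.search(line)]
```

An open file object can be passed in directly, since it iterates line by line. If the same error repeats at regular intervals, that would fit the rollback-and-retry loop described above.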

"Peter Simandl" <pet...@volny.cz> wrote in message
news:3F161E5C...@volny.cz...

Wolfgang Kunk

Jul 18, 2003, 5:43:19 AM

Hi Cindy,
I'm not sure whether your problem is the same as ours, because we use RS
12.1 and ASE 12.5, but sometimes a DSI hangs for no apparent reason. The DSI
process is 'awaiting command' and the queue contains statements to be
delivered, but nothing happens. So the transaction gets stuck and might block
other processes.
The only workaround we have found is to kill the process, suspend the DSI, and
resume it again (suspend with nowait followed by resume doesn't work; killing
the process is necessary).
We opened a case with Sybase support, but without a result so far.

Wolfgang Kunk
RTL Television

Cindy Hodgins wrote: