
Performance Issues (Again)


Orkan Genc

Jul 15, 2003, 6:22:01 AM
Hi All;

(Sorry, I posted this to the rep-agent group first, but I was told that here
would be the right place.)

We are using warm standby replication for our major databases for disaster
recovery purposes. What we have observed so far is that during the day there
is no problem at all with normal operation; however, when end-of-day
operations start (which include continuous large transactions), problems may
begin. The problems show up for operators as "log segment full" in some
databases, but from my observation the real problem is that the rep agent is
not fast enough to read the transaction log, translate it to LTL, write it
to the stable queue, and move the secondary truncation point forward on the
log segment. I say that because, although the process has finished on the
primary dataserver, the queue size continues to increase for a long period
of time even though there are no other transactions in this database.
For large transactional operations we have arranged to dump the transaction
log automatically, but it is of no use. I have worked around the problem
(not in a good fashion) by inserting a "waitfor delay xx:xx:xx" after each
large batch. This just increases the operation duration; there must be a
better way.
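One alternative to the waitfor workaround is to break the large end-of-day
statement into smaller committed transactions, so the rep agent can keep up
and the log can be truncated as the job runs. A minimal sketch, with an
invented table and column name purely for illustration:

```sql
-- Hypothetical sketch: split one huge end-of-day UPDATE into smaller
-- committed batches so each transaction stays small in the log.
-- Tb_EndOfDay and the processed column are invented names.
set rowcount 10000          -- cap each pass at 10000 rows
declare @rows int
select @rows = 1
while @rows > 0
begin
    begin tran
    update Tb_EndOfDay set processed = 1 where processed = 0
    select @rows = @@rowcount   -- rows touched this pass; 0 ends the loop
    commit tran
end
set rowcount 0              -- restore the default
go
```

Each commit lets the secondary truncation point advance, at the cost of the
batch no longer being atomic, which may or may not be acceptable for the
end-of-day job.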

So, can you suggest a better way of handling this bottleneck? What might be
wrong in terms of configuration or design? Could applying parallel DSI be a
solution, and if so, what is the proper way to apply it? Should it be
applied to the replicate-side DSI threads or the primary side? We made the
following configuration changes and did not see much of a difference.

ASEREP is the replicate dataserver:

suspend connection to ASEREP.tarihce
alter connection to ASEREP.tarihce set parallel_dsi to 'on'
resume connection to ASEREP.tarihce

suspend connection to ASEREP.tarihce
alter connection to ASEREP.tarihce set dsi_serialization_method to 'none'
--(it could be isolation 3, there is only a single user in end-of-day
operations)
resume connection to ASEREP.tarihce

suspend connection to ASEREP.tarihce
alter connection to ASEREP.tarihce set dsi_num_threads to '12'
resume connection to ASEREP.tarihce
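Whether the parallel DSI executor threads actually pick up work can be
checked from isql against the repserver; a minimal sketch using the standard
admin commands (output column names may vary by version):

```sql
-- Show the DSI executor threads and their states for each connection
admin who, dsi
go
-- Broader thread overview; with parallel DSI genuinely in use, more than
-- one DSI EXEC should show as Active rather than Awaiting Command
admin who
go
```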

After these changes there is still only one "Active" DSI thread, while the
others remain in "Awaiting Command" status. Our configuration is as follows:

Adaptive Server Enterprise/12.0.0.6/P/EBF 10391 ROLLUP/HP9000-879/HP-UX
11.0/1891/64bit/FBO/Thu Aug 15 08:46:13 2002

Replication Server/12.5/P/HP9000/800/HP-UX 11.0/1/OPT/Mon Mar 25 01:52:47
PST 2002

sp_config_rep_agent "tarihce":
-------------------------------
Parameter Name Default Config Value Run Value
trace flags 0 0 0
fade timeout 30 30 30
scan timeout 15 15 15
retry timeout 60 60 60
batch ltl true true true
trace log file n/a n/a n/a
security mechanism n/a n/a n/a
msg integrity false false false
unified login false false false
scan batch size 1000 10000 10000
schema cache growth factor 1 1 1
skip ltl errors false false false
msg origin check false false false
msg confidentiality false false false
rs servername n/a ASEREP_RS ASEREP_RS
msg replay detection false false false
mutual authentication false false false
send warm standby xacts false true true
connect database tarihce tarihce tarihce
rs username n/a ASEREP_RS_ra ASEREP_RS_ra
msg out-of-sequence check false false false
skip unsupported features false false false
send maint xacts to replicate false false false
connect dataserver ASEPROD12 ASEPROD12 ASEPROD12


rs_configure :
--------------
Config Name Config Value Run Value
byte_order 256 256
cm_fadeout_time 300 300
cm_max_connections 64 64
current_rssd_version 1250 1250
dsi_exec_request_sproc on on
dsi_max_xacts_in_group 20 20
dsi_text_convert_multiplie 1.0 1.0
exec_cmds_per_timeslice 5 5
exec_sqm_write_request_lim 16384 16384
ha_failover off off
id_msg_confidentiality not_required not_required
id_msg_integrity not_required not_required
id_msg_origin_check not_required not_required
id_msg_replay_detection not_required not_required
id_msg_sequence_check not_required not_required
id_mutual_auth not_required not_required
id_security_mechanism
id_server ASEREP_RS ASEREP_RS
id_unified_login not_required not_required
init_sqm_write_delay 1000 1000
init_sqm_write_max_delay 10000 10000
md_source_memory_pool 100000 100000
md_sqm_write_request_limit 100000 100000
memory_limit 1500 1500
memory_max 3 3
minimum_rssd_version 1250 1250
msg_confidentiality not_required not_required
msg_integrity not_required not_required
msg_origin_check not_required not_required
msg_replay_detection not_required not_required
msg_sequence_check not_required not_required
mutual_auth not_required not_required
num_client_connections 30 30
num_concurrent_subs 10 10
num_msgqueues 250 250
num_msgs 45568 45568
num_mutexes 200 200
num_stable_queues 50 50
num_threads 450 450
oserver ASEREP_RS ASEREP_RS
password_encryption 0 0
queue_dump_buffer_size 1000 1000
rec_daemon_sleep_time 120 120
rsm_proc_compat_version 1250 1250
rssd_error_class rs_sqlserver_error_cla rs_sqlserver_error_class
security_mechanism
security_services off off
send_truncate_table on on
sqm_recover_segs 1 1
sqm_warning_thr1 75 75
sqm_warning_thr2 90 90
sqm_warning_thr_ind 70 70
sqm_write_flush on on
sqt_max_cache_size 1048576 1048576
sqt_max_prs_size 262144 262144
sre_reserve 0 0
stats_daemon_sleep_time 600 600
stats_flush_rssd off off
stats_reset_afterflush on on
stats_sampling on on
sts_cachesize 100 100
sts_full_cache_rs_articles off off
sts_full_cache_rs_classes off off
sts_full_cache_rs_columns off off
sts_full_cache_rs_config off off
sts_full_cache_rs_database off off
sts_full_cache_rs_datatype off off
sts_full_cache_rs_diskaffi off off
sts_full_cache_rs_diskpart off off
sts_full_cache_rs_erroract off off
sts_full_cache_rs_exceptsc off off
sts_full_cache_rs_exceptsh off off
sts_full_cache_rs_exceptsl off off
sts_full_cache_rs_funcstri off off
sts_full_cache_rs_function off off
sts_full_cache_rs_idnames off off
sts_full_cache_rs_ids off off
sts_full_cache_rs_locater off off
sts_full_cache_rs_maintuse off off
sts_full_cache_rs_objects off off
sts_full_cache_rs_oqid off off
sts_full_cache_rs_publicat off off
sts_full_cache_rs_queuemsg off off
sts_full_cache_rs_queuemsg off off
sts_full_cache_rs_queues off off
sts_full_cache_rs_recovery off off
sts_full_cache_rs_repdbs off off
sts_full_cache_rs_repobjs on on
sts_full_cache_rs_routes off off
sts_full_cache_rs_routever off off
sts_full_cache_rs_rules off off
sts_full_cache_rs_segments off off
sts_full_cache_rs_sites off off
sts_full_cache_rs_statcoun off off
sts_full_cache_rs_statdeta off off
sts_full_cache_rs_statrun off off
sts_full_cache_rs_subscrip off off
sts_full_cache_rs_systext off off
sts_full_cache_rs_translat off off
sts_full_cache_rs_users on on
sts_full_cache_rs_version off off
sts_full_cache_rs_wherecla off off
sub_daemon_sleep_time 120 120
sub_sqm_write_request_limi 32768 32768
unified_login not_required not_required
use_security_services off off
use_ssl off off
varchar_truncation off off

TIA
Regards,
----------------------------------------------------------------------------

A.ORKAN GENC

Peter Simandl

Jul 15, 2003, 3:46:17 PM
Hi Orkan,
this shows (unless I have missed something) that you have the default
configuration, with the exception of the connection parameters; but those
affect how data is written on the replicate site, which is not your problem.
You should first have a look at the document Jason recommends and try the
improvements. Then come back with your steps and results and we can advise
further.
HTH,
Peter

Orkan Genc

Jul 16, 2003, 4:18:20 AM
Hi Peter;

Thanks for the reply. If you are referring to the replication definition
creation method that Jason suggested, that would hardly help, as this table
has 6 columns and 5 of them form the uniqueness. So I believe minimal
columns will not make much of a difference. We did see an improvement in the
processing of the queues after increasing dsi_sqt_max_cache_size to 10M, but
I'm not sure whether that also affects how fast transactions are written to
the stable queue. Any suggestions?
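For reference, the cache increase mentioned above would be applied like the
earlier connection changes; a sketch assuming the same connection name and a
value in bytes (10M = 10485760):

```sql
suspend connection to ASEREP.tarihce
go
alter connection to ASEREP.tarihce
    set dsi_sqt_max_cache_size to '10485760'
go
resume connection to ASEREP.tarihce
go
```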

Cheers,
Orkan


"Peter Simandl" <pet...@volny.cz> wrote in message
news:3F145A09...@volny.cz...

Peter Simandl

Jul 16, 2003, 10:09:53 PM
Hi Orkan,
no, I was referring more to his sentence "rep-agent benefits from having
the log of the primary database bound to a user-defined cache", since the
rep agent does not care much about minimal columns; the problem is disk
IO, not the transaction log decoding.

Orkan Genc

Jul 18, 2003, 5:09:57 AM
Hi Peter;

There may be a reply that I've somehow missed; I really can't find a post
with such a sentence. Probably something went wrong. I'll have a look at
that anyway. Thanks.

"Peter Simandl" <pet...@volny.cz> wrote in message

news:3F160571...@volny.cz...

Jason_Wicks

Jul 18, 2003, 2:17:20 AM
It was my understanding that the rep agent requires an idea of the
structure of the table (i.e., columns/datatypes). A repdef speeds up this
process of identifying the structure, so the repdef improvement is not just
about the minimal-columns clause.

You might want to confirm this with someone who knows what they are talking
about first.

Cheers,
Jason

Greg Carter

Jul 25, 2003, 7:14:56 PM
Jason,

Currently RepAgent/ASE does not make use of repdef information. Other
RepAgents including RepAgent/DB2, RepAgent/Oracle, RepAgent/Informix,
RepAgent/MS-SQL Server and RepAgent/UDB do. However, for the importance
of repdefs in performance, especially Warm Standby performance, see my
earlier post in this thread.

--
Thanks,
G.Carter
Sybase Replication Server Engineering
mailto:greg....@workmail.com

Orkan Genc

Aug 1, 2003, 2:26:41 AM
Hi Greg;

I was a bit surprised by the results too. Here are the answers; I hope they
are the ones you were looking for. Thanks a lot.

Best Regards
Orkan

"Greg Carter" <greg....@workmail.com> wrote in message
news:3F2938D0...@workmail.com...
Orkan,

Let me make sure I understand the mechanics of the test. Did the updates at
the primary look something like:
insert into statimer....
[updates to Tb_MisMusSak_2003_test]
insert into statimer...

> Yes, almost. I took the update to the table and the insert into statimer
> as a single transaction, and between them there is either a "drop
> replication definition .." or a "create replication definition" (or
> nothing if I wanted to repeat the test).

When you make a distinction between the "RepDef" and "WS" cases, does that
mean that in both cases you were measuring latency from primary to warm
standby, but in the first case with a repdef defined and in the second case
without?

> Yes, in both cases I measured the latency from primary to warm standby;
> as I mentioned earlier, I was dropping / creating the replication
> definition.

Note that with the time stamps for measuring latency in table statimer, it
is actually the latency of updates to this table that you are measuring. Was
a repdef for statimer added and removed in these tests as indicated by the
"RepDef" and "WS" designations?

> You are right, it is the table statimer whose latency we are measuring,
> and NO repdefs were made for that table. But do you think defining a
> repdef for a tiny table like statimer would make a difference?


How many updates to table Tb_MisMusSak_2003_test were performed in between
inserts to statimer? Typically I'll do a good many of them so that we get
all RepServer modules actively engaged as you might expect in a high volume
environment.

> Do you mean how many times I executed the update, or is the question
> about how many rows this update affects? I ran the query about 10 times
> and there are only 50 rows in that table; would it be better if this
> update were run against a larger table?

If you could provide these few more details I may be able to come up with a
reasonable explanation; maybe not, though. After going on 3 years of effort
dedicated to RepServer performance, I feel 100% confident in making just
this one statement: RepServer performance is a very fickle thing.

> I hope


--
Thanks,
G.Carter
Sybase Replication Server Engineering
mailto:greg....@workmail.com

Orkan Genc wrote:

Hi Greg;

I decided to try what you suggested and carried out such a test plan.
First of all, I had previously seen your post describing how to measure the
latency, so I won't describe it here. But to give an idea, the tables are
as you suggested.

AT Primary
create table statimer(
pdesc varchar(255),
pdate datetime default getdate())
go

AT Remote
create table statimer(
pdesc varchar(255),
pdate datetime default getdate(),
rdate datetime default getdate())
go

The structure of the table for which we would like to create repdef is as
follows :

CREATE TABLE dbo.Tb_MisMusSak_2003_test
(
ilktarih smalldatetime NOT NULL,
sontarih smalldatetime NOT NULL,
a_kod char(3) NOT NULL,
mus_kod varchar(15) NOT NULL,
m_no int NOT NULL,
grup char(2) NOT NULL,
hesap smallint NOT NULL,
adet decimal(20,2) NOT NULL
)
LOCK ALLPAGES
go
GRANT REFERENCES ON dbo.Tb_MisMusSak_2003_test TO public
go
GRANT SELECT ON dbo.Tb_MisMusSak_2003_test TO public
go
GRANT INSERT ON dbo.Tb_MisMusSak_2003_test TO public
go
GRANT DELETE ON dbo.Tb_MisMusSak_2003_test TO public
go
GRANT UPDATE ON dbo.Tb_MisMusSak_2003_test TO public
go

CREATE UNIQUE NONCLUSTERED INDEX Pk_MisMusSak1
ON
dbo.Tb_MisMusSak_2003_test(sontarih,ilktarih,a_kod,mus_kod,m_no,grup,hesap)
go

CREATE NONCLUSTERED INDEX Pk_MisMusSak2
ON dbo.Tb_MisMusSak_2003_test(ilktarih,sontarih,m_no,grup)
go

CREATE NONCLUSTERED INDEX Pk_MisMusSak3
ON dbo.Tb_MisMusSak_2003_test(sontarih,a_kod,mus_kod,grup,m_no,hesap)
go

and the replication definition I created is :

create replication definition tarihce_test_Tb_MisMusSak_2003
with primary at LDS.tarihce
with all tables named 'Tb_MisMusSak_2003_test'
(ilktarih smalldatetime, sontarih smalldatetime, a_kod char(3),mus_kod
varchar(15), m_no int, grup char(2), hesap smallint)
primary key (sontarih,ilktarih,a_kod,mus_kod,m_no,grup,hesap)
send standby all columns
replicate minimal columns

For the warm standby and repdef cases I ran the following commands and
checked the latency on the remote site.
Note that there was almost no activity on the server and the repserver was
almost quiesced, but I repeated the test several times to be sure:

update Tb_MisMusSak_2003_test set sontarih=sontarih
go
insert statimer (pdesc) select "orkanx" (x being the step number)

Here are the results :

select *,datediff(ms,pdate,rdate) from tarihce..statimer --REPLICATE SITE

pdesc    pdate                      rdate                      Latency
orkan4   Jul 30 2003 4:10:34:166PM  Jul 30 2003 4:10:35:510PM  1343  --> RepDef
orkan5   Jul 30 2003 4:17:11:543PM  Jul 30 2003 4:17:12:766PM  1223  --> WS
orkan6   Jul 30 2003 4:20:35:156PM  Jul 30 2003 4:20:36:260PM  1103  --> WS
orkan7   Jul 30 2003 5:10:02:536PM  Jul 30 2003 5:10:03:823PM  1286  --> WS
orkan8   Jul 30 2003 5:10:38:096PM  Jul 30 2003 5:10:39:286PM  1190  --> WS
orkan9   Jul 30 2003 5:11:43:636PM  Jul 30 2003 5:11:44:966PM  1330  --> RepDef
orkan10  Jul 30 2003 5:13:20:730PM  Jul 30 2003 5:13:21:880PM  1150  --> WS
orkan11  Jul 30 2003 5:20:41:656PM  Jul 30 2003 5:20:42:883PM  1226  --> RepDef
orkan12  Jul 30 2003 5:21:17:656PM  Jul 30 2003 5:21:18:883PM  1226  --> RepDef
orkan13  Jul 30 2003 5:23:49:793PM  Jul 30 2003 5:23:50:900PM  1106  --> WS

Looking at the results above, I can say WS is doing a better job than
repdefs for this specific table, which was my initial point. Do you think
there could be a better way of defining the replication definitions? I
would think that for tables with a large number of columns compared to the
number of primary key columns this would be more efficient. Anyway, am I
right to stay with the warm standby configuration for this table, or do you
have other suggestions?

TIA

Regards
Orkan

"Greg Carter" <gca...@sybase.com> wrote in message
news:3F21B855...@sybase.com...
Orkan,

To the extent that you can configure RepServer to generate efficient SQL you
will see real performance gains. Hence, for a Warm Standby connection you
should always create repdefs specifying the primary key (unless performance
is not a concern) and set "replicate_minimal_columns" to "on" for the
logical connection (I believe this is the default). In some of my tests I've
seen a performance gain on the order of 50% with another 5 to 10% gain from
turning on "replicate_minimal_columns".
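A sketch of how that second recommendation might be applied; the exact
syntax should be checked against the RS reference manual, and the logical
connection name LDS.tarihce is taken from the repdef shown earlier in the
thread:

```sql
-- Enable minimal-column replication on the logical connection
alter logical connection to LDS.tarihce
    set replicate_minimal_columns to 'on'
go
```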

Regarding the improvement you see in performance after increasing
dsi_sqt_max_cache_size, how did you measure that performance? From my
observations, customers often fool themselves into thinking that an increase
in SQT Cache has increased throughput because they measure throughput by
looking at the output of "admin who,sqm". After the increase in SQT cache,
they see the queue being read at a faster rate for longer periods of time.
Well naturally, with a bigger cache to fill, reads will take place faster
for longer periods of time. But if they wait long enough for the cache to
fill up, they'll find that the read rate off the queue drops back to what
they were seeing before. The only reason SQT cache sizing may have an
impact on performance is if it is not big enough to contain whole
transactions, so that transactions must be removed from cache and then
re-read when needed. This is indicated by "admin who,sqt", in the "trans
removed" column (or something to that effect). You can also check counter
24009-"TransRemoved".
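A minimal sketch of that check from isql against the repserver (the exact
column heading may differ slightly by version):

```sql
admin who, sqt
go
-- A non-zero value in the "Removed" / "trans removed" column for the DSI's
-- SQT reader means whole transactions were flushed from cache and had to
-- be re-read, i.e. the cache really is too small.
```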

The real measure of how fast things are moving along is how fast the DSI is
pushing transactions out to the replicate. There are a number of counters to
monitor this including 5000-"TransTotal", 5007-"TransSucceeded",
5020-"CmdGroups", 5024-"CmdGroupsCommit", 5030-"CmdsRead",
57000-"TransApplied", 57002-"CmdsApplied", 57020-"TPS_Average",
57043-"DSIEBPSaverage", 57049-"TransAvgGroup", 57069-"DSIEBatch", etc. If
you compare the queue read rate over time to any of these, you will see the
read rate stay high until the SQT cache fills and then drop, while these
DSI counters typically show a steady processing rate the whole time.
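Since the rs_configure output earlier in the thread shows stats_flush_rssd
off, the counter values would first have to be flushed before they can be
queried. A sketch; the RSSD query uses column names recalled from memory,
so verify them against your RSSD catalog before relying on it:

```sql
-- In the repserver: flush sampled counter values to the RSSD
configure replication server set stats_flush_rssd to 'on'
go
-- Then, in the RSSD database, something along these lines:
select c.counter_name, d.counter_val
from rs_statcounters c, rs_statdetail d
where d.counter_id = c.counter_id
  and c.counter_id in (5000, 5007, 57020)
go
```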


--
Thanks,
G.Carter
Sybase Replication Server Engineering
mailto:greg....@workmail.com
