CommitBatchSize vs. CommitBatchThreshold for transactional distribution agents

KVD

unread,

Oct 24, 2008, 4:01:51 PM10/24/08

to

I've been playing around with the CommitBatchSize and
CommitBatchThreshold values and am thoroughly confused. I set up a
simple transactional publication containing 1 article with a single
push subscriber, then created a new agent profile based on the Verbose
history agent profile. The only thing I changed were the values for
CommitBatchSize and CommitBatchThreshold. All of this was done on SQL
2005.

For test 1, I set CommitBatchSize to 5 and CommitBatchThreshold to 10,
then issued 7 update statements that affected 1 row each. In profiler
I saw one batch with 5 update statements (all with the same
transaction ID) followed by a second batch with 2 update statements
(also all with the same transaction ID, but not the same transaction
ID as the first batch of 5 updates) . No problem, I expect it to work
this way.

For test 2, same settings as test 1, I issued a single update
statement that affected 20 rows. In profiler I saw one batch with 20
update statements, all with the same transaction ID.

For test 3, I set CommitBatchSize to 100 and CommitBatchThreshold to
5. I issued the same 7 update statements as in test 1. In profiler I
saw one batch with 6 update statements (all with the same transaction
ID) followed by a second batch with 1 update statement (different
transaction ID than the last).

Based on what I see in profiler I can only conclude that when
CommitBatchThreshold > CommitBatchSize the value for
CommitBatchThreshold is essentially ignored. However, when
CommitBatchThreshold <= CommitBatchSize the value for
CommitBatchThreshold is what dictates how many commands are delivered
in a batch. Even stranger, it appears that in the second case the
number of commands that are delivered is actually CommitBatchThreshold
+ 1.

Am I missing something obvious?

Hilary Cotter

unread,

Oct 24, 2008, 10:15:00 PM10/24/08

to

commit batch size is the number of transactions before a commit, commit
batch threshold is the total number of commands before a commit. These
commands could be in different transactions, so its the lesser of these
values which will trip a commit.

However what is key to remember in this case is that the distribution agent
always honors transactional boundaries.

The only problem with your tests is test #3. If it was 5 and 2 it would make
more sense. I am wondering if perhaps they was a gap between the first 6 and
the last one.

Paul Ibison

unread,

Oct 27, 2008, 5:50:01 AM10/27/08

to

My suspicion is that this is very wrong in BOL (BOL definitions at the
bottom).

If BOL was correct, it would be possible to break the ACID properties of a
transaction via a small value for -CommitBatchThreshold. In BOL I think the
"COMMIT" that is referred to is not actually anything to do with transactions
- it could more reasonably be the no of records/transactions obtained in a
network round trip and applied at the subscriber. Empirically I find that the
transaction boundaries are always respected regardless of these settings and
I checked this using log explorer at the subscriber. So, if I'm right, then
the BOL definitions should read:

CommitBatchSize : "Is the maximum number of transactions to be read from the
distributor before being applied at the subscriber".
CommitBatchThreshold : "Is the maximum number of commands to be read from
the distributor before being applied at the subscriber".

I've added the "maximum" statement because if there are only a few
transactions of commands, these are still sent over on synchronization.

Let's request verification from MS for this section.

Cheers,

Paul Ibison

-CommitBatchSize commit_batch_size
Is the number of transactions to be issued to the Subscriber before a COMMIT
statement is issued. The default is 100.

-CommitBatchThreshold commit_batch_threshold
Is the number of replication commands to be issued to the Subscriber before
a COMMIT statement is issued. The default is 1000.

Hilary Cotter

unread,

Oct 27, 2008, 7:31:00 AM10/27/08

to

Paul, IMHO, BOL is does not mention whether it does or not.

The posters problem is with singleton updates.

" I issued the same 7 update statements as in test 1. In profiler I
saw one batch with 6 update statements (all with the same transaction
ID) followed by a second batch with 1 update statement (different
transaction ID than the last).
"

He does 7 separate singleton updates, each of them affect 1 row, each of
them a transaction.

6 of them are applied in a batch (the commitbatchsize of 5 should have
limited it to 5), and then the other one is in a different batch -different
transaction id. At least this is how I read the post.

What bol does not make clear is how it handles transactions and its
constituent commands.

If in the first case I did a batch update which affected 20 rows, this 20
row update would be replicated in a single transaction event though my
commitbatchthreshold was 10. The distribution agent respects transactional
boundaries.

Where you would see it is if you issued two transactions, one of 5 rows and
one of 6 rows.

Commitbatchsize would not trigger a commit as its two transactions, but
commitbatchthreshold would as its 11 commands.

Paul Ibison

unread,

Oct 27, 2008, 9:20:00 AM10/27/08

to

Actually you've put what I was trying to express much more clearly than me
"What bol does not make clear is how it handles a transaction and its
constituent commands". It seems to me that the commands referred to in
CommitBatchThreshold only refers to individual commands outside of a grouped
transaction, also that both definitions refer to a "Maximum" before the
COMMIT is issued, ie 2 changes need making to the BOL definitions.

This makes producing the rules in the original post "CommitBatchThreshold >
CommitBatchSize" etc quite tricky.

On the point of the poster's strange issue with 7 singleton updates becoming
2 transactions: one with 6 commands and one with one command, I have no ideas
- I've just tested on SQL Server 2000 with Log Explorer and it is totally
reproducible there also! Have requested clarification for this logic from our
friends at MS.

Cheers,

Paul Ibison

Hilary Cotter

unread,

Oct 27, 2008, 9:31:00 AM10/27/08

to

Paul, now I am hopelessly confused, are you saying that it violates
transactional integrity, or that it trips the commitbatchthreshold 1 too late?

I think we are chasing straws really. These parameters have great impact on
performance, whether they are one off or not.

PS Happy Diwali!

Paul Ibison

unread,

Oct 27, 2008, 9:56:01 AM10/27/08

to

Hilary,

just understood your test: "Where you would see it is if you issued two

transactions, one of 5 rows and one of 6 rows. Commitbatchsize would not
trigger a commit as its two transactions, but commitbatchthreshold would as

its 11 commands." with Commitbatchsize = 5 and commitbatchthreshold = 10.

Carried it out and what actually happens is the both transactions go over.
However the same occurs when the updates hit 5 rows and 4 rows (ie one less
than commitbatchthreshold) which is consistent with my proposal that the
commitbatchsize and commitbatchthreshold are maximums.

Not too sure if commitbatchthreshold only ever applies to "singleton
transactions" but it seems so.

Cheers,

Paul Ibison

unread,

Oct 27, 2008, 10:03:00 AM10/27/08

to

Just posted my other findings in this thread, but basically I think that
(a) the poster is correct and there is a (reproducible) bug for the 3rd case.
(b) the definitions of the 2 parameters in BOL need to be changed to refer
to "Maximums" after which a transaction is committed.
(c) the definition for the CommitBatchThreshold should be clarified to refer
only to commands outside a grouped transaction ("singletons").
I have pinged this to VTS to see what he says.
Cheers,
Paul

Hilary Cotter

unread,

Oct 27, 2008, 10:10:09 AM10/27/08

to

Paul, think your problem is that you are thinking of it like a spill bucket.
The spill bucket can only hold so much before dumping.

What happens here is like a bar maid at the pubs you frequent to all hours
and probably are at while you are reading this message.

Let's suppose the bar maid on duty tonight is the high IQ type. She can
memorize 50 orders (commitbatchThreshold), but her tray is only so large, so
she can carry 5 beers (commit batch size).

If you give her two orders, is she going to wait around for 3 more orders?
Of course not (unless she is chatting u up). She will go and grab the beers
no matter that her brain can hold 50 orders and her tray can hold 5 beers.

Same thing is going on here. You issue 2 transactions with a varied amount
of commands. Is the distribution agent going to wait 7 years for the next 3
transactions to trip a commit? No, its going to trip when it hits the polling
interval or some other internal mechanism.

It only when its things get busy that the bar maid start delivering orders
with a full bar tray, and then there is no time to chat for her to chat you
up anymore, until you mention your ailment and immediately clear the bar.

You can only observe this action when pimping, oops typo there, pumping
large numbers of transactions and commands to the distribution agent.

What is puzzling however is the split in test 3, however.

Paul Ibison

unread,

Oct 27, 2008, 11:11:02 AM10/27/08

to

Hi Hilary - that's the same as what I'm saying - the parameters only
represent a Maximum (and there is no minimum). However in the analogy the
maximum no of beers only applies to individual orders of a single beer
(assuming a batch is an order and a threshold is a beer).
Cheers,
Paul

Hilary Cotter

unread,

Oct 27, 2008, 11:20:00 AM10/27/08

to

Ah, I missed your point. My analogy does not extend itself nicely to
encapsulate the logic. I was thinking you were hung up on the spill bucket
metaphor.

KVD

unread,

Oct 27, 2008, 2:17:15 PM10/27/08

to

I re-ran my tests and I am always seeing CommitBatchThreshold + 1
commands delivered to the subscriber. Maybe someone at Microsoft who
reads this can chime in on why this is happening, otherwise it sounds
like a bug to me. Not a showstopper, of course, but still not what is
expected based on documentation.

I also confirmed the bit about transactional boundries; I had
CommitBatchThreshold set to 5 and CommitBatchSize set to 1,000, then I
ran a transaction (one the publisher) where I update 10 rows one at a
time and commit. In Profiler I see a single transaction with 10
updates. When I run the 10 updates outside of a transaction on the
publisher I see transactions with 6 and 4 updates at the subscriber.

Unfortunately I don't think the concept of transaction vs. command is
very clearly explained either, since a command is really autocommit
transaction. It would probably cause less confusion to explain how the
values work if it's phrased more in terms of explicit vs. autocommit
transactions.

Thanks for your help,
Kendal Van Dyke
http://kendalvandyke.blogspot.com/

Paul Ibison

unread,

Oct 28, 2008, 5:14:00 AM10/28/08

to

Definitely agree that there's some need for clarification (apart form the bug
that is) and have raised up all this with MS to try to get a clearer picture
so I can request BOL to be updated. Will update this thread when I know more.
Many thanks,
Paul Ibison, www.replicationanswers.com

Gopal Ashok [MSFT]

unread,

Oct 28, 2008, 9:53:32 PM10/28/08

to

Just had a chat with Paul on this one

Commithreshold is honored at a transaction boundary. Let's take the
following example

Commitbatchsize 10
Commitbatchthreshold 20

Transaction1 has 19 commands
Transaction 2 has 11 commands
Transaction 3 has 5 commands

With the above setting, the distribution agent reader thread will set the
commit flag for the command block after we read all the commands in
transaction 2. This is because we check the commitbatch size and
commitbatchthreshold after a transaction boundary is reached. We dont issue
a commit after we read the first command (20th command in that batch) in
transaction 2. So this commit batch on the subscriber will have committed 2
transactions with 30 commands even though the threshold is 20.

So the batch will be committed either when we hit the commitbatchsize OR
when a transaction crosses the command threshold before the commitbatchsize
is hit.

With regard to the threshold adding one more command that seems to be a bug.

Hope this helps.

"Paul Ibison" <Paul....@ReplicationAnswers.Com.(donotspam)> wrote in
message news:0A8275F6-3C9A-4388...@microsoft.com...

Hilary Cotter

unread,

Dec 3, 2008, 8:00:26 AM12/3/08

to

I strongly suspect this 1 off command is the commit command.

No bug here.

On Oct 28, 8:53 pm, "Gopal Ashok [MSFT]" <gopalas...@hotmail.com>
wrote:

> Just had a chat with Paul on this one
>
> Commithreshold is honored at a transaction boundary. Let's take the
> following example
>
> Commitbatchsize 10
> Commitbatchthreshold 20
>
> Transaction1 has 19 commands
> Transaction 2 has 11 commands
> Transaction 3 has 5 commands
>
> With the above setting, the distribution agent reader thread will set the
> commit flag for the command block after we read all the commands in
> transaction 2. This is because we check the commitbatch size and
> commitbatchthreshold after a transaction boundary is reached. We dont issue
> a commit after we read the first command (20th command in that batch) in
> transaction 2. So this commit batch on the subscriber will have committed 2
> transactions with 30 commands even though the threshold is 20.
>
> So the batch will be committed either when we hit the commitbatchsize OR
> when a transaction crosses the command threshold before the commitbatchsize
> is hit.
>
> With regard to the threshold adding one more command that seems to be a bug.
>
> Hope this helps.
>

> "Paul Ibison" <Paul.Ibi...@ReplicationAnswers.Com.(donotspam)> wrote in
> messagenews:0A8275F6-3C9A-4388...@microsoft.com...