
MSMQ takes a long time to connect: "waiting to connect" problem


Malcolm Sheldon

Jun 16, 2003, 1:46:13 AM
I have developed a POS application using Queued Components/MSMQ. We
have had 8 stores running for 6 months without a problem, and are now
rolling out the remaining 180. So far there are 70 installed.

There is one PC at HO, the "server", and each store has one or more
PCs, or "clients", connected via a VPN. Each is running XP-SP1 and
MSMQ. They are operating in workgroup mode (i.e. no Active Directory):
the server knows the IP address of each store, and each store knows
the IP of the server. They communicate using direct format names, e.g.
to send a message from client to server, the client uses:

"Queue:FormatName=DIRECT=TCP:xxxx.xxxx.xxx.xxx\PRIVATE$\Str_Recv/new:Str_Recv.Str_Recv"

where the x's are the IP of the server, and vice versa.
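As a minimal sketch of how such a moniker string is put together, the helper below assembles it from its parts. The function names and the 192.0.2.10 address are made up for illustration; only the string layout comes from the example above.

```python
def direct_format_name(ip, queue):
    """Build a direct TCP format name for a private queue on host `ip`."""
    return "DIRECT=TCP:{}\\PRIVATE$\\{}".format(ip, queue)


def queue_moniker(ip, queue, prog_id):
    """Build the Queued Components moniker string in the layout quoted above."""
    return "Queue:FormatName={}/new:{}".format(
        direct_format_name(ip, queue), prog_id
    )


# 192.0.2.10 is a documentation-only address standing in for the server IP.
print(queue_moniker("192.0.2.10", "Str_Recv", "Str_Recv.Str_Recv"))
```

Building the string in one place makes it harder for a store to end up with a mistyped address or queue name.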

I have a basic "heartbeat" which sends a message to each store, and
each store then sends back an acknowledgement.

With the pilot of 8 stores everything was fine. But now that we've
got 70 stores installed, the time to send and receive to all stores
has become almost an hour.

When the heartbeat starts, I see all the queues created as outgoing
queues. Some of them connect immediately, which is what I would
expect, but some of them just sit there with "waiting to connect". I
can ping the stores, and if I go into their system with remote
desktop, I can ping the server, but the queues don't connect.

Slowly over a long time, the queues do connect, and the messages flow.
But from first to last takes about 50 minutes!!!

I have tried changing 2 registry settings, in the MSMQ parameters
entry:

WaitTime to 4000(4 seconds)
CleanupInterval to 7200000 (2 hours)

The latter because I thought maybe if the queues already existed it
might improve things. And the WaitTime to force a faster retry of a
missed connection.

The only difference is that the "Waiting to Connect" now seems to be
"Inactive", even though there's a pending message. This then goes to
"Waiting to Connect", then "Connected" and back to "Inactive".


So, my question is, why is it taking so long to send and receive 70
tiny messages??? (I'm afraid of what it's going to be like when we've
got the full network in, there will be over 200 clients in total!)

Anybody got any ideas what the problem might be????

Thanks in advance

Malcolm Sheldon

Doron Juster [MSFT]

Jun 16, 2003, 5:08:33 AM
The MSMQ service has a pool of worker threads which handle all networking
and queuing activities. When you send to a remote machine, a worker thread
tries to connect to that machine. If the remote machine is offline, the
thread is blocked for 20 seconds (that's the default timeout of the Winsock
"connect()" API) until it fails and moves on to other activities.
That means that if you send to many remote machines at once, and some of
them are offline, MSMQ's networking activity slows down because worker
threads are blocked in "try to connect".
Even if all machines are online but on slow links, it will take some time
until MSMQ connects to all of them and sends to all. I cannot comment on the
specific numbers you mention because I don't have all the details.
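To get a feel for the effect described above, here is a rough back-of-the-envelope model. It is not how MSMQ schedules work internally: it simply assumes every offline host costs one worker thread a full 20-second connect timeout, and that threads work through the offline hosts in rounds. The function name and the example figures (30 offline hosts, 3 or 13 threads) are illustrative assumptions.

```python
import math

CONNECT_TIMEOUT_S = 20  # connect() timeout quoted in the post above


def worst_case_blocked_seconds(offline_hosts, worker_threads,
                               timeout_s=CONNECT_TIMEOUT_S):
    """Estimate how long the thread pool is tied up by offline hosts.

    Simplified model: each offline host blocks one thread for `timeout_s`,
    and the pool handles the offline hosts in rounds of `worker_threads`.
    """
    if worker_threads < 1:
        raise ValueError("worker_threads must be at least 1")
    rounds = math.ceil(offline_hosts / worker_threads)
    return rounds * timeout_s


# Hypothetical figures: 30 unreachable stores, small vs. larger thread pool.
print(worst_case_blocked_seconds(30, 3))   # 10 rounds of 20 s
print(worst_case_blocked_seconds(30, 13))  # 3 rounds of 20 s
```

Under this model the delay scales with the number of unreachable machines divided by the pool size, which is why both adding threads and not sending to offline machines help.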

The "WaitTime" registry value is of no use with DIRECT=TCP. It may help
with quicker name resolution if you use DIRECT=OS. In any case, a value of
less than 30 seconds introduces a lot of "noise" and interferes with MSMQ
timing and activity. I'd suggest that you remove this registry value and let
MSMQ use the default.

Workarounds for such a scenario:
1. Increase the number of worker threads on the server. That's done with the
QMThreadNo registry value; see below for its description from the Win2k
resource kit. Don't increase it too much (500 threads is probably a bad
value). Benchmark until you're satisfied with the results.
2. On the server, pause outgoing queues if you know that the remote client is
offline. A paused outgoing queue does not consume dynamic resources
(threads, sockets, etc.). For this, you need to write code that determines
the status of remote clients and pauses the queues. A common technique is
for each client to send a "hello" message to the server when it connects,
instead of the server unconditionally sending a heartbeat to all clients.
Your code on the server then receives the hello and resumes the relevant
outgoing queues.
3. Increase the value of "CleanupInterval" on the server and all clients, as
you did. This means that unused sessions and queue objects are not cleaned
up. Sessions which already exist will be ready to send immediately when a new
message is available. This option may consume resources, so you need to
benchmark it. Make sure you change this on all computers, otherwise the
computers with the default value will tear down unused sessions long before
this interval expires on the other side.
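The "hello" technique in workaround 2 could be sketched roughly as below. This only shows the bookkeeping; the `pause` and `resume` callbacks are hypothetical placeholders for whatever code actually pauses and resumes an outgoing queue, and the class name, the `stale_after_s` threshold, and the `sweep` step are assumptions that are not part of the suggestion above.

```python
import time


class OutgoingQueueGate:
    """Resume a client's outgoing queue on "hello"; pause it when it goes quiet."""

    def __init__(self, pause, resume, stale_after_s=300.0, clock=time.monotonic):
        self._pause = pause            # hypothetical: pauses the queue for a client
        self._resume = resume          # hypothetical: resumes the queue for a client
        self._stale_after_s = stale_after_s
        self._clock = clock
        self._last_hello = {}          # client id -> time of the last hello
        self._active = set()           # clients whose queues are currently resumed

    def on_hello(self, client_id):
        """Record a hello and resume the queue if it was paused."""
        self._last_hello[client_id] = self._clock()
        if client_id not in self._active:
            self._active.add(client_id)
            self._resume(client_id)

    def sweep(self):
        """Pause queues for clients that have not said hello recently."""
        now = self._clock()
        for client_id in list(self._active):
            if now - self._last_hello[client_id] > self._stale_after_s:
                self._active.discard(client_id)
                self._pause(client_id)


# Usage with stand-in callbacks that only print what they would do.
gate = OutgoingQueueGate(
    pause=lambda c: print("pause outgoing queue for", c),
    resume=lambda c: print("resume outgoing queue for", c),
)
gate.on_hello("store-001")
gate.sweep()
```

With this arrangement the server only ever tries to connect to stores that have recently shown they are reachable.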

Thanks, Doron

QMThreadNo
HKLM\SOFTWARE\Microsoft\MSMQ\Parameters

Data type:     REG_DWORD
Range:         0x1 - 0x10 threads
Default value: Windows 2000 Server: number of processors * 5 + 3
               Windows 2000 Professional: number of processors * 3

Description
Determines the number of threads created in the Message Queuing process
(Mqsvc.exe) to handle messages.

Increasing the value of this entry can improve the performance of Message
Queuing, but too many threads can overload the processor.

Note

This entry does not appear in the registry unless you add it or use a
program to change its default value.
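The default and range quoted above can be written out as a small calculation, which may help when picking values to benchmark. The function names are made up; the formulas and the 0x1 - 0x10 range are taken from the description above, which covers Windows 2000 only.

```python
def default_qm_thread_no(processors, server_edition):
    """Default thread count per the resource kit text quoted above."""
    if processors < 1:
        raise ValueError("processors must be at least 1")
    if server_edition:
        return processors * 5 + 3   # Windows 2000 Server
    return processors * 3           # Windows 2000 Professional


def in_documented_range(value):
    """True if `value` lies in the documented 0x1 - 0x10 range."""
    return 0x1 <= value <= 0x10


print(default_qm_thread_no(1, server_edition=False))
print(default_qm_thread_no(2, server_edition=True))
print(in_documented_range(100))
```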


--
This posting is provided "AS IS" with no warranties, and confers no rights.

Malcolm Sheldon

Jun 18, 2003, 1:00:09 AM
Hi Doron,

Thanks for the suggestions. Unfortunately they haven't made much
difference: I increased QMThreadNo all the way up to 100, but the
problem remains.

I also came across TCPNoDelay, which looked worth a try and I think
had some small effect, but again the problem remains.

However, I came across a thread with you and a couple of other guys
back in December, where you suggest that it could be a problem of
license exhaustion, and you describe MSMQ only using 10 licenses at a
time when talking between non-server systems. (I wasn't aware that
MSMQ had a license limitation/structure.) I should have mentioned
that all systems are running XP Pro SP1. Even what we call the comms
server is actually just a standard XP workstation on our network.

This sounded to me like this was exactly what we're seeing. So I
reduced the "CleanupInterval" to 10 seconds and I can now send my
heartbeat to all stores in less than 5 minutes, which was previously
taking 55 minutes!
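A very rough model of why a shorter CleanupInterval helps when only 10 sessions can be open at once is sketched below. It assumes that each idle session keeps its slot for the full cleanup interval and ignores connect and transfer time, so it will not reproduce the measured times exactly. The function name and the 300-second comparison value are illustrative assumptions.

```python
import math


def slot_wait_seconds(stores, session_slots, hold_seconds):
    """Time spent waiting for session slots to free up across all stores.

    Simplified model: stores are served in batches of `session_slots`, and
    every batch except the last must wait `hold_seconds` for cleanup before
    the next batch can connect.
    """
    if session_slots < 1:
        raise ValueError("session_slots must be at least 1")
    batches = math.ceil(stores / session_slots)
    return max(0, batches - 1) * hold_seconds


print(slot_wait_seconds(70, 10, 10))    # 10-second cleanup interval
print(slot_wait_seconds(70, 10, 300))   # assumed 5-minute idle hold
print(slot_wait_seconds(200, 10, 10))   # projected full rollout
```

In this model the total wait grows linearly with both the number of stores and the cleanup interval, which matches the direction of the improvement reported above.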

Is there a way I can increase the number of licenses? Or do we have
to change the XP "server" to a real server, which would be Win2K, I
guess, since there isn't an XP Server edition? (In which case I'm
wondering whether the XP/MSMQ-3 clients can talk to the W2K/MSMQ-2
server, but that's another problem, which I'm sure I can get around.)


Hope you can shed some light

Kind Regards

Malcolm Sheldon