There is one PC at HO, the "server", and each store has one or more
PCs, or "clients", connected via a VPN. Each is running XP SP1 and
MSMQ. They are operating in workgroup mode (i.e. no Active Directory);
the server knows the IP address of each store, and each store knows
the IP of the server. They communicate using direct format names, e.g.
to send a message from client to server, the client uses:
"Queue:FormatName=DIRECT=TCP:xxxx.xxxx.xxx.xxx\PRIVATE$\Str_Recv/new:Str_Recv.Str_Recv"
where the x's are the IP of the server, and vice versa.
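
For illustration only, the direct format name portion of that string can
be assembled from an IP address and a private queue name. This is a
minimal sketch: the helper name and the sample address 192.0.2.10 are
hypothetical, and it covers only the DIRECT=TCP part, not the
"Queue:FormatName=" moniker prefix or the "/new:" suffix shown above.

```python
import ipaddress


def direct_tcp_format_name(ip: str, queue: str) -> str:
    """Build a DIRECT=TCP format name for a private queue (hypothetical helper)."""
    # IPv4Address raises ValueError for a malformed address.
    addr = ipaddress.IPv4Address(ip)
    if not queue or "\\" in queue:
        raise ValueError("queue must be a bare private queue name")
    return f"DIRECT=TCP:{addr}\\PRIVATE$\\{queue}"


# 192.0.2.10 is a documentation-only address, not a real server.
print(direct_tcp_format_name("192.0.2.10", "Str_Recv"))
```

Validating the address up front catches a mistyped IP before MSMQ ever
creates an outgoing queue for it.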
I have a basic "heartbeat" which sends a message to each store, and
each store then sends back an acknowledgement.
With the pilot of 8 stores everything was fine. But now that we've got
70 stores installed, the time to send and receive to all stores has
become almost an hour.
When the heartbeat starts, I see all the queues created as outgoing
queues. Some of them connect immediately, which is what I would
expect, but some of them just sit there with "waiting to connect". I
can ping the stores, and if I go into their system with remote
desktop, I can ping the server, but the queues don't connect.
Slowly, over a long time, the queues do connect, and the messages flow.
But from first to last takes about 50 minutes!
I have tried changing 2 registry settings, in the MSMQ parameters
entry:
WaitTime to 4000(4 seconds)
CleanupInterval to 7200000 (2 hours)
The latter because I thought that if the queues already existed it
might improve things, and the WaitTime to force a faster retry of a
missed connection.
The only difference is that the "Waiting to Connect" now seems to be
"Inactive", even though there's a pending message. This then goes to
"Waiting to Connect", then "Connected", and back to "Inactive".
So, my question is, why is it taking so long to send and receive 70
tiny messages??? (I'm afraid of what it's going to be like when we've
got the full network in, there will be over 200 clients in total!)
Anybody got any ideas what the problem might be????
Thanks in advance
Malcolm Sheldon
The "WaitTime" registry entry has no effect when using direct=tcp. It
may help with quicker name resolution if you use direct=os. In any case,
a value of less than 30 seconds introduces a lot of "noise" and interferes
with MSMQ timing and activity. I'd suggest that you remove this registry
entry and let MSMQ use the default.
Workarounds for such a scenario:
1. Increase the number of worker threads on the server. That's done with the
QMThreadNo registry entry; see its description below, from the Win2k
Resource Kit. Don't increase it too much (500 threads is probably a bad
value). Benchmark until you're satisfied with the results.
2. On the server, pause outgoing queues if you know that the remote client
is offline. A paused outgoing queue does not consume dynamic resources
(threads, sockets, etc.). For this, you need to write code that determines
the status of remote clients and pauses the queues. A common technique is
for each client to send a "hello" message to the server when it connects,
instead of the server unconditionally sending a heartbeat to all clients.
Your code on the server then receives the hello and resumes the relevant
outgoing queues.
3. Increase the value of "CleanupInterval" on the server and all clients, as
you did. This means that unused sessions and queue objects are not cleaned
up. Sessions which already exist will be ready to send immediately when a
new message is available. This option may consume resources, so you need to
benchmark it. Make sure you change this on all computers; otherwise the
computers with the default value will tear down unused sessions long before
this interval expires on the other side.
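
The "hello" technique in workaround 2 can be sketched as a small piece of
bookkeeping on the server. This is only an outline under assumptions: the
class name, the pause/resume callbacks and the timeout are hypothetical,
and the callbacks stand in for whatever MSMQ administration call you use
to pause and resume an outgoing queue.

```python
from typing import Callable, Dict, Iterable, List, Set


class OutgoingQueueGate:
    """Keeps outgoing queues paused until the store has said hello recently."""

    def __init__(self, stores: Iterable[str],
                 pause: Callable[[str], None],
                 resume: Callable[[str], None],
                 timeout_s: float) -> None:
        self._pause = pause          # hypothetical: pauses one outgoing queue
        self._resume = resume        # hypothetical: resumes one outgoing queue
        self._timeout_s = timeout_s
        self._last_hello: Dict[str, float] = {}
        self._stores: Set[str] = set(stores)
        self._paused: Set[str] = set()
        # Start with every queue paused; a hello is required to open it.
        for store in sorted(self._stores):
            self._pause(store)
            self._paused.add(store)

    def on_hello(self, store: str, now: float) -> None:
        """Record a hello and resume the store's queue if it was paused."""
        self._stores.add(store)
        self._last_hello[store] = now
        if store in self._paused:
            self._resume(store)
            self._paused.discard(store)

    def sweep(self, now: float) -> None:
        """Pause queues for stores whose last hello is older than the timeout."""
        for store, seen in self._last_hello.items():
            if store not in self._paused and now - seen > self._timeout_s:
                self._pause(store)
                self._paused.add(store)

    def active(self) -> List[str]:
        """Stores whose outgoing queue is currently not paused."""
        return sorted(self._stores - self._paused)


log = []
gate = OutgoingQueueGate(["store01", "store02"],
                         pause=lambda s: log.append(("pause", s)),
                         resume=lambda s: log.append(("resume", s)),
                         timeout_s=120.0)
gate.on_hello("store01", now=0.0)
print(gate.active())   # only store01 has said hello
gate.sweep(now=200.0)  # store01 has been silent longer than the timeout
print(gate.active())
```

Only stores that have recently announced themselves hold a live outgoing
queue, so offline stores do not tie up threads or sockets.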
Thanks, Doron
QMThreadNo
HKLM\SOFTWARE\Microsoft\MSMQ\Parameters

Data type:      REG_DWORD
Range:          0x1 - 0x10 threads
Default value:  Windows 2000 Server: Number of processors * 5 + 3
                Windows 2000 Professional: Number of processors * 3

Description
Determines the number of threads created in the Message Queuing process
(Mqsvc.exe) to handle messages.
Increasing the value of this entry can improve the performance of Message
Queuing, but too many threads can overload the processor.

Note
This entry does not appear in the registry unless you add it or use a
program to change its default value.
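
As a worked example of the default values quoted above, the following
sketch computes the default thread count. The function name is
hypothetical, and reading "Number of processors *5+3" as
(processors * 5) + 3 is an assumption about the resource kit text.

```python
def default_qm_threads(processors: int, server: bool) -> int:
    """Default QMThreadNo per the quoted text (hypothetical helper)."""
    if processors < 1:
        raise ValueError("processors must be at least 1")
    # Server: processors * 5 + 3; Professional: processors * 3.
    return processors * 5 + 3 if server else processors * 3


print(default_qm_threads(1, server=False))  # single-CPU Professional: 3
print(default_qm_threads(2, server=True))   # dual-CPU Server: 13
```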
--
This posting is provided "AS IS" with no warranties, and confers no rights.
"Malcolm Sheldon" <bnch...@ozemail.com.au> wrote in message
news:50356c40.03061...@posting.google.com...
Thanks for the suggestions. Unfortunately they haven't made much
difference. I increased QMThreadNo all the way up to 100, but the
problem remains.
I also came across TCPNoDelay, which looked worth a try, and I think
it had some small effect, but again the problem remains.
However, I came across a thread with you and a couple of other guys
back in December, where you suggest that it could be an exhaustion of
licenses problem. You describe MSMQ only using 10 licenses at a
time when talking between non-server systems. (I wasn't aware that
MSMQ had a license limitation/structure.) I should have
mentioned that all systems are running XP Pro SP1. Even what we
call the comms server is actually just a standard XP workstation on
our network.
This sounded to me like this was exactly what we're seeing. So I
reduced the "CleanupInterval" to 10 seconds and I can now send my
heartbeat to all stores in less than 5 minutes, which was previously
taking 55 minutes!
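
A rough back-of-envelope sketch of why a shorter CleanupInterval would
help if a connection limit is the cause. It assumes a simplified model
that is not confirmed anywhere in this thread: at most 10 sessions are
open at once, and a slot only frees up when an idle session is cleaned
up after CleanupInterval. The function name is hypothetical, and real
timings will include connection and retry overhead on top.

```python
import math


def sweep_lower_bound_s(stores: int, concurrent_limit: int,
                        cleanup_interval_s: float) -> float:
    """Lower bound on the time to reach every store under the assumed model."""
    # Stores are served in batches of at most `concurrent_limit`.
    batches = math.ceil(stores / concurrent_limit)
    # Every batch after the first waits one cleanup interval for free slots.
    return (batches - 1) * cleanup_interval_s


print(sweep_lower_bound_s(70, 10, 10))   # 70 stores, 10 s interval: 60 s
print(sweep_lower_bound_s(200, 10, 10))  # 200 stores, 10 s interval: 190 s
```

Under this model the total wait grows with both the number of stores and
the cleanup interval, which is in line with the drop seen after setting
the interval to 10 seconds.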
Is there a way I can increase the number of licenses? Or do we have
to change the XP "server" to a real server, which would be Win2K, I
guess, since there isn't an XP Server edition? (In which case I'm
wondering if the XP/MSMQ-3 clients can talk to the W2K/MSMQ-2
server, but that's another problem, which I'm sure I can get around.)
Hope you can shed some light
Kind Regards
Malcolm Sheldon