"John Breakwell (MSFT)" wrote:
> Hi PTH
>
> Clustering itself won't make any difference to how messages are processed by
> the Triggers service so I would not worry about that aspect too much.
>
> You can get situations where messages are skipped but they will still be in
> the queue.
> Restarting the service will cause these messages to be processed properly.
>
> If the message is no longer in the queue AND the corresponding trigger never
> happened then something else has definitely removed the message.
> Note that you can get messages removed from a queue by MSMQ itself if a
> timer expires.
> For example, if you have set the message's "Time To Be Received" to
> something short like a minute and there are a lot of new messages in the
> queue then the Triggers service may not reach the message before it expires.
> In this case, if you are using "negative source journaling" then you will
> see the message in one of the sender's Dead Letter Queues.
> If the messages are very important then you SHOULD be using such features to
> ensure you always know where the messages are.
>
> "If we do, when we shut down the app it won't restart, sometimes for up to 10
> mins, as the queue seems to be locked in exclusive read access."
> How are you disposing of the existing queue handle?
>
> Cheers
> John Breakwell (MSFT)
>
>
> "PTH" <P...@discussions.microsoft.com> wrote in message
> news:821E2C7D-9A0A-4486...@microsoft.com...
> > I'm witnessing the following error message below in a clustered
> > environment
> > (2 nodes-Active/Passive):-
> >
> > Error: Failed to redirect trigger - The Message at which the cursor is
> > currently pointing was removed from the queue by another process or by
> > another call to MQReceiveMessage without the use of this cursor.
> >
> > Now we normally get this error if another application is pointing to the
> > same queue, but in this case I'm 99% sure this is not the case. The queue
> > is only used by one application and is a public queue. At customers with
> > a standard 2003 server we don't get the problem. However, 2 customers with
> > clustered MSMQ both get the same problem; our software version is no
> > different between the environments and works in the same way. The
> > application is receiving and sending tens of thousands of messages a day
> > and it probably happens around 20-50 times a day. Unfortunately a very
> > important message didn't arrive yesterday and there was one of these
> > errors at the time the application should have sent it across, so it's
> > the most likely suspect. Until now we have not noticed any impact from
> > these messages but that may have been more luck than anything.
> >
> > Has anyone come across this problem before in a clustered environment, or
> > has anyone got any ideas of steps to get to the bottom of it? Our next
> > step is to try and replicate it in a test environment.
> >
> > Even though only one app uses the queue, it's not being used in exclusive
> > read access. If we do use exclusive read access, when we shut down the app
> > it won't restart, sometimes for up to 10 mins, as the queue seems to be
> > locked in exclusive read access. Again, this only happens in a clustered
> > message queue environment.
> >
> > Thanks for any help and pointers in advance.
> >
>
Is there a way to tell if messages are timing out and being removed from the
queue? What is the default if not specified? I'd be surprised if it's this,
as when the application is stopped the queue builds up constantly for a long
period. If there were a timeout period I would expect it to reach a ceiling.
The application should process the messages within a second when it's
running. Looking at the data that was processed just before it, which gets
sent at the same time, there were no delays, which would indicate it wasn't
sat in the queue waiting to be processed.
I've got a feeling this is not going to be easy to get to the bottom of :-(
Regards, Paul.
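A toy Python model (not MSMQ code) can make the ceiling argument concrete. It assumes a stopped consumer, a constant arrival rate and a per-message time-to-live measured in abstract ticks, all hypothetical values; expired messages are counted into a dead-letter total, standing in for what negative source journaling would record. With a TTL the backlog levels off at roughly arrival rate x TTL; without one it grows without bound, which is the behaviour Paul describes seeing.

```python
from typing import Optional


def simulate_backlog(ticks: int, arrivals_per_tick: int,
                     ttl: Optional[int]) -> tuple[list[int], int]:
    """Model a queue whose consumer is stopped.

    Each tick, `arrivals_per_tick` messages arrive stamped with the current
    tick. If `ttl` is set, a message that has waited `ttl` ticks or more
    expires and is counted as dead-lettered (a stand-in for negative source
    journaling). Returns the backlog after each tick and the dead-letter count.
    """
    queue: list[int] = []          # arrival tick of each waiting message
    dead_lettered = 0
    history: list[int] = []
    for now in range(ticks):
        queue.extend([now] * arrivals_per_tick)
        if ttl is not None:
            alive = [t for t in queue if now - t < ttl]
            dead_lettered += len(queue) - len(alive)
            queue = alive
        history.append(len(queue))
    return history, dead_lettered


with_ttl, expired = simulate_backlog(20, 10, ttl=5)
without_ttl, _ = simulate_backlog(20, 10, ttl=None)
print(max(with_ttl), expired)   # 50 150: backlog plateaus at rate x TTL
print(without_ttl[-1])          # 200: backlog keeps growing
```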
Apologies, I may have confused things with the title. After speaking to
development, apparently the "failed to redirect trigger" is our own error
message. We don't actually use MSMQ Triggers; it's unticked in Add/Remove
Programs. The message contains a trigger number in the body that our
application uses, hence the confusion. The message arrives on the queue and
we use the MSMQ event to read the message. I'll try and rename the title to
"Failed to read message from queue". In light of the above, have you got any
more advice?
With regard to closing down apps we do a queue.close() followed by a
queue.dispose().
Your review is correct, except we use a much more basic method to read the
messages off the queue: we use the mq_arrived event to get the top message.
Below is a snippet of the code from our development team. To clarify
a point in another of your posts, each message is sent within a transaction.
We don't send multiple messages within a single transaction.
------------------------------------------------------------------------------
Private WithEvents mobjMyMMQEvent As MSMQEvent

Private Sub mobjMyMMQEvent_Arrived(ByVal Queue As Object, ByVal Cursor As Long)
    < Variable declarations >
    Dim lobjMessage As MSMQMessage
    On Error GoTo ErrorHandler

    ' Get message data from queue (peek: the message stays in the queue)
    Set lobjMessage = Queue.PeekCurrent(, , 0)
    lstrMessageLabel = lobjMessage.Label
    lstrMessageBody = lobjMessage.Body
    <Process MQ message label and body>

    ' Remove message from queue (receive the message under the cursor)
    Set lobjMessage = Queue.ReceiveCurrent(, , 0)
    Set lobjMessage = Nothing

    ' Re-arm the event so the next arrival fires Arrived again
    mobjMyMMQ.EnableNotification mobjMyMMQEvent, MQMSG_FIRST
    <Rest of routine>
End Sub
----------------------------------------------------------------------
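To see how the error quoted at the top of the thread can arise from a handler like this, here is a toy Python model of a cursor-based peek-then-receive. The `CursorQueue` class and its methods are hypothetical stand-ins, not the MSMQ COM API: a cursor remembers one message, and if that message is removed by a receive that does not go through the cursor, the next operation on the cursor fails.

```python
from collections import deque


class MessageRemovedError(Exception):
    """Stand-in for the 'message at which the cursor is pointing was removed' error."""


class CursorQueue:
    """Toy queue where a single cursor remembers the id of one message."""

    def __init__(self, bodies):
        self._msgs = deque(enumerate(bodies))   # (message id, body) pairs
        self.cursor_id = None

    def move_cursor_first(self):
        # Point the cursor at the head message (the toy equivalent of MQMSG_FIRST).
        self.cursor_id = self._msgs[0][0] if self._msgs else None

    def _index_of_cursor(self):
        for i, (msg_id, _) in enumerate(self._msgs):
            if msg_id == self.cursor_id:
                return i
        raise MessageRemovedError(f"message {self.cursor_id} is no longer in the queue")

    def peek_current(self):
        # Read the message under the cursor without removing it.
        return self._msgs[self._index_of_cursor()][1]

    def receive_current(self):
        # Remove and return the message under the cursor.
        i = self._index_of_cursor()
        body = self._msgs[i][1]
        del self._msgs[i]
        return body

    def receive_without_cursor(self):
        # Any other receive of the head message that bypasses this cursor.
        return self._msgs.popleft()[1]


q = CursorQueue(["msg-1", "msg-2"])
q.move_cursor_first()
print(q.peek_current())         # msg-1
q.receive_without_cursor()      # msg-1 removed behind the cursor's back
try:
    q.receive_current()
except MessageRemovedError as err:
    print(err)                  # message 0 is no longer in the queue
```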
With regard to your questions:
Q1. No, our applications aren't currently cluster aware so they are run
manually on the active node. SQL, MSMQ and MSDTC are clustered and we create
the queues on the clustered msmq instance using the network name. I have
noticed though that there are outgoing queues from the local msmq instance to
the clustered queues which I'm not convinced is correct?
Q2. It doesn't look like the hotfix is installed. We have a pre-production
cluster and have written a test app to overload our application with
messages. Once we have replicated the problem I will try the hotfix to see
if it resolves the issue.
Q3. The queues are not transactional.
Is there a performance overhead when using negative source journaling?
We have been using the application for 9 years on NT, 2000 & 2003 and have
not had this issue until recently when moving it onto clusters. The
application hasn't changed much since moving to a 2000 platform.
Thanks again for all your help.
"John Breakwell (MSFT)" wrote:
> Hi
>
> To review:
> MSMQ running on a cluster.
> There is one application processing the messages in the queue.
> When the application is not running, messages wait/accumulate.
> The application uses cursors to move through the queue to find messages
> based on a number in the message body.
> Occasionally an error is returned about the cursor pointing to a message
> that has already been processed.
>
> Q1 Is the application running within the clustered resource group along
> with MSMQ?
> Q2 Is the hotfix for KB 937549 in place?
> Q3 Are the queues transactional?
>
> Recommendations.
>
> Use source journaling (positive and negative) on the messages so you
> guarantee you know what happens to them.
>
> Cheers
> John Breakwell (MSFT)
>
> "PTH" <P...@discussions.microsoft.com> wrote in message
> news:DE12DFEA-735F-40CA...@microsoft.com...
Just as an update, we have written a test app that just sends messages to the
queue, and the app sends a message back to the test app. We have put a
sequence number in the body of the message so we can check whether a
message goes missing. After 2 mins the test app had sent 93,114 messages. 3
minutes later all the messages had been received back by the test app. We
had 5 of the errors in Event Viewer and 5 out-of-sequence errors in the test
app log. So we are definitely losing the messages somewhere. Are you aware
of any other way messages can be removed from a queue? I'm assuming the
message arrived for the event to be triggered.
The development team are going to look at what code changes they can make to
see if they can address the issue, but I'm still worried it may be some kind
of MSMQ/OS issue and code changes won't help.
Is there anything else we can investigate/test? Would MSMQ logging help?
Regards, Paul.
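The sequence-number check described above can be sketched in a few lines of Python. The function name and the assumption that the test app collects received sequence numbers into a list are hypothetical; the thread doesn't show the test app's code. It reports both numbers that never arrived and numbers that arrived after a higher one had already been seen.

```python
def check_sequence(received: list[int], sent_count: int) -> tuple[list[int], list[int]]:
    """Compare received sequence numbers against 1..sent_count.

    Returns (missing, out_of_order): numbers never received, and numbers
    that arrived after a higher number had already been seen.
    """
    seen = set(received)
    missing = [n for n in range(1, sent_count + 1) if n not in seen]
    out_of_order = []
    highest = 0
    for n in received:
        if n < highest:
            out_of_order.append(n)
        highest = max(highest, n)
    return missing, out_of_order


missing, late = check_sequence([1, 2, 4, 3, 6], sent_count=6)
print(missing)  # [5]
print(late)     # [3]
```

A number that shows up late is merely reordered; a number that never shows up at all is what indicates a lost message.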
"John Breakwell (MSFT)" wrote:
> Hi
>
> You are using cursors (PeekCurrent, ReceiveCurrent) - you're just not
> manipulating them around the queue.
>
PTH: This is correct.
>
> You mention sending messages within a transaction but then say that the
> queue is not transactional.
> Are you referring to an external MSDTC transaction? If so, MSMQ won't use it
> to roll back the message as the queue is not transactional.
>
PTH: Again, I think I may be confusing matters with terminology, not being a
developer. The messages are sent from a VB app and they are not sent within
a transaction using MSDTC or other 3rd party software. I tried making the
queues transactional but no messages were sent, so I guess this would need
code changes.
>
> "Is there a performance overhead when using negative source journaling?"
> A Yes. Is the performance overhead going to be significant? I don't think
> so.
> Basically there may be an extra (tiny) message or 2 sent back to the client
> to confirm the message has been processed.
> If you DO use it, ensure you diligently manage the Dead letter and journal
> queues as otherwise the machine will fill up.
>
PTH: Thanks for this info, I will discuss with development and maybe do some
testing.
>
> "No, our applications aren't currently cluster aware so they are run
> manually on the active node."
> Applications don't need to be cluster aware to be clustered. As long as the
> application doesn't have a user interface, you can probably cluster it
> successfully.
>
PTH: We did try to cluster them as Generic Application resource types, but
some of our other apps running on the cluster do have GUIs, which caused some
technical & support issues. Do you think this is still relevant to the
problem?
>
> "I have noticed though that there are outgoing queues from the local msmq
> instance to the clustered queues which I'm not convinced is correct?"
> MSMQ won't send messages unless you tell it to. If there is an outgoing
> queue then it is as a result of what your application is doing. What is the
> name of the outgoing queue?
>
PTH: The outgoing queue is <clus. msmq name>\application name, the same
queue that the application uses.
>
> "We have been using the application for 9 years on NT, 2000 & 2003 and have
> not had this issue until recently when moving it onto clusters."
> Was the system originally all running on the same machine, or split over two
> networked machines? I ask because your clustered system is effectively split
> over two networked machines - one machine has the client application whilst
> the other hosts the queues. They may be running in the same physical RAM of
> one of the cluster nodes but logically they are different locations. The
> application is performing a remote receive on messages in the clustered MSMQ
> queue using the RPC protocol. You have implemented a Peek-then-Receive
> process to simulate a transactional operation. (This functionality is now
> built in to MSMQ 4.0 with Transactional Remote Receives).
>
PTH: Normally it is running on the same server and we don't see the problem,
i.e. the application queue and application are on the same server. It is
only on a clustered environment we get the problem where logically the app is
on a different machine to msmq although they are physically the same server.
If we clustered the app and the test app and we didn't see the problem then I
guess this would indicate that this is contributing to the problem in some
way.
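John's point about Transactional Remote Receives can be illustrated with another toy Python model, assuming a single consumer and no remoting or distributed transaction coordinator. The hypothetical `transactional_receive` helper removes the head message up front and puts it back if processing raises, so a failure never loses the message; the peek-then-receive pattern has no such rollback. In MSMQ a transactional receive applies to transactional queues, which the queues in this thread are not.

```python
from collections import deque
from contextlib import contextmanager


@contextmanager
def transactional_receive(queue: deque):
    """Take the head message; put it back at the head if the block raises."""
    msg = queue.popleft()
    try:
        yield msg
    except Exception:
        queue.appendleft(msg)   # abort: the message returns to the queue
        raise
    # normal exit: commit, the message stays removed


def process(msg: str) -> None:
    # Hypothetical processing step that fails for one particular message.
    if msg == "bad":
        raise ValueError("processing failed")


q = deque(["bad", "good"])
try:
    with transactional_receive(q) as m:
        process(m)
except ValueError:
    pass
print(list(q))  # ['bad', 'good']: rolled back, nothing lost

with transactional_receive(q) as m:
    pass        # processed successfully, so the removal commits
print(list(q))  # ['good']
```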
>
> Do you know where in the code you are seeing the exception raised?
> I assume from the use of "On Error Goto ErrorHandler" that you just know
> it's in mobjMyMMQEvent_Arrived() somewhere.
> Is there any other Peek or Receive code elsewhere that the application could
> branch to somehow?
>
PTH: The Peek & Receive code is the only instance in the application; the
application is single-threaded and it's the only application using the queue,
so I can't see how the messages are getting removed.
>
> "It doesn't look like the hotfix is installed."
> The hotfix is only relevant for a problem where messages are skipped over
> and left in the queue.
> Your problem is that messages are being read from the queue but you don't
> know why.
> So don't expect anything different if you apply it - no harm in testing the
> fix out, though, just in case.
>
PTH: There are no messages left in the queue, so I too don't think it's the
same issue, which is a shame as it would have made my life a lot easier. I
may still try it anyway.
>
> Cheers
> John Breakwell (MSFT)
>
>
> "PTH" <P...@discussions.microsoft.com> wrote in message
> news:54AC3BF6-1D28-4F49...@microsoft.com...
I've managed to make some good progress with the developer today. It looks
like the error is happening on the Peek, and when it then goes to the error
routine it removes the message from the queue as it does a read of it.
We have addressed it by resetting the queue and re-enabling notification
instead, which doesn't seem to result in any of the messages being lost. We
still don't understand why we get the error in the first place, but as long as
no data is lost I'm past caring.
Thanks for your help. If we discover anything else I will let you know.
Regards, PTH.
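The failure mode described above can be reproduced in a small Python model comparing the original error routine with the fix. Everything here is hypothetical: the peek failure is injected deterministically on the first attempt for chosen messages rather than arising from MSMQ, and the read in the original error routine is assumed to remove the message at the head of the queue. Receiving in the error handler drops the message whose peek failed; resetting and waiting for the next notification retries it.

```python
from collections import deque


def drain(messages, fail_first_peek_of, strategy):
    """Process every message and return the ones actually processed.

    `fail_first_peek_of` is a set of messages whose first peek raises, to mimic
    the intermittent error. `strategy` is "receive_on_error" (the original
    error routine) or "reset_on_error" (the fix).
    """
    queue = deque(messages)
    processed = []
    already_failed = set()
    while queue:
        head = queue[0]
        try:
            if head in fail_first_peek_of and head not in already_failed:
                already_failed.add(head)
                raise RuntimeError("peek failed")
            processed.append(head)      # peek + process
            queue.popleft()             # receive: remove after processing
        except RuntimeError:
            if strategy == "receive_on_error":
                queue.popleft()         # removes the message without processing it
            # "reset_on_error": leave it in place; the next notification retries it
    return processed


msgs = [1, 2, 3, 4, 5]
print(drain(msgs, {3}, "receive_on_error"))  # [1, 2, 4, 5]: message 3 lost
print(drain(msgs, {3}, "reset_on_error"))    # [1, 2, 3, 4, 5]
```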
"PTH" wrote:
> The latest hotfix did not address the issue :-(
>
> Investigations ongoing.
>
> Regards, Paul.
>
> "PTH" wrote:
>
> > Hi John,
> >
> > Hope you had a good weekend. I've managed to replicate the problem in a
> > non-clustered environment: I ran the apps on one server but used the
> > MSMQ queues on another, and I get the same problem. So the issue seems to
> > arise when the app and queue are on different machines. I ran our test
> > application for over an hour with everything on one server, and it sent
> > nearly 8 million messages and didn't lose one. As soon as I moved the
> > queues to a different server, messages were going missing within a
> > minute. I don't think it's a code issue, because as far as the code is
> > concerned it's just a queue; it doesn't know whether it's local or
> > remote. I'm not sure it's a networking issue either, as the message must
> > be getting to the queue for it to raise the event. That leads me to
> > believe it's some kind of MSMQ issue. I'm just in the process of applying
> > the hotfix you recommended (the latest & greatest ;-) so will let you know
> > how I get on.
> >
> > If this doesn't work then I guess I may need to raise a support case.
> >
> > The test app just runs in a loop, i.e. sending and receiving messages at
> > the same time. The application that errors is just a message forwarder:
> > it reads messages from its queue and sends them to the relevant
> > application, in this case the test app.
> >
> > Regards, Paul.
> >
> > "John Breakwell (MSFT)" wrote:
> >
> > > Hi
> > >
> > > Apologies for the incomplete message.
> > > I hit "send" instead of "send later" as I've run out of time today to
> > > complete the message.
> > > I'll follow up later to fill in the blanks.
> > >
> > > Cheers
> > > John Breakwell (MSFT)
> > >