It seems that the MQ is dropping messages. Here is a log listing:
From the MQ:
[Wed Jan 18 09:37:09 2012] [info] STORE: COMPLEX: FRONT: Message 09AEE8D0-41FB-11E1-BCF8-5210816CF4A9 claimed by client 246
[Wed Jan 18 09:37:09 2012] [info] Dispatching message 09AEE8D0-41FB-11E1-BCF8-5210816CF4A9 to client 246
[Wed Jan 18 09:37:09 2012] [notice] RECV (246): ACK - message 09AEE8D0-41FB-11E1-BCF8-5210816CF4A9
[Wed Jan 18 09:37:10 2012] [info] Dispatching message 2CD24650-3E99-11E1-A493-8114816CF4A9 to client 246
[Wed Jan 18 09:37:11 2012] [notice] RECV (246): ACK - message 2CD24650-3E99-11E1-A493-8114816CF4A9
[Wed Jan 18 09:37:11 2012] [info] Dispatching message 2D022F8C-3E99-11E1-A493-8114816CF4A9 to client 246
[Wed Jan 18 09:37:11 2012] [notice] RECV (246): ACK - message 2D022F8C-3E99-11E1-A493-8114816CF4A9
[Wed Jan 18 09:37:14 2012] [info] Dispatching message 50902C56-3E99-11E1-A493-8114816CF4A9 to client 246
From the queue processor:
[2012/01/18 09:37:10] DEBUG - _server_message()
[2012/01/18 09:37:10] DEBUG - received a "MESSAGE" message
[2012/01/18 09:37:10] INFO - lmon: received message #2CD24650-3E99-11E1-A493-8114816CF4A9 type: "wise-lmon" from esd105-db-01
[2012/01/18 09:37:10] DEBUG - lmon: entering store_lmon()
[2012/01/18 09:37:11] DEBUG - lmon: key = esd105-db-01, 2012-01-14 02:18:00
[2012/01/18 09:37:11] DEBUG - lmon: mid = 14004998
[2012/01/18 09:37:11] INFO - lmon: processed 87 items from: esd105-db-01
[2012/01/18 09:37:11] DEBUG - lmon: leaving store_lmon()
send_data()
[2012/01/18 09:37:11] DEBUG - _server_message()
[2012/01/18 09:37:11] DEBUG - received a "MESSAGE" message
[2012/01/18 09:37:11] INFO - lmon: received message #2D022F8C-3E99-11E1-A493-8114816CF4A9 type: "wise-lmon" from wem-lmgt-01
[2012/01/18 09:37:11] DEBUG - lmon: entering store_lmon()
[2012/01/18 09:37:11] DEBUG - lmon: key = wem-lmgt-01, 2012-01-14 02:18:00
[2012/01/18 09:37:11] DEBUG - lmon: mid = 14004999
[2012/01/18 09:37:11] INFO - lmon: processed 59 items from: wem-lmgt-01
[2012/01/18 09:37:11] DEBUG - lmon: leaving store_lmon()
send_data()
As you can see, the last message dispatched by the MQ (50902C56-3E99-11E1-A493-8114816CF4A9) is never received by the queue processor, and queue processing stalls. This started happening after the first of the year. In an attempt to resolve the problem I have updated POE and the MQ to the latest and greatest and separated the queue processor out into its own process. The MQ is running with these parameters:
--data-dir /data/db/mq --log-conf /wise/etc/log.conf --pidfile /wise/var/run/mq.pid --front-max 64M --pump-freq 5 --granularity 5 -stats -stats-interval 60
There is currently about a 120,000-message backlog on this particular queue. The workaround has been to restart the queue processor every 5 minutes.
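For what it's worth, the restart hack could be automated with a small watchdog along these lines. This is only a sketch: the queue-processor path below is a made-up placeholder, not something shipped with the MQ.

    #!/usr/bin/perl
    # Sketch of a watchdog that restarts the queue processor every 5 minutes.
    # The script path is a placeholder; point it at the real queue processor.
    use strict;
    use warnings;

    my $cmd    = '/wise/bin/queue-processor';   # hypothetical wrapper script
    my $period = 5 * 60;                        # restart interval in seconds

    while (1) {
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;

        if ($pid == 0) {
            exec $cmd or die "exec $cmd failed: $!";
        }

        sleep $period;       # let the processor chew on the backlog for a while
        kill 'TERM', $pid;   # ask it to shut down
        waitpid $pid, 0;     # reap it before starting the next cycle
    }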
Any help would be appreciated.
I've been away from the MQ and Perl programming in general for about 3
years now. The last couple of releases were either (1) fixes for changes
to Moose or (2) integrating old branches that were never merged
(the multi-server stuff).
So, I'm not sure I can jump in and help out on this one. Sorry. :-/
I think it might be time to hand this project off to a new maintainer.
Is anyone interested?
Best regards,
David.
Well, brute-forcing the queue processor by restarting it every 5 minutes has decreased the backlog from 120,000 messages down to around 30,000. Playing with the various parameters has not made a noticeable difference in when the queue stalls. The funny thing is that another queue, which handles more messages, is processing just fine. It makes me wonder if the backing store database is subtly corrupted.
I have tried to decipher the code, but like I said before, I don't understand Moose. The problem appears to be in retrieving messages from the backing store. It is more likely to stall on really old messages than on newer ones. Is the backing store being used in a LIFO or FIFO fashion? From the log, it appears to take newer messages first and then randomly process older ones???
Strange stuff, anybody else have any insights?
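One cheap way to test the corruption theory would be to run SQLite's own integrity check against the backing store. This is only a sketch: it assumes the default DBI store is a SQLite file somewhere under --data-dir, and the filename mq.db below is a guess.

    #!/usr/bin/perl
    # Sketch: run SQLite's integrity check against the MQ's backing store.
    # ASSUMPTION: the DBI store is a SQLite file under --data-dir; "mq.db"
    # is a guess, adjust to whatever file is actually in /data/db/mq.
    use strict;
    use warnings;
    use DBI;

    my $db_file = '/data/db/mq/mq.db';
    my $dbh = DBI->connect("dbi:SQLite:dbname=$db_file", '', '',
                           { RaiseError => 1 });

    my ($result) = $dbh->selectrow_array('PRAGMA integrity_check');
    print "integrity_check: $result\n";   # "ok" means the file itself is sound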
I'm sad to hear you have been away from Perl for so long. Hope you
will come back one day. I would be glad to help maintain pocomq. It
has always been one of my favorite modules.
-Naveed
Naveed,
Your contributions to PoCo::MQ have always been greatly appreciated!
In thinking about this now, I don't think I'd need to give you
anything to allow you to help maintain pocomq. Since the code is in a
DVCS like Bazaar you can just branch it. Although, I know you've been
wanting to move to git for a long time. :-) The mirror on GitHub is
up-to-date:
https://github.com/dsnopek/POE--Component--MessageQueue
All the scripts and information for making a release is included in
the repo itself. If you were to upload a release on CPAN with a
higher version, it would simply supersede my version.
So, it's really just a matter of having some new changes to release! :-)
Best regards,
David.
So, I believe it will pull from the DBI store in order. But there are
a number of optimizations at various levels that will return newer
messages first. The main idea is that, if it has a message in memory
*right now* and a subscriber willing to take the message *right now*,
just give it to 'em! :-) Waiting to get one from DBI will take
longer.
I'm pretty sure it's possible to remove all those optimizations,
though, and use only DBI.
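To make that concrete, here is a tiny sketch of that dispatch
preference. It is illustrative only (the method and attribute names
below are made up, not PoCo::MQ's real API):

    # Sketch of the dispatch preference described above.
    sub next_message_for {
        my ($queue, $subscriber) = @_;

        # Optimization: a message already sitting in the in-memory front
        # store goes to a ready subscriber immediately (these tend to be
        # the newest messages).
        if (my $msg = $queue->{front}->claim_next($subscriber)) {
            return $msg;
        }

        # Otherwise fall back to the DBI store, oldest first (FIFO),
        # which means a slower round trip to the database.
        return $queue->{dbi_store}->claim_oldest($subscriber);
    }

That would also explain why newer messages can appear to jump ahead
of the old backlog in the log.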
Hope that helps somewhat!
Best regards,
David.
Observed behavior:
1) If a significant number of messages is placed into the backing store, the MQ will choke on them.
2) In order to get message processing restarted, you need to restart the queue processor.
3) When the queue processor starts, all current messages are processed as normal; messages from the backing store appear to be processed in a random order.
4) After a certain number of messages and/or amount of time, the MQ will stall on the backlog. This shows up as the MQ logging that a message has been claimed by and dispatched to a client, but the message never reaches the client, i.e. the MQ is left waiting for an ACK (a simple log check for this is sketched below).
5) This seemed to start happening after v0.2.10; updating to v0.2.12 didn't improve the situation.
What is a significant number? At 200 it cleared up without intervention; at 2,000 I had to intervene; at 200,000 or more the system is doing a whole lotta navel gazing and recovery means deleting the backlog.
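A stopgap for spotting the stall automatically could be a small log watcher that flags any dispatched message with no matching ACK within a timeout. This is only a sketch against the log format shown at the top of the thread; the log path and timeout are placeholders.

    #!/usr/bin/perl
    # Sketch: watch the MQ log for "Dispatching message ... to client ..."
    # lines that never get a matching ACK within $timeout seconds.
    use strict;
    use warnings;

    my $log     = '/wise/var/log/mq.log';   # placeholder path
    my $timeout = 60;                        # seconds to wait for an ACK
    my %pending;                             # message id => dispatch time

    open my $fh, '<', $log or die "open $log: $!";
    seek $fh, 0, 2;                          # start at end of file, like tail -f

    while (1) {
        while (my $line = <$fh>) {
            if ($line =~ /Dispatching message (\S+) to client (\d+)/) {
                $pending{$1} = time;
            }
            elsif ($line =~ /ACK - message (\S+)/) {
                delete $pending{$1};
            }
        }
        for my $id (keys %pending) {
            if (time - $pending{$id} > $timeout) {
                warn "no ACK for $id after ${timeout}s; queue may be stalled\n";
                delete $pending{$id};
            }
        }
        sleep 5;
        seek $fh, 0, 1;                      # clear EOF so new lines are picked up
    }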