Python NSQ Reader not draining NSQ messages on some hosts stored In Memory

Ryan Blomberg

unread,

Feb 21, 2018, 5:42:14 PM2/21/18

to nsq-users

Configuration

3 lookupd hosts
5 nsqd hosts

On a single host I am using pynsq to connect to NSQ using a lookupd hosts list

I am trying to read off of a topic + channel combo meaning the messages queue up on the channel until I clear them with the reader.

The problem I am seeing is that my consumer appears to connect to all NSQ hosts and is draining them OK except for those messages that are being held in memory on each host

NSQAdmin shows 4/5 hosts on that channel with the following

Memory + Disk
10000 + 0

The reader is reading at 500 messages / second which implies it is keeping up with the incoming messages except all hosts seem to maintain the 10k in memory buffer except one

That one stats at 0 messages on the NSQD side

My question is why are those messages being held in memory?

Jehiah Czebotar

unread,

Feb 21, 2018, 8:09:54 PM2/21/18

to Ryan Blomberg, nsq-users

Can you share some more concrete stats from nsqd that show the state you are describing? The output from curl "http://127.0.0.1:4151/stats?topic=$topic&channel=$channel" would be helpful

--

Jehiah

Ryan Blomberg

unread,

Feb 21, 2018, 11:45:54 PM2/21/18

to nsq-users

The python code doing the pynsq Reader call is

r = nsq.reader.Reader( message_handler = callback,

lookupd_http_addresses=lookupd_hosts,

topic=self.topic,

channel=self.channel or ephemeral_channel,

lookupd_poll_interval=10)

nsq.run()

Example stats from each nsqd instance (5/5)

[root@c58604a3ef45 /]$ curl "http://prod-zync-docker01.zx:4151/stats?topic=zync-session-tracking&channel=zx-67510#ephemeral"

nsqd v1.0.0-compat (built w/go1.8)

start_time 2018-02-09T05:50:11Z

uptime 310h49m27.165032009s

Health: OK

[zync-session-tracking] depth: 0 be-depth: 0 msgs: 402932696 e2e%:

[nsq_to_file ] depth: 0 be-depth: 0 inflt: 147 def: 0 re-q: 0 timeout: 313 msgs: 402932696 e2e%:

[V2 d8ecc30117ac ] state: 3 inflt: 147 rdy: 400 fin: 335986864 re-q: 0 msgs: 335987011 connected: 227h27m4s

[zx-67510#ephemeral ] depth: 10000 be-depth: 0 inflt: 0 def: 0 re-q: 0 timeout: 0 msgs: 8676931 e2e%:

[V2 zx ] state: 3 inflt: 0 rdy: 0 fin: 0 re-q: 0 msgs: 0 connected: 6h7m32s

[root@c58604a3ef45 /]$ curl "http://prod-zync-docker02.zx:4151/stats?topic=zync-session-tracking&channel=zx-67510#ephemeral"

nsqd v1.0.0-compat (built w/go1.8)

start_time 2018-02-09T17:17:25Z

uptime 299h22m57.129335099s

Health: OK

[zync-session-tracking] depth: 0 be-depth: 0 msgs: 332231918 e2e%:

[nsq_to_file ] depth: 0 be-depth: 0 inflt: 137 def: 0 re-q: 0 timeout: 288 msgs: 332231918 e2e%:

[V2 d8ecc30117ac ] state: 3 inflt: 137 rdy: 400 fin: 277342135 re-q: 0 msgs: 277342272 connected: 227h27m48s

[zx-67510#ephemeral ] depth: 10000 be-depth: 0 inflt: 0 def: 0 re-q: 0 timeout: 0 msgs: 7181588 e2e%:

[V2 zx ] state: 3 inflt: 0 rdy: 0 fin: 0 re-q: 0 msgs: 0 connected: 6h8m16s

[root@c58604a3ef45 /]$ curl "http://prod-zync-docker03.zx:4151/stats?topic=zync-session-tracking&channel=zx-67510#ephemeral"

nsqd v1.0.0-compat (built w/go1.8)

start_time 2018-02-09T05:50:15Z

uptime 310h50m10.65831391s

Health: OK

[zync-session-tracking] depth: 0 be-depth: 0 msgs: 411370974 e2e%:

[nsq_to_file ] depth: 0 be-depth: 0 inflt: 23 def: 0 re-q: 0 timeout: 351 msgs: 411370974 e2e%:

[V2 d8ecc30117ac ] state: 3 inflt: 23 rdy: 400 fin: 342607998 re-q: 0 msgs: 342608021 connected: 227h27m51s

[zx-67510#ephemeral ] depth: 0 be-depth: 0 inflt: 0 def: 0 re-q: 0 timeout: 0 msgs: 8822739 e2e%:

[V2 zx ] state: 3 inflt: 0 rdy: 1 fin: 8822739 re-q: 0 msgs: 8822739 connected: 6h8m19s

[root@c58604a3ef45 /]$ curl "http://prod-zync-docker04.zx:4151/stats?topic=zync-session-tracking&channel=zx-67510#ephemeral"

nsqd v1.0.0-compat (built w/go1.8)

start_time 2018-02-09T05:50:16Z

uptime 310h50m14.206657965s

Health: OK

[zync-session-tracking] depth: 0 be-depth: 0 msgs: 329125100 e2e%:

[nsq_to_file ] depth: 0 be-depth: 0 inflt: 177 def: 0 re-q: 0 timeout: 282 msgs: 329125100 e2e%:

[V2 d8ecc30117ac ] state: 3 inflt: 177 rdy: 400 fin: 275637162 re-q: 0 msgs: 275637339 connected: 227h27m56s

[zx-67510#ephemeral ] depth: 10000 be-depth: 0 inflt: 0 def: 0 re-q: 0 timeout: 0 msgs: 7132768 e2e%:

[V2 zx ] state: 3 inflt: 0 rdy: 0 fin: 0 re-q: 0 msgs: 0 connected: 6h8m24s

[root@c58604a3ef45 /]$ curl "http://prod-zync-docker05.zx:4151/stats?topic=zync-session-tracking&channel=zx-67510#ephemeral"

nsqd v1.0.0-compat (built w/go1.8)

start_time 2018-02-09T05:50:03Z

uptime 310h50m30.880939237s

Health: OK

[zync-session-tracking] depth: 0 be-depth: 0 msgs: 327691258 e2e%:

[nsq_to_file ] depth: 0 be-depth: 0 inflt: 16 def: 0 re-q: 0 timeout: 296 msgs: 327691258 e2e%:

[V2 d8ecc30117ac ] state: 3 inflt: 16 rdy: 400 fin: 274052284 re-q: 0 msgs: 274052300 connected: 227h28m0s

[zx-67510#ephemeral ] depth: 10000 be-depth: 0 inflt: 0 def: 0 re-q: 0 timeout: 0 msgs: 7122885 e2e%:

[V2 zx ] state: 3 inflt: 0 rdy: 0 fin: 0 re-q: 0 msgs: 0 connected: 6h8m28s

Jehiah Czebotar

unread,

Feb 21, 2018, 11:59:28 PM2/21/18

to Ryan Blomberg, nsq-users

In NSQ, "ephemeral" channels are designed to be used when it's not important that you get all the messages, and you want to bound how many are queued. The number queued is bounded by "-mem-queue-size" which defaults to 10k. What this means in practice for you is that your "zx-67510#ephemeral" channel will queue 10k messages for the client, but not a single one more. If you were consuming from a non-ephemeral channel you would see the full message backlog overflow to disk.

The following line is the lifetime metrics for a connection from your client to nsqd. Note "msgs: 0" indicating this particular connection never processed any messages.

[V2 zx ] state: 3 inflt: 0 rdy: 0 fin: 0 re-q: 0 msgs: 0 connected: 6h8m28s

however if you look at the line for the nsqd that is getting messages drained, you'll see something different. That connection is consuming messages.

[V2 zx ] state: 3 inflt: 0 rdy: 1 fin: 8822739 re-q: 0 msgs: 8822739 connected: 6h8m19s

You'll also note "rdy: 1" indicating that the client is ready for 1 message.

The issue in your situation is that you have four message sources, but your client is only ready for one message at a time so it can't get messages from all four sources. If you set max_in_flight to 4, your consumer will be able to send "rdy: 1" to all four nsqd's and consume all the messages.

https://pynsq.readthedocs.io/en/latest/reader.html#nsq.Reader

http://nsq.io/overview/design.html

--

Jehiah

Ryan Blomberg

unread,

Feb 22, 2018, 1:14:18 AM2/22/18

to nsq-users

Thank you so much for the response. The #ephemeral thing was just me testing. Typically I connect to a specific channel.

The max_in_flight fixed the issue so thanks again for the advice.

Reply all

Reply to author

Forward