Python NSQ Reader not draining NSQ messages on some hosts stored In Memory

149 views
Skip to first unread message

Ryan Blomberg

unread,
Feb 21, 2018, 5:42:14 PM2/21/18
to nsq-users
Configuration

3 lookupd hosts
5 nsqd hosts

On a single host I am using pynsq to connect to NSQ using a lookupd hosts list

I am trying to read off of a topic + channel combo meaning the messages queue up on the channel until I clear them with the reader.

The problem I am seeing is that my consumer appears to connect to all NSQ hosts and is draining them OK except for those messages that are being held in memory on each host

NSQAdmin shows 4/5 hosts on that channel with the following

Memory + Disk
10000 + 0

The reader is reading at 500 messages / second which implies it is keeping up with the incoming messages except all hosts seem to maintain the 10k in memory buffer except one

That one stats at 0 messages on the NSQD side

My question is why are those messages being held in memory?

Jehiah Czebotar

unread,
Feb 21, 2018, 8:09:54 PM2/21/18
to Ryan Blomberg, nsq-users
Can you share some more concrete stats from nsqd that show the state you are describing? The output from curl "http://127.0.0.1:4151/stats?topic=$topic&channel=$channel" would be helpful

-- 
Jehiah

Ryan Blomberg

unread,
Feb 21, 2018, 11:45:54 PM2/21/18
to nsq-users

The python code doing the pynsq Reader call is

r = nsq.reader.Reader( message_handler = callback,
                    lookupd_http_addresses=lookupd_hosts,
                    topic=self.topic,
                    channel=self.channel or ephemeral_channel,
                    lookupd_poll_interval=10)
nsq.run()

Example stats from each nsqd instance (5/5)

nsqd v1.0.0-compat (built w/go1.8)
start_time 2018-02-09T05:50:11Z
uptime 310h49m27.165032009s

Health: OK

   [zync-session-tracking] depth: 0     be-depth: 0     msgs: 402932696 e2e%: 
      [nsq_to_file              ] depth: 0     be-depth: 0     inflt: 147  def: 0    re-q: 0     timeout: 313   msgs: 402932696 e2e%: 
        [V2 d8ecc30117ac         ] state: 3 inflt: 147  rdy: 400  fin: 335986864 re-q: 0        msgs: 335987011 connected: 227h27m4s
      [zx-67510#ephemeral       ] depth: 10000 be-depth: 0     inflt: 0    def: 0    re-q: 0     timeout: 0     msgs: 8676931  e2e%: 
        [V2 zx                   ] state: 3 inflt: 0    rdy: 0    fin: 0        re-q: 0        msgs: 0        connected: 6h7m32s
nsqd v1.0.0-compat (built w/go1.8)
start_time 2018-02-09T17:17:25Z
uptime 299h22m57.129335099s

Health: OK

   [zync-session-tracking] depth: 0     be-depth: 0     msgs: 332231918 e2e%: 
      [nsq_to_file              ] depth: 0     be-depth: 0     inflt: 137  def: 0    re-q: 0     timeout: 288   msgs: 332231918 e2e%: 
        [V2 d8ecc30117ac         ] state: 3 inflt: 137  rdy: 400  fin: 277342135 re-q: 0        msgs: 277342272 connected: 227h27m48s
      [zx-67510#ephemeral       ] depth: 10000 be-depth: 0     inflt: 0    def: 0    re-q: 0     timeout: 0     msgs: 7181588  e2e%: 
        [V2 zx                   ] state: 3 inflt: 0    rdy: 0    fin: 0        re-q: 0        msgs: 0        connected: 6h8m16s
nsqd v1.0.0-compat (built w/go1.8)
start_time 2018-02-09T05:50:15Z
uptime 310h50m10.65831391s

Health: OK

   [zync-session-tracking] depth: 0     be-depth: 0     msgs: 411370974 e2e%: 
      [nsq_to_file              ] depth: 0     be-depth: 0     inflt: 23   def: 0    re-q: 0     timeout: 351   msgs: 411370974 e2e%: 
        [V2 d8ecc30117ac         ] state: 3 inflt: 23   rdy: 400  fin: 342607998 re-q: 0        msgs: 342608021 connected: 227h27m51s
      [zx-67510#ephemeral       ] depth: 0     be-depth: 0     inflt: 0    def: 0    re-q: 0     timeout: 0     msgs: 8822739  e2e%: 
        [V2 zx                   ] state: 3 inflt: 0    rdy: 1    fin: 8822739  re-q: 0        msgs: 8822739  connected: 6h8m19s
nsqd v1.0.0-compat (built w/go1.8)
start_time 2018-02-09T05:50:16Z
uptime 310h50m14.206657965s

Health: OK

   [zync-session-tracking] depth: 0     be-depth: 0     msgs: 329125100 e2e%: 
      [nsq_to_file              ] depth: 0     be-depth: 0     inflt: 177  def: 0    re-q: 0     timeout: 282   msgs: 329125100 e2e%: 
        [V2 d8ecc30117ac         ] state: 3 inflt: 177  rdy: 400  fin: 275637162 re-q: 0        msgs: 275637339 connected: 227h27m56s
      [zx-67510#ephemeral       ] depth: 10000 be-depth: 0     inflt: 0    def: 0    re-q: 0     timeout: 0     msgs: 7132768  e2e%: 
        [V2 zx                   ] state: 3 inflt: 0    rdy: 0    fin: 0        re-q: 0        msgs: 0        connected: 6h8m24s
nsqd v1.0.0-compat (built w/go1.8)
start_time 2018-02-09T05:50:03Z
uptime 310h50m30.880939237s

Health: OK

   [zync-session-tracking] depth: 0     be-depth: 0     msgs: 327691258 e2e%: 
      [nsq_to_file              ] depth: 0     be-depth: 0     inflt: 16   def: 0    re-q: 0     timeout: 296   msgs: 327691258 e2e%: 
        [V2 d8ecc30117ac         ] state: 3 inflt: 16   rdy: 400  fin: 274052284 re-q: 0        msgs: 274052300 connected: 227h28m0s
      [zx-67510#ephemeral       ] depth: 10000 be-depth: 0     inflt: 0    def: 0    re-q: 0     timeout: 0     msgs: 7122885  e2e%: 
        [V2 zx                   ] state: 3 inflt: 0    rdy: 0    fin: 0        re-q: 0        msgs: 0        connected: 6h8m28s

Jehiah Czebotar

unread,
Feb 21, 2018, 11:59:28 PM2/21/18
to Ryan Blomberg, nsq-users
In NSQ, "ephemeral" channels are designed to be used when it's not important that you get all the messages, and you want to bound how many are queued. The number queued is bounded by "-mem-queue-size" which defaults to 10k. What this means in practice for you is that your "zx-67510#ephemeral" channel will queue 10k messages for the client, but not a single one more. If you were consuming from a non-ephemeral channel you would see the full message backlog overflow to disk.

The following line is the lifetime metrics for a connection from your client to nsqd. Note "msgs: 0" indicating this particular connection never processed any messages.

        [V2 zx                   ] state: 3 inflt: 0    rdy: 0    fin: 0        re-q: 0        msgs: 0        connected: 6h8m28s

however if you look at the line for the nsqd that is getting messages drained, you'll see something different. That connection is consuming messages.

        [V2 zx                   ] state: 3 inflt: 0    rdy: 1    fin: 8822739  re-q: 0        msgs: 8822739  connected: 6h8m19s

You'll also note "rdy: 1" indicating that the client is ready for 1 message. 

The issue in your situation is that you have four message sources, but your client is only ready for one message at a time so it can't get messages from all four sources. If you set max_in_flight to 4, your consumer will be able to send "rdy: 1" to all four nsqd's and consume all the messages.


-- 
Jehiah

Ryan Blomberg

unread,
Feb 22, 2018, 1:14:18 AM2/22/18
to nsq-users
Thank you so much for the response.  The #ephemeral thing was just me testing.  Typically I connect to a specific channel.

The max_in_flight fixed the issue so thanks again for the advice.
Reply all
Reply to author
Forward
0 new messages