max.poll.records and pause


Yan Wang

Jun 23, 2017, 2:20:43 AM
to kafka-clients
Hi! 

As far as I know, a new fetch is only requested if the number of currently fetched records is smaller than max.poll.records. Does that rule apply to the total number of records or to each partition?
If it applies to the total number of records, some partitions may starve.

Say my max.poll.records is 1 and I have two partitions, P1 and P2.
100 records have been fetched for P1 and 1 record for P2.
I have to finish all of my P1 records before I can fetch more records from the server. Is that correct?

What about pause? For example, after we get a P1 record, we pause P1 and then call poll again. I think it will return the record from P2.
Now P2 has nothing buffered locally, but P1 is still paused. If I call poll again, do we fetch new records from the server or not?
If not, we have to wait until P1 is resumed. If P1 stays paused for too long, there may be a lot of data for P2 on the server that is never fetched locally.

Thank you very much!

Yan

Ewen Cheslack-Postava

Jun 28, 2017, 12:57:32 AM
to Yan Wang, kafka-clients
max.poll.records only controls the number of records returned from poll; it does not affect fetching. The consumer will try to prefetch records from all partitions it is assigned. It then buffers those records and returns them in batches of at most max.poll.records each (either all from the same topic partition, if it has enough records left to fill the batch, or from multiple topic partitions, if the data from the last fetch for one topic partition does not cover max.poll.records).
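
To make the buffering behaviour concrete, here is a minimal sketch in plain Java. It is a toy model, not the Kafka client's actual code: the class `PollBufferSketch`, its methods, and the `P1#0`-style record names are all hypothetical, and it assumes buffered data is drained in the order the fetches completed.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: NOT the Kafka client's internals, just the behaviour described above.
class PollBufferSketch {
    private final int maxPollRecords;
    // Already-fetched records per partition, kept in the order the fetches completed (assumption).
    private final Map<String, Deque<String>> buffered = new LinkedHashMap<>();
    // Next offset to hand out per partition, so record names stay unique.
    private final Map<String, Integer> nextOffset = new HashMap<>();

    PollBufferSketch(int maxPollRecords) {
        this.maxPollRecords = maxPollRecords;
    }

    // Simulates a completed fetch: `count` records land in the local buffer.
    void addFetched(String partition, int count) {
        Deque<String> queue = buffered.computeIfAbsent(partition, p -> new ArrayDeque<>());
        int offset = nextOffset.getOrDefault(partition, 0);
        for (int i = 0; i < count; i++) {
            queue.add(partition + "#" + offset++);
        }
        nextOffset.put(partition, offset);
    }

    // Returns at most maxPollRecords records, draining one partition's
    // buffered data before moving on to the next partition.
    List<String> poll() {
        List<String> batch = new ArrayList<>();
        for (Deque<String> queue : buffered.values()) {
            while (!queue.isEmpty() && batch.size() < maxPollRecords) {
                batch.add(queue.poll());
            }
            if (batch.size() == maxPollRecords) {
                break;
            }
        }
        return batch;
    }

    public static void main(String[] args) {
        // The scenario from the question: max.poll.records = 1, 100 records for P1, 1 for P2.
        PollBufferSketch consumer = new PollBufferSketch(1);
        consumer.addFetched("P1", 100);
        consumer.addFetched("P2", 1);
        for (int call = 1; call <= 101; call++) {
            List<String> batch = consumer.poll();
            if (call == 1 || call == 100 || call == 101) {
                System.out.println("poll " + call + " -> " + batch);
            }
        }
    }
}
```

In this model the first 100 calls each return one P1 record and only the 101st returns the P2 record. With a larger limit, say 3, and two records buffered per partition, a single batch would span both partitions.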

For your first example, the consumer does process all messages from one fetch request before moving on to another, so all the P1 records would be returned before the P2 record.

I actually thought your second scenario would return a P2 record after P1 is paused, but after looking at the code I think we will, in fact, return the rest of the P1 records we have in the current fetch we're processing. I think that may be a bug. But in the case you gave, where P2 has nothing buffered locally and P1 is paused, we definitely will not send a fetch request for P1: no fetch requests are sent for a paused partition. You must resume it before fetch requests are sent for that partition again.
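
The pause rule can be sketched the same way. Again this is a toy model in plain Java, not the client's code: `PauseFetchSketch` and its methods are hypothetical names, and the only rule it encodes is the one stated above, that paused partitions are left out of fetch requests. It does not model whether locally buffered data also affects which partitions are fetched.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Toy model only: tracks which assigned partitions would be included in the next fetch.
class PauseFetchSketch {
    private final Set<String> assigned = new LinkedHashSet<>();
    private final Set<String> paused = new LinkedHashSet<>();

    void assign(String partition) {
        assigned.add(partition);
    }

    // Pausing only makes sense for a partition that is currently assigned.
    void pause(String partition) {
        if (assigned.contains(partition)) {
            paused.add(partition);
        }
    }

    void resume(String partition) {
        paused.remove(partition);
    }

    // Assigned partitions minus paused ones: the only rule modelled here.
    Set<String> fetchablePartitions() {
        Set<String> result = new LinkedHashSet<>(assigned);
        result.removeAll(paused);
        return result;
    }

    public static void main(String[] args) {
        PauseFetchSketch consumer = new PauseFetchSketch();
        consumer.assign("P1");
        consumer.assign("P2");

        consumer.pause("P1");
        System.out.println("while P1 is paused: " + consumer.fetchablePartitions());

        consumer.resume("P1");
        System.out.println("after resuming P1:  " + consumer.fetchablePartitions());
    }
}
```

Under this rule, P2 stays eligible for fetching while P1 is paused, and P1 becomes eligible again only after it is resumed.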

-Ewen

