Blueflood Rollup service blocked

BalaKrishna Mumdluri

Jul 20, 2017, 12:26:49 PM
to Blueflood Discuss
We have been using Blueflood in our project for a while, and it worked well until recently. Now the Rollup engine seems to have hung and has suddenly stopped rolling up any metrics.
When I turn on debug logging, I see the following message repeated constantly: "Still waiting for rollups to finish reading for metrics_5m,1235,18 91033". It comes from LocatorFetchRunnable, which stays in a timed sleep until reads have finished for all metrics within the rollup context. All 30 LocatorFetchRunnable threads are just sleeping, waiting for the read count to drop to zero, while all 80 RollupRunnable threads seem to be executing dataToRollup() on AstyanaxReader.
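A simplified model of the wait loop described above shows why the log line can repeat indefinitely: the fetch thread only proceeds once an outstanding-read counter reaches zero, so a single read that never completes keeps it sleeping forever. The class and method names below (`ReadWaitSketch`, `readStarted`, `readFinished`, `waitForReads`) are hypothetical and are not Blueflood's actual `LocatorFetchRunnable` or execution-context code; the poll limit exists only so the demo terminates.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical, simplified model of a fetch thread waiting for rollup reads.
 * This is NOT Blueflood's code; it only illustrates the waiting pattern.
 */
final class ReadWaitSketch {
    // Reads that have been submitted but have not completed yet.
    private final AtomicInteger pendingReads = new AtomicInteger();

    void readStarted() { pendingReads.incrementAndGet(); }
    void readFinished() { pendingReads.decrementAndGet(); }
    boolean doneReading() { return pendingReads.get() == 0; }

    /**
     * Polls doneReading() every pollMillis, at most maxPolls times.
     * Returns true once all reads have finished, false if polls ran out.
     * Without maxPolls, one read that never calls readFinished() would keep
     * this loop (and its log line) going forever.
     */
    boolean waitForReads(long pollMillis, int maxPolls) throws InterruptedException {
        for (int i = 0; i < maxPolls; i++) {
            if (doneReading()) {
                return true;
            }
            System.out.println("Still waiting for rollups to finish reading ("
                    + pendingReads.get() + " pending)");
            Thread.sleep(pollMillis);
        }
        return doneReading();
    }

    public static void main(String[] args) throws InterruptedException {
        ReadWaitSketch ctx = new ReadWaitSketch();
        ctx.readStarted();
        ctx.readStarted();
        ctx.readFinished(); // one read completes, the other "hangs"
        System.out.println("finished = " + ctx.waitForReads(10, 3)); // finished = false
    }
}
```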

However, the Query API can still fetch data points when they are requested from the GUI, because repair-on-read for SPLOT is set.

Looking at the JMX metrics, none of the error counters have been incremented, but the value of the "Scheduled Slot Check" metric keeps growing over time.

Any pointers would be helpful.

Chandrasekhar A

Jul 20, 2017, 1:44:04 PM
to BalaKrishna Mumdluri, Blueflood Discuss
I don't currently work on the project, but I have worked on it before, so here are my thoughts.

"Scheduled Slot Check" growing over time just means that the number of slots waiting to be rolled up is gradually increasing. That makes sense: the LocatorFetchRunnable threads are sitting idle, waiting for something, so the queue keeps piling up.

I would start by finding out why the LocatorFetchRunnable threads are waiting. Is it on executionContext.doneReading or executionContext.doneWriting? If it is because of reading, try increasing MAX_ROLLUP_READ_THREADS and see if that helps; if it is because of writing, try increasing MAX_ROLLUP_WRITE_THREADS. But if it has been working fine until now, I would check the Cassandra logs to find out why reads/writes aren't finishing.
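One way to see which wait the threads are stuck in is a thread dump (`jstack <pid>` does this from outside the JVM). The sketch below is a hypothetical in-process equivalent using the standard `ThreadMXBean` API: it groups threads by the first stack frame whose class name contains a given fragment, so threads sleeping at the reading wait and at the writing wait show up as different `Class.method:line` entries. The class `StackFrameHistogram` is illustrative, and matching on the fragment "LocatorFetchRunnable" is an assumption about how the frames are named in your build.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical diagnostic (not part of Blueflood): counts live threads by the
 * first stack frame whose class name contains a given fragment.
 */
final class StackFrameHistogram {
    /** Returns "Class.method:line" for the first matching frame, or null if none matches. */
    static String firstMatchingFrame(StackTraceElement[] stack, String classFragment) {
        for (StackTraceElement frame : stack) {
            if (frame.getClassName().contains(classFragment)) {
                return frame.getClassName() + "." + frame.getMethodName() + ":" + frame.getLineNumber();
            }
        }
        return null;
    }

    /** Maps each matching frame to the number of threads currently at that frame. */
    static Map<String, Integer> histogram(String classFragment) {
        Map<String, Integer> counts = new TreeMap<>();
        for (ThreadInfo info : ManagementFactory.getThreadMXBean().dumpAllThreads(false, false)) {
            if (info == null) continue;
            String frame = firstMatchingFrame(info.getStackTrace(), classFragment);
            if (frame != null) counts.merge(frame, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String target = args.length > 0 ? args[0] : "LocatorFetchRunnable";
        histogram(target).forEach((frame, n) -> System.out.println(n + "\t" + frame));
    }
}
```

If all 30 fetch threads collapse onto the same line, comparing that line against the source tells you whether it is the reading or the writing wait.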

Also, try restarting the app.

--
You received this message because you are subscribed to the Google Groups "Blueflood Discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to blueflood-discuss+unsubscribe@googlegroups.com.
Visit this group at https://groups.google.com/group/blueflood-discuss.
To view this discussion on the web visit https://groups.google.com/d/msgid/blueflood-discuss/eac2adcc-2d42-4fab-9c16-d4740e0f7b36%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Chandrasekhar A

Jul 20, 2017, 1:46:03 PM
to BalaKrishna Mumdluri, Blueflood Discuss
I would also make sure that the number of locators per shard hasn't increased over time. If it has, you would need a lot more rollup read/write threads to finish processing a slot.
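To get a feel for why more locators per shard need more threads, here is a rough, hypothetical estimate. It assumes every locator in a shard needs one read per slot, reads are spread evenly over a fixed thread pool, and each read takes about the same time; none of these figures come from Blueflood itself, and the numbers in `main` are illustrative only.

```java
/**
 * Hypothetical back-of-envelope model: time to read one slot when each
 * locator needs one read and reads run on a fixed-size thread pool.
 */
final class SlotReadEstimate {
    static long estimateMillis(long locatorsInShard, int readThreads, long avgReadMillis) {
        if (readThreads <= 0) {
            throw new IllegalArgumentException("readThreads must be positive");
        }
        // Number of "waves" of reads the pool must run (ceiling division).
        long waves = (locatorsInShard + readThreads - 1) / readThreads;
        return waves * avgReadMillis;
    }

    public static void main(String[] args) {
        // Illustrative numbers only, not measurements from this thread.
        System.out.println(estimateMillis(10_000, 80, 20)); // 2500
        System.out.println(estimateMillis(40_000, 80, 20)); // 10000
    }
}
```

Under this model, quadrupling the locators per shard quadruples the time per slot unless the read pool grows with it.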

Chandrasekhar A

Jul 20, 2017, 2:51:33 PM
to BalaKrishna Mumdluri, Blueflood Discuss
With 30 locator fetch threads, we have 300 read threads and 30 write threads configured.

BalaKrishna Mumdluri

Jul 21, 2017, 2:44:26 AM
to Blueflood Discuss, mumdluri.b...@gmail.com
Thanks, Chandra, for the reply.

The LocatorFetchRunnable threads are waiting in the executionContext.doneReading() loop. All 80 RollupRunnable threads are busy in getDataToRoll() on AstyanaxReader. However, the database layer seems to be responding properly to the Blueflood query layer (RollupHandler.java).
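If the RollupRunnable threads really have been "busy" for two days, it is worth checking whether they are doing any work at all. The hypothetical sketch below samples per-thread CPU time twice through `ThreadMXBean` and counts matching threads whose CPU time did not change; a thread that sits in `getDataToRoll()` but accumulates no CPU time is most likely blocked on I/O (for example, a read that never returns) rather than computing. It runs in-process; for a running service the same MXBean can be obtained remotely with `ManagementFactory.newPlatformMXBeanProxy`. The class `StalledThreadCheck`, the fragment "RollupRunnable", and the 5-second interval are assumptions.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical diagnostic (not part of Blueflood): counts threads whose CPU
 * time did not advance between two samples, a hint that they are blocked.
 */
final class StalledThreadCheck {
    /** Maps thread id to CPU time (ns) for threads with a frame from a class matching classFragment. */
    static Map<Long, Long> cpuTimes(ThreadMXBean mx, String classFragment) {
        Map<Long, Long> result = new HashMap<>();
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info == null) continue;
            for (StackTraceElement frame : info.getStackTrace()) {
                if (frame.getClassName().contains(classFragment)) {
                    // -1 if the thread has died or CPU time measurement is disabled
                    long cpu = mx.getThreadCpuTime(info.getThreadId());
                    if (cpu >= 0) result.put(info.getThreadId(), cpu);
                    break;
                }
            }
        }
        return result;
    }

    /** Counts matching threads whose CPU time was identical in two samples intervalMillis apart. */
    static int countStalled(String classFragment, long intervalMillis) throws InterruptedException {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (!mx.isThreadCpuTimeSupported()) {
            throw new UnsupportedOperationException("JVM does not support per-thread CPU time");
        }
        if (!mx.isThreadCpuTimeEnabled()) mx.setThreadCpuTimeEnabled(true);
        Map<Long, Long> before = cpuTimes(mx, classFragment);
        Thread.sleep(intervalMillis);
        Map<Long, Long> after = cpuTimes(mx, classFragment);
        int stalled = 0;
        for (Map.Entry<Long, Long> e : before.entrySet()) {
            // Threads that disappeared between samples give null and are not counted.
            if (e.getValue().equals(after.get(e.getKey()))) stalled++;
        }
        return stalled;
    }

    public static void main(String[] args) throws InterruptedException {
        String target = args.length > 0 ? args[0] : "RollupRunnable";
        System.out.println(countStalled(target, 5_000) + " matching threads made no CPU progress in 5 s");
    }
}
```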

So are you saying the RollupRunnable threads are simply busy, and have been for two days? The Rollup Service has not persisted any rollups in the last two days, except for those triggered by repair-on-read.


BalaKrishna Mumdluri

Jul 21, 2017, 2:58:03 AM
to Blueflood Discuss, mumdluri.b...@gmail.com
I don't see any JMX metric for tracking the number of locators per shard. Did I miss something?
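Absent a dedicated metric, a quick way to check what a running instance actually exposes is to search its registered MBeans by keyword. The helper below is hypothetical and built only on the standard JMX API (`queryNames`, `JMXConnectorFactory`); it does not assume Blueflood registers any locator-per-shard metric, and the service URL in the comment is a placeholder for your own host and JMX port.

```java
import java.lang.management.ManagementFactory;
import java.util.Set;
import java.util.TreeSet;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

/** Hypothetical helper: lists MBean names containing a keyword (case-insensitive). */
final class MBeanSearch {
    static Set<String> find(MBeanServerConnection conn, String keyword) throws Exception {
        Set<String> matches = new TreeSet<>();
        String needle = keyword.toLowerCase();
        // A null name pattern and null query return every registered MBean.
        for (ObjectName name : conn.queryNames(null, null)) {
            if (name.getCanonicalName().toLowerCase().contains(needle)) {
                matches.add(name.getCanonicalName());
            }
        }
        return matches;
    }

    public static void main(String[] args) throws Exception {
        String keyword = args.length > 0 ? args[0] : "locator";
        if (args.length > 1) {
            // e.g. service:jmx:rmi:///jndi/rmi://HOST:PORT/jmxrmi (placeholder host and port)
            try (JMXConnector connector = JMXConnectorFactory.connect(new JMXServiceURL(args[1]))) {
                find(connector.getMBeanServerConnection(), keyword).forEach(System.out::println);
            }
        } else {
            find(ManagementFactory.getPlatformMBeanServer(), keyword).forEach(System.out::println);
        }
    }
}
```

An empty result for "locator" would confirm that no such metric is registered, in which case the count would have to come from the locator data in Cassandra instead.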