Exception during rebalance

Xiaoming Zhang

Sep 3, 2014, 1:46:20 AM
to druid-de...@googlegroups.com
We have a cluster of four real-time nodes (1180, 1181, 1182, and 1183) and two Kafka clusters. Each Kafka cluster sends the topics Trkng.druid-sessionEvent and Trkng.druid-sojEvent to our RT nodes.

This morning, we found this:

Group           Topic                          Pid Offset          logSize         Lag             Owner
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
druid-group     Trkng.druid-sessionEvent       0   46633398        50023348        3389950         none
druid-group     Trkng.druid-sessionEvent       1   46632780        50017228        3384448         none
druid-group     Trkng.druid-sessionEvent       2   46635465        50025836        3390371         none
druid-group     Trkng.druid-sessionEvent       3   46631205        50026472        3395267         none
druid-group     Trkng.druid-sessionEvent       4   46684305        50025745        3341440         none
druid-group     Trkng.druid-sessionEvent       5   46676594        50026989        3350395         none
druid-group     Trkng.druid-sessionEvent       6   46637137        50026441        3389304         none
druid-group     Trkng.druid-sessionEvent       7   46698363        50019958        3321595         none
druid-group     Trkng.druid-sessionEvent       8   46647640        50033630        3385990         none
druid-group     Trkng.druid-sessionEvent       9   46657940        50012439        3354499         none
druid-group     Trkng.druid-sessionEvent       10  46631774        50025077        3393303         none
druid-group     Trkng.druid-sessionEvent       11  46636472        50023383        3386911         none
druid-group     Trkng.druid-sessionEvent       12  46646085        50025555        3379470         none
druid-group     Trkng.druid-sessionEvent       13  46685613        50023757        3338144         none
druid-group     Trkng.druid-sessionEvent       14  46649747        50027325        3377578         none
druid-group     Trkng.druid-sessionEvent       15  46655034        50031805        3376771         none
druid-group     Trkng.druid-sessionEvent       16  46635949        50014838        3378889         none
druid-group     Trkng.druid-sessionEvent       17  46648086        50027776        3379690         none
druid-group     Trkng.druid-sessionEvent       18  46651771        50027803        3376032         none
druid-group     Trkng.druid-sessionEvent       19  46641373        50021013        3379640         none
druid-group     Trkng.druid-sessionEvent       20  46675506        50021094        3345588         none
druid-group     Trkng.druid-sessionEvent       21  46622957        50019899        3396942         none
druid-group     Trkng.druid-sessionEvent       22  46645628        50022884        3377256         none
druid-group     Trkng.druid-sessionEvent       23  46683581        50026612        3343031         none
druid-group     Trkng.druid-sessionEvent       24  46687924        50031538        3343614         none
druid-group     Trkng.druid-sessionEvent       25  46642392        50020689        3378297         none
druid-group     Trkng.druid-sessionEvent       26  46678776        50020449        3341673         none
druid-group     Trkng.druid-sessionEvent       27  46678240        50030068        3351828         none
druid-group     Trkng.druid-sessionEvent       28  46637131        50025850        3388719         none
druid-group     Trkng.druid-sessionEvent       29  46630372        50026858        3396486         none
druid-group     Trkng.druid-sessionEvent       30  46679413        50025061        3345648         none
druid-group     Trkng.druid-sessionEvent       31  46663765        50036960        3373195         none
druid-group     Trkng.druid-sessionEvent       32  46658429        50038392        3379963         none
druid-group     Trkng.druid-sessionEvent       33  46654579        50032438        3377859         none
druid-group     Trkng.druid-sessionEvent       34  46679864        50024647        3344783         none
druid-group     Trkng.druid-sessionEvent       35  46650554        50026681        3376127         none
druid-group     Trkng.druid-sojEvent           0   206163160       206189990       26830           druid-group_phxdbx1180.phx.ebay.com-1409647062845-479fc732-0
druid-group     Trkng.druid-sojEvent           1   206126549       206153594       27045           druid-group_phxdbx1180.phx.ebay.com-1409647062845-479fc732-0
druid-group     Trkng.druid-sojEvent           2   206123424       206150064       26640           druid-group_phxdbx1180.phx.ebay.com-1409647062845-479fc732-0
druid-group     Trkng.druid-sojEvent           3   206158759       206185405       26646           druid-group_phxdbx1180.phx.ebay.com-1409647062845-479fc732-0
druid-group     Trkng.druid-sojEvent           4   206141096       206168015       26919           druid-group_phxdbx1180.phx.ebay.com-1409647062845-479fc732-0
druid-group     Trkng.druid-sojEvent           5   206170143       206197244       27101           druid-group_phxdbx1180.phx.ebay.com-1409647062845-479fc732-0
druid-group     Trkng.druid-sojEvent           6   206124669       206151470       26801           druid-group_phxdbx1180.phx.ebay.com-1409647062845-479fc732-0
druid-group     Trkng.druid-sojEvent           7   206151135       206177973       26838           druid-group_phxdbx1180.phx.ebay.com-1409647062845-479fc732-0
druid-group     Trkng.druid-sojEvent           8   206152857       206179664       26807           druid-group_phxdbx1180.phx.ebay.com-1409647062845-479fc732-0
druid-group     Trkng.druid-sojEvent           9   206171542       206198506       26964           druid-group_phxdbx1180.phx.ebay.com-1409647062845-479fc732-0
druid-group     Trkng.druid-sojEvent           10  206145569       206172531       26962           druid-group_phxdbx1180.phx.ebay.com-1409647062845-479fc732-0
druid-group     Trkng.druid-sojEvent           11  206152698       206179809       27111           druid-group_phxdbx1180.phx.ebay.com-1409647062845-479fc732-0
druid-group     Trkng.druid-sojEvent           12  206145874       206172468       26594           druid-group_phxdbx1180.phx.ebay.com-1409647062845-479fc732-0
druid-group     Trkng.druid-sojEvent           13  206141492       206168464       26972           druid-group_phxdbx1180.phx.ebay.com-1409647062845-479fc732-0
druid-group     Trkng.druid-sojEvent           14  206172780       206199552       26772           druid-group_phxdbx1180.phx.ebay.com-1409647062845-479fc732-0
druid-group     Trkng.druid-sojEvent           15  206159638       206186557       26919           druid-group_phxdbx1180.phx.ebay.com-1409647062845-479fc732-0
druid-group     Trkng.druid-sojEvent           16  206164997       206191837       26840           druid-group_phxdbx1180.phx.ebay.com-1409647062845-479fc732-0
druid-group     Trkng.druid-sojEvent           17  206151750       206178602       26852           druid-group_phxdbx1180.phx.ebay.com-1409647062845-479fc732-0
druid-group     Trkng.druid-sojEvent           18  206162487       206179744       17257           druid-group_phxdbx1181.phx.ebay.com-1409647096488-fbfd72a4-0
druid-group     Trkng.druid-sojEvent           19  206161661       206178930       17269           druid-group_phxdbx1181.phx.ebay.com-1409647096488-fbfd72a4-0
druid-group     Trkng.druid-sojEvent           20  206158101       206175175       17074           druid-group_phxdbx1181.phx.ebay.com-1409647096488-fbfd72a4-0
druid-group     Trkng.druid-sojEvent           21  206166774       206183889       17115           druid-group_phxdbx1181.phx.ebay.com-1409647096488-fbfd72a4-0
druid-group     Trkng.druid-sojEvent           22  206175803       206192989       17186           druid-group_phxdbx1181.phx.ebay.com-1409647096488-fbfd72a4-0
druid-group     Trkng.druid-sojEvent           23  206128422       206145471       17049           druid-group_phxdbx1181.phx.ebay.com-1409647096488-fbfd72a4-0
druid-group     Trkng.druid-sojEvent           24  206161792       206178944       17152           druid-group_phxdbx1181.phx.ebay.com-1409647096488-fbfd72a4-0
druid-group     Trkng.druid-sojEvent           25  206175151       206192048       16897           druid-group_phxdbx1181.phx.ebay.com-1409647096488-fbfd72a4-0
druid-group     Trkng.druid-sojEvent           26  206148039       206165102       17063           druid-group_phxdbx1181.phx.ebay.com-1409647096488-fbfd72a4-0
druid-group     Trkng.druid-sojEvent           27  206164796       206181775       16979           druid-group_phxdbx1181.phx.ebay.com-1409647096488-fbfd72a4-0
druid-group     Trkng.druid-sojEvent           28  206174377       206191696       17319           druid-group_phxdbx1181.phx.ebay.com-1409647096488-fbfd72a4-0
druid-group     Trkng.druid-sojEvent           29  206167578       206185043       17465           druid-group_phxdbx1181.phx.ebay.com-1409647096488-fbfd72a4-0
druid-group     Trkng.druid-sojEvent           30  206173223       206190290       17067           druid-group_phxdbx1181.phx.ebay.com-1409647096488-fbfd72a4-0
druid-group     Trkng.druid-sojEvent           31  206151490       206168419       16929           druid-group_phxdbx1181.phx.ebay.com-1409647096488-fbfd72a4-0
druid-group     Trkng.druid-sojEvent           32  206134050       206151349       17299           druid-group_phxdbx1181.phx.ebay.com-1409647096488-fbfd72a4-0
druid-group     Trkng.druid-sojEvent           33  206156147       206173577       17430           druid-group_phxdbx1181.phx.ebay.com-1409647096488-fbfd72a4-0
druid-group     Trkng.druid-sojEvent           34  206169767       206186736       16969           druid-group_phxdbx1181.phx.ebay.com-1409647096488-fbfd72a4-0
druid-group     Trkng.druid-sojEvent           35  206165856       206183368       17512           druid-group_phxdbx1181.phx.ebay.com-1409647096488-fbfd72a4-0
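
In the output above, the Lag column is logSize minus Offset, and an Owner of "none" means no consumer currently holds that partition. Here is a minimal sketch for summarizing such output per topic. It assumes whitespace-separated columns exactly as pasted above; the name summarize_offsets and the SAMPLE text are made up for illustration:

```python
from collections import defaultdict

# A few rows copied from the output above, including a stray SLF4J line.
SAMPLE = """\
Group           Topic                          Pid Offset          logSize         Lag             Owner
SLF4J: Defaulting to no-operation (NOP) logger implementation
druid-group     Trkng.druid-sessionEvent       0   46633398        50023348        3389950         none
druid-group     Trkng.druid-sessionEvent       1   46632780        50017228        3384448         none
druid-group     Trkng.druid-sojEvent           0   206163160       206189990       26830           druid-group_phxdbx1180.phx.ebay.com-1409647062845-479fc732-0
"""


def summarize_offsets(text):
    """Return {topic: {"partitions", "total_lag", "unowned"}} for offset-checker style text."""
    summary = defaultdict(lambda: {"partitions": 0, "total_lag": 0, "unowned": []})
    for line in text.splitlines():
        fields = line.split()
        # Data rows have 7 columns with numeric Pid, Offset, logSize and Lag;
        # this skips the header row and interleaved logger noise.
        if len(fields) != 7 or not all(f.isdigit() for f in fields[2:6]):
            continue
        _group, topic, pid, offset, log_size, _lag, owner = fields
        entry = summary[topic]
        entry["partitions"] += 1
        # Recompute lag from the two offsets instead of trusting the printed column.
        entry["total_lag"] += int(log_size) - int(offset)
        if owner == "none":
            entry["unowned"].append(int(pid))
    return dict(summary)


if __name__ == "__main__":
    for topic, info in summarize_offsets(SAMPLE).items():
        print(topic, info)
```

Run against the full table, this would make it easy to see that every Trkng.druid-sessionEvent partition has no owner while Trkng.druid-sojEvent is split between 1180 and 1181.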


We checked the logs of 1180 and 1181 and found many occurrences of this exception:
druid.log.9:org.I0Itec.zkclient.exception.ZkNoNodeException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /consumers/druid-group/ids/druid-group_phxdbx1182.phx.ebay.com-1409561136158-16ea6930
druid.log.9:    at org.I0Itec.zkclient.exception.ZkException.create(ZkException.java:47)
druid.log.9:2014-09-01 21:38:56,094 INFO  [druid-group_phxdbx1181.phx.ebay.com-1409561131136-890e11ab_watcher_executor] kafka.consumer.ZookeeperConsumerConnector - [druid-group_phxdbx1181.phx.ebay.com-1409561131136-890e11ab], exception during rebalance

In the logs of 1182 and 1183, the error looks like this:
2014-09-02 10:06:55,892 INFO  [chief-soj_real] com.metamx.emitter.core.LoggingEmitter - Event [{"feed":"alerts","timestamp":"2014-09-02T10:06:55.892Z","service":"realtime","host":"phxdbx1182.phx.ebay.com:8083","severity":"component-failure","description":"RuntimeException aborted realtime processing[soj_real]","data":{"class":"io.druid.segment.realtime.RealtimeManager","exceptionType":"com.metamx.common.ISE","exceptionMessage":"hydrant[FireHydrant{index=null, queryable=io.druid.segment.QueryableIndexSegment@5140f075, count=63}] not the right count[62]","exceptionStackTrace":"com.metamx.common.ISE: hydrant[FireHydrant{index=null, queryable=io.druid.segment.QueryableIndexSegment@5140f075, count=63}] not the right count[62]\n\tat io.druid.segment.realtime.plumber.Sink.<init>(Sink.java:92)\n\tat io.druid.segment.realtime.plumber.RealtimePlumber.bootstrapSinksFromDisk(RealtimePlumber.java:538)\n\tat io.druid.segment.realtime.plumber.RealtimePlumber.startJob(RealtimePlumber.java:152)\n\tat io.druid.segment.realtime.RealtimeManager$FireChief.run(RealtimeManager.java:184)\n"}}]
2014-09-02 10:07:03,736 ERROR [chief-session_real] io.druid.segment.realtime.RealtimeManager - RuntimeException aborted realtime processing[session_real]: {class=io.druid.segment.realtime.RealtimeManager, exceptionType=class com.metamx.common.ISE, exceptionMessage=hydrant[FireHydrant{index=null, queryable=io.druid.segment.QueryableIndexSegment@42b31b33, count=26}] not the right count[25]}
2014-09-02 10:07:03,737 INFO  [chief-session_real] com.metamx.emitter.core.LoggingEmitter - Event [{"feed":"alerts","timestamp":"2014-09-02T10:07:03.737Z","service":"realtime","host":"phxdbx1182.phx.ebay.com:8083","severity":"component-failure","description":"RuntimeException aborted realtime processing[session_real]","data":{"class":"io.druid.segment.realtime.RealtimeManager","exceptionType":"com.metamx.common.ISE","exceptionMessage":"hydrant[FireHydrant{index=null, queryable=io.druid.segment.QueryableIndexSegment@42b31b33, count=26}] not the right count[25]","exceptionStackTrace":"com.metamx.common.ISE: hydrant[FireHydrant{index=null, queryable=io.druid.segment.QueryableIndexSegment@42b31b33, count=26}] not the right count[25]\n\tat io.druid.segment.realtime.plumber.Sink.<init>(Sink.java:92)\n\tat io.druid.segment.realtime.plumber.RealtimePlumber.bootstrapSinksFromDisk(RealtimePlumber.java:538)\n\tat io.druid.segment.realtime.plumber.RealtimePlumber.startJob(RealtimePlumber.java:152)\n\tat io.druid.segment.realtime.RealtimeManager$FireChief.run(RealtimeManager.java:184)\n"}}]
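
Reading the message literally, the hydrant found at position 62 carries count 63 (and, for session_real, position 25 carries count 26). That suggests the hydrants restored from disk are not numbered consecutively from 0. The sketch below only illustrates that kind of check. It is an assumption based on the wording of the error, not Druid's actual Sink code, and first_hydrant_mismatch is a hypothetical name:

```python
def first_hydrant_mismatch(counts):
    """Return (position, count) for the first hydrant whose count differs
    from its position in the list, or None if counts are 0, 1, 2, ... in order."""
    for position, count in enumerate(counts):
        if count != position:
            return position, count
    return None


if __name__ == "__main__":
    # Hypothetical restore where hydrant 62 is missing, so hydrant 63 lands at position 62.
    restored = list(range(62)) + [63]
    print(first_hydrant_mismatch(restored))
```

If this reading is right, a missing or out-of-order hydrant in the persist directory would reproduce the same pair of numbers on every startup.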

Could you help me find the cause and solve this problem?

Thank you very much!

Gian Merlino

Sep 3, 2014, 7:37:25 PM
to druid-de...@googlegroups.com
The more interesting error looks like the one related to bootstrapSinksFromDisk. The Kafka error looks like something that happens from time to time but is harmless if it does not persist.

What version of Druid are you using? Does this problem occur every time you start up?

Xiaoming Zhang

Sep 3, 2014, 10:31:36 PM
to druid-de...@googlegroups.com
The version we use is 0.6.121, and this error occurs every time we start up.

A full GC caused the rebalance, and as time goes by, the amount of memory a full GC can reclaim goes down.

Once a full GC can no longer reclaim memory, the machine disappears from ZooKeeper.

On Thursday, September 4, 2014 at 7:37:25 AM UTC+8, Gian Merlino wrote:

Fangjin Yang

Sep 4, 2014, 1:10:13 PM
to druid-de...@googlegroups.com
I believe the bootstrap error is the same problem as in https://groups.google.com/forum/#!topic/druid-development/NzuHcYgQCvg. I will continue the discussion of the possible cause in that thread. I suspect that the problem may already have been fixed in the latest stable, but I need to understand things a bit more to make sure.