Historical node alert and exception when dropping the segment

Jae Hyeon Bae

02/02/2013, 10:42:44
to druid-de...@googlegroups.com
Intermittently, I am getting the following alert. It looks fairly harmless, because the compute node threw an exception and then ZkCoordinator immediately finished processing the segment and deleted it successfully. But the alert arrives by email, and the 'check the alert' message makes me nervous every time I see it. This is happening on several of our dozen or so compute nodes.

Please read the following log snippet.

INFO  2013-02-02 12:51:44,637 [PhoneBook--0] com.metamx.druid.coordination.ZkCoordinator: New node[streaming_client_log_2013-01-30T08:00:00.000Z_2013-01-30T09:00:00.000Z_2013-01-30T08:00:00.000Z_16] with segmentClass[class com.metamx.druid.coordination.SegmentChangeRequestDrop]
INFO  2013-02-02 12:51:44,637 [PhoneBook--0] com.metamx.druid.coordination.ServerManager: Told to delete a queryable on dataSource[streaming_client_log] for interval[2013-01-30T08:00:00.000Z/2013-01-30T09:00:00.000Z] and version [2013-01-30T08:00:00.000Z] that I don't have.
INFO  2013-02-02 12:51:44,638 [PhoneBook--0] com.metamx.druid.loading.S3ZippedSegmentGetter: Deleting directory[/mnt/data/druid/indexCache/netflix.s3.genpop.prod/druid/realtime/streaming_client_log/2013-01-30T08:00:00.000Z_2013-01-30T09:00:00.000Z/2013-01-30T08:00:00.000Z/16]
WARN  2013-02-02 12:51:44,638 [PhoneBook--0] com.metamx.druid.coordination.ZkCoordinator: Unable to delete segmentInfoCacheFile[/mnt/data/druid/segmentInfoCache/streaming_client_log_2013-01-30T08:00:00.000Z_2013-01-30T09:00:00.000Z_2013-01-30T08:00:00.000Z_16]
ERROR 2013-02-02 12:51:44,638 [PhoneBook--0] com.metamx.druid.coordination.ZkCoordinator: Exception thrown when dropping segment[DataSegment{size=537090652, shardSpec=com.netflix.nfdruid.shard.LinearShardSpec@71217006, metrics=[resCount], dimensions=[areacode, city, client_version, company, coordinate, country, device_model, devtype_id, error_code, error_subcode, event_type, firmware_version, nccp_version, network, network_spec, network_type, nrdp_version, region, severity, ui_version, video_id, zipcode], version='2013-01-30T08:00:00.000Z', loadSpec={type=s3_zip, bucket=netflix.s3.genpop.prod, key=druid/realtime/streaming_client_log/2013-01-30T08:00:00.000Z_2013-01-30T09:00:00.000Z/2013-01-30T08:00:00.000Z/16/index.zip}, interval=2013-01-30T08:00:00.000Z/2013-01-30T09:00:00.000Z, dataSource='streaming_client_log'}]
com.metamx.common.IAE: Cannot unannounce node[streaming_client_log_2013-01-30T08:00:00.000Z_2013-01-30T09:00:00.000Z_2013-01-30T08:00:00.000Z_16] on service[/druid/compressed/servedSegments/ec2-107-22-63-83.compute-1.amazonaws.com:7102]        
at com.metamx.phonebook.StoppedPhoneBook.unannounce(StoppedPhoneBook.java:115)        
at com.metamx.phonebook.BasePhoneBook.unannounce(BasePhoneBook.java:113)        
at com.metamx.druid.coordination.ZkCoordinator.removeSegment(ZkCoordinator.java:279)        
at com.metamx.druid.coordination.SegmentChangeRequestDrop.go(SegmentChangeRequestDrop.java:52)        
at com.metamx.druid.coordination.ZkCoordinator$1.newEntry(ZkCoordinator.java:143)        
at com.metamx.druid.coordination.ZkCoordinator$1.newEntry(ZkCoordinator.java:130)        
at com.netflix.nfdruid.NFZKPhoneBook$InternalYellowPages$UpdatingRunnable.run(NFZKPhoneBook.java:437)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)          
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)        
at java.lang.Thread.run(Thread.java:662)
INFO  2013-02-02 12:51:44,638 [PhoneBook--0] com.metamx.emitter.core.LoggingEmitter: Event [{"feed":"alerts","timestamp":"2013-02-02T12:51:44.638Z","service":"druid/compute","host":"ec2-107-22-63-83.compute-1.amazonaws.com:7102","severity":"component-failure","description":"Failed to remove segment","data":{"segment":"DataSegment{size=537090652, shardSpec=com.netflix.nfdruid.shard.LinearShardSpec@71217006, metrics=[resCount], dimensions=[areacode, city, client_version, company, coordinate, country, device_model, devtype_id, error_code, error_subcode, event_type, firmware_version, nccp_version, network, network_spec, network_type, nrdp_version, region, severity, ui_version, video_id, zipcode], version='2013-01-30T08:00:00.000Z', loadSpec={type=s3_zip, bucket=netflix.s3.genpop.prod, key=druid/realtime/streaming_client_log/2013-01-30T08:00:00.000Z_2013-01-30T09:00:00.000Z/2013-01-30T08:00:00.000Z/16/index.zip}, interval=2013-01-30T08:00:00.000Z/2013-01-30T09:00:00.000Z, dataSource='streaming_client_log'}","exception":"com.metamx.common.IAE: Cannot unannounce node[streaming_client_log_2013-01-30T08:00:00.000Z_2013-01-30T09:00:00.000Z_2013-01-30T08:00:00.000Z_16] on service[/druid/compressed/servedSegments/ec2-107-22-63-83.compute-1.amazonaws.com:7102]"}}]
INFO  2013-02-02 12:51:44,641 [PhoneBook--0] com.metamx.druid.coordination.ZkCoordinator: Completed processing for node[streaming_client_log_2013-01-30T08:00:00.000Z_2013-01-30T09:00:00.000Z_2013-01-30T08:00:00.000Z_16]
INFO  2013-02-02 12:51:44,642 [PhoneBook--0] com.metamx.druid.coordination.ZkCoordinator: streaming_client_log_2013-01-30T08:00:00.000Z_2013-01-30T09:00:00.000Z_2013-01-30T08:00:00.000Z_16 was removed

Thank you
Best, Jae

Eric Tschetter

03/02/2013, 12:00:54
to druid-de...@googlegroups.com
Jae,

It looks like the node might be getting told to drop that segment more than once.  Can you look through the logs and see if you find multiple lines that look like

New node[streaming_client_log_2013-01-30T08:00:00.000Z_2013-01-30T09:00:00.000Z_2013-01-30T08:00:00.000Z_16] with segmentClass[class com.metamx.druid.coordination.SegmentChangeRequestDrop]

The key point there is the segment id "streaming_client_log_2013-01-30T08:00:00.000Z_2013-01-30T09:00:00.000Z_2013-01-30T08:00:00.000Z_16". If you do find multiple lines for it and the node is just being asked to drop the segment more than once, then you shouldn't have to worry about the exception. If you are not seeing multiple lines indicating that it was dropping the same segment, then there might be an issue. Either way, the exception out of the phone book is a little overzealous; it doesn't actually have to be an exception, so I'll adjust that.
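
For anyone who wants to run that check over a large log, here is a minimal sketch in Python. It assumes the log lines follow the same `New node[...] with segmentClass[...]` format as the line quoted above; the function names are illustrative and not part of Druid.

```python
import re
from collections import Counter

# Pattern for the ZkCoordinator "New node[...]" line for drop requests.
# Assumption: the format matches the single sample line quoted in this thread.
DROP_LINE = re.compile(
    r"New node\[(?P<segment_id>[^\]]+)\] with segmentClass\["
    r"class com\.metamx\.druid\.coordination\.SegmentChangeRequestDrop\]"
)


def count_drop_requests(lines):
    """Count how many drop requests were logged for each segment id."""
    counts = Counter()
    for line in lines:
        match = DROP_LINE.search(line)
        if match:
            counts[match.group("segment_id")] += 1
    return counts


def repeated_drops(lines):
    """Return only the segment ids that were asked to drop more than once."""
    return {seg: n for seg, n in count_drop_requests(lines).items() if n > 1}
```

Passing an open log file (any iterable of lines) to `repeated_drops` returns a mapping from segment id to request count. An empty result would mean no segment was asked to drop twice in that file, which is the case where the exception deserves a closer look.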

--Eric



Jae Hyeon Bae

03/02/2013, 13:13:33
to druid-de...@googlegroups.com
I checked the logs, but I could not find multiple SegmentChangeRequestDrop log lines with the same segment ID.

Thank you
Best, Jae