Segment handoff failed with "too large for storage" exception


GunWoo Kim

Jul 28, 2016, 11:44:56 PM
to Druid Development
Hi guys, I got an error during indexing task segment handoff.

I checked the historical node log and found a 'Failed to load segment' error message.

The log messages are as follows:

2016-07-27 06:35:39,398  INFO [io.druid.server.coordination.ZkCoordinator] New request[LOAD: stb_2016-07-27T05:00:00.000Z_2016-07-27T06:00:00.000Z_2016-07-27T05:03:01.960Z] with zNode[/druid/loadQueue/ndap03.ndap.com:8083/stb_2016-07-27T05:00:00.000Z_2016-07-27T06:00:00.000Z_2016-07-27T05:03:01.960Z].
2016-07-27 06:35:39,398  INFO [io.druid.server.coordination.ZkCoordinator] Loading segment stb_2016-07-27T05:00:00.000Z_2016-07-27T06:00:00.000Z_2016-07-27T05:03:01.960Z
2016-07-27 06:35:39,398  WARN [io.druid.server.coordination.BatchDataSegmentAnnouncer] No path to unannounce segment[stb_2016-07-27T05:00:00.000Z_2016-07-27T06:00:00.000Z_2016-07-27T05:03:01.960Z]
2016-07-27 06:35:39,398  INFO [io.druid.server.coordination.ZkCoordinator] Completely removing [stb_2016-07-27T05:00:00.000Z_2016-07-27T06:00:00.000Z_2016-07-27T05:03:01.960Z] in [30,000] millis
2016-07-27 06:35:39,399  INFO [io.druid.server.coordination.ZkCoordinator] Completed request [LOAD: stb_2016-07-27T05:00:00.000Z_2016-07-27T06:00:00.000Z_2016-07-27T05:03:01.960Z]
2016-07-27 06:35:39,399 ERROR [io.druid.server.coordination.ZkCoordinator] Failed to load segment for dataSource: {class=io.druid.server.coordination.ZkCoordinator, exceptionType=class io.druid.segment.loading.SegmentLoadingException, exceptionMessage=Exception loading segment[stb_2016-07-27T05:00:00.000Z_2016-07-27T06:00:00.000Z_2016-07-27T05:03:01.960Z], segment=DataSegment{size=4033, shardSpec=LinearShardSpec{partitionNum=0}, metrics=[count], dimensions=[action, cause, dvc_type, scn, topology], version='2016-07-27T05:03:01.960Z', loadSpec={type=hdfs, path=hdfs://ndap09.ndap.com:8020/user/root/druid/segments/stb/20160727T050000.000Z_20160727T060000.000Z/2016-07-27T05_03_01.960Z/0/index.zip}, interval=2016-07-27T05:00:00.000Z/2016-07-27T06:00:00.000Z, dataSource='stb', binaryVersion='9'}}
io.druid.segment.loading.SegmentLoadingException: Exception loading segment[stb_2016-07-27T05:00:00.000Z_2016-07-27T06:00:00.000Z_2016-07-27T05:03:01.960Z]
Caused by: com.metamx.common.ISE: Segment[stb_2016-07-27T05:00:00.000Z_2016-07-27T06:00:00.000Z_2016-07-27T05:03:01.960Z:4,033] too large for storage[var/druid/segment-cache:-555,499,020].


It is really strange that the segment-cache size shown in the exception cause message is a negative number:
too large for storage[var/druid/segment-cache:-555,499,020]


The historical node can't load the segment, so the indexing task can't complete.
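
If I read the numbers right, the negative value looks like maxSize minus whatever is already sitting in the cache directory: 15,000,000,000 - 15,555,499,020 = -555,499,020, i.e. the segment-cache already holds more bytes than maxSize allows. A rough sketch of the arithmetic I assume is happening (hypothetical names, not Druid's actual code):

class SegmentCacheCheckSketch {
    public static void main(String[] args) {
        long maxSize = 15_000_000_000L;            // maxSize from druid.segmentCache.locations
        long bytesAlreadyCached = 15_555_499_020L; // assumed: bytes already in var/druid/segment-cache
        long available = maxSize - bytesAlreadyCached;
        long segmentSize = 4_033L;                 // size of the segment being loaded

        System.out.printf("available = %,d%n", available); // prints -555,499,020
        if (segmentSize > available) {
            // would match the message: too large for storage[var/druid/segment-cache:-555,499,020]
            throw new IllegalStateException(String.format(
                "Segment[%,d] too large for storage[var/druid/segment-cache:%,d]",
                segmentSize, available));
        }
    }
}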


My test environment is as follows:
- CentOS 6.5
- Druid 0.9.0
- historical node config:
# HTTP server threads
druid.server.http.numThreads=50
druid.server.maxSize=300000000000

# Processing threads and buffers
druid.processing.buffer.sizeBytes=1073741824
druid.processing.numThreads=12

# Segment storage
druid.segmentCache.locations=[{"path":"var/druid/segment-cache","maxSize"\:15000000000}]


Any reply is appreciated. =)

Thank you.


Fangjin Yang

Jul 29, 2016, 6:38:31 PM
to Druid Development
Hi GunWoo, how much disk space does your machine have? The maxSize should correlate to the actual available disk space. The default values are just some examples values.

GunWoo Kim

Aug 1, 2016, 6:54:46 AM
to Druid Development
Hi Fangjin Yang, the disk space for Druid was set to about 100 GB, but in the Druid documentation I saw this description of the druid.server.maxSize property:
The maximum number of bytes-worth of segments that the node wants assigned to it. This is not a limit that Historical nodes actually enforces, just a value published to the Coordinator node so it can plan accordingly.

and its default value is 0 (from the Druid documentation), so I did not pay attention to it.

The segment storage property was set as below:
druid.segmentCache.locations=[{"path":"var/druid/segment-cache","maxSize"\:15000000000}]

and 66 GB of disk is available now for the historical node.



Thank you for your reply.

Fangjin Yang

Aug 15, 2016, 6:01:39 PM
to Druid Development
druid.server.maxSize must be set to a non-zero value on historicals, otherwise no segments will get downloaded. The documentation on that config should be improved. Druid definitely enforces this limit, but it is the coordinator that enforces the maxSize limit, and the coordinator is what tells historicals to download segments.
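
As a rough illustration (values are made up, adjust them to your actual free disk), the two settings should agree with each other and with the real disk, e.g.:

druid.server.maxSize=50000000000
druid.segmentCache.locations=[{"path":"var/druid/segment-cache","maxSize":50000000000}]

Here druid.server.maxSize matches the total segmentCache maxSize, and both stay below the ~66 GB of actually available disk, so the coordinator never assigns more than the historical can store.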

Saravana Soundararajan

Apr 4, 2018, 2:35:34 PM
to Druid Development
After seeing a similar error in the Druid historical:
Caused by: com.metamx.common.ISE: Segment[timeseries_dogstatsd_counter_2018-04-04T16:00:00.000Z_2018-04-04T17:00:00.000Z_2018-04-04T16:00:00.000Z_1528210995:152,770,889] too large for storage[/var/tmp/druid/indexCache:22,010].

we noticed that the historical node stops loading new segments from realtime, and the realtime nodes start accumulating segments.

Our maxSize settings are as follows, and we had enough free disk space:

druid.server.maxSize=882159184076
druid.segmentCache.locations=[{"path":"/var/tmp/druid/indexCache","maxSize":882159184076}]

Restarting the Druid historical fixes the issue. We suspect that something is going wrong with how Druid calculates the available size, i.e., the 22,010 bytes reported above.
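
Our working theory, sketched with hypothetical names (just our mental model, not Druid's actual classes): the cache location seems to keep an in-memory used-bytes counter, and if that counter is ever incremented without a matching decrement (e.g. a failed or cleaned-up load that is not released), available space drifts toward zero even though the disk is mostly free. A restart would rebuild the counter from what is actually on disk, which would explain why restarting fixes it.

// Hypothetical sketch of the suspected bookkeeping drift (not Druid's actual code)
class CacheLocationSketch {
    private final long maxSize;
    private long bytesUsed; // in-memory counter, rebuilt from disk contents on restart

    CacheLocationSketch(long maxSize) { this.maxSize = maxSize; }

    long available() { return maxSize - bytesUsed; }

    boolean reserve(long segmentSize) {
        if (segmentSize > available()) {
            return false; // would surface as "too large for storage[...:22,010]"
        }
        bytesUsed += segmentSize;
        return true;
    }

    void release(long segmentSize) {
        // if this is ever skipped for a segment that was reserved,
        // bytesUsed only grows and available() shrinks until a restart
        bytesUsed -= segmentSize;
    }
}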
