Index hadoop task NoSuchElementException

SAURABH JAIN

Oct 3, 2016, 8:10:28 AM
to Druid User
Hello,

We are receiving the following error while indexing with Hadoop, using HDFS as deep storage. We are using the Google connector with GCS as the default filesystem. Attaching the full job log file as well as the Hadoop index spec.

Here is the exception trace:

2016-10-03T11:25:05,679 ERROR [task-runner-0-priority-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Exception while running task[HadoopIndexTask{id=index_hadoop_prism-data-15_2016-10-03T10:59:46.123Z, type=index_hadoop, dataSource=prism-data-15}]
java.util.NoSuchElementException
at java.util.ArrayList$Itr.next(ArrayList.java:854) ~[?:1.8.0_101]
at com.google.common.collect.Iterators.getOnlyElement(Iterators.java:297) ~[guava-16.0.1.jar:?]
at com.google.common.collect.Iterables.getOnlyElement(Iterables.java:285) ~[guava-16.0.1.jar:?]
at io.druid.indexing.common.task.HadoopIndexTask.run(HadoopIndexTask.java:202) ~[druid-indexing-service-0.9.2-rc1.jar:0.9.2-rc1]
at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:436) [druid-indexing-service-0.9.2-rc1.jar:0.9.2-rc1]
at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:408) [druid-indexing-service-0.9.2-rc1.jar:0.9.2-rc1]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_101]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_101]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_101]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_101]
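
For what it's worth, the trace itself narrows things down: Guava's Iterables.getOnlyElement() throws NoSuchElementException when the iterable it is given is empty (and IllegalArgumentException when it holds more than one element). So whatever collection HadoopIndexTask.java:202 iterates over, likely the task's locks or target intervals, came back empty. A minimal sketch of the Guava contract, with a hypothetical locks list standing in for Druid's internals:

import com.google.common.collect.Iterables;

import java.util.ArrayList;
import java.util.List;

public class GetOnlyElementDemo {
    public static void main(String[] args) {
        // Hypothetical stand-in for whatever HadoopIndexTask.java:202
        // iterates; only Guava's contract matters here.
        List<String> locks = new ArrayList<>();

        // An empty iterable -> java.util.NoSuchElementException, with the
        // same ArrayList/Iterators/Iterables frames seen in the trace above.
        String only = Iterables.getOnlyElement(locks);
        System.out.println(only); // never reached
    }
}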

The Hadoop index spec is as follows:

{
  "type": "index_hadoop",
  "spec": {
    "dataSchema": {
      "dataSource": "prism-data-15",
      "parser": {
        "type": "string",
        "parseSpec": {
          "format": "json",
          "dimensionsSpec": {
            "dimensions": [
              "event_id",
              "lang",
              "share_clicks",
              "ts_bucket",
              "old_hash_id",
              "ab_test",
              "event_name",
              "title",
              "noti_opened",
              "fullstory_time_total",
              "ts_back_valid",
              "custom_title",
              "targeted_city",
              "at",
              "short_view_event",
              "published_dt",
              "short_time",
              "notification_type",
              "variants",
              "device_id",
              "category",
              "toss_opened",
              "noti_shown",
              "event_source",
              "score",
              "author",
              "bookmark",
              "is_video",
              "source",
              "like_count",
              "share_view",
              "vid_length",
              "content",
              "fullstory_view",
              "ts_valid",
              "targeted_country",
              "video_event",
              "shortened_url",
              "toss_clicked",
              "hashId",
              "group_id",
              "img_url",
              "is_deleted"
            ]
          },
          "timestampSpec": {
            "format": "millis",
            "column": "at"
          }
        }
      },
      "metricsSpec": [
        { "type": "count", "name": "count" },
        { "type": "doubleSum", "name": "fullstory_total_time", "fieldName": "fullstory_time_total" },
        { "type": "longSum", "name": "total_like_count", "fieldName": "like_count" },
        { "type": "longMax", "name": "total_share_views", "fieldName": "share_views" },
        { "type": "longMax", "name": "total_vid_length", "fieldName": "vid_length" },
        { "type": "doubleSum", "name": "total_short_time", "fieldName": "short_time" },
        { "type": "hyperUnique", "name": "distinct_user", "fieldName": "device_id" },
        { "type": "hyperUnique", "name": "distinct_event", "fieldName": "event_id" },
        { "type": "hyperUnique", "name": "distinct_hash_Id", "fieldName": "hashId" },
        { "type": "longSum", "name": "total_bookmark", "fieldName": "bookmark" },
        { "type": "longSum", "name": "total_fullstory_view", "fieldName": "fullstory_view" },
        { "type": "longSum", "name": "total_noti_opened", "fieldName": "noti_opened" },
        { "type": "longSum", "name": "total_noti_shown", "fieldName": "noti_shown" },
        { "type": "longSum", "name": "total_toss_clicked", "fieldName": "toss_clicked" },
        { "type": "longSum", "name": "total_toss_opened", "fieldName": "toss_opened" },
        { "type": "longSum", "name": "total_share_click", "fieldName": "share_clicks" },
        { "type": "longSum", "name": "total_short_views", "fieldName": "short_view_event" },
        { "type": "longSum", "name": "total_video_views", "fieldName": "video_event" },
        { "type": "longSum", "name": "total_ts_valid", "fieldName": "ts_valid" },
        { "type": "longSum", "name": "total_full_ts_valid", "fieldName": "ts_back_valid" },
        { "type": "longMax", "name": "is_ab", "fieldName": "ab_test" },
        { "type": "longMax", "name": "ab_variants", "fieldName": "variants" }
      ],
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "DAY",
        "queryGranularity": { "type": "none" },
        "intervals": [
          "2016-01-01T00:00:00.000Z/2017-12-30T00:00:00.000Z"
        ]
      }
    },
    "ioConfig": {
      "type": "hadoop",
      "inputSpec": {
        "type": "static",
        "paths": "gs://nis-prism/new/2016/08/02/part-*"
      }
    },
    "tuningConfig": {
      "type": "hadoop",
      "partitionsSpec": {
        "type": "hashed",
        "targetPartitionSize": 2500000
      },
      "numBackgroundPersistThreads": 1,
      "overwriteFiles": true
    }
  },
  "hadoopDependencyCoordinates": ["org.apache.hadoop:hadoop-client:2.7.2"]
}

Thanks,
Saurabh
Attachment: log-6.txt

Giri Tata

Oct 11, 2016, 1:14:41 PM
to Druid User
Do you have a metric fieldName called share_views and a dimension called share_view? Or is it some kind of typo?

"type": "longMax",
"name": "total_share_views",
"fieldName": "share_views"




Yogesh Agrawal

Apr 8, 2017, 5:24:46 PM
to Druid User
Giri,
Are you saying that this error occurs when metrics/dimensions are not available in the input dataset? I'm also having a similar issue, where my jobs fail for a few datasets but succeed for other days. I'm wondering if I should look at the datasets for the expected metrics/dimensions before suspecting this as a Druid issue.
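
Checking the input first seems reasonable. A rough sketch of such a check, assuming newline-delimited JSON input and using Jackson (FieldCheck is a hypothetical helper; the required list here is abbreviated):

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

public class FieldCheck {
    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();
        // Abbreviated: list every dimension and metric fieldName the spec expects.
        List<String> required = Arrays.asList("at", "device_id", "like_count", "share_view");

        // Scan one input file (path given as the first argument) and report
        // rows that are unparseable or missing an expected field.
        Files.lines(Paths.get(args[0])).forEach(line -> {
            try {
                JsonNode row = mapper.readTree(line);
                for (String field : required) {
                    if (!row.has(field)) {
                        System.out.println("missing '" + field + "': " + line);
                    }
                }
            } catch (Exception e) {
                System.out.println("unparseable: " + line);
            }
        });
    }
}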