Batch indexing Hadoop task failed with FileNotFoundException


Lu Xuechao
Mar 31, 2015, 11:29:18 PM
to druid...@googlegroups.com
Hi team,

Our Hadoop batch indexing task failed with:

2015-04-01 03:16:16,419 ERROR [task-runner-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Exception while running task[HadoopIndexTask{id=index_hadoop_pulsar_event_merged_2015-04-01T03:15:47.492Z, type=index_hadoop, dataSource=pulsar_event_merged}]
java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at io.druid.indexing.common.task.HadoopIndexTask.run(HadoopIndexTask.java:255)
        at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:218)
        at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:197)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: File /tmp/druid-indexing/pulsar_event_merged/2015-04-01T031547.494Z/segmentDescriptorInfo does not exist
        at com.google.common.base.Throwables.propagate(Throwables.java:160)
        at io.druid.indexer.IndexGeneratorJob.getPublishedSegments(IndexGeneratorJob.java:128)
        at io.druid.indexer.HadoopDruidIndexerJob$1.run(HadoopDruidIndexerJob.java:74)
        at io.druid.indexer.JobHelper.runJobs(JobHelper.java:137)
        at io.druid.indexer.HadoopDruidIndexerJob.run(HadoopDruidIndexerJob.java:80)
        at io.druid.indexing.common.task.HadoopIndexTask$HadoopIndexGeneratorInnerProcessing.runTask(HadoopIndexTask.java:294)
        ... 11 more
Caused by: java.io.FileNotFoundException: File /tmp/druid-indexing/pulsar_event_merged/2015-04-01T031547.494Z/segmentDescriptorInfo does not exist
        at org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:362)
        at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1484)
        at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1524)
        at org.apache.hadoop.fs.ChecksumFileSystem.listStatus(ChecksumFileSystem.java:564)
        at io.druid.indexer.IndexGeneratorJob.getPublishedSegments(IndexGeneratorJob.java:121)
        ... 15 more
2015-04-01 03:16:16,430 INFO  [task-runner-0] io.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: {
  "id" : "index_hadoop_pulsar_event_merged_2015-04-01T03:15:47.492Z",
  "status" : "FAILED",
  "duration" : 21468
}

I've created the folder /tmp/druid-indexing/ on Hadoop with "drwxrwxrwx" permissions.

I've also attached the whole task log.

Please advise, thanks.

xulu


index_hadoop_pulsar_event_merged.zip

Himanshu
Apr 1, 2015, 12:46:32 AM
to Lu Xuechao, druid...@googlegroups.com
Can you check whether there was actually any input data to process? I have seen this happen when an indexing job is started with no data to process. We tried to improve the error messaging with https://github.com/druid-io/druid/pull/1179.
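
You can sanity-check the input with something like (the path is a placeholder):

    hadoop fs -ls -R /path/to/your/input

to confirm the files are actually there and non-empty.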

-- Himanshu


Lu Xuechao
Apr 1, 2015, 2:04:09 AM
to druid...@googlegroups.com, lux...@gmail.com

Hi Himanshu,

Thanks a lot, that should be the reason. The directory /pulsar/raw_soj/20150316/ has many subdirectories (one for every 15 minutes of the day), and the .avro files are under those subdirectories. After I changed the path to /pulsar/raw_soj/20150316/00_00, the task started working.

I'd like to run a daily task that batch-ingests all the data generated in the past day. Since we have 96 directories per day, how can a single task handle all of them?
Also from the task log:

2015-04-01 05:33:12,531 INFO  [communication thread] org.apache.hadoop.mapred.LocalJobRunner - hdfs://xx:8020/pulsar/raw_soj/20150316/00_00/sojpulsaravro-m-00005.avro:0+139413666 > map
2015-04-01 05:33:12,533 INFO  [communication thread] org.apache.hadoop.mapred.LocalJobRunner - hdfs://xx:8020/pulsar/raw_soj/20150316/00_00/sojpulsaravro-m-00001.avro:0+161695514 > map
2015-04-01 05:33:17,167 INFO  [communication thread] org.apache.hadoop.mapred.LocalJobRunner - hdfs://xx:8020/pulsar/raw_soj/20150316/00_00/sojpulsaravro-m-00008.avro:0+143314486 > map
2015-04-01 05:33:17,169 INFO  [communication thread] org.apache.hadoop.mapred.LocalJobRunner - hdfs://xx:8020/pulsar/raw_soj/20150316/00_00/sojpulsaravro-m-00002.avro:0+144980510 > map
2015-04-01 05:33:17,170 INFO  [communication thread] org.apache.hadoop.mapred.LocalJobRunner - hdfs://xx:8020/pulsar/raw_soj/20150316/00_00/sojpulsaravro-m-00003.avro:0+139182446 > map
2015-04-01 05:42:07,880 INFO  [communication thread] org.apache.hadoop.mapred.LocalJobRunner - hdfs://xx:8020/pulsar/raw_soj/20150316/00_00/sojpulsaravro-m-00144.avro:0+81797524 > map

The parallelism is not high. How can I improve it?

thanks.

Himanshu
Apr 1, 2015, 2:14:37 AM
to Lu Xuechao, druid...@googlegroups.com
Are you talking about the Hadoop job launched by Druid? Druid does not support Avro input out of the box, so what setup are you using?

Or is this one of your processing pipeline jobs that runs before you launch the Druid indexer?

In any case, regarding parallelism in the latter case (map-reduce jobs): the number of maps depends on the Hadoop InputFormat you're using (see the relevant docs for what to tweak), and the number of reducers can be adjusted by setting "mapreduce.job.reduces" in your Hadoop job config.
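
For a Druid Hadoop indexing task, such properties go into the tuningConfig's "jobProperties"; a minimal sketch (the value 10 is only an example):

    "tuningConfig": {
        "type": "hadoop",
        "jobProperties": {
            "mapreduce.job.reduces": "10"
        }
    }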

-- Himanshu

Lu Xuechao
Apr 1, 2015, 2:22:01 AM
to druid...@googlegroups.com, lux...@gmail.com
This is the Druid Hadoop job. I applied the patch https://github.com/druid-io/druid/pull/1177/files#r26157693 and provided a custom AvroInputFormat (extending FileInputFormat) to read the .avro files.

From my task config:

        "ioConfig": {
            "type": "hadoop",
            "inputSpec": {
                "type": "static",
                "paths": "hdfs://xx:8020/pulsar/raw_soj/20150316/00_00",
                "inputFormat": "com.ebay.tracking.druid.avro.AvroInputFormat"
            }
        }

If the Druid task cannot iterate over all subdirectories to look for .avro files, I'll have to submit 96 jobs for one day's data. Correct?

thanks.
xulu

Himanshu
Apr 1, 2015, 2:43:12 AM
to Lu Xuechao, druid...@googlegroups.com
Not really, you shouldn't have to fire 96 indexer jobs. Here are your options (rough sketches below):

1) Your implementation of "com.ebay.tracking.druid.avro.AvroInputFormat" can recursively follow subdirectories and read the data. From what I remember, FileInputFormat was already capable of doing so; check your implementation.

2) In the static "pathSpec", give a comma-separated list of all 96 directories.
3) See if the "granularity" path spec is a better option for you.
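
For option 2, the static inputSpec would look something like this (paths abbreviated; you'd list all 96):

    "inputSpec": {
        "type": "static",
        "paths": "hdfs://xx:8020/pulsar/raw_soj/20150316/00_00,hdfs://xx:8020/pulsar/raw_soj/20150316/00_15,...",
        "inputFormat": "com.ebay.tracking.druid.avro.AvroInputFormat"
    }

For option 3, a rough sketch of a "granularity" path spec. Note this assumes the input directories follow the y=/m=/d=/H= layout that the granularity spec expects, so it may not fit your 00_00-style layout without renaming:

    "inputSpec": {
        "type": "granularity",
        "dataGranularity": "hour",
        "inputPath": "hdfs://xx:8020/pulsar/raw_soj",
        "filePattern": ".*\\.avro"
    }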

Himanshu
Apr 1, 2015, 2:45:15 AM
to Lu Xuechao, druid...@googlegroups.com
It seems FileInputFormat was buggy for some Hadoop versions: https://issues.apache.org/jira/browse/MAPREDUCE-3193

Make sure you're using a version that can recurse arbitrarily and read all the data in the subdirectories.
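
If you're stuck on an affected version, one workaround is to do the recursion in your custom InputFormat yourself. A minimal sketch (the class name and the .avro filter are illustrative, not your actual implementation):

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.mapreduce.JobContext;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

    // Expands the input directories recursively instead of relying on
    // mapreduce.input.fileinputformat.input.dir.recursive.
    public abstract class RecursiveAvroInputFormat<K, V> extends FileInputFormat<K, V> {
        @Override
        protected List<FileStatus> listStatus(JobContext job) throws IOException {
            List<FileStatus> result = new ArrayList<FileStatus>();
            for (FileStatus status : super.listStatus(job)) {
                FileSystem fs = status.getPath().getFileSystem(job.getConfiguration());
                addRecursively(fs, status, result);
            }
            return result;
        }

        // Depth-first walk: descend into directories, keep only .avro files.
        private void addRecursively(FileSystem fs, FileStatus status, List<FileStatus> out) throws IOException {
            if (status.isDirectory()) {
                for (FileStatus child : fs.listStatus(status.getPath())) {
                    addRecursively(fs, child, out);
                }
            } else if (status.getPath().getName().endsWith(".avro")) {
                out.add(status);
            }
        }
    }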

-- Himanshu

Himanshu
Apr 1, 2015, 2:50:06 AM
to Lu Xuechao, druid...@googlegroups.com
Maybe you're just missing specifying

mapreduce.input.fileinputformat.input.dir.recursive=true

in your job properties.

-- Himanshu

Lu Xuechao
Apr 1, 2015, 2:52:58 AM
to druid...@googlegroups.com, lux...@gmail.com
Thanks a lot for all the information. I will try it out and report back.

thanks,
xulu

Lu Xuechao
Apr 8, 2015, 3:39:25 AM
to druid...@googlegroups.com, lux...@gmail.com

Hi Himanshu,

Some updates: the job now runs on the remote Hadoop cluster and the map/reduce jobs complete successfully, but the Druid indexing task failed with the exception below. I can see that the temp files are generated, but there is no segmentDescriptorInfo.

What might be the reason?

thanks.

2015-04-08 07:10:36,091 ERROR [task-runner-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Exception while running task[HadoopIndexTask{id=index_hadoop_pulsar_event_merged_2015-04-08T07:03:21.251Z, type=index_hadoop, dataSource=pulsar_event_merged}]

java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at io.druid.indexing.common.task.HadoopIndexTask.run(HadoopIndexTask.java:255)
        at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:218)
        at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:197)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: File /tmp/druid-indexing/pulsar_event_merged/2015-04-08T070324.336Z/segmentDescriptorInfo does not exist.
        at com.google.common.base.Throwables.propagate(Throwables.java:160)
        at io.druid.indexer.IndexGeneratorJob.getPublishedSegments(IndexGeneratorJob.java:127)
        at io.druid.indexer.HadoopDruidIndexerJob$1.run(HadoopDruidIndexerJob.java:74)
        at io.druid.indexer.JobHelper.runJobs(JobHelper.java:137)
        at io.druid.indexer.HadoopDruidIndexerJob.run(HadoopDruidIndexerJob.java:80)
        at io.druid.indexing.common.task.HadoopIndexTask$HadoopIndexGeneratorInnerProcessing.runTask(HadoopIndexTask.java:294)
        ... 11 more
Caused by: java.io.FileNotFoundException: File /tmp/druid-indexing/pulsar_event_merged/2015-04-08T070324.336Z/segmentDescriptorInfo does not exist.
        at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:650)
        at org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:98)
        at org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:708)
        at org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:704)
        at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
        at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:704)
        at io.druid.indexer.IndexGeneratorJob.getPublishedSegments(IndexGeneratorJob.java:120)
        ... 15 more

Lu Xuechao
Apr 8, 2015, 3:42:35 AM
to druid...@googlegroups.com, lux...@gmail.com
This is my job definition:

{
    "type": "index_hadoop",
    "spec": {
        "dataSchema": {
            "dataSource": "pulsar_event_merged",
            "parser": {
                "type": "string",
                "parseSpec": {
                    "format": "json",
                    "timestampSpec": {
                        "column": "timestamp",
                        "format": "auto"
                    },
                    "dimensionsSpec": {
                        "dimensions": [],
                        "dimensionExclusions": ["guid", "uid","js_evt_kafka_produce_ts", "timestamp","js_ev_type"],
                        "spatialDimensions": []
                    }
                }
            },
            "metricsSpec": [
                {
                    "type": "count",
                    "name": "count"
                },
                {
                    "type" : "hyperUnique",
                    "fieldName": "guid",
                    "name": "guid_hll"
                },
                {
                     "type" : "longSum",
                     "fieldName": "dwell",
                     "name": "dwell_ag"
                }
            ],
            "granularitySpec": {
                "type": "uniform",
                "segmentGranularity": "hour",
                "queryGranularity": "NONE",
                "intervals": [
                    "2015-03-25/2015-03-26"
                ]
            }
        },

        "ioConfig": {
            "type": "hadoop",
            "inputSpec": {
                "type": "static",
                "paths": "hdfs://artemis:8020/pulsar/raw_soj/20150325/00_00",
                "inputFormat": "com.ebay.tracking.druid.avro.AvroInputFormat"
            }
        },
        "tuningConfig": {
            "type": "hadoop",
            "partitionsSpec": {
                "type": "hashed",
                "numShards": 10
            },
            "shardSpecs": {},
            "leaveIntermediate": false,
            "cleanupOnFailure": true,
            "overwriteFiles": false,
            "ignoreInvalidRows": true,
            "jobProperties": {
                "mapreduce.input.fileinputformat.input.dir.recursive": "true",
                "mapreduce.input.fileinputformat.list-status.num-threads": "10",
                "mapreduce.job.reduces": "10"
            },
            "combineText": false,
            "persistInHeap": false,
            "ingestOffheap": false,
            "bufferSize": 134217728,
            "aggregationBufferRatio": 0.5,
            "rowFlushBoundary": 500000

Gian Merlino
Apr 9, 2015, 2:12:17 AM
to druid...@googlegroups.com, lux...@gmail.com
That error makes me wonder whether any segments actually got created. To confirm: did any index-generator Hadoop jobs run? Did they complete successfully? Can you post a log from one of the index-generator reducers?

Also, out of curiosity, what version of Druid are you using?

Lu Xuechao
Apr 9, 2015, 2:26:26 AM
to druid...@googlegroups.com, lux...@gmail.com
Hi,

I am using 0.6.171.

From the task log and the Hadoop JobTracker, the Hadoop map/reduce jobs ran successfully. The files on Hadoop after the job failed are shown in this picture:
https://lh3.googleusercontent.com/-ebb83nGJkO8/VSTa884__EI/AAAAAAAABNg/rgE7m4wnZsw/s1600/Untitled.png

Below is a more detailed Druid task log; the reducer log is also attached.

2015-04-08 09:50:52,058 INFO  [task-runner-0] org.apache.hadoop.mapreduce.Job - Job job_1426279437401_565266 completed successfully
2015-04-08 09:50:52,236 INFO  [task-runner-0] org.apache.hadoop.mapreduce.Job - Counters: 53
        File System Counters
                FILE: Number of bytes read=5175293687
                FILE: Number of bytes written=10880875352
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=36760981983
                HDFS: Number of bytes written=2682017189
                HDFS: Number of read operations=3474
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=20
        Job Counters
                Killed map tasks=6
                Launched map tasks=580
                Launched reduce tasks=10
                Other local map tasks=11
                Data-local map tasks=528
                Rack-local map tasks=41
                Total time spent by all maps in occupied slots (ms)=97614345
                Total time spent by all reduces in occupied slots (ms)=5731419
                Total time spent by all map tasks (ms)=32538115
                Total time spent by all reduce tasks (ms)=1910473
                Total vcore-seconds taken by all map tasks=32538115
                Total vcore-seconds taken by all reduce tasks=1910473
                Total megabyte-seconds taken by all map tasks=133276119040
                Total megabyte-seconds taken by all reduce tasks=7825297408
        Map-Reduce Framework
                Map input records=62555721
                Map output records=52722302
                Map output bytes=36381101937
                Map output materialized bytes=5594877400
                Input split bytes=90118
                Combine input records=0
                Combine output records=0
                Reduce input groups=20
                Reduce shuffle bytes=5594877400
                Reduce input records=52722302
                Reduce output records=52722302
                Spilled Records=104042959
                Shuffled Maps =5740
                Failed Shuffles=0
                Merged Map outputs=5740
                GC time elapsed (ms)=309442
                CPU time spent (ms)=35744580
                Physical memory (bytes) snapshot=1098268250112
                Virtual memory (bytes) snapshot=2925780250624
                Total committed heap usage (bytes)=1219227549696
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        io.druid.indexer.HadoopDruidIndexerConfig$IndexJobCounters
                INVALID_ROW_COUNTER=9833415
        File Input Format Counters
                Bytes Read=36760891865
        File Output Format Counters
                Bytes Written=2682017189
ALERT:/tmp/druid-indexing
ALERT:pulsar_event_merged
ALERT:2015-04-08T093021.124Z
2015-04-08 09:50:52,293 ERROR [task-runner-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Exception while running task[HadoopIndexTask{id=index_hadoop_pulsar_event_merged_2015-04-08T09:30:21.121Z, type=index_hadoop, dataSource=pulsar_event_merged}]

java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at io.druid.indexing.common.task.HadoopIndexTask.run(HadoopIndexTask.java:255)
        at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:218)
        at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:197)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: File /tmp/druid-indexing/pulsar_event_merged/2015-04-08T093021.124Z/segmentDescriptorInfo does not exist.
        at com.google.common.base.Throwables.propagate(Throwables.java:160)
        at io.druid.indexer.IndexGeneratorJob.getPublishedSegments(IndexGeneratorJob.java:127)
        at io.druid.indexer.HadoopDruidIndexerJob$1.run(HadoopDruidIndexerJob.java:74)
        at io.druid.indexer.JobHelper.runJobs(JobHelper.java:137)
        at io.druid.indexer.HadoopDruidIndexerJob.run(HadoopDruidIndexerJob.java:80)
        at io.druid.indexing.common.task.HadoopIndexTask$HadoopIndexGeneratorInnerProcessing.runTask(HadoopIndexTask.java:294)
        ... 11 more
Caused by: java.io.FileNotFoundException: File /tmp/druid-indexing/pulsar_event_merged/2015-04-08T093021.124Z/segmentDescriptorInfo does not exist.

        at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:650)
        at org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:98)
        at org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:708)
        at org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:704)
        at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
        at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:704)
        at io.druid.indexer.IndexGeneratorJob.getPublishedSegments(IndexGeneratorJob.java:120)
        ... 15 more
2015-04-08 09:50:52,306 INFO  [task-runner-0] io.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: {
  "id" : "index_hadoop_pulsar_event_merged_2015-04-08T09:30:21.121Z",
  "status" : "FAILED",
  "duration" : 1223161
}

thanks.
reducer.log

Lu Xuechao
Apr 10, 2015, 2:20:16 AM
to druid...@googlegroups.com, lux...@gmail.com
Hi Team,

Please advise. Thanks.

Gian Merlino
Apr 10, 2015, 9:02:15 PM
to druid...@googlegroups.com, lux...@gmail.com
Hi Lu, those counters and logs look like they're for the determine-partitions job. Do you have counters and logs for the index-generator job? It should have run immediately after the determine-partitions job finished. That one is interesting because it is supposed to generate the segments determined by your determine-partitions job, but the File Not Found error message you got indicates that the index-generator may not have actually written any segments. I'm hoping to figure out why.

Also, it looks like some nodes in your cluster are not using UTC for their timezone, based on the fact that the shardSpecs are binned under the -07:00 TZ. This may or may not be a problem (I'm not sure if it works properly). Is it possible for you to try with your cluster in UTC?

Gian Merlino
Apr 10, 2015, 9:50:17 PM
to druid...@googlegroups.com, lux...@gmail.com
Also, out of curiosity, what is your fs.defaultFS set to in your core-site.xml?

Gian Merlino
Apr 10, 2015, 9:56:27 PM
to druid...@googlegroups.com, lux...@gmail.com
If it's set to something other than the Hadoop NameNode, then setting it to that might help. (Something like: hdfs://name-node.example.com:9000)
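
For reference, that setting lives in Hadoop's core-site.xml; the host and port below are placeholders:

    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://name-node.example.com:9000</value>
    </property>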

Himanshu
Apr 12, 2015, 2:49:07 AM
to Lu Xuechao, druid...@googlegroups.com
Hi,

I am mostly offline for a while with little connectivity, hence the late reply.

One thing I noticed is that you are manually supplying the number of reducers in your indexer JSON. Druid itself figures out the correct number of reducers for these MR jobs, and you are probably overriding that. Can you try removing "mapreduce.job.reduces" and rerunning your task?
If that does not work, please attach your indexer task log.

-- Himanshu


Lu Xuechao
Apr 12, 2015, 10:15:32 PM
to druid...@googlegroups.com, lux...@gmail.com
Hi Himanshu,

I removed "mapreduce.job.reduces" and reran the task; it failed with the same error. I've attached the whole indexing task log.

thanks.
index_hadoop_pulsar_event_merged_2015-04-13T01_50_32.841Z.log

Gian Merlino
Apr 13, 2015, 4:40:10 PM
to druid...@googlegroups.com, lux...@gmail.com
Hi Lu, did you have a chance to pull logs from an index-generator reducer? The error you're getting suggests that the index-generator Hadoop job didn't actually write any segment descriptors. The index-generator reducers are supposed to do this. The counters you posted indicated they did actually get some data, so I'm wondering if they wrote their segments to the wrong place.

Lu Xuechao
Apr 13, 2015, 10:18:41 PM
to druid...@googlegroups.com, lux...@gmail.com
Hi Gian,

I think you're right: the IndexGeneratorReducer.reduce() method somehow doesn't get called (setup() is called). I've attached the IndexGeneratorReducer's log. Thanks.
reducer.txt

Lu Xuechao
Apr 14, 2015, 2:17:08 AM
to druid...@googlegroups.com, lux...@gmail.com
An update: I changed the signature of the IndexGeneratorReducer class from

public static class IndexGeneratorReducer extends Reducer<BytesWritable, Writable, BytesWritable, Text>

to

public static class IndexGeneratorReducer extends Reducer<BytesWritable, Text, BytesWritable, Text>

and then IndexGeneratorReducer.reduce() was called successfully and index segments were generated. Thank you all for the help.

Two issues still exist:

1. There were 480 reducers (480 partitions), but IndexGeneratorPartitioner.getPartition() returned 310-319 for 99%+ of the events, so 460+ reducers got no data at all, and reducers 310-319 all failed with an IllegalStateException:

2015-04-14 05:13:06,513 INFO  [task-runner-0] org.apache.hadoop.mapreduce.Job - Task Id : attempt_1426279437401_716745_r_000317_1, Status : FAILED
Error: java.lang.IllegalStateException: Wrote[2627459989] bytes, which is too many.
        at com.google.common.base.Preconditions.checkState(Preconditions.java:172)
        at io.druid.segment.data.GenericIndexedWriter.close(GenericIndexedWriter.java:108)
        at io.druid.segment.IndexMerger.makeIndexFiles(IndexMerger.java:518)
        at io.druid.segment.IndexMerger.merge(IndexMerger.java:307)
        at io.druid.segment.IndexMerger.mergeQueryableIndex(IndexMerger.java:169)
        at io.druid.indexer.IndexGeneratorJob$IndexGeneratorReducer.reduce(IndexGeneratorJob.java:378)
        at io.druid.indexer.IndexGeneratorJob$IndexGeneratorReducer.reduce(IndexGeneratorJob.java:256)
        at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171)
        at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)

2. The indexing task was cancelled, but the MapReduce job was not killed.

thanks.

Lu Xuechao
Apr 14, 2015, 5:29:46 AM
to druid...@googlegroups.com, lux...@gmail.com
The keys look like the following:

00 00 00 10 00 00 01 36 00 00 01 4c 69 7a d5 80 00 00 00 00 00 00 01 4c 69 88 81 80 8d 1a 3c 7c 5b 5d cb fb f1 d8 89 ac 99 87 7c b1
00 00 00 10 00 00 01 3f 00 00 01 4c 69 7a d5 80 00 00 00 09 00 00 01 4c 69 88 91 20 20 92 bc 4d a9 c0 c0 e1 bb b3 44 2b ca 97 4b 1c
00 00 00 10 00 00 01 3e 00 00 01 4c 69 7a d5 80 00 00 00 08 00 00 01 4c 69 88 8d 38 bf 56 13 89 17 ed bb e7 2c d6 a4 04 ba 04 50 d8
00 00 00 10 00 00 01 3f 00 00 01 4c 69 7a d5 80 00 00 00 09 00 00 01 4c 69 88 91 20 1c 93 8f 4f 04 5d 05 86 c2 82 91 8c 27 85 17 53
00 00 00 10 00 00 01 3a 00 00 01 4c 69 7a d5 80 00 00 00 04 00 00 01 4c 69 88 8d 38 74 85 53 f6 5a b3 8d d0 b4 02 2a 43 84 47 59 69
00 00 00 10 00 00 01 3c 00 00 01 4c 69 7a d5 80 00 00 00 06 00 00 01 4c 69 88 91 20 03 b2 d5 0c 27 08 ae d1 31 c8 26 41 a6 cd b6 b7
00 00 00 10 00 00 01 3b 00 00 01 4c 69 7a d5 80 00 00 00 05 00 00 01 4c 69 88 91 20 fa 18 88 33 7b 6c 1a 50 16 ae 8b 75 ab ec 98 0d
00 00 00 10 00 00 01 37 00 00 01 4c 69 7a d5 80 00 00 00 01 00 00 01 4c 69 88 8d 38 e4 08 21 53 e2 29 be c4 47 1c 0d b5 ba d4 ec e1
00 00 00 10 00 00 01 3a 00 00 01 4c 69 7a d5 80 00 00 00 04 00 00 01 4c 69 88 91 20 68 2f cd 16 42 89 51 ff 25 6f 57 5d 9d 3a c8 bd
00 00 00 10 00 00 01 39 00 00 01 4c 69 7a d5 80 00 00 00 03 00 00 01 4c 69 88 8d 38 5b c7 4c 15 69 ef 90 54 ec 98 2a be 2b 5b b0 2f
00 00 00 10 00 00 01 38 00 00 01 4c 69 7a d5 80 00 00 00 02 00 00 01 4c 69 88 8d 38 2c 95 18 ff 99 48 84 20 29 9f 65 28 8e 9e ae fb
00 00 00 10 00 00 01 36 00 00 01 4c 69 7a d5 80 00 00 00 00 00 00 01 4c 69 88 91 20 47 91 c2 71 5e e9 93 aa e4 76 7b ad 13 e1 c1 20
00 00 00 10 00 00 01 39 00 00 01 4c 69 7a d5 80 00 00 00 03 00 00 01 4c 69 88 8d 38 61 fd 2b 12 2d ac 74 b0 21 d6 dc 16 81 10 7d 6f
00 00 00 10 00 00 01 39 00 00 01 4c 69 7a d5 80 00 00 00 03 00 00 01 4c 69 88 8d 38 4e 39 31 1a 7d 06 7d 89 5d 44 79 64 8e 92 26 ca
00 00 00 10 00 00 01 39 00 00 01 4c 69 7a d5 80 00 00 00 03 00 00 01 4c 69 88 91 20 f7 0d 46 66 3e 13 08 51 b4 14 d9 ab 18 02 89 13
00 00 00 10 00 00 01 3f 00 00 01 4c 69 7a d5 80 00 00 00 09 00 00 01 4c 69 88 91 20 ed e8 b2 9e e2 a7 a1 6c 28 18 0d c8 04 e2 24 f4
00 00 00 10 00 00 01 36 00 00 01 4c 69 7a d5 80 00 00 00 00 00 00 01 4c 69 88 91 20 47 91 c2 71 5e e9 93 aa e4 76 7b ad 13 e1 c1 20
00 00 00 10 00 00 01 3c 00 00 01 4c 69 7a d5 80 00 00 00 06 00 00 01 4c 69 88 8d 38 c1 cb 4b 60 94 e1 f3 aa 90 50 70 13 bc 38 ee 11
00 00 00 10 00 00 01 3b 00 00 01 4c 69 7a d5 80 00 00 00 05 00 00 01 4c 69 88 8d 38 4f 26 2e 0c 56 39 a7 a0 3b 3e 7d 52 06 0c 2e 95

Gian Merlino
Apr 14, 2015, 11:15:45 AM
to druid...@googlegroups.com, lux...@gmail.com
The "Wrote[2627459989] bytes, which is too many" business indicates too much data is going to that reducer, and the max column size (2GB) has been exceeded. I guess this is because of the poor data distribution you mentioned.

Are you still using the job definition from: https://groups.google.com/d/msg/druid-user/Gyq2NWt1dw4/9MKttTIZDpkJ? I have a few questions about that:

- Can you try removing the "numShards" from the partitionsSpec and allowing Druid to determine the number on its own? (Rough sketch below.)

- What version of Druid are you using? Can you try 0.7.1.1, if you're using a different version? It has the patch from https://github.com/druid-io/druid/pull/1177 and it also has some improvements for spill management during indexing.
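
Without "numShards", a hashed partitionsSpec driven by a target row count would look roughly like this (the target value is only an example; Druid should then run a determine-partitions pass to pick the shard count):

    "partitionsSpec": {
        "type": "hashed",
        "targetPartitionSize": 5000000
    }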

Lu Xuechao
Apr 14, 2015, 11:08:06 PM
to druid...@googlegroups.com, lux...@gmail.com
Yes, I am using 0.6.171 and will upgrade to 0.7.1.1.

I just found another issue: DetermineHashedPartitionsJob generates partitions.json under a path like

/tmp/druid-indexing/pulsar_event_merged/2015-04-15T021928.055Z/20150329T000000.000-0700_20150329T010000.000-0700/partitions.json

while later it complains:

2015-04-15 02:25:00,141 INFO  [task-runner-0] io.druid.indexer.DetermineHashedPartitionsJob - Path[/tmp/druid-indexing/pulsar_event_merged/2015-04-15T021928.055Z/20150329T000000.000Z_20150329T010000.000Z/partitions.json] didn't exist!?

Is this a bug?

thanks.

Lu Xuechao
Apr 14, 2015, 11:26:59 PM
to druid...@googlegroups.com, lux...@gmail.com
And my indexer task defines granularitySpec as below:


            "granularitySpec": {
                "type": "uniform",
                "segmentGranularity": "hour",
                "queryGranularity": "NONE",
                "intervals": [
                    "2015-03-29/2015-03-31"
                ]
            }

then DetermineHashedPartitionsJob generates partitions.json as below:

from
/tmp/druid-indexing/pulsar_event_merged/2015-04-15T021928.055Z/20150328T170000.000-0700_20150328T180000.000-0700/partitions.json
to
/tmp/druid-indexing/pulsar_event_merged/2015-04-15T021928.055Z/20150330T160000.000-0700_20150330T170000.000-0700/partitions.json

Offset by 7 hours?


Gian Merlino
Apr 15, 2015, 12:03:09 AM
to druid...@googlegroups.com, lux...@gmail.com
Hi Lu,

Those timestamps are "correct" in that they are the correct instants, although the timezones are inconsistent (20150328T170000.000-0700 is the same instant as UTC 2015-03-29). I think this is caused by running your Hadoop cluster and your driver in different timezones. Can you try running them all in UTC?

And yes, I'd consider this a bug. Ideally things should work even if your machines are not all in the same timezone.

Lu Xuechao
Apr 15, 2015, 4:49:02 AM
to druid...@googlegroups.com, lux...@gmail.com
Hi Team,

Setting mapreduce.reduce.java.opts=-server -Xmx8192m -Duser.timezone=UTC -Dfile.encoding=UTF-8 -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
solved that issue. Now the indexer job can run successfully. Thanks!
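
For reference, properties like this can also be passed through the task JSON under the tuningConfig's "jobProperties" (a sketch; the heap size is just an example, and setting the map side as well is probably wise):

    "jobProperties": {
        "mapreduce.map.java.opts": "-Duser.timezone=UTC -Dfile.encoding=UTF-8",
        "mapreduce.reduce.java.opts": "-server -Xmx8192m -Duser.timezone=UTC -Dfile.encoding=UTF-8"
    }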

Fangjin Yang
Apr 15, 2015, 1:21:42 PM
to druid...@googlegroups.com, lux...@gmail.com
Great to hear you resolved it, Xuechao!

an...@stratoshear.com
May 7, 2015, 2:56:46 AM
to druid...@googlegroups.com
Hi,

I'm running Druid on a single-node Linux system as the root user, and I have installed Hadoop on the same system under a different user, with all permissions granted to that user.
I have changed all the dependencies in common.runtime.properties like this:
 druid.extensions.coordinates=["io.druid.extensions:druid-examples","io.druid.extensions:druid-kafka-eight","io.druid.extensions:mysql-metadata-storage","io.druid.extensions:druid-hdfs-storage"]

# Zookeeper
druid.zk.service.host=10.X.X.X

# Metadata Storage (mysql)
druid.metadata.storage.type=mysql
druid.metadata.storage.connector.connectURI=jdbc\:mysql\://10.X.X.X\:3306/druid
druid.metadata.storage.connector.user=druid
druid.metadata.storage.connector.password=diurd

# Deep storage (local filesystem for examples - don't use this in production)

#druid.storage.type=local
#druid.storage.storageDirectory=/tmp/druid/localStorage

druid.storage.type=hdfs
druid.storage.storageDirectory=hdfs://10.x.x.x:9000/druid
druid.indexer.task.hadoopWorkingPath=hdfs://10.x.x.x:9000/druid

But whenever I start the realtime node to ingest data, it gives me an error like:


2015-05-07T06:36:00,426 ERROR [demodruid-2015-05-07T05:20:00.000Z-persist-n-merge] io.druid.segment.realtime.plumber.RealtimePlumber - Failed to persist merged index[demodruid]: {class=io.druid.segment.realtime.plumber.RealtimePlumber, exceptionType=class java.io.IOException, exceptionMessage=No FileSystem for scheme: hdfs, interval=2015-05-07T05:20:00.000Z/2015-05-07T05:25:00.000Z}
java.io.IOException: No FileSystem for scheme: hdfs

It's announcing segments but not handing them off to HDFS.

Thanks.

Nishant Bangarwa
May 8, 2015, 9:23:43 AM
to an...@stratoshear.com, druid...@googlegroups.com
Hi Ankit,
The index task also needs the Hadoop config files in order to talk to the Hadoop cluster.
Can you try adding your Hadoop config files to the classpath?
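
Note that a Java classpath wildcard (dir/*) matches only JAR files, not XML config files, so the Hadoop conf directory itself (without the wildcard) needs to be on the classpath. Something like this, with placeholder paths:

    java -Xmx512m -Duser.timezone=UTC -Dfile.encoding=UTF-8 \
      -Ddruid.realtime.specFile=/path/to/your.spec \
      -classpath "config/_common:config/realtime:/path/to/hadoop/conf:lib/*" \
      io.druid.cli.Main server realtime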



an...@stratoshear.com
May 11, 2015, 2:02:36 AM
to druid...@googlegroups.com, an...@stratoshear.com, Nishant Bangarwa



Hi Nishant,
I really apologize for my late reply, and many thanks for your response.
As you suggested adding the Hadoop config files to the classpath, I have added the path to my Hadoop config files to the realtime command, like below:

   java -Xmx512m -Duser.timezone=UTC -Dfile.encoding=UTF-8 \
     -Ddruid.realtime.specFile=examples/demodruid/demodruid.spec \
     -classpath "config/_common:config/realtime:lib/*:/usr/local/hadoop/etc/hadoop/*" \
     io.druid.cli.Main server realtime

But it gave me the same error:

java.io.IOException: No FileSystem for scheme: hdfs

I'm really out of ideas about what I'm missing. Could you please look at my realtime command and common.runtime.properties configuration above and tell me how to correctly add the classpath to the realtime command and the Hadoop storage directory path to the common properties file?

I am running Kafka, the coordinator, the historical node, and ZooKeeper on my system using the same commands given on the druid.io site.

Please suggest some more tips; I will be really thankful to you.

Thanks.

tec...@splicky.com
Jun 26, 2015, 4:23:27 AM
to druid...@googlegroups.com, an...@stratoshear.com
I have the same problem. Did you find a solution for that?

Fangjin Yang
Jun 27, 2015, 1:44:13 AM
to druid...@googlegroups.com, tec...@splicky.com, an...@stratoshear.com
What does the config for realtime look like? Just want to make sure the configs there aren't overriding the extensions config in the common.runtime.properties.