Re: [druid-user] Ingest local data to s3 deep storage failed

Gian Merlino

Aug 26, 2016, 1:58:57 AM
to druid...@googlegroups.com
Hey 唐焱,

Did you also put your data on S3? There are a lot of ways to get Hadoop to load data from S3 (Hadoop is fun that way). One of them is documented here: https://imply.io/docs/latest/ingestion-batch#elastic-mapreduce-setup. It involves setting fs.s3*.awsAccessKeyId and fs.s3*.awsSecretAccessKey in your jobProperties.
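
For example, in the tuningConfig of your ingestion spec, something along these lines (a sketch only; the credential values are placeholders, and whether you need the fs.s3 or fs.s3n variants depends on the URL scheme your inputSpec uses):

  "tuningConfig" : {
    "type" : "hadoop",
    "jobProperties" : {
      "fs.s3.awsAccessKeyId" : "YOUR_ACCESS_KEY",
      "fs.s3.awsSecretAccessKey" : "YOUR_SECRET_KEY",
      "fs.s3.impl" : "org.apache.hadoop.fs.s3native.NativeS3FileSystem",
      "fs.s3n.awsAccessKeyId" : "YOUR_ACCESS_KEY",
      "fs.s3n.awsSecretAccessKey" : "YOUR_SECRET_KEY",
      "fs.s3n.impl" : "org.apache.hadoop.fs.s3native.NativeS3FileSystem"
    }
  }

NativeS3FileSystem is the same class that is failing to initialize in your stack trace, which suggests these are the properties it's missing.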

Gian

On Tue, Aug 9, 2016 at 9:21 PM, 唐焱 <yant...@gmail.com> wrote:
Hello,

I'm trying to load the wikiticker sample data into Druid and I'm getting the following exception. I'm using S3 as deep storage.


2016-08-10T03:46:48,072 WARN [Thread-59] org.apache.hadoop.mapred.LocalJobRunner - job_local826592096_0002
java.lang.Exception: java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key must be specified as the username or password (respectively) of a s3n URL, or by setting the fs.s3n.awsAccessKeyId or fs.s3n.awsSecretAccessKey properties (respectively).
        at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462) ~[hadoop-mapreduce-client-common-2.3.0.jar:?]
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529) [hadoop-mapreduce-client-common-2.3.0.jar:?]
Caused by: java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key must be specified as the username or password (respectively) of a s3n URL, or by setting the fs.s3n.awsAccessKeyId or fs.s3n.awsSecretAccessKey properties (respectively).
        at org.apache.hadoop.fs.s3.S3Credentials.initialize(S3Credentials.java:70) ~[hadoop-common-2.3.0.jar:?]
        at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.initialize(Jets3tNativeFileSystemStore.java:61) ~[hadoop-common-2.3.0.jar:?]
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.7.0_101]
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) ~[?:1.7.0_101]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.7.0_101]
        at java.lang.reflect.Method.invoke(Method.java:606) ~[?:1.7.0_101]
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186) ~[hadoop-common-2.3.0.jar:?]
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) ~[hadoop-common-2.3.0.jar:?]
        at org.apache.hadoop.fs.s3native.$Proxy191.initialize(Unknown Source) ~[?:?]
        at org.apache.hadoop.fs.s3native.NativeS3FileSystem.initialize(NativeS3FileSystem.java:272) ~[hadoop-common-2.3.0.jar:?]
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2316) ~[hadoop-common-2.3.0.jar:?]
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:90) ~[hadoop-common-2.3.0.jar:?]
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2350) ~[hadoop-common-2.3.0.jar:?]
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2332) ~[hadoop-common-2.3.0.jar:?]
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:369) ~[hadoop-common-2.3.0.jar:?]
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296) ~[hadoop-common-2.3.0.jar:?]
        at io.druid.indexer.IndexGeneratorJob$IndexGeneratorReducer.reduce(IndexGeneratorJob.java:691) ~[druid-indexing-hadoop-0.9.1.1.jar:0.9.1.1]
        at io.druid.indexer.IndexGeneratorJob$IndexGeneratorReducer.reduce(IndexGeneratorJob.java:469) ~[druid-indexing-hadoop-0.9.1.1.jar:0.9.1.1]
        at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171) ~[hadoop-mapreduce-client-core-2.3.0.jar:?]
        at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627) ~[hadoop-mapreduce-client-core-2.3.0.jar:?]
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389) ~[hadoop-mapreduce-client-core-2.3.0.jar:?]
        at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319) ~[hadoop-mapreduce-client-common-2.3.0.jar:?]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[?:1.7.0_101]
        at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[?:1.7.0_101]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[?:1.7.0_101]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) ~[?:1.7.0_101]
        at java.lang.Thread.run(Thread.java:745) ~[?:1.7.0_101]
2016-08-10T03:46:48,567 INFO [task-runner-0-priority-0] org.apache.hadoop.mapreduce.Job - Job job_local826592096_0002 failed with state FAILED due to: NA
2016-08-10T03:46:48,572 INFO [task-runner-0-priority-0] org.apache.hadoop.mapreduce.Job - Counters: 33
        File System Counters
                FILE: Number of bytes read=34215426
                FILE: Number of bytes written=17310701
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
        Map-Reduce Framework
                Map input records=39244
                Map output records=39244
                Map output bytes=16736001
                Map output materialized bytes=16892983
                Input split bytes=309
                Combine input records=0
                Combine output records=0
                Reduce input groups=0
                Reduce shuffle bytes=16892983
                Reduce input records=0
                Reduce output records=0
                Spilled Records=39244
                Shuffled Maps =1
                Failed Shuffles=0
                Merged Map outputs=1
                GC time elapsed (ms)=96
                CPU time spent (ms)=0
                Physical memory (bytes) snapshot=0
                Virtual memory (bytes) snapshot=0
                Total committed heap usage (bytes)=1912078336
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=0
        File Output Format Counters
                Bytes Written=0
2016-08-10T03:46:48,577 INFO [task-runner-0-priority-0] io.druid.indexer.JobHelper - Deleting path[var/druid/hadoop-tmp/wikiticker/2016-08-10T034630.036Z/8b44ff31242a48ff96ed6cdbc1547708]
2016-08-10T03:46:48,590 ERROR [task-runner-0-priority-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Exception while running task[HadoopIndexTask{id=index_hadoop_wikiticker_2016-08-10T03:46:29.980Z, type=index_hadoop, dataSource=wikiticker}]
java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
        at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[guava-16.0.1.jar:?]
        at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:204) ~[druid-indexing-service-0.9.1.1.jar:0.9.1.1]
        at io.druid.indexing.common.task.HadoopIndexTask.run(HadoopIndexTask.java:208) ~[druid-indexing-service-0.9.1.1.jar:0.9.1.1]
        at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:436) [druid-indexing-service-0.9.1.1.jar:0.9.1.1]
        at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:408) [druid-indexing-service-0.9.1.1.jar:0.9.1.1]
        at java.util.concurrent.FutureTask.run(FutureTask.java:262) [?:1.7.0_101]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [?:1.7.0_101]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [?:1.7.0_101]
        at java.lang.Thread.run(Thread.java:745) [?:1.7.0_101]
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.7.0_101]
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) ~[?:1.7.0_101]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.7.0_101]
        at java.lang.reflect.Method.invoke(Method.java:606) ~[?:1.7.0_101]
        at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:201) ~[druid-indexing-service-0.9.1.1.jar:0.9.1.1]
        ... 7 more
Caused by: com.metamx.common.ISE: Job[class io.druid.indexer.IndexGeneratorJob] failed!
        at io.druid.indexer.JobHelper.runJobs(JobHelper.java:343) ~[druid-indexing-hadoop-0.9.1.1.jar:0.9.1.1]
        at io.druid.indexer.HadoopDruidIndexerJob.run(HadoopDruidIndexerJob.java:94) ~[druid-indexing-hadoop-0.9.1.1.jar:0.9.1.1]
        at io.druid.indexing.common.task.HadoopIndexTask$HadoopIndexGeneratorInnerProcessing.runTask(HadoopIndexTask.java:261) ~[druid-indexing-service-0.9.1.1.jar:0.9.1.1]
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.7.0_101]
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) ~[?:1.7.0_101]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.7.0_101]
        at java.lang.reflect.Method.invoke(Method.java:606) ~[?:1.7.0_101]
        at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:201) ~[druid-indexing-service-0.9.1.1.jar:0.9.1.1]
        ... 7 more
2016-08-10T03:46:48,596 INFO [task-runner-0-priority-0] io.druid.indexing.overlord.TaskRunnerUtils - Task [index_hadoop_wikiticker_2016-08-10T03:46:29.980Z] status changed to [FAILED].
2016-08-10T03:46:48,598 INFO [task-runner-0-priority-0] io.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: {
  "id" : "index_hadoop_wikiticker_2016-08-10T03:46:29.980Z",
  "status" : "FAILED",
  "duration" : 14282
}



Attached are the full log and the common configuration file.
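
For reference, S3 deep storage in Druid 0.9.x is configured in common.runtime.properties along these lines; the values below are placeholders rather than my actual settings:

    # S3 deep storage; requires druid-s3-extensions on druid.extensions.loadList
    druid.storage.type=s3
    druid.storage.bucket=example-bucket
    druid.storage.baseKey=druid/segments
    druid.s3.accessKey=YOUR_ACCESS_KEY
    druid.s3.secretKey=YOUR_SECRET_KEY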

Thanks!

