Error: java.lang.RuntimeException: Reducer task failed to copy 663 files: s3://lolo-snowplow-archive/processing/E1HT7595MTEQ1H.2015-06-11-07.f1ef91f2.gz etc
at com.amazon.elasticmapreduce.s3distcp.CopyFilesReducer.cleanup(CopyFilesReducer.java:75)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:195)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:656)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:394)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:170)
Error: java.lang.RuntimeException: Reducer task failed to copy 829 files: s3://lolo-snowplow-archive/processing/E1HT7595MTEQ1H.2015-06-13-22.7b288dae.gz etc
at com.amazon.elasticmapreduce.s3distcp.CopyFilesReducer.cleanup(CopyFilesReducer.java:75)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:195)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:656)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:394)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:170)
Error: java.lang.RuntimeException: Reducer task failed to copy 528 files: s3://lolo-snowplow-archive/processing/E1HT7595MTEQ1H.2015-06-12-09.d5438eff.gz etc
at com.amazon.elasticmapreduce.s3distcp.CopyFilesReducer.cleanup(CopyFilesReducer.java:75)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:195)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:656)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:394)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:170)
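
To get a quick sense of how many files each reducer failed to copy, the error lines above can be tallied with a short script. This is only a minimal sketch: `summarize_failures` and the regex are hypothetical helpers based on the message format seen in this output, not part of S3DistCp or Snowplow.

```python
import re

# Assumed message format, taken from the error lines in this thread:
# "Reducer task failed to copy <N> files: <first failing path> etc"
FAILED_COPY = re.compile(r"Reducer task failed to copy (\d+) files: (\S+)")


def summarize_failures(log_text):
    """Return a list of (failed_file_count, example_path), one per error line."""
    return [(int(m.group(1)), m.group(2)) for m in FAILED_COPY.finditer(log_text)]


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python summarize.py < stderr.log
    failures = summarize_failures(sys.stdin.read())
    for count, example in failures:
        print(count, example)
    print("total failed copies:", sum(count for count, _ in failures))
```

Each stack trace only names the first failing file ("etc"), so the counts are the most useful part of the summary when reporting the issue.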
jobflow.action_on_failure = 'CONTINUE'
jobflow.keep_job_flow_alive_when_no_steps = true
...
2015-08-12 12:53:23,836 INFO [main] com.amazonaws.latency: StatusCode=[404], Exception=[com.amazonaws.services.s3.model.AmazonS3Exception: Not Found (Service: Amazon S3; Status Code: 404; Error Code: 404 Not Found; Request ID: 9F56F42991799322), S3 Extended Request ID: 8B93RR4HbvioAL9wxLUdGZFIDlw9xUNaXhcxkV9D5h4+wGUKEyZOPgOo9fwaz63I], ServiceName=[Amazon S3], AWSErrorCode=[404 Not Found], AWSRequestID=[9F56F42991799322], ServiceEndpoint=[https://wh-snowplow-out.s3.amazonaws.com], Exception=1, HttpClientPoolLeasedCount=0, RequestCount=1, HttpClientPoolPendingCount=0, HttpClientPoolAvailableCount=1, ClientExecuteTime=[16.624], HttpRequestTime=[15.356], HttpClientReceiveResponseTime=[12.032], RequestSigningTime=[0.676]
The wh-snowplow-out bucket does exist, by the way. I have enclosed this log file and the syslog. On the other hand, there is enriched data in s3://x-snowplow-out/enriched/good/run=2015-08-13.. so I'm really confused and can't identify the problem.
Any help is appreciated. Thank you so much in advance!
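
For anyone digging through similar `com.amazonaws.latency` lines, a small script can pull out the status code and the bucket the request was sent to. This is a rough sketch under assumptions: the `Name=[value]` field layout is inferred from the single log line above, the endpoint is assumed to be virtual-hosted style (`<bucket>.s3.amazonaws.com`), and both function names are hypothetical.

```python
import re
from urllib.parse import urlparse

# Matches fields written as Name=[value]; unbracketed fields such as
# "Exception=1" are deliberately ignored.
FIELD = re.compile(r"(\w+)=\[([^\]]*)\]")


def parse_latency_fields(line):
    """Collect Name=[value] pairs from one log line; the first occurrence wins."""
    fields = {}
    for name, value in FIELD.findall(line):
        fields.setdefault(name, value)
    return fields


def bucket_from_endpoint(endpoint):
    """Extract the bucket name from a virtual-hosted-style S3 endpoint URL."""
    host = urlparse(endpoint).hostname or ""
    return host.split(".s3.", 1)[0]


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python latency.py < syslog
    for line in sys.stdin:
        fields = parse_latency_fields(line)
        if fields.get("StatusCode") == "404":
            print(bucket_from_endpoint(fields.get("ServiceEndpoint", "")),
                  fields.get("AWSRequestID", ""))
```

The request ID printed alongside the bucket is the value AWS support will usually ask for.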
It sounds like Hadoop Enrich is really not enjoying writing your event volumes direct to S3.
I recommend raising the S3DistCp issue you're seeing with AWS support.
Cheers,
Alex
...