--
You received this message because you are subscribed to the Google Groups "Snowplow" group.
To unsubscribe from this group and stop receiving emails from it, send an email to snowplow-use...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
2014-07-17T02:31:40.586Z INFO Working dir /mnt/var/lib/hadoop/steps/4
2014-07-17T02:31:40.586Z INFO Executing /usr/java/latest/bin/java -cp /home/hadoop/conf:/usr/java/latest/lib/tools.jar:/home/hadoop:/home/hadoop/hadoop-tools.jar:/home/hadoop/hadoop-tools-1.0.3.jar:/home/hadoop/hadoop-core-1.0.3.jar:/home/hadoop/hadoop-core.jar:/home/hadoop/lib/*:/home/hadoop/lib/jetty-ext/* -Xmx1000m -Dhadoop.log.dir=/mnt/var/log/hadoop/steps/4 -Dhadoop.log.file=syslog -Dhadoop.home.dir=/home/hadoop -Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,DRFA -Djava.io.tmpdir=/mnt/var/lib/hadoop/steps/4/tmp -Djava.library.path=/home/hadoop/native/Linux-amd64-64 org.apache.hadoop.util.RunJar /home/hadoop/lib/emr-s3distcp-1.0.jar --src hdfs:///local/snowplow/shredded-events/ --dest s3n://spenriched/shredded/good/run=2014-07-17-02-11-23/ --srcPattern .*part-.* --s3Endpoint s3-us-west-2.amazonaws.com
2014-07-17T02:31:46.665Z INFO Execution ended with ret val 1
2014-07-17T02:31:46.666Z WARN Step failed with bad retval
2014-07-17T02:31:52.736Z INFO Step created jobs:
SysLog:
2014-07-17 02:31:42,198 INFO com.amazon.elasticmapreduce.s3distcp.S3DistCp (main): Running with args: [Ljava.lang.String;@471719b6
2014-07-17 02:31:45,975 FATAL com.amazon.elasticmapreduce.s3distcp.S3DistCp (main): Failed to get source file system
java.io.FileNotFoundException: File does not exist: hdfs:/local/snowplow/shredded-events
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:517)
    at com.amazon.elasticmapreduce.s3distcp.S3DistCp.run(S3DistCp.java:564)
    at com.amazon.elasticmapreduce.s3distcp.S3DistCp.run(S3DistCp.java:549)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at com.amazon.elasticmapreduce.s3distcp.Main.main(Main.java:13)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:187)

Regards.
This makes me wonder if the problem is that, for some reason, the collector logs aren't being processed (e.g. you have the incorrect bucket specified as the 'in' bucket).
When I look at the configuration you shared, I can see that you have the 'in' bucket set to the same bucket as the log bucket. This is a bad idea, because it means that each time you run the data pipeline, it will try to process the EMR logs generated the last time the process was run.
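As a rough sketch of what I mean (the bucket names below are placeholders, and the exact key names can vary between EmrEtlRunner versions, so check yours against the config template that ships with your release), the log bucket and the 'in' bucket should point at distinct S3 locations:

```yaml
:s3:
  :buckets:
    # Where EMR writes its own job logs -- must NOT be an input to the pipeline
    :log: s3n://my-snowplow-etl/logs
    # Where the collector writes its access logs -- the pipeline's input
    :in: s3n://my-collector-logs
    :processing: s3n://my-snowplow-etl/processing
```

If :log: and :in: point at the same bucket, each run's EMR logs become input files for the next run, which is exactly the situation described above.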
This is doubly strange in your case, because you said you had no bad rows at all. Any EMR log fed into the Snowplow data pipeline should generate a bad row for every line in it, because those logs are not in a format Snowplow can parse.
Things to try: