InvalidInputException: Input path does not exist:

Sonny Rivera

Feb 16, 2015, 7:19:48 AM

to snowpl...@googlegroups.com

I'm having a hard time getting my enrich process to run. I keep getting an error indicating that there are no files present. If someone could point me in the right direction, I would be grateful.

I'm skipping shredding, staging, and archiving.

bundle exec bin/snowplow-emr-etl-runner --debug --skip staging,archive,shred --config config/config.yml.product --enrichments config/enrichments

It appears that the Enrich-raw step completes but produces no files. There are files in the :raw:in S3 bucket in the Clojure format.

syslog Error

2015-02-16 11:41:47,158 INFO org.apache.hadoop.mapred.JobClient (main): Cleaning up the staging area hdfs://10.147.243.44:9000/mnt/var/lib/hadoop/tmp/mapred/staging/hadoop/.staging/job_201502161135_0003

2015-02-16 11:41:47,158 ERROR org.apache.hadoop.security.UserGroupInformation (main): PriviledgedActionException as:hadoop cause:org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs:/tmp/a56ad753-bf35-4133-9819-a618c1f0ce96/files

2015-02-16 11:41:47,158 INFO com.amazon.elasticmapreduce.s3distcp.S3DistCp (main): Try to recursively delete hdfs:/tmp/a56ad753-bf35-4133-9819-a618c1f0ce96/tempspace

config.txt

Alex Dean

Feb 16, 2015, 7:27:14 AM

to snowpl...@googlegroups.com

Hey Sonny,

Your config looks good - but --skip staging isn't going to work. You need to include the staging step to get your raw logs out of :raw:in into :raw:processing ready for the Elastic MapReduce process to pick up.

Similarly, you will want to archive the raw files out of :raw:processing into the :raw:archive, so don't --skip archive.

--skip shred is fine if you don't have any JSONs to shred into dedicated tables in Redshift.

Hope this helps,

Alex

--
You received this message because you are subscribed to the Google Groups "Snowplow" group.
To unsubscribe from this group and stop receiving emails from it, send an email to snowplow-use...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

--

Co-founder
Snowplow Analytics
The Roma Building, 32-38 Scrutton Street, London EC2A 4RQ, United Kingdom
+44 (0)203 589 6116
+44 7881 622 925
@alexcrdean

Sonny Rivera

Feb 16, 2015, 7:54:00 AM

to snowpl...@googlegroups.com

I have resolved this now.

Since I was skipping staging and archiving, the files never got copied to the ":raw:processing" directory.
This resulted in the enrich process executing but never having any files to operate on. The enrich process itself completed properly.
The next step, "Elasticity S3DistCp Step: Enriched HDFS -> S3", failed because no files existed.
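
For anyone who hits the same error, the fix amounts to re-running the original command with staging and archive taken out of the --skip list. The lines below are only a sketch of that: the flags and paths are copied from the command earlier in this thread, while the variable names SKIP_STEPS and RUNNER_CMD are made up for illustration. It prints the command rather than running it, so it can be checked first.

```shell
# Sketch only: same flags as the original command in this thread, but
# staging and archive are no longer skipped.
# SKIP_STEPS and RUNNER_CMD are illustrative names, not part of Snowplow.
SKIP_STEPS="shred"   # shred can stay skipped if there are no JSONs to shred

RUNNER_CMD="bundle exec bin/snowplow-emr-etl-runner --debug --skip ${SKIP_STEPS} --config config/config.yml.product --enrichments config/enrichments"

# Print the command for review before running it by hand.
echo "${RUNNER_CMD}"
```

With staging left in, the raw logs are moved from :raw:in to :raw:processing before the enrich step starts, so the later S3DistCp step should have files to copy.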

Alex Dean

Feb 16, 2015, 8:12:25 AM

to snowpl...@googlegroups.com

Great to hear it - thanks for updating the thread with your findings Sonny!

A
