InvalidInputException: Input path does not exist:


Sonny Rivera

Feb 16, 2015, 7:19:48 AM
to snowpl...@googlegroups.com

I'm having a hard time getting my enrich process to run. I keep getting an error indicating that there are no files present. If someone could point me in the right direction, I would be grateful.

I'm skipping shredding, staging, and archiving.

bundle exec bin/snowplow-emr-etl-runner --debug --skip staging,archive,shred --config config/config.yml.product --enrichments config/enrichments

It appears that the Enrich-raw task/step completes but produces no files. There are files in the :raw:in S3 bucket in the Clojure format.


syslog Error
2015-02-16 11:41:47,158 INFO org.apache.hadoop.mapred.JobClient (main): Cleaning up the staging area hdfs://10.147.243.44:9000/mnt/var/lib/hadoop/tmp/mapred/staging/hadoop/.staging/job_201502161135_0003
2015-02-16 11:41:47,158 ERROR org.apache.hadoop.security.UserGroupInformation (main): PriviledgedActionException as:hadoop cause:org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs:/tmp/a56ad753-bf35-4133-9819-a618c1f0ce96/files
2015-02-16 11:41:47,158 INFO com.amazon.elasticmapreduce.s3distcp.S3DistCp (main): Try to recursively delete hdfs:/tmp/a56ad753-bf35-4133-9819-a618c1f0ce96/tempspace
config.txt

Alex Dean

Feb 16, 2015, 7:27:14 AM
to snowpl...@googlegroups.com
Hey Sonny,

Your config looks good - but --skip staging isn't going to work. You need to include the staging step to get your raw logs out of :raw:in into :raw:processing ready for the Elastic MapReduce process to pick up.

Similarly, you will want to archive the raw files out of :raw:processing into the :raw:archive, so don't --skip archive.

--skip shred is fine if you don't have any JSONs to shred into dedicated tables in Redshift.

Hope this helps,

Alex

--
You received this message because you are subscribed to the Google Groups "Snowplow" group.
To unsubscribe from this group and stop receiving emails from it, send an email to snowplow-use...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
Co-founder
Snowplow Analytics
The Roma Building, 32-38 Scrutton Street, London EC2A 4RQ, United Kingdom
+44 (0)203 589 6116
+44 7881 622 925
@alexcrdean

Sonny Rivera

Feb 16, 2015, 7:54:00 AM
to snowpl...@googlegroups.com
I have resolved this now.  
  • Since I was skipping staging and archiving, the files never got copied to the ":raw:processing" directory.  
  • As a result, the enrich process ran but had no files to operate on. It still completed without error.
  • The next step, "Elasticity S3DistCp Step: Enriched HDFS -> S3", failed because no files existed.
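
To illustrate the sequence above, here is a toy model that uses local temp directories in place of the S3 buckets and HDFS. It is only a sketch: the function name, directory names, and messages are all made up for illustration, and it does not call Snowplow, EMR, or S3DistCp. It just shows why skipping staging lets the enrich step "succeed" with zero inputs while the following copy step fails.

```shell
# Toy model only: run_pipeline and every directory name are hypothetical.
run_pipeline() {
  local skip="$1"   # comma-separated steps to skip, e.g. "staging,archive,shred"
  local root
  root="$(mktemp -d)"
  mkdir -p "$root/raw_in" "$root/raw_processing" "$root/enriched"
  touch "$root/raw_in/log-1" "$root/raw_in/log-2"

  # staging: move raw logs from "in" to "processing" unless skipped
  case ",$skip," in
    *,staging,*) ;;
    *) mv "$root"/raw_in/* "$root/raw_processing/" ;;
  esac

  # enrich: reads "processing"; completes without error even with zero inputs
  for f in "$root"/raw_processing/*; do
    [ -e "$f" ] || continue
    cp "$f" "$root/enriched/"
  done

  # copy step: fails when the enrich step wrote nothing
  if [ -z "$(ls -A "$root/enriched")" ]; then
    echo "copy step failed: no enriched files to copy"
    rm -rf "$root"
    return 1
  fi
  echo "copy step ok"
  rm -rf "$root"
}

run_pipeline "staging,archive,shred"   # the skip list from my original command
run_pipeline "shred"                   # staging included
```

The first call reports the copy failure and the second reports success, which matches what I saw on the cluster. The archive and shred steps are not modeled here.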

Alex Dean

Feb 16, 2015, 8:12:25 AM
to snowpl...@googlegroups.com
Great to hear it - thanks for updating the thread with your findings Sonny!

A
