--
You received this message because you are subscribed to the Google Groups "cascading-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cascading-use...@googlegroups.com.
To post to this group, send email to cascadi...@googlegroups.com.
Visit this group at http://groups.google.com/group/cascading-user.
For more options, visit https://groups.google.com/groups/opt_out.
Thanks, Alex - that's a neat idea.

Related question: you say this approach is "much faster" - is there evidence that using S3DistCp is also faster overall when the input isn't a bunch of small files?
On Sunday, October 27, 2013 3:19:42 AM UTC-7, Alex Dean wrote:

I haven't had my coffee yet so could be missing something, but couldn't you switch:

S3 -> your job -> another S3

to:

S3 -> S3DistCp -> local HDFS ~> your job -> another S3

or:

S3 -> your job ~> local HDFS -> S3DistCp -> another S3
Then you only have to use #getStepConfigDef() once, I think, in your job? Using S3DistCp and reading/writing from local HDFS is generally much faster anyway than reading/writing S3 directly.

Links:

Hope it's helpful (and sorry for the spam - I sent a prior version of this email direct to the author by mistake)
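The single `#getStepConfigDef()` call Alex mentions could look roughly like this - an untested sketch assuming Cascading's `Hfs` tap and `ConfigDef` API, with the standard Hadoop s3n credential property names (verify both against your Cascading/Hadoop versions):

```java
// Sketch: read from a "foreign" S3 bucket with its own credentials,
// write to local HDFS, then let S3DistCp move the output to the target bucket.
Hfs source = new Hfs( new TextLine(), "s3n://source-bucket/input/" );

// Scope the foreign credentials to only the steps that read this tap.
source.getStepConfigDef().setProperty( ConfigDef.Mode.REPLACE,
    "fs.s3n.awsAccessKeyId", sourceAccessKey );
source.getStepConfigDef().setProperty( ConfigDef.Mode.REPLACE,
    "fs.s3n.awsSecretAccessKey", sourceSecretKey );

// Sink to local HDFS - no S3 credentials needed on this side.
Hfs sink = new Hfs( new TextLine(), "hdfs:///tmp/job-output", SinkMode.REPLACE );
```

Because the credentials are set per step rather than cluster-wide, the second credential set (for the destination bucket) never collides with the first.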
A
On Saturday, October 26, 2013 8:00:09 AM UTC+1, redshift-etl-user wrote:

I need to read from an S3 bucket with a certain key and secret, and write to a different bucket with a different key and secret, using EMR. The issue with initializing an Hfs tap with a secret key containing a slash has been discussed before, but since I'm dealing with two sets of credentials, using Hadoop variables won't work (I can only specify one set). Is there a way to do this without making assumptions about whether the secret key contains slashes?

As a side note, it seems to me that this restriction on specifying secret keys with slashes to Hfs is unnecessary, since Hadoop's NativeS3FileSystem is able to handle them. Any thoughts on that?

Thanks!
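For reference, the slash problem comes from embedding credentials in the tap URI (`s3n://KEY:SECRET@bucket/path`), where an unescaped `/` in the secret breaks URI parsing. One commonly suggested workaround is to percent-encode the secret before building the URI - though whether a given Hadoop version's s3n URI parser then decodes it correctly varies, so treat this as a hedged sketch (the class and method names here are illustrative, not from any library):

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper: percent-encode an AWS secret key so that '/',
// '+' and '=' characters survive being embedded in an s3n:// URI.
class S3Creds
  {
  public static String encodeSecret( String secret )
    {
    try
      {
      return URLEncoder.encode( secret, "UTF-8" ); // '/' becomes "%2F"
      }
    catch( UnsupportedEncodingException exception )
      {
      throw new RuntimeException( exception ); // UTF-8 always exists
      }
    }

  public static String s3nUri( String key, String secret, String bucket, String path )
    {
    return "s3n://" + key + ":" + encodeSecret( secret ) + "@" + bucket + "/" + path;
    }
  }
```

If your Hadoop version rejects even the encoded form, the per-step config approach discussed in this thread avoids URIs with embedded credentials entirely.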
S3DistCp definitely seems to be performing better than reading directly from S3, at least for many small files.

Found a race condition that causes its reducers to fail sometimes, though - you can work around it by setting the property "s3DistCp.copyfiles.mapper.numWorkers" to 1. There's a performance penalty, in that this turns off multithreaded downloads.
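If you launch S3DistCp as a step yourself, the workaround is just a Hadoop property passed ahead of the tool's own arguments. A minimal sketch of assembling the step arguments - the property name is from the post above; the `--src`/`--dest` values and the assumption that your launcher forwards `-D` generic options are illustrative:

```java
import java.util.ArrayList;
import java.util.List;

class S3DistCpArgs
  {
  // Build s3-dist-cp arguments with the single-worker workaround applied.
  public static List<String> build( String src, String dest )
    {
    List<String> args = new ArrayList<String>();
    args.add( "-D" );
    args.add( "s3DistCp.copyfiles.mapper.numWorkers=1" ); // avoid the reducer race
    args.add( "--src" );
    args.add( src );
    args.add( "--dest" );
    args.add( dest );
    return args;
    }
  }
```

The trade-off noted above applies: one worker per mapper means downloads are no longer multithreaded.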
Do you use S3DistCp only when reading, or when writing as well?
There are AWS credentials config options for the job itself where you can set the "foreign" credentials.
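Presumably these are the standard Hadoop s3n credential properties set at the job level - the names below are the usual ones for NativeS3FileSystem, but verify them against your Hadoop version, and the values are placeholders:

```
fs.s3n.awsAccessKeyId=FOREIGN_ACCESS_KEY_ID
fs.s3n.awsSecretAccessKey=FOREIGN_SECRET_ACCESS_KEY
```

Setting them job-wide only covers one credential set, which is exactly the limitation the original question runs into; the per-tap `getStepConfigDef()` approach above is the way around that.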