No Such Bucket


Timothy Dalbey

Oct 19, 2011, 3:36:08 PM
to Segue for R
Hey All,

So, thanks for writing this package in R. I have been fiddling with
getting EMR to work for a week or so, and so far I have found it very
difficult (and slow) to debug. I was super happy to give this package a
try.

That said I am encountering an error when running the test code as
provided here:

https://stat.ethz.ch/pipermail/r-sig-hpc/2010-December/000878.html

Here's the result:

library("segue")
Loading required package: rJava
Loading required package: caTools
Loading required package: bitops
Segue did not find your AWS credentials. Please run the
setCredentials() function.
>
> setCredentials("AKIAJ4AU4YVKFBC4WHPA", "giyQ2cKG3ywr8kn2wsiSMr6s+89APwR5iMzt/rJD", setEnvironmentVariables=TRUE)
>
> myCluster <- createCluster(numInstances=2)
STARTING - 2011-10-19 15:13:07
STARTING - 2011-10-19 15:13:38
STARTING - 2011-10-19 15:14:08
STARTING - 2011-10-19 15:14:39
STARTING - 2011-10-19 15:15:09
STARTING - 2011-10-19 15:15:39
STARTING - 2011-10-19 15:16:09
BOOTSTRAPPING - 2011-10-19 15:16:40
BOOTSTRAPPING - 2011-10-19 15:17:10
BOOTSTRAPPING - 2011-10-19 15:17:40
BOOTSTRAPPING - 2011-10-19 15:18:11
BOOTSTRAPPING - 2011-10-19 15:18:41
BOOTSTRAPPING - 2011-10-19 15:19:11
BOOTSTRAPPING - 2011-10-19 15:19:42
BOOTSTRAPPING - 2011-10-19 15:20:12
BOOTSTRAPPING - 2011-10-19 15:20:42
BOOTSTRAPPING - 2011-10-19 15:21:12
WAITING - 2011-10-19 15:21:43
Your Amazon EMR Hadoop Cluster is ready for action.
Remember to terminate your cluster with stopCluster().
Amazon is billing you!
>
> myList <- NULL
> set.seed(1)
> for (i in 1:10){
+ a <- c(rnorm(999), NA)
+ myList[[i]] <- a
+ }
>
> outputEmr <- emrlapply(myCluster, myList, mean, na.rm=T)
Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod",
cl, :
Status Code: 404, AWS Request ID: 81E596F3B6A9D050, AWS Error Code:
NoSuchBucket, AWS Error Message: The specified bucket does not exist,
S3 Extended Request ID:
9eRkwvi8ZVbzMB5qtpAIfRwrJEIEDe952G01KyAadosV2FiQDfQueKvFhWde7hv9

Not sure if I am supposed to specify a bucket name or what. Help?

Thanks so much!
TMD

James Long

Oct 19, 2011, 3:39:13 PM
to seg...@googlegroups.com, Segue for R
You're doing things right. You shouldn't have to manually create any buckets, Segue does that behind the scenes. Can you log onto the S3 dashboard and see if you have any buckets with 'segue' in the name?

-J


Sent from my iPhone.
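
The bucket check can also be scripted from R rather than the dashboard. This sketch assumes the aws.s3 package (a separate CRAN package, not part of Segue) and credentials already exported as environment variables:

```r
# Sketch: list all buckets on the account and filter for Segue's
# temporary ones. aws.s3 is an assumption here -- it is not
# something Segue itself provides or requires.
library(aws.s3)
b <- bucketlist()                   # reads AWS creds from env vars
b$Bucket[grepl("segue", b$Bucket)]  # any bucket with 'segue' in the name
```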

Timothy Dalbey

Oct 19, 2011, 3:43:05 PM
to Segue for R
Thanks for the super quick response!

Negatory - no new buckets. I noticed that your code does attempt to
create temporary buckets as well. Maybe AWS barfed?

hmmm...

Timothy Dalbey

Oct 19, 2011, 3:44:51 PM
to Segue for R
Oh - hehe! I just posted my Amazon creds! el-oh-el-oh-el!

Right, changing those now.


Timothy Dalbey

Oct 19, 2011, 3:49:14 PM
to Segue for R
Update...

Looks like the buckets are created, but perhaps they are deleted when
the job cleans itself up?

Best,
TMD

James Long

Oct 19, 2011, 3:50:17 PM
to seg...@googlegroups.com
Whoops! Glad you caught that. I kept worrying that I was accidentally
putting my creds into the source code when I was writing Segue.

It looks like, for some reason, the Java API tools are not able to
properly create the bucket. The mystery is why this would happen.
After you get your creds reset, try your job again. When you get the
error, go look in S3 before you shut down the Segue cluster. Shutting
down the cluster is supposed to delete the temp buckets so it's
important to look before stopping...

-J
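
The order of operations James describes could be sketched like this (hypothetical session; `myCluster` and `myList` as in the test code earlier in the thread):

```r
# Run the job, but don't stop the cluster on failure:
outputEmr <- try(emrlapply(myCluster, myList, mean, na.rm = TRUE))
if (inherits(outputEmr, "try-error")) {
  # Inspect S3 for 'segue' buckets here, BEFORE shutting down --
  # stopCluster() is supposed to delete the temp buckets.
}
stopCluster(myCluster)
```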

Timothy Dalbey

Oct 19, 2011, 4:00:26 PM
to Segue for R
OK - thanks for taking the time to help resolve this issue.

So, after updating my account credentials the process completed
successfully. Automagical.

I suppose I'll raise the alarm if I run into this one again - until
then I blame it on internet gremlins.

Thanks again,
TMD


James Long

Oct 19, 2011, 4:03:54 PM
to seg...@googlegroups.com
This type of issue is part of why Segue is not on CRAN. It seems like
there are too many things that can go wrong in the process of tying
these tools (R, S3, EMR, desktop, cloud) together. One day I'll buy a
real cloud programmer beers and he'll explain to me all the magic
incantations that the pros use to add resilience and tolerance into
this type of code. Until then I'll just hope magic solves all my
errors :)


-J

Lucas Roberts

Oct 29, 2013, 6:40:34 PM
to seg...@googlegroups.com
I've been getting the same error using Segue on my Mac, but when I check the AWS console the buckets exist, with the files in them; the files still need to be aggregated together to give a result in my local R session. Is there some obvious error? Would updating my credentials help, as a previous poster found? Also, is there a way to apply the reducer to the data in the S3 buckets without needing to rerun the entire process, e.g. spawn cluster, pass off job, and then wait?

Thank you in advance, 
-Lucas 

James Long

Oct 30, 2013, 9:14:58 AM
to seg...@googlegroups.com
Lucas, Sorry that you're having troubles. Let me break down the
questions and see if I can understand your issues:

No Such Bucket error: It seems like the computations run and you have
output in the S3 bucket (do the files have serialized R output in
them?). However when Segue asks S3 for the contents of the bucket S3
is saying there is no such bucket. Does this error reproduce with a
simple example? This is not an obvious error to me and it would take
some digging on my part to try and reproduce and then see what's
happening. If you can help build a test case, that would help a lot.

Try updating your creds. Can't hurt. Might help. Very easy :) But I
have no theory as to why that would or would not work.

Is there a way to apply the reducer to the data in the S3 buckets?
Nope. Not as Segue currently exists. If you want *real* map reduce
with Hadoop you'll need something other than Segue. Probably the
Hadoop packages from Revolution would make sense. Segue is really just
a parallel processing hack that happens to use EMR on the back end.

Keep me posted if you create a test case that reproduces the error. Or
if it magically goes away.

-J
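
Since Segue is a parallel lapply rather than real map reduce, a job can be sanity-checked locally by swapping `emrlapply(cluster, ...)` for plain `lapply(...)`. This runs entirely on the local machine, with no cluster or AWS account needed (same inputs as the test code earlier in the thread):

```r
# Local equivalent of the EMR test job -- base R only.
myList <- list()
set.seed(1)
for (i in 1:10) {
  myList[[i]] <- c(rnorm(999), NA)
}
outputLocal <- lapply(myList, mean, na.rm = TRUE)
```

If the local run fails too, the problem is in the function or data, not in S3 or EMR.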

miller...@gmail.com

Oct 30, 2013, 2:38:22 PM
to seg...@googlegroups.com
I am also having the same problem...


emr.handle <- createCluster(numInstances=2)

STARTING - 2013-10-29 22:53:51
STARTING - 2013-10-29 22:54:22
STARTING - 2013-10-29 22:54:53
STARTING - 2013-10-29 22:55:25
STARTING - 2013-10-29 22:55:56
STARTING - 2013-10-29 22:56:27
STARTING - 2013-10-29 22:56:58
STARTING - 2013-10-29 22:57:29
BOOTSTRAPPING - 2013-10-29 22:58:00
BOOTSTRAPPING - 2013-10-29 22:58:32
WAITING - 2013-10-29 22:59:03
Your Amazon EMR Hadoop Cluster is ready for action.
Remember to terminate your cluster with stopCluster().
Amazon is billing you!

emr.result <- emrlapply(emr.handle, data, jdm, taskTimeout=10)

Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl, :
Status Code: 404, AWS Service: Amazon S3, AWS Request ID: xxxxxx, AWS Error Code: NoSuchBucket, AWS Error Message: The specified bucket does not exist, S3 Extended Request ID: xxxxxx

James Long

Oct 30, 2013, 3:06:44 PM
to seg...@googlegroups.com
Thanks for posting error detail. I'll try to reproduce.

-J

miller...@gmail.com

Oct 30, 2013, 3:30:42 PM
to seg...@googlegroups.com
Thank you! Please let me know what I can do to help. I'm also going to be spending 100% of my time on this issue today.

miller...@gmail.com

Oct 30, 2013, 4:44:15 PM
to seg...@googlegroups.com
I have some updates that may be informative. 

I spun up a cluster and tried to run Jeff Breen's example. To my great surprise, it actually worked on my first attempt. 

> outputEmr   <- emrlapply(myCluster, myList, mean,  na.rm=T)
RUNNING - 2013-10-30 16:22:03
RUNNING - 2013-10-30 16:22:34
RUNNING - 2013-10-30 16:23:06
WAITING - 2013-10-30 16:23:37

I then tried to use my own function on the same cluster and it failed with the same error message as before. 

At that point I deleted the result of the example and tried it again. That produced the same 404 error message again. 

> outputEmr   <- emrlapply(myCluster, myList, mean,  na.rm=T)
Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl,  : 
  Status Code: 404, AWS Service: Amazon S3, AWS Request ID: 8379F458DD96EC9B, AWS Error Code: NoSuchBucket, AWS Error Message: The specified bucket does not exist, S3 Extended Request ID: 1hjGApzfy5rd5JaM+mhhg35C/DUJ0qSa5V2uGXLjCV3tjTLfSUrM7zqsUCFKHCFH

So I shut down the cluster and spun up another, again running only the example code. That gave me the 404 error again. I tried another two times and got the same error.

miller...@gmail.com

Oct 30, 2013, 7:46:29 PM
to seg...@googlegroups.com
I tried changing Availability Zones. It was taking a long time to get available nodes anyway. That didn't work, but I did stumble onto an AWS page that may have the info you/we need to make Segue automagically select an AZ with available nodes -- http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html

For now I just tried hard-coding us-west-1a and got a slight variation on our earlier error message...

> emr.test <- createCluster(numInstances=10)
Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl,  : 
  Status Code: 400, AWS Service: AmazonElasticMapReduce, AWS Request ID: 1aa6f347-41bd-11e3-bd0a-1bbb88424050, AWS Error Code: ValidationError, AWS Error Message: Specified Availability Zone is not supported
