EMR instantiation fails with: "Terminated with errors: Master instance (i-XXXX) failed attempting to download bootstrap action 1 file from S3"; unable to run in eu-central-1


Jürgen Weber

Mar 8, 2016, 11:12:03 PM

to snowpl...@googlegroups.com

I continue my fight with Snowplow. I solved one error, described here: https://groups.google.com/forum/#!topic/snowplow-user/HaOqdjoWVYo, but I am still having issues with the custom bootstrap. I figured it was more prudent to start a new thread, since the error is actually different.

The error I am receiving on the CLI is:

D, [2016-03-09T01:47:40.171000 #22771] DEBUG -- : Waiting a minute to allow S3 to settle (eventual consistency)
D, [2016-03-09T01:48:40.177000 #22771] DEBUG -- : Initializing EMR jobflow
D, [2016-03-09T01:48:42.091000 #22771] DEBUG -- : EMR jobflow j-1X3D78SQAAMIB started, waiting for jobflow to complete...
I, [2016-03-09T01:48:42.095000 #22771]  INFO -- : SnowplowTracker::Emitter initialized with endpoint http://snowplow.XXXX.com:80/i
I, [2016-03-09T01:48:42.264000 #22771]  INFO -- : Attempting to send 1 request
I, [2016-03-09T01:48:42.266000 #22771]  INFO -- : Sending GET request to http://snowplow.carzada.com:80/i…
…
I, [2016-03-09T02:04:25.037000 #22771]  INFO -- : GET request to http://snowplow.carzada.com:80/i finished with status code 200
I, [2016-03-09T02:08:25.495000 #22771]  INFO -- : Attempting to send 1 request
I, [2016-03-09T02:08:25.496000 #22771]  INFO -- : Sending GET request to http://snowplow.carzada.com:80/i...
I, [2016-03-09T02:08:25.538000 #22771]  INFO -- : GET request to http://snowplow.carzada.com:80/i finished with status code 200
W, [2016-03-09T02:08:25.661000 #22771]  WARN -- : Job failed. 0 tries left...
F, [2016-03-09T02:08:25.661000 #22771] FATAL -- :

Snowplow::EmrEtlRunner::BootstrapFailureError (EMR jobflow j-2J8AHYK2XFP76 failed, check Amazon EMR console and Hadoop logs for details (help: https://github.com/snowplow/snowplow/wiki/Troubleshooting-jobs-on-Elastic-MapReduce). Data files not archived.
Snowplow ETL: TERMINATING [BOOTSTRAP_FAILURE] ~ elapsed time n/a [ - ]
 - 1. Elasticity S3DistCp Step: Shredded HDFS -> S3: CANCELLED ~ elapsed time n/a [ - ]
 - 2. Elasticity Scalding Step: Shred Enriched Events: CANCELLED ~ elapsed time n/a [ - ]
 - 3. Elasticity S3DistCp Step: Enriched HDFS _SUCCESS -> S3: CANCELLED ~ elapsed time n/a [ - ]
 - 4. Elasticity S3DistCp Step: Enriched HDFS -> S3: CANCELLED ~ elapsed time n/a [ - ]
 - 5. Elasticity Scalding Step: Enrich Raw Events: CANCELLED ~ elapsed time n/a [ - ]):
    /root/snowplow/snowplow-emr-etl-runner!/emr-etl-runner/lib/snowplow-emr-etl-runner/emr_job.rb:465:in `run'
    /root/snowplow/snowplow-emr-etl-runner!/gems/contracts-0.7/lib/contracts/method_reference.rb:46:in `send_to'
    /root/snowplow/snowplow-emr-etl-runner!/gems/contracts-0.7/lib/contracts.rb:305:in `call_with'
    /root/snowplow/snowplow-emr-etl-runner!/gems/contracts-0.7/lib/contracts/decorators.rb:159:in `common_method_added'
    /root/snowplow/snowplow-emr-etl-runner!/emr-etl-runner/lib/snowplow-emr-etl-runner/runner.rb:68:in `run'
    /root/snowplow/snowplow-emr-etl-runner!/gems/contracts-0.7/lib/contracts/method_reference.rb:46:in `send_to'
    /root/snowplow/snowplow-emr-etl-runner!/gems/contracts-0.7/lib/contracts.rb:305:in `call_with'
    /root/snowplow/snowplow-emr-etl-runner!/gems/contracts-0.7/lib/contracts/decorators.rb:159:in `common_method_added'
    file:/root/snowplow/snowplow-emr-etl-runner!/emr-etl-runner/bin/snowplow-emr-etl-runner:39:in `(root)'
    org/jruby/RubyKernel.java:1091:in `load'
    file:/root/snowplow/snowplow-emr-etl-runner!/META-INF/main.rb:1:in `(root)'
    org/jruby/RubyKernel.java:1072:in `require'
    file:/root/snowplow/snowplow-emr-etl-runner!/META-INF/main.rb:1:in `(root)'
    /tmp/jruby3956567750091989649extract/jruby-stdlib-1.7.20.1.jar!/META-INF/jruby.home/lib/ruby/shared/rubygems/core_ext/kernel_require.rb:1:in `(root)'

But when I look at the cluster list or the cluster details in the EMR console, I see this:

Terminated with errors: Master instance (i-5c2cb9e0) failed attempting to download bootstrap action 1 file from S3
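
The console status above can also be read programmatically, which makes it easier to log alongside the runner output. The following is a minimal sketch, assuming the response has the shape returned by the EMR DescribeCluster API (`Cluster.Status.State` and `Cluster.Status.StateChangeReason`). The helper name `termination_reason` and the sample dict are hypothetical; the sample values are copied from this post.

```python
def termination_reason(describe_cluster_response):
    """Return (state, code, message) from a DescribeCluster-style response dict."""
    status = describe_cluster_response.get("Cluster", {}).get("Status", {})
    reason = status.get("StateChangeReason", {})
    return status.get("State"), reason.get("Code"), reason.get("Message")


# Hypothetical sample shaped like a DescribeCluster response.
# The cluster ID, instance ID and message come from the post above.
sample = {
    "Cluster": {
        "Id": "j-2J8AHYK2XFP76",
        "Status": {
            "State": "TERMINATED_WITH_ERRORS",
            "StateChangeReason": {
                "Code": "BOOTSTRAP_FAILURE",
                "Message": "Master instance (i-5c2cb9e0) failed attempting to "
                           "download bootstrap action 1 file from S3",
            },
        },
    }
}

state, code, message = termination_reason(sample)
print(state, code, message)
```

With boto3, a response of this shape would come from `boto3.client("emr", region_name="eu-central-1").describe_cluster(ClusterId="j-2J8AHYK2XFP76")`, assuming credentials that are allowed to describe the cluster.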

The bootstrap action in the CLI export shows it is trying to fetch this file: 'aws emr create-cluster --applications Name=Hadoop --bootstrap-actions '[{"Path":"s3://snowplow-hosted-assets/common/emr/snowplow-ami4-bootstrap-0.1.0.sh"…..'. I assumed it was a permissions problem on the bucket, but I do not see how that is possible: I can download the file from any machine, both directly from the bucket and from the CloudFront endpoint.

# wget http://d2io1hx8u877l0.cloudfront.net/common/emr/snowplow-ami4-bootstrap-0.1.0.sh
--2016-03-09 03:44:12-- http://d2io1hx8u877l0.cloudfront.net/common/emr/snowplow-ami4-bootstrap-0.1.0.sh
Resolving d2io1hx8u877l0.cloudfront.net (d2io1hx8u877l0.cloudfront.net)... 54.239.168.76, 54.239.168.114, 54.239.168.153, ...
Connecting to d2io1hx8u877l0.cloudfront.net (d2io1hx8u877l0.cloudfront.net)|54.239.168.76|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1941 (1.9K) [application/x-sh]
Saving to: ‘snowplow-ami4-bootstrap-0.1.0.sh.1’

100%[======================================================================================================================================>] 1,941 --.-K/s in 0s

2016-03-09 03:44:12 (526 MB/s) - ‘snowplow-ami4-bootstrap-0.1.0.sh.1’ saved [1941/1941]

root@ip-10-0-1-174:~# aws --region us-east-1 s3 cp s3://snowplow-hosted-assets/common/emr/snowplow-ami4-bootstrap-0.1.0.sh .
download: s3://snowplow-hosted-assets/common/emr/snowplow-ami4-bootstrap-0.1.0.sh to ./snowplow-ami4-bootstrap-0.1.0.sh

The problem is that there is no detailed error: nothing tells me why the instance cannot download the file, and I can find no output anywhere.
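
One place that may be worth checking for bootstrap output is the cluster's S3 log location. The sketch below builds the prefix where EMR normally archives per-node bootstrap action logs, assuming the usual layout of `<log-uri>/<cluster-id>/node/<instance-id>/bootstrap-actions/<n>/`. The helper name `bootstrap_log_prefix` is hypothetical, and if the script was never downloaded the folder may simply be empty or missing.

```python
def bootstrap_log_prefix(log_uri, cluster_id, instance_id, action_number=1):
    """Build the S3 prefix where bootstrap action logs are usually archived.

    Assumes the common EMR layout; this is not verified against this cluster.
    """
    base = log_uri.rstrip("/")
    return f"{base}/{cluster_id}/node/{instance_id}/bootstrap-actions/{action_number}/"


# Values taken from the log bucket in the config and the failed cluster in this post.
prefix = bootstrap_log_prefix(
    "s3://XXXX-snowplow-analytics-production/logs",
    "j-2J8AHYK2XFP76",
    "i-5c2cb9e0",
)
print(prefix)
```

The resulting prefix can be listed with `aws s3 ls`, passing the region of the log bucket.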

Unless, of course, there is an issue with the Snowplow EMR runner plus Elasticity, and it is not setting the region, or the region is wrong, when downloading from the bucket using that URL. Our primary setup is in Frankfurt; the buckets for this process are in Ireland now (due to the v4 auth issue). I changed my emr: region to us-east-1 and placement to us-east-1b, and this got much further. The CLI gave me the same error, but the cluster got past the bootstrap phase. The new error I received with this failure is 'Terminated with errors: Shut down as step failed', which looks like a configuration error. So the question is: why can I not launch this in Frankfurt? Do you have a bucket permissions or region issue?
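
To narrow down a region question like this, it can help to confirm which region a bucket actually lives in. The following is a rough sketch, assuming S3 reports the bucket's region in the `x-amz-bucket-region` response header on a HEAD request to the bucket endpoint, which it typically does even on 301 and 403 responses. The function names are hypothetical, and `bucket_region` needs network access.

```python
import urllib.error
import urllib.request


def region_from_headers(headers):
    """Pick the bucket region out of S3 response headers (case-insensitive)."""
    for name, value in headers.items():
        if name.lower() == "x-amz-bucket-region":
            return value
    return None


def bucket_region(bucket, timeout=10):
    """HEAD the bucket endpoint and read the region header. Needs network access."""
    request = urllib.request.Request(
        f"https://{bucket}.s3.amazonaws.com/", method="HEAD"
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return region_from_headers(response.headers)
    except urllib.error.HTTPError as err:
        # Error responses (for example 301 or 403) usually still carry the header.
        return region_from_headers(err.headers)
```

Calling `bucket_region("snowplow-hosted-assets")` and comparing the result with the emr: region setting would show whether the bootstrap script is being fetched across regions.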

Lastly, our config:

aws:
  access_key_id: XXXX
  secret_access_key: XXXX
  s3:
    region: eu-west-1
    buckets:
      assets: s3://snowplow-hosted-assets
      jsonpath_assets:
      log: s3://XXXX-snowplow-analytics-production/logs
      raw:
        in:
          - s3://XXXX-snowplow-analytics-production/instance-logs/publish/e-skrw2fappv/i-XXXX
        processing: s3://XXXX-snowplow-analytics-production/raw/processing
        archive: s3://XXXX-snowplow-analytics-production/raw/archive
      enriched:
        good: s3://XXXX-snowplow-analytics-production/enriched/good
        bad: s3://XXXX-snowplow-analytics-production/enriced/bad
        errors:
        archive: s3://XXXX-snowplow-analytics-production/enriched/archive
      shredded:
        good: s3://XXXX-snowplow-analytics-production/shredded/good
        bad: s3://XXXX-snowplow-analytics-production/shredded/bad
        errors:
        archive: s3://XXXX-snowplow-analytics-production/shredded/archive
  emr:
    ami_version: 4.3.0
    region: eu-central-1
    jobflow_role: EMR_EC2_DefaultRole
    service_role: EMR_DefaultRole
    placement: eu-central-1b
    ec2_subnet_id:
    ec2_key_name: XXXX
    bootstrap: []
    software:
      hbase:
      lingual:
    jobflow:
      master_instance_type: m3.xlarge
      core_instance_count: 2
      core_instance_type: m3.xlarge
      task_instance_count: 0
      task_instance_type: m3.xlarge
      task_instance_bid: 0.015
    bootstrap_failure_tries: 3
    additional_info:
collectors:
  format: clj-tomcat
enrich:
  job_name: Snowplow ETL
  versions:
    hadoop_enrich: 1.6.0
    hadoop_shred: 0.8.0
    hadoop_elasticsearch: 0.1.0
  continue_on_unexpected_error: false
  output_compression: NONE
storage:
  download:
    folder:
  targets:
    - name: snowplow
      type: redshift
      host: XXXX
      database: dwh
      port: 5476
      ssl_mode: verify-ca
      table: atomic.events
      username: XXXX
      password: XXXX
      maxerror: 100
      comprows: 200000
monitoring:
  tags: {}
  logging:
    level: DEBUG
  snowplow:
    method: get
    app_id: snowplow
    collector: XXXX.com
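
Since the config spans more than one region, a quick consistency check over the parsed YAML may help when switching between Frankfurt and other regions. This is a minimal sketch, assuming the file has already been loaded into a dict with the `aws.s3.region`, `aws.emr.region` and `aws.emr.placement` keys used in this post. The function name `region_findings` is hypothetical, and a finding is only a hint, not necessarily an error.

```python
def region_findings(config):
    """Return human-readable notes about region settings that do not line up."""
    aws = config["aws"]
    s3_region = aws["s3"]["region"]
    emr_region = aws["emr"]["region"]
    placement = aws["emr"].get("placement") or ""

    findings = []
    if s3_region != emr_region:
        findings.append(
            f"s3.region ({s3_region}) differs from emr.region ({emr_region})"
        )
    # An availability zone name is its region name plus a letter suffix.
    if placement and not placement.startswith(emr_region):
        findings.append(
            f"emr.placement ({placement}) is not in emr.region ({emr_region})"
        )
    return findings


# Only the region-related fields from the config above are reproduced here.
config = {
    "aws": {
        "s3": {"region": "eu-west-1"},
        "emr": {"region": "eu-central-1", "placement": "eu-central-1b"},
    }
}
findings = region_findings(config)
print(findings)
```

For the values in this post, the check reports the S3/EMR region difference and accepts the placement, which matches the cross-region setup described above.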

Any ideas as to why eu-central-1 fails?

Thanks

Alex Dean

Mar 9, 2016, 8:27:08 PM

to Snowplow

Hi Juergen,

It definitely sounds like some kind of bug - please raise a ticket in GitHub!

Thanks,

Alex


Co-founder
Snowplow Analytics
The Roma Building, 32-38 Scrutton Street, London EC2A 4RQ, United Kingdom
+44 (0)203 589 6116
+44 7881 622 925
@alexcrdean
