Mysterious 403 error during EMR step

Eugene Polyhaev

Jun 16, 2015, 7:12:34 AM
to snowpl...@googlegroups.com
Hello, I am upgrading from 0.9.7 to r65.
0.9.7 worked very well for more than half a year.

I have applied the upgrade recommendations for every intermediate release between those two. The EMR_EC2_DefaultRole and EMR_DefaultRole roles were generated as advised.

Now, when I run 

bundle exec bin/snowplow-emr-etl-runner --skip shred --config config/config.yml --enrichments config/enrichments --debug

I get the following message.

D, [2015-06-16T10:55:23.497222 #24040] DEBUG -- : Waiting a minute to allow S3 to settle (eventual consistency)
D, [2015-06-16T10:56:23.497708 #24040] DEBUG -- : Initializing EMR jobflow
F, [2015-06-16T10:56:23.876479 #24040] FATAL -- :

RestClient::Forbidden (403 Forbidden):
    /opt/snowplow/repo/snowplow/3-enrich/emr-etl-runner/vendor/bundle/ruby/1.9.1/gems/rest-client-1.7.3/lib/restclient/abstract_response.rb:48:in `return!'
    /opt/snowplow/repo/snowplow/3-enrich/emr-etl-runner/vendor/bundle/ruby/1.9.1/gems/rest-client-1.7.3/lib/restclient/request.rb:495:in `process_result'
    /opt/snowplow/repo/snowplow/3-enrich/emr-etl-runner/vendor/bundle/ruby/1.9.1/gems/rest-client-1.7.3/lib/restclient/request.rb:421:in `block in transmit'
    /usr/local/rvm/rubies/ruby-1.9.3-p547/lib/ruby/1.9.1/net/http.rb:746:in `start'
    /opt/snowplow/repo/snowplow/3-enrich/emr-etl-runner/vendor/bundle/ruby/1.9.1/gems/rest-client-1.7.3/lib/restclient/request.rb:413:in `transmit'
    /opt/snowplow/repo/snowplow/3-enrich/emr-etl-runner/vendor/bundle/ruby/1.9.1/gems/rest-client-1.7.3/lib/restclient/request.rb:176:in `execute'
    /opt/snowplow/repo/snowplow/3-enrich/emr-etl-runner/vendor/bundle/ruby/1.9.1/gems/rest-client-1.7.3/lib/restclient/request.rb:41:in `execute'
    /opt/snowplow/repo/snowplow/3-enrich/emr-etl-runner/vendor/bundle/ruby/1.9.1/gems/rest-client-1.7.3/lib/restclient.rb:69:in `post'
    /opt/snowplow/repo/snowplow/3-enrich/emr-etl-runner/vendor/bundle/ruby/1.9.1/gems/elasticity-4.0.5/lib/elasticity/aws_request.rb:34:in `submit'
    /opt/snowplow/repo/snowplow/3-enrich/emr-etl-runner/vendor/bundle/ruby/1.9.1/gems/elasticity-4.0.5/lib/elasticity/emr.rb:191:in `run_job_flow'
    /opt/snowplow/repo/snowplow/3-enrich/emr-etl-runner/vendor/bundle/ruby/1.9.1/gems/elasticity-4.0.5/lib/elasticity/job_flow.rb:147:in `run'
    /opt/snowplow/repo/snowplow/3-enrich/emr-etl-runner/lib/snowplow-emr-etl-runner/emr_job.rb:294:in `run'
    /opt/snowplow/repo/snowplow/3-enrich/emr-etl-runner/vendor/bundle/ruby/1.9.1/gems/contracts-0.7/lib/contracts/method_reference.rb:46:in `send_to'
    /opt/snowplow/repo/snowplow/3-enrich/emr-etl-runner/vendor/bundle/ruby/1.9.1/gems/contracts-0.7/lib/contracts.rb:305:in `call_with'
    /opt/snowplow/repo/snowplow/3-enrich/emr-etl-runner/vendor/bundle/ruby/1.9.1/gems/contracts-0.7/lib/contracts/decorators.rb:159:in `block in common_method_added'
    /opt/snowplow/repo/snowplow/3-enrich/emr-etl-runner/lib/snowplow-emr-etl-runner/runner.rb:60:in `run'
    /opt/snowplow/repo/snowplow/3-enrich/emr-etl-runner/vendor/bundle/ruby/1.9.1/gems/contracts-0.7/lib/contracts/method_reference.rb:46:in `send_to'
    /opt/snowplow/repo/snowplow/3-enrich/emr-etl-runner/vendor/bundle/ruby/1.9.1/gems/contracts-0.7/lib/contracts.rb:305:in `call_with'
    /opt/snowplow/repo/snowplow/3-enrich/emr-etl-runner/vendor/bundle/ruby/1.9.1/gems/contracts-0.7/lib/contracts/decorators.rb:159:in `block in common_method_added'
    bin/snowplow-emr-etl-runner:39:in `<main>'

It feels like Amazon does not allow my EMR user to perform some operation, but I can't identify what gets performed and how to grant corresponding permissions.
Could you please help?

Here's my config:

:logging:
  :level: DEBUG # You can optionally switch to INFO for production
:aws:
  :access_key_id: XXXXX
  :secret_access_key: XXXXXX
:s3:
  :region: us-east-1
  :buckets:
    :assets: s3://snowplow-hosted-assets # DO NOT CHANGE unless you are hosting the jarfiles etc yourself in your own bucket
    :log: s3://mybucket1/log
    :raw:
      :in: s3://elasticbeanstalk-us-east-1-mybucket2/resources/environments/logs/publish/e-myenv1
      :processing: s3://mybucket1/processing
      :archive: s3://mybucket1/archive    # e.g. s3://my-archive-bucket/raw
    :enriched:
      :good: s3://mybucket1/out       # e.g. s3://my-out-bucket/enriched/good
      :bad: s3://mybucket1/bad        # e.g. s3://my-out-bucket/enriched/bad
      :errors: s3://mybucket1/errors     # Leave blank unless :continue_on_unexpected_error: set to true below
    :shredded:
      :good: s3://mybucket1/out       # e.g. s3://my-out-bucket/shredded/good
      :bad: s3://mybucket1/bad        # e.g. s3://my-out-bucket/shredded/bad
      :errors: s3://mybucket1/errors     # Leave blank unless :continue_on_unexpected_error: set to true below
:emr:
  :region: us-east-1        # Always set this
  :jobflow_role: EMR_EC2_DefaultRole # Added in r64-palila, http://snowplowanalytics.com/blog/2015/04/16/snowplow-r64-palila-released/
  :service_role: EMR_DefaultRole # Added in r64-palila, http://snowplowanalytics.com/blog/2015/04/16/snowplow-r64-palila-released/
  :placement: us-east-1b     # Set this if not running in VPC. Leave blank otherwise
#  :ec2_subnet_id: ADD HERE # Set this if running in VPC. Leave blank otherwise
  :ec2_key_name: emr-keypair
  :bootstrap: []
  :software:
    :hbase: "0.92.0"                # To launch on cluster, provide version, "0.92.0", keep quotes
    :lingual: "1.1"             # To launch on cluster, provide version, "1.1", keep quotes
  # Adjust your Hadoop cluster below
  :jobflow:
    :master_instance_type: m1.medium
    :core_instance_count: 2
    :core_instance_type: m1.medium
    :task_instance_count: 0 # Increase to use spot instances
    :task_instance_type: m1.medium
    :task_instance_bid: 0.015 # In USD. Adjust bid, or leave blank for non-spot-priced (i.e. on-demand) task instances
:etl:
  :job_name: Snowplow ETL # Give your job a name
  :versions:
    :hadoop_enrich: 0.14.1 # Version of the Hadoop Enrichment process
    :hadoop_shred: 0.4.0 # Version of the Hadoop Shredding process
  :collector_format: clj-tomcat # Or 'clj-tomcat' for the Clojure Collector
  :continue_on_unexpected_error: false # Set to 'true' (and set :out_errors: above) if you don't want any exceptions thrown from ETL
:iglu:
  :schema: iglu:com.snowplowanalytics.iglu/resolver-config/jsonschema/1-0-0
  :data:
    :cache_size: 500
    :repositories:
      - :name: "Iglu Central"
        :priority: 0
        :vendor_prefixes:
          - com.snowplowanalytics
        :connection:
          :http:
            :uri: http://iglucentral.com

Alex Dean

Jun 16, 2015, 7:15:14 AM
to snowpl...@googlegroups.com
Hey Eugene!

The tail-end of this thread should help you out:

https://groups.google.com/forum/#!topic/snowplow-user/iMrsNpxCr3w

It's a new IAM permission you need.

Cheers,

Alex

--
Co-founder
Snowplow Analytics
The Roma Building, 32-38 Scrutton Street, London EC2A 4RQ, United Kingdom
+44 (0)203 589 6116
+44 7881 622 925
@alexcrdean

Eugene Polyhaev

Jun 16, 2015, 8:15:47 AM
to snowpl...@googlegroups.com
Thank you for the reference; I should have searched properly.

The wiki page describing the solution was hard to find, so here is the link for future reference: https://github.com/snowplow/snowplow/wiki/Setup-IAM-permissions-for-operating-Snowplow
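
For anyone landing here later: this thread does not name the missing permission, so the following is only a sketch under an assumption. When a job flow is launched with :jobflow_role and :service_role set, the IAM user that calls EMR generally needs iam:PassRole on those roles, so that is the permission illustrated here. The account ID is a placeholder and the helper name pass_role_policy is hypothetical; the wiki page linked above remains the authoritative list of permissions.

```python
import json


def pass_role_policy(account_id, role_names):
    """Build an IAM policy document allowing iam:PassRole on the named roles.

    Assumption: iam:PassRole is the missing permission. The thread itself
    does not say which permission was needed.
    """
    resources = [
        "arn:aws:iam::{}:role/{}".format(account_id, name) for name in role_names
    ]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["iam:PassRole"],
                "Resource": resources,
            }
        ],
    }


# 123456789012 is a placeholder account ID; the role names come from config.yml.
policy = pass_role_policy("123456789012", ["EMR_EC2_DefaultRole", "EMR_DefaultRole"])
print(json.dumps(policy, indent=2))
```

The printed JSON could be attached as an inline policy to the IAM user whose keys appear under :aws: in config.yml. Scoping Resource to the two role ARNs, rather than "*", keeps the grant as narrow as the config requires.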