turning off ip anonymization


Daniel Weitzenfeld

Mar 30, 2016, 10:14:43 PM
to Snowplow
Hi,
Is the correct way to turn off IP anonymization to set enabled: false in anon_ip.json, or to delete the JSON file entirely?

I tried the former and have been getting strange errors in the EMR enrich job: it terminates after about 15 minutes, and none of the steps generate any logs at all.

The job itself says:

Snowplow::EmrEtlRunner::EmrExecutionError (EMR jobflow j-T6FX866FLCJR failed, check Amazon EMR console and Hadoop logs for details (help: https://github.com/snowplow/snowplow/wiki/Troubleshooting-jobs-on-Elastic-MapReduce). Data files not archived.
Snowplow ETL: TERMINATED_WITH_ERRORS [] ~ elapsed time n/a [ - 2016-03-31 01:56:37 +0000]
 - 1. Elasticity S3DistCp Step: Shredded HDFS -> S3: CANCELLED ~ elapsed time n/a [ - ]
 - 2. Elasticity Scalding Step: Shred Enriched Events: CANCELLED ~ elapsed time n/a [ - ]
 - 3. Elasticity S3DistCp Step: Enriched HDFS -> S3: CANCELLED ~ elapsed time n/a [ - ]
 - 4. Elasticity Scalding Step: Enrich Raw Events: CANCELLED ~ elapsed time n/a [ - ]
 - 5. Elasticity S3DistCp Step: Raw S3 -> HDFS: CANCELLED ~ elapsed time n/a [ - ]):
    /home/ubuntu/snowplow/3-enrich/emr-etl-runner/lib/snowplow-emr-etl-runner/emr_job.rb:315:in `run'
    /home/ubuntu/snowplow/3-enrich/emr-etl-runner/vendor/bundle/ruby/1.9.1/gems/contracts-0.7/lib/contracts/method_reference.rb:46:in `send_to'
    /home/ubuntu/snowplow/3-enrich/emr-etl-runner/vendor/bundle/ruby/1.9.1/gems/contracts-0.7/lib/contracts.rb:305:in `call_with'
    /home/ubuntu/snowplow/3-enrich/emr-etl-runner/vendor/bundle/ruby/1.9.1/gems/contracts-0.7/lib/contracts/decorators.rb:159:in `block in common_method_added'
    /home/ubuntu/snowplow/3-enrich/emr-etl-runner/lib/snowplow-emr-etl-runner/runner.rb:60:in `run'
    /home/ubuntu/snowplow/3-enrich/emr-etl-runner/vendor/bundle/ruby/1.9.1/gems/contracts-0.7/lib/contracts/method_reference.rb:46:in `send_to'
    /home/ubuntu/snowplow/3-enrich/emr-etl-runner/vendor/bundle/ruby/1.9.1/gems/contracts-0.7/lib/contracts.rb:305:in `call_with'
    /home/ubuntu/snowplow/3-enrich/emr-etl-runner/vendor/bundle/ruby/1.9.1/gems/contracts-0.7/lib/contracts/decorators.rb:159:in `block in common_method_added'
    /home/ubuntu/snowplow/3-enrich/emr-etl-runner/bin/snowplow-emr-etl-runner:39:in `<main>'



-Dan 

Ihor Tomilenko

Mar 30, 2016, 10:39:17 PM
to Snowplow
Hi Daniel,

Either way should work: setting enabled to false or deleting the JSON file should both turn the enrichment off.
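
For the first option, here is a minimal Python sketch of flipping the flag. It assumes the enrichment file uses the usual self-describing JSON layout with an `enabled` field nested under `data`; the helper name `disable_enrichment` is hypothetical and not part of Snowplow. Parsing the file with `json.loads` also rules out a malformed JSON file after a hand edit.

```python
import json
from pathlib import Path


def disable_enrichment(path):
    """Set data.enabled to false in an enrichment JSON file; return the parsed config."""
    config_path = Path(path)
    # json.loads raises ValueError if the file is not valid JSON,
    # which is worth ruling out after editing the file by hand.
    config = json.loads(config_path.read_text())
    # Assumption: the flag lives at data.enabled, as in the standard enrichment layout.
    config["data"]["enabled"] = False
    config_path.write_text(json.dumps(config, indent=4) + "\n")
    return config
```

You would call it as `disable_enrichment("enrichments/anon_ip.json")`, where the path is only an example; point it at wherever your enrichments directory lives.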

Have you checked the Hadoop logs for more details on the reason for the EMR failure, as suggested in the error message? See https://github.com/snowplow/snowplow/wiki/Troubleshooting-jobs-on-Elastic-MapReduce

The article has outdated screenshots but should give you an idea of what to look for. If you go to the EMR service in the AWS console, you should see the cluster list, with each cluster shown under the name you gave it in the configuration file and accompanied by its job ID. Click on the one that failed and expand the Steps section. It should list all the attempted steps and their statuses. From there you should be able to access the corresponding logs too.
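
If you prefer a script over the console, the same information can be pulled with boto3. This is only a sketch: `summarize_steps` and `fetch_step_summary` are hypothetical helper names, the region is an assumption you would need to fill in, and it relies on the `describe_cluster` and `list_steps` calls of the boto3 EMR client. When every step is CANCELLED and there are no step logs, the cluster-level state change reason may be the more useful thing to read.

```python
def summarize_steps(list_steps_response):
    """Return (name, state, failure_message) for each step in an EMR ListSteps response."""
    summary = []
    for step in list_steps_response.get("Steps", []):
        status = step.get("Status", {})
        # FailureDetails is only present when EMR recorded a failure for the step.
        failure = status.get("FailureDetails", {}).get("Message", "")
        summary.append((step.get("Name", ""), status.get("State", ""), failure))
    return summary


def fetch_step_summary(cluster_id, region_name):
    """Fetch the cluster's state change reason and a per-step summary from EMR."""
    import boto3  # imported here so summarize_steps can be used without boto3 installed

    emr = boto3.client("emr", region_name=region_name)
    cluster = emr.describe_cluster(ClusterId=cluster_id)["Cluster"]
    reason = cluster["Status"].get("StateChangeReason", {}).get("Message", "")
    steps = summarize_steps(emr.list_steps(ClusterId=cluster_id))
    return reason, steps
```

For the failed run above you would call `fetch_step_summary("j-T6FX866FLCJR", region_name=...)` with whatever region the cluster was launched in.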

Regards,
Ihor