Error setting up a static iglu repo


Sambhav Sharma

May 6, 2015, 2:40:04 PM
to snowpl...@googlegroups.com
I am trying to set up a static Iglu repository to test custom unstructured events, so that they are loaded into a separate table in PostgreSQL (for example atomic.test).

But when I run the emr-etl job I get the following error stacktrace:

Exception in thread "main" java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
	at com.twitter.scalding.Job$.apply(Job.scala:47)
	at com.twitter.scalding.Tool.getJob(Tool.scala:48)
	at com.twitter.scalding.Tool.run(Tool.scala:68)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
	at com.snowplowanalytics.snowplow.enrich.hadoop.JobRunner$.main(JobRunner.scala:33)
	at com.snowplowanalytics.snowplow.enrich.hadoop.JobRunner.main(JobRunner.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:187)
Caused by: com.snowplowanalytics.snowplow.enrich.common.FatalEtlError: NonEmptyList(error: Resolver configuration failed JSON Schema validation
    level: "error"
, error: Verifying schema as iglu:com.snowplowanalytics.iglu/resolver-config/jsonschema/1-0-0 failed: found iglu:com.bd/resolver-config/jsonschema/1-0-0
    level: "error"
)
	at com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob$$anonfun$2.apply(EtlJob.scala:139)
	at com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob$$anonfun$2.apply(EtlJob.scala:139)
	at scalaz.Validation$class.fold(Validation.scala:64)
	at scalaz.Failure.fold(Validation.scala:330)
	at com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob.<init>(EtlJob.scala:138) 
... 15 more


This is my emr-etl config file:

:logging:
  :level: DEBUG # You can optionally switch to INFO for production

:aws:
  :access_key_id: XXXX
  :secret_access_key: XXXX

:s3:
  :region: us-west-2
  :buckets:
    :assets: s3://snowplow-hosted-assets # DO NOT CHANGE unless you are hosting the jarfiles etc yourself in your own bucket
    :log: s3://snowplow-dev/logs
    :raw:
      :in: s3://snowplowpoc-logs
      :processing: s3://snowplow-dev/processing
      :archive: s3://snowplow-dev-archive/raw    # e.g. s3://my-archive-bucket/raw
    :enriched:
      :good: s3://snowplow-dev-out/enriched/good       # e.g. s3://my-out-bucket/enriched/good
      :bad: s3://snowplow-dev-out/enriched/bad        # e.g. s3://my-out-bucket/enriched/bad
      :errors:     # Leave blank unless :continue_on_unexpected_error: set to true below
    :shredded:
      :good: s3://snowplow-dev-out/shredded/good       # e.g. s3://my-out-bucket/shredded/good
      :bad: s3://snowplow-dev-out/shredded/bad        # e.g. s3://my-out-bucket/shredded/bad
      :errors:     # Leave blank unless :continue_on_unexpected_error: set to true below

:emr:
  :ami_version: 2.4.2      # Choose as per http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-plan-ami.html
  :region: us-west-2        # Always set this
  :jobflow_role: EMR_EC2_DefaultRole # Created using $ aws emr create-default-roles
  :service_role: EMR_DefaultRole     # Created using $ aws emr create-default-roles
  :placement:     # Set this if not running in VPC. Leave blank otherwise
  :ec2_subnet_id: subnet-68ba390d # Set this if running in VPC. Leave blank otherwise
  :ec2_key_name: tanay
  :bootstrap: []           # Set this to specify custom bootstrap actions. Leave empty otherwise
  :software:
    :hbase: "0.92.0"               # To launch on cluster, provide version, "0.92.0", keep quotes
    :lingual: "1.1"             # To launch on cluster, provide version, "1.1", keep quotes
  # Adjust your Hadoop cluster below
  :jobflow:
    :master_instance_type: m1.small
    :core_instance_count: 1
    :core_instance_type: m1.small
    :task_instance_count: 0 # Increase to use spot instances
    :task_instance_type: m1.small
    :task_instance_bid:  # In USD. Adjust bid, or leave blank for non-spot-priced (i.e. on-demand) task instances

:etl:
  :job_name: Snowplow Dev ETL # Give your job a name
  :versions:
    :hadoop_enrich: 0.14.1 # Version of the Hadoop Enrichment process
    :hadoop_shred: 0.4.0 # Version of the Hadoop Shredding process
  :collector_format: cloudfront # Or 'clj-tomcat' for the Clojure Collector, or 'thrift' for Thrift records, or 'tsv/com.amazon.aws.cloudfront/wd_access_log' for Cloudfront access logs
  :continue_on_unexpected_error: false # Set to 'true' (and set :out_errors: above) if you don't want any exceptions thrown from ETL

:iglu:
  :schema: iglu:com.bd/resolver-config/jsonschema/1-0-0
  :data:
    :cache_size: 500
    :repositories:
      - :name: "BD Test"
        :priority: 0
        :vendor_prefixes:
          - com.bd
        :connection:
          :http:
            :uri: http://iglu.s3.amazonaws.com



This is how I am sending data to Snowplow (I am using the CloudFront collector):

big_decisions('trackUnstructEvent', {
  "schema": "iglu:com.bd/test/jsonschema/1-0-0",
  "data": {
    "name": "Sam",
    "email": "X...@XX.com",
    "message": "test"
  }
});


Path to Schema:



This is how the schema looks:

{
  "self": {
    "vendor": "com.bd",
    "name": "test",
    "format": "jsonschema",
    "version": "1-0-0"
  },
  "type": "object",
  "properties": {
    "name": {
      "type": "string"
    },
    "email": {
      "type": "string"
    },
    "message": {
      "type": "string"
    }
  },
  "required": ["name"],
  "additionalProperties": false
}
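As an aside, self-describing Iglu schemas normally also declare the self-desc meta-schema via a top-level "$schema" key, alongside the "self" block. A sketch of that extra key, assuming the standard Iglu Central meta-schema URI:

```json
{
  "$schema": "http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#",
  "self": {
    "vendor": "com.bd",
    "name": "test",
    "format": "jsonschema",
    "version": "1-0-0"
  }
}
```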

I have also copied the resolver config to my static iglu repo.
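For reference, a static Iglu repository serves schemas over plain HTTP using a fixed path convention, so the bucket behind http://iglu.s3.amazonaws.com (from the config above) would need the schema at a path like this; the layout below is the standard Iglu convention, with illustrative comments added:

```
<bucket root>/
  schemas/
    com.bd/          # vendor
      test/          # schema name
        jsonschema/  # format
          1-0-0      # version file (no extension), containing the schema JSON
```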

Kindly guide me through the process. I have been stuck here for days now.

Thank you.


Alex Dean

May 6, 2015, 3:54:28 PM
to snowpl...@googlegroups.com
Hi Sambhav,

:repositories: in the config takes an array, so that you can add your Iglu repository alongside the existing Iglu Central entry, rather than removing Iglu Central and copying Iglu Central's schemas across.

The error message tells you what is going on:


Resolver configuration failed JSON Schema validation
    level: "error"
, error: Verifying schema as iglu:com.snowplowanalytics.iglu/resolver-config/jsonschema/1-0-0 failed: found iglu:com.bd/resolver-config/jsonschema/1-0-0

Put Iglu Central back into the mix and revert your schema changes and you should be good to go!
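For reference, here is a sketch of what the corrected :iglu: section could look like. The "BD Test" entry and its URI are taken from the config earlier in this thread, and Iglu Central is assumed to be served from http://iglucentral.com:

```yaml
:iglu:
  :schema: iglu:com.snowplowanalytics.iglu/resolver-config/jsonschema/1-0-0
  :data:
    :cache_size: 500
    :repositories:
      - :name: "Iglu Central"
        :priority: 0
        :vendor_prefixes:
          - com.snowplowanalytics
        :connection:
          :http:
            :uri: http://iglucentral.com
      - :name: "BD Test"
        :priority: 1
        :vendor_prefixes:
          - com.bd
        :connection:
          :http:
            :uri: http://iglu.s3.amazonaws.com
```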

A




--
Co-founder
Snowplow Analytics
The Roma Building, 32-38 Scrutton Street, London EC2A 4RQ, United Kingdom
+44 (0)203 589 6116
+44 7881 622 925
@alexcrdean

Sambhav Sharma

May 7, 2015, 4:11:01 PM
to snowpl...@googlegroups.com
Hi, 
Thanks, it worked fine. The cluster completed all steps and terminated successfully.