Error during enrichment: Could not find schema with key iglu:com.snowplowanalytics.snowplow/enrichments/jsonschema/1-0-0 in any repository


saianirudh kantabathina

Mar 22, 2016, 1:48:48 PM
to Snowplow
Hi Snowplow team,
I am currently blocked at the enrichment step; please find the attached resolver.json. Let me know if I need to make any additional changes, and whether any changes are required to handle a corporate proxy. I am unable to narrow down the exact lines of Scala code where the download request is made to iglucentral.com; wget from the EC2 instance works fine, though.

Exception in thread "main" java.lang.RuntimeException: NonEmptyList(error: NonEmptyList(error: Could not find schema with key iglu:com.snowplowanalytics.snowplow/enrichments/jsonschema/1-0-0 in any repository, tried:
    level: "error"
    repositories: ["Iglu Client Embedded [embedded]","Iglu Central [HTTP]"]
)
    level: "error"
)
at com.snowplowanalytics.snowplow.enrich.kinesis.KinesisEnrichApp$$anonfun$10.apply(KinesisEnrichApp.scala:151)
at com.snowplowanalytics.snowplow.enrich.kinesis.KinesisEnrichApp$$anonfun$10.apply(KinesisEnrichApp.scala:151)
at scalaz.Validation$class.fold(Validation.scala:64)
at scalaz.Failure.fold(Validation.scala:330)
at com.snowplowanalytics.snowplow.enrich.kinesis.KinesisEnrichApp$delayedInit$body.apply(KinesisEnrichApp.scala:150)
at scala.Function0$class.apply$mcV$sp(Function0.scala:40)
at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
at scala.App$$anonfun$main$1.apply(App.scala:71)
at scala.App$$anonfun$main$1.apply(App.scala:71)
at scala.collection.immutable.List.foreach(List.scala:318)
at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:32)
at scala.App$class.main(App.scala:71)
at com.snowplowanalytics.snowplow.enrich.kinesis.KinesisEnrichApp$.main(KinesisEnrichApp.scala:77)
at com.snowplowanalytics.snowplow.enrich.kinesis.KinesisEnrichApp.main(KinesisEnrichApp.scala)
resolver.json

Ihor Tomilenko

Mar 22, 2016, 2:32:57 PM
to Snowplow
Hi Saianirudh,

Could you share your config.yml (with sensitive info removed), please?

Regards,
Ihor

saianirudh kantabathina

Mar 22, 2016, 3:03:23 PM
to Snowplow

Please see the text of my conf file below. Note that it has a .conf extension rather than .yml; I am not sure if that matters.

# Default Configuration for Scala Kinesis Enrich.

enrich {
  # Sources currently supported are:
  # 'kinesis' for reading Thrift-serialized records from a Kinesis stream
  # 'stdin' for reading Base64-encoded Thrift-serialized records from stdin
  source = "kinesis"

  # Sinks currently supported are:
  # 'kinesis' for writing enriched events to one Kinesis stream and invalid events to another.
  # 'stdouterr' for writing enriched events to stdout and invalid events to stderr.
  #    Using "sbt assembly" and "java -jar" is recommended to disable sbt
  #    logging.
  sink = "kinesis"

  # AWS credentials
  #
  # If both are set to 'cpf', a properties file on the classpath is used.
  #
  # If both are set to 'iam', use AWS IAM Roles to provision credentials.
  #
  # If both are set to 'env', use environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
  aws {
    access-key: "iam"
    secret-key: "iam"
  }

  streams {
    in: {
      raw: "{{mystream-good}}"

      # After enrichment, events are accumulated in a buffer before being sent to Kinesis.
      # The buffer is emptied whenever:
      # - the number of stored records reaches record-limit or
      # - the combined size of the stored records reaches byte-limit or
      # - the time in milliseconds since it was last emptied exceeds time-limit when
      #   a new event enters the buffer
      buffer: {
        byte-limit: 500
        record-limit: 1
        time-limit: 500
      }
    }

    out: {
      enriched: "{{mystream-enriched}}"
      bad: "{{mystreams-bad}}"

      # Minimum and maximum backoff periods
      # - Units: Milliseconds
      backoffPolicy: {
        minBackoff: 10
        maxBackoff: 100
      }
    }

    # "app-name" is used for a DynamoDB table to maintain stream state.
    # You can set it automatically using: "SnowplowKinesisEnrich-$\\{enrich.streams.in.raw\\}"
    app-name: "{{enrichStreamsAppName}}"

    # LATEST: most recent data.
    # TRIM_HORIZON: oldest available data.
    # Note: This only affects the first run of this application
    # on a stream.
    initial-position = "TRIM_HORIZON"

    region: "{{my-region}}"
my.conf

Alex Dean

Mar 22, 2016, 8:33:59 PM
to Snowplow
Hi Saianirudh,

The file extension on the config.yml doesn't matter, and your iglu_resolver looks good. If I take that path:

http://iglucentral.com

And add on /schemas and then the schema:

http://iglucentral.com/schemas/com.snowplowanalytics.snowplow/enrichments/jsonschema/1-0-0

Then I can confirm that I can access that file.
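For reference, the resolver the client loads follows the standard resolver-config schema; a minimal example pointing only at Iglu Central looks roughly like this (a sketch; the cacheSize and priority values are illustrative):

```json
{
  "schema": "iglu:com.snowplowanalytics.iglu/resolver-config/jsonschema/1-0-0",
  "data": {
    "cacheSize": 500,
    "repositories": [
      {
        "name": "Iglu Central",
        "priority": 0,
        "vendorPrefixes": [ "com.snowplowanalytics" ],
        "connection": {
          "http": { "uri": "http://iglucentral.com" }
        }
      }
    ]
  }
}
```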

So it's a bit of a puzzle - what is this corporate proxy you mention?

A

--
You received this message because you are subscribed to the Google Groups "Snowplow" group.
To unsubscribe from this group and stop receiving emails from it, send an email to snowplow-use...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
Co-founder
Snowplow Analytics
The Roma Building, 32-38 Scrutton Street, London EC2A 4RQ, United Kingdom
+44 (0)203 589 6116
+44 7881 622 925
@alexcrdean

saianirudh kantabathina

Mar 23, 2016, 12:01:30 AM
to Snowplow

Hi Alex, 
I ran curl and wget against the schema link from the EC2 instance and was able to download it, but when I run the jar I still see the same error. It could be because we have a corporate proxy which we need to accommodate when making HTTP requests.
I successfully made changes to the Snowplow code to deal with the corporate proxy when communicating with Kinesis, but during the enrichment step I am unable to identify the Scala file where the schema download request is made.
Hope this gives an idea. Let me know if you need more details.

Thanks,
Sai

Anton Parkhomenko

Mar 24, 2016, 2:46:29 AM
to Snowplow
Hello Saianirudh,

I'm afraid I don't know exactly what happens during your enrichment process, but I think I can at least help with your question about how the schema download happens.

I assume you're using Scala Kinesis Enrich version 0.6.0 (pre-r78), which includes Scala Iglu Client version 0.1.1 and Scala Common Enrich 0.15 (future versions work in a similar way, however).

Scala Iglu Client is basically an HTTP client specialized in resolving schemas from Iglu repositories. It provides a Resolver class with a lookupSchema method, which is responsible for requesting the Iglu server (or getting the schema from cache). Many methods inside Kinesis Enrich and Common Enrich pass the Resolver object around implicitly, but all actual downloading and caching happens inside Resolver. For example, when you see an invocation of verifySchemaAndValidate, you need to know it implicitly accepts a Resolver object which will provide (download or fetch from cache) the required schema.

What basically happens before your exception in KinesisEnrichApp is this: it tries to parse and create an EnrichmentRegistry, and in order to do that it calls lookupSchema on the implicit Resolver object, which returns a failure (a scalaz Validation) listing the tried repositories; this failure is thrown as a RuntimeException, which you can see in the stack trace.
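Since the Resolver's lookups ultimately go through the JVM's standard java.net stack, one way to experiment with a corporate proxy (a sketch, not an official mechanism of the project; the host and port are placeholders) is to set the java.net proxy system properties before the client starts:

```scala
// Sketch: route JVM-level HTTP traffic (which should include the Iglu
// Resolver's schema lookups, if they fetch via java.net.URL) through a
// corporate proxy. "proxy.example.com" and 3128 are placeholder values.
object ProxySetup {
  def enableProxy(host: String, port: Int): Unit = {
    System.setProperty("http.proxyHost", host)
    System.setProperty("http.proxyPort", port.toString)
    System.setProperty("https.proxyHost", host)
    System.setProperty("https.proxyPort", port.toString)
  }
}
```

The same properties can also be passed on the command line, e.g. `java -Dhttp.proxyHost=proxy.example.com -Dhttp.proxyPort=3128 -jar ...`, without any code change.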

I hope this helps,
Anton

saianirudh kantabathina

Mar 24, 2016, 2:24:09 PM
to Snowplow
Hi Anton,
Thanks a lot for your detailed explanation, which helped me identify the Scala Iglu Client as the place where I need to make code changes (to make the code proxy-aware). How do I make the Snowplow code base talk to my own custom-compiled version of the Scala Iglu Client?
On sbt compile, where do I place the jar file in the Snowplow code? I am new to Scala, so please bear with me on this.

Looking forward to hearing from you.

Thanks,
Sai

Alex Dean

Mar 24, 2016, 9:25:22 PM
to Snowplow
Hi Saianirudh,

Is the work you're doing on Iglu Scala Client something that we should be integrating into Snowplow to support companies running Snowplow behind corporate firewalls, or is it something specific to your organization?

If it's the latter, then an alternative solution you may prefer is just to keep a copy of Iglu Central inside your corporate firewall. It should be pretty straightforward to do that and keep it up to date, versus maintaining a whole hierarchy of forked libraries/apps...

Cheers,

Alex

saianirudh kantabathina

Mar 25, 2016, 9:47:28 AM
to Snowplow
Hi Alex,
Once I get this working completely, I will submit a pull request on the README file describing what additional changes need to be made to accommodate a corporate proxy. In fact, the ideal way this would work is to pull the proxy parameters from a config file, but that can be another pull request in the longer term.

Thanks,
Sai 

Alex Dean

Mar 25, 2016, 3:07:21 PM
to Snowplow
Sounds good, I've created a ticket to track this: https://github.com/snowplow/iglu-scala-client/issues/52

saianirudh kantabathina

Mar 25, 2016, 4:55:14 PM
to Snowplow

Hi Anton & Alex,
I am able to generate the jar file for iglu-scala-client. Please let me know the steps for linking it with the scala-kinesis-enrich repo. It will unblock me; looking forward to hearing from you soon.

Thanks,
Sai 


Anton Parkhomenko

Mar 25, 2016, 5:15:09 PM
to snowpl...@googlegroups.com
Hello Saianirudh,

The process of linking iglu-scala-client into SKE is fairly simple and involves the following steps:

1. In iglu-scala-client
  a. Make your changes
  b. Change a version to something unique, like 0.5.0-PROXY
  c. Publish it locally (sbt publishLocal); it should now appear in your ~/.ivy2/local

2. In scala-common-enrich:
  a. Change iglu-scala-client dependency to your version
  b. Change scala-common-enrich version to something unique, like 0.22.0-PROXY
  c. Publish it locally

3. In stream-enrich (scala-kinesis-enrich):
  a. Change versions of iglu-scala-client and scala-common-enrich to your versions
  b. Assembly it! (sbt assembly)
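The dependency changes in steps 2 and 3 amount to a build.sbt fragment along these lines (a sketch only; the organization and artifact names here are illustrative, and the -PROXY versions are the hypothetical ones from the steps above):

```scala
// build.sbt sketch: point the build at the locally-published forks.
// sbt resolves these from ~/.ivy2/local after publishLocal.
libraryDependencies ++= Seq(
  "com.snowplowanalytics" %% "iglu-scala-client"      % "0.5.0-PROXY",
  "com.snowplowanalytics" %% "snowplow-common-enrich" % "0.22.0-PROXY"
)
```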

That’s it. You now have a fatjar compiled in snowplow/3-enrich/stream-enrich/target/scala-2.10.
Please also note that I’m referring to the latest versions of the Snowplow platform and Iglu Client, while in your previous message you mentioned Scala Kinesis Enrich; Scala Kinesis Enrich was renamed to Stream Enrich in the current r78 release. I think it would be better to use the latest versions if you’re going to make a PR.

Cheers,
Anton

On 26 March 2016, at 3:55, saianirudh kantabathina <emai...@gmail.com> wrote:

saianirudh kantabathina

Mar 29, 2016, 2:02:15 PM
to Snowplow

Hi Anton & Alex,
Thanks for all the help provided. I made the code changes and got past the issue caused by the corporate proxy; the detailed explanation really helped. I am currently blocked on the issues below:

1. Exception in thread "main" java.net.UnknownHostException: ip-10-203-208-252: ip-10-203-208-252: Name or service not known
at java.net.InetAddress.getLocalHost(InetAddress.java:1496)
at com.snowplowanalytics.snowplow.enrich.kinesis.sources.KinesisSource.run(KinesisSource.scala:81)
Explanation: To get past the above issue, I hardcoded InetAddress.getLocalHost().getCanonicalHostName() to "http://10.203.208.252" in KinesisSource.scala. To get past this gracefully, I am sure I need to make some changes to /etc/hosts, but I am not clear on what changes have to be made.
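Something like the following is what I have in mind, though I have not verified it: an /etc/hosts entry mapping the instance's hostname to its private IP, so that InetAddress.getLocalHost() resolves without touching DNS (IP and hostname taken from the stack trace above; adjust to your instance):

```
# /etc/hosts sketch
127.0.0.1        localhost
10.203.208.252   ip-10-203-208-252
```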

2. If I hardcode InetAddress.getLocalHost().getCanonicalHostName() to "http://10.203.208.252" in KinesisSource.scala, then I see this issue:
Exception in thread "main" com.amazonaws.AmazonClientException: Unable to load credentials from Amazon EC2 metadata service
at com.amazonaws.auth.InstanceProfileCredentialsProvider.handleError(InstanceProfileCredentialsProvider.java:244)
at com.amazonaws.auth.InstanceProfileCredentialsProvider.loadCredentials(InstanceProfileCredentialsProvider.java:225)
at com.amazonaws.auth.InstanceProfileCredentialsProvider.getCredentials(InstanceProfileCredentialsProvider.java:124)
at com.amazonaws.services.kinesis.AmazonKinesisClient.invoke(AmazonKinesisClient.java:2478)
at com.amazonaws.services.kinesis.AmazonKinesisClient.describeStream(AmazonKinesisClient.java:867)
at io.github.cloudify.scala.aws.kinesis.ClientImpl$$anonfun$execute$3.apply(Client.scala:88)
at io.github.cloudify.scala.aws.kinesis.ClientImpl$$anonfun$execute$3.apply(Client.scala:85)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
at scala.concurrent.impl.ExecutionContextImpl$$anon$3.exec(ExecutionContextImpl.scala:107)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.io.IOException: Server returned HTTP response code: 503 for URL: http://169.254.169.254/latest/meta-data/iam/security-credentials/
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at sun.net.www.protocol.http.HttpURLConnection$6.run(HttpURLConnection.java:1677)
at sun.net.www.protocol.http.HttpURLConnection$6.run(HttpURLConnection.java:1675)
at java.security.AccessController.doPrivileged(Native Method)
at sun.net.www.protocol.http.HttpURLConnection.getChainedException(HttpURLConnection.java:1673)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1246)
at com.amazonaws.internal.EC2MetadataClient.readResponse(EC2MetadataClient.java:113)
at com.amazonaws.internal.EC2MetadataClient.readResource(EC2MetadataClient.java:92)
at com.amazonaws.internal.EC2MetadataClient.getDefaultCredentials(EC2MetadataClient.java:55)
at com.amazonaws.auth.InstanceProfileCredentialsProvider.loadCredentials(InstanceProfileCredentialsProvider.java:186)
... 12 more
Caused by: java.io.IOException: Server returned HTTP response code: 503 for URL: http://169.254.169.254/latest/meta-data/iam/security-credentials/
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1628)
at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:468)
at com.amazonaws.internal.EC2MetadataClient.readResponse(EC2MetadataClient.java:110)
Explanation: My initial thought was that my EC2 instance doesn't have the appropriate IAM role, but I am running the Scala collector from the same instance successfully, and I am able to run a GET request from the command line in EC2 to
 
successfully
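One guess I am exploring (unverified): if the JVM-level proxy settings also capture the instance metadata endpoint, the credentials lookup would receive the proxy's 503 instead of the IAM role credentials. The standard http.nonProxyHosts property is supposed to exempt hosts from the proxy; a sketch:

```scala
// Sketch: exempt the EC2 metadata endpoint (and loopback) from the
// JVM-level proxy, so that InstanceProfileCredentialsProvider can reach
// http://169.254.169.254 directly. Unverified guess for this setup.
object MetadataBypass {
  def exemptMetadataEndpoint(): Unit =
    System.setProperty(
      "http.nonProxyHosts",
      "169.254.169.254|localhost|127.0.0.1"
    )
}
```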

Looking forward to hearing from you soon.

Thanks,
Sai

saianirudh kantabathina

Mar 29, 2016, 5:22:04 PM
to Snowplow
Hi Anton & Alex,

I am able to move ahead and resolve the previously mentioned Issue 2, but I am still curious about Issue 1. Let me know if I need to make any changes to /etc/hosts; I know hardcoding is not the right solution. I am currently stuck at a point where we are trying to get access to DynamoDB.

Will post updates once I get the suite running completely, and will also work on the pull request once all hurdles are overcome.

Thanks for all the support,
Sai

Anton Parkhomenko

Mar 30, 2016, 11:29:58 AM
to Snowplow
Hello Saianirudh,

I'm afraid your problem is very tightly coupled with your actual network architecture.
Just a little guess: did you hardcode the canonical host name as "http://10.203.208.252"? Obviously this is not a valid host name because of the "http://" prefix.
Also, what is your resulting workerId?
And if you are going to continue your investigation with a hardcoded canonical host name, make sure you inserted the IP for the right network interface; you may have several.
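To illustrate why the prefix breaks things: the workerId is just "<canonical host>:<uuid>", and a host carrying a URL scheme is not a valid host name. A tiny hypothetical helper (not the project's actual code) showing the shape:

```scala
import java.util.UUID

// Hypothetical helper mirroring the workerId shape Stream Enrich uses:
// "<canonical-host>:<uuid>". A host with a URL scheme such as "http://"
// is rejected, since it is not a valid host name.
object WorkerId {
  def make(host: String): Either[String, String] =
    if (host.contains("://")) Left(s"invalid host name: $host")
    else Right(s"$host:${UUID.randomUUID()}")
}
```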

I hope this helps,
Anton


saianirudh kantabathina

Mar 30, 2016, 1:21:48 PM
to Snowplow
Hi Anton,
I removed the http prefix and my workerId looks like: 10.203.208.252:258025ed-1548-4b7d-affb-0d39d73b45fc. I am able to get access to DynamoDB; please see the attached log file, which shows the point of failure. I am unable to pinpoint the exact issue. I turned on debug mode to see the errors clearly. I think we are at the last step of enrichment, so please bear with us. Let me know your thoughts; looking forward to hearing from you.

Thanks,
Sai
Error.log

Anton Parkhomenko

Mar 30, 2016, 1:48:10 PM
to snowpl...@googlegroups.com
Hi Saianirudh,

It seems this one will be a little bit harder. Is it possible for you to share the changes you made in the source code somehow (ideally through GitHub)? I also think your enrich.conf could be useful as well.

It is probably not the cause of your problems, but I also advise you to use Oracle JDK instead of OpenJDK; the latter is a constant source of unexpected problems.

Cheers,
Anton
 
On 30 March 2016, at 20:21, saianirudh kantabathina <emai...@gmail.com> wrote:

<Error.log>

saianirudh kantabathina

Mar 30, 2016, 3:25:32 PM
to Snowplow
Hi Anton,
Please find the GitHub URL here: https://github.com/devsaik/snowplow-modified-files . I just added the stream-enrich related files; let me know if you need the iglu-client or collector related files, or the dependencies and build settings. I will change OpenJDK to Oracle JDK in a few minutes and run it again.

I anonymized proxy-settings.  

Thanks for all the help. Looking forward to hearing from you.

---
Sai  

saianirudh kantabathina

Apr 1, 2016, 7:34:30 PM
to Snowplow
Hi Anton,
Just to give some perspective: I am able to run this JavaScript code from the EC2 instance: https://github.com/devsaik/aws-dynamodb-proxy
Thanks,
Sai

saianirudh kantabathina

Apr 4, 2016, 5:03:41 PM
to Snowplow
Hi Anton & Alex,

My issue is resolved and I can see the enriched stream now. I made code changes to KinesisSource.scala to accommodate the proxy by looking at the code here.

Thanks for helping me through the process. As I move on to the next step I might have new questions, and I will create a new post as needed. I will add all the code changes to a pull request once I reach the MVP, and will use a .config file to pull the proxy settings. I will ask either of you for advice as required.

Thanks for your guidance and help.
---
Sai

Alex Dean

Apr 4, 2016, 8:13:00 PM
to Snowplow
Great to hear, Saianirudh! We look forward to your PR in due course...

Cheers,

Alex


Anton Parkhomenko

Apr 5, 2016, 3:35:46 AM
to snowpl...@googlegroups.com
Hey Saianirudh!

I’m glad everything works now! Feel free to ask if you have any further questions.

Regards,
Anton

On 5 April 2016, at 3:12, Alex Dean <al...@snowplowanalytics.com> wrote: