Setting up Kinesis Collector


Joao Correia

25 Feb 2016, 18:21:21
to Snowplow

Hi Snowplowers,


I'm trying to set up Snowplow with Kinesis, starting by getting the collector running. I took the following steps:

  1. Install the collector binary
    https://github.com/snowplow/snowplow/wiki/Install-the-Scala-Stream-Collector

  2. Get the sample config file and customize it

  3. Run it: ./snowplow-stream-collector-0.5.0 --config collector.conf

I'm getting this error:

./snowplow-stream-collector-0.5.0 --config collector.conf

15:20:06.429 [scala-stream-collector-akka.actor.default-dispatcher-2] INFO  akka.event.slf4j.Slf4jLogger - Slf4jLogger started

15:20:06.443 [scala-stream-collector-akka.actor.default-dispatcher-2] DEBUG akka.event.EventStream - logger log1-Slf4jLogger started

15:20:06.443 [scala-stream-collector-akka.actor.default-dispatcher-2] DEBUG akka.event.EventStream - Default Loggers started

15:20:06.451 [main] INFO  c.s.s.c.s.sinks.KinesisSink - Creating thread pool of size 10

Feb 25, 2016 3:20:07 PM com.amazonaws.http.AmazonHttpClient executeHelper

INFO: Unable to execute HTTP request: kinesis.us-west-2b.amazonaws.com: Name or service not known

java.net.UnknownHostException: kinesis.us-west-2b.amazonaws.com: Name or service not known

at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)

at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:922)

at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1314)

at java.net.InetAddress.getAllByName0(InetAddress.java:1267)

at java.net.InetAddress.getAllByName(InetAddress.java:1183)

at java.net.InetAddress.getAllByName(InetAddress.java:1119)

at org.apache.http.impl.conn.SystemDefaultDnsResolver.resolve(SystemDefaultDnsResolver.java:44)


This is my config file:


collector {

  # The collector runs as a web service specified on the following   

  # interface and port.

  interface = "0.0.0.0"

  port = 80


 # Production mode disables additional services helpful for configuring and

  # initializing the collector, such as a path '/dump' to view all

  # records stored in the current stream.

  production = true

   

  # Configure the P3P policy header.

  p3p {

    policyref = "/w3c/p3p.xml"

    CP = "NOI DSP COR NID PSA OUR IND COM NAV STA"

  }


  # The collector returns a cookie to clients for user identification

  # with the following domain and expiration.

  cookie {

    # Set to 0 to disable the cookie

    expiration = 365 days

    # The domain is optional and will make the cookie accessible to other

    # applications on the domain. Comment out this line to tie cookies to

    # the collector's full domain

    # domain = ""

  }


  sink {

    # Sinks currently supported are:

    # 'kinesis' for writing Thrift-serialized records to a Kinesis stream

    # 'stdout' for writing Base64-encoded Thrift-serialized records to stdout

    #    Recommended settings for 'stdout' so each line printed to stdout

    #    is a serialized record are:

    #      1. Setting 'akka.loglevel = OFF' and 'akka.loggers = []'

    #         to disable all logging.

    #      2. Using 'sbt assembly' and 'java -jar ...' to disable

    #         sbt logging.

    enabled = "kinesis"   


    kinesis {

      thread-pool-size: 10 # Thread pool size for Kinesis API requests


      # The following are used to authenticate for the Amazon Kinesis sink.

      #

      # If both are set to 'cpf', a properties file on the classpath is used.

      # http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/auth/ClasspathPropertiesFileCredentialsProvider

      #

      # If both are set to 'iam', use AWS IAM Roles to provision credentials.

      #

      # If both are set to 'env', use environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY

      aws {

        access-key: "xxxxxxxxxxxxx"

        secret-key: "xxxxxxxxxxxxx"

      }


      # Data will be stored in the following stream.

      stream {

        region: "us-west-2b"

        good: "collector.stream.good"

        bad: "collector.stream.bad"  

      }


      # Minimum and maximum backoff periods

      backoffPolicy: {

        minBackoff: 3000 # 3 seconds

        maxBackoff: 600000 # 10 minutes

      }


      # Incoming events are stored in a buffer before being sent to Kinesis.

      # The buffer is emptied whenever:

      # - the number of stored records reaches record-limit or

      # - the combined size of the stored records reaches byte-limit or

      # - the time in milliseconds since the buffer was last emptied reaches time-limit

      buffer {

        byte-limit: 500000

        record-limit: 1000

        time-limit: 10000 

      }

    }  

  }    

}      

       

# Akka has a variety of possible configuration options defined at

# http://doc.akka.io/docs/akka/2.2.3/general/configuration.html

akka {

  loglevel = DEBUG # 'OFF' for no logging, 'DEBUG' for all logging.

  loggers = ["akka.event.slf4j.Slf4jLogger"]

}

 

# spray-can is the server the Stream collector uses and has configurable

# options defined at

# https://github.com/spray/spray/blob/master/spray-can/src/main/resources/reference.conf

spray.can.server {

  # To obtain the hostname in the collector, the 'remote-address' header

  # should be set. By default, this is disabled, and enabling it

  # adds the 'Remote-Address' header to every request automatically.

  remote-address-header = on


  uri-parsing-mode = relaxed

  raw-request-uri-header = on


  # Define the maximum request length (the default is 2048)

  parsing {

    max-uri-length = 32768

  }

}  

   


What am I missing here?




Alex Dean

25 Feb 2016, 18:25:28
to snowpl...@googlegroups.com

Hey Joao,

"us-west-2b" isn't a region - can you try "us-west-2"?

Cheers,

Alex


Joao Correia

25 Feb 2016, 18:30:57
to Snowplow
region: "us-west-2b" doesn't work

region: "us-west-2" works
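For anyone who hits the same error, the corrected stream block from the config above looks like this:

```hocon
stream {
  region: "us-west-2"   # a region, not an availability zone like "us-west-2b"
  good: "collector.stream.good"
  bad: "collector.stream.bad"
}
```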

Ihor Tomilenko

25 Feb 2016, 18:54:47
to Snowplow
Hi Joao,

Glad it worked out for you.

Looks like this confusion is common. When a letter is added to the end of a region name, it becomes what Amazon calls an "availability zone":

The availability zone in which the instance is located. Availability Zones are distinct locations within a region that are engineered to be insulated from failures in other Availability Zones.

I guess we might need to make the distinction between a "region" and its "availability zones" clearer in the docs to avoid this confusion.
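To illustrate why the error shows up as a DNS failure (a rough sketch, not the SDK's actual code — `kinesis_endpoint` is a hypothetical helper): the Kinesis client builds its endpoint hostname directly from the configured region string, so an availability zone like "us-west-2b" produces a hostname that simply doesn't exist.

```python
# Sketch only: the AWS SDK derives the regional Kinesis endpoint
# hostname from the region setting. Passing an availability zone
# ("us-west-2b") yields a non-existent host, hence the
# UnknownHostException in the log above.

def kinesis_endpoint(region: str) -> str:
    """Build the regional Kinesis endpoint hostname (illustrative)."""
    return f"kinesis.{region}.amazonaws.com"

print(kinesis_endpoint("us-west-2b"))  # kinesis.us-west-2b.amazonaws.com (no such host)
print(kinesis_endpoint("us-west-2"))   # kinesis.us-west-2.amazonaws.com (resolves)
```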

Regards,
Ihor

Joao Correia

25 Feb 2016, 21:01:08
to Snowplow
Thanks!