Setting up Kinesis Collector


Joao Correia

25 Feb 2016, 18:21:21
to Snowplow

Hi Snowplowers,


I'm trying to set up Snowplow with Kinesis, starting by getting the collector running. I took the following steps:

  1. Install the collector binary
    https://github.com/snowplow/snowplow/wiki/Install-the-Scala-Stream-Collector

  2. Get the sample config file and customize it

  3. Run it: ./snowplow-stream-collector-0.5.0 --config collector.conf

I'm getting this error:

./snowplow-stream-collector-0.5.0 --config collector.conf

15:20:06.429 [scala-stream-collector-akka.actor.default-dispatcher-2] INFO  akka.event.slf4j.Slf4jLogger - Slf4jLogger started

15:20:06.443 [scala-stream-collector-akka.actor.default-dispatcher-2] DEBUG akka.event.EventStream - logger log1-Slf4jLogger started

15:20:06.443 [scala-stream-collector-akka.actor.default-dispatcher-2] DEBUG akka.event.EventStream - Default Loggers started

15:20:06.451 [main] INFO  c.s.s.c.s.sinks.KinesisSink - Creating thread pool of size 10

Feb 25, 2016 3:20:07 PM com.amazonaws.http.AmazonHttpClient executeHelper

INFO: Unable to execute HTTP request: kinesis.us-west-2b.amazonaws.com: Name or service not known

java.net.UnknownHostException: kinesis.us-west-2b.amazonaws.com: Name or service not known

at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)

at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:922)

at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1314)

at java.net.InetAddress.getAllByName0(InetAddress.java:1267)

at java.net.InetAddress.getAllByName(InetAddress.java:1183)

at java.net.InetAddress.getAllByName(InetAddress.java:1119)

at org.apache.http.impl.conn.SystemDefaultDnsResolver.resolve(SystemDefaultDnsResolver.java:44)


This is my config file:


collector {

  # The collector runs as a web service specified on the following   

  # interface and port.

  interface = "0.0.0.0"

  port = 80


 # Production mode disables additional services helpful for configuring and

  # initializing the collector, such as a path '/dump' to view all

  # records stored in the current stream.

  production = true

   

  # Configure the P3P policy header.

  p3p {

    policyref = "/w3c/p3p.xml"

    CP = "NOI DSP COR NID PSA OUR IND COM NAV STA"

  }


  # The collector returns a cookie to clients for user identification

  # with the following domain and expiration.

  cookie {

    # Set to 0 to disable the cookie

    expiration = 365 days

    # The domain is optional and will make the cookie accessible to other

    # applications on the domain. Comment out this line to tie cookies to

    # the collector's full domain

    # domain = ""

  }


  sink {

    # Sinks currently supported are:

    # 'kinesis' for writing Thrift-serialized records to a Kinesis stream

    # 'stdout' for writing Base64-encoded Thrift-serialized records to stdout

    #    Recommended settings for 'stdout' so each line printed to stdout

    #    is a serialized record are:

    #      1. Setting 'akka.loglevel = OFF' and 'akka.loggers = []'

    #         to disable all logging.

    #      2. Using 'sbt assembly' and 'java -jar ...' to disable

    #         sbt logging.

    enabled = "kinesis"   


    kinesis {

      thread-pool-size: 10 # Thread pool size for Kinesis API requests


      # The following are used to authenticate for the Amazon Kinesis sink.

      #

      # If both are set to 'cpf', a properties file on the classpath is used.

      # http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/auth/ClasspathPropertiesFileCredentialsProvider

      #

      # If both are set to 'iam', use AWS IAM Roles to provision credentials.

      #

      # If both are set to 'env', use environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY

      aws {

        access-key: "xxxxxxxxxxxxx"

        secret-key: "xxxxxxxxxxxxx"

      }


      # Data will be stored in the following stream.

      stream {

        region: "us-west-2b"

        good: "collector.stream.good"

        bad: "collector.stream.bad"  

      }


      # Minimum and maximum backoff periods

      backoffPolicy: {

        minBackoff: 3000 # 3 seconds

        maxBackoff: 600000 # 10 minutes

      }


      # Incoming events are stored in a buffer before being sent to Kinesis.

      # The buffer is emptied whenever:

      # - the number of stored records reaches record-limit or

      # - the combined size of the stored records reaches byte-limit or

      # - the time in milliseconds since the buffer was last emptied reaches time-limit

      buffer {

        byte-limit: 500000

        record-limit: 1000

        time-limit: 10000 

      }

    }  

  }    

}      

       

# Akka has a variety of possible configuration options defined at

# http://doc.akka.io/docs/akka/2.2.3/general/configuration.html

akka {

  loglevel = DEBUG # 'OFF' for no logging, 'DEBUG' for all logging.

  loggers = ["akka.event.slf4j.Slf4jLogger"]

}

 

# spray-can is the server the Stream collector uses and has configurable

# options defined at

# https://github.com/spray/spray/blob/master/spray-can/src/main/resources/reference.conf

spray.can.server {

  # To obtain the hostname in the collector, the 'remote-address' header

  # should be set. By default, this is disabled, and enabling it

  # adds the 'Remote-Address' header to every request automatically.

  remote-address-header = on


  uri-parsing-mode = relaxed

  raw-request-uri-header = on


  # Define the maximum request length (the default is 2048)

  parsing {

    max-uri-length = 32768

  }

}  

   


What am I missing here?




Alex Dean

25 Feb 2016, 18:25:28
to snowpl...@googlegroups.com

Hey Joao,

"us-west-2b" isn't a region - can you try "us-west-2"?

Cheers,

Alex


Joao Correia

25 Feb 2016, 18:30:57
to Snowplow
region: "us-west-2b" doesn't work

region: "us-west-2" works
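For anyone who hits the same error, the corrected stream block from the config above looks like this:

```hocon
stream {
  region: "us-west-2"   # a region, not an availability zone like "us-west-2b"
  good: "collector.stream.good"
  bad: "collector.stream.bad"
}
```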

Ihor Tomilenko

25 Feb 2016, 18:54:47
to Snowplow
Hi Joao,

Glad it worked out for you.

Looks like this confusion is common. When a letter is added to the end of a region name, it becomes what Amazon calls an "availability zone":

The availability zone in which the instance is located. Availability Zones are distinct locations within a region that are engineered to be insulated from failures in other Availability Zones.

I guess we might need to make the distinction between a "region" and its "availability zones" clearer in the docs to avoid this confusion.
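To illustrate why the error shows up as a DNS failure (a rough sketch, not the SDK's actual code — `kinesis_endpoint` is a hypothetical helper): the Kinesis client builds its endpoint hostname directly from the configured region string, so an availability zone like "us-west-2b" produces a hostname that simply doesn't exist.

```python
# Sketch only: the AWS SDK derives the regional Kinesis endpoint
# hostname from the region setting. Passing an availability zone
# ("us-west-2b") yields a non-existent host, hence the
# UnknownHostException in the log above.

def kinesis_endpoint(region: str) -> str:
    """Build the regional Kinesis endpoint hostname (illustrative)."""
    return f"kinesis.{region}.amazonaws.com"

print(kinesis_endpoint("us-west-2b"))  # kinesis.us-west-2b.amazonaws.com (no such host)
print(kinesis_endpoint("us-west-2"))   # kinesis.us-west-2.amazonaws.com (resolves)
```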

Regards,
Ihor

Joao Correia

25 Feb 2016, 21:01:08
to Snowplow
Thanks!