Hi Snowplowers,
I'm trying to set up Snowplow with Kinesis, starting by getting the collector running. These are the steps I took:
./snowplow-stream-collector-0.5.0 --config collector.conf
15:20:06.429 [scala-stream-collector-akka.actor.default-dispatcher-2] INFO akka.event.slf4j.Slf4jLogger - Slf4jLogger started
15:20:06.443 [scala-stream-collector-akka.actor.default-dispatcher-2] DEBUG akka.event.EventStream - logger log1-Slf4jLogger started
15:20:06.443 [scala-stream-collector-akka.actor.default-dispatcher-2] DEBUG akka.event.EventStream - Default Loggers started
15:20:06.451 [main] INFO c.s.s.c.s.sinks.KinesisSink - Creating thread pool of size 10
Feb 25, 2016 3:20:07 PM com.amazonaws.http.AmazonHttpClient executeHelper
INFO: Unable to execute HTTP request: kinesis.us-west-2b.amazonaws.com: Name or service not known
java.net.UnknownHostException: kinesis.us-west-2b.amazonaws.com: Name or service not known
at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:922)
at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1314)
at java.net.InetAddress.getAllByName0(InetAddress.java:1267)
at java.net.InetAddress.getAllByName(InetAddress.java:1183)
at java.net.InetAddress.getAllByName(InetAddress.java:1119)
at org.apache.http.impl.conn.SystemDefaultDnsResolver.resolve(SystemDefaultDnsResolver.java:44)
This is my config file:
collector {
  # The collector runs as a web service specified on the following
  # interface and port.
  interface = "0.0.0.0"
  port = 80

  # Production mode disables additional services helpful for configuring and
  # initializing the collector, such as a path '/dump' to view all
  # records stored in the current stream.
  production = true

  # Configure the P3P policy header.
  p3p {
    policyref = "/w3c/p3p.xml"
    CP = "NOI DSP COR NID PSA OUR IND COM NAV STA"
  }

  # The collector returns a cookie to clients for user identification
  # with the following domain and expiration.
  cookie {
    # Set to 0 to disable the cookie
    expiration = 365 days
    # The domain is optional and will make the cookie accessible to other
    # applications on the domain. Comment out this line to tie cookies to
    # the collector's full domain
    # domain = ""
  }

  sink {
    # Sinks currently supported are:
    # 'kinesis' for writing Thrift-serialized records to a Kinesis stream
    # 'stdout' for writing Base64-encoded Thrift-serialized records to stdout
    # Recommended settings for 'stdout' so each line printed to stdout
    # is a serialized record are:
    # 1. Setting 'akka.loglevel = OFF' and 'akka.loggers = []'
    #    to disable all logging.
    # 2. Using 'sbt assembly' and 'java -jar ...' to disable
    #    sbt logging.
    enabled = "kinesis"

    kinesis {
      thread-pool-size: 10 # Thread pool size for Kinesis API requests

      # The following are used to authenticate for the Amazon Kinesis sink.
      #
      # If both are set to 'cpf', a properties file on the classpath is used.
      #
      # If both are set to 'iam', use AWS IAM Roles to provision credentials.
      #
      # If both are set to 'env', use environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
      aws {
        access-key: "xxxxxxxxxxxxx"
        secret-key: "xxxxxxxxxxxxx"
      }

      # Data will be stored in the following stream.
      stream {
        region: "us-west-2b"
        good: "collector.stream.good"
        bad: "collector.stream.bad"
      }

      # Minimum and maximum backoff periods
      backoffPolicy: {
        minBackoff: 3000   # 3 seconds
        maxBackoff: 600000 # 10 minutes
      }

      # Incoming events are stored in a buffer before being sent to Kinesis.
      # The buffer is emptied whenever:
      # - the number of stored records reaches record-limit or
      # - the combined size of the stored records reaches byte-limit or
      # - the time in milliseconds since the buffer was last emptied reaches time-limit
      buffer {
        byte-limit: 500000
        record-limit: 1000
        time-limit: 10000
      }
    }
  }
}

# Akka has a variety of possible configuration options defined at
# http://doc.akka.io/docs/akka/2.2.3/general/configuration.html.
akka {
  loglevel = DEBUG # 'OFF' for no logging, 'DEBUG' for all logging.
  loggers = ["akka.event.slf4j.Slf4jLogger"]
}

# spray-can is the server the Stream collector uses and has configurable
# options defined at
# https://github.com/spray/spray/blob/master/spray-can/src/main/resources/reference.conf
spray.can.server {
  # To obtain the hostname in the collector, the 'remote-address' header
  # should be set. By default, this is disabled, and enabling it
  # adds the 'Remote-Address' header to every request automatically.
  remote-address-header = on
  uri-parsing-mode = relaxed
  raw-request-uri-header = on

  # Define the maximum request length (the default is 2048)
  parsing {
    max-uri-length = 32768
  }
}
What am I missing here?
Hey Joao,
"us-west-2b" isn't a region - can you try "us-west-2"?
Cheers,
Alex
For anyone else who hits this: "us-west-2b" is an Availability Zone, not a region. From the AWS docs: "The availability zone in which the instance is located. Availability Zones are distinct locations within a region that are engineered to be insulated from failures in other Availability Zones."