Fwd: Snowplow and Kinesis

Alexandre Santos

Feb 10, 2014, 8:57:18 AM
to snowplow-user
Hi,
I am trying to set up Snowplow Analytics with Kinesis. I have the Scala collector running on an EC2 instance, and it seems to create and connect to the new stream fine. When I test it with the JS tracker, the tracker sends the HTTP request and the server responds with a pixel. All seems OK, except that I don't see any other reaction: the collector console doesn't show any messages, and I can't see any requests in the graphs of my Kinesis stream.
Is this the correct behavior or am I doing something wrong? I can share more details about my setup.

Thank you,
Alex

Alex Dean

Feb 10, 2014, 9:02:24 AM
to snowpl...@googlegroups.com
Hi Alex

That sounds fine. It does take a little while for the requests to filter through to the graphs in the Kinesis UI - is there still nothing? I would recommend going ahead and setting up the Scala Kinesis Enrich, and configuring it to source from your raw stream, and sink to stdout - so you can see the events as they come in...
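
Another way to confirm that events are reaching the raw stream, without waiting for the graphs, is to read a few records back directly. The following is only a minimal sketch in Python, assuming a client object that exposes the boto3 Kinesis methods `describe_stream`, `get_shard_iterator` and `get_records`; boto3 and the stream name in the usage note are assumptions, not something from this thread.

```python
def count_stream_records(client, stream_name, limit=100):
    """Count records readable from the start of every shard of a stream.

    `client` is assumed to behave like a boto3 Kinesis client. Only one
    get_records call is made per shard, so this is a smoke test rather
    than a full scan of the stream.
    """
    description = client.describe_stream(StreamName=stream_name)
    shards = description["StreamDescription"]["Shards"]
    total = 0
    for shard in shards:
        iterator = client.get_shard_iterator(
            StreamName=stream_name,
            ShardId=shard["ShardId"],
            ShardIteratorType="TRIM_HORIZON",  # oldest record still retained
        )["ShardIterator"]
        batch = client.get_records(ShardIterator=iterator, Limit=limit)
        total += len(batch["Records"])
    return total
```

With real credentials this could be called as `count_stream_records(boto3.client("kinesis"), "snowplow_raw")`, where the stream name is hypothetical. A single `get_records` call can legitimately return no records, so a zero result is not conclusive on its own.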

Let us know if that works!

Cheers,

Alex


--
You received this message because you are subscribed to the Google Groups "Snowplow" group.
To unsubscribe from this group and stop receiving emails from it, send an email to snowplow-use...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.



--
Co-founder
Snowplow Analytics
The Roma Building, 32-38 Scrutton Street, London EC2A 4RQ, United Kingdom
+44 (0)203 589 6116
+44 7881 622 925
@alexcrdean

alex...@yieldify.com

Feb 11, 2014, 12:06:25 PM
to snowpl...@googlegroups.com
Hi Alex,

You're right, there was just a delay on updating the graphs, but it seems to be working fine.
I haven't found any information on installing the StorageLoader using only Kinesis (the guide seems to cover only EMR, unless I'm overlooking something). Is there any documentation on this topic?

Thank you

Alex Dean

Feb 11, 2014, 12:16:50 PM
to snowpl...@googlegroups.com
Hi Alex,

Ah that's good to hear. Unfortunately the StorageLoader is EMR-flow only. If you jump to this diagram:

http://snowplowanalytics.com/blog/2014/02/04/snowplow-0.9.0-released-with-beta-kinesis-support/#overview

the equivalent to the StorageLoader for Kinesis will be "Redshift drip-feeder Kinesis app". Alack, that box is in grey as it's not been built yet. We will likely build this leveraging https://github.com/awslabs/amazon-kinesis-connectors if that helps.

Cheers,

Alex

Matteo Centenaro

Apr 22, 2014, 11:18:02 AM
to snowpl...@googlegroups.com
Hi there,


On Tuesday, February 11, 2014 6:16:50 PM UTC+1, Alex Dean wrote:
> Hi Alex,
>
> Ah that's good to hear. Unfortunately the StorageLoader is EMR-flow only. If you jump to this diagram:
>
> http://snowplowanalytics.com/blog/2014/02/04/snowplow-0.9.0-released-with-beta-kinesis-support/#overview
>
> the equivalent to the StorageLoader for Kinesis will be "Redshift drip-feeder Kinesis app". Alack, that box is in grey as it's not been built yet. We will likely build this leveraging https://github.com/awslabs/amazon-kinesis-connectors if that helps.

Is there any news about support for Kinesis in the StorageLoader?

Cheers,
matteo.

Alex Dean

Apr 23, 2014, 8:00:39 AM
to snowpl...@googlegroups.com
Hi Matteo,


> Is there any news about support for Kinesis in the StorageLoader?

Not currently. Are you interested in helping out? Phil Kallos (pkallos), who has done a ton of awesome work on the Kinesis flow, may take a look at Kinesis -> Redshift soonish. Let me know if you are keen to help out.

Cheers,

Alex



bugant

Apr 23, 2014, 3:00:42 PM
to snowpl...@googlegroups.com, Giovanni Cappellotto
Hi Alex,

On Wed, Apr 23, 2014 at 2:00 PM, Alex Dean <al...@snowplowanalytics.com> wrote:
> > Is there any news about support for Kinesis in the StorageLoader?
>
> Not currently. Are you interested in helping out? Phil Kallos (pkallos)
> who has done a ton of awesome work on the Kinesis flow may take a look at
> Kinesis -> Redshift soonish. Let me know if you are keen to help out.

I would be really happy to help but I have no knowledge of Scala.

Today Giovanni (who is reading in CC) and I managed to get a working
example by tweaking the Java code from awslabs:
https://github.com/awslabs/amazon-kinesis-connectors

Here is what we did:

1. read from the output stream written by the Scala Kinesis Enrich
(http://d2io1hx8u877l0.cloudfront.net/3-enrich/scala-kinesis-enrich/snowplow-kinesis-enrich-0.1.0)

2. write the records to files on S3

3. write a manifest file to S3

4. publish the name of the file to a Kinesis stream

5. read from the above stream (via another Kinesis app), take the file
from S3 and load the records into Redshift
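
Steps 3 and 5 above can be sketched roughly as follows. This is a minimal illustration in Python, not the Java code described in this message; the bucket, keys and credentials placeholder are hypothetical. The manifest layout and the `MANIFEST` and `DELIMITER` options are standard Redshift `COPY` features, while the tab delimiter is an assumption that the enriched events are tab-separated.

```python
import json


def build_manifest(s3_urls):
    """Return the JSON text of a Redshift COPY manifest listing the given S3 files."""
    entries = [{"url": url, "mandatory": True} for url in s3_urls]
    return json.dumps({"entries": entries}, indent=2)


def build_copy_statement(table, manifest_url, credentials):
    """Return a COPY statement that loads every file named in the manifest.

    `credentials` is the Redshift credentials string; it is a placeholder
    here and should never be hard-coded in real code.
    """
    return (
        f"COPY {table} FROM '{manifest_url}' "
        f"CREDENTIALS '{credentials}' "
        "MANIFEST DELIMITER '\\t'"
    )


if __name__ == "__main__":
    # Hypothetical bucket and keys, for illustration only.
    print(build_manifest(["s3://my-bucket/enriched/part-0001.tsv"]))
    print(build_copy_statement(
        "atomic.events",
        "s3://my-bucket/manifests/batch-0001.manifest",
        "<credentials>",
    ))
```

Marking each entry as mandatory makes the load fail if a listed file is missing, which seems the safer default when each manifest describes one batch.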

We expect the output from Enrich to be in the Canonical Output format.

It all works, except that the last step fails. The data produced by
Enrich does not seem to match the table structure on Redshift: more
precisely, the page_title field is where Redshift expects to read
geo_longitude, so it finds a String where a Double is expected.
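
A misalignment like this can be caught before the load by checking each record against the expected column order and types. The following is a minimal sketch in Python, assuming tab-separated records; the three-column schema is a shortened, hypothetical excerpt and not the real atomic.events definition.

```python
def find_type_mismatches(line, schema):
    """Return the names of columns whose value does not parse as the expected type.

    `schema` is an ordered list of (column_name, parser) pairs. Empty
    fields are treated as NULL and skipped. A ValueError is raised when
    the number of fields differs from the number of columns.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(schema):
        raise ValueError(f"expected {len(schema)} fields, got {len(fields)}")
    mismatches = []
    for (name, parser), value in zip(schema, fields):
        if value == "":
            continue
        try:
            parser(value)
        except ValueError:
            mismatches.append(name)
    return mismatches


# Hypothetical excerpt; the real table has many more columns.
SCHEMA = [("geo_latitude", float), ("geo_longitude", float), ("page_title", str)]
```

Running a sample of enriched records through a check like this against each candidate table definition shows quickly which definition the records actually follow.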

Do you have any hints? We used the jar file to run the Enrich step;
is there any possibility that the jar is somewhat out of date? Do you
have any other ideas?

Let us know if we can help in some way... We'd also like to learn a
bit of Scala :) but it will require some time.

Cheers,
matteo.

Alex Dean

Apr 23, 2014, 4:41:52 PM
to snowpl...@googlegroups.com, Giovanni Cappellotto
Hi Matteo,

Ah - you were very, very close! The problem is probably that you were using the latest version of the Redshift atomic.events table, but actually snowplow-kinesis-enrich 0.1.0 is still outputting the previous version of the atomic.events definition. We'll update that in the next snowplow-kinesis-enrich release. For now this should be the correct version:

https://github.com/snowplow/snowplow/blob/0.9.0/4-storage/redshift-storage/sql/atomic-def.sql

Let me know if that works - then I look forward to catching up on next steps!

A




bugant

Apr 23, 2014, 4:46:12 PM
to snowpl...@googlegroups.com, Giovanni Cappellotto
Hi Alex,

On Wed, Apr 23, 2014 at 10:41 PM, Alex Dean <al...@snowplowanalytics.com> wrote:
> Ah - you were very, very close! The problem is probably that you were using
> the latest version of the Redshift atomic.events table, but actually
> snowplow-kinesis-enrich 0.1.0 is still outputting the previous version of
> the atomic.events definition. We'll update that in the next
> snowplow-kinesis-enrich release. For now this should be the correct version:
>
> https://github.com/snowplow/snowplow/blob/0.9.0/4-storage/redshift-storage/sql/atomic-def.sql
>
> Let me know if that works - then I look forward to catching up on next
> steps!

Thank you very much for the quick feedback! We'll give it a try
tomorrow morning (we're in Italy :) and let you know.
Do you think we can contribute to Snowplow with the Java code?

Cheers,
matteo.

Alex Dean

Apr 23, 2014, 4:51:02 PM
to snowpl...@googlegroups.com
Hi Matteo,


> Do you think we can contribute to snowplow with the Java code?

We've prototyped things before in Java before porting them across into Scala, so I don't see why not! It might be worth opening a pull request into snowplow/snowplow with your work tweaking AWS' Java connector as a first step? I can then review, as can Phil... Our CLA for contributors is here:

https://github.com/snowplow/snowplow/wiki/CLA

Cheers,

Alex





Simon Rumble

Apr 23, 2014, 5:42:29 PM
to snowpl...@googlegroups.com

Wow this is very exciting. Nice work Matteo and Giovanni!

bugant

Apr 24, 2014, 8:36:08 AM
to snowpl...@googlegroups.com
Hi Alex,

On Wed, Apr 23, 2014 at 10:51 PM, Alex Dean <al...@snowplowanalytics.com> wrote:
> Hi Matteo,
>
>
> > Do you think we can contribute to snowplow with the Java code?
>
> We've prototyped things before in Java before porting them across into
> Scala, so I don't see why not! It might be worth opening a pull request into
> snowplow/snowplow with your work tweaking AWS' Java connector as a first
> step? I can then review, as can Phil... Our CLA for contributors is here:
>
> https://github.com/snowplow/snowplow/wiki/CLA

For sure, this is OK!
We've just tested against the old schema you pointed out and it worked!

Now we are going to clean up the code and send a PR shortly (maybe
today, or at the beginning of next week).

ciao ciao,
matteo.

Alex Dean

Apr 24, 2014, 8:41:17 AM
to snowpl...@googlegroups.com
Hi Matteo,

Awesome! Look forward to the PR...

Cheers,

Alex



bugant

Apr 24, 2014, 9:37:43 AM
to snowpl...@googlegroups.com
Alex,

On Thu, Apr 24, 2014 at 2:41 PM, Alex Dean <al...@snowplowanalytics.com> wrote:
> Awesome! Look forward to the PR...

Is there any issue in adding code from the amazon-kinesis-connectors
project inside snowplow? Here is the license:
https://github.com/awslabs/amazon-kinesis-connectors/blob/master/LICENSE.txt

I'm not a licensing expert, so we need some advice on this.
We plan to fork snowplow and add a storage implementation which will
include parts of the above mentioned project. Is that OK for you?


ciao ciao,
matteo.

Alex Dean

Apr 24, 2014, 9:51:09 AM
to snowpl...@googlegroups.com
Hi Matteo,

Thanks for raising this. The Spark project has been going through the exact same issue recently: https://issues.apache.org/jira/browse/LEGAL-198

In the case of Spark, they were using the Kinesis Client Library (KCL) as a dependency - their solution was to make KCL an optional dependency.

In this case, your code is clearly a fork of the AWS Kinesis Connectors code, as well as having a KCL dependency. My preference is to keep your code in a folder:

/4-storage/kinesis-redshift-sink

maintaining the Amazon LICENSE.txt file in that folder (i.e. respecting the existing license). We already have sub-components using alternate licenses when they are based on forks (see e.g. the JS Tracker). As/when we re-build in Scala, we can license the Scala implementation using Apache 2.0.

BTW, Phil is following a similar folder structure (/4-storage/kinesis-s3-sink) in this PR: https://github.com/snowplow/snowplow/pull/546

(Phil's kinesis-s3-sink and your kinesis-redshift-sink are complementary not competing - I can explain the difference in this thread if that's helpful)...

A




bugant

Apr 24, 2014, 9:59:04 AM
to snowpl...@googlegroups.com
On Thu, Apr 24, 2014 at 3:51 PM, Alex Dean <al...@snowplowanalytics.com> wrote:
> In this case, your code is obviously a fork of the AWS Kinesis Connectors
> code, as well as a KCL dependency. In this case, my preference is to keep
> your code in a folder:
>
> /4-storage/kinesis-redshift-sink
>
> maintaining the Amazon LICENSE.txt file in that folder (i.e. respecting the
> existing license). We already have sub-components using alternate licenses
> when they are based on forks (see e.g. the JS Tracker). As/when we re-build
> in Scala, we can license the Scala implementation using Apache 2.0.

That's perfect! We'll go for it.

ciao ciao,
matteo.

bugant

Apr 28, 2014, 7:04:16 AM
to snowpl...@googlegroups.com, Giovanni Cappellotto
Hi there,
Giovanni and I just sent a PR with the Java Kinesis apps discussed
above: https://github.com/snowplow/snowplow/pull/689

Hope it helps.

ciao ciao,
matteo.

Alex Dean

Apr 28, 2014, 7:08:10 AM
to snowpl...@googlegroups.com
Nice work guys! Have followed up in thread...

A




Kurt Johnson

Aug 11, 2014, 3:19:19 AM
to snowpl...@googlegroups.com
Hey there,

We're looking at implementing the drip feeder ourselves. How close are you guys to getting this out? Do you have a branch we can help work on? (Better than reinventing the wheel, right?)

Cheers,
Kurt

Alex Dean

Aug 12, 2014, 7:35:50 AM
to snowpl...@googlegroups.com
Hi Kurt,

Once the mobile support is released (0.9.7), then Kinesis will be the next focus for our releases. But yes that's a way off. In the meantime, I have merged Matteo's PR (#689) and Phil's PR (#546) into a dedicated branch:

https://github.com/snowplow/snowplow/tree/feature/kinesis-sinks

That branch is forked off the 0.9.7 work, and in any case the sinks are siloed into their own folders, so they should be easy to merge in due course.

Cheers,

Alex