enrichment of unstruct events


Gil Danziger

Mar 31, 2016, 9:25:11 AM
to Snowplow
Hey,

I wonder how it is possible to add custom enrichment code for my unstruct events.
Assuming the event has already been processed by the pipeline, I'd now like to perform actions such as (de)normalizing fields, or run any other custom enrichment-style code.

I'm totally fine with compiling my own version of scala-common-enrich for this purpose, but I couldn't find the appropriate place in the code to perform such operations.

Best,
Gil.

Anton Parkhomenko

Mar 31, 2016, 9:37:35 AM
to snowpl...@googlegroups.com
Hello Gil,

It seems you have the following options:
1) Use the JavaScript enrichment. It is very powerful, but also dangerous and unsafe. However, it is the only option available out of the box.
2) Wait for r79, in which the new API Request Enrichment will let you send HTTP requests to your own REST server. It is declarative and much safer than the JS enrichment, but not as powerful, and it is not available yet.
3) If you're really okay with maintaining your own version of SCE, I advise you to look at the commits implementing the recent enrichments: weather, cookie extractor and API request. These should give you a basic idea of how we implement enrichments, but feel free to ask about the details.

Best regards,
Anton

On 31 March 2016, at 16:25, Gil Danziger <gil....@gmail.com> wrote:

--
You received this message because you are subscribed to the Google Groups "Snowplow" group.
To unsubscribe from this group and stop receiving emails from it, send an email to snowplow-use...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Gil Danziger

Mar 31, 2016, 9:56:31 AM
to Snowplow
Thanks.

That's basically what I had in mind.
My main problem with the JS enrichment (well, apart from it not being safe) is that it creates a new context, which results in a new table.

I'm looking for a way to simply extend the unstruct event table with the enriched fields, rather than creating a new context/table, in the same style that the Snowplow enrichments work, so I guess option #3 is the right way.

Looking at the existing enrichments (e.g. weather): since I want to change field values of my unstruct event, does the current enrichment process extract the unstruct event in a way that would allow me to change them? From what I've seen, EnrichmentManager changes the events (main) table directly but doesn't really handle the unstruct event data.

Thanks,
Gil.

Anton Parkhomenko

Mar 31, 2016, 10:57:18 AM
to snowpl...@googlegroups.com
Hey Gil,

First of all, EnrichedEvent is a mutable object (it has vars inside, not vals). Theoretically, when we pass the EnrichedEvent object to the JS enrichment, we can mutate it (the unstruct_event property is a String containing the JSON in which your event is stored). EnrichmentManager mutates EnrichedEvent's properties in this way. But, to say the very least, this kind of destructive update is not something we advise even as a last resort, as it goes against the core idea of non-destructive enrichment as a middle JOIN (as opposed to the eager JOIN in the tracker and the late JOIN in Redshift). You will inevitably end up with a full enriched/bad bucket and an unsupported version of SCE. So it would probably be better to reconsider your requirements, if that is possible.
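To make the non-destructive style concrete, here is a minimal Python sketch. `EventSketch` is a hypothetical stand-in for the mutable EnrichedEvent, not Snowplow's actual class, and the sketch assumes the usual Snowplow envelope in which unstruct_event holds a self-describing JSON whose `data` is itself the self-describing event. The enrichment only reads the event and attaches a new derived context; the schema URI `iglu:com.acme/status_label/jsonschema/1-0-0` and the `status` field are invented for illustration.

```python
import json


class EventSketch:
    """Hypothetical stand-in for a mutable enriched event (not Snowplow's class)."""

    def __init__(self, unstruct_event):
        self.unstruct_event = unstruct_event  # JSON string, as described above
        self.derived_contexts = []            # self-describing dicts added by enrichments


def add_status_context(event):
    """Non-destructive enrichment: read the unstruct event and attach a new
    context, leaving the unstruct_event string exactly as it was."""
    envelope = json.loads(event.unstruct_event)
    payload = envelope["data"]["data"]  # the event's own fields
    label = "active" if payload.get("status") == 1 else "other"
    event.derived_contexts.append({
        # Hypothetical schema URI for the derived context
        "schema": "iglu:com.acme/status_label/jsonschema/1-0-0",
        "data": {"statusLabel": label},
    })
    return event
```

The trade-off Gil raises next is visible here: the derived value lives in its own context (and hence its own table) rather than alongside the original fields.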

If you still want to do this (with your own enrichment), you may find these changes I made in the upcoming r79 useful. With them, you can extract a JsonNode with your unstruct event data, modify it and set it back using event.setUnstruct_event().

Despite all this strong discouragement, may I ask (just out of curiosity) what exact use case led you to the idea of mutating the unstruct event during enrichment?
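The extract, modify and set-back cycle described here can be sketched in Python as follows. This is not the r79 code: the JsonNode handling is replaced by the standard `json` module, `EventSketch` is a hypothetical class whose `get_unstruct_event`/`set_unstruct_event` pair merely mirrors the getter/setter naming in the thread, and the envelope layout (outer `data` holding the self-describing event) is assumed.

```python
import json


class EventSketch:
    """Hypothetical mutable event exposing the unstruct event as a JSON string."""

    def __init__(self, unstruct_event):
        self._unstruct_event = unstruct_event

    def get_unstruct_event(self):
        return self._unstruct_event

    def set_unstruct_event(self, value):
        self._unstruct_event = value


def rewrite_unstruct_event(event, transform):
    """Destructive update: parse the JSON string, replace the inner event data
    with transform(data), then serialize the result and write it back."""
    envelope = json.loads(event.get_unstruct_event())
    inner = envelope["data"]  # self-describing event: {"schema": ..., "data": ...}
    inner["data"] = transform(dict(inner["data"]))
    event.set_unstruct_event(json.dumps(envelope))
    return event
```

For example, `rewrite_unstruct_event(e, lambda d: {**d, "price": d["price_cents"] / 100})` adds a `price` field in place. The sketch does no schema validation, so nothing stops the rewritten payload from drifting away from the event's declared schema, which is one concrete form of the risk Anton describes.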

Cheers,
Anton

On 31 March 2016, at 16:56, Gil Danziger <gil....@gmail.com> wrote:

Gil Danziger

Mar 31, 2016, 11:51:19 AM
to snowpl...@googlegroups.com
Thanks for the detailed explanation! It's all very clear now.

I agree that when doing a typical enrichment (weather, geolocation data, etc.) you don't want to mess with your original entity; it's just bad practice.

I'm more interested in a process that denormalizes data (e.g. converting code fields into strings) or decodes it, and in some cases copies a few fields from another object. These could of course be represented in a separate context/table, but that is not the natural way to represent the data, especially since the extra fields aren't really grouped into a single context.
Those extra fields are unknown during (or hidden from) the event tracking stage; they relate to the source that tracks the event.
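The kind of denormalization described here might look like the following Python sketch. The field names (`country_code`, `source_name`, `source_region`) and the lookup table are hypothetical; the table happens to use ISO 3166 numeric country codes purely as an example of "code fields". The function returns a new payload rather than mutating its input.

```python
# Hypothetical lookup table: numeric codes -> human-readable labels.
COUNTRY_NAMES = {"276": "Germany", "840": "United States"}

# Hypothetical fields known only to the tracking source, not to the tracker.
COPIED_FIELDS = ("source_name", "source_region")


def denormalize(payload, source_info):
    """Return a copy of payload with coded fields decoded into strings and
    selected fields copied over from a second object (source_info)."""
    out = dict(payload)
    code = out.get("country_code")
    if code is not None:
        out["country_name"] = COUNTRY_NAMES.get(code, "unknown")
    for key in COPIED_FIELDS:
        if key in source_info:
            out[key] = source_info[key]
    return out
```

Because the result is flat, the decoded and copied values sit next to the original fields, which is the shape Gil wants instead of a separate context.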

I guess a more correct design for my case would be some kind of pipeline that collects the events into a staging stream, then transforms them as needed and stores them in the stream processed by the enrichment step.

But since the collector and the enricher both use the Thrift event format and are easily connectable, I found the enrichment process to be the best/simplest place to fetch and use those fields, also in terms of performance and throughput.
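The staging-stream alternative mentioned above can be sketched with in-process queues standing in for the streams. This is a simplification under stated assumptions: events are modeled as JSON strings with a `data` object, whereas the real collector stream carries Thrift-serialized payloads that this sketch does not decode, and `run_staging_step` is a hypothetical name.

```python
import json
import queue


def run_staging_step(staging_stream, enrich_input, transform):
    """Drain the staging stream, apply transform to each event's data and
    forward the result to the stream that the enrichment step consumes."""
    while True:
        try:
            raw = staging_stream.get_nowait()
        except queue.Empty:
            break  # nothing left to process in this batch
        event = json.loads(raw)
        event["data"] = transform(event["data"])
        enrich_input.put(json.dumps(event))
```

The cost Gil points to shows up as the extra hop: every event is deserialized, transformed and re-serialized once more before enrichment even starts.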

Best,
Gil.
