Issues with EMR after upgrade to Snowplow 63

dani...@simplybusiness.co.uk

Apr 15, 2015, 11:11:33 AM

to snowpl...@googlegroups.com

Hello all,
Today we upgraded our Snowplow installation from v60 to v63 (Red-Cheeked Cordon-Bleu) and tried to reprocess all our data. We started seeing failed task attempts in the job tracker, and eventually the job (enrich step) failed. We also noticed that the job was running much more slowly: it usually finishes in 20 minutes, but this time it died after 50 minutes with only about 50% of the mappers completed. We repeated the process 3 times in case it was an Amazon issue, each time with the same result. We then downgraded to v60 and the reprocessing finished normally.

The stack traces of the failed attempts were all the same:

java.lang.Throwable: Child Error
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:271)
Caused by: java.io.IOException: Task process exit with nonzero status of 137.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:258)

Searching online for this stack trace, it seems that Linux is killing the JVM because it is using too much memory. Have you seen this issue too? What could have caused this increase in memory usage? Any insights would be highly appreciated.
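
For reference, an exit status above 128 conventionally means the process was terminated by a signal numbered (status - 128), so 137 corresponds to signal 9 (SIGKILL), which is the signal the Linux OOM killer sends. A minimal sketch of that arithmetic follows; `ExitStatus` and its methods are hypothetical names used only for illustration and are not part of Hadoop or Snowplow:

```scala
// Hypothetical helper (not part of Hadoop or Snowplow): decode a shell-style
// exit status into the signal number that terminated the process, if any.
object ExitStatus {
  // By shell convention, a status of 128 + n means "killed by signal n".
  def terminatingSignal(status: Int): Option[Int] =
    if (status > 128 && status < 256) Some(status - 128) else None

  // Names for a few common Linux signals; anything else is reported as unknown.
  private val names = Map(9 -> "SIGKILL", 11 -> "SIGSEGV", 15 -> "SIGTERM")

  def describe(status: Int): String =
    terminatingSignal(status) match {
      case Some(sig) => s"killed by signal $sig (${names.getOrElse(sig, "unknown")})"
      case None      => s"exited with status $status"
    }
}
```

Calling `ExitStatus.describe(137)` reports a kill by signal 9, which is consistent with the out-of-memory explanation but does not prove it on its own: anything that sends SIGKILL to the task JVM produces the same status.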

Thanks,
Dani

dani...@simplybusiness.co.uk

Apr 15, 2015, 11:19:41 AM

to snowpl...@googlegroups.com

Forgot to mention that the instance type we're using is c3.xlarge.

Alex Dean

Apr 15, 2015, 11:26:41 AM

to snowpl...@googlegroups.com

Hi Dani,

Did you upgrade from Scala Hadoop Enrich 0.12.0 to 0.14.0? I suspect the problem is somehow related to 0.14.0 - we have large customers on 0.13.0 and haven't seen any issues.

There's a lot of new functionality in Scala Hadoop Enrich 0.14.0! Let's try and narrow things down:

Have you enabled any of the new enrichments?
Have you changed anything else (e.g. sending new event types or cross-domain-linking)?

Thanks,

Alex

--

Co-founder
Snowplow Analytics
The Roma Building, 32-38 Scrutton Street, London EC2A 4RQ, United Kingdom
+44 (0)203 589 6116
+44 7881 622 925
@alexcrdean

dani...@simplybusiness.co.uk

Apr 15, 2015, 11:38:57 AM

to snowpl...@googlegroups.com

Hi Alex,
Yes, we upgraded from 0.12.0 to 0.14.0. The only enrichment we have enabled is the new user agent parsing; we'll try to rerun the job (in a few days' time) with this enrichment disabled to see if that makes a difference. Now that you mention it, it might well be that all those new regexes use a lot of memory. Also, I was checking its implementation, and creating a new parser per event (which parses the YAML file and loads all the regexes) might also be adding pressure on the GC?

Thanks for the fast answer!
Dani

Alex Dean

Apr 15, 2015, 11:49:55 AM

to snowpl...@googlegroups.com

Ouch - that will be it. New Parser per event is not sane - shame we didn't pick that up in code review. We will get that fixed in the next release!

https://github.com/snowplow/snowplow/issues/1616
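
To illustrate the difference between building a parser per event and reusing one, here is a minimal sketch. `ExpensiveParser`, `ParserStats`, `PerEvent` and `Shared` are hypothetical stand-ins, not the real user-agent parser or the Snowplow enrichment code, and the sketch assumes the parser is safe to reuse across events:

```scala
import scala.util.matching.Regex

// Counts how many times the stand-in parser is constructed.
object ParserStats {
  var constructions = 0
}

// Stand-in for a parser whose constructor is expensive (for example one that
// reads a YAML file and compiles many regexes). NOT the real parser API.
final class ExpensiveParser {
  ParserStats.constructions += 1

  // Compiled once per instance, so every new instance pays this cost again.
  private val patterns: List[(String, Regex)] =
    List("Firefox", "Chrome", "Safari").map(name => (name, (name + "/([0-9]+)").r))

  // Returns the first browser family whose pattern matches, if any.
  def parse(userAgent: String): Option[String] =
    patterns.collectFirst {
      case (name, regex) if regex.findFirstIn(userAgent).isDefined => name
    }
}

// The problematic shape: a fresh parser (and fresh regexes) for every event.
object PerEvent {
  def enrich(userAgent: String): Option[String] =
    new ExpensiveParser().parse(userAgent)
}

// One possible alternative: build the parser lazily once and reuse it.
object Shared {
  private lazy val parser = new ExpensiveParser
  def enrich(userAgent: String): Option[String] = parser.parse(userAgent)
}
```

With the per-event shape, every call allocates and then discards a full set of compiled patterns, which is the kind of churn that could plausibly add memory and GC pressure; the shared shape pays that cost once per JVM.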

A

dani...@simplybusiness.co.uk

Apr 15, 2015, 11:54:18 AM

to snowpl...@googlegroups.com

Thanks Alex, I'll let you know how it goes when we try to reprocess again with this enrichment disabled.

Dani

Alex Dean

Apr 15, 2015, 11:59:17 AM

to snowpl...@googlegroups.com

Cheers Dani,

The next release should be out this week so hopefully you will be able to complete the upgrade shortly...

A

Robert Kingston

Apr 15, 2015, 9:48:10 PM

to snowpl...@googlegroups.com

Hi guys,

We're encountering similar problems on r62 (Scala Hadoop Enrich 0.13.0). After nearly 2 hours the job died at Enrich Raw Events (2x m1.small and 1x m3.xlarge).

Enrichment worked fine until we enabled campaign attribution. The stderr output is:

Exception in thread "main" cascading.flow.FlowException: step failed: (1/2) .../snowplow/enriched-events, with job id: job_201504152123_0002, please see cluster logs for failure messages
	at cascading.flow.planner.FlowStepJob.blockOnJob(FlowStepJob.java:221)
	at cascading.flow.planner.FlowStepJob.start(FlowStepJob.java:149)
	at cascading.flow.planner.FlowStepJob.call(FlowStepJob.java:124)
	at cascading.flow.planner.FlowStepJob.call(FlowStepJob.java:43)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:724)

Not sure if this helps in any way. Would be more than happy to share additional details.

Cheers,

Rob

Alex Dean

Apr 16, 2015, 3:27:30 AM

to snowpl...@googlegroups.com

Hi Rob,

Hmm - we are running 0.13.0 with campaign_attribution enabled for many customers and haven't seen any issues. So I'm not sure it's the same problem...

Could you start a new thread with some diagnostics?

Cheers,

Alex

Robert Kingston

Apr 16, 2015, 11:51:42 PM

to snowpl...@googlegroups.com

Based on further tests, I believe it is different. Posting details in another thread.

dani...@simplybusiness.co.uk

Apr 20, 2015, 10:17:12 AM

to snowpl...@googlegroups.com

Hi again,
I've tried the reprocessing again with the new user agent parsing disabled and it worked as it used to! Incidentally, I've found another issue in JsonUtils.stripInstanceEtc(): in some cases the message was null and this method was throwing an NPE. Just replacing `message` with `Option(message).getOrElse("")` fixed it. If you're interested, I can try to find out which exception caused the message to be null.
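
A minimal sketch of that null guard, for anyone following along. The body of JsonUtils.stripInstanceEtc() is not shown in this thread, so `cleanUnsafe` and `cleanSafe` below are hypothetical stand-ins that only trim the message:

```scala
object NullSafeMessage {
  // Hypothetical stand-in for a message-cleaning helper; calling a method on
  // a null String throws a NullPointerException.
  def cleanUnsafe(message: String): String = message.trim

  // Option(null) is None, so getOrElse("") substitutes an empty string
  // before any method is called on the message.
  def cleanSafe(message: String): String = Option(message).getOrElse("").trim
}
```

The guard hides the null rather than explaining it, so it is still worth finding out which exception arrives without a message.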

Regards,
Dani

Alex Dean

Apr 20, 2015, 10:19:17 AM

to snowpl...@googlegroups.com

Hey Dani,

Ah good to hear about the fix. Yep we spotted that same NPE through Rob's thread late last week:

https://groups.google.com/forum/#!topic/snowplow-user/JuNn4BIz_EM

Cheers,

Alex

dani...@simplybusiness.co.uk

Apr 20, 2015, 10:21:55 AM

to snowpl...@googlegroups.com

Oh, I missed that one. Good to know that you're already on it :)

Thanks!
Dani
