EMR ETL Runner: java.lang.IllegalStateException: Joda-time 2.2 or later version is required, but found version: null


Daniel Zohar

Jun 5, 2015, 5:00:46 AM
to snowpl...@googlegroups.com
Hi guys,

I'm getting this error today when running the enrichment step.

java.lang.IllegalStateException: Joda-time 2.2 or later version is required, but found version: null
    at com.amazonaws.util.DateUtils.handleException(DateUtils.java:147)
    at com.amazonaws.util.DateUtils.parseRFC822Date(DateUtils.java:195)
    at com.amazonaws.services.s3.internal.ServiceUtils.parseRfc822Date(ServiceUtils.java:73)
    at com.amazonaws.services.s3.internal.AbstractS3ResponseHandler.populateObjectMetadata(AbstractS3ResponseHandler.java:115)
    at com.amazonaws.services.s3.internal.S3MetadataResponseHandler.handle(S3MetadataResponseHandler.java:32)
    at com.amazonaws.services.s3.internal.S3MetadataResponseHandler.handle(S3MetadataResponseHandler.java:25)
    at com.amazonaws.http.AmazonHttpClient.handleResponse(AmazonHttpClient.java:974)
    at com.amazonaws.http.AmazonHttpClient.executeOneRequest(AmazonHttpClient.java:701)
    at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:460)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:295)
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3736)
    at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:1027)
    at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:1005)
    at com.amazon.ws.emr.hadoop.fs.s3n.Jets3tNativeFileSystemStore.retrieveMetadata(Jets3tNativeFileSystemStore.java:199)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
    at com.sun.proxy.$Proxy30.retrieveMetadata(Unknown Source)
    at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.getFileStatus(S3NativeFileSystem.java:743)
    at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1402)
    at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.create(S3NativeFileSystem.java:637)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:910)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:803)
    at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.create(EmrFileSystem.java:186)
    at org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:133)
    at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.init(MapTask.java:822)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:425)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:170)
Caused by: java.lang.IllegalArgumentException: Invalid format: "Fri, 05 Jun 2015 08:24:51 GMT" is malformed at "GMT"
    at org.joda.time.format.DateTimeFormatter.parseMillis(DateTimeFormatter.java:747)
    at com.amazonaws.util.DateUtils.parseRFC822Date(DateUtils.java:193)


Googling around, the only thing I found was this thread: http://mail-archives.us.apache.org/mod_mbox/spark-user/201504.mbox/%3CCADRmTZJm3r5+6B6zq2HXqy9...@mail.gmail.com%3E

There, the issue was resolved by moving from AMI version 3.6.0 to 3.5.0. I tried that, but I'm still getting the same error.

Any idea?

Alex Dean

Jun 5, 2015, 5:04:39 AM
to snowpl...@googlegroups.com

Which EMR job step is throwing the exception, and how far into its operation?

A


Daniel Zohar

Jun 5, 2015, 5:20:39 AM
to snowpl...@googlegroups.com
Hi Alex,

It fails on "Elasticity Scalding Step: Enrich Raw Events".

It fails about 8 minutes into the process. My guess is that it gets through some records fine, but fails as soon as it stumbles upon a record containing a GMT timestamp.

Do you want me to attach more logs or anything else?

Cheers


Alex Dean

Jun 5, 2015, 5:54:46 AM
to snowpl...@googlegroups.com
Hmm,

From the stack trace, your Enrichment process is reading some object or objects from S3 about 8 minutes into the run... I'd like to reproduce the error with our r66 jobs, so I need to figure out what is different between your runs and ours:
  1. Are you running with the --debug flag?
  2. Are you using s3:// paths in the geo-IP enrichment?
  3. Anything else you can think of which is non-standard in your runs?
  4. What did you change between today and yesterday?

Cheers,

Alex







--
Co-founder
Snowplow Analytics
The Roma Building, 32-38 Scrutton Street, London EC2A 4RQ, United Kingdom
+44 (0)203 589 6116
+44 7881 622 925
@alexcrdean

Daniel Zohar

Jun 5, 2015, 6:04:43 AM
to snowpl...@googlegroups.com
1. No, but I can run that now. What am I looking for?
2. Yes, I'm using the ip_lookups.json from the repository.
3. Nope - it had all been working fine for a few days, until today
4. Yes. I changed :continue_on_unexpected_error: from false to true, because I was encountering another error. I found this thread where you recommended enabling the error bucket. The logs for that error are below.



2015-06-05 07:36:19,947 INFO [IPC Server handler 33 on 35925] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Diagnostics report from attempt_1433489536106_0003_m_000010_0: Error: cascading.pipe.OperatorException: [com.snowplowanalytics....][com.twitter.scalding.RichPipe.each(RichPipe.scala:471)] operator Each failed executing operation
    at cascading.flow.stream.FunctionEachStage.receive(FunctionEachStage.java:107)
    at cascading.flow.stream.FunctionEachStage.receive(FunctionEachStage.java:39)
    at cascading.flow.stream.FunctionEachStage$1.collect(FunctionEachStage.java:80)
    at cascading.tuple.TupleEntryCollector.safeCollect(TupleEntryCollector.java:145)
    at cascading.tuple.TupleEntryCollector.add(TupleEntryCollector.java:133)
    at com.twitter.scalding.FlatMapFunction$$anonfun$operate$2.apply(Operations.scala:48)
    at com.twitter.scalding.FlatMapFunction$$anonfun$operate$2.apply(Operations.scala:46)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    at com.twitter.scalding.FlatMapFunction.operate(Operations.scala:46)
    at cascading.flow.stream.FunctionEachStage.receive(FunctionEachStage.java:99)
    at cascading.flow.stream.FunctionEachStage.receive(FunctionEachStage.java:39)
    at cascading.flow.stream.SourceStage.map(SourceStage.java:102)
    at cascading.flow.stream.SourceStage.run(SourceStage.java:58)
    at cascading.flow.hadoop.FlowMapper.run(FlowMapper.java:130)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:432)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:170)
Caused by: java.lang.NullPointerException
    at com.snowplowanalytics.snowplow.enrich.common.utils.JsonUtils$.stripInstanceEtc(JsonUtils.scala:240)
    at com.snowplowanalytics.snowplow.enrich.common.utils.JsonUtils$.extractJson(JsonUtils.scala:204)
    at com.snowplowanalytics.snowplow.enrich.common.utils.JsonUtils$.validateAndReformatJson(JsonUtils.scala:189)
    at com.snowplowanalytics.snowplow.enrich.common.utils.JsonUtils$$anonfun$1.apply(JsonUtils.scala:59)
    at com.snowplowanalytics.snowplow.enrich.common.utils.JsonUtils$$anonfun$1.apply(JsonUtils.scala:58)
    at com.snowplowanalytics.snowplow.enrich.common.enrichments.EnrichmentManager$$anonfun$4.apply(EnrichmentManager.scala:103)
    at com.snowplowanalytics.snowplow.enrich.common.enrichments.EnrichmentManager$$anonfun$4.apply(EnrichmentManager.scala:103)
    at com.snowplowanalytics.snowplow.enrich.common.utils.MapTransformer$$anonfun$1.apply(MapTransformer.scala:158)
    at com.snowplowanalytics.snowplow.enrich.common.utils.MapTransformer$$anonfun$1.apply(MapTransformer.scala:155)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.immutable.HashMap$HashMap1.foreach(HashMap.scala:224)
    at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:403)
    at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:403)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
    at scala.collection.AbstractTraversable.map(Traversable.scala:105)
    at com.snowplowanalytics.snowplow.enrich.common.utils.MapTransformer$.com$snowplowanalytics$snowplow$enrich$common$utils$MapTransformer$$_transform(MapTransformer.scala:155)
    at com.snowplowanalytics.snowplow.enrich.common.utils.MapTransformer$TransformableClass.transform(MapTransformer.scala:132)
    at com.snowplowanalytics.snowplow.enrich.common.enrichments.EnrichmentManager$.enrichEvent(EnrichmentManager.scala:193)
    at com.snowplowanalytics.snowplow.enrich.common.EtlPipeline$$anonfun$1$$anonfun$apply$1$$anonfun$apply$2$$anonfun$apply$3.apply(EtlPipeline.scala:81)
    at com.snowplowanalytics.snowplow.enrich.common.EtlPipeline$$anonfun$1$$anonfun$apply$1$$anonfun$apply$2$$anonfun$apply$3.apply(EtlPipeline.scala:80)
    at scalaz.NonEmptyList$class.map(NonEmptyList.scala:29)
    at scalaz.NonEmptyListFunctions$$anon$4.map(NonEmptyList.scala:164)
    at com.snowplowanalytics.snowplow.enrich.common.EtlPipeline$$anonfun$1$$anonfun$apply$1$$anonfun$apply$2.apply(EtlPipeline.scala:80)
    at com.snowplowanalytics.snowplow.enrich.common.EtlPipeline$$anonfun$1$$anonfun$apply$1$$anonfun$apply$2.apply(EtlPipeline.scala:78)
    at scalaz.Validation$class.map(Validation.scala:114)
    at scalaz.Success.map(Validation.scala:329)
    at com.snowplowanalytics.snowplow.enrich.common.EtlPipeline$$anonfun$1$$anonfun$apply$1.apply(EtlPipeline.scala:78)
    at com.snowplowanalytics.snowplow.enrich.common.EtlPipeline$$anonfun$1$$anonfun$apply$1.apply(EtlPipeline.scala:76)
    at scala.Option.map(Option.scala:145)
    at com.snowplowanalytics.snowplow.enrich.common.EtlPipeline$$anonfun$1.apply(EtlPipeline.scala:76)
    at com.snowplowanalytics.snowplow.enrich.common.EtlPipeline$$anonfun$1.apply(EtlPipeline.scala:74)
    at scalaz.Validation$class.map(Validation.scala:114)
    at scalaz.Success.map(Validation.scala:329)
    at com.snowplowanalytics.snowplow.enrich.common.EtlPipeline$.processEvents(EtlPipeline.scala:74)
    at com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob$$anonfun$7.apply(EtlJob.scala:172)
    at com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob$$anonfun$7.apply(EtlJob.scala:171)
    at com.twitter.scalding.MapFunction.operate(Operations.scala:58)
    at cascading.flow.stream.FunctionEachStage.receive(FunctionEachStage.java:99)
    ... 21 more
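
(For reference, this is roughly where that flag sits in my EmrEtlRunner config.yml - a minimal sketch from memory, with the other fields omitted:)

    :etl:
      # flipped from false to true so the job keeps going past rows that hit unexpected errors
      :continue_on_unexpected_error: true
      # :job_name:, :versions:, :collector_format: etc. unchanged and omitted here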








Daniel Zohar

Jun 5, 2015, 6:08:08 AM
to snowpl...@googlegroups.com
I was looking into ip_lookups.json - it's actually using "uri": "http://snowplow-hosted-assets.s3.amazonaws.com/third-party/maxmind". But as I said, it's taken straight from the repo and was working previously.
...

Alex Dean

Jun 5, 2015, 6:23:17 AM
to snowpl...@googlegroups.com
Hi Daniel,

Aha - 4. is the relevant one. Basically, when you continue on unexpected error, Cascading has an "escape hatch" which writes the offending input row out to a path. That path is on S3, so Cascading is triggering the AWS S3 file code, which is unhappy about Joda-Time for some reason. (Note that Snowplow's fatjar bundles Joda-Time 2.1.)

That's why we haven't seen the error yet on our side. It's particularly odd as the job is happily running other AWS S3 code - for example, it's able to write the bad rows out successfully.

I haven't tried this, but you might be able to work around it by setting your error bucket to hdfs:///local/snowplow/error-events/
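
Something like this in your config.yml's buckets section (untested sketch, off the top of my head - the bucket names are placeholders):

    :s3:
      :buckets:
        :enriched:
          :good: s3://acme-snowplow/enriched/good          # placeholder
          :bad: s3://acme-snowplow/enriched/bad            # placeholder
          :errors: hdfs:///local/snowplow/error-events/    # write error rows to the cluster's ephemeral HDFS instead of S3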

Relevant tickets:

https://github.com/snowplow/snowplow/issues/1747
https://github.com/snowplow/snowplow/issues/1748
https://github.com/snowplow/snowplow/issues/1622

A



Daniel Zohar

Jun 5, 2015, 6:44:12 AM
to snowpl...@googlegroups.com
Thanks Alex. Will try it now. The error bucket being "buckets.enriched.errors", right?

Alex Dean

Jun 5, 2015, 6:52:18 AM
to snowpl...@googlegroups.com
That's the one!

Daniel Zohar

Jun 5, 2015, 8:19:33 AM
to snowpl...@googlegroups.com
Great news - that did the job!

Just want to make sure I understand what was happening:
* Some entries fail to be processed because of some IE URL length limitation
* Which failed the entire job because continue_on_unexpected_error was set to false
* When that was set to true, a new error surfaced, related to Amazon's support for some timestamp format used by Snowplow when storing the failed logs on S3
* That was solved by setting the error output to the ephemeral HDFS storage

Is that pretty much it?

I'm guessing the only problem with this solution is that bad rows are essentially gone as soon as the cluster shuts down?

Alex Dean

Jun 7, 2015, 7:39:17 PM
to snowpl...@googlegroups.com
That's pretty much it:

* Some entries failed JSON parsing because of some IE URL length limitation
* These failures caused an unexpected error (i.e. one not safely captured in the bad bucket) because of https://github.com/snowplow/snowplow/issues/1622
* Which failed the entire job because continue_on_unexpected_error was set to false
* When that was set to true, a new error surfaced, which is related to a bug in Amazon's S3 Java library as installed in the 3.x AMI - a bug which only occurs when Cascading's error trap code tries to write to S3
* That was solved by setting the error output to the ephemeral HDFS storage

> bad rows are essentially gone as soon as the cluster shuts down?

Correct - it's not a long-term solution. If you update your config to:

:hadoop_enrich: 1.0.0

you can try the new version, which fixes these two bugs:

https://github.com/snowplow/snowplow/issues/1748
https://github.com/snowplow/snowplow/issues/1622
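
i.e. something like this in your config.yml (sketch only, assuming your config keeps the versions under :etl: as in the standard sample - other entries unchanged):

    :etl:
      :versions:
        :hadoop_enrich: 1.0.0   # bump from whatever version you are running today
        # :hadoop_shred: leave as-is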

Cheers,

Alex

Leandro Kersting de Freitas

Jun 24, 2015, 10:26:30 AM
to snowpl...@googlegroups.com
In my case, with Maven 3.3.3 and WildFly 9.0.0-RC2, I needed to add the following to pom.xml:

    <dependency>
       <groupId>joda-time</groupId>
       <artifactId>joda-time</artifactId>
       <version>2.8.1</version>
    </dependency>

This solved my problem.


If you look at the source code of aws-sdk-java, you will see the following validation:


    /**
     * Returns the original runtime exception iff the joda-time being used
     * at runtime behaves as expected.
     *
     * @throws IllegalStateException if the joda-time being used at runtime
     * doens't appear to be of the right version.
     */
    private static <E extends RuntimeException> E handleException(E ex) {
        if (JodaTime.hasExpectedBehavior())
            return ex;
        throw new IllegalStateException("Joda-time 2.2 or later version is required, but found version: " + JodaTime.getVersion(), ex);
    }
...

Alex Dean

Jun 24, 2015, 6:06:50 PM
to snowpl...@googlegroups.com
Thanks for the additional insight Leandro!

Alex
