Scalding: which version of Scala to choose for EMR jobs?


Lakshmi

June 26, 2015, 6:55:22 AM
To: cascadi...@googlegroups.com
Hello,

I would like to run my Scalding job on EMR using the latest AMI here (version 3.8):

http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/ami-versions-supported.html

Since this AMI uses Scala 2.11.1, I switched my Scalding jar and application jar to that version as well. But when I run my job on EMR, it fails with the following error:

Exception in thread "main" java.lang.NoSuchMethodError: scala.Predef$.ArrowAssoc(Ljava/lang/Object;)Ljava/lang/Object;
    at com.aggregation.job.DataAggregation$.<init>(DataAggregation.scala:30)
    at com.aggregation.job.DataAggregation$.<clinit>(DataAggregation.scala)
    at com.aggregation.job.DataAggregation.main(DataAggregation.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)


I made sure to run sbt clean and sbt assembly to update the Scala version in both projects. Has anyone run into this with jobs on EMR? Does the version of Hadoop have anything to do with it?


Thanks in advance!


Lakshmi

Ian O'Connell

June 26, 2015, 12:27:39 PM
To: cascadi...@googlegroups.com
This usually suggests you have a mix of Scala versions on the classpath. Build an Ivy resolve and see if anything looks like a Scala 2.10 dependency. There is probably one in there somewhere.
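
A NoSuchMethodError on scala.Predef$.ArrowAssoc typically shows up when code compiled for Scala 2.11 is run against a 2.10 scala-library, so it can help to look at the runtime classpath as well as the build. The following is a minimal shell sketch, not something taken from this thread: the function name is made up for illustration, and passing it the output of `hadoop classpath` assumes that command reflects what the job actually sees on the cluster.

```shell
# List every scala-library jar reachable from a colon-separated classpath.
# Assumption: list_scala_library_jars is an illustrative name, not an existing tool.
list_scala_library_jars() {
  # Split on ":" without letting the shell expand wildcard entries early.
  printf '%s\n' "$1" | tr ':' '\n' | while IFS= read -r entry; do
    case "$entry" in
      */\*)
        # Wildcard entry: the JVM expands "dir/*" to the jars in that directory.
        for jar in "${entry%/\*}"/*.jar; do
          if [ -e "$jar" ]; then printf '%s\n' "$jar"; fi
        done
        ;;
      *)
        printf '%s\n' "$entry"
        ;;
    esac
  done | grep -E '(^|/)scala-library-[0-9][^/]*\.jar$' | sort -u
}

# Possible usage on a cluster node (assumes the hadoop CLI is on the PATH):
#   list_scala_library_jars "$(hadoop classpath)"
```

If the listing shows both a 2.10.x and a 2.11.x scala-library, that would be consistent with the mixed-version explanation above.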

--
You received this message because you are subscribed to the Google Groups "cascading-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cascading-use...@googlegroups.com.
To post to this group, send email to cascadi...@googlegroups.com.
Visit this group at http://groups.google.com/group/cascading-user.
To view this discussion on the web visit https://groups.google.com/d/msgid/cascading-user/a3ba173a-72ba-41a6-adfd-64f6e4960b62%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Lakshmi

June 26, 2015, 3:28:37 PM
To: cascadi...@googlegroups.com
Hi Ian,

Thanks! I haven't built an ivy resolve before. Can you please point me to some instructions for it?

Is there usually an issue if the minor version differs between the EMR AMI (2.11.1) and my jar (2.11.6)? Even though I specify 2.11.1 in the Scalding project/Build.scala, the 'sbt assembly' command shows the output below. Is there something in the Scalding sbt build or configs that I need to update?


[info] downloading https://repo1.maven.org/maven2/org/scala-lang/scala-library/2.11.1/scala-library-2.11.1.jar ...
[info] [SUCCESSFUL ] org.scala-lang#scala-library;2.11.1!scala-library.jar (1323ms)
[info] downloading https://repo1.maven.org/maven2/org/mockito/mockito-all/1.8.5/mockito-all-1.8.5.jar ...
[info] [SUCCESSFUL ] org.mockito#mockito-all;1.8.5!mockito-all.jar (520ms)
[info] downloading https://repo1.maven.org/maven2/org/scalacheck/scalacheck_2.11/1.12.2/scalacheck_2.11-1.12.2.jar ...
[info] [SUCCESSFUL ] org.scalacheck#scalacheck_2.11;1.12.2!scalacheck_2.11.jar (456ms)
[info] downloading https://repo1.maven.org/maven2/org/scalatest/scalatest_2.11/2.2.4/scalatest_2.11-2.2.4.jar ...
[info] [SUCCESSFUL ] org.scalatest#scalatest_2.11;2.2.4!scalatest_2.11.jar(bundle) (1214ms)
[info] downloading https://repo1.maven.org/maven2/org/slf4j/slf4j-log4j12/1.6.6/slf4j-log4j12-1.6.6.jar ...
[info] [SUCCESSFUL ] org.slf4j#slf4j-log4j12;1.6.6!slf4j-log4j12.jar (354ms)
[info] downloading https://repo1.maven.org/maven2/org/scala-lang/scala-library/2.11.5/scala-library-2.11.5.jar ...
[info] [SUCCESSFUL ] org.scala-lang#scala-library;2.11.5!scala-library.jar (997ms)
[info] downloading https://repo1.maven.org/maven2/org/scala-lang/modules/scala-parser-combinators_2.11/1.0.2/scala-parser-combinators_2.11-1.0.2.jar ...
[info] [SUCCESSFUL ] org.scala-lang.modules#scala-parser-combinators_2.11;1.0.2!scala-parser-combinators_2.11.jar(bundle) (398ms)
[info] downloading https://repo1.maven.org/maven2/org/scala-lang/scala-reflect/2.11.2/scala-reflect-2.11.2.jar ...
[info] [SUCCESSFUL ] org.scala-lang#scala-reflect;2.11.2!scala-reflect.jar (854ms)



Now, I am not sure if I need to override any settings as I see 2 versions of scala-library going into the assembly step. 

Thanks,
Lakshmi

Oscar Boykin

June 26, 2015, 4:01:17 PM
To: cascadi...@googlegroups.com
Two versions of Scala 2.11 should not be a problem. A transitive dependency on 2.10 will be a problem. Search Google or Stack Overflow for how to find all your transitive dependencies.
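
Once a dependency tree has been dumped to a text file (such as the sbt-dependency-tree.txt attached later in this thread), a quick grep can flag anything built for 2.10. This is only a sketch under assumptions: the function name is hypothetical, and it assumes the tree lists artifacts as group:artifact_binaryVersion:version, which is the usual sbt cross-version naming but is not shown in the thread.

```shell
# Print (with line numbers) every line of a dependency tree dump that refers to
# a Scala 2.10 artifact. Exits non-zero when nothing matches, like grep itself.
# Assumption: find_scala_210_deps is an illustrative name; the input format is assumed.
find_scala_210_deps() {
  # First alternative: cross-built artifacts carrying the "_2.10" suffix.
  # Second alternative: the Scala runtime artifacts themselves at a 2.10.x version.
  grep -nE '_2\.10([^0-9]|$)|org\.scala-lang:scala-(library|reflect|compiler):2\.10\.' "$1"
}

# Possible usage:
#   find_scala_210_deps sbt-dependency-tree.txt
```

Multiple 2.11.x patch releases in the tree would not be reported, which matches the point above that only a 2.10 dependency is expected to cause trouble.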


--
Oscar Boykin :: @posco :: http://twitter.com/posco

Lakshmi Gopalan

June 26, 2015, 4:26:04 PM
To: cascadi...@googlegroups.com
Thank you, Oscar. I'll try that.

Lakshmi


Lakshmi

June 26, 2015, 7:53:14 PM
To: cascadi...@googlegroups.com
I have tried cleaning, rebuilding, and verifying that I have the right version of Scala everywhere, but I still get the error. Attached is my dependency tree, generated with the sbt dependency tree tool. I also posted this in the scala-user forum and am waiting for answers, but I figured I would check here as well, since I am not sure where this is breaking.

Thanks, 
Lakshmi
sbt-dependency-tree.txt

Lakshmi

June 28, 2015, 12:25:39 PM
To: cascadi...@googlegroups.com
Hi,

I would really appreciate help with this issue. I have spent the last 48+ hours trying to debug the versioning error above. All the tutorials for Scalding on EMR seem to use older Scalding and Scala versions, so I guess they don't apply to the latest Amazon EMR AMI, which uses Scala 2.11.1 (link: http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/ami-versions-supported.html).

Based on my project setup below, can someone please confirm if my scalding/scala version combination will work for the AMI scala version of 2.11.1 or 2.11.5? Also - JVM version on my mac is "1.7.0_80" while the one on the EMR instance is "1.7.0_76". Is this something to worry about?


Exception in thread "main" java.lang.NoSuchMethodError: scala.Predef$.ArrowAssoc(Ljava/lang/Object;)Ljava/lang/Object;
    at com.aggregation.job.DataAggregation$.<init>(DataAggregation.scala:30)



Either way, here is what I have tried:

- manually upgraded the scala binary installation in the Amazon EMR instance to 2.11.5
- compiled my project with 2.11.2, because one of the transitive dependencies inside scalding json uses scala-reflect:2.11.2 instead of 2.11.1  
- tried to play around with the jvm "target version" by setting it to 1.7 although this probably doesn't help as scala is targeted at java 6

I have pasted the current state of my project below.


- dependencies.sbt

val hadoopVersion = "1.2.1"

val scaldingVersion = "0.15.0"

libraryDependencies ++= Seq(
  "com.twitter" %% "scalding-core" % scaldingVersion,
  "com.twitter" %% "scalding-json" % scaldingVersion,
  "com.twitter" %% "scalding-jdbc" % scaldingVersion,
  "com.github.nscala-time" %% "nscala-time" % "2.0.0",
  // include Hadoop runtime to run locally in "local" mode
  //"org.apache.hadoop" % "hadoop-core" % hadoopVersion
  // to run on hadoop in "hdfs" mode, replace above with the following to exclude Hadoop from assembly jar
  "org.apache.hadoop" % "hadoop-core" % hadoopVersion % "provided"
)

resolvers ++= Seq(
  "Conjars repo" at "http://conjars.org/repo"
)


- build.sbt

organization := "com.abc"

name := "aggregator"

scalaVersion := "2.11.2"

ivyScala := ivyScala.value map {
  _.copy(overrideScalaVersion = true)
}

javacOptions ++= Seq("-source", "1.6", "-target", "1.7") // not sure if I need this, but my Amazon EMR AMI is 3.8.0 and it uses Java 7.


- assembly.sbt

import AssemblyKeys._

assemblySettings

mergeStrategy in assembly := Merge.mergeStrategy


- project/build.properties

sbt.version=0.13.1


- project/assembly.sbt

addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.11.0")


Thanks,
Lakshmi

Lakshmi

June 28, 2015, 12:33:15 PM
To: cascadi...@googlegroups.com
Here is the latest sbt dependency tree for the project (attached)

...
sbt-dependency-tree.txt

Ryan Desmond

June 29, 2015, 11:29:28 AM
To: cascadi...@googlegroups.com
Hi Lakshmi,

I am able to run Scalding jobs on EMR using:

- EMR AMI 3.1.0
- Scalding 0.13.1
- Scala 2.10.4

Does this configuration work for you?

- Ryan



--
Ryan Desmond
Solutions Architect
Concurrent Inc.

Lakshmi

June 29, 2015, 1:42:49 PM
To: cascadi...@googlegroups.com
Hi Ryan,

Thanks! I will try it out because at this point I'll be glad to have anything working at all :). I am trying to start the new project at the latest version, so we don't have to deal with an upgrade immediately.

If you could please share any config files to reproduce a working environment, that would be very helpful.

In your experience, how different is the newer version of Scalding (0.15) from the one you are using?

Thanks again,
Lakshmi

Alex Dean

June 30, 2015, 4:55:54 AM
To: cascadi...@googlegroups.com
Hey Lakshmi,

We recently upgraded Snowplow to a recent AMI / Scalding / Hadoop. Here is the build file:

https://github.com/snowplow/snowplow/blob/master/3-enrich/scala-hadoop-enrich/project/BuildSettings.scala

Dependencies file:

https://github.com/snowplow/snowplow/blob/master/3-enrich/scala-hadoop-enrich/project/Dependencies.scala

This runs fine on AMI 3.6.0.

Here is the bootstrap we use to clean Scala 2.11 and old commons-codec from the cluster:

https://github.com/snowplow/snowplow/blob/master/3-enrich/emr-etl-runner/bin/snowplow-ami3-bootstrap.sh

Hope this is helpful,

Alex

Lakshmi Gopalan

June 30, 2015, 2:10:51 PM
To: cascadi...@googlegroups.com
Hi Alex,

Thank you, this will be very helpful to me. I was able to learn quickly from your example project as well. I'll try this out and get back.

Thanks,
Lakshmi


Lakshmi Gopalan

June 30, 2015, 2:26:11 PM
To: cascadi...@googlegroups.com
Alex,

I have some questions:

1. I am trying to build my Scalding job against the latest Scalding version, 0.15. Will this work?
2. Ideally I would like to use Scala 2.11 (meaning I won't need to clean Scala 2.11 from the cluster as you do in the bootstrap script). Will this work for 2.11 as well?

I can try it out, but want to know what to expect.

Thank you.

Lakshmi

Lakshmi

June 30, 2015, 4:54:09 PM
To: cascadi...@googlegroups.com
Based on the bootstrap script from Alex, I found that there were 2 versions of the Scala library hiding on the cluster, apart from the installed location for the 2.11 binaries. I got rid of the 2.10 jar on all instances of the cluster, and my job completed successfully. Now I will create a bootstrap action for this.

$ sudo find / -name "scala-library-2.10.*.jar" -exec rm -rf {} \;


FYI, these are the paths the jars were present under:

[ec2-user@ip-172-31-72-130 ~]$ sudo find / -name "scala-library-2.11.*.jar"
/home/hadoop/.versions/hbase-0.94.18/lib/scala-library-2.11.0.jar
/usr/share/doc/scala/api/jars/scala-library-2.11.1-javadoc.jar

[ec2-user@ip-172-31-72-130 ~]$ sudo find / -name "scala-library-2.10.*.jar"
/usr/share/aws/emr/emrfs/lib/scala-library-2.10.5.jar


I could not have guessed this. Thanks a bunch, Alex! 

Lakshmi
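
The bootstrap action mentioned above was not posted in the thread. A minimal shell sketch of what one might look like, wrapping the same find-and-remove command, is given here; the function name and the idea of passing a search root are assumptions made for illustration, and whether other cluster components rely on the removed jar is not something the thread establishes.

```shell
# Remove stray Scala 2.10 library jars under a search root, printing each path
# that is deleted so it shows up in the bootstrap log.
# Assumption: remove_scala_210_jars is an illustrative name, not part of EMR.
remove_scala_210_jars() {
  # -type f limits the match to regular files; 2.11 jars do not match the pattern.
  find "$1" -type f -name 'scala-library-2.10.*.jar' -print -exec rm -f {} +
}

# On a cluster node this would be run with root privileges, roughly as:
#   remove_scala_210_jars /
```

Taking the search root as a parameter makes it possible to try the function on a scratch directory before pointing it at / on a real node.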