Namenode connection conflict


Pitt Fagan

Apr 9, 2015, 10:10:18 AM
to geotrel...@googlegroups.com
Hi guys,

OK, so I have the Mesos-leader and one Mesos-follower up on AWS. Running the example of parallelizing a list of numbers and collecting a filtered list back to the driver (in the README file of the GitHub repo) works fine. When running the attached ingestion script, however, the rasters fail to be ingested into Accumulo. From the command line, if I run something like hadoop fs -ls /accumulo, I get back a directory listing, and I was able to create directories and place files in HDFS manually. I believe the issue is with the value of the CATALOG variable on L22 of the attached file. The current CATALOG value is 'hdfs://namenode.service.geotrellis-spark.internal/accumulo/data/catalog'. This directory exists in HDFS and is empty.
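
For example (the exact listing output is omitted here):

ubuntu@ip-10-0-1-42:~$ hadoop fs -ls /accumulo
(directory listing comes back as expected)
ubuntu@ip-10-0-1-42:~$ hadoop fs -ls /accumulo/data/catalog
(no output; the directory exists but is empty)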

Any assistance would be appreciated.

Thanks,
Pitt

Below is the entire output from the script.

ubuntu@ip-10-0-1-42:~$ python ./scripts/raster_processing.py
Input file size is 2591, 2502
0...10...20...30...40...50...60...70...80...90...100 - done.
Spark assembly has been built with Hive, including Datanucleus jars on classpath
13:55:55 Slf4jLogger: Slf4jLogger started
13:55:55 Remoting: Starting remoting
13:55:55 Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@zookeeper.service.geotrellis-spark.internal:42507]
13:55:55 NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
I0409 13:55:56.597615 16781 sched.cpp:137] Version: 0.21.1
2015-04-09 13:55:56,597:16658(0x7f0658fd4700):ZOO_INFO@log_env@712: Client environment:zookeeper.version=zookeeper C client 3.4.5
2015-04-09 13:55:56,597:16658(0x7f0658fd4700):ZOO_INFO@log_env@716: Client environment:host.name=ip-10-0-1-42
2015-04-09 13:55:56,597:16658(0x7f0658fd4700):ZOO_INFO@log_env@723: Client environment:os.name=Linux
2015-04-09 13:55:56,597:16658(0x7f0658fd4700):ZOO_INFO@log_env@724: Client environment:os.arch=3.13.0-48-generic
2015-04-09 13:55:56,597:16658(0x7f0658fd4700):ZOO_INFO@log_env@725: Client environment:os.version=#80-Ubuntu SMP Thu Mar 12 11:16:15 UTC 2015
2015-04-09 13:55:56,597:16658(0x7f0658fd4700):ZOO_INFO@log_env@733: Client environment:user.name=ubuntu
2015-04-09 13:55:56,597:16658(0x7f0658fd4700):ZOO_INFO@log_env@741: Client environment:user.home=/home/ubuntu
2015-04-09 13:55:56,598:16658(0x7f0658fd4700):ZOO_INFO@log_env@753: Client environment:user.dir=/home/ubuntu
2015-04-09 13:55:56,598:16658(0x7f0658fd4700):ZOO_INFO@zookeeper_init@786: Initiating client connection, host=zookeeper.service.geotrellis-spark.internal:2181 sessionTimeout=10000 watcher=0x7f065ae8e6a0 sessionId=0 sessionPasswd=<null> context=0x7f0654010cb0 flags=0
2015-04-09 13:55:56,600:16658(0x7f0650ff9700):ZOO_INFO@check_events@1703: initiated connection to server [10.0.1.42:2181]
2015-04-09 13:55:56,601:16658(0x7f0650ff9700):ZOO_INFO@check_events@1750: session establishment complete on server [10.0.1.42:2181], sessionId=0x14c7724138c0061, negotiated timeout=10000
I0409 13:55:56.602052 16782 group.cpp:313] Group process (group(1)@10.0.1.42:34543) connected to ZooKeeper
I0409 13:55:56.602093 16782 group.cpp:790] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0)
I0409 13:55:56.602123 16782 group.cpp:385] Trying to create path '/mesos' in ZooKeeper
I0409 13:55:56.602905 16782 detector.cpp:138] Detected a new leader: (id='2')
I0409 13:55:56.603024 16782 group.cpp:659] Trying to get '/mesos/info_0000000002' in ZooKeeper
I0409 13:55:56.603466 16786 detector.cpp:433] A new leading master (UPID=mas...@10.0.1.42:5050) is detected
I0409 13:55:56.603582 16782 sched.cpp:234] New master detected at mas...@10.0.1.42:5050
I0409 13:55:56.603708 16782 sched.cpp:242] No credentials provided. Attempting to register without authentication
I0409 13:55:56.604648 16783 sched.cpp:408] Framework registered with 20150401-224001-704708618-5050-1958-0086
Exception in thread "main" java.io.IOException: Failed on local exception: com.google.protobuf.InvalidProtocolBufferException: Message missing required fields: callId, status; Host Details : local host is: "ip-10-0-1-42/10.0.1.42"; destination host is: "namenode.service.geotrellis-spark.internal":8020;
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:760)
        at org.apache.hadoop.ipc.Client.call(Client.java:1229)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
        at com.sun.proxy.$Proxy15.getFileInfo(Unknown Source)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
        at com.sun.proxy.$Proxy15.getFileInfo(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:628)
        at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1532)
        at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:803)
        at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1332)
        at geotrellis.spark.io.hadoop.HdfsUtils$.ensurePathExists(HdfsUtils.scala:45)
        at geotrellis.spark.io.hadoop.HadoopCatalog$.apply(HadoopCatalog.scala:229)
        at geotrellis.spark.ingest.HadoopIngestCommand$.main(HadoopIngestCommand.scala:27)
        at geotrellis.spark.ingest.HadoopIngestCommand$.main(HadoopIngestCommand.scala:19)
        at com.quantifind.sumac.ArgMain$class.mainHelper(ArgApp.scala:45)
        at com.quantifind.sumac.ArgMain$class.main(ArgApp.scala:34)
        at geotrellis.spark.ingest.HadoopIngestCommand$.main(HadoopIngestCommand.scala:19)
        at geotrellis.spark.ingest.HadoopIngestCommand.main(HadoopIngestCommand.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: com.google.protobuf.InvalidProtocolBufferException: Message missing required fields: callId, status
        at com.google.protobuf.UninitializedMessageException.asInvalidProtocolBufferException(UninitializedMessageException.java:81)
        at org.apache.hadoop.ipc.protobuf.RpcPayloadHeaderProtos$RpcResponseHeaderProto$Builder.buildParsed(RpcPayloadHeaderProtos.java:1094)
        at org.apache.hadoop.ipc.protobuf.RpcPayloadHeaderProtos$RpcResponseHeaderProto$Builder.access$1300(RpcPayloadHeaderProtos.java:1028)
        at org.apache.hadoop.ipc.protobuf.RpcPayloadHeaderProtos$RpcResponseHeaderProto.parseDelimitedFrom(RpcPayloadHeaderProtos.java:986)
        at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:938)
        at org.apache.hadoop.ipc.Client$Connection.run(Client.java:836)
[Attachment: raster_processing.py]

Pitt Fagan

Apr 9, 2015, 10:21:20 AM
to geotrel...@googlegroups.com
Forgot to mention that I am running the Spark job with Spark version 1.2.0. Looking at some posts about this error online, it could be a Hadoop version mismatch. Eugene recommended that I export the following variables and recreate the jar file, which I did prior to receiving the error.

export SPARK_HADOOP_VERSION="2.5.0-cdh5.3.3"

export SPARK_VERSION="1.2.0-cdh5.3.3"

I then recreated the uber jar file:  ./sbt "project spark" assembly 



Eugene Cheipesh

Apr 9, 2015, 10:59:45 AM
to geotrel...@googlegroups.com
Hi Pitt,

Looking at your script, I completely failed to catch the first time that you are using your own distro of Spark.

The Ansible roles that are part of the AMI creation install the Cloudera Ubuntu packages for Spark and HDFS on all nodes.

The trick then is to make the GeoTrellis assembly depend on the Cloudera-distributed Maven artifacts by setting the environment variables you mentioned. This ensures that all the transitive dependency versions match when you build the assembly.

Those versions float, so it’s good to check what is actually installed by using: 
apt-cache show spark-core | grep Version
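
Putting that together, the rebuild looks something like this (the version strings below are examples; match them to whatever apt-cache reports on your cluster):

export SPARK_HADOOP_VERSION="2.5.0-cdh5.3.2"
export SPARK_VERSION="1.2.0-cdh5.3.2"
./sbt "project spark" assembly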

Make sure "MESOS_NATIVE_LIBRARY=/usr/local/lib/libmesos.so” is in your environment

Then you should be able to use the spark-submit that is already on the machine, like so:

spark-submit \
--class geotrellis.spark.ingest.HadoopIngestCommand \
--master mesos://zk://zookeeper.service.geotrellis-spark.internal:2181/mesos \
--conf spark.mesos.coarse=true \
--conf spark.executor.memory=20g \
--conf spark.executorEnv.SPARK_LOCAL_DIRS="/media/ephemeral0,/media/ephemeral1" \
--driver-library-path /usr/local/lib spark/target/scala-2.10/geotrellis-spark-assembly-0.10.0-SNAPSHOT.jar \
--input s3n://$AWS_ID:$AWS_KEY@bucket/key \
--layerName myLayer --crs EPSG:3857 --clobber true \
--catalog hdfs://namenode.service.geotrellis-spark.internal/gt-catalog

Adjust the values such as the executor memory and the SPARK_LOCAL_DIRS mount list to match the machine types you're using, e.g. m3.large instances only have one ephemeral mount point.
Note: "--driver-library-path" is given so spark job can find the GDAL JNI bindings which are installed across the cluster.

-- 
Eugene Cheipesh

Pitt Fagan

Apr 9, 2015, 11:30:00 AM
to geotrel...@googlegroups.com
OK, thanks Eugene.

The apt-cache command results in the following output:

ubuntu@ip-10-0-1-42:~/geotrellis$ apt-cache show spark-core | grep Version
Version: 1.2.0+cdh5.3.2+369-1.cdh5.3.2.p0.17~trusty-cdh5.3.2

This matches the value of the SPARK_VERSION variable I specified.

I already had $MESOS_NATIVE_LIBRARY set before starting this process, but when I try to list out the variable I get this odd result:

ubuntu@ip-10-0-1-42:~/geotrellis$ export MESOS_NATIVE_LIBRARY="/usr/local/lib/libmesos.so"
ubuntu@ip-10-0-1-42:~/geotrellis$ $MESOS_NATIVE_LIBRARY
Segmentation fault (core dumped)

I do not need to roll my own Spark. I would be happy using the Spark distribution that comes with GeoTrellis if it would smooth things out.

At any rate, I will get to work making these changes that you specify and I'll let you know how it goes!

Pitt




Hector Castro

Apr 9, 2015, 11:48:59 AM
to geotrel...@googlegroups.com
On Thu, Apr 9, 2015 at 11:30 AM, Pitt Fagan <pitt...@gmail.com> wrote:
> OK thanks Eugene.
>
> The apt-cache command results in the following output:
>
> ubuntu@ip-10-0-1-42:~/geotrellis$ apt-cache show spark-core | grep Version
> Version: 1.2.0+cdh5.3.2+369-1.cdh5.3.2.p0.17~trusty-cdh5.3.2
>
> This matches the value of the SPARK_VERSION variable I specified.

Catching up on this thread, it looks like you may have specified
SPARK_VERSION with a trailing `cdh5.3.3` vs. `cdh5.3.2`. I'm not 100%
sure on the difference that makes in this context, but one thing I
know makes a difference is that you were executing a prebuilt Spark
distribution for cdh4 from your Python script.

When Eugene said "should be able to use spark-submit that is already
on the machine", that means the Spark version installed via APT
automatically places the `spark-submit` and `spark-shell` binaries in
a location that is part of the default PATH. Your subprocess for
`spark-submit` in your Python script should end up being something
like:

spark-submit ....

Versus the current:

/home/ubuntu/spark-1.2.0-bin-cdh4/bin/spark-submit ...
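
In Python terms, a minimal sketch of that change (the rest of your
script's flags and arguments are assumed unchanged):

import subprocess

# Call the APT-installed spark-submit resolved via the default PATH,
# not the bundled cdh4 build under /home/ubuntu/spark-1.2.0-bin-cdh4/bin/.
subprocess.check_call([
    "spark-submit",
    "--class", "geotrellis.spark.ingest.HadoopIngestCommand",
    # ... remaining flags, assembly jar, and ingest arguments as before ...
])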

> I already had the $MESOS_NATIVE_LIBRARY set before starting this process,
> but oddly enough when I try to list out the variable I get this result,
> which is odd.
>
> ubuntu@ip-10-0-1-42:~/geotrellis$ export
> MESOS_NATIVE_LIBRARY="/usr/local/lib/libmesos.so"
> ubuntu@ip-10-0-1-42:~/geotrellis$ $MESOS_NATIVE_LIBRARY
> Segmentation fault (core dumped)

This one is going to require that you prefix the environment variable
name with `echo`.
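
For example:

ubuntu@ip-10-0-1-42:~$ echo $MESOS_NATIVE_LIBRARY
/usr/local/lib/libmesos.so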

Pitt Fagan

Apr 9, 2015, 12:07:29 PM
to geotrel...@googlegroups.com
Hi Hector,

Yes, after I first posted the message I caught the 5.3.3 vs. 5.3.2, so I had already changed the SPARK_VERSION variable to reflect this.

Also, thanks for the tip about using echo! The SPARK_VERSION variable listed out without any need for it, but the MESOS_NATIVE_LIBRARY variable needs it.

Anyway, I'm almost done making Eugene's recommended changes, so I will hopefully post something very soon.

Pitt

Pitt Fagan

Apr 9, 2015, 12:32:16 PM
to geotrel...@googlegroups.com
Hi guys,

OK, so the Mesos-leader is an r3.large and the one Mesos-follower is an m3.large. For the --input argument below, there is one GeoTIFF file in this directory.

Here is the command I am running from the command line (I put the backslashes here for readability):

ubuntu@ip-10-0-1-42:~/geotrellis$ spark-submit \
--class geotrellis.spark.ingest.HadoopIngestCommand \
--master mesos://zk://zookeeper.service.geotrellis-spark.internal:2181/mesos \
--conf spark.mesos.coarse=true \
--conf spark.executor.memory=5g \
--conf spark.executorEnv.SPARK_LOCAL_DIRS="/media/ephemeral0" \
--driver-library-path /usr/local/lib /home/ubuntu/geotrellis/spark/target/scala-2.10/geotrellis-spark-assembly-0.10.0-SNAPSHOT.jar \
--crs EPSG:3857 \
--pyramid false \
--clobber true \
--input file:/home/ubuntu/datasets/s3/backups/2015/04/09/16/tiles/ls8r/LC80340322013292LGN00/1295534/calibration/ \
--catalog hdfs://namenode.service.geotrellis-spark.internal:8020/accumulo/data/catalog \
--layerName s7


The good news is that I am past the previous issue, so thanks for that! Here is the current output.

15/04/09 16:29:55 INFO spark.SecurityManager: Changing view acls to: ubuntu
15/04/09 16:29:55 INFO spark.SecurityManager: Changing modify acls to: ubuntu
15/04/09 16:29:55 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(ubuntu); users with modify permissions: Set(ubuntu)
15/04/09 16:29:56 INFO slf4j.Slf4jLogger: Slf4jLogger started
15/04/09 16:29:56 INFO Remoting: Starting remoting
15/04/09 16:29:56 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@zookeeper.service.geotrellis-spark.internal:38369]
15/04/09 16:29:56 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkDriver@zookeeper.service.geotrellis-spark.internal:38369]
15/04/09 16:29:56 INFO util.Utils: Successfully started service 'sparkDriver' on port 38369.
15/04/09 16:29:56 INFO spark.SparkEnv: Registering MapOutputTracker
15/04/09 16:29:56 INFO spark.SparkEnv: Registering BlockManagerMaster
15/04/09 16:29:56 INFO storage.DiskBlockManager: Created local directory at /tmp/spark-local-20150409162956-0c09
15/04/09 16:29:56 INFO storage.MemoryStore: MemoryStore started with capacity 265.4 MB
15/04/09 16:29:56 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/04/09 16:29:57 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-f23b0dde-e84c-4cb9-a692-0e12c7e1ccda
15/04/09 16:29:57 INFO spark.HttpServer: Starting HTTP Server
15/04/09 16:29:57 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/04/09 16:29:57 INFO server.AbstractConnector: Started SocketC...@0.0.0.0:40998
15/04/09 16:29:57 INFO util.Utils: Successfully started service 'HTTP file server' on port 40998.
15/04/09 16:29:57 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/04/09 16:29:57 INFO server.AbstractConnector: Started SelectChann...@0.0.0.0:4040
15/04/09 16:29:57 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
15/04/09 16:29:57 INFO ui.SparkUI: Started SparkUI at http://zookeeper.service.geotrellis-spark.internal:4040
15/04/09 16:29:57 INFO spark.SparkContext: Added JAR file:/home/ubuntu/geotrellis/spark/target/scala-2.10/geotrellis-spark-assembly-0.10.0-SNAPSHOT.jar at http://10.0.1.42:40998/jars/geotrellis-spark-assembly-0.10.0-SNAPSHOT.jar with timestamp 1428596997663
I0409 16:29:57.810159  2127 sched.cpp:137] Version: 0.21.1
2015-04-09 16:29:57,814:1890(0x7f52b0cf4700):ZOO_INFO@log_env@712: Client environment:zookeeper.version=zookeeper C client 3.4.5
2015-04-09 16:29:57,814:1890(0x7f52b0cf4700):ZOO_INFO@log_env@716: Client environment:host.name=ip-10-0-1-42
2015-04-09 16:29:57,814:1890(0x7f52b0cf4700):ZOO_INFO@log_env@723: Client environment:os.name=Linux
2015-04-09 16:29:57,814:1890(0x7f52b0cf4700):ZOO_INFO@log_env@724: Client environment:os.arch=3.13.0-48-generic
2015-04-09 16:29:57,814:1890(0x7f52b0cf4700):ZOO_INFO@log_env@725: Client environment:os.version=#80-Ubuntu SMP Thu Mar 12 11:16:15 UTC 2015
2015-04-09 16:29:57,814:1890(0x7f52b0cf4700):ZOO_INFO@log_env@733: Client environment:user.name=ubuntu
2015-04-09 16:29:57,814:1890(0x7f52b0cf4700):ZOO_INFO@log_env@741: Client environment:user.home=/home/ubuntu
2015-04-09 16:29:57,814:1890(0x7f52b0cf4700):ZOO_INFO@log_env@753: Client environment:user.dir=/home/ubuntu/geotrellis
2015-04-09 16:29:57,814:1890(0x7f52b0cf4700):ZOO_INFO@zookeeper_init@786: Initiating client connection, host=zookeeper.service.geotrellis-spark.internal:2181 sessionTimeout=10000 watcher=0x7f52b2c8c6a0 sessionId=0 sessionPasswd=<null> context=0x7f52f9519ab0 flags=0
2015-04-09 16:29:57,817:1890(0x7f52ac4eb700):ZOO_INFO@check_events@1703: initiated connection to server [10.0.1.42:2181]
2015-04-09 16:29:57,819:1890(0x7f52ac4eb700):ZOO_INFO@check_events@1750: session establishment complete on server [10.0.1.42:2181], sessionId=0x14c7724138c006f, negotiated timeout=10000
I0409 16:29:57.819597  2131 group.cpp:313] Group process (group(1)@10.0.1.42:34064) connected to ZooKeeper
I0409 16:29:57.819675  2131 group.cpp:790] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0)
I0409 16:29:57.819743  2131 group.cpp:385] Trying to create path '/mesos' in ZooKeeper
I0409 16:29:57.820658  2128 detector.cpp:138] Detected a new leader: (id='2')
I0409 16:29:57.820783  2128 group.cpp:659] Trying to get '/mesos/info_0000000002' in ZooKeeper
I0409 16:29:57.829401  2128 detector.cpp:433] A new leading master (UPID=mas...@10.0.1.42:5050) is detected
I0409 16:29:57.829483  2128 sched.cpp:234] New master detected at mas...@10.0.1.42:5050
I0409 16:29:57.829601  2128 sched.cpp:242] No credentials provided. Attempting to register without authentication
I0409 16:29:57.830821  2132 sched.cpp:408] Framework registered with 20150401-224001-704708618-5050-1958-0100
15/04/09 16:29:57 INFO mesos.CoarseMesosSchedulerBackend: Registered as framework ID 20150401-224001-704708618-5050-1958-0100
15/04/09 16:29:58 INFO netty.NettyBlockTransferService: Server created on 56196
15/04/09 16:29:58 INFO storage.BlockManagerMaster: Trying to register BlockManager
15/04/09 16:29:58 INFO storage.BlockManagerMasterActor: Registering block manager zookeeper.service.geotrellis-spark.internal:56196 with 265.4 MB RAM, BlockManagerId(<driver>, zookeeper.service.geotrellis-spark.internal, 56196)
15/04/09 16:29:58 INFO storage.BlockManagerMaster: Registered BlockManager
15/04/09 16:29:58 INFO mesos.CoarseMesosSchedulerBackend: Mesos task 0 is now TASK_RUNNING
15/04/09 16:29:58 INFO mesos.CoarseMesosSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
Exception in thread "main" java.lang.IllegalArgumentException: Can not create a Path from an empty string
        at org.apache.hadoop.fs.Path.checkPathArg(Path.java:127)
        at org.apache.hadoop.fs.Path.<init>(Path.java:135)
        at org.apache.hadoop.util.StringUtils.stringToPath(StringUtils.java:244)
        at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:467)
        at geotrellis.spark.io.hadoop.HdfsUtils$.putFilesInConf(HdfsUtils.scala:58)
        at geotrellis.spark.io.hadoop.package$HadoopConfigurationWrapper.withInputDirectory(package.scala:62)
        at geotrellis.spark.io.hadoop.HadoopSparkContextMethods$class.hadoopGeoTiffRDD(HadoopSparkContextMethods.scala:29)
        at geotrellis.spark.io.hadoop.package$HadoopSparkContextMethodsWrapper.hadoopGeoTiffRDD(package.scala:50)
        at geotrellis.spark.ingest.HadoopIngestCommand$.main(HadoopIngestCommand.scala:28)
        at geotrellis.spark.ingest.HadoopIngestCommand$.main(HadoopIngestCommand.scala:19)
        at com.quantifind.sumac.ArgMain$class.mainHelper(ArgApp.scala:45)
        at com.quantifind.sumac.ArgMain$class.main(ArgApp.scala:34)
        at geotrellis.spark.ingest.HadoopIngestCommand$.main(HadoopIngestCommand.scala:19)
        at geotrellis.spark.ingest.HadoopIngestCommand.main(HadoopIngestCommand.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)


Rob Emanuele

Apr 9, 2015, 1:12:23 PM
to geotrel...@googlegroups.com
Hey Pitt,

Are you trying to do an Accumulo ingest? The deploy should have set up Accumulo, and I'd recommend using it. It seems like you're trying to write to Accumulo's HDFS directory with the Hadoop ingest ("--catalog hdfs://namenode.service.geotrellis-spark.internal:8020/accumulo/data/catalog"). Instead, you should use the AccumuloIngestCommand.

Here is a gist of a script that should help you do that:


Want to try that out?

Thanks,
Rob




--
Rob Emanuele, Tech Lead, GeoTrellis

Azavea |  340 N 12th St, Ste 402, Philadelphia, PA
rema...@azavea.com  | T 215.701.7692  | F 215.925.2663
Web azavea.com  |  Blog azavea.com/blogs  | Twitter @azavea

Pitt Fagan

Apr 9, 2015, 1:42:57 PM
to geotrel...@googlegroups.com
Hi Rob,

Yes, I am trying to ingest the raster into Accumulo, but what you write below is probably the issue. When I was working on this locally, I was ingesting the rasters into HDFS. I remember you saying that Accumulo was preferable, and the AWS machines are my first trial with Accumulo. Let me give your gist a try and see what's what.

Thanks,
Pitt

Pitt Fagan

Apr 9, 2015, 2:44:50 PM
to geotrel...@googlegroups.com
Howdy Rob,

OK, here is the command I am running, based on your gist. I did not know what to put in for the user and password values, so I left your default values in.

spark-submit \
--class geotrellis.spark.ingest.AccumuloIngestCommand \
/home/ubuntu/geotrellis/spark/target/scala-2.10/geotrellis-spark-assembly-0.10.0-SNAPSHOT.jar \
--instance geotrellis-accumulo-cluster \
--user root --password secret \
--zookeeper zookeeper.service.geotrellis-spark.internal \
--crs EPSG:3857 --pyramid false --clobber true \
--input file:/home/ubuntu/datasets/s3/backups/2015/04/09/16/tiles/ls8r/LC80340322013292LGN00/1295534/calibration \
--layerName s7 --table 1295534

Here is part of the output, including the error. What precedes this is a huge list of jar files, which I did not include.

15/04/09 18:36:18 INFO zookeeper.ZooKeeper: Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib/x86_64-linux-gnu/jni:/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu:/usr/lib/jni:/lib:/usr/lib
15/04/09 18:36:18 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
15/04/09 18:36:18 INFO zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
15/04/09 18:36:18 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux
15/04/09 18:36:18 INFO zookeeper.ZooKeeper: Client environment:os.arch=amd64
15/04/09 18:36:18 INFO zookeeper.ZooKeeper: Client environment:os.version=3.13.0-48-generic
15/04/09 18:36:18 INFO zookeeper.ZooKeeper: Client environment:user.name=ubuntu
15/04/09 18:36:18 INFO zookeeper.ZooKeeper: Client environment:user.home=/home/ubuntu
15/04/09 18:36:18 INFO zookeeper.ZooKeeper: Client environment:user.dir=/home/ubuntu
15/04/09 18:36:18 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=zookeeper.service.geotrellis-spark.internal sessionTimeout=30000 watcher=org.apache.accumulo.fate.zookeeper.ZooSession$ZooWatcher@462cc1e9
15/04/09 18:36:18 INFO zookeeper.ClientCnxn: Opening socket connection to server zookeeper.service.geotrellis-spark.internal/10.0.1.42:2181. Will not attempt to authenticate using SASL (unknown error)
15/04/09 18:36:19 INFO zookeeper.ClientCnxn: Socket connection established to zookeeper.service.geotrellis-spark.internal/10.0.1.42:2181, initiating session
15/04/09 18:36:19 INFO zookeeper.ClientCnxn: Session establishment complete on server zookeeper.service.geotrellis-spark.internal/10.0.1.42:2181, sessionid = 0x14c7724138c007f, negotiated timeout = 30000
Exception in thread "main" java.lang.IllegalArgumentException: Can not create a Path from an empty string
        at org.apache.hadoop.fs.Path.checkPathArg(Path.java:127)
        at org.apache.hadoop.fs.Path.<init>(Path.java:135)
        at org.apache.hadoop.util.StringUtils.stringToPath(StringUtils.java:244)
        at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:467)
        at geotrellis.spark.io.hadoop.HdfsUtils$.putFilesInConf(HdfsUtils.scala:58)
        at geotrellis.spark.io.hadoop.package$HadoopConfigurationWrapper.withInputDirectory(package.scala:62)
        at geotrellis.spark.io.hadoop.HadoopSparkContextMethods$class.hadoopGeoTiffRDD(HadoopSparkContextMethods.scala:29)
        at geotrellis.spark.io.hadoop.package$HadoopSparkContextMethodsWrapper.hadoopGeoTiffRDD(package.scala:50)
        at geotrellis.spark.ingest.AccumuloIngestCommand$.main(AccumuloIngestCommand.scala:35)
        at geotrellis.spark.ingest.AccumuloIngestCommand$.main(AccumuloIngestCommand.scala:26)
        at com.quantifind.sumac.ArgMain$class.mainHelper(ArgApp.scala:45)
        at com.quantifind.sumac.ArgMain$class.main(ArgApp.scala:34)
        at geotrellis.spark.ingest.AccumuloIngestCommand$.main(AccumuloIngestCommand.scala:26)
        at geotrellis.spark.ingest.AccumuloIngestCommand.main(AccumuloIngestCommand.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)



Pitt Fagan

Apr 10, 2015, 12:24:18 PM
to geotrel...@googlegroups.com
This might be a red herring, but I was playing around with variations of the --input argument. I am running the following command from the directory /home/ubuntu/geotrellis. The .tif file I am loading is also in this directory.


ubuntu@ip-10-0-1-42:~/geotrellis$ spark-submit \
--class geotrellis.spark.ingest.AccumuloIngestCommand \
/home/ubuntu/geotrellis/spark/target/scala-2.10/geotrellis-spark-assembly-0.10.0-SNAPSHOT.jar \
--instance geotrellis-accumulo-cluster \
--user root --password secret \
--zookeeper zookeeper.service.geotrellis-spark.internal \
--crs EPSG:3857 --pyramid false --clobber true \
--input file:/ \
--layerName s7 --table 1295534

Here is the exception when running this. Not sure if the flip-flopping of the file extension and the directory structure is a clue about what is happening, but I figured it was worth reporting.

Exception in thread "main" java.lang.IllegalArgumentException: Wrong FS: file://*.tif/home/ubuntu/geotrellis, expected: file:///
        at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:645)
        at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:80)
        at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:519)
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:737)
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:514)
        at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:398)
        at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
        at org.apache.hadoop.fs.Globber.glob(Globber.java:252)
        at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1625)
        at geotrellis.spark.io.hadoop.HdfsUtils$.listFiles(HdfsUtils.scala:85)
        at geotrellis.spark.io.hadoop.package$HadoopConfigurationWrapper.withInputDirectory(package.scala:61)
        at geotrellis.spark.io.hadoop.HadoopSparkContextMethods$class.hadoopGeoTiffRDD(HadoopSparkContextMethods.scala:29)
        at geotrellis.spark.io.hadoop.package$HadoopSparkContextMethodsWrapper.hadoopGeoTiffRDD(package.scala:50)
        at geotrellis.spark.ingest.AccumuloIngestCommand$.main(AccumuloIngestCommand.scala:35)
        at geotrellis.spark.ingest.AccumuloIngestCommand$.main(AccumuloIngestCommand.scala:26)
        at com.quantifind.sumac.ArgMain$class.mainHelper(ArgApp.scala:45)
        at com.quantifind.sumac.ArgMain$class.main(ArgApp.scala:34)
        at geotrellis.spark.ingest.AccumuloIngestCommand$.main(AccumuloIngestCommand.scala:26)
        at geotrellis.spark.ingest.AccumuloIngestCommand.main(AccumuloIngestCommand.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

