Having trouble getting a multi-node H2O cluster on Hadoop


Christopher Severs

Oct 17, 2013, 12:55:46 PM
to h2os...@googlegroups.com
Hi,

I'm trying to get H2O to work on a Hadoop cluster. I can bring up a single-node H2O cluster, but if I set nodes > 1 it ends up hanging.

The console output looks like this:
hadoop jar h2odriver_horton.jar water.hadoop.h2odriver -libjars ../h2o.jar -Dmapred.job.queue.name=hdmi-set -driverif 10.115.201.59 -timeout 1800 -mapperXmx 1g -nodes 2 -output hdfsOutputDirName
13/10/17 08:51:14 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/10/17 08:51:14 INFO security.JniBasedUnixGroupsMapping: Using JniBasedUnixGroupsMapping for Group resolution
Using mapper->driver callback IP address and port: 10.115.201.59:34389
(You can override these with -driverif and -driverport.)
Driver program compiled with MapReduce V1 (Classic)
Memory Settings:
    mapred.child.java.opts:      -Xms1g -Xmx1g
    mapred.map.child.java.opts:  -Xms1g -Xmx1g
    Extra memory percent:        10
    mapreduce.map.memory.mb:     1126
Job name 'H2O_61026' submitted
JobTracker job ID is 'job_201310092016_36664'
Waiting for H2O cluster to come up...
H2O node 10.115.57.45:54321 requested flatfile
H2O node 10.115.5.25:54321 requested flatfile
Sending flatfiles to nodes...
    [Sending flatfile to node 10.115.57.45:54321]
    [Sending flatfile to node 10.115.5.25:54321]
H2O node 10.115.57.45:54321 reports H2O cluster size 1
H2O node 10.115.5.25:54321 reports H2O cluster size 1

It just sits there until it dies or I kill it. I've tried timeouts up to 1800 seconds.

The logs from the nodes (pulled from the jobtracker) look like this:

08:51:53.179 main      INFO WATER: ----- H2O started -----
08:51:53.183 main      INFO WATER: Build git branch: master
08:51:53.183 main      INFO WATER: Build git hash: ff8e56c1192c7f79f5a436951165b695ca54116d
08:51:53.183 main      INFO WATER: Build git describe: ff8e56c-dirty
08:51:53.183 main      INFO WATER: Build project version: 1.7.0.99999
08:51:53.183 main      INFO WATER: Built by: 'csevers'
08:51:53.183 main      INFO WATER: Built on: 'Wed Oct 16 16:57:47 PDT 2013'
08:51:53.184 main      INFO WATER: Java availableProcessors: 24
08:51:53.187 main      INFO WATER: Java heap totalMemory: 0.96 gb
08:51:53.187 main      INFO WATER: Java heap maxMemory: 0.96 gb
08:51:53.187 main      INFO WATER: ICE root: '/hadoop/1/scratch/taskTracker/csevers/jobcache/job_201310092016_36664/attempt_201310092016_36664_m_000000_0,/hadoop/2/scratch/taskTracker/csevers/jobcache/job_201310092016_36664/attempt_201310092016_36664_m_000000_0,/hadoop/3/scratch/taskTracker/csevers/jobcache/job_201310092016_36664/attempt_201310092016_36664_m_000000_0,/hadoop/4/scratch/taskTracker/csevers/jobcache/job_201310092016_36664/attempt_201310092016_36664_m_000000_0,/hadoop/5/scratch/taskTracker/csevers/jobcache/job_201310092016_36664/attempt_201310092016_36664_m_000000_0,/hadoop/6/scratch/taskTracker/csevers/jobcache/job_201310092016_36664/attempt_201310092016_36664_m_000000_0,/hadoop/7/scratch/taskTracker/csevers/jobcache/job_201310092016_36664/attempt_201310092016_36664_m_000000_0,/hadoop/8/scratch/taskTracker/csevers/jobcache/job_201310092016_36664/attempt_201310092016_36664_m_000000_0,/hadoop/9/scratch/taskTracker/csevers/jobcache/job_201310092016_36664/attempt_201310092016_36664_m_000000_0,/hadoop/10/scratch/taskTracker/csevers/jobcache/job_201310092016_36664/attempt_201310092016_36664_m_000000_0,/hadoop/11/scratch/taskTracker/csevers/jobcache/job_201310092016_36664/attempt_201310092016_36664_m_000000_0,/hadoop/12/scratch/taskTracker/csevers/jobcache/job_201310092016_36664/attempt_201310092016_36664_m_000000_0'
08:51:53.234 main      INFO WATER: Internal communication uses port: 54322
+                                  Listening for HTTP and REST traffic on  http://10.115.5.25:54321/
EmbeddedH2OConfig: notifyAboutEmbeddedWebServerIpPort called (10.115.5.25, 54321)
EmbeddedH2OConfig: fetchFlatfile called
EmbeddedH2OConfig: fetchFlatfile returned
------------------------------------------------------------
10.115.57.45:54321
10.115.5.25:54321

------------------------------------------------------------
08:51:53.404 main      INFO WATER: H2O cloud name: 'H2O_61026'
08:51:53.405 main      INFO WATER: (v1.7.0.99999) 'H2O_61026' on /10.115.5.25:54321, discovery address /237.114.157.191:60786
08:51:53.408 main      INFO WATER: Cloud of size 1 formed [/10.115.5.25:54321]
EmbeddedH2OConfig: notifyAboutCloudSize called (10.115.5.25, 54321, 1)


My guess is that something odd is going on in our cluster, but any suggestions would help.

I built a custom Hadoop jar against vanilla Hadoop 1.1.2, if that helps. Our cluster is running some version of Hortonworks (not 2.0, though).

Thanks,
Chris



 

SriSatish Ambati

Oct 17, 2013, 1:26:06 PM
to Christopher Severs, h2ostream, Tom Kraljevic
Chris,
Let us take a quick peek at the logs and get back to you.

- Which version of Hortonworks Hadoop are you on?
- Can you send the output of /sbin/ifconfig from the server? (You can send this info directly to us.)
  For multi-homed networks, the -network option is useful; a sketch follows below.
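
For illustration, a minimal sketch of -network on the driver command line, reusing Chris's original invocation; the 10.115.0.0/16 CIDR is an assumption for illustration, not his actual subnet:

    # Hedged sketch: pin H2O's internal traffic to one subnet on multi-homed hosts.
    # The CIDR below is illustrative only -- substitute your own network.
    hadoop jar h2odriver_horton.jar water.hadoop.h2odriver \
        -libjars ../h2o.jar \
        -network 10.115.0.0/16 \
        -driverif 10.115.201.59 -timeout 1800 \
        -mapperXmx 1g -nodes 2 -output hdfsOutputDirName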

thanks,
Sri






--
ceo & co-founder, 0xdata Inc

Tom Kraljevic

Oct 17, 2013, 1:55:47 PM
to Christopher Severs, h2os...@googlegroups.com

Hi Chris,


Here is what I see:

(Note:  from your message I only got the log from one of the two mapper tasks,
so for now I am assuming the second mapper task's output is similar.)

1.  You launched the driver on 10.115.201.59.  (good)
2.  Hadoop chose 10.115.57.45 and 10.115.5.25 as the two nodes for an H2O cluster.  (good)
3.  The two nodes started and talked to the driver to get the flatfile listing all the nodes.  (good)
4.  The two nodes are supposed to talk to each other to form a cloud.  This did not happen.  (problem)
5.  Had step 4 worked, the cloud would have told the driver it was up.  (it never got this far)


So Chris, we need to diagnose number 4 above.

Make sure that 10.115.57.45 and 10.115.5.25 can talk to each other on ports
54321 and 54322 over both TCP and UDP.  You could ssh into one of the nodes and try
'curl 10.115.<neighbor>:54321', for example (see the sketch below).
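
A minimal sketch of such a check, assuming ssh access to the nodes and that curl and nc (netcat) are installed; the IPs are the two from the log above:

    # From 10.115.57.45, probe the neighbor's H2O ports.
    # curl distinguishes "connection refused" from a silent hang (filtered port):
    curl http://10.115.5.25:54321/

    # TCP checks on both H2O ports:
    nc -vz 10.115.5.25 54321
    nc -vz 10.115.5.25 54322

    # UDP probe; since UDP is connectionless, "success" here only means
    # nothing actively rejected the datagram:
    echo ping | nc -u -w1 10.115.5.25 54322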

Generally speaking, they should talk to each other within seconds.  So a timeout
of more than the default of two minutes should not be necessary (except for debugging).

Note that, if it's relevant/possible, I recommend you start the mapper H2O tasks
on the same rack (or whatever gives the best network bandwidth between them), since
they will communicate with each other (unlike typical Hadoop mappers).

Please let us know how we can help further.


Thanks!
Tom


poro...@gmail.com

Dec 2, 2016, 7:57:17 AM
to H2O Open Source Scalable Machine Learning - h2ostream
Hi Chris,

In my case, with a multi-node cluster, it doesn't get beyond "EmbeddedH2OConfig: notifyAboutEmbeddedWebServerIpPort called (10.115.5.25, 54321)
EmbeddedH2OConfig: fetchFlatfile called". How can I change the EmbeddedH2OConfig settings?

Tom Kraljevic

Dec 2, 2016, 11:50:10 AM
to poro...@gmail.com, H2O Open Source Scalable Machine Learning - h2ostream

hi,

if you attach the output of "yarn logs" we can try to help you debug it.

tom

Sent from my iPhone

Tanmay Saha

Dec 5, 2016, 1:42:56 AM
to Tom Kraljevic, H2O Open Source Scalable Machine Learning - h2ostream
Hi Tom,

I have posted the whole stack trace on the h2o community forum, here.

But for your convenience, I am reposting it here too.

tanny@tanny-machine:~/binaries/h2o-3.10.0.8-hdp2.4$   hadoop jar h2odriver.jar -nodes 3 -mapperXmx 6g -output hdfsOutputDir
Determining driver host interface for mapper->driver callback...
    [Possible callback IP address: 192.168.1.10]
    [Possible callback IP address: 127.0.0.1]
Using mapper->driver callback IP address and port: 192.168.1.10:33020
(You can override these with -driverif and -driverport.)
Memory Settings:
    mapreduce.map.java.opts:     -Xms6g -Xmx6g -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Dlog4j.defaultInitOverride=true
    Extra memory percent:        10
    mapreduce.map.memory.mb:     6758
16/12/05 10:30:02 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/12/05 10:30:03 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
16/12/05 10:30:03 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
16/12/05 10:30:03 INFO mapreduce.JobSubmitter: number of splits:3
16/12/05 10:30:03 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local1112487908_0001
16/12/05 10:30:03 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
Job name 'H2O_59084' submitted
JobTracker job ID is 'job_local1112487908_0001'
For YARN users, logs command is 'yarn logs -applicationId application_local1112487908_0001'
Waiting for H2O cluster to come up...
16/12/05 10:30:03 INFO mapred.LocalJobRunner: OutputCommitter set in config null
16/12/05 10:30:03 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
16/12/05 10:30:03 INFO mapred.LocalJobRunner: Waiting for map tasks
16/12/05 10:30:03 INFO mapred.LocalJobRunner: Starting task: attempt_local1112487908_0001_m_000000_0
16/12/05 10:30:03 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
16/12/05 10:30:03 INFO mapred.MapTask: Processing split: water.hadoop.h2odriver$EmptySplit@7d27d9b0
POST 0: Entered run
16/12/05 10:30:03 INFO Configuration.deprecation: mapred.local.dir is deprecated. Instead, use mapreduce.cluster.local.dir
POST 11: After setEmbeddedH2OConfig
16/12/05 10:30:04 INFO reflections.Reflections: Reflections took 491 ms to scan 2 urls, producing 159 keys and 1047 values 
16/12/05 10:30:04 INFO reflections.Reflections: Reflections took 420 ms to scan 2 urls, producing 119 keys and 584 values 
16/12/05 10:30:06 INFO server.Server: jetty-8.y.z-SNAPSHOT
16/12/05 10:30:06 INFO server.AbstractConnector: Started SocketC...@0.0.0.0:54321
12-05 10:30:06.161 192.168.1.10:54321  13892  #cutor #0 INFO: ----- H2O started  -----
12-05 10:30:06.166 192.168.1.10:54321  13892  #cutor #0 INFO: Build git branch: rel-turing
12-05 10:30:06.166 192.168.1.10:54321  13892  #cutor #0 INFO: Build git hash: 34b83da423d26dfbcc0b35c72714b31e80101d49
12-05 10:30:06.166 192.168.1.10:54321  13892  #cutor #0 INFO: Build git describe: jenkins-rel-turing-8
12-05 10:30:06.166 192.168.1.10:54321  13892  #cutor #0 INFO: Build project version: 3.10.0.8 (latest version: 3.10.1.1)
12-05 10:30:06.167 192.168.1.10:54321  13892  #cutor #0 INFO: Build age: 1 month and 24 days
12-05 10:30:06.167 192.168.1.10:54321  13892  #cutor #0 INFO: Built by: 'jenkins'
12-05 10:30:06.167 192.168.1.10:54321  13892  #cutor #0 INFO: Built on: '2016-10-10 13:45:37'
12-05 10:30:06.167 192.168.1.10:54321  13892  #cutor #0 INFO: Processed H2O arguments: [-ice_root, /home/tanny/HADOOP_STAGING_DIR/mapred/local/localRunner//tanny/jobcache/job_local1112487908_0001/attempt_local1112487908_0001_m_000000_0, -hdfs_skip, -name, H2O_59084, -ga_hadoop_ver, Hadoop 2.6.0, -user_name, tanny]
12-05 10:30:06.167 192.168.1.10:54321  13892  #cutor #0 INFO: Java availableProcessors: 8
12-05 10:30:06.167 192.168.1.10:54321  13892  #cutor #0 INFO: Java heap totalMemory: 270.0 MB
12-05 10:30:06.167 192.168.1.10:54321  13892  #cutor #0 INFO: Java heap maxMemory: 455.5 MB
12-05 10:30:06.167 192.168.1.10:54321  13892  #cutor #0 INFO: Java version: Java 1.8.0_91 (from Oracle Corporation)
12-05 10:30:06.167 192.168.1.10:54321  13892  #cutor #0 INFO: JVM launch parameters: [-Xmx1000m, -Djava.library.path=/usr/local/hadoop/lib, -Djava.net.preferIPv4Stack=true, -Dhadoop.log.dir=/usr/local/hadoop/logs, -Dhadoop.log.file=hadoop.log, -Dhadoop.home.dir=/usr/local/hadoop, -Dhadoop.id.str=tanny, -Dhadoop.root.logger=INFO,console, -Dhadoop.policy.file=hadoop-policy.xml, -Djava.net.preferIPv4Stack=true, -Xmx512m, -Dhadoop.security.logger=INFO,NullAppender]
12-05 10:30:06.167 192.168.1.10:54321  13892  #cutor #0 INFO: OS version: Linux 3.13.0-24-generic (amd64)
12-05 10:30:06.167 192.168.1.10:54321  13892  #cutor #0 INFO: Machine physical memory: 15.56 GB
12-05 10:30:06.167 192.168.1.10:54321  13892  #cutor #0 INFO: X-h2o-cluster-id: 1480914004002
12-05 10:30:06.167 192.168.1.10:54321  13892  #cutor #0 INFO: User name: 'tanny'
12-05 10:30:06.167 192.168.1.10:54321  13892  #cutor #0 INFO: IPv6 stack selected: false
12-05 10:30:06.168 192.168.1.10:54321  13892  #cutor #0 INFO: Possible IP Address: eth0 (eth0), 192.168.1.10
12-05 10:30:06.168 192.168.1.10:54321  13892  #cutor #0 INFO: Possible IP Address: lo (lo), 127.0.0.1
12-05 10:30:06.168 192.168.1.10:54321  13892  #cutor #0 INFO: Internal communication uses port: 54322
12-05 10:30:06.168 192.168.1.10:54321  13892  #cutor #0 INFO: Listening for HTTP and REST traffic on http://192.168.1.10:54321/
EmbeddedH2OConfig: notifyAboutEmbeddedWebServerIpPort called (192.168.1.10, 54321)
EmbeddedH2OConfig: fetchFlatfile called
H2O node 192.168.1.10:54321 requested flatfile
ERROR: Timed out waiting for H2O cluster to come up (120 seconds)
ERROR: (Try specifying the -timeout option to increase the waiting time limit)
Attempting to clean up hadoop job...
16/12/05 10:32:03 WARN mapred.LocalJobRunner: job_local1112487908_0001
java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2088)
at java.util.concurrent.ThreadPoolExecutor.awaitTermination(ThreadPoolExecutor.java:1465)
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:449)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Killed.

----- YARN cluster metrics -----
Number of YARN worker nodes: 1

----- Nodes -----
Node: http://tanny-machine:8042 Rack: /default-rack, RUNNING, 0 containers used, 0.0 / 8.0 GB used, 0 / 8 vcores used

----- Queues -----
Queue name:            default
    Queue state:       RUNNING
    Current capacity:  0.00
    Capacity:          1.00
    Maximum capacity:  1.00
    Application count: 0

Queue 'default' approximate utilization: 0.0 / 8.0 GB used, 0 / 8 vcores used

----------------------------------------------------------------------

ERROR:   Job memory request (19.8 GB) exceeds available YARN cluster memory (8.0 GB)
WARNING: Job memory request (19.8 GB) exceeds queue available memory capacity (8.0 GB)
ERROR:   Only 1 out of the requested 3 worker containers were started due to YARN cluster resource limitations

----------------------------------------------------------------------

For YARN users, logs command is 'yarn logs -applicationId application_local1112487908_0001'

16/12/05 10:32:08 ERROR hdfs.DFSClient: Failed to close inode 16628
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /user/tanny/hdfsOutputDir/_temporary/0/_temporary/attempt_local1112487908_0001_m_000000_0/part-m-00000 (inode 16628): File does not exist. Holder DFSClient_NONMAPREDUCE_940153155_1 does not have any open files.
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3516)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:3604)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:3574)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:700)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:526)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)

at org.apache.hadoop.ipc.Client.call(Client.java:1468)
at org.apache.hadoop.ipc.Client.call(Client.java:1399)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
at com.sun.proxy.$Proxy9.complete(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.complete(ClientNamenodeProtocolTranslatorPB.java:443)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy10.complete(Unknown Source)
at org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2250)
at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2234)
at org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:938)
at org.apache.hadoop.hdfs.DFSClient.closeOutputStreams(DFSClient.java:976)
at org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:899)
at org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:2687)
at org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer.run(FileSystem.java:2704)
at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
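
As an aside, the 19.8 GB in the ERROR above is just the per-container request multiplied by the node count, matching the "Extra memory percent: 10" and "mapreduce.map.memory.mb: 6758" lines earlier in the output. A sketch of the arithmetic in shell:

    # -mapperXmx 6g plus the 10% extra memory percent:
    xmx_mb=$((6 * 1024))                  # 6144 MB heap per mapper
    container_mb=$((xmx_mb * 110 / 100))  # 6758 MB, matching mapreduce.map.memory.mb
    total_mb=$((container_mb * 3))        # 20274 MB for -nodes 3, i.e. ~19.8 GB
    echo "${container_mb} MB/container, ${total_mb} MB total"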



Thanks,
Tanmay. 




--
With Due Regards
Tanmay Saha,

Tom Kraljevic

Dec 5, 2016, 1:54:43 AM
to Tanmay Saha, H2O Open Source Scalable Machine Learning - h2ostream

hi,

sorry, but h2o doesn't run in local hadoop mode.
h2o on hadoop only runs in a "real" hadoop environment.

but you can run non-hadoop h2o just fine on your local machine using the plain h2o.jar (a minimal launch is sketched below), the R or Python packages, or Sparkling Water.
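
For instance, a minimal local launch might look like this (the heap size is illustrative; size it to your data):

    # Start a single-node, non-hadoop H2O on the local machine:
    java -Xmx4g -jar h2o.jar
    # Then browse to http://localhost:54321/ (H2O's default web port, per the logs above).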

tom

Sent from my iPad

Tanmay Saha

Dec 5, 2016, 2:22:51 AM
to Tom Kraljevic, H2O Open Source Scalable Machine Learning - h2ostream
Hi,

Thanks for the response.
I am not sure what you are implying by "local hadoop mode"... This is a 3-node hadoop cluster...
Also, if I may add: if I try to start a 1-node H2O cluster on top of the 3-node hadoop cluster, it works; it only fails when I try to start a >1-node H2O cluster on the hadoop cluster.

And I have already been able to set up a simple 3-node standalone H2O cluster. I was just trying to set up an H2O cluster on top of an underlying hadoop cluster. It doesn't work?

Please tell me if you need any more information, or if I have missed any important point here.

Tom Kraljevic

Dec 5, 2016, 7:55:24 AM
to Tanmay Saha, H2O Open Source Scalable Machine Learning - h2ostream

> I am not sure what you are implying by "local hadoop mode"


There are three major clues in the output you sent:


1. There are lots of lines of output that refer to "mapred.LocalJobRunner".

LocalJobRunner means you are running in local mode.

2. The jobId is not a real jobId, but rather "job_local1112487908_0001".

Notice it actually has "local" in the jobId.

3. The H2O output itself ("----- H2O started -----") is inline in the driver stdout.

Instead of actually starting YARN containers and spawning an H2O node in each container (each a separate linux process), it's trying to start H2O right inside the driver process itself.

This is why 1 H2O node works (kind of by luck) and >1 doesn't work.


> ... This is a 3 node hadoop cluster...
> Also if I may add, if I try to start a 1-node H2O cluster on top of the 3-node hadoop cluster, it works; it only fails when I try to start a >1-node H2O cluster on the hadoop cluster.
>
> And I have already been able to set up a simple 3 node H2O cluster. I was just trying to set up a H2O cluster on top of an underlying hadoop cluster. It doesn't work?
>
> Please tell me if \ need any more information on the same, or if I may have missed any important point here.


Yes, you need to study how to configure your hadoop cluster so you can start a "real" yarn job (a quick config check is sketched below).

You probably want to do this in the cloud or with virtualbox/vmware or something to be sure you’re getting multiple hosts in the picture.
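
A hedged first check, assuming an Apache-style install with configs under $HADOOP_CONF_DIR: Hadoop falls back to the LocalJobRunner when mapreduce.framework.name is unset or "local", so verifying it is set to "yarn" is a reasonable starting point.

    # Inspect which MapReduce framework the client config selects:
    grep -A1 "mapreduce.framework.name" "$HADOOP_CONF_DIR/mapred-site.xml"

    # For a real YARN cluster, mapred-site.xml should contain:
    #   <property>
    #     <name>mapreduce.framework.name</name>
    #     <value>yarn</value>
    #   </property>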


Thanks,
Tom

Tanmay Saha

Dec 6, 2016, 7:08:38 AM
to Tom Kraljevic, H2O Open Source Scalable Machine Learning - h2ostream
Hi Tom,

I am afraid I don't understand all of what you are pointing at. I was under the impression that seeing all the expected daemons running on all the nodes was a sufficient check that Hadoop is running in cluster mode.
Would you be kind enough to point me to a link or a source explaining how to set up a "real" hadoop cluster environment?

Thanks,
Tanmay.

Tom Kraljevic

Dec 6, 2016, 8:42:00 AM
to Tanmay Saha, Tom Kraljevic, H2O Open Source Scalable Machine Learning - h2ostream

i haven't done it myself in years, so i'm sure it's changed a lot and gotten easier since then.

the best source of info is to look at the docs from the distro vendors.

here is one example i found from a quick search.  maybe there's easier stuff out there too.

(these days all the customers we work with have full-time people who manage their clusters and typically have paid support from one of the vendors, so we really never get asked this question...)

tom
