Hello everyone,
I just installed rmr2, ravro, rhdfs and plyrmr on a Hadoop cluster (HDP 2.1). The first libraries seem to be working just fine, but I have some issues with plyrmr. Everything was installed following the various tutorials and every library is where it is supposed to be. I'm using the following sequence of commands:
> Sys.setenv(HADOOP_HOME="/usr/lib/hadoop");
> Sys.setenv(HADOOP_CMD ="/usr/lib/hadoop/bin/hadoop");
> Sys.setenv(HADOOP_STREAMING="/usr/lib/hadoop-mapreduce/hadoop-streaming.jar");
> library(rhdfs)
> hdfs.init()
> library(plyrmr)
> bind.cols(mtcars, carb.per.cyl = carb/cyl) # works just fine
> to.dfs(mtcars, output="/tmp/mtcars") # works just fine
> bind.cols(input("/tmp/mtcars"), carb.per.cyl = carb/cyl) # error...
14/12/08 14:04:14 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 360 minutes, Emptier interval = 0 minutes.
Moved: 'hdfs://dhadlx21.dns21.socgen:8020/tmp/file3e904e626894' to trash at: hdfs://dhadlx21.dns21.socgen:8020/user/haddadm/.Trash/Current
packageJobJar: [] [/usr/lib/hadoop-mapreduce/hadoop-streaming-2.4.0.2.1.2.0-402.jar] /tmp/streamjob7146534454889805371.jar tmpDir=null
14/12/08 14:04:17 INFO client.RMProxy: Connecting to ResourceManager at dhadlx21.dns21.socgen/192.88.64.155:8050
14/12/08 14:04:18 INFO client.RMProxy: Connecting to ResourceManager at dhadlx21.dns21.socgen/192.88.64.155:8050
14/12/08 14:04:18 INFO mapred.FileInputFormat: Total input paths to process : 1
14/12/08 14:04:18 INFO mapreduce.JobSubmitter: number of splits:2
14/12/08 14:04:18 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1417777962687_0661
14/12/08 14:04:19 INFO impl.YarnClientImpl: Submitted application application_1417777962687_0661
14/12/08 14:04:19 INFO mapreduce.Job: The url to track the job: http://dhadlx21.dns21.socgen:8088/proxy/application_1417777962687_0661/
14/12/08 14:04:19 INFO mapreduce.Job: Running job: job_1417777962687_0661
14/12/08 14:04:25 INFO mapreduce.Job: Job job_1417777962687_0661 running in uber mode : false
14/12/08 14:04:25 INFO mapreduce.Job: map 0% reduce 0%
14/12/08 14:04:29 INFO mapreduce.Job: Task Id : attempt_1417777962687_0661_m_000001_0, Status : FAILED
Error: Java heap space
14/12/08 14:04:29 INFO mapreduce.Job: Task Id : attempt_1417777962687_0661_m_000000_0, Status : FAILED
Error: Java heap space
14/12/08 14:04:32 INFO mapreduce.Job: Task Id : attempt_1417777962687_0661_m_000000_1, Status : FAILED
Error: Java heap space
14/12/08 14:04:32 INFO mapreduce.Job: Task Id : attempt_1417777962687_0661_m_000001_1, Status : FAILED
Error: Java heap space
14/12/08 14:04:36 INFO mapreduce.Job: Task Id : attempt_1417777962687_0661_m_000000_2, Status : FAILED
Error: Java heap space
14/12/08 14:04:37 INFO mapreduce.Job: Task Id : attempt_1417777962687_0661_m_000001_2, Status : FAILED
Error: Java heap space
14/12/08 14:04:41 INFO mapreduce.Job: map 100% reduce 100%
14/12/08 14:04:41 INFO mapreduce.Job: Job job_1417777962687_0661 failed with state FAILED due to: Task failed task_1417777962687_0661_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0
14/12/08 14:04:41 INFO mapreduce.Job: Counters: 13
Job Counters
    Failed map tasks=7
    Killed map tasks=1
    Launched map tasks=8
    Other local map tasks=6
    Data-local map tasks=2
    Total time spent by all maps in occupied slots (ms)=41280
    Total time spent by all reduces in occupied slots (ms)=0
    Total time spent by all map tasks (ms)=20640
    Total vcore-seconds taken by all map tasks=20640
    Total megabyte-seconds taken by all map tasks=84541440
Map-Reduce Framework
    CPU time spent (ms)=0
    Physical memory (bytes) snapshot=0
    Virtual memory (bytes) snapshot=0
14/12/08 14:04:41 ERROR streaming.StreamJob: Job not Successful!
Streaming Command Failed!
Error in mr(map = map, reduce = reduce, combine = combine, vectorized.reduce, :
hadoop streaming failed with error code 1
In addition: Warning message:
In save(list = obj.names, file = fname, envir = envir) :
'package:plyrmr' may not be available when loading
I think I need to give the streaming map tasks more Java heap space, but I'm not sure how to do that. Any idea how?
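For what it's worth, below is a minimal sketch of what I was planning to try, assuming that rmr.options() accepts backend.parameters in my rmr2 version and that plyrmr picks those settings up for its streaming jobs (I have not verified either). The property names are the standard Hadoop 2 ones and the memory values are just placeholders:

# Sketch (untested): raise the map-task container size and JVM heap for the
# streaming job via rmr2 backend parameters, then retry the plyrmr pipeline.
library(rmr2)
library(plyrmr)
plyrmr.options(backend = "hadoop")
rmr.options(backend.parameters = list(
  hadoop = list(
    D = "mapreduce.map.memory.mb=2048",      # YARN container size for map tasks (illustrative value)
    D = "mapreduce.map.java.opts=-Xmx1638m"  # JVM heap inside that container (illustrative value)
  )
))
bind.cols(input("/tmp/mtcars"), carb.per.cyl = carb/cyl)

Is that the right knob, or should this be set cluster-side instead?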
Thanks a lot.