Performance issue with simple RMR2 test in prod Vs dev

Arun kv

Mar 27, 2015, 7:27:22 PM

to rha...@googlegroups.com

Hi All,

I'm seeing a difference in run time for the R script below between our production and development environments. Are there any memory settings that apply to an R script running inside a YARN container, and which areas should I focus on when tuning an rmr job? Please point me to any references on this.

Distribution: HDP 2.1

Please find attached the rmr job logs from both environments.

RMR Benchmark script:

Sys.setenv(HADOOP_CMD="/usr/bin/hadoop",
           HADOOP_STREAMING="/usr/lib/hadoop-mapreduce/hadoop-streaming.jar")
library(rmr2)
library(forecast)

small.ints <- to.dfs(1:100)
rmr.options(backend.parameters=list())
out <- mapreduce(
  small.ints,
  map=function(k,v) keyval(v,v),
  reduce=function(k,vv) {
    ets(USAccDeaths)
    keyval(k,vv)
  },
  backend.parameters=list(hadoop=list(D="mapreduce.job.reduces=36")))

Please let me know if you need more information.

Thanks

Arun

dev_rmr.dat

prod_rmr.dat

Antonio Piccolboni

Mar 27, 2015, 7:34:10 PM

to RHadoop Google Group

I don't know what ets does, so I can't comment on that. I wrote a manual page that you don't mention having read: help(hadoop.settings). Your call to rmr.options(backend.parameters=list()) simply wiped out the defaults, so you must know a few things more than I do.

Arun kv

Mar 28, 2015, 8:13:03 AM

to rha...@googlegroups.com

Hi,

What are the differences between rmr2 versions 3.2.0 and 3.2.1?

Will the same R program have a different job completion time on 3.2.0 than on 3.2.1?

Does the version difference affect performance?

Thanks in advance

Arun

Antonio Piccolboni

Mar 28, 2015, 2:53:03 PM

to rha...@googlegroups.com

I doubt that a bugfix version would be remarkably faster or slower than the previous one. You can always try your specific use case with each of them, or analyze the differences. Check out the source and run:

git diff 3.2.0..3.2.1
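
As a rough sketch of that comparison, the shell function below lists the commits and a per-file change summary between two tags of a local checkout, which is usually easier to scan than the full diff. The function name compare_tags and the checkout path in the usage comment are hypothetical; the tag names are the ones given above.

```shell
#!/bin/sh
# Summarize what changed between two tags of a checked-out repository.
# Usage: compare_tags <repo-dir> <old-tag> <new-tag>
compare_tags() {
    repo=$1
    old=$2
    new=$3
    # Commits reachable from the new tag but not from the old one.
    git -C "$repo" log --oneline "$old..$new"
    # Per-file summary of the differences between the two tags.
    git -C "$repo" diff --stat "$old..$new"
}

# Example, assuming the rmr2 source is checked out at path/to/rmr2
# (a hypothetical location):
#   compare_tags path/to/rmr2 3.2.0 3.2.1
```

If the summary points at a file of interest, running git diff 3.2.0..3.2.1 -- <file> inside the checkout narrows the full diff to that file.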
