Performance issue of hdfs.read() of rhdfs 1.0.8


James Chang

May 23, 2014, 5:18:25 AM
to rha...@googlegroups.com
Hi,

    I found that using hdfs.read() from rhdfs 1.0.8 to read a CSV file on HDFS gives very poor performance.
My cluster is a Hortonworks HDP 2.0.6.0 cluster with 2 NameNodes, 4 DataNodes, and 1 RStudio Web Server.

    I use the following code to test the throughput of hdfs.read():
===================================================================================
Sys.setenv(HADOOP_CMD="/usr/lib/hadoop/bin/hadoop")
Sys.setenv(HADOOP_STREAMING="/usr/lib/hadoop-mapreduce/hadoop-streaming-2.2.0.2.0.6.0-101.jar")
Sys.setenv(HADOOP_COMMON_LIB_NATIVE_DIR="/usr/lib/hadoop/lib/native/")
library(rmr2);
library(rhdfs);
hdfs.init();
f = hdfs.file("/bigdata/rawdata/201312.csv","r",buffersize=104857600);
start.time <- Sys.time();
repeat {
  m = hdfs.read(f)
  duration <- as.numeric(difftime(Sys.time(), start.time, units = "secs"))
  print(length(m) /  duration)
  start.time <- Sys.time()
}

===================================================================================
The average throughput is between 450 KB/sec and 465 KB/sec.
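
For reference, here is a minimal sketch of the same kind of measurement that stops at end of file and reports a single overall figure. It reads a local temporary file with base R's readBin() as a stand-in for hdfs.file()/hdfs.read(), so it runs without a cluster. The helper name measure_throughput and the chunk size are illustrative assumptions, not part of rhdfs.

```r
# Hypothetical helper: read a file in fixed-size chunks and report bytes/sec.
# readBin() on a local file stands in for hdfs.read() on an HDFS handle.
measure_throughput <- function(path, chunk_bytes = 1024 * 1024) {
  con <- file(path, open = "rb")
  on.exit(close(con))
  total_bytes <- 0                      # numeric, so large files do not overflow
  start_time <- Sys.time()
  repeat {
    chunk <- readBin(con, what = "raw", n = chunk_bytes)
    if (length(chunk) == 0) break       # end of file: leave the loop
    total_bytes <- total_bytes + length(chunk)
  }
  elapsed <- as.numeric(difftime(Sys.time(), start_time, units = "secs"))
  list(bytes = total_bytes,
       seconds = elapsed,
       bytes_per_sec = total_bytes / elapsed)
}

# Usage with a small temporary file (3 MB of zero bytes).
tmp <- tempfile(fileext = ".bin")
writeBin(raw(3 * 1024 * 1024), tmp)
result <- measure_throughput(tmp)
print(result$bytes)
unlink(tmp)
```

Timing the whole read once, rather than each call, avoids counting the print() overhead and gives a number that is directly comparable with a command-line copy.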

On the same server (RStudio Web Server), I ran the command "hadoop fs -get /bigdata/rawdata/201312.csv".
The result is as follows:
[root@RStudio ~]# time hadoop fs -get /tmp/TMC201303.csv /tmp/work/

real    2m38.491s
user    0m8.462s
sys    0m13.465s
[root@RStudio ~]# ls -al /tmp/201303.csv
-rw-r--r-- 1 root root 8596758123 2014-05-23 09:54 /tmp/201303.csv

The average throughput is 46640KB/sec
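
As a rough cross-check, throughput can be recomputed from the `time` and `ls -al` output shown above. This sketch assumes the wall-clock `real` time is the right denominator. The result shifts with the choice of elapsed time and with whether a kilobyte means 1000 or 1024 bytes, so it may not match a figure derived differently.

```r
# Figures copied from the `time` and `ls -al` output above.
file_bytes   <- 8596758123
real_seconds <- 2 * 60 + 38.491         # "real 2m38.491s"

bytes_per_sec <- file_bytes / real_seconds
kb_per_sec    <- bytes_per_sec / 1000   # decimal kilobytes
kib_per_sec   <- bytes_per_sec / 1024   # binary kibibytes

print(round(c(KB_per_sec = kb_per_sec, KiB_per_sec = kib_per_sec)))
```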

    I am curious why performance drops so much when using hdfs.read() from rhdfs 1.0.8.

    Has anyone encountered the same issue, and does anyone know how to improve the performance?

Thanks in advance!
James

Antonio Piccolboni

May 23, 2014, 1:37:31 PM
to RHadoop Google Group
Hi James,
would you mind repeating your test run after an Rprof() call, then looking for the Rprof.out file and sharing it (I guess send it as an attachment in an email to rha...@revolutionanalytics.com)? Thanks
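
A minimal sketch of that profiling workflow, assuming a stand-in workload instead of the real hdfs.read() loop so that it runs without a cluster. The function busy_workload and the temporary output path are hypothetical. By default Rprof() writes to Rprof.out in the working directory.

```r
# Hypothetical stand-in for the code under test; in the thread this would be
# the hdfs.read() loop. It keeps the CPU busy for about one second so the
# sampling profiler has something to record.
busy_workload <- function(seconds = 1) {
  t0 <- Sys.time()
  total <- 0
  while (as.numeric(difftime(Sys.time(), t0, units = "secs")) < seconds) {
    buf <- raw(1e6)                     # allocate a 1 MB buffer per iteration
    total <- total + length(buf)
  }
  total
}

prof_file <- tempfile(fileext = ".out")
Rprof(prof_file, interval = 0.02)       # start sampling BEFORE the code under test
bytes_allocated <- busy_workload()
Rprof(NULL)                             # stop sampling and flush the file

prof <- summaryRprof(prof_file)
print(head(prof$by.total, 5))           # the most expensive call paths
```

The important detail is ordering: profiling has to be switched on before the measured code starts and switched off after it finishes, otherwise the output file contains no samples for it.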


Antonio



James Chang

May 26, 2014, 11:35:54 PM
to rha...@googlegroups.com, ant...@piccolboni.info
Hi Antonio,

     I am willing to run Rprof(). Should I modify my code as follows?
===================================================================================
Sys.setenv(HADOOP_CMD="/usr/lib/hadoop/bin/hadoop")
Sys.setenv(HADOOP_STREAMING="/usr/lib/hadoop-mapreduce/hadoop-streaming-2.2.0.2.0.6.0-101.jar")
Sys.setenv(HADOOP_COMMON_LIB_NATIVE_DIR="/usr/lib/hadoop/lib/native/")
library(rmr2);
library(rhdfs);
hdfs.init();
f = hdfs.file("/bigdata/rawdata/201312.csv","r",buffersize=104857600);
start.time <- Sys.time();
repeat {
  m = hdfs.read(f)
  duration <- as.numeric(difftime(Sys.time(), start.time, units = "secs"))
  print(length(m) /  duration)
  start.time <- Sys.time()
}
Rprof()
===================================================================================

Thanks in advance!




James Chang

May 27, 2014, 12:50:49 AM
to rha...@googlegroups.com, ant...@piccolboni.info, rha...@revolutionanalytics.com
Hi Antonio,

    I used a 500 MB CSV file to profile my code.
The result follows:

> Rprof()
> summaryRprof(tmp2)
$by.self
                                    self.time self.pct total.time total.pct
".Call"                               1316.80    75.49    1316.94     75.49
"raw"                                  401.62    23.02     401.62     23.02
"hdfs.read"                              3.76     0.22    1742.54     99.89
".External"                              3.56     0.20       3.56      0.20
"standardGeneric"                        1.00     0.06       7.96      0.46
"FUN"                                    0.92     0.05       3.70      0.21
".jcall"                                 0.86     0.05      17.20      0.99
"match"                                  0.86     0.05       1.96      0.11
"initialize"                             0.70     0.04       6.88      0.39
".identC"                                0.62     0.04       0.88      0.05
"validObject"                            0.52     0.03       4.04      0.23
"possibleExtends"                        0.52     0.03       1.38      0.08
"is"                                     0.50     0.03       2.92      0.17
"el"                                     0.50     0.03       0.50      0.03
"getClassDef"                            0.48     0.03       1.16      0.07
"structure"                              0.48     0.03       0.68      0.04
"ls"                                     0.44     0.03       1.02      0.06
":"                                      0.40     0.02       0.40      0.02
"tryCatch"                               0.36     0.02      20.14      1.15
".getClassFromCache"                     0.36     0.02       0.88      0.05
"lapply"                                 0.34     0.02       4.02      0.23
"rev"                                    0.34     0.02       0.36      0.02
"doTryCatch"                             0.32     0.02      20.02      1.15
"gsub"                                   0.32     0.02       0.32      0.02
"$"                                      0.30     0.02       2.64      0.15
"substr"                                 0.30     0.02       0.46      0.03
"anyDuplicated"                          0.30     0.02       0.36      0.02
"seq_along"                              0.28     0.02       0.28      0.02
"inherits"                               0.26     0.01       1.96      0.11
"assign"                                 0.26     0.01       0.26      0.01
"names<-"                                0.26     0.01       0.26      0.01
"regexpr"                                0.26     0.01       0.26      0.01
".jcast"                                 0.24     0.01       1.08      0.06
"as.POSIXct"                             0.24     0.01       0.74      0.04
"=="                                     0.22     0.01       0.22      0.01
".jrcall"                                0.18     0.01      17.50      1.00
"identical"                              0.18     0.01       0.40      0.02
"loadMethod"                             0.16     0.01       0.98      0.06
"getClass"                               0.16     0.01       0.36      0.02
"deparse"                                0.16     0.01       0.34      0.02
".jarray"                                0.14     0.01     766.48     43.94
"isJavaArray"                            0.14     0.01       1.30      0.07
"environment"                            0.14     0.01       0.14      0.01
"tryCatchList"                           0.12     0.01      20.06      1.15
"new"                                    0.12     0.01       7.82      0.45
"elNamed"                                0.12     0.01       0.56      0.03
"c"                                      0.12     0.01       0.12      0.01
"class"                                  0.12     0.01       0.12      0.01
"tryCatchOne"                            0.10     0.01      20.02      1.15
"getNativeSymbolInfo"                    0.10     0.01       1.96      0.11
"try"                                    0.10     0.01       1.16      0.07
"-"                                      0.10     0.01       0.10      0.01
"character"                              0.10     0.01       0.10      0.01
"is.list"                                0.10     0.01       0.10      0.01
"print.default"                          0.10     0.01       0.10      0.01
".classEnv"                              0.08     0.00       1.38      0.08
"%in%"                                   0.08     0.00       1.34      0.08
".requirePackage"                        0.08     0.00       1.30      0.07
"match.arg"                              0.08     0.00       0.50      0.03
"@<-"                                    0.08     0.00       0.08      0.00
"is.na"                                  0.08     0.00       0.08      0.00
"length"                                 0.08     0.00       0.08      0.00
"names"                                  0.08     0.00       0.08      0.00
"parent.frame"                           0.08     0.00       0.08      0.00
"tojniSignature"                         0.06     0.00       0.62      0.04
"isPrimitiveTypeName"                    0.06     0.00       0.18      0.01
".jcheck"                                0.06     0.00       0.12      0.01
"as.double.difftime"                     0.06     0.00       0.10      0.01
"getNamespace"                           0.06     0.00       0.10      0.01
"allNames"                               0.06     0.00       0.06      0.00
"any"                                    0.06     0.00       0.06      0.00
"anyDuplicated.default"                  0.06     0.00       0.06      0.00
"as.integer"                             0.06     0.00       0.06      0.00
"match.fun"                              0.06     0.00       0.06      0.00
"nchar"                                  0.06     0.00       0.06      0.00
"pmatch"                                 0.06     0.00       0.06      0.00
".jnew"                                  0.04     0.00       1.64      0.09
"difftime"                               0.04     0.00       1.40      0.08
".jidenticalRef"                         0.04     0.00       0.48      0.03
"isJavaArraySignature"                   0.04     0.00       0.40      0.02
"._java_valid_objects_list"              0.04     0.00       0.20      0.01
"as.numeric"                             0.04     0.00       0.14      0.01
".deparseOpts"                           0.04     0.00       0.12      0.01
"formals"                                0.04     0.00       0.06      0.00
"as.name"                                0.04     0.00       0.04      0.00
"as.vector"                              0.04     0.00       0.04      0.00
"attributes"                             0.04     0.00       0.04      0.00
"slot<-"                                 0.04     0.00       0.04      0.00
"hasField"                               0.02     0.00       1.62      0.09
"hasJavaMethod"                          0.02     0.00       0.66      0.04
".POSIXct"                               0.02     0.00       0.58      0.03
"isPrimitiveArraySignature"              0.02     0.00       0.30      0.02
"print"                                  0.02     0.00       0.14      0.01
"tojni"                                  0.02     0.00       0.08      0.00
"slot"                                   0.02     0.00       0.06      0.00
"._must_be_character_of_length_one"      0.02     0.00       0.04      0.00
"getDim"                                 0.02     0.00       0.04      0.00
"!"                                      0.02     0.00       0.02      0.00
"!="                                     0.02     0.00       0.02      0.00
"/"                                      0.02     0.00       0.02      0.00
">"                                      0.02     0.00       0.02      0.00
"all"                                    0.02     0.00       0.02      0.00
"as.POSIXct.default"                     0.02     0.00       0.02      0.00
"attr"                                   0.02     0.00       0.02      0.00
"dim"                                    0.02     0.00       0.02      0.00
"list"                                   0.02     0.00       0.02      0.00
"mode"                                   0.02     0.00       0.02      0.00
"rev.default"                            0.02     0.00       0.02      0.00
"sprintf"                                0.02     0.00       0.02      0.00
"sum"                                    0.02     0.00       0.02      0.00
"sys.function"                           0.02     0.00       0.02      0.00

$by.total
                                    total.time total.pct self.time self.pct
"hdfs.read"                            1742.54     99.89      3.76     0.22
".Call"                                1316.94     75.49   1316.80    75.49
".jarray"                               766.48     43.94      0.14     0.01
".jevalArray"                           551.36     31.61      0.00     0.00
"raw"                                   401.62     23.02    401.62    23.02
"tryCatch"                               20.14      1.15      0.36     0.02
"tryCatchList"                           20.06      1.15      0.12     0.01
"doTryCatch"                             20.02      1.15      0.32     0.02
"tryCatchOne"                            20.02      1.15      0.10     0.01
".jrcall"                                17.50      1.00      0.18     0.01
"<Anonymous>"                            17.50      1.00      0.00     0.00
".jcall"                                 17.20      0.99      0.86     0.05
"standardGeneric"                         7.96      0.46      1.00     0.06
"new"                                     7.82      0.45      0.12     0.01
"initialize"                              6.88      0.39      0.70     0.04
"validObject"                             4.04      0.23      0.52     0.03
"lapply"                                  4.02      0.23      0.34     0.02
"FUN"                                     3.70      0.21      0.92     0.05
".External"                               3.56      0.20      3.56     0.20
"is"                                      2.92      0.17      0.50     0.03
".jsimplify"                              2.68      0.15      0.00     0.00
"$"                                       2.64      0.15      0.30     0.02
"._java_class_list"                       2.30      0.13      0.00     0.00
".jclass"                                 2.28      0.13      0.00     0.00
"match"                                   1.96      0.11      0.86     0.05
"inherits"                                1.96      0.11      0.26     0.01
"getNativeSymbolInfo"                     1.96      0.11      0.10     0.01
".jnew"                                   1.64      0.09      0.04     0.00
"hasField"                                1.62      0.09      0.02     0.00
"difftime"                                1.40      0.08      0.04     0.00
"possibleExtends"                         1.38      0.08      0.52     0.03
".classEnv"                               1.38      0.08      0.08     0.00
"%in%"                                    1.34      0.08      0.08     0.00
"isJavaArray"                             1.30      0.07      0.14     0.01
".requirePackage"                         1.30      0.07      0.08     0.00
"getClassDef"                             1.16      0.07      0.48     0.03
"try"                                     1.16      0.07      0.10     0.01
".jcast"                                  1.08      0.06      0.24     0.01
"ls"                                      1.02      0.06      0.44     0.03
"loadedNamespaces"                        1.02      0.06      0.00     0.00
"loadMethod"                              0.98      0.06      0.16     0.01
".identC"                                 0.88      0.05      0.62     0.04
".getClassFromCache"                      0.88      0.05      0.36     0.02
"is.jnull"                                0.78      0.04      0.00     0.00
"as.POSIXct"                              0.74      0.04      0.24     0.01
"structure"                               0.68      0.04      0.48     0.03
"hasJavaMethod"                           0.66      0.04      0.02     0.00
"tojniSignature"                          0.62      0.04      0.06     0.00
".POSIXct"                                0.58      0.03      0.02     0.00
"Sys.time"                                0.58      0.03      0.00     0.00
"elNamed"                                 0.56      0.03      0.12     0.01
"el"                                      0.50      0.03      0.50     0.03
"match.arg"                               0.50      0.03      0.08     0.00
".jidenticalRef"                          0.48      0.03      0.04     0.00
"substr"                                  0.46      0.03      0.30     0.02
":"                                       0.40      0.02      0.40     0.02
"identical"                               0.40      0.02      0.18     0.01
"isJavaArraySignature"                    0.40      0.02      0.04     0.00
"rev"                                     0.36      0.02      0.34     0.02
"anyDuplicated"                           0.36      0.02      0.30     0.02
"getClass"                                0.36      0.02      0.16     0.01
"deparse"                                 0.34      0.02      0.16     0.01
"eval"                                    0.34      0.02      0.00     0.00
"gsub"                                    0.32      0.02      0.32     0.02
"isPrimitiveArraySignature"               0.30      0.02      0.02     0.00
"seq_along"                               0.28      0.02      0.28     0.02
"checkAtAssignment"                       0.28      0.02      0.00     0.00
"assign"                                  0.26      0.01      0.26     0.01
"names<-"                                 0.26      0.01      0.26     0.01
"regexpr"                                 0.26      0.01      0.26     0.01
"=="                                      0.22      0.01      0.22     0.01
"._java_valid_objects_list"               0.20      0.01      0.04     0.00
"isPrimitiveTypeName"                     0.18      0.01      0.06     0.00
"environment"                             0.14      0.01      0.14     0.01
"as.numeric"                              0.14      0.01      0.04     0.00
"print"                                   0.14      0.01      0.02     0.00
"c"                                       0.12      0.01      0.12     0.01
"class"                                   0.12      0.01      0.12     0.01
".jcheck"                                 0.12      0.01      0.06     0.00
".deparseOpts"                            0.12      0.01      0.04     0.00
".difftime"                               0.12      0.01      0.00     0.00
"hdfs.file"                               0.12      0.01      0.00     0.00
"-"                                       0.10      0.01      0.10     0.01
"character"                               0.10      0.01      0.10     0.01
"is.list"                                 0.10      0.01      0.10     0.01
"print.default"                           0.10      0.01      0.10     0.01
"as.double.difftime"                      0.10      0.01      0.06     0.00
"getNamespace"                            0.10      0.01      0.06     0.00
"@<-"                                     0.08      0.00      0.08     0.00
"is.na"                                   0.08      0.00      0.08     0.00
"length"                                  0.08      0.00      0.08     0.00
"names"                                   0.08      0.00      0.08     0.00
"parent.frame"                            0.08      0.00      0.08     0.00
"tojni"                                   0.08      0.00      0.02     0.00
"allNames"                                0.06      0.00      0.06     0.00
"any"                                     0.06      0.00      0.06     0.00
"anyDuplicated.default"                   0.06      0.00      0.06     0.00
"as.integer"                              0.06      0.00      0.06     0.00
"match.fun"                               0.06      0.00      0.06     0.00
"nchar"                                   0.06      0.00      0.06     0.00
"pmatch"                                  0.06      0.00      0.06     0.00
"formals"                                 0.06      0.00      0.04     0.00
"slot"                                    0.06      0.00      0.02     0.00
"as.name"                                 0.04      0.00      0.04     0.00
"as.vector"                               0.04      0.00      0.04     0.00
"attributes"                              0.04      0.00      0.04     0.00
"slot<-"                                  0.04      0.00      0.04     0.00
"._must_be_character_of_length_one"       0.04      0.00      0.02     0.00
"getDim"                                  0.04      0.00      0.02     0.00
"isArraySignature"                        0.04      0.00      0.00     0.00
"!"                                       0.02      0.00      0.02     0.00
"!="                                      0.02      0.00      0.02     0.00
"/"                                       0.02      0.00      0.02     0.00
">"                                       0.02      0.00      0.02     0.00
"all"                                     0.02      0.00      0.02     0.00
"as.POSIXct.default"                      0.02      0.00      0.02     0.00
"attr"                                    0.02      0.00      0.02     0.00
"dim"                                     0.02      0.00      0.02     0.00
"list"                                    0.02      0.00      0.02     0.00
"mode"                                    0.02      0.00      0.02     0.00
"rev.default"                             0.02      0.00      0.02     0.00
"sprintf"                                 0.02      0.00      0.02     0.00
"sum"                                     0.02      0.00      0.02     0.00
"sys.function"                            0.02      0.00      0.02     0.00
"._isPrimitiveReference"                  0.02      0.00      0.00     0.00
"isTRUE"                                  0.02      0.00      0.00     0.00

$sample.interval
[1] 0.02

$sampling.time
[1] 1744.44



Attachment: Rprof.out

Antonio Piccolboni

May 27, 2014, 3:56:57 AM
to RHadoop Google Group

Thanks,  this is valuable data. I can't work on it right away but I'll get back to you as soon as I know something.

James Chang

May 28, 2014, 9:54:31 PM
to rha...@googlegroups.com, ant...@piccolboni.info
Hi Antonio,

    I found two warning messages after executing hdfs.init():
> hdfs.init()
14/05/29 09:46:07 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/05/29 09:46:08 WARN hdfs.BlockReaderLocal: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.

    I have set the following environment variables in my R program:

> Sys.setenv(HADOOP_CMD="/usr/lib/hadoop/bin/hadoop")
> Sys.setenv(HADOOP_STREAMING="/usr/lib/hadoop-mapreduce/hadoop-streaming-2.2.0.2.0.6.0-101.jar")
> Sys.setenv(HADOOP_COMMON_LIB_NATIVE_DIR="/usr/lib/hadoop/lib/native/")

   Could it be the root cause?
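
One way to narrow down the warnings is to check, from the same R session, whether a libhadoop shared object is actually present in the configured directory. This is only a sketch: the helper find_native_hadoop is hypothetical, and whether rhdfs passes HADOOP_COMMON_LIB_NATIVE_DIR on to the JVM is an assumption that is not confirmed here. On Linux the JVM's default java.library.path is derived from LD_LIBRARY_PATH, so that variable is printed as well.

```r
# Hypothetical helper: list libhadoop shared objects found in a directory.
# list.files() returns character(0) if the directory is empty or missing.
find_native_hadoop <- function(native_dir) {
  list.files(native_dir, pattern = "^libhadoop\\.so", full.names = TRUE)
}

native_dir <- Sys.getenv("HADOOP_COMMON_LIB_NATIVE_DIR", unset = NA)
if (is.na(native_dir)) {
  message("HADOOP_COMMON_LIB_NATIVE_DIR is not set in this R session")
} else {
  found <- find_native_hadoop(native_dir)
  if (length(found) == 0) {
    message("No libhadoop.so* found in ", native_dir)
  } else {
    message("Found: ", paste(found, collapse = ", "))
  }
}

# Shown for comparison; an empty value means the variable is unset.
message("LD_LIBRARY_PATH = ", Sys.getenv("LD_LIBRARY_PATH"))
```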

Best Regards,
James


Antonio Piccolboni

Jun 16, 2014, 2:22:10 PM
to rha...@googlegroups.com, ant...@piccolboni.info


On Wednesday, May 28, 2014 6:54:31 PM UTC-7, James Chang wrote:
Hi Antonio,

    I found two warning messages after executing hdfs.init():
> hdfs.init()
14/05/29 09:46:07 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/05/29 09:46:08 WARN hdfs.BlockReaderLocal: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.

    I have set the following environment variables in my R program:
> Sys.setenv(HADOOP_CMD="/usr/lib/hadoop/bin/hadoop")
> Sys.setenv(HADOOP_STREAMING="/usr/lib/hadoop-mapreduce/hadoop-streaming-2.2.0.2.0.6.0-101.jar")
> Sys.setenv(HADOOP_COMMON_LIB_NATIVE_DIR="/usr/lib/hadoop/lib/native/")

   Could it be the root cause?

 It could be related, but it is not my first hypothesis for a major performance issue. I took a look at the attached data, and it doesn't look like the data in the tmp2 file you called summaryRprof() on. I need that file if possible. Could you please check? Thanks


Antonio

Antonio Piccolboni

Aug 27, 2014, 3:20:19 PM
to RHadoop Google Group
My bad, the file looks fine

Antonio Piccolboni

Aug 27, 2014, 4:01:37 PM
to rha...@googlegroups.com, ant...@piccolboni.info