Error in FUN(X[[2L]], …) : Sorry, parameter type `NA' is ambiguous or not supported


Ohelig Pojke

Oct 30, 2014, 5:07:11 PM
to rha...@googlegroups.com

I have copied this question over from https://stackoverflow.com/questions/25237452/error-in-funx2l-sorry-parameter-type-na-is-ambiguous-or-not-supp

I'm facing the same problem, and hoping I'll find a solution here in the official group.

I am trying the R script below to build a logistic regression model using RHadoop (the rmr2 and rhdfs packages) on an HDFS data file located at "hdfs://:/somnath/merged_train/part-m-00000", and then testing the model against a test HDFS data file at "hdfs://:/somnath/merged_test/part-m-00000".

We are using the CDH4 distribution, with YARN/MR2 running in parallel to the Hadoop-0.20-based MR1, and the script runs against the hadoop-0.20 mapreduce and hdfs versions, as the Sys.setenv commands below show.

However, whenever I run the script I hit the error below, and I have had very little luck getting past it. I would appreciate it if somebody could point me to the likely cause; it seems to come from an lapply call in R being handed an NA argument it cannot handle.

[root@kkws029 logreg_template]# Rscript logreg_test.R
Loading required package: methods
Loading required package: rJava

HADOOP_CMD=/opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/bin/hadoop

Be sure to run hdfs.init()
14/08/11 11:59:30 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
NULL
NULL
[1] "Starting to build logistic regression model..."
Error in FUN(X[[2L]], ...) :
  Sorry, parameter type `NA' is ambiguous or not supported.
Calls: logistic.regression ... .jrcall -> ._java_valid_objects_list -> lapply -> FUN
Execution halted
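
For what it's worth, the Calls: line above already narrows it down: rhdfs ends up in rJava's .jrcall, whose argument-validation lapply rejects a value it cannot map onto a Java type. A minimal sketch of inspecting the rejected argument in place, using base R's error hook (untested here; the commented-out line is just the step I suspect):

# Base R: on any uncaught error, open an interactive browser in the
# frames that were live when stop() fired, so the rejected argument
# can be examined where rJava raised the error.
options(error = recover)
# ...then re-run the suspect step, e.g.:
# input <- as.matrix(hdfs.read.text.file(r.file))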

Below is my R script:

#!/usr/bin/env Rscript


Sys.setenv(HADOOP_HOME="/opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce")
Sys.setenv(HADOOP_CMD="/opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/bin/hadoop")


Sys.setenv(HADOOP_BIN="/opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/bin");
Sys.setenv(HADOOP_CONF_DIR="/opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/conf");
Sys.setenv(HADOOP_STREAMING="/opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.0.0-mr1-cdh4.3.0.jar")
Sys.setenv(LD_LIBRARY_PATH="/usr/lib64/R/library/rJava/jri")


library(rmr2)
library(rhdfs)

.jinit()
.jaddClassPath("/opt/cloudera/parcels/CDH/lib/hadoop/hadoop-auth-2.0.0-cdh4.3.0.jar")
.jaddClassPath("/opt/cloudera/parcels/CDH/lib/hadoop-hdfs/hadoop-hdfs-2.0.0-cdh4.3.0.jar")
.jaddClassPath("/opt/cloudera/parcels/CDH/lib/hadoop/hadoop-common-2.0.0-cdh4.3.0.jar")


hdfs.init()
rmr.options( backend = "hadoop", hdfs.tempdir = "/tmp" )

logistic.regression =
        function(hdfsFilePath, iterations, dims, alpha) {
                r.file <- hdfs.file(hdfsFilePath,"r")

                #hdfsFilePath <- to.dfs(hdfsFilePath)

                lr.map =
                  function(.,M) {
                    Y = M[,1]
                    X = M[,-1]
                    keyval(
                        1,
                        Y * X *
                          g(-Y * as.numeric(X %*% t(plane))))}

                lr.reduce =
                  function(k, Z)
                    keyval(k, t(as.matrix(apply(Z,2,sum))))

                plane = t(rep(0, dims))
                g = function(z) 1/(1 + exp(-z))
                for (i in 1:iterations) {
                        gradient =
                                values(
                                        from.dfs(
                                          mapreduce(
                                            input = as.matrix(hdfs.read.text.file(r.file)),
                                            #input = from.dfs(hdfsFilePath),
                                            map = function(.,M) {
                                                Y = M[,1]
                                                X = M[,-1]
                                                keyval(
                                                        1,
                                                        Y * X *
                                                                g(-Y * as.numeric(X %*% t(plane))))},
                                            reduce = lr.reduce,
                                            combine = T)))
                        plane = plane + alpha * gradient

                        #trace(print(plane),quote(browser()))
                 }
                return(plane) }

#validate logistic regression
logistic.regression.test =
        function(hdfsFilePath, weight) {
                r.file <- hdfs.file(hdfsFilePath,"r")
                lr.test.map =
                        function(.,M) {
                          keyval(
                             1,
                             lapply(as.numeric(M[,-1] %*% t(weight)),function(z) 1/(1 + exp(-z))))}

                probabilities =
                     values(
                             from.dfs(
                                mapreduce(
                                  input = as.matrix(hdfs.read.text.file(r.file)),
                                  map = function(.,M) {
                                        keyval(
                                                1,
                                                lapply(as.numeric(M[,-1] %*% t(weight)), function(z) 1/(1 + exp(-z))))}
                        )))
        return(probabilities) }

out = list()
prob = list()
rmr.options( backend = "hadoop", hdfs.tempdir = "/tmp" )

print("Starting to build logistic regression model...")

  out[['hadoop']] =
## @knitr logistic.regression-run
    logistic.regression(
       "hdfs://XX.XX.XX.XX:NNNN/somnath/merged_train/part-m-00000", 5, 5, 0.05)
  write.csv(as.vector(out[['hadoop']]), "/root/somnath/logreg_data/weights.csv")


print("Building logistic regression model completed.")


  prob[['hadoop']] =
    logistic.regression.test(
       "hdfs://XX.XX.XX.XX:NNNN/somnath/merged_test/part-m-00000", out[['hadoop']])
  write.csv(as.vector(prob[['hadoop']]), "/root/somnath/logreg_data/probabilities.csv")


stopifnot(
  isTRUE(all.equal(out[['local']], out[['hadoop']], tolerance = 1E-7)))

NOTE: I have set the following Hadoop environment variables in root's ~/.bash_profile:

# Hadoop-specific environment and commands

export HADOOP_HOME=/opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce
export HADOOP2_HOME=/opt/cloudera/parcels/CDH/lib/hadoop
#export HADOOP_CMD=${HADOOP_HOME}/bin/hadoop
#export HADOOP_STREAMING=/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-streaming.jar
#export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop

export LD_LIBRARY_PATH=${R_HOME}/library/rJava/jri #:${HADOOP_HOME}/../hadoop-0.20-mapreduce/lib/native/Linux-amd64-64


# Add hadoop-common jar to classpath for PlatformName and FsShell classes; Add hadoop-auth and hadoop-hdfs jars

export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:${HADOOP2_HOME}/client-0.20/* #:${HADOOP_HOME}/*.jar:${HADOOP_HOME}/lib/*.jar:${HADOOP2_HOME}/hadoop-common-2.0.0-cdh4.3.0.jar:${HADOOP_HOME}/../hadoop-hdfs/hadoop-hdfs-2.0.0-cdh4.3.0.jar:${HADOOP_HOME}/hadoop-auth-2.0.0-cdh4.3.0.jar:$HADOOP_STREAMING

PATH=$PATH:$R_HOME/bin:$JAVA_HOME/bin:$LD_LIBRARY_PATH:/opt/cloudera/parcels/CDH/lib/mahout:/opt/cloudera/parcels/CDH/lib/hadoop:/opt/cloudera/parcels/CDH/lib/hadoop-hdfs:/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce:/opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce:/var/lib/storm-0.9.0-rc2/lib #:$HADOOP_CMD:$HADOOP_STREAMING:$HADOOP_CONF_DIR

export PATH

SAMPLE TRAIN DATASET

0,-4.418,-2.0658,1.2193,-0.68097,0.90894
0,-2.7466,-2.9374,-0.87562,-0.65177,0.53182
0,-0.98846,0.66962,-0.20736,-0.2895,0.002313
0,-2.277,2.492,0.47936,0.4673,-1.5075
0,-5.4391,1.8447,-1.6843,1.465,-0.71099
0,-0.12843,0.066968,0.02678,-0.040851,0.0075902
0,-2.0796,2.4739,0.23472,0.86423,0.45094
0,-3.1796,-0.15429,1.4814,-0.94316,-0.52754
0,-1.9429,1.3111,0.31921,-1.202,0.8552
0,-2.3768,1.9301,0.096005,-0.51971,-0.17544
0,-2.0336,1.991,0.82029,0.018232,-0.33222
0,-3.6388,-3.2903,-2.1076,0.73341,0.75986
0,-2.9146,0.53163,0.49182,-0.38562,-0.76436
0,-3.3816,1.0954,0.25552,-0.11564,-0.01912
0,-1.7374,-0.63031,-0.6122,0.022664,0.23399
0,-1.312,-0.54935,-0.68508,-0.072985,0.036481
0,-3.991,0.55278,0.38666,-0.56128,-0.6748
....

SAMPLE TEST DATASET

0,-0.66666,0.21439,0.041861,-0.12996,-0.36305
0,-1.3412,-1.1629,-0.029398,-0.13513,0.49758
0,-2.6776,-0.40194,-0.97336,-1.3355,0.73202
0,-6.0203,-0.61477,1.5248,1.9967,2.697
0,-4.5663,-1.6632,-1.2893,-1.7972,1.4367
0,-7.2339,2.4589,0.61349,0.39094,2.19
0,-4.5683,-1.3066,1.1006,-2.8084,0.3172
0,-4.1223,-1.5059,1.3063,-0.18935,1.177
0,-3.7135,-0.26283,1.6961,-1.3499,-0.18553
0,-2.7993,1.2308,-0.42244,-0.50713,-0.3522
0,-3.0541,1.8173,0.96789,-0.25138,-0.36246
0,-1.1798,1.0478,-0.29168,-0.26261,-0.21527
0,-2.6459,2.9387,0.14833,0.24159,-2.4811
0,-3.1672,2.479,-1.2103,-0.48726,0.30974
1,-0.90706,1.0157,0.32953,-0.11648,-0.47386
...
asked Aug 11 at 6:59

This is quite convoluted code. It will take a highly motivated person to go through all of this. Please narrow it down to the offending line. – Roman Luštrik, Aug 11 at 8:35

@RomanLuštrik: Since the error is specific to .jrcall in the rJava package, which I am not calling directly in my code, I assume the problem is in mapreduce(input = as.matrix(hdfs.read.text.file(r.file)), map = function(., M) { keyval(1, lapply(as.numeric(M[,-1] %*% t(weight)), function(z) 1/(1 + exp(-z)))) }), where the input to map is a matrix M read from a file stored in HDFS. Most probably the lapply call is not getting the expected input from the matrix. I have added sample train and test data from the HDFS files to explain better. – somnathchakrabarti, Aug 11 at 8:57

@RomanLuštrik: I tried running the input command from the R console and received the same error, so it is the way I am reading the HDFS file as a matrix that fails: input = as.matrix(hdfs.read.text.file(r.file)) gives Error in FUN(X[[2L]], ...) : Sorry, parameter type `NA' is ambiguous or not supported. – somnathchakrabarti, Aug 11 at 10:17

In R, traceback(), debug(), debugonce() and browser() can yield insights. – Roman Luštrik, Aug 11 at 10:25

Yes, I replicated the code line by line in the R console to find out which specific line was giving the error, and it is the input line above. If you can provide some insight on how to read an HDFS file as an input matrix, that would be helpful. – somnathchakrabarti, Aug 11 at 10:27
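
A minimal sketch of pulling the HDFS text file into a numeric matrix on the R client side, assuming the rhdfs line-reader API (hdfs.line.reader() returning a reader whose read() yields a character vector of lines; untested against CDH4):

library(rhdfs)
hdfs.init()

# hdfs.read.text.file()/hdfs.line.reader() hand back text, so as.matrix()
# on the raw result can never produce the numeric matrix the map function
# expects; the lines have to be split and converted explicitly.
reader <- hdfs.line.reader("/somnath/merged_train/part-m-00000")
lines <- reader$read()    # character vector, one element per line
reader$close()
M <- do.call(rbind, lapply(strsplit(lines, ","), as.numeric))

Note, however, that mapreduce() expects a dfs path or a big-data object as its input, not an in-memory matrix, so the idiomatic fix is the one suggested in the reply below.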

Antonio Piccolboni

Oct 30, 2014, 7:01:10 PM
to rha...@googlegroups.com
Hi,
You apparently took the logistic regression example and modified it so that it doesn't work. I would recommend undoing all your changes; then it will work again. Regards


Antonio
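
For reference, the stock rmr2 logistic-regression example the reply refers to passes mapreduce() a dfs path directly and never opens the file with rhdfs. A sketch of that shape, reconstructed from the rmr2 tutorial rather than taken from this reply (the CSV input format is an assumption, matching the sample data above):

library(rmr2)

# Unmodified tutorial pattern: input is a plain dfs path, and an input
# format tells rmr2 how to parse each CSV record into the matrix-like M
# that the map function receives.
logistic.regression <- function(input, iterations, dims, alpha) {
        plane <- t(rep(0, dims))
        g <- function(z) 1/(1 + exp(-z))
        lr.map <- function(., M) {
                Y <- M[, 1]
                X <- as.matrix(M[, -1])
                keyval(1, Y * X * g(-Y * as.numeric(X %*% t(plane)))) }
        lr.reduce <- function(k, Z)
                keyval(k, t(as.matrix(apply(Z, 2, sum))))
        for (i in 1:iterations) {
                gradient <- values(from.dfs(
                        mapreduce(input = input,
                                  input.format = make.input.format("csv", sep = ","),
                                  map = lr.map, reduce = lr.reduce, combine = TRUE)))
                plane <- plane + alpha * gradient }
        plane }

# Usage with the poster's path (host and port elided as in the question):
# out <- logistic.regression("/somnath/merged_train/part-m-00000", 5, 5, 0.05)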

