sandy
Oct 24, 2011, 10:28:57 AM
to Bangalore R Users - BRU
I was trying to apply the lm method to my data. The data and the
related MapReduce script are given below. I am seeing the following
error when I run the script.
Error Message:-
-------------------------
Error in `[[<-.data.frame`(`*tmp*`, i, value = c(177L, 272L, 39L, 177L, :
  replacement has 5076 rows, data has 141
R ERROR END
===========
at org.godhuli.rhipe.RHMRHelper$MRErrorThread.run(RHMRHelper.java:391)
at org.godhuli.rhipe.RHMRHelper.checkOuterrThreadsThrowable(RHMRHelper.java:236)
at org.godhuli.rhipe.RHMRReducer.run(RHMRReducer.java:68)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:566)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
at org.apache.hadoop.mapred.Child.main(Child.java:170)
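One detail in the message may be worth noting: 5076 = 141 x 36. Assuming the predictor matrix `z` in the script below ends up with 141 rows and 36 columns of strings (an assumption, based on the script building it from split text lines), converting it to a factor would drop its dimensions and give a vector of 5076 elements. The sketch below only illustrates that flattening in plain R with made-up values; the names `z_chr`, `z_fac` and `z_num` are hypothetical and nothing here is taken from the job itself.

```r
set.seed(1)

# Hypothetical stand-in for the reducer's predictor matrix:
# 141 rows and 36 columns, all stored as strings.
z_chr <- matrix(as.character(round(rnorm(141 * 36), 2)), nrow = 141, ncol = 36)

# Converting a character matrix to a factor drops the dim attribute,
# so the result is one long vector rather than a 141-row object.
z_fac <- as.factor(z_chr)
length(z_fac)        # 141 * 36 elements
is.null(dim(z_fac))  # the matrix shape is gone

# Converting to numeric and restoring the shape keeps one row per observation.
z_num <- matrix(as.numeric(z_chr), nrow = nrow(z_chr))
dim(z_num)
```

If this is what happens inside `lm`, converting the split fields to numeric before fitting would be the first thing to try.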
R script:-
-------------
#! /usr/bin/env Rscript
library(Rhipe)
rhinit(TRUE, TRUE)
map <- expression({
  # For each input record, parse out required fields and output new record:
  extractDeptDelays = function(line) {
    fields <- unlist(strsplit(line, "\\,"))
    valueline <- paste(line, ",", "\n")
    rhcollect(fields[1], valueline)
  }
  # Process each record in map input:
  lapply(map.values, extractDeptDelays)
})
reduce <- expression(
  pre = {
    delays <- numeric(0)
  },
  reduce = {
    line <- c(line, reduce.values)
  },
  post = {
    line <- unlist(strsplit(paste(reduce.values), "\n"))
    countline <- length(line)
    word <- unlist(strsplit(line[1], ","))
    col <- length(word)
    clustermat <- matrix(0, nrow = countline, ncol = col - 1)
    colval <- NULL
    for (i in 1:length(line)) {
      words <- NULL
      words <- unlist(strsplit(line[i], ","))
      colval <- NULL
      wordlen <- length(words) - 1
      for (j in 1:wordlen) {
        colval <- cbind(colval, words[j])
      }
      clustermat[i, ] = colval
    }
    clustmatdf = data.frame()
    clustmatdf = clustermat
    z <- NULL
    for (k in 3:length(clustmatdf[1, ])) {
      z <- cbind(z, clustmatdf[, k])
    }
    x <- c(clustmatdf[, 2])
    reg_cluster <- lm(x ~ z)
    rhcollect(reduce.key,
              paste(length(words), length(line), length(z[1, ]), length(z[, 1]),
                    length(clustmatdf), length(clustmatdf), clustmatdf[1, 2], x[1]))
  }
)
inputPath <- "/regressioninput"
outputPath <- "/regressionout"
# Create job object:
z <- rhmr(map=map, reduce=reduce,
ifolder=inputPath, ofolder=outputPath,
inout=c('text', 'text'), jobname='Regression',
mapred=list(mapred.reduce.tasks=2))
# Run it:
rhex(z)
# Get the results from HDFS and use to create a dataframe:
results <- rhread(paste(outputPath, "/part-*", sep = ""), type = "text")
write(results, file="regout.dat")
Sample Input Data:-
----------------------------
Cluster yld/ftg std ln footage Factor 1 Factor 2 Factor 3 Factor 4 Factor 5 Factor 6 Factor 7 Factor 8 Factor 9 Factor 10 Factor 11 Factor 12 Factor 13 Factor 14 Factor 15 Factor 16 Factor 17 Factor 18 Factor 19 Factor 20 Factor 21 Factor 22 Factor 23 Factor 24 Factor 25 Factor 26 Factor 27 Factor 28 Factor 29 Factor 30 Factor 31 Factor 32 Factor 33 Factor 34 Factor 35
2 -0.51 0.52 1.81 -0.93 -0.49 0.19 0.32 1.1 0.46 -0.44 -0.26 -0.94 0.82 0.46 0.35 -0.13 -0.79 -0.76 0.45 0.08 -0.33 -0.71 0.97 -0.15 0.73 -0.15 0.43 -0.5 -0.06 0.06 0.17 -0.58 0.38 -0.03 0.26 -0.53 -0.11
3 -0.53 0.94 -0.73 5.22 -0.39 -1.25 1.66 -0.53 0.99 -2.18 -0.33 0.6 1.15 -0.31 1.11 1.06 -0.89 -1 0.22 1 -0.32 -0.33 0.19 -0.47 -0.12 -0.18 -0.04 0.75 -0.33 -0.11 0.19 -0.47 -0.1 -0.22 0.27 0.35 0.05
2 0.44 0.52 2.2 4.67 -3.18 -3.33 1.62 0.11 1.62 -2.47 0.02 1.45 1.77 -1.22 1.38 0.97 -0.05 -1.51 1.07 0.57 -0.66 1.15 0.46 0.09 -0.37 -0.56 -0.46 0.02 -1.02 -0.44 0.52 -0.23 -0.3 0.31 0.17 0.19 0.92
3 -1.34 -0.43 -0.32 6.32 0.18 -2.14 1.51 -1.07 -0.36 -2.24 1.6 0.45 1.2 0.36 0.56 0.46 -0.04 -0.78 0.86 0.47 -0.64 -1.21 1.02 -0.4 0.42 1.52 0.05 -0.39 -0.66 0.1 -0.37 -0.09 0.11 -0.69 0.55 0.3 -0.06
2 -1.3 -0.43 0.92 2.91 4.32 0.29 1.74 1.28 1.86 -2.78 -0.42 0.58 0.54 0.77 1.3 0.56 0.33 -1.42 0.57 0.86 -0.4 -0.66 0.78 -1.15 0.76 1.41 -0.05 -0.68 -0.04 -0.4 0.22 -0.02 0.21 -0.29 0.7 0.1 0.04
3 -0.28 0.11 -1.26 6.55 -0.69 -3.84 1.63 -1.74 -0.99 -1 3.02 -1.09 0.46 0.23 0.38 0.21 -0.54 -0.23 0.21 -0.1 -1.06 -0.74 0.84 -0.32 0.24 0.9 0.09 -0.37 -0.37 -0.19 -0.24 -0.12 0.3 -0.38 0.47 0.81 0.01
1 1.01 -1.58 -4.66 -3.42 4.69 6.58 -1.94 -1.18 0 -0.33 0.18 1.86 0.44 0.14 0.23 0.05 0.07 0.18 0.38 0.33 -0.58 -0.02 0.4 -0.61 0.06 0.38 -0.57 0.45 0.42 0.02 0.28 0 -0.2 -0.38 0.3 0.21 0.22
3 0.4 -0.43 -2.15 1.21 6.92 8.02 0.19 0.38 -0.51 -0.17 0.87 2.45 1.13 0.49 -0.09 -0.42 -0.23 -0.13 0.64 0.37 -0.36 -0.44 0.33 -0.62 0.27 0.31 -0.09 0.05 0.66 -0.12 -0.01 -0.17 -0.69 -0.3 -0.27 0.31 0.39
3 -1.28 0.52 -5.43 3.19 -1.52 1.77 -1.43 -1.64 -0.54 1.17 -0.29 0.91 0.99 0.21 0.42 -0.01 0.64 -0.41 0.55 0 0.63 -0.12 0.24 -0.22 0.11 0.6 -0.18 0.26 -0.12 0.1 0.1 -0.13 -0.19 -0.08 0.17 0.1 0.32
In the mapper script I use the cluster id as the key and send each
record to the reducer. In the reducer I receive the records cluster-wise
and apply the lm function to the yld/ftg column against the rest of the
dataset. I am seeing the error that I have pasted above; any help will
be greatly appreciated.
Thanks in advance,
Sandeep
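
The per-cluster regression described above can be tried outside Hadoop first. The following is a minimal plain-R sketch, not the Rhipe job: it assumes comma-separated lines with the cluster id in the first field and the response in the second, uses made-up data with only three predictors, and converts the split strings to numeric before calling `lm`. The helper name `fit_cluster` and all toy values are hypothetical.

```r
set.seed(1)

# Toy comma-separated records: cluster id, response, three predictors (made-up values).
n <- 60
toy <- data.frame(cluster = sample(1:3, n, replace = TRUE),
                  y  = round(rnorm(n), 2),
                  f1 = round(rnorm(n), 2),
                  f2 = round(rnorm(n), 2),
                  f3 = round(rnorm(n), 2))
lines <- do.call(paste, c(toy, sep = ","))

# "Map" step: key each line by its first field (the cluster id).
keys <- vapply(strsplit(lines, ","), `[`, character(1), 1)
by_cluster <- split(lines, keys)

# "Reduce" step: parse one cluster's lines into a numeric matrix and fit lm.
fit_cluster <- function(cluster_lines) {
  fields <- do.call(rbind, strsplit(cluster_lines, ","))   # character matrix
  mat <- matrix(as.numeric(fields), nrow = nrow(fields))   # strings to numbers
  x <- mat[, 2]                       # response (the yld/ftg column in the post)
  z <- mat[, -(1:2), drop = FALSE]    # remaining columns as predictors
  lm(x ~ z)
}

fits <- lapply(by_cluster, fit_cluster)
lapply(fits, coef)
```

Keeping the parsing and the fit in one small function makes it easier to test on a handful of lines before moving the same body into the reducer's post block.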