I am trying to run a fairly large rmr2 job whose input data is partitioned as follows:
/data/folder/partition1
/data/folder/partition2
/data/folder/partitionN
When I run the following job, I get a "Not a file" error:
library(rmr2)

large.job.mapper <- function(key, values)
{
  output.key = key
  output.value = data.frame(OrderCount = 1)
  keyval(output.key, output.value)
}

large.job.mr <- function(inputPath, outputPath = NULL)
{
  mapreduce(input = inputPath,
            output = outputPath,
            map = large.job.mapper,
            verbose = TRUE)
}

result = large.job.mr('/data/folder/')
OUTPUT
14/03/18 15:52:46 INFO mapred.JobClient: Cleaning up the staging area hdfs://pxpmhwtmn001.gid.gap.com:8020/user/bdload/.staging/job_201403071300_22423
14/03/18 15:52:46 ERROR security.UserGroupInformation: PriviledgedActionException as:bdload cause:java.io.IOException: Not a file: Not a file: hdfs://pxpmhwtmn001.gid.gap.com:8020/data/folder/partiton1
14/03/18 15:52:46 ERROR streaming.StreamJob: Error Launching job : Not a file: hdfs://pxpmhwtmn001.gid.gap.com:8020/data/folder/partiton1
Streaming Command Failed!
Error in mr(map = map, reduce = reduce, combine = combine, vectorized.reduce, :
hadoop streaming failed with error code 5
Is there any way to set an input filter in rmr2?
Thank you--
-me