I have set up Sqoop to fetch data from a remote database, and it works fine: I can see the data split into blocks in my HDFS. Now I want to explore this data with RHadoop. I have my algorithm, and when I try it on a CSV file it works, so my problem is how to make the MapReduce job run on the HDFS blocks:
When I put a CSV on HDFS, the MapReduce job takes far too long, about 4 minutes for a million rows. The problem is that this CSV is only an example; the real data will be many millions of rows.
The blocks are in /Data, which is an HDFS directory:

hdfs.data.root <- "/Data"
hdfs.data <- file.path(hdfs.data.root)

# MapReduce job keyed on column 6, counting occurrences per key
job <- mapreduce(input = hdfs.data,
                 map = function(k, v) { keyval(v[6], 1) },
                 reduce = function(k, v) { keyval(k, length(v)) })
In the end it fails.
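One suspicion, sketched below under assumptions: rmr2's mapreduce defaults to its "native" input format, while Sqoop imports tables as plain delimited text, so the job may need an explicit text input format. This is a minimal sketch, assuming the files under /Data are headerless, comma-separated text; the csv.format name is mine, not from my actual setup.

library(rmr2)

# Assumption: Sqoop wrote /Data as headerless, comma-delimited text files,
# so tell rmr2 to parse them as CSV instead of its default "native" format.
csv.format <- make.input.format("csv", sep = ",")

job <- mapreduce(input = hdfs.data,
                 input.format = csv.format,
                 map = function(k, v) keyval(v[[6]], 1),        # key on column 6
                 reduce = function(k, v) keyval(k, length(v)))  # count per key

result <- from.dfs(job)  # pull the (key, count) pairs back into R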