Hi everyone,
I am trying to read data from HDFS into R. After importing the data into R, I see only 7,40,726 rows, but the file actually contains around 11,55,000 rows. The file is only 105 MB in size.
Below is the code I have written to read the data.
Sys.setenv("HADOOP_PREFIX"="/opt/hadoop")
Sys.setenv("HADOOP_CMD"="/opt/hadoop/bin/hadoop")
Sys.setenv("HADOOP_STREAMING"="/opt/hadoop/contrib/streaming/hadoop-streaming-1.2.1.jar")
library(rmr2)
library(rhdfs)
hdfs.init()
library(rJava)
# Read the file from HDFS into a raw vector, then parse it as CSV.
hdfs.defaults()
f = hdfs.file("/tmp/projectdata/churndata/Apr_data1.csv", "r", buffersize = 104857600)
m = hdfs.read(f)
c = rawToChar(m)
data = read.table(textConnection(c), sep = ",", fill = TRUE)
Kindly let me know what I should do to read the complete data (11,55,000 rows).
I look forward to your response.
Thank you,
Chandan