New to RHadoop and big data

Ajay Garg

May 1, 2015, 4:14:27 PM5/1/15
to rha...@googlegroups.com
I am new to RHadoop and want to run a simple function over a large file. The file has the following structure:

YMMKey Trim InternetPrice Mileage Certified SingleOwner
2015ChevroletSuburban LT 0 10 0 0
2014ChevroletEquinox LT 23620 10 0 0
2014ChevroletSilverado 1500 LTZ 41695 10 0 0
2014ChevroletMalibu LT 0 10 0 0
2014ChevroletVolt 36605 10 0 0
2015ChevroletTahoe LT 56480 10 0 0
2015ChevroletSilverado 3500HD LTZ 59145 10 0 0
2014ChevroletSilverado 1500 LTZ 44365 10 0 0

How can I import this file into HDFS and read it with an R script when the file has a large number of rows (2 million)? Could someone also show me how to run a simple map function and a reduce function on the data set once it is imported?
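For the import step, I assume something along these lines is the idea (a sketch only; the local path below is made up, and rhdfs is assumed to be installed with HADOOP_CMD set):

library(rhdfs)
hdfs.init()

# copy the file from local disk onto HDFS
hdfs.put("/tmp/vipInputs.csv", "/example/data/vipInputs.csv")
hdfs.ls("/example/data")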
When I use a simple R function to load the file

library(rhdfs)   # rhdfs installed, HADOOP_CMD set
hdfs.init()
f <- hdfs.file("/example/data/vipInputs.csv", "r", buffersize = 104857600)
m <- hdfs.read(f)          # returns one raw chunk, not necessarily the whole file
c <- rawToChar(m)
reader <- hdfs.line.reader("/example/data/vipInputs.csv")
x <- reader$read()         # reads the next batch of lines

it fails because it runs out of buffer space after 200,000 lines.
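What I would ultimately like is something along these lines (a sketch only: it assumes rmr2 is available, the file is tab separated, the header line has been stripped out, and average price per trim is just a stand-in job):

library(rmr2)

# column names taken from the sample above
cols <- c("YMMKey", "Trim", "InternetPrice", "Mileage", "Certified", "SingleOwner")
fmt  <- make.input.format("csv", sep = "\t", col.names = cols, stringsAsFactors = FALSE)

out <- mapreduce(
  input        = "/example/data/vipInputs.csv",
  input.format = fmt,
  # map: emit (trim, price) pairs; v arrives as a data frame of rows
  map    = function(k, v) keyval(v$Trim, v$InternetPrice),
  # reduce: average the prices seen for each trim
  reduce = function(trim, prices) keyval(trim, mean(as.numeric(prices)))
)

from.dfs(out)   # pull the (small) result back into local memory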

Thanks in anticipation


Ajay

Antonio Piccolboni

May 22, 2015, 1:06:35 PM5/22/15
to rha...@googlegroups.com, ajayga...@gmail.com
I think you are misunderstanding the issues related to big data and HDFS. You are asking how to import this file into HDFS, yet your script then reads from HDFS as if the data were already there. In most settings there are four places for the data: local memory, distributed memory, local disk, and HDFS. It's four, not a hundred. So you need clarity on

1) Where your data is
2) Where it needs to go
3) Whether it's going to fit where you want it to go

Without these three elements we are in the dark.
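Once you know that, the usual moves between those four places look roughly like this (a sketch; the paths and the mtcars object are only placeholders):

library(rhdfs); library(rmr2)
hdfs.init()

# local disk  -> hdfs: the file stays a file
hdfs.put("/tmp/vipInputs.csv", "/example/data/vipInputs.csv")

# local memory -> hdfs: an R object serialized to the dfs
small <- to.dfs(mtcars)

# hdfs -> local memory: only safe when the result fits in RAM
back <- from.dfs(small)

# hdfs -> hdfs through distributed memory is what mapreduce() does,
# and it is the only path that scales with the data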


Antonio