R Hadoop


Rajendra Nigam

Jun 11, 2015, 5:15:20 PM
to rha...@googlegroups.com
How do I load a tab-delimited file from HDFS into RHadoop? The input file has spaces in some records.

 make.input.format("csv", sep = "\t", col.names = names(col.classes), colClasses = col.classes)

Can someone please help with the code? Thanks in advance.

Antonio Piccolboni

Jun 11, 2015, 5:31:30 PM
to rha...@googlegroups.com, rajendr...@gmail.com
make.input.format accepts the same options as read.table, with a couple of exclusions. The option you may need to add here is quote, but judging from your other message the default should work. I would suggest first reading a small to medium subsample of your data locally with read.table. Once that works, you can most likely use the same options to create the correct input format for rmr.
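As a minimal sketch of that workflow: the column names, the sample rows, and the HDFS path below are made up for illustration, so substitute your own. The idea is to keep the read.table options in one list, check them on a local sample, and then hand the same list to make.input.format. The rmr step only runs if rmr2 is installed, and the from.dfs call is left commented out because it needs a working Hadoop setup.

```r
# Hypothetical column layout; replace with your real names and classes.
col.classes <- c(id = "integer", name = "character", city = "character")

# Write a tiny tab-delimited sample whose fields contain spaces.
sample.file <- tempfile(fileext = ".tsv")
writeLines(c("1\tJohn Smith\tNew York",
             "2\tMary Ann Lee\tSan Francisco"),
           sample.file)

# Options shared by the local test and the rmr input format.
read.opts <- list(sep = "\t",
                  col.names = names(col.classes),
                  colClasses = col.classes)

# Step 1: verify the options locally with read.table.
local.df <- do.call(read.table, c(list(file = sample.file), read.opts))
print(local.df)

# Step 2: reuse the same options for rmr (skipped if rmr2 is not installed).
if (requireNamespace("rmr2", quietly = TRUE)) {
  tsv.format <- do.call(rmr2::make.input.format, c(list("csv"), read.opts))
  # Hypothetical HDFS path; uncomment on a machine with Hadoop configured.
  # data <- rmr2::from.dfs("/path/on/hdfs/input.tsv", format = tsv.format)
}
```

With sep = "\t", spaces inside a field are kept as part of the value, so they should not split records. If the local read still fails, the quote option is the next thing to check.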

In general, please provide additional information by replying to your original message instead of starting a separate thread, and please make an effort to write meaningful subject lines. This is our knowledge base, and every message is in public view for the foreseeable future. We'd better keep it tidy and be proud of each message. Also, even moderators have very limited editing capabilities; for instance, I can't merge two threads without recreating them. Thanks.


Antonio