Setting the newline sequence for files when using IMPORT DATA FROM


jingchao

Jun 3, 2010, 3:48:57 AM
to CloudBase
I want to parse some log files generated on Windows and use IMPORT
DATA FROM to import them.
Here is the problem: how can I make CloudBase use '\r\n' as the
line terminator when running IMPORT DATA FROM?

Tarandeep Singh

Jun 3, 2010, 6:33:53 AM
to cloudba...@googlegroups.com
I *think* it should work.
CloudBase uses Hadoop's TextInputFormat to read text files and I am sure TextInputFormat can handle \r\n.

Did you try running a simple query on your data set?




荆超

Jun 3, 2010, 9:40:07 AM
to cloudba...@googlegroups.com
Thanks for your reply; I quite agree with you.
I tried to find where the line terminator is defined in the Hadoop source files, but found nothing. I searched for the keywords 'newline', 'line.separator', and 'line'.

Tarandeep Singh

Jun 4, 2010, 4:48:33 PM
to cloudba...@googlegroups.com
TextInputFormat uses LineRecordReader, which internally uses LineReader:
$HADOOP_HOME/src/core/org/apache/hadoop/util/LineReader.java

This takes care of '\r\n' as well. Take a look at the source code.

-Tarandeep
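
For readers who do not have the Hadoop source at hand, the idea described above can be sketched roughly as follows. This is not the code from LineReader.java; it is a simplified, standalone illustration that assumes a reader treating '\n', '\r', and '\r\n' each as a single line terminator. The class name NewlineSketch and the method readLines are hypothetical names chosen for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not Hadoop's LineReader.
class NewlineSketch {

    /** Splits a stream into lines, treating LF, CR, and CR+LF each as one terminator. */
    static List<String> readLines(InputStream in) throws IOException {
        // One byte of pushback lets us peek at the byte that follows a CR.
        PushbackInputStream src = new PushbackInputStream(in, 1);
        List<String> lines = new ArrayList<>();
        ByteArrayOutputStream current = new ByteArrayOutputStream();
        int b;
        while ((b = src.read()) != -1) {
            if (b == '\n' || b == '\r') {
                if (b == '\r') {
                    // Swallow the LF of a CR+LF pair; push back anything else.
                    int next = src.read();
                    if (next != '\n' && next != -1) {
                        src.unread(next);
                    }
                }
                lines.add(new String(current.toByteArray(), StandardCharsets.UTF_8));
                current.reset();
            } else {
                current.write(b);
            }
        }
        // Emit a final line that has no trailing terminator.
        if (current.size() > 0) {
            lines.add(new String(current.toByteArray(), StandardCharsets.UTF_8));
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Mixed Windows (CR+LF), Unix (LF), and bare CR terminators in one input.
        byte[] data = "a,1\r\nb,2\nc,3\rd,4".getBytes(StandardCharsets.UTF_8);
        for (String line : readLines(new ByteArrayInputStream(data))) {
            System.out.println("[" + line + "]");
        }
    }
}
```

With a reader of this kind, a log file written on Windows yields the same records as its Unix counterpart, and no stray '\r' is left at the end of each line.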