A Lot of Small Files

dmnk

Jul 27, 2011, 5:14:14 AM
to cloudba...@googlegroups.com
It's me again.

I'm trying to use CloudBase to analyse big data, but my problem is that the data is spread across a lot of small files (<200 KB each). Is there a good solution for this?

Can I use Hadoop Archives (HAR)? Do I have to concatenate the files by hand? Is there a ready-made solution?

// Sorry for my English

Tarandeep

Jul 27, 2011, 5:23:29 AM
to CloudBase
CloudBase supports only text files, so you can't use HAR files.
You can either concatenate them outside of Hadoop and then load them into
Hadoop/CloudBase, or do it in Hadoop (preferred).

To concatenate the files using Hadoop, you can write a MapReduce job or use
CloudBase:

1) Create a table based on these small files.
2) Use a SELECT INTO statement to create a new table, for example:
SELECT * INTO table2 FROM table1 WHERE c1 < 10
(omit the WHERE clause if you want to copy every row)

Queries on table2 will run faster than queries on table1.
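
For the first option (concatenating outside of Hadoop before loading), a minimal sketch in Python might look like the following. The function name `concat_small_files`, the `part-NNNNN.txt` naming, and the 64 MB default target size are assumptions made for illustration; none of them come from CloudBase or Hadoop. It assumes the inputs are plain text files with one record per line.

```python
import os


def concat_small_files(src_dir, dst_dir, target_bytes=64 * 1024 * 1024):
    """Merge text files from src_dir into part files of roughly target_bytes each.

    Hypothetical helper for illustration only; returns the list of part files written.
    """
    os.makedirs(dst_dir, exist_ok=True)
    outputs = []
    out = None
    written = 0
    for name in sorted(os.listdir(src_dir)):
        path = os.path.join(src_dir, name)
        if not os.path.isfile(path):
            continue
        with open(path, "rb") as f:
            data = f.read()
        if not data:
            continue
        # Make sure each input ends with a newline so that the last record
        # of one file is not glued to the first record of the next.
        if not data.endswith(b"\n"):
            data += b"\n"
        # Start a new part file when the current one would exceed the target.
        if out is None or (written > 0 and written + len(data) > target_bytes):
            if out is not None:
                out.close()
            part = os.path.join(dst_dir, "part-%05d.txt" % len(outputs))
            out = open(part, "wb")
            outputs.append(part)
            written = 0
        out.write(data)
        written += len(data)
    if out is not None:
        out.close()
    return outputs
```

The merged part files can then be copied into HDFS (for example with `hadoop fs -put`) and used as the data for a table. Whole files are never split, so a single part may end up slightly smaller than the target size.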

Dominik Wiernicki

Jul 27, 2011, 5:28:39 AM
to cloudba...@googlegroups.com
Thanks for the answer. I think I will try File Crusher:
http://www.jointhegrid.com/hadoop_filecrush/index.jsp