A lot of Small Files

dmnk

Jul 27, 2011, 5:14:14 AM

to cloudba...@googlegroups.com

It's me again.

I'm trying to use CloudBase to analyse big data, but my data is spread across a lot of small files (<200 KB each). Is there a good solution for this?

Can I use Hadoop Archive (HAR)? Do I have to concatenate the files by hand? Is there any ready-made solution?

// Sorry for my English

Tarandeep

Jul 27, 2011, 5:23:29 AM

to CloudBase

CloudBase supports only text files, so you can't use a HAR file.
You can either concatenate the files outside of Hadoop and then load the
result into Hadoop/CloudBase, or do it in Hadoop (preferred).

To concatenate the files using Hadoop, you can write a MapReduce job or
use CloudBase:

1) Create a table based on these small files
2) Use a SELECT INTO statement to create a new table:
SELECT * INTO table2 FROM table1 WHERE c1 < 10

Queries on table2 will run faster than queries on table1.
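
For the first option (concatenating outside of Hadoop), here is a rough Python sketch, not something that ships with CloudBase. The function name concat_text_files, the "*.txt" pattern and the directory layout are only assumptions for illustration. It assumes the small files are line-oriented text, so a newline is added after any file that does not end with one; otherwise the last record of one file would run into the first record of the next.

```python
from pathlib import Path


def concat_text_files(src_dir, dest_file, pattern="*.txt"):
    """Merge the non-empty files matching `pattern` in src_dir into dest_file.

    Hypothetical helper: files are appended in sorted name order and the
    number of files actually written is returned.
    """
    src = Path(src_dir)
    dest = Path(dest_file)
    merged = 0
    with dest.open("wb") as out:
        for path in sorted(src.glob(pattern)):
            # Skip the output file itself if it lives in src_dir.
            if path.resolve() == dest.resolve():
                continue
            # Files are small (<200 KB), so reading each one whole is fine.
            data = path.read_bytes()
            if not data:
                continue
            out.write(data)
            # Keep records from adjacent files on separate lines.
            if not data.endswith(b"\n"):
                out.write(b"\n")
            merged += 1
    return merged
```

The merged file could then be copied into HDFS (for example with hadoop fs -put) and used as the data for a table in the usual way.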

Dominik Wiernicki

Jul 27, 2011, 5:28:39 AM

to cloudba...@googlegroups.com

Thanks for the answer. I think I will try File Crusher:
http://www.jointhegrid.com/hadoop_filecrush/index.jsp
