Hadoop China User Group (CHUG)
Hadoop's transparent recognition of compressed files
air
Aug 9, 2011, 2:31:47 AM
to Hadoop Chinese User Group
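Recently beidou and I have been discussing how Hadoop transparently recognizes compressed input. "Transparent" here means transparent to the execution of our MapReduce jobs: Hadoop decompresses the compressed files for us automatically, and we do not have to deal with it ourselves.
If a compressed file carries the extension of its compression format (for example .lzo, .gz, or .bz2), Hadoop picks the decoder by that extension and decompresses the file. If the compressed file has no extension, the input format has to be specified when the MapReduce job is run, as in the following command, where the input lzotest has no extension: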
hadoop jar /usr/home/hadoop/hadoop-0.20.2/contrib/streaming/hadoop-streaming-0.20.2-CDH3B4.jar \
    -file /usr/home/hadoop/hello/mapper.py -mapper /usr/home/hadoop/hello/mapper.py \
    -file /usr/home/hadoop/hello/reducer.py -reducer /usr/home/hadoop/hello/reducer.py \
    -input lzotest -output result4 \
    -jobconf mapred.reduce.tasks=1 \
    -inputformat org.apache.hadoop.mapred.LzoTextInputFormat
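The extension-based selection described above can be modelled with a small lookup. This is only a simplified sketch of the idea, not Hadoop's own code: in Hadoop the lookup is performed by CompressionCodecFactory over the codecs registered in the cluster configuration. The function name pick_codec and the sample paths are hypothetical, and the two LZO entries assume the separately distributed hadoop-lzo library is installed.

```python
# Simplified model of choosing a decompression codec by file extension.
# Keys are the default extensions of the codecs; values are codec class names.
# The LZO entries are an assumption: they only apply when hadoop-lzo is installed.
CODEC_BY_EXTENSION = {
    ".gz": "org.apache.hadoop.io.compress.GzipCodec",
    ".bz2": "org.apache.hadoop.io.compress.BZip2Codec",
    ".deflate": "org.apache.hadoop.io.compress.DefaultCodec",
    ".lzo": "com.hadoop.compression.lzo.LzopCodec",
    ".lzo_deflate": "com.hadoop.compression.lzo.LzoCodec",
}


def pick_codec(path):
    """Return the codec class name matching the path's suffix, or None."""
    for extension, codec in CODEC_BY_EXTENSION.items():
        if path.endswith(extension):
            return codec
    # No known extension: the input is treated as uncompressed unless the
    # job names an input format explicitly, as the command above does.
    return None


if __name__ == "__main__":
    for sample in ("ncdc/1901.gz", "logs/access.lzo", "lzotest"):
        print(sample, "->", pick_codec(sample))
```

A path such as lzotest matches no extension, which is why the job above has to name its input format on the command line.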
--
Knowledge Management .
李玉林
Aug 9, 2011, 2:34:58 AM
to hado...@googlegroups.com
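Yes, this really is nice. Everything I downloaded from NCDC was in gz format. When I first ran small-scale tests I decompressed the files one by one myself before loading them. Later I tried feeding them in directly, and sure enough it worked. It immediately felt powerful.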
--
李玉林
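What "transparent" means for a streaming mapper can be illustrated locally. The sketch below is an assumption-laden stand-in, not the mapper.py from this thread: map_lines is a hypothetical word-count style mapper, and simulate_framework_decompression plays the role of the framework, which decompresses a .gz input before the mapper ever sees it. The point is that the mapper itself contains no decompression logic.

```python
import gzip
import io


def map_lines(lines):
    """Hypothetical mapper logic: emit 'word<TAB>1' for every word."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def simulate_framework_decompression(compressed_bytes):
    """Stand-in for the framework: decompress gzip data into text lines."""
    with gzip.open(io.BytesIO(compressed_bytes), mode="rt", encoding="utf-8") as handle:
        return handle.read().splitlines()


# Hypothetical sample data, compressed the way a .gz input file would be.
sample = gzip.compress("hello hadoop\nhello gz\n".encode("utf-8"))

# The mapper only ever receives plain text lines.
records = list(map_lines(simulate_framework_decompression(sample)))
for record in records:
    print(record)
```

In a real streaming job the mapper would read those lines from standard input instead of from the simulated helper.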
北斗七
Aug 9, 2011, 3:41:03 AM
to hado...@googlegroups.com
Great. How does everyone collect their logs?
air
Aug 24, 2011, 1:26:40 AM
to hado...@googlegroups.com
Actually, if you use Cloudera's Hadoop distribution, Flume is the best choice for log collection; the analyses online also speak very highly of Flume.
北斗七
Aug 24, 2011, 1:41:21 AM
to hado...@googlegroups.com
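Do you have a link? Please send it over. My impression is that it is more robust than Scribe and somewhat more convenient in terms of high availability and management.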
air
Aug 24, 2011, 1:57:57 AM
to hado...@googlegroups.com
http://dongxicheng.org/search-engine/log-systems/
tiangang Zhu
Aug 27, 2011, 3:51:40 AM
to hado...@googlegroups.com
After building an inverted index with MapReduce, how can Lucene be used to search that index?
air
Aug 27, 2011, 4:12:04 AM
to hado...@googlegroups.com
I'm not familiar with Lucene, but isn't Lucene itself a tool for building indexes...
2011/8/27 tiangang Zhu <tiang...@gmail.com>
> After building an inverted index with MapReduce, how can Lucene be used to search that index?