How do I read / download webpage content somewhere locally on HDD (text only, not Ima


David Gkogkritsiani

Apr 24, 2013, 9:35:30 AM
to chenn...@googlegroups.com
Hi all,

I am working on my diploma thesis on Hadoop MapReduce, and I have been asked to write an application in MapReduce.
I found this code on the internet and ran it:
How can I extend that code so that it stores all the text of the web pages somewhere locally on the HDD (text only, not images), so that it can be processed afterwards?
That is, I need MapReduce code that downloads web pages from the web and stores them on the local file system rather than on HDFS.
Afterwards, I would run the quest-search (program) on the stored pages, so that it does not depend on network speed,
because my network is very slow.
I am doing this to improve performance.
I am running Hadoop version 0.20.2.
I am new to Hadoop and somewhat lost, so any help would be greatly appreciated.

Sorry for my bad English.
Thanks in advance for any assistance !
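
To make the download step described above more concrete, here is a minimal sketch in plain Java of what "fetch a page, keep only the text, write it to the local file system" could look like. It is not the code referred to in the post and it uses no Hadoop API. The class name PageTextSaver and all of its methods are hypothetical, the page is assumed to be UTF-8, and the tag stripping is a crude regex approach rather than a real HTML parser.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for illustration; not part of Hadoop.
class PageTextSaver {

    // Download a page and decode it as UTF-8 (an assumption; real pages may differ).
    static String fetch(String url) throws IOException {
        URLConnection conn = new URL(url).openConnection();
        conn.setConnectTimeout(10000);
        conn.setReadTimeout(30000);
        try (InputStream in = conn.getInputStream()) {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            return new String(buf.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    // Crude text extraction: drop script/style blocks, then every tag,
    // decode a few common entities, and collapse whitespace.
    static String htmlToText(String html) {
        String s = html.replaceAll("(?is)<(script|style)[^>]*>.*?</\\1>", " ");
        s = s.replaceAll("(?s)<[^>]+>", " ");
        s = s.replace("&nbsp;", " ").replace("&lt;", "<")
             .replace("&gt;", ">").replace("&amp;", "&");
        return s.replaceAll("\\s+", " ").trim();
    }

    // Turn a URL into a file name that is safe on common file systems.
    static String fileNameFor(String url) {
        return url.replaceAll("[^A-Za-z0-9._-]", "_") + ".txt";
    }

    // Write the extracted text under dir on the local file system (not HDFS).
    static Path saveText(Path dir, String url, String text) throws IOException {
        Files.createDirectories(dir);
        Path out = dir.resolve(fileNameFor(url));
        Files.write(out, text.getBytes(StandardCharsets.UTF_8));
        return out;
    }

    // Usage: java PageTextSaver <output-dir> <url> [<url> ...]
    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: PageTextSaver <output-dir> <url>...");
            return;
        }
        Path dir = Paths.get(args[0]);
        for (int i = 1; i < args.length; i++) {
            Path out = saveText(dir, args[i], htmlToText(fetch(args[i])));
            System.out.println(args[i] + " -> " + out);
        }
    }
}
```

If something like this were called from inside a map task, each file would land on the local disk of whichever node ran that task, so on a multi-node cluster the pages would end up scattered across machines. On a single-node setup that is probably not a concern.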