Does anybody ever use URIs as input or output to mapreduce?

Antonio Piccolboni 7/18/12 3:09 PM
Hi,
This question is not limited to rmr, since right now support for URIs in rmr is patchy and was not the result of a deliberate choice. I suspect, though, that the only thing that actually breaks is the equijoin. So, with the goal of fixing the URI issue, my question is whether people have any use case in which they need to specify the protocol, host and port in mapreduce, or whether they always stick with the defaults. In my experience, one does not, and cannot, write a job that reads from one cluster and writes to another, or that reads from two different clusters, for instance:

mapreduce(input = "hdfs://fullserver:14142/big/juicy/data", output = "hdfs://emptyserver:12358/empty/space")

Are there any use cases that would make URIs useful with mapreduce? I understand their use with hadoop fs -put or -get, but I am interested in MapReduce specifically. Thanks,

Antonio