cmdenv Unrecognized option: -files

Bernardo Hermont

Mar 2, 2015, 4:00:58 PM
to rha...@googlegroups.com
Hi Antonio,

I saw an old post which says that there might be an issue with the cmdenv parameter to backend.parameters. Is this the case for version 3.3.0?
I'm testing this for Linux:


mapreduce(
  input = small.ints,
  map = function(k, v) cbind(v, v^2),
  backend.parameters = list(
    hadoop = list(
      archives = myfile,
      D = "mapred.compress.map.output=true",
      D = "mapred.map.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec",
      D = "mapred.output.compress=true",
      D = "mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec",
      cmdenv = "PATH=files.zip/R/bin/")))

15/03/02 15:59:11 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
15/03/02 15:59:11 INFO Configuration.deprecation: mapred.compress.map.output is deprecated. Instead, use mapreduce.map.output.compress
15/03/02 15:59:11 INFO Configuration.deprecation: mapred.map.output.compression.codec is deprecated. Instead, use mapreduce.map.output.compress.codec
15/03/02 15:59:11 INFO Configuration.deprecation: mapred.output.compress is deprecated. Instead, use mapreduce.output.fileoutputformat.compress
15/03/02 15:59:11 INFO Configuration.deprecation: mapred.output.compression.codec is deprecated. Instead, use mapreduce.output.fileoutputformat.compress.codec
15/03/02 15:59:11 ERROR streaming.StreamJob: Unrecognized option: -files
Usage: $HADOOP_PREFIX/bin/hadoop jar hadoop-streaming.jar [options]

I got the error "Unrecognized option: -files". Without the cmdenv option the command works.

Thanks,

Bernardo

Antonio Piccolboni

Mar 2, 2015, 8:04:40 PM
to RHadoop Google Group
I think it's the order of the backend options. It's an unfortunate quirk in streaming that there are two types of options, generic and "streaming", and the generic ones have to go in front of the streaming ones. To add insult to injury, the error message is usually very uninformative, or even misleading. The relevant doc is here


with minor variation over the last few releases. The problem is compounded by the fact that rmr has to set some options of both types for its own purposes, and the user can set some as well. Because rmr2 has no embedded knowledge of which options are generic and which are streaming-specific, it simply concatenates them. That has worked for a while, probably because you are the first person to need to specify a streaming option.

I am not sure there is an easy way to fix this. One way would be to let the user specify the two types of options separately, as in backend.parameters = list(hadoop = list(generic = list(...), streaming = list(...))), which is almost laughable in its complexity. The other would be to embed enough information in rmr to sort the command line appropriately; now that streaming is hardly changing at all, that may be an option.

I would enter a bug report on the rmr2 issue tracker and I will try to get to it. Of course, if you are in a hurry you may want to initiate a pull request. Anything that doesn't change the API and passes the tests would be looked at very favorably. The function in question is rmr.stream. The problem is that the final command line is assembled by forming parts and then concatenating them. The algorithm has to change to represent the command line as a named character vector until it's complete, sort the names according to the generic vs. streaming criterion, and turn it into a single string with paste.option. It's doable but a lot of busy work, and I am not sure supporting unrestricted use of backend parameters is high on our priority list. Maybe that could change if you described your use case. Thanks
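
For anyone picking this up, here is a minimal sketch of the sorting idea described above. It is not the rmr2 code: generic.first and options.to.string are hypothetical helpers, the list of generic option names is taken from the Hadoop generic options (-conf, -D, -fs, -jt, -files, -libjars, -archives) and should be checked against the Hadoop version in use, and the files entry is a made-up placeholder standing in for whatever rmr2 adds on its own.

```r
# Names of Hadoop generic options (assumed list; verify for your Hadoop release).
generic.options = c("conf", "D", "fs", "jt", "files", "libjars", "archives")

# Hypothetical helper, not part of rmr2: given a named character vector of
# options, put the generic ones first and keep the relative order within
# each group.
generic.first = function(opts) {
  is.generic = names(opts) %in% generic.options
  c(opts[is.generic], opts[!is.generic])
}

# Hypothetical helper: collapse the named vector into one command-line string,
# quoting each value for a Bourne-style shell.
options.to.string = function(opts) {
  paste(paste0("-", names(opts)), shQuote(opts, type = "sh"), collapse = " ")
}

# Options in the problematic order: a streaming option (cmdenv) ends up
# ahead of generic ones (files, archives).
opts = c(
  D = "mapred.compress.map.output=true",
  cmdenv = "PATH=files.zip/R/bin/",
  files = "some-file.R",       # placeholder for a file added by rmr2
  archives = "files.zip")

cmd.line = options.to.string(generic.first(opts))
```

After sorting, cmdenv is the last option on the command line, so streaming never sees a generic option such as -files after it has started parsing streaming options.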



Antonio


