install rhdfs 1.0.8 error: Environment variable HADOOP_CMD must be set before loading package


Emilio Torres

Mar 27, 2014, 3:50:50 PM
to rha...@googlegroups.com
I am trying to install rhdfs on an HDP 2 Sandbox (VirtualBox), but the installation does not see the HADOOP_CMD environment variable, even though it is set in my shell (see below).
Any suggestions?
Thank you!
Emilio
# which hadoop
/usr/bin/hadoop
# ls /usr/bin/hadoop
/usr/bin/hadoop
# export HADOOP_CMD=/usr/bin/hadoop
# echo $HADOOP_CMD
/usr/bin/hadoop

# R
> Sys.getenv("HADOOP_CMD")
[1] "/usr/bin/hadoop"
# wget --no-check-certificate -O rhdfs_1.0.8.tar.gz "https://github.com/RevolutionAnalytics/rhdfs/blob/master/build/rhdfs_1.0.8.tar.gz?raw=true"
# sudo R CMD INSTALL rhdfs_1.0.8.tar.gz
* installing to library '/usr/lib64/R/library'
* installing *source* package 'rhdfs' ...
** R
** inst
** preparing package for lazy loading
** help
*** installing help indices
converting help for package 'rhdfs'
finding HTML links ... done
hdfs-file-access html
hdfs-file-manip html
hdfs.defaults html
hdfs.file-level html
initialization html
rhdfs html
text.files html
** building package indices
** testing if installed package can be loaded
Error : .onLoad failed in loadNamespace() for 'rhdfs', details:
call: fun(libname, pkgname)
error: Environment variable HADOOP_CMD must be set before loading package rhdfs
Error: loading failed
Execution halted
ERROR: loading failed
* removing '/usr/lib64/R/library/rhdfs'

Antonio Piccolboni

Mar 27, 2014, 7:13:51 PM
to rha...@googlegroups.com
Try sudo -E, which preserves your environment when running the command. It is not the default for security reasons, so be careful about what you are exporting.

Antonio
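
A minimal sketch of how the sudo -E suggestion could be applied to the install command from the first post. The helper name install_with_env is hypothetical, and whether sudo honours -E (or a variable passed on its command line) depends on the local sudoers policy, so treat this as an illustration rather than a guaranteed fix.

```shell
# Sketch: install an R source package as root while keeping HADOOP_CMD visible.
# install_with_env is a hypothetical helper name, not part of rhdfs or sudo.
install_with_env() {
    tarball="$1"

    # Fail early if the variable checked by the package's .onLoad is missing.
    if [ -z "${HADOOP_CMD:-}" ]; then
        echo "HADOOP_CMD is not set in the calling shell" >&2
        return 1
    fi

    # Fail early if the tarball is not where we expect it.
    if [ ! -f "$tarball" ]; then
        echo "no such file: $tarball" >&2
        return 2
    fi

    # -E asks sudo to preserve the caller's whole environment,
    # provided the sudoers policy allows it.
    sudo -E R CMD INSTALL "$tarball"

    # Narrower alternative (assumes the sudoers policy permits it):
    #   sudo HADOOP_CMD="$HADOOP_CMD" R CMD INSTALL "$tarball"
}
```

After "export HADOOP_CMD=/usr/bin/hadoop", calling "install_with_env rhdfs_1.0.8.tar.gz" would run the same R CMD INSTALL as in the original post, but with the variable carried through sudo. The two early checks only guard against the variable or the tarball being missing.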

Emilio Torres

Mar 30, 2014, 10:07:17 AM
to rha...@googlegroups.com

Thank you, Antonio.

In the end, I installed it from inside R with the following commands. Best regards. Emilio

$ sudo R

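# Locate the hadoop executable and expose it to rhdfs
hcmd <- system("which hadoop", intern = TRUE)
hcmd
Sys.setenv(HADOOP_CMD = hcmd)
# Locate the streaming jar; there may be several matches, so take the first
hstreaming <- system("find /usr -name hadoop-streaming*jar", intern = TRUE)
hstreaming
Sys.setenv(HADOOP_STREAMING = hstreaming[1])
Sys.getenv("HADOOP_CMD")
Sys.getenv("HADOOP_STREAMING")
# -O saves under the plain tarball name instead of keeping "?raw=true" in the file name
system("wget --no-check-certificate -O rhdfs_1.0.8.tar.gz 'https://github.com/RevolutionAnalytics/rhdfs/blob/master/build/rhdfs_1.0.8.tar.gz?raw=true'")
install.packages("rhdfs_1.0.8.tar.gz", repos = NULL, type = "source")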


Kiran Bhakre

Apr 3, 2014, 8:00:33 AM
to rha...@googlegroups.com
This line will cause problems in Hadoop 2.0:
"hstreaming <- system("find /usr -name hadoop-streaming*jar", intern=TRUE)"

It would be better to give the path to the Hadoop streaming jar in the contrib dir explicitly.

Antonio Piccolboni

Apr 3, 2014, 11:43:32 AM
to RHadoop Google Group
Kiran,
that command will find it if the contrib dir is somewhere under /usr. See man find if you have doubts about it. Not all distributions put streaming under contrib, which is why I think Emilio cast a wide net. The problem is that there are sometimes two matches, or more if you have multiple Hadoop versions installed, and picking the first is just a guess. I don't have a better solution.

Antonio
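
One way to make the "several matches" problem visible rather than silently taking the first hit is sketched below. It does not decide which jar is the right one. It only refuses to guess when the search is ambiguous. The function names find_streaming_jars and pick_streaming_jar are hypothetical, and the pattern is the one from Emilio's command, quoted so the shell passes it to find unexpanded.

```shell
# Sketch: list candidate streaming jars and only pick one when it is unambiguous.
# Both function names are hypothetical helpers for illustration.

# Print every matching jar under the given root, one per line, sorted.
find_streaming_jars() {
    find "$1" -type f -name 'hadoop-streaming*jar' 2>/dev/null | sort
}

# Print the jar path if exactly one matches; otherwise report and fail.
pick_streaming_jar() {
    jars=$(find_streaming_jars "$1")
    # Count non-empty lines; an empty result counts as 0.
    count=$(printf '%s' "$jars" | grep -c .)

    if [ "$count" -eq 1 ]; then
        printf '%s\n' "$jars"
        return 0
    fi

    if [ "$count" -eq 0 ]; then
        echo "no hadoop-streaming jar found under $1" >&2
    else
        echo "found $count candidates under $1, choose one explicitly:" >&2
        printf '%s\n' "$jars" >&2
    fi
    return 1
}
```

A possible use is "HADOOP_STREAMING=$(pick_streaming_jar /usr) && export HADOOP_STREAMING". When several jars are found, the candidates are printed to standard error so that one of them can be exported by hand.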

