I think I have figured that one out, but I got another error while loading data into HDFS.
Below are the errors and warnings shown in RStudio (I have attached the R code as well):
> system("java -version")
java version "1.7.0_75"
OpenJDK Runtime Environment (IcedTea 2.5.4) (7u75-2.5.4-1~trusty1)
OpenJDK 64-Bit Server VM (build 24.75-b04, mixed mode)
> Sys.setenv(HADOOP_CMD="/usr/local/hadoop/bin/hadoop")
> Sys.getenv("HADOOP_CMD") # HADOOP_CMD points to the main hadoop command
[1] "/usr/local/hadoop/bin/hadoop"
> Sys.setenv(HADOOP_STREAMING="/usr/local/hadoop/share/hadoop/common/hadoop-common-2.6.0.jar")
> Sys.getenv("HADOOP_STREAMING") # points to the streaming jar, a file called something like hadoop-streaming*.jar that is part of most hadoop distributions
[1] "/usr/local/hadoop/share/hadoop/common/hadoop-common-2.6.0.jar"
> library(rmr2)
Please review your hadoop settings. See help(hadoop.settings)
Warning message:
S3 methods ‘gorder.default’, ‘gorder.factor’, ‘gorder.data.frame’, ‘gorder.matrix’, ‘gorder.raw’ were declared in NAMESPACE but not found
> library(rJava)
> library(rhdfs)
HADOOP_CMD=/usr/local/hadoop/bin/hadoop
Be sure to run hdfs.init()
> hdfs.init()
15/03/05 17:23:44 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> # function to read multiple json files
> library(jsonlite)
Attaching package: ‘jsonlite’
The following object is masked from ‘package:utils’:
View
> read_json <- function(x) {
+ json <- fromJSON(x, flatten = T)
+ json <- as.data.frame(json)
+ names(json)[names(json)=='Reviews.Ratings.Business.service..e.g...internet.access.'] <- 'Reviews.Ratings.Business.service'
+ return(json)
+ }
> library(plyr)
> reviews <- ldply(list.files(path = "json", full.names = T), read_json)
> reviews_lines <- to.dfs(reviews$Reviews.Content[1:20])
Exception in thread "main" java.lang.ClassNotFoundException: loadtb
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:274)
at org.apache.hadoop.util.RunJar.run(RunJar.java:214)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
>
Any suggestions on how to resolve this error would be appreciated.
Thanks in advance.
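
A possible lead, offered as a guess rather than a confirmed diagnosis: HADOOP_STREAMING in the session above points to hadoop-common-2.6.0.jar, while the comment next to it says it should be a file called something like hadoop-streaming*.jar. If the jar is not the streaming jar, `hadoop jar` may treat the `loadtb` argument as a class name, which would be consistent with the ClassNotFoundException. The sketch below searches the installation for a streaming jar and points HADOOP_STREAMING at it. `find_streaming_jar` is a hypothetical helper written for this illustration, and the /usr/local/hadoop root is an assumption taken from the HADOOP_CMD path above.

```r
# Hypothetical helper: list candidate streaming jars under a Hadoop install.
# The "pattern" argument of list.files() is matched against file names only.
find_streaming_jar <- function(hadoop_home) {
  jars <- list.files(
    path = hadoop_home,
    pattern = "^hadoop-streaming.*\\.jar$",
    recursive = TRUE,
    full.names = TRUE
  )
  # Drop source and test jars, which cannot be used to run jobs.
  jars[!grepl("sources|test", basename(jars))]
}

# Assumption: Hadoop lives under /usr/local/hadoop, as HADOOP_CMD suggests.
jars <- find_streaming_jar("/usr/local/hadoop")

if (length(jars) > 0) {
  # Set this before library(rmr2) so rmr2 picks up the new value.
  Sys.setenv(HADOOP_STREAMING = jars[1])
  message("HADOOP_STREAMING set to ", Sys.getenv("HADOOP_STREAMING"))
} else {
  message("No hadoop-streaming*.jar found; check the installation path.")
}
```

If a jar is found, restarting the R session, setting both environment variables, and then re-running `library(rmr2)` and the `to.dfs()` call would show whether the error changes.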