Error in h2o.importFolder function

Anand Jay

Dec 25, 2015, 2:01:39 PM
to h2os...@googlegroups.com
Hi,
 
I previously posted an almost identical question about an error when using the h2o.importFolder function with the pattern parameter. That problem was resolved on my MacBook once I upgraded h2o to version 3.6.0.8 (I had been using version 3.0.0.25) and supplied the correct regular expression for the pattern.

Strangely, however, I now get a different error when running the same code in a Cloudera QuickStart VM (CentOS release 6.7). All dependent packages were updated to match the versions on the MacBook.
 
The sample code I use to reproduce this error in the VM is as follows:
library(DMwR)
library(rmr2)
library(h2o)
data(algae)
 
# upload data to hdfs
algae.rh <- to.dfs(keyval(NULL,algae))
 
# write the data back to HDFS as CSV; this produces the output files
# hdfs://localhost.localdomain:8020/user/cloudera/test1/part-00000 and
# hdfs://localhost.localdomain:8020/user/cloudera/test1/part-00001
a <- mapreduce(
  input = algae.rh,
  input.format = 'native',
  map = function(k, v) { return(keyval(v$season, v)) },
  output = "hdfs://localhost.localdomain:8020/user/cloudera/test1",
  output.format = make.output.format("csv", sep = ",")
)
 
h2oInstance <- h2o.init(ip = "localhost.localdomain", port = 54321,nthreads = -1)
 
# this throws an error in the Cloudera VM (CentOS) but works on the Mac (after changing the path)
b.h2o <- h2o.importFolder(path = "hdfs://localhost.localdomain:8020/user/cloudera/test1", pattern = 'part-[:digit:]{5}')
 
# this correctly imports the data in part-00000
b.h2o <- h2o.importFolder(path = "hdfs://localhost.localdomain:8020/user/cloudera/test1/part-00000")
 
# but even this throws an error
b.h2o <- h2o.importFolder(path = "hdfs://localhost.localdomain:8020/user/cloudera/test1", pattern = 'part-00000')
 
There is probably a simple solution, since the final line of the error message reads: water.exceptions.H2OParseSetupException: Column separator mismatch. One file seems to use " " and the other uses ",".
Given this, it seems to me that h2o.importFolder is not filtering out the non-matching file names and so tries to import other files in the folder along with the CSV files, which is probably what triggers the error.
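
If that guess is right, one possible workaround is to do the filtering in R and hand H2O explicit file paths, since importing part-00000 by its full path works. The following is only a sketch under assumptions: the `_SUCCESS` entry is an assumed example of an extra file in the folder (I have not listed the actual folder contents here), `import_parts` is a hypothetical helper, and `h2o.importFile` and `h2o.rbind` are assumed to be available in the installed h2o version. Note that R's own regex syntax writes the POSIX digit class as `[[:digit:]]`; whether H2O interprets the pattern argument by the same rules is not something this sketch settles.

```r
# Folder written by the mapreduce() call in the post.
folder <- "hdfs://localhost.localdomain:8020/user/cloudera/test1"

# part-00000 and part-00001 are the files named in the post;
# "_SUCCESS" is an ASSUMED example of a non-CSV file in the same folder.
files <- c("part-00000", "part-00001", "_SUCCESS")

# Keep only names of the form part-NNNNN. In R regex syntax the POSIX
# class must sit inside a bracket expression: [[:digit:]].
keep <- grepl("^part-[[:digit:]]{5}$", files)
matched <- files[keep]

# Build one explicit path per matching file.
paths <- paste(folder, matched, sep = "/")
print(paths)

# Hypothetical helper: import each path on its own and stack the frames.
# Requires a running H2O cluster, so it is defined here but not called.
import_parts <- function(paths) {
  frames <- lapply(paths, function(p) h2o::h2o.importFile(path = p))
  do.call(h2o::h2o.rbind, frames)
}
```

With a running cluster, `b.h2o <- import_parts(paths)` would then stand in for the failing `h2o.importFolder` call, at the cost of one import request per file.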

I am appending below the sessionInfo() output.

Any help or advice would be much appreciated.

Thank you.
Anand
 

Full error message:
 
ERROR: Unexpected HTTP Status code: 500 Server Error (url = http://localhost.localdomain:54321/3/ParseSetup)
 
java.lang.RuntimeException
 [1] "water.MRTask.getResult(MRTask.java:505)"                                              
 [2] "water.MRTask.doAll(MRTask.java:399)"                                                  
 [3] "water.parser.ParseSetup.guessSetup(ParseSetup.java:212)"                              
 [4] "water.api.ParseSetupHandler.guessSetup(ParseSetupHandler.java:34)"                    
 [5] "sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)"                          
 [6] "sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)"        
 [7] "sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)"
 [8] "java.lang.reflect.Method.invoke(Method.java:606)"                                     
 [9] "water.api.Handler.handle(Handler.java:64)"                                            
[10] "water.api.RequestServer.handle(RequestServer.java:644)"                               
[11] "water.api.RequestServer.serve(RequestServer.java:585)"                                
[12] "water.JettyHTTPD$H2oDefaultServlet.doGeneric(JettyHTTPD.java:617)"                    
[13] "water.JettyHTTPD$H2oDefaultServlet.doPost(JettyHTTPD.java:565)"                       
[14] "javax.servlet.http.HttpServlet.service(HttpServlet.java:755)"                         
[15] "javax.servlet.http.HttpServlet.service(HttpServlet.java:848)"                         
[16] "org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:684)"               
 
Error in .h2o.doSafeREST(h2oRestApiVersion = h2oRestApiVersion, urlSuffix = page,  : 
  water.DException$DistributedException: from /172.17.135.120:54321; by class water.parser.ParseSetup$GuessSetupTsk; class water.exceptions.H2OParseSetupException: Column separator mismatch. One file seems to use " " and the other uses ",".
 
sessionInfo() output:
R version 3.2.2 (2015-08-14)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: CentOS release 6.7 (Final)
 
locale:
[1] C
 
attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     
 
other attached packages:
 [1] rJava_0.9-6       forecast_6.2      timeDate_3012.100 Rcpp_0.12.2       plyr_1.8.3       
 [6] functional_0.6    h2o_3.6.0.8       statmod_1.4.22    xts_0.9-7         zoo_1.7-11       
[11] data.table_1.9.6 
 
loaded via a namespace (and not attached):
 [1] rmr2_3.0.0       colorspace_1.2-4 lattice_0.20-33  quadprog_1.5-5   tools_3.2.2      nnet_7.3-10     
 [7] parallel_3.2.2   grid_3.2.2       tseries_0.10-34  bitops_1.0-6     RCurl_1.95-4.7   fracdiff_1.4-2  
[13] jsonlite_0.9.19  chron_2.3-45
 