Would Spark also be an option to move data from R -> Spark -> H2O? Or perhaps R -> Python -> H2O? Is there a path you would recommend?
//
// Here is an example of how to upload a file from the command line.
//
// curl -v -F "file=@allyears2k_headers.zip" "http://localhost:54321/PostFile.bin?destination_frame=a.zip"
//
// JSON Payload returned is:
// { "destination_frame": "key_name", "total_bytes": nnn }
//
I haven't quite tried all of this yet, but in your curl example, the form argument (-F) references a specific file:

curl -v -F "file=@allyears2k_headers.zip" "http://localhost:54321/PostFile.bin?destination_frame=a.zip"

How do we use this for R objects in RAM, which are not files that can be referenced? In the above, I do not have an equivalent of allyears2k_headers.zip. Furthermore, even using the h2o.uploadFile() function, it would still be a file that is uploaded, not the R object in memory. Is my thinking correct?

Or would the workaround be to create a ramdisk (say for 2 GB), so that the 2 GB of data can be written as CSV to /media/ramdisk, and then h2o.uploadFile() can run referencing that file?
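To make that workaround concrete, here is a minimal Python sketch (not H2O's official API; only the PostFile.bin endpoint and its destination_frame parameter come from the curl example above — the helper names, file path, and frame name are hypothetical): serialize the in-memory object to a temporary CSV file, then point a file-based multipart upload at that file.

```python
import csv
import os
import tempfile

def frame_to_tempfile(header, rows):
    """Write an in-memory table (header plus list of rows) to a temporary
    CSV file, so a file-based upload such as PostFile.bin can reference it."""
    fd, path = tempfile.mkstemp(suffix=".csv")
    with os.fdopen(fd, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
    return path

def postfile_url(destination_frame, host="localhost", port=54321):
    """Build a PostFile.bin URL in the same shape as the curl example above."""
    return (f"http://{host}:{port}/PostFile.bin"
            f"?destination_frame={destination_frame}")

# Usage sketch: write the data to a file, then upload it, e.g. with
#   curl -v -F "file=@<path>" "<url>"
path = frame_to_tempfile(["x", "y"], [[1, 2], [3, 4]])
url = postfile_url("mydata.hex")
```

The same shape applies whether the temporary file lives on disk or on a ramdisk mount such as /media/ramdisk; only the path changes.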
Thanks,
Hiddi
Hi Tom,
I have had a bit more time to look at the app-consumer-loan example you provided here: https://github.com/h2oai/app-consumer-loan
It looks good and is probably very close to what I need.
A few hopefully quick things,
How do I adjust build.gradle (or any other relevant file in the GitHub repo) so that it only builds/compiles the HTTP server to accept curl requests, without having to build the front-end web page that submits forms? And after that adjustment is made, I presume the deployment of the WAR file would need to change, so what would be the new command instead of ./gradlew jettyRunWar?
Secondly, if I wanted to run several copies of the same model on different ports, how could I do that? This is so that I can send multiple curl requests (in parallel from R) to the different ports to get the predicted output for a specified input.
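To sketch the fan-out side of that idea in Python (the port numbers below are assumptions for illustration, not anything from the repo): if several identical servers were listening on different ports, requests could be assigned round-robin across them, and each (request, port) pair would then become one parallel curl call.

```python
def round_robin(requests, ports):
    """Pair each request with a port, cycling through the port list,
    so parallel calls can be spread across several identical servers."""
    return [(req, ports[i % len(ports)]) for i, req in enumerate(requests)]

# Four requests spread over two hypothetical server ports.
assignments = round_robin(["r1", "r2", "r3", "r4"], [8080, 8081])
```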
import hex.genmodel.easy.prediction.BinomialModelPrediction;
import hex.genmodel.easy.prediction.RegressionModelPrediction;
import hex.genmodel.easy.*;

static {
    BadLoanModel rawBadLoanModel = new BadLoanModel();
    badLoanModel = new EasyPredictModelWrapper(rawBadLoanModel);

    InterestRateModel rawInterestRateModel = new InterestRateModel();
    interestRateModel = new EasyPredictModelWrapper(rawInterestRateModel);
}

private BinomialModelPrediction predictBadLoan(RowData row) throws Exception {
    return badLoanModel.predictBinomial(row);
}

private RegressionModelPrediction predictInterestRate(RowData row) throws Exception {
    return interestRateModel.predictRegression(row);
}

Are there any thoughts on perhaps doing thousands of batches of, say, 200 rows? Which method (curl or standard batch processing) might be better?
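For what it's worth, the batching half of that question is straightforward to sketch in plain Python (nothing H2O-specific here; the batch size of 200 comes from the question above): split the full row set into consecutive batches, each of which could then become one curl request or one batch-prediction call.

```python
def chunk(rows, size=200):
    """Split rows into consecutive batches of at most `size` rows each."""
    return [rows[i:i + size] for i in range(0, len(rows), size)]

# 1000 rows split into batches of 200.
batches = chunk(list(range(1000)), size=200)
```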
In addition, I have noticed that h2o.importFile() accepts S3 links, as suggested here: https://github.com/h2oai/h2o-3/blob/master/h2o-docs/src/product/howto/H2O-DevS3Creds.md. In terms of the underlying function, does this download the file to disk first and then upload the file to H2O?