Hi Jeremy,
Yes, it is possible.
What you'll need to do is configure H2O so it can read from S3 through the HDFS layer (i.e., using the s3n:// scheme).
If you use our ec2 scripts to start instances, then things will just work for you.
After that, use the Import HDFS Web UI menu item (with s3n:// instead of hdfs://).
Or you can start from the R snippet below, which I adapted from one of our HDFS unit tests.
Let me know if you have more questions.
(Note: I tested this on the latest top-of-tree master. Depending on which release you have, we
may need to tweak the R code if you're running from R.)
Thanks,
Tom
S3N Setup:
The following is adapted from these files, which we use to set up AWS instances:
h2o/ec2/h2o-cluster-distribute-aws-credentials.sh
h2o/ec2/ami/start-h2o-bg.sh
Command line options:
java -Xmx1g -jar h2o.jar -hdfs_config core-site.xml
core-site.xml config file:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>fs.s3n.awsAccessKeyId</name>
<value>${AWS_ACCESS_KEY_ID}</value>
</property>
<property>
<name>fs.s3n.awsSecretAccessKey</name>
<value>${AWS_SECRET_ACCESS_KEY}</value>
</property>
</configuration>
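The ${...} placeholders above get filled in with your real credentials (our ec2 scripts do this automatically). If you're setting things up by hand, here is a minimal sketch of the substitution step, assuming the credentials are exported in your environment; the key values below are placeholders, not real credentials:

```shell
# Assumed to be set already, e.g. by your shell profile or the ec2 scripts.
export AWS_ACCESS_KEY_ID="AKIAEXAMPLE"
export AWS_SECRET_ACCESS_KEY="secretEXAMPLE"

# Template with literal ${...} placeholders (quoted EOF stops expansion here).
cat > core-site.xml.template <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.s3n.awsAccessKeyId</name>
    <value>${AWS_ACCESS_KEY_ID}</value>
  </property>
  <property>
    <name>fs.s3n.awsSecretAccessKey</name>
    <value>${AWS_SECRET_ACCESS_KEY}</value>
  </property>
</configuration>
EOF

# Substitute the placeholders with the values from the environment.
sed -e "s|\${AWS_ACCESS_KEY_ID}|$AWS_ACCESS_KEY_ID|" \
    -e "s|\${AWS_SECRET_ACCESS_KEY}|$AWS_SECRET_ACCESS_KEY|" \
    core-site.xml.template > core-site.xml
```

Then pass the rendered core-site.xml to the -hdfs_config option as shown above. (Keep the rendered file out of version control, since it contains your secret key.)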
R Code:
Check out this example from: h2o/R/tests/testdir_hdfs/runit_s3n_basic.R
s3n_iris_dir <- "0xdata-public/examples/h2o/R/datasets"  # bucket/path, no scheme prefix
url2 <- sprintf("s3n://%s", s3n_iris_dir)
irisdir.hex <- h2o.importHDFS(conn, url2)                # conn is your H2O connection