ElephantDB Server - Running against S3

Phil Kallos

Jun 11, 2014, 8:13:36 PM

to elephan...@googlegroups.com

OK, so I managed to get elephantdb-server running against my local HDFS. I have a Scalding job that runs against local HDFS and writes an ElephantDB domain. From there I was able to start elephantdb-server using:

$ lein run /path/to/hdfs/global-conf.clj /path/to/local/local-conf.clj

I was able to read data through the Thrift interface, and everything is great!

I am now trying to bring this setup over to EC2/S3. I ran the same MapReduce job on EMR and configured it to write the ElephantDB output to s3://some/bucket/elephantdb/domain. As far as I can tell, the data is there and is being written to S3 correctly.

However, I am having trouble getting elephantdb-server to run against s3://some/bucket/elephantdb/domain. I am certainly doing something wrong, but at the moment it's unclear to me what.

I have uploaded a global config file to S3 at s3://some/bucket/elephantdb/global-conf.clj, which looks something like this:

{:replication 1
 :hosts ["ec2-some-host.compute-1.amazonaws.com"]
 :port 3578
 :domains {"daily-stats" "s3://some/bucket/elephantdb/domain"}}

On ec2-some-host.compute-1.amazonaws.com I have this local config:

{:local-root "/tmp/elephantdb"
 :download-rate-limit 1024
 :update-interval-s 60 ;; check for domain updates every minute
 :hdfs-conf {"fs.default.name" "s3n://hdfs"}}

I then try to start elephantdb-server with:

$ lein run s3n://some/bucket/elephantdb/global-conf.clj /path/to/local-conf.clj

But I am greeted with the following error:

00:05:38.788 [main] DEBUG org.apache.hadoop.conf.Configuration - java.io.IOException: config()
	at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:211)
	at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:198)
	at hadoop_util.core$configuration.invoke(core.clj:53)
	at hadoop_util.core$filesystem.invoke(core.clj:65)
	at elephantdb.common.config$read_global_config.invoke(config.clj:76)
	at elephantdb.keyval.core$_main.invoke(core.clj:246)
	at clojure.lang.Var.invoke(Var.java:419)
	at user$eval5.invoke(form-init7480547441345866642.clj:1)
	at clojure.lang.Compiler.eval(Compiler.java:6619)
	at clojure.lang.Compiler.eval(Compiler.java:6609)
	at clojure.lang.Compiler.load(Compiler.java:7064)
	at clojure.lang.Compiler.loadFile(Compiler.java:7020)
	at clojure.main$load_script.invoke(main.clj:294)
	at clojure.main$init_opt.invoke(main.clj:299)
	at clojure.main$initialize.invoke(main.clj:327)
	at clojure.main$null_opt.invoke(main.clj:362)
	at clojure.main$main.doInvoke(main.clj:440)
	at clojure.lang.RestFn.invoke(RestFn.java:421)
	at clojure.lang.Var.invoke(Var.java:419)
	at clojure.lang.AFn.applyToHelper(AFn.java:163)
	at clojure.lang.Var.applyTo(Var.java:532)
	at clojure.main.main(main.java:37)

Exception in thread "main" java.lang.IllegalArgumentException: Wrong FS: s3n://some/bucket/elephantdb/global-conf.clj, expected: file:///
	at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:310)
	at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:47)
	at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:357)
	at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
	at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:648)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at clojure.lang.Reflector.invokeMatchingMethod(Reflector.java:93)
	at clojure.lang.Reflector.invokeInstanceMethod(Reflector.java:28)
	at elephantdb.common.config$read_clj_config.invoke(config.clj:40)
	at elephantdb.common.config$read_global_config.invoke(config.clj:76)
	at elephantdb.keyval.core$_main.invoke(core.clj:246)
	at clojure.lang.Var.invoke(Var.java:419)
	at user$eval5.invoke(form-init7480547441345866642.clj:1)
	at clojure.lang.Compiler.eval(Compiler.java:6619)
	at clojure.lang.Compiler.eval(Compiler.java:6609)
	at clojure.lang.Compiler.load(Compiler.java:7064)
	at clojure.lang.Compiler.loadFile(Compiler.java:7020)
	at clojure.main$load_script.invoke(main.clj:294)
	at clojure.main$init_opt.invoke(main.clj:299)
	at clojure.main$initialize.invoke(main.clj:327)
	at clojure.main$null_opt.invoke(main.clj:362)
	at clojure.main$main.doInvoke(main.clj:440)
	at clojure.lang.RestFn.invoke(RestFn.java:421)
	at clojure.lang.Var.invoke(Var.java:419)
	at clojure.lang.AFn.applyToHelper(AFn.java:163)
	at clojure.lang.Var.applyTo(Var.java:532)
	at clojure.main.main(main.java:37)

Clearly I have the understanding of a potato and have naively misconfigured this somehow. I suspect the problem is the "fs.default.name" "s3n://hdfs" entry, but I have no idea what the correct value would be. What is the correct way to configure and start a cluster of elephantdb-server processes that read their domains from S3 instead of HDFS?
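A possible configuration to try, sketched under assumptions that are not confirmed in this thread. The "expected: file:///" part of the error means the Hadoop FileSystem that checked the path was the local one, so the s3n:// path was rejected. The sketch assumes that elephantdb-server builds that FileSystem from the :hdfs-conf map in the local config (the stack trace only shows read-global-config calling hadoop-util's filesystem function, so this is a guess). It also uses the s3n:// scheme consistently, because in stock Apache Hadoop s3:// refers to the block-based S3 filesystem rather than plain S3 objects (EMR handles s3:// differently). The bucket name "some" is taken from the placeholder paths above, and the credential values are hypothetical placeholders.

```clojure
;; global-conf.clj (uploaded to S3). Same as the original, except the
;; domain path uses the s3n:// scheme (an assumption, not confirmed).
{:replication 1
 :hosts ["ec2-some-host.compute-1.amazonaws.com"]
 :port 3578
 :domains {"daily-stats" "s3n://some/bucket/elephantdb/domain"}}

;; local-conf.clj (on each server host).
{:local-root "/tmp/elephantdb"
 :download-rate-limit 1024
 :update-interval-s 60 ;; check for domain updates every minute
 ;; Assumed to be passed to Hadoop's Configuration. The default FS should
 ;; name the same bucket as the s3n:// paths ("some" is the placeholder
 ;; bucket from the paths above); the credentials are placeholders.
 :hdfs-conf {"fs.default.name" "s3n://some"
             "fs.s3n.awsAccessKeyId" "YOUR_ACCESS_KEY_ID"
             "fs.s3n.awsSecretAccessKey" "YOUR_SECRET_ACCESS_KEY"}}
```

fs.s3n.awsAccessKeyId and fs.s3n.awsSecretAccessKey are the standard Hadoop properties for the native S3 filesystem. On Hadoop 2.x, fs.default.name is deprecated in favor of fs.defaultFS. The server would then be started the same way as before: lein run s3n://some/bucket/elephantdb/global-conf.clj /path/to/local-conf.clj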

Any advice is much appreciated.

Thanks,

Phil
