ElephantDB Server - Running against S3

Phil Kallos

Jun 11, 2014, 8:13:36 PM
to elephan...@googlegroups.com
OK, so I managed to get elephantdb-server running against my local HDFS. I have a Scalding job that runs against local HDFS and writes ElephantDB output. From there I was able to start elephantdb-server using:
$ lein run /path/to/hdfs/global-conf.clj /path/to/local/local-conf.clj

I was able to read data using the Thrift interface, and everything is great!

I am trying to bring this mechanism over to EC2/S3. I ran the same MapReduce job on EMR and configured it to write the ElephantDB output to s3://some/bucket/elephantdb/domain. As far as I can tell, the data is there and is being written to S3 correctly.

However, I am having trouble getting elephantdb-server running against s3://some/bucket/elephantdb/domain. I am certainly doing something wrong, but at the moment it's unclear what.

I have uploaded a global config file to S3 at s3://some/bucket/elephantdb/global-conf.clj, which looks something like this:
{:replication 1
 :port 3578
 :domains {"daily-stats" "s3://some/bucket/elephantdb/domain"}}
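
One detail I am not sure about in this file is the URI scheme. As far as I know, stock Apache Hadoop maps s3:// to its block-based S3 filesystem and s3n:// to the native one that reads ordinary S3 objects, while EMR treats s3:// as the native kind. Since the server runs outside EMR, a variant that uses s3n:// for the domain might be what is needed. This is an untested sketch, not a confirmed fix:

```clojure
;; Untested variant of the global config above; only the URI scheme differs.
;; Assumption: off EMR, the data written by the job is readable through s3n://.
{:replication 1
 :port 3578
 :domains {"daily-stats" "s3n://some/bucket/elephantdb/domain"}}
```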

and on ec2-some-host.compute-1.amazonaws.com I have a local config:
{:local-root "/tmp/elephantdb"
 :download-rate-limit 1024
 :update-interval-s 60 ;; check for domain updates every minute
 :hdfs-conf {"fs.default.name" "s3n://hdfs"}}

I now try to start the elephantdb-server with
$ lein run s3n://some/bucket/elephantdb/global-conf.clj /path/to/local-conf.clj

But I am greeted with the following error:
00:05:38.788 [main] DEBUG org.apache.hadoop.conf.Configuration - java.io.IOException: config()
        at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:211)
        at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:198)
        at hadoop_util.core$configuration.invoke(core.clj:53)
        at hadoop_util.core$filesystem.invoke(core.clj:65)
        at elephantdb.common.config$read_global_config.invoke(config.clj:76)
        at elephantdb.keyval.core$_main.invoke(core.clj:246)
        at clojure.lang.Var.invoke(Var.java:419)
        at user$eval5.invoke(form-init7480547441345866642.clj:1)
        at clojure.lang.Compiler.eval(Compiler.java:6619)
        at clojure.lang.Compiler.eval(Compiler.java:6609)
        at clojure.lang.Compiler.load(Compiler.java:7064)
        at clojure.lang.Compiler.loadFile(Compiler.java:7020)
        at clojure.main$load_script.invoke(main.clj:294)
        at clojure.main$init_opt.invoke(main.clj:299)
        at clojure.main$initialize.invoke(main.clj:327)
        at clojure.main$null_opt.invoke(main.clj:362)
        at clojure.main$main.doInvoke(main.clj:440)
        at clojure.lang.RestFn.invoke(RestFn.java:421)
        at clojure.lang.Var.invoke(Var.java:419)
        at clojure.lang.AFn.applyToHelper(AFn.java:163)
        at clojure.lang.Var.applyTo(Var.java:532)
        at clojure.main.main(main.java:37)

Exception in thread "main" java.lang.IllegalArgumentException: Wrong FS: s3n://some/bucket/elephantdb/global-conf.clj, expected: file:///
        at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:310)
        at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:47)
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:357)
        at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
        at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:648)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at clojure.lang.Reflector.invokeMatchingMethod(Reflector.java:93)
        at clojure.lang.Reflector.invokeInstanceMethod(Reflector.java:28)
        at elephantdb.common.config$read_clj_config.invoke(config.clj:40)
        at elephantdb.common.config$read_global_config.invoke(config.clj:76)
        at elephantdb.keyval.core$_main.invoke(core.clj:246)
        at clojure.lang.Var.invoke(Var.java:419)
        at user$eval5.invoke(form-init7480547441345866642.clj:1)
        at clojure.lang.Compiler.eval(Compiler.java:6619)
        at clojure.lang.Compiler.eval(Compiler.java:6609)
        at clojure.lang.Compiler.load(Compiler.java:7064)
        at clojure.lang.Compiler.loadFile(Compiler.java:7020)
        at clojure.main$load_script.invoke(main.clj:294)
        at clojure.main$init_opt.invoke(main.clj:299)
        at clojure.main$initialize.invoke(main.clj:327)
        at clojure.main$null_opt.invoke(main.clj:362)
        at clojure.main$main.doInvoke(main.clj:440)
        at clojure.lang.RestFn.invoke(RestFn.java:421)
        at clojure.lang.Var.invoke(Var.java:419)
        at clojure.lang.AFn.applyToHelper(AFn.java:163)
        at clojure.lang.Var.applyTo(Var.java:532)
        at clojure.main.main(main.java:37)

Clearly I have the understanding of a potato, and I have naively misconfigured this somehow. I suspect the issue is with the entry "fs.default.name" "s3n://hdfs" but I have no idea what the correct entry for that would be... What is the correct way to configure and start a cluster of elephantdb-server processes that are pointing to S3 for their HDFS storage?
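
For reference, the closest thing I have to a guess is below. The "expected: file:///" in the error makes me think the filesystem was built from a default Hadoop configuration, so it is possible the :hdfs-conf map is not being applied at all when the global config is read. If it is applied, my assumption is that "fs.default.name" should name the bucket and that Hadoop's s3n credential properties need to be set. The bucket name and credential values are placeholders, and none of this is tested:

```clojure
;; Untested sketch of the local config. Assumptions: "some" stands in for the
;; real bucket name, and the credential values are placeholders.
{:local-root "/tmp/elephantdb"
 :download-rate-limit 1024
 :update-interval-s 60 ;; check for domain updates every minute
 :hdfs-conf {"fs.default.name"           "s3n://some"
             ;; Standard Hadoop properties read by the s3n filesystem
             "fs.s3n.awsAccessKeyId"     "YOUR_ACCESS_KEY_ID"
             "fs.s3n.awsSecretAccessKey" "YOUR_SECRET_ACCESS_KEY"}}
```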

Any advice is much appreciated.

Thanks,
Phil