Alluxio on DCOS, write/read from remote host on cluster


Kyro Zetera

Oct 12, 2016, 5:20:15 PM
to Alluxio Users
I'm trying to get Alluxio running on DCOS. Currently, I have a master and a single worker up, but I am having trouble reading a file from another machine on the DCOS cluster.

The local machine does not have an Alluxio worker running. I have seen other threads here suggesting this is possible.
I am testing with the Java client. My sample code:

package com.testapp;

import alluxio.client.file.FileSystem;
import alluxio.AlluxioURI;
import com.typesafe.config.ConfigFactory;

object TestApp {
    def main(args: Array[String]) {
        val uri = ConfigFactory.load("application.conf").getString("uri.value")

        val fs = FileSystem.Factory.get();
        val path = new AlluxioURI(s"alluxio://<remote master ip>:19998/test.csv");
        val in = fs.openFile(path)
    }
}

But I get this error:

Exception in thread "main" alluxio.exception.ConnectionFailedException: Failed to connect to FileSystemMasterClient master @ <local machine ip>:19998 after 29 attempts
    at alluxio.AbstractClient.connect(AbstractClient.java:186)
    at alluxio.AbstractClient.retryRPC(AbstractClient.java:322)
    at alluxio.client.file.FileSystemMasterClient.getStatus(FileSystemMasterClient.java:183)
    at alluxio.client.file.BaseFileSystem.getStatus(BaseFileSystem.java:175)
    at alluxio.client.file.BaseFileSystem.getStatus(BaseFileSystem.java:167)
    at alluxio.client.file.BaseFileSystem.openFile(BaseFileSystem.java:260)
    at alluxio.client.file.BaseFileSystem.openFile(BaseFileSystem.java:254)
    at com.koddi.testapp.TestApp$.main(TestApp.scala:13)
    at com.koddi.testapp.TestApp.main(TestApp.scala)


I'm not sure why it's attempting to connect to a service on the local node when given a remote host.
Do I need to install anything additional on the local node? Or any additional dependencies in my app? I'm pulling in the dependency with the following in build.sbt:

("org.alluxio" % "alluxio-core-client" % "1.2.0")
          .exclude("commons-beanutils", "commons-beanutils-core")
          .exclude("commons-collections", "commons-collections")
          .exclude("commons-logging", "commons-logging")
          .exclude("org.apache.hadoop","hadoop-yarn-common")

Bin Fan

Oct 13, 2016, 3:04:16 AM
to Alluxio Users
Hi Kyro,

The error occurs because your application is not picking up the correct hostname and port. As the message "Exception in thread "main" alluxio.exception.ConnectionFailedException: Failed to connect to FileSystemMasterClient master @ <local machine ip>" shows, it is still connecting to the local machine.

To solve this problem, you need to configure the master address. One way is to create `alluxio-site.properties` (you can find examples in your $ALLUXIO_HOME/conf/ dir) with alluxio.master.hostname set, then put this file on the CLASSPATH of your application. The hostname embedded in your URL "alluxio://<remote master ip>:19998/" is not respected in your case. In fact, I think the hostname:port part of the URL is more or less reserved for future use. For now, all configuration should go through properties.
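
As a minimal sketch of such a file (the placeholder stands for the same remote master address used in the URL above, and the port line is only needed if the master is not on the default 19998), it could look roughly like this:

```properties
# alluxio-site.properties, placed on the client application's classpath
# (for an sbt project, src/main/resources is one directory that ends up there)

# Address of the remote Alluxio master; replace the placeholder with your own value
alluxio.master.hostname=<remote master ip>

# Assumed to match the port from the URL in the question; 19998 is the default
alluxio.master.port=19998
```

With a file like this on the classpath, the client should resolve the master from the properties rather than from the host part of the alluxio:// URI.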


- Bin

Kyro Zetera

Oct 19, 2016, 11:39:52 AM
to Alluxio Users
Thanks Bin Fan, that clears it up. It's very confusing that it accepts that URI format, but ignores it. It works perfectly using the properties file. Thank you.

Another issue we've run into involves running Alluxio in Docker containers behind an internal marathon-lb instance. If anyone has done this before, it would be a great help to hear about the setup.

We have one master and one worker. When I run the above code, it now reaches the master correctly through the load balancer, but the master returns the Docker instance ID for the worker, and the client then attempts to connect to that.
Of course this fails, as it can't resolve that hostname. We are calling it from within a VPC on AWS, and due to the nature of DCOS and Marathon, the IPs will not remain static in the case of failover. To reach these instances from within the VPC, we therefore have to send the requests through the load balancer.

Is there a workaround for this use case, or any way to get this running on DCOS/Marathon? I saw some Mesos specific setup information in the docs, but did not think that applied here because of the marathon containerization. But maybe there's some configuration I'm missing here.

Perhaps our Dockerfile should kick off vagrant with the mesos framework?

and...@alluxio.com

Oct 20, 2016, 8:16:25 PM
to Alluxio Users
Hi Kyro,

I agree that accepting the URI format but ignoring the hostname/port is super confusing. Do you mind creating a JIRA ticket for improving this?

Could you try setting the alluxio.worker.hostname configuration parameter to something that the client will be able to connect to? When it's unset, it defaults to whatever the system calls itself, which in your case must be the Docker instance ID.
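
A rough sketch of what that could look like in the worker's configuration (the placeholder is an assumption; what value is actually reachable depends on how your load balancer and containers are set up):

```properties
# alluxio-site.properties used by the worker process inside its container

# Hostname the worker registers with the master and that clients are told to
# connect to; it must be resolvable and reachable from the client machine
alluxio.worker.hostname=<hostname or ip the client can reach>
```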

Hope that helps,

Andrew

Kyro Zetera

Oct 21, 2016, 2:39:36 PM
to and...@alluxio.com, Alluxio Users
Thanks, Andrew,


I've created a JIRA here <https://alluxio.atlassian.net/browse/ALLUXIO-2396> for the URI issue.

Will get back to you once I test the configuration.
