Status of the Java client in version 3

Ari Miller

Jun 12, 2018, 2:07:47 PM

to genie

Is the Java client for Genie still in use/still relevant at Netflix?

If not, what is the recommended pattern for accessing Genie from Java (or Groovy, or Kotlin)? For example: direct REST calls, command-line calls to the Python client, etc.

Context:

I noticed that the Python Genie client has been broken out, and is now featured in the demo: https://netflix.github.io/genie/docs/3.3.12/demo/

The version 3 Java client still exists in source, but there don't appear to be any demos showing how to use it.

Version 2 of Genie apparently had some documentation: https://liviutudor.com/2015/04/09/using-the-netflix-genie-client-in-java/#sthash.t33mTjCz.dpbs -- that includes a specific reference to the samples directory:

https://github.com/Netflix/genie/commits/7c7ec2216af30ce85e01dfb11ba76efb5f57210d/genie-client/src/main/java/com/netflix/genie/client/sample/ExecutionServiceSampleClient.java

Thanks,

Ari

Python client sample from the demo (is there a Java equivalent?):

# Create a job instance and fill in the required parameters
job = pygenie.jobs.HadoopJob() \
    .job_name('Genie Demo HDFS Job') \
    .genie_username('root') \
    .job_version('3.0.0')

# Set cluster criteria which determine the cluster to run the job on
job.cluster_tags(['sched:' + str(sys.argv[1]), 'type:yarn'])

# Set command criteria which will determine what command Genie executes for the job
job.command_tags(['type:hdfs'])

# Any command line arguments to run along with the command. In this case it holds
# the actual query but this could also be done via an attachment or file dependency.
job.command("dfs -ls input")

# Submit the job to Genie
running_job = job.execute()

print('Job {} is {}'.format(running_job.job_id, running_job.status))
print(running_job.job_link)

# Block and wait until job is done
running_job.wait()

print('Job {} finished with status {}'.format(running_job.job_id, running_job.status))

Tom Gianos

Jun 12, 2018, 2:21:45 PM

to genie

Hi Ari,

Yes, the Java client is still used and supported. There may not be a standalone example anymore because we just defer to our tests for examples.

Here is a test from the latest release that submits a job using the client:

https://github.com/Netflix/genie/blob/v3.3.12/genie-client/src/test/java/com/netflix/genie/client/JobClientIntegrationTests.java#L106

Let us know if that helps.

Tom

Ari Miller

Jun 12, 2018, 3:34:57 PM

to tgi...@netflix.com, geni...@googlegroups.com

Thanks, Tom, that helped.

I attached the Java equivalent of ./run_hdfs_job.py from the demo, based on the Java client classes in the integration test you pointed me to.

It appears to produce the same output as running ./run_hdfs_job.py test.

run_hdfs_job.py submits an extra tag to the REST API, {"tags":["type:hadoop"]}

I'm assuming that is a side effect of HadoopJob() in the Python code.

I'm not sure whether that tag is deliberate, or whether it is necessary on top of the type:yarn tag that is already explicitly added.

Inlining the attached Java code for easier searching:

import com.google.common.collect.Lists;
import com.google.common.collect.Sets;
import com.netflix.genie.client.JobClient;
import com.netflix.genie.client.configs.GenieNetworkConfiguration;
import com.netflix.genie.common.dto.ClusterCriteria;
import com.netflix.genie.common.dto.Job;
import com.netflix.genie.common.dto.JobExecution;
import com.netflix.genie.common.dto.JobMetadata;
import com.netflix.genie.common.dto.JobRequest;
import com.netflix.genie.common.dto.JobStatus;
import com.netflix.genie.common.exceptions.GenieTimeoutException;

import java.io.IOException;
import java.util.List;
import java.util.Set;
import java.util.UUID;

/**
 * Does the equivalent of https://github.com/Netflix/genie/blob/master/genie-demo/src/main/docker/client/example/run_hdfs_job.py
 * Assumes you've already gone through the demo directions (e.g. https://netflix.github.io/genie/docs/3.3.12/demo/) and have the containers running locally
 */
public class RunHdfsJob
{
    // Hardcoding this for ease of use
    private static final String TARGET_SCHED = "test";

    // Set up by running the demo
    private static final String BASE_URL = "http://localhost:8080/";

    private static final String JOB_NAME = "Genie Demo HDFS Job";
    private static final String JOB_USER = "root";
    private static final String JOB_VERSION = "3.0.0";
    private static final String JOB_DESCRIPTION = "Genie 3 Test Job";

    public static void main(String[] args) throws IOException, InterruptedException, GenieTimeoutException
    {
        final GenieNetworkConfiguration genieNetworkConfiguration = new GenieNetworkConfiguration();
        genieNetworkConfiguration.setReadTimeout(20000);
        JobClient jobClient = new JobClient(BASE_URL, null, genieNetworkConfiguration);

        final String jobId = UUID.randomUUID().toString();

        // Cluster criteria determine which cluster the job runs on
        final List<ClusterCriteria> clusterCriteriaList
            = Lists.newArrayList(new ClusterCriteria(Sets.newHashSet("sched:" + TARGET_SCHED, "type:yarn")));

        // Command criteria determine which command Genie executes for the job
        final Set<String> commandCriteria = Sets.newHashSet("type:hdfs");

        final Set<String> configs = Sets.newHashSet();
        final Set<String> dependencies = Sets.newHashSet();

        final List<String> commandArgs = Lists.newArrayList(
            "dfs",
            "-ls",
            "input"
        );

        final JobRequest jobRequest = new JobRequest.Builder(
            JOB_NAME,
            JOB_USER,
            JOB_VERSION,
            clusterCriteriaList,
            commandCriteria
        )
            .withId(jobId)
            .withCommandArgs(commandArgs)
            .withDisableLogArchival(true)
            .withConfigs(configs)
            .withDependencies(dependencies)
            .withDescription(JOB_DESCRIPTION)
            .build();

        // Submit the job to Genie, then block until it finishes (or 10 minutes pass)
        final String id = jobClient.submitJob(jobRequest);
        final JobStatus jobStatus = jobClient.waitForCompletion(jobId, 600000);

        // Not used below; fetched to show which job details the client can retrieve
        final Job job = jobClient.getJob(id);
        final JobRequest jobRequest1 = jobClient.getJobRequest(jobId);
        final JobExecution jobExecution = jobClient.getJobExecution(jobId);
        final JobMetadata jobMetadata = jobClient.getJobMetadata(jobId);

        System.out.println(String.format("Job %s finished with status %s", id, jobStatus.toString()));
    }
}
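
One difference from the Python script: run_hdfs_job.py takes the sched value from sys.argv[1], while the Java version above hardcodes it. A minimal sketch of reading it from the command line instead, using only java.util. The class name ClusterTagsFromArgs, the method clusterTags, and the fallback to "test" are hypothetical and not part of the Genie client; note that Python's sys.argv[1] corresponds to Java's args[0].

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper, not part of the Genie client: builds the cluster tag set
// from the first command-line argument, as run_hdfs_job.py does with sys.argv[1].
public class ClusterTagsFromArgs {
    // Assumed fallback; "test" is the value hardcoded as TARGET_SCHED in RunHdfsJob.
    static final String DEFAULT_SCHED = "test";

    static Set<String> clusterTags(String[] args) {
        // Java's args[0] is the first real argument (Python's sys.argv[1]).
        final String sched = (args.length > 0 && !args[0].isEmpty()) ? args[0] : DEFAULT_SCHED;
        final Set<String> tags = new LinkedHashSet<>();
        tags.add("sched:" + sched);
        tags.add("type:yarn");
        return tags;
    }

    public static void main(String[] args) {
        System.out.println(clusterTags(args));
    }
}
```

The returned set could then be passed to new ClusterCriteria(...) in place of the Sets.newHashSet(...) call in RunHdfsJob.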
