Remote execution of jobs inside Docker

rafael valenzuela moraleda

Jun 8, 2016, 12:59:35 PM
to Pentaho Community
Hi all,
I'm using Docker to create a development environment, but I have a question. I think it's a simple one, but I can't find the answer.
My intention is to execute jobs remotely. Any ideas? Maybe with Carte, but is there an easier way?
The problem with using Carte or cluster mode is that the production environment doesn't use Carte, and I don't know whether this affects the ETLs.


Dan

Jun 8, 2016, 1:01:29 PM
to pentaho-...@googlegroups.com
Have you looked at how Carte works?

Don't get confused between remote execution via Carte and clustering. Two different things!

If you simply want to run on a known remote server, Carte is your man. Spin up a server and give it a try.
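For illustration (a sketch only, assuming a standard PDI client install; the bind address and port below are arbitrary), a basic Carte instance can be started from the data-integration directory like this:

# start a Carte slave server listening on all interfaces, port 8081 (illustrative)
sh carte.sh 0.0.0.0 8081

Spoon can then reach it through a slave server definition pointing at that host and port, and jobs can be sent there for remote execution.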

Dan

rafael valenzuela moraleda

Jun 8, 2016, 5:11:26 PM
to Pentaho Community
Hi Dan,
Thanks for your reply. My comments are below.
Thanks


On Wednesday, June 8, 2016 at 19:01:29 (UTC+2), Dan Keeley wrote:
Have you looked at how Carte works?

Yes, one of the links I found was the Carte documentation.
Don't get confused between remote execution via Carte and clustering. Two different things!

Yes, I know they are two different things, but I want to see all the possibilities.

Dan

Jun 9, 2016, 2:22:15 AM
to pentaho-...@googlegroups.com

Then I'm afraid I don't understand the question. The 'product' answer to remote execution is Carte.

You need to give more info about why you don't want to use it.

Sent from my phone

rafael valenzuela moraleda

Jun 13, 2016, 1:14:41 PM
to Pentaho Community
Hi Dan,
You're right! I should have given more details.
I'm putting together a simple development environment with PDI, MySQL and Vertica. In other words: download the Docker images and they're ready to use, with no configuration needed, or as little as possible.
For PDI, I'm using Diethard's Docker image. I have read the Carte documentation.
I know it's the correct option for running a job or transformation from outside the container, but I have a couple of questions:
  • I have the database connections and configuration files. Do we need to share a volume for this, or do I need to create the configuration files inside the container?
  • Can I run a job from the command line, or is using Spoon mandatory for that?
Thanks a lot.

Dan

Jun 17, 2016, 2:23:12 AM
to pentaho-...@googlegroups.com
I believe, but would have to check, that if your connections are all parameterised, it sends the values from kettle.properties from wherever you kick off your job over to the slave. There may even be a checkbox for that.
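As a sketch of what "parameterised" means here (the variable names are made up for illustration), the connection dialog in Spoon references Kettle variables and kettle.properties on the launching side defines them:

# kettle.properties on the machine that kicks off the job (values illustrative)
DB_HOST=mysql
DB_PORT=3306
DB_NAME=dwh
DB_USER=etl

The connection in the job/transformation then uses ${DB_HOST}, ${DB_PORT}, etc., so the same connection definition resolves correctly wherever the job actually runs.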

Of course you can do it on the command line, but not in the way you think. Rather annoyingly, you have to call a job which calls your remote job (so just a Start entry, an "execute job" entry pointing at the remote Carte server, and Success). The trick to make it command-line friendly is to make the remote job name a parameter.
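A rough command-line sketch of that pattern (the wrapper job name and the REMOTE_JOB parameter are illustrative, not a standard PDI convention):

# run the local wrapper job, telling it which job to execute on the remote Carte server
sh kitchen.sh -file=/path/to/remote_launcher.kjb -param:REMOTE_JOB=load_sales.kjb -level=Basic

Inside remote_launcher.kjb, the job entry that calls the remote job takes its name from ${REMOTE_JOB} and is configured to run on the Carte slave server defined in the wrapper.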

rafael valenzuela moraleda

Jul 20, 2016, 11:17:21 AM
to Pentaho Community
Hi community,

We are trying to create a unified BI environment in the company in order to make it much easier for developers to create and test ETLs. Basically, the idea is to have a Docker environment (docker-compose) with a reduced replica of what we have in production, so we can be sure that everything that works in this Docker setup will also run in production.

Inside Docker we have set up PDI 6.1 using Carte, plus some DB images. At first everything looked good: we were able to run simple ETLs on the containers, with access to the databases, etc., but when we tried a real one everything fell apart.

The first problem we found was that Carte wasn't able to load the config for the ETL, which was referenced as a file, even when using the "send resources to this server" option in Spoon. After some research we saw that this option packs referenced jobs and transformations but not other files (SQL, config files, etc.), so that wasn't an option.

The next thing we tried was to create a file repository in Spoon, so we could share a "common storage" between our machine and the container. Everything looked good at first, but as we already had a directory hierarchy to classify our ETLs, for some reason Spoon automagically duplicated the job/transformation into the repository's root directory even if the job was already inside the repo. As we wanted to keep our directory structure, we moved on to the next attempt.

After these attempts, the next idea was to mount an external volume in the Docker container with all our sources, and find a way to map the path on our local machine to the one used inside the container.

At first we tried to mount it in the same folder, with a 1-to-1 mapping (both in /home/XXXX/ETLs, for example), but it was still crashing because Carte was trying to find dependent files in /. We didn't know why that was happening, but after some more research we found that we were using Internal.Job.Filename.Directory everywhere; this variable is local to each Spoon/Carte environment, and we didn't see another built-in option to share it.

Finally, the solution we found, and which seems to be working fine, was to create a variable inside kettle.properties that contains the path where all our ETLs live. Inside Docker it points to our shared volume, and on each user's computer it points to their ETL path. After that we had to modify all the ETLs to use our new ETL_PATH variable.
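Concretely, the setup looks roughly like this (the exact paths and image name are illustrative):

# kettle.properties on the developer's machine
ETL_PATH=/home/developer/ETLs

# kettle.properties inside the container, pointing at the shared volume
ETL_PATH=/opt/etl

# mount the sources so the container path matches what ETL_PATH points to
docker run -v /home/developer/ETLs:/opt/etl <your-pdi-image>
# (or the equivalent volumes: entry in docker-compose.yml)

Jobs and transformations then reference their files as ${ETL_PATH}/some/subdir/file.ktr instead of relying on Internal.Job.Filename.Directory.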

I suppose there is an easier / less intrusive way to make this work, as modifying all the jobs doesn't seem to be the way to go, but we didn't find it. We would really appreciate it if you could point us in the right direction.

Thanks a lot,

Enrico Maria Carmona

Jul 20, 2016, 12:34:13 PM
to pentaho-...@googlegroups.com
Hi Rafael,

I use a similar environment with Docker, structured as follows:

The ETL files are structured with a main job and use relative references. For large DWH projects I use the KFF framework. Each project is stored in a Bitbucket repository. To launch the ETL, the project is cloned into a Docker volume (I created a Docker setup to automate updates from the repositories, even private ones: https://bitbucket.org/enricomariam42/docker-hgrepos-cloner).

The KFF framework is cloned into another Docker volume.

The Kettle execution environment is created in a temporary, disposable container whose sole task is to run the ETL processes. I do not use Carte (which is the container's default command); instead I run an sh command with the appropriate parameters.

The Kettle Docker image is similar to the one you are using, derived from abtpeople/pentaho-di. You can find my Dockerfile at https://hub.docker.com/r/enricomariam42/pentaho-di/~/dockerfile/.

This way, executing an ETL procedure consists of two docker commands:
1) Clone or update the ETL project in a dedicated volume
2) Run a fresh container from the Kettle image with the appropriate command line

Real example:

# Create KFF container
docker run --volumes-from sshkey --name kff -e BITBUCKET_USER=enricomariam42 -e BITBUCKET_REPOS=kff#kff-kettle5 enricomariam42/hgrepos-cloner

# Create dwh-costi container (my ETL repository)
docker run --volumes-from sshkey --volumes-from kff --name dwh-costi -e BITBUCKET_USER=enricomariam42 -e BITBUCKET_REPOS=dwh-costi#kettle6 -e REPOS_DIR=/root/repos/kff/projects/hsg enricomariam42/hgrepos-cloner
# Next time update with:
docker start -i dwh-costi

# Launch kettle6 KFF dwh-costi --name di-kff-dwh-costi (execute ETL process)
docker run --rm -h PRD-server --volumes-from dwh-costi enricomariam42/pentaho-di sh /root/repos/kff/projects/hsg/dwh-costi/kff_kitchen_launcher.sh dummy.kjb

In your environment, if you don't use KFF, you would instead run something like:
sh /path/to/kitchen.sh -file=/path/to/my/batch_launcher.kjb -param:my_param1="my_param_value" -param:my_param2="$my_param2"

Feel free to copy (hmmm, be inspired by) my work, which you can find at https://bitbucket.org/enricomariam42/ and https://hub.docker.com/u/enricomariam42/

Hope this helps.

Enrico
