Python virtual env for Azkaban executor server


trent....@trunkclub.com

May 29, 2018, 6:19:45 PM
to azkaban
Hi All -

For our production Azkaban environment, I'd like every call to Python in an Azkaban job to use a Python virtual environment. This creates a problem: we have many Azkaban jobs running in development that do not use a virtual environment, and we would like to deploy them to prod.

So I would prefer NOT to put something like this in the .job files:
command=source path/to/venv/bin/activate && python myjob.py
# (job type is just a sample)
type=run_over_ssh_on_different_server

We already have many, many .job files running various myjob.py scripts in development. And since some of our jobs already use custom job types, creating a new custom job type is not an option either.
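Whichever way the per-job command ends up written, the operator between `source` and `python` matters, assuming the command line is interpreted by bash. With a single `&`, `source` runs as a background job in a subshell, so its exports never reach `python`; with `&&`, both run in the same shell, one after the other. A minimal sketch, where `make_fake_activate` is a hypothetical helper writing a stand-in `bin/activate` that only exports `VIRTUAL_ENV` (the real script also changes `PATH` and the prompt):

```shell
#!/usr/bin/env bash
# Sketch: "source activate & cmd" versus "source activate && cmd".
# make_fake_activate is hypothetical: it writes a bin/activate stand-in that
# only exports VIRTUAL_ENV.

make_fake_activate() {
  local venv="$1"
  mkdir -p "$venv/bin"
  printf 'export VIRTUAL_ENV=%q\n' "$venv" > "$venv/bin/activate"
}

# Runs "source <activate> <op> printf ..." in a clean bash and prints the
# VIRTUAL_ENV seen by the second command ("none" if it was not set).
venv_seen_after() {
  local activate="$1" op="$2"
  case "$op" in
    '&' | '&&') ;;
    *) echo "op must be '&' or '&&'" >&2; return 2 ;;
  esac
  # $1 inside the inner script is the activate path passed as an argument.
  env -u VIRTUAL_ENV bash -c 'source "$1" '"$op"' printf "%s" "${VIRTUAL_ENV:-none}"' _ "$activate"
}
```

With `&` the check prints `none`; with `&&` it prints the venv path.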

Instead, I'd like to start the Azkaban executor server inside a virtual environment. In our startup shell script for the server, this would look something like:
#!/bin/bash
# Shell script that starts the Azkaban executor server on startup
# Executed as root
cd /opt/azkaban/azkaban-exec-server/build/install/azkaban-exec-server || exit 1
source path/to/venv/bin/activate
. bin/azkaban-executor-start.sh  # should inherit the virtual environment?
deactivate  # should not affect the already-started child process


My understanding is that the executor server will be started as a child process that inherits the Python virtual environment.
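That understanding can be checked locally without bouncing the server. Exported variables are copied into a child process when it starts, and a later `deactivate` in the parent cannot change a child that is already running. A minimal sketch, where `make_stand_in_venv` is a hypothetical helper writing a tiny `bin/activate` that only exports `VIRTUAL_ENV` and `PATH` and defines `deactivate`, and a background `bash -c` stands in for the executor JVM:

```shell
#!/usr/bin/env bash
# Sketch: a child started after `source activate` keeps the venv variables,
# even after the parent shell runs `deactivate`.
# make_stand_in_venv is hypothetical; the real activate script does more.

make_stand_in_venv() {
  local venv="$1"
  mkdir -p "$venv/bin"
  {
    printf '%s\n' '_OLD_PATH="$PATH"'
    printf 'export VIRTUAL_ENV=%q\n' "$venv"
    printf '%s\n' 'export PATH="$VIRTUAL_ENV/bin:$PATH"'
    printf '%s\n' 'deactivate() { export PATH="$_OLD_PATH"; unset VIRTUAL_ENV _OLD_PATH; unset -f deactivate; }'
  } > "$venv/bin/activate"
}

# Mimics the startup script: activate, start a long-lived child, deactivate.
start_child_then_deactivate() {
  local venv="$1" report="$2"
  (
    unset VIRTUAL_ENV
    source "$venv/bin/activate"
    # Stand-in for the executor: it reads its environment only after the
    # parent has already run deactivate.
    bash -c 'sleep 1; printf "%s\n" "${VIRTUAL_ENV:-none}" > "$1"' _ "$report" &
    deactivate
    printf 'parent after deactivate: %s\n' "${VIRTUAL_ENV:-none}"
    wait
  )
  printf 'child: %s\n' "$(cat "$report")"
}
```

The parent reports `none` while the child still reports the venv path. This only demonstrates plain process inheritance; it says nothing about what the executor's start script or Azkaban's job launcher do with the environment afterwards.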

I'm asking here whether you think this will work because it is very expensive to test: it would require a pull request and a review, then bouncing the server and waiting ~30 minutes to an hour, and finally running a test job in Azkaban to see whether the virtualenv is active.

So, will the executor server pick up the environment changes that the call to activate makes? I have seen child processes mask environment variables in the past.
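On the masking worry: a child only receives *exported* variables, and a launcher can deliberately hand it a different environment, for example `env -i` (empty environment) or `sudo` with its default `env_reset`. `activate` exports `VIRTUAL_ENV` and `PATH`, so a plain fork/exec passes them through. A minimal sketch; the `/opt/venvs/example` path is hypothetical and never read:

```shell
#!/usr/bin/env bash
# Sketch: what a child process sees depends on exports and on the launcher.
# /opt/venvs/example is a hypothetical path; nothing is read from it.

child_view() {
  (
    unset NOT_EXPORTED
    export VIRTUAL_ENV=/opt/venvs/example  # exported, as activate does
    NOT_EXPORTED=shell-only                # plain shell variable, not exported
    # Ordinary child: sees exported variables only.
    bash -c 'printf "%s %s\n" "${VIRTUAL_ENV:-none}" "${NOT_EXPORTED:-none}"'
    # Child started with an empty environment: sees nothing from the parent.
    env -i "$BASH" -c 'printf "%s\n" "${VIRTUAL_ENV:-none}"'
  )
}
```

The first line shows the exported venv but not the shell-only variable; the second shows `none`. Whether Azkaban passes the executor's environment unchanged to each job process is an assumption this sketch does not test.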



Rex Gibson

May 30, 2018, 11:00:59 AM
to azkaban
Fundamentally, I don't like this approach: the first time two workflows depend on different versions of the same library, you have a problem and you are back to your first solution.

We use bash wrapper functions from a standard library that is included in all projects (we also use a fork of the Python project builder library).
Setup functions then add two bash scripts, setup_venv.sh and run_in_venv.sh. The reason for this is that the power to bump library versions sits with the developer, not the systems engineer.
So we have a workflow node that sets up the venv: it takes the venv as an argument, plus a path to a requirements.txt file, and the bash script then builds the venv. The venv name is a property in global.properties. run_in_venv.sh takes the venv path as an argument (passed in from that runtime property) and the Python command, escaped as a single string, which it then runs.
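The attached scripts are not reproduced in the thread, so the following is only a hypothetical sketch of what a setup helper along these lines could look like. It assumes `python3` with the standard `venv` module is on the executor's PATH, and takes the venv directory, the requirements file, and an optional package index URL, mirroring the arguments passed in the setup_virtualenv.job below. The `setup_venv` name and its checks are assumptions:

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a setup_venv_with_dependencies.sh-style helper:
#   setup_venv <venv_dir> <requirements_file> [index_url]
# Creates the venv only if it does not exist yet, then installs the
# requirements into it with the venv's own pip.

setup_venv() {
  local venv="$1" requirements="$2" index_url="${3:-}"
  if [ -z "$venv" ] || [ ! -f "$requirements" ]; then
    echo "usage: setup_venv <venv_dir> <requirements_file> [index_url]" >&2
    return 2
  fi
  # Reuse an existing venv; otherwise create one (python3 is an assumption).
  if [ ! -x "$venv/bin/python" ]; then
    python3 -m venv "$venv" || return 1
  fi
  # Install with the venv's pip so the system site-packages stay untouched.
  local pip_args=(install)
  if [ -n "$index_url" ]; then
    pip_args+=(--index-url "$index_url")
  fi
  "$venv/bin/pip" "${pip_args[@]}" -r "$requirements"
}
```

As a standalone script the file would end with `setup_venv "$@"`, so the job's `command=` arguments pass straight through. Reusing an existing venv keeps reruns cheap; deleting the directory forces a clean rebuild.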

Attached files. 

Then a job file: 
$ cat setup_virtualenv.job
command=bash setup_venv_with_dependencies.sh ${LOCAL_VENV} requirements.txt https://pypi.knewton.net/simple
dependencies=IMPORT_STUDENT_INTERACTION_NEW
type=command

And a run job:
command=bash run_command_in_venv.sh ${LOCAL_VENV} "python process_sharded_table.py student_interactions_new_sharding.yml"
dependencies=setup_virtualenv
type=command
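Likewise, a hypothetical sketch of the run helper, not the attached script: it activates the venv in a subshell and hands the quoted command string to a fresh bash, which inherits the exported `VIRTUAL_ENV` and `PATH`, so `python` resolves to the venv's interpreter. The `run_in_venv` name and its argument checks are assumptions:

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a run_command_in_venv.sh-style helper:
#   run_in_venv <venv_dir> "<command string>"
# Activates the venv in a subshell and runs the command string there, so the
# caller's environment is left untouched.

run_in_venv() {
  local venv="$1" cmd="$2"
  if [ ! -f "$venv/bin/activate" ] || [ -z "$cmd" ]; then
    echo 'usage: run_in_venv <venv_dir> "<command string>"' >&2
    return 2
  fi
  (
    source "$venv/bin/activate" || exit 1
    # A fresh bash parses the command string and inherits the exported
    # VIRTUAL_ENV/PATH, so "python" resolves to $venv/bin/python.
    exec bash -c "$cmd"
  )
}
```

Because the activation happens in a subshell, it never leaks into the calling shell, and the function returns the command's own exit status, so a failing Python command should still fail the job.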

Hope that's helpful.

We also have another approach that uses our fork of the aforementioned Python builder: when the builder compiles a project, a special set of compile-time variables gets replaced in any scripts. That is easier to develop with, because we find that occasional Azkaban pipeline developers have a hard time keeping runtime Azkaban variables, bash variables, AND compile-time variables straight. We find the approach of creating variables for each environment at build time the easiest to understand, but it requires quite a bit of helper code.
Attachments: setup_venv_with_dependencies.sh, run_command_in_venv.sh
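The build-time approach can be sketched as a small render step run once per target environment. This is a hypothetical illustration, not the builder fork described above: the `@@KEY@@` placeholder syntax, the `render_for_env` name, and the per-environment `KEY=VALUE` file are all assumptions:

```shell
#!/usr/bin/env bash
# Hypothetical sketch of build-time substitution:
#   render_for_env <template> <env_file>
# Replaces every @@KEY@@ token in <template> with the value of KEY from a
# per-environment KEY=VALUE file (blank lines and # comments are skipped)
# and prints the result. The @@KEY@@ syntax is an assumption.

render_for_env() {
  local template="$1" env_file="$2" content key value
  if [ ! -f "$template" ] || [ ! -f "$env_file" ]; then
    echo "usage: render_for_env <template> <env_file>" >&2
    return 2
  fi
  content=$(<"$template")
  while IFS='=' read -r key value || [ -n "$key" ]; do
    case "$key" in '' | \#*) continue ;; esac
    # Quoted pattern and replacement: match the token literally and insert
    # the value as-is.
    content=${content//"@@${key}@@"/"$value"}
  done < "$env_file"
  # Fail loudly if a token was left unreplaced (e.g. a typo in a key).
  if [[ $content == *@@*@@* ]]; then
    echo "render_for_env: unresolved placeholder in $template" >&2
    return 1
  fi
  printf '%s\n' "$content"
}
```

Rendering a run job template against a prod environment file that sets `VENV=/opt/venvs/prod` would emit a .job file with the venv path baked in, so pipeline developers only see concrete values. A distinct token syntax also keeps build-time placeholders visually apart from Azkaban's `${...}` runtime properties and bash variables.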

trent....@trunkclub.com

May 30, 2018, 12:42:16 PM
to azkaban
Hmm, you are absolutely right about workflows using different library versions. I suppose I will have to go with the first solution! Thanks, Rex.