Question about globus compute and conda environments

Anthony Weaver

Feb 25, 2025, 11:03:55 AM
to Discuss
I'm working on getting a Globus Compute endpoint set up for use with a Globus Flow. The compute function I'm working on needs a conda environment. Note: for the purposes of this discussion I am executing the Python code directly from the command line of the compute endpoint (e.g., python mycode.py), and mycode.py uses Executor to submit my function for execution.
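
For reference, mycode.py does roughly the following (the endpoint UUID is a placeholder, and my_function stands in for my real function):

from globus_compute_sdk import Executor

def my_function():
    # stand-in for my real compute function
    import os
    return os.popen("echo $DATADIR").read()

with Executor(endpoint_id="<my-endpoint-uuid>") as gce:
    future = gce.submit(my_function)
    print(future.result())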

In my config.yaml I have:
worker_init: |
      conda activate wildlife

but in my Python code it does not seem to recognize anything about this environment.
For example, the environment sets an environment variable $DATADIR. In my Python code I tried os.popen("echo $DATADIR").read() to echo its value, but nothing gets printed. Similarly, if I try os.popen("which run_detector").read() to show the primary executable in that environment, nothing prints. I even tried os.popen("conda activate wildlife") first, with no luck.
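
Putting those attempts together, inside the compute function:

import os

print(os.popen("echo $DATADIR").read())       # prints an empty line
print(os.popen("which run_detector").read())  # prints nothing
os.popen("conda activate wildlife")           # no effect on later calls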

I am able to use os.popen("echo") to successfully echo out other things, including the values of some Python variables in my code, so I don't think it's a problem with the Python code per se. Also, from the command line I can conda activate wildlife and then see both $DATADIR and the path to run_detector.

What might I be missing here?

As always, thank you immensely for your help

Tony


Anthony Weaver

Feb 25, 2025, 1:41:38 PM
to Discuss, Anthony Weaver
Solution found:
os.popen (or, for that matter, subprocess.run) starts a new subprocess that doesn't inherit the conda environment.
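
A quick illustration of why (a sketch; assumes conda is available on the endpoint host):

import subprocess

# Shell #1: activation succeeds, but the shell exits and its state is lost
subprocess.run('eval "$(conda shell.bash hook)"; conda activate wildlife', shell=True)

# Shell #2 starts fresh, so this prints an empty string
print(subprocess.run("which run_detector", shell=True,
                     capture_output=True, text=True).stdout)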

You have to string your various commands together into a single popen or subprocess command.  For example:

def run_cameratrap(image_dir: str = '', output_dir: str = '',
                   json_file: str = '', threshold: str = '',
                   batch_size: str = ''):
    # Imports live inside the function because Globus Compute serializes
    # the function and executes it on the endpoint
    import subprocess
    import os

    # DATADIR is exported in worker_init (see comment 2 below)
    base_dir = os.environ.get("DATADIR")

    images = os.path.join(base_dir, image_dir)
    output_dir = os.path.join(base_dir, output_dir)
    threshold = "--threshold " + threshold
    batch_size = "--batch_size " + batch_size

    # Chain everything into ONE shell invocation so the activation is
    # still in effect when run_detector executes
    init_cmd = 'eval "$(conda shell.bash hook)";'
    activate_cmd = " conda activate wildlife;"
    detector_cmd = ("run_detector " + images + " " + output_dir + " "
                    + json_file + " " + threshold + " " + batch_size)
    final_cmd = init_cmd + activate_cmd + detector_cmd

    output = subprocess.run(final_cmd, capture_output=True, text=True, shell=True)
    return output
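
A call then looks something like this (the argument values here are made up):

result = run_cameratrap(image_dir="camera01", output_dir="detections",
                        json_file="detections.json", threshold="0.8",
                        batch_size="16")
print(result.returncode)  # 0 if run_detector succeeded
print(result.stdout)      # run_detector's console output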

-------------------------------------------------------------
Comments about this solution

1. init_cmd has to be exactly as given above; conda init did not work.
    We also found that when we were using conda on our compute cluster, users
    had to put the same line in their job scripts, because conda init did not work in that situation either.

2. Initially we had the DATADIR environment variable set when the wildlife environment was activated,
    but since we need it before the string of commands that actually runs conda, we moved the
    variable into the worker_init section of config.yaml with export DATADIR=/mydirectory
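
With that change, the worker_init from the first post becomes something like:

worker_init: |
      export DATADIR=/mydirectory
      conda activate wildlife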

Hopefully someone from Globus sees this information and gets it into the online documentation.