Best practice to run development and production DAGs in the same environment


hec...@iihnordic.com

Jun 18, 2018, 8:24:03 AM
to cloud-composer-discuss
Hello everyone,

I'm looking for a simple and clean workflow to keep prod and dev versions of the same DAG running in the same Airflow environment. I'd like to be able to test the dev version of the code online, and deploy it to production when ready, without making changes to the dev code.

For single-file DAGs I have moved the locations of the dev/prod databases and other resources into JSON-formatted Airflow Variables, and I can deploy the dev code just by changing the name of the file.
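For example, something like this sketch (the variable name "my_dag_config" and the "_dev" filename suffix are just illustrations):

import os
from airflow.models import Variable

# e.g. {"dev": {"db": "proj.dev_dataset"}, "prod": {"db": "proj.prod_dataset"}}
config = Variable.get("my_dag_config", deserialize_json=True)

# derive the environment from the file name, e.g. my_dag_dev.py vs my_dag.py
name, _ = os.path.splitext(os.path.basename(__file__))
env = "dev" if name.endswith("_dev") else "prod"
db = config[env]["db"]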

But this becomes unmanageable when versioning DAGs that span several folders. My best (failed) attempt so far was creating 'prod' and 'dev' subfolders inside the 'dags' folder. The problem is that references to subfolders break with this layout (ImportError: No module named ...). I'd like to avoid adding dev/prod to the relative paths of the modules, or adding dev/prod or version numbers to the file names.
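For reference, the layout I tried looks roughly like this (file names are just illustrative):

dags/
    dev/
        my_dag.py
        utils/
            helpers.py
    prod/
        my_dag.py
        utils/
            helpers.py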

Can anybody recommend an alternative solution? 

Thanks in advance.
Héctor

Bob Muscovite

Jul 28, 2020, 10:43:56 AM
to cloud-composer-discuss
Greetings Hector,

I have zeroed in on a similar solution for this situation. However, I run into the same issue: my module imports do not work if I put my DAG modules into an environment subfolder. They do work if I place the module in the root, i.e. dags/module works fine with all imports functioning normally, but dags/dev/module is broken. Any suggestions on how to fix this would be appreciated.

Best regards,

Boris

Bob Muscovite

Jul 28, 2020, 11:06:26 AM
to cloud-composer-discuss
Greetings again,

To elaborate a little on the setup I have:

module/
    __init__.py
    dag.py
    utils/
        accessory_functions.py
        __init__.py

This entire folder structure is uploaded as-is to /gcs/dags/dev on the bucket.
Now, in dag.py I was accessing the module as follows:

from module.utils.accessory_functions import *

This raised a ModuleNotFoundError in the UI, and also when I logged into the scheduler, changed into /gcs/dags/dev/module, and ran `python dag.py`.

Now I have switched to importing without the top-level package prefix:

from utils.accessory_functions import *

This still returns a ModuleNotFoundError in the UI, yet I am able to run the file via `python dag.py` without the error, which confounds me further.
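My guess is that the difference comes down to sys.path in the two contexts; a quick diagnostic would be to print it from inside dag.py:

import sys
print(sys.path)

# `python dag.py` puts the script's own directory (/gcs/dags/dev/module) first
# on sys.path, so `utils` resolves; the scheduler parses the file with
# /gcs/dags on sys.path instead, so it does not.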

Best regards,

Boris

Stefano Giostra

Jul 28, 2020, 11:14:21 AM
to cloud-composer-discuss
Hi,
in my experience you can use modules in subfolders, but the file that creates the DAG object (the DAG class) must sit at the root of the dags path.
Example:

from airflow import DAG
from lib import bb_airflow_utils, bb_utils
from lib.bb_dict_keys import *
from common.facile_ass_common_data import flow_common_data
# ----------------------------------------------------------------------------------------------------------------------


dag_name = flow_common_data[FLOW_K]
yesterday = bb_utils.get_yesterday()
dag_args = bb_airflow_utils.get_dag_args(flow=flow_common_data[FLOW_K], owner='Facile.it', dag_start_date=yesterday)

with DAG(dag_id=dag_name, default_args=dag_args, schedule_interval='15 06 * * *') as dag:
    af_wrk = bb_airflow_utils.AirflowTask.by_flow_dict_data(dag, flow_common_data)
    af_wrk.run_dag_load_daily(yesterday.strftime("%Y%m%d"))
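
The layout this implies is roughly (directory names match the imports above; the DAG file name is illustrative):

dags/
    facile_daily_load.py      <- the file that creates the DAG object, at the root
    lib/
        __init__.py
        bb_airflow_utils.py
        bb_utils.py
        bb_dict_keys.py
    common/
        __init__.py
        facile_ass_common_data.py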




Bob Muscovite

Jul 28, 2020, 11:39:53 AM
to cloud-composer-discuss
Looking into it further, it seems I can successfully import the module if I do:

from dev.module.utils.accessory_functions import *

So it looks like, with my structure, the module path for the imported module has to be complete from the top-level directory, which is /gcs/dags in this context.
This is suboptimal, of course. It looks like in my particular setup (I generate the completed dag.py file from a template, substituting an environment parameter) I can leverage a dynamic import to do what I want, like so:

import importlib

accessories = importlib.import_module(f"{env.lower()}.module.utils.accessory_functions")

Where `env` is the environment parameter I template in. Of course, with this approach I cannot use a wildcard import, but that is more of a good thing than a bad thing.
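For completeness, a minimal sketch of how the generated dag.py then uses the module (the helper name is hypothetical):

import importlib

env = "DEV"  # substituted into the generated dag.py at template time

accessories = importlib.import_module(f"{env.lower()}.module.utils.accessory_functions")
rows = accessories.build_rows()  # hypothetical helper defined in accessory_functions.py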