How DagBag is loaded


jaso...@blueapron.com

Oct 16, 2015, 2:28:50 PM
to Airflow
I noticed a pretty brittle and undocumented part of how DagBags are loaded. According to https://github.com/airbnb/airflow/blob/f8046512172f37ffde72fa99d787fc0e609f70ab/airflow/models.py#L166, a source file that doesn't contain the term 'DAG' is skipped, so any valid DAG definitions in it are never recognized. In my particular case, I've implemented a module that abstracts some of the default arguments into a factory method. As a result, my definition file never refers to the string 'DAG' explicitly, and the system doesn't pick it up.

Wouldn't it be better to rely on reflection to see if a DAG type has been defined vs. a literal string check?
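
To make the difference concrete, here is a minimal sketch of the two approaches. It does not use Airflow itself: the `DAG` class is a stand-in, and `passes_string_check`, `find_dags`, and `make_pipeline` are hypothetical names chosen for illustration, not the actual code at the link above.

```python
import types


class DAG:
    """Stand-in for Airflow's DAG class so the sketch runs without Airflow."""

    def __init__(self, dag_id):
        self.dag_id = dag_id


def passes_string_check(source):
    # The literal pre-filter: a file that never mentions 'DAG' is skipped.
    return 'DAG' in source


def find_dags(module):
    # Reflection: look at the module-level objects and keep the DAG instances.
    return [obj for obj in vars(module).values() if isinstance(obj, DAG)]


def make_pipeline(dag_id):
    # Hypothetical factory that hides the DAG constructor from definition files.
    return DAG(dag_id)


# A definition file that builds its DAG through the factory and therefore
# never spells out the string 'DAG'.
source = "pipeline = make_pipeline('nightly_etl')\n"

# Execute the definition file's source inside a fresh module object.
module = types.ModuleType('nightly_etl_definition')
module.make_pipeline = make_pipeline
exec(source, module.__dict__)

string_check_result = passes_string_check(source)
found_ids = [dag.dag_id for dag in find_dags(module)]
```

Here `string_check_result` is `False` even though `found_ids` contains `'nightly_etl'`. Note that the reflection approach has to execute the file before it can inspect anything.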

Maxime Beauchemin

Oct 16, 2015, 5:40:44 PM
to Airflow
Hi,

The idea there was to avoid trying to systematically import (and execute) all Python files in the folder. I figured it was only a matter of time before someone dropped a large library in there or some script that just runs without an `if __name__ == '__main__':` clause.
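
As a rough illustration of that risk (a sketch only, not the actual DagBag code), the snippet below walks a temporary folder and imports every `.py` file it finds. The file names, the `ran` flag, and the `import_file` helper are made up for the example.

```python
import importlib.util
import os
import tempfile


def import_file(path):
    # Import a Python file by path, which executes its top-level code.
    name = os.path.splitext(os.path.basename(path))[0]
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


# A script whose work runs at import time (no guard).
UNGUARDED = "ran = True  # stands in for real work, e.g. launching a job\n"

# The same script with its work behind the usual guard.
GUARDED = (
    "ran = False\n"
    "if __name__ == '__main__':\n"
    "    ran = True\n"
)

results = {}
with tempfile.TemporaryDirectory() as folder:
    for filename, body in [("unguarded_script.py", UNGUARDED),
                           ("guarded_script.py", GUARDED)]:
        with open(os.path.join(folder, filename), "w") as f:
            f.write(body)

    # Import every Python file found under the folder.
    for root, _dirs, files in os.walk(folder):
        for filename in sorted(files):
            if filename.endswith(".py"):
                module = import_file(os.path.join(root, filename))
                results[filename] = module.ran
```

Only the unguarded script ends up with `ran` set to `True`: importing it was enough to run its body, which is exactly what a blanket import of the folder would trigger.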

I'm not against removing it as it was a hacky safeguard. 

The real solution is to deprecate or offer an alternative to the `os.walk` approach and move to explicitly filling in the DagBag in code.
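
Explicit filling could look roughly like the sketch below. This is only an assumption about the shape such an alternative might take: `ExplicitDagBag`, its `register` method, the stand-in `DAG` class, and the `make_pipeline` factory are all hypothetical and not part of Airflow.

```python
class DAG:
    """Stand-in for Airflow's DAG class so the sketch runs without Airflow."""

    def __init__(self, dag_id, default_args=None):
        self.dag_id = dag_id
        self.default_args = default_args or {}


class ExplicitDagBag:
    """Hypothetical container that is filled in code instead of by scanning files."""

    def __init__(self):
        self.dags = {}

    def register(self, dag):
        # Refuse silent overwrites so that dag_id collisions surface early.
        if dag.dag_id in self.dags:
            raise ValueError("duplicate dag_id: %s" % dag.dag_id)
        self.dags[dag.dag_id] = dag
        return dag


def make_pipeline(dag_id):
    # Hypothetical factory that bakes in shared default arguments.
    return DAG(dag_id, default_args={"owner": "data-eng", "retries": 2})


dagbag = ExplicitDagBag()
dagbag.register(make_pipeline("nightly_etl"))
dagbag.register(make_pipeline("hourly_metrics"))

registered_ids = sorted(dagbag.dags)
```

With this shape, nothing is imported unless the code asks for it, and a factory-built DAG is picked up no matter which strings appear in its source file.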

Max

jaso...@blueapron.com

Oct 17, 2015, 1:23:51 PM
to Airflow
Thanks Max, that makes a lot more sense now.