Composer recommended folder structure


t...@vindicotech.com

Sep 26, 2018, 2:12:09 PM
to cloud-composer-discuss
Hi,

Do you have a recommended folder/directory structure for Composer? The structure Composer expects is different from the one our internal Airflow server uses right now.

- plugins/: Stores your custom plugins, operators, hooks
- dags/: Stores DAGs and any data the web server needs to parse a DAG.
- data/: Stores the data that tasks produce and use.

This is an example of how I organize my dags folder:

├── dags
│   ├── dag_1.py
│   ├── dag_2.py
│   ├── project_1
│   │   ├── dag_1.py
│   │   ├── dag_2.py
│   │   └── sql
│   │       ├── sql_1.sql
│   │       └── sql_2.sql
│   └── support
│       └── keys
│           └── gcp-keys
│               ├── key-01
│               │   └── gcp_key_01.json
│               └── key-02
│                   └── gcp_key_02.json
└── data


I had trouble before when I put the key.json file in the data/ folder: DAGs that used keys from data/ could not be parsed. Is that expected?

Would scheduler performance be affected if I put a DAG's supporting files (SQL, keys, schemas) in the dags/ folder? Is there a good use case for the data/ folder?

It would be helpful if you could show me an example of how to structure the Composer folders to support multiple projects, each with its own DAGs, plugins, and supporting files.

Right now, we have a single GitHub repository for the entire Airflow folder. Is it better to have a separate Git repository per project?

Thanks,
Tuan

Anthony Brown

Sep 27, 2018, 4:31:34 AM
to cloud-compo...@googlegroups.com
Personally, I prefer having separate git repos per project. We have CI/CD pipelines that automatically release code changes when they are checked in, and different projects have different release cycles, so keeping the code in different repos lets us manage this easily.
We upload all DAGs and plugins directly to the root of the dags and plugins folders.
The only downside I have discovered is that we have to be careful with the naming of DAGs and plugins to ensure they do not clash between projects.
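
One way to catch such clashes before release is a small check in the CI/CD pipeline. The following is only a sketch, assuming each project keeps its DAG or plugin files under its own local directory; the function name find_name_clashes and the directory layout are hypothetical, and it checks file names only, not the dag_id values inside the files.

```python
from collections import defaultdict
from pathlib import Path


def find_name_clashes(project_dirs):
    """Return {filename: [paths]} for .py files whose name appears more than once.

    Files from every project end up in the same root folder after upload,
    so two files with the same name would overwrite each other.
    """
    seen = defaultdict(list)
    for project in project_dirs:
        # rglob walks sub-directories too; sorted() keeps the output stable.
        for path in sorted(Path(project).rglob("*.py")):
            seen[path.name].append(str(path))
    return {name: paths for name, paths in seen.items() if len(paths) > 1}
```

A pipeline step could call find_name_clashes on the checked-out project directories and fail the build if the returned dict is not empty.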

We manage keys in the connection properties rather than storing them in a folder, i.e. we put the contents of the JSON file in the Keyfile JSON property of the connection.
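
As a rough sketch of what that looks like outside the UI, the helper below builds the "extra" JSON for a Google Cloud Platform connection from a key file. The helper name build_gcp_connection_extra is hypothetical, and the extra field names are an assumption based on the Airflow 1.10 GCP hook, so check them against the Airflow version your environment runs.

```python
import json


def build_gcp_connection_extra(keyfile_path, project_id):
    """Build the 'extra' JSON string for a google_cloud_platform connection."""
    with open(keyfile_path) as fh:
        key = json.load(fh)  # fails early if the key file is not valid JSON
    return json.dumps({
        # Assumed field names (Airflow 1.10 GCP hook); verify for your version.
        "extra__google_cloud_platform__project": project_id,
        # The key contents are stored as a JSON string, not a nested object.
        "extra__google_cloud_platform__keyfile_dict": json.dumps(key),
    })
```

The resulting string can be pasted into the Extra field of the connection, so the key file itself never has to be uploaded to the dags/ or data/ folder.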
For schemas used when loading into BigQuery, we store them in the same bucket as the data files being loaded and reference them in the GoogleCloudStorageToBigQueryOperator that we call. We keep the schemas in the same git repo as the DAG, and upload them to the correct bucket as part of the CI/CD pipeline.
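
To illustrate, the sketch below collects the operator arguments for that setup in a plain dict. The bucket, object, and table names are made up, and the argument names (in particular schema_object) are taken from the Airflow 1.10 contrib operator, so treat them as assumptions to verify for your version.

```python
def gcs_to_bq_kwargs(bucket, data_object, schema_object, destination_table):
    """Arguments for a load where the schema JSON lives in the same bucket as the data."""
    return {
        "bucket": bucket,
        "source_objects": [data_object],
        # Path of the schema .json file inside the same bucket.
        "schema_object": schema_object,
        "destination_project_dataset_table": destination_table,
        "source_format": "CSV",
        "write_disposition": "WRITE_TRUNCATE",
    }


# Hypothetical names, for illustration only.
load_kwargs = gcs_to_bq_kwargs(
    bucket="example-load-bucket",
    data_object="orders/orders.csv",
    schema_object="orders/orders_schema.json",
    destination_table="example_dataset.orders",
)

# Inside a DAG file this would be used roughly as:
# from airflow.contrib.operators.gcs_to_bq import GoogleCloudStorageToBigQueryOperator
# load = GoogleCloudStorageToBigQueryOperator(task_id="load_orders", dag=dag, **load_kwargs)
```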

For .sql files that need to be run against BigQuery, you should be able to put these in the data folder, but I have not had to use that yet.
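
An untested sketch of how that could work, assuming Composer exposes the bucket's data/ folder to workers at /home/airflow/gcs/data. The COMPOSER_DATA_DIR variable and the load_sql helper are hypothetical names used only for this example.

```python
import os

# Assumption: gs://<environment-bucket>/data is mounted here on the workers.
# COMPOSER_DATA_DIR is a hypothetical override, handy for local testing.
DATA_DIR = os.environ.get("COMPOSER_DATA_DIR", "/home/airflow/gcs/data")


def load_sql(relative_path, data_dir=DATA_DIR):
    """Read a .sql file stored under the data folder and return its text."""
    full_path = os.path.join(data_dir, relative_path)
    with open(full_path) as fh:
        return fh.read()
```

Calling load_sql from inside a task, rather than at the top level of the DAG file, means the file is read at run time and the DAG can still be parsed where the data folder is not available.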


Anthony Brown
Data Engineer BI Team - John Lewis
