is the output of PostgresOperator or MysqlOperator task explicitly `return`ed? How to SELECT from a database.

4,545 views
Skip to first unread message

William Cox

unread,
Dec 14, 2015, 2:47:20 PM12/14/15
to Airflow
I'm puzzling over how to select information from a postgresql (or mysql) database. For instance, if I have a task like:

t1 = PostgresOperator(
task_id='count_table_lines',
sql=
"""
SELECT COUNT(*) FROM schema.table
""",
postgres_conn_id='postgres_conn_name',
autocommit=True,
dag=dag,
)


does t1 implicitly return the output of the query? A cursory check of the code would seem to indicate, "no". If that's the case, then in order to SELECT data from a database table, I will need to create my own operator - something akin to MySqlToHiveTransfer?

Thanks.
-William

Arthur Wiedmer

unread,
Dec 14, 2015, 3:18:21 PM12/14/15
to Airflow
Hi William,

You are correct, operators do not return their results. The goal is for the data processing to be done on the system (MySql, Postgres, Hadoop) and for transfer operators to be used if you need to get data across systems.

Depending on what you want to achieve you have different options. If you are producing a results table, then an insert overwrite into a (possibly partitioned) table might be your best bet. You can then transfer those results using a transfer operator you have written.

There is a facility called Xcom, in which you can communicate small amounts of metadata between tasks. For instance, if the rowcount is something that the next task can use as a parameter, you could use an Xcom to store it and the next task can then retrieve it. More info here : https://pythonhosted.org/airflow/concepts.html#xcoms and here for an example of how to use Xcom: https://github.com/airbnb/airflow/blob/master/airflow/example_dags/example_xcom.py

Best,
Arthur  
Reply all
Reply to author
Forward
0 new messages