SFTP/SCP operators for Sensor and File Transfer?


Daniel Jin

Dec 7, 2015, 2:42:41 PM
to Airflow

Our organization is still new to Airflow.

From the API reference site (http://pythonhosted.org/airflow/code.html#operators), I noticed there is no sensor or transfer operator for SFTP/SCP, or even for local files. Currently, most of our input feed requirements use SFTP/SCP: we need to check whether a marker file exists before downloading the real feed files, and the download itself will be done by SFTP or SCP.


I would think this should be a basic feature. Are there any mature SFTP/SCP sensor or transfer operators? Or should we develop our own by extending BaseSensorOperator or GenericTransfer?



Maxime Beauchemin

Dec 8, 2015, 12:31:36 AM
to Airflow
The closest thing would be this:
http://pythonhosted.org/airflow/code.html#airflow.contrib.hooks.SSHHook

But I don't think it's what you're looking for. You could write a PythonOperator that uses the paramiko library or Fabric, or calls scp as a subprocess. A reusable sensor would be useful. I think the reason we don't have one is that most people use HDFS or S3 for persistence and for moving files across machines.
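To make the sensor idea concrete, here is a minimal, hypothetical sketch of the poke logic such a sensor could reuse. The function name is made up; the SFTP client is injected (in practice a paramiko SFTPClient obtained via SSHClient.open_sftp()), so the check itself has no Airflow dependency:

```python
def sftp_file_exists(sftp_client, remote_path):
    """Poke-style check: True if remote_path exists on the SFTP server.

    sftp_client is anything with a stat(path) method, e.g. a paramiko
    SFTPClient. paramiko raises IOError when the remote path is missing.
    """
    try:
        sftp_client.stat(remote_path)
        return True
    except IOError:
        return False

# In a BaseSensorOperator subclass, poke() would be roughly:
#
#     def poke(self, context):
#         return sftp_file_exists(self._get_sftp_client(), self.remote_path)
#
# where _get_sftp_client() is a hypothetical helper that opens the
# connection from whatever hook or credentials you configure.
```

Keeping the client injectable also makes the check easy to unit-test with a fake SFTP object.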

Note that you probably want a single task to get the file locally and move it somewhere else as part of that one task, since there's no guarantee that the next task will be executed on the same worker.
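A hypothetical sketch of such a single-task python_callable, again with the SFTP client injected (in practice a paramiko SFTPClient); the function name and directory arguments are illustrative:

```python
import os
import shutil

def fetch_and_stage(sftp_client, remote_path, work_dir, staged_dir):
    """Download remote_path over SFTP, then move it to staged_dir.

    Doing both in one task matters because a later task may run on a
    different worker, where a file left in this worker's work_dir would
    not be visible. staged_dir would typically be a shared location
    (NFS mount, HDFS, S3, ...). sftp_client needs a paramiko-style
    get(remote, local) method.
    """
    filename = os.path.basename(remote_path)
    local_tmp = os.path.join(work_dir, filename)
    sftp_client.get(remote_path, local_tmp)          # pull the feed file
    final_path = os.path.join(staged_dir, filename)  # shared destination
    shutil.move(local_tmp, final_path)               # hand off in one step
    return final_path
```

You would wire this up as the python_callable of a PythonOperator, passing the connection details through op_kwargs or an Airflow connection.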