Mount google cloud storage to KubernetesPodOperator


jasper...@gmail.com

Mar 8, 2019, 10:50:40 AM
to cloud-composer-discuss
Hi all, I am new to Airflow and Cloud Composer. I would like to ask: how can I share data between KubernetesPodOperator tasks using Google Cloud Storage (which the GKE workers mount)?

Feng Lu

Mar 20, 2019, 3:54:14 AM
to jasper...@gmail.com, cloud-composer-discuss
You could read/write directly to the Composer-managed GCS bucket using gsutil or the google-cloud-storage library.
Please make sure that the GKE cluster you use to run KubernetesPodOperator or GKEPodOperator has access to the Composer bucket.
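As a concrete illustration of Feng's suggestion, here is a minimal sketch using the google-cloud-storage Python client. The bucket and object names are placeholders, and the client import is deferred so the URI helper works without credentials:

```python
# Sketch: read/write the Composer-managed bucket with the google-cloud-storage
# client. Runs with the pod's service-account credentials; URIs are placeholders.
from typing import Tuple


def split_gcs_uri(uri: str) -> Tuple[str, str]:
    """Split gs://bucket/path/to/obj into (bucket, path/to/obj)."""
    if not uri.startswith("gs://"):
        raise ValueError(f"not a GCS URI: {uri}")
    bucket, _, blob = uri[len("gs://"):].partition("/")
    return bucket, blob


def upload_text(uri: str, text: str) -> None:
    # Import deferred so the pure helper above is usable without credentials.
    from google.cloud import storage
    bucket_name, blob_name = split_gcs_uri(uri)
    client = storage.Client()  # picks up the pod's service-account credentials
    client.bucket(bucket_name).blob(blob_name).upload_from_string(text)


def download_text(uri: str) -> str:
    from google.cloud import storage
    bucket_name, blob_name = split_gcs_uri(uri)
    client = storage.Client()
    # download_as_text() exists in recent client versions; older releases
    # offer download_as_string() instead.
    return client.bucket(bucket_name).blob(blob_name).download_as_text()
```

A producing pod would call `upload_text("gs://<composer-bucket>/data/out.txt", payload)` and a consuming pod would read it back with `download_text`.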


--
You received this message because you are subscribed to the Google Groups "cloud-composer-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cloud-composer-di...@googlegroups.com.
To post to this group, send email to cloud-compo...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/cloud-composer-discuss/de58135f-1776-44a9-b390-d3f54fb1f6c6%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

jasper...@gmail.com

Mar 20, 2019, 6:05:49 AM
to cloud-composer-discuss
Hi Feng,

Thanks for the reply. Using GCS directly is fine, but using "/home/airflow/gcs/data" would be more convenient if possible.
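For context: on Composer's Airflow workers, /home/airflow/gcs/data is a FUSE view of the data/ prefix of the environment's bucket, and a KubernetesPodOperator pod does not get that mount automatically. A small sketch of translating between the two forms (the bucket name is an assumption you'd look up from your environment):

```python
# Map between the worker-local mount path and the equivalent gs:// URI.
# The environment bucket name is a placeholder; look it up with
# `gcloud composer environments describe`.
GCS_DATA_MOUNT = "/home/airflow/gcs/data"


def local_to_gcs(local_path: str, env_bucket: str) -> str:
    """Translate a path under /home/airflow/gcs/data to its gs:// URI."""
    if not local_path.startswith(GCS_DATA_MOUNT):
        raise ValueError(f"{local_path} is not under {GCS_DATA_MOUNT}")
    rel = local_path[len(GCS_DATA_MOUNT):].lstrip("/")
    return f"gs://{env_bucket}/data/{rel}"


def gcs_to_local(uri: str, env_bucket: str) -> str:
    """Inverse mapping: gs://<bucket>/data/... back to the worker mount path."""
    prefix = f"gs://{env_bucket}/data/"
    if not uri.startswith(prefix):
        raise ValueError(f"{uri} is not under {prefix}")
    return f"{GCS_DATA_MOUNT}/{uri[len(prefix):]}"
```

A DAG task on the worker can then hand a pod the gs:// form of a file it sees locally under /home/airflow/gcs/data.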

Jasper


Jerry Morrison

May 31, 2019, 1:14:13 AM
to cloud-composer-discuss
Agreed!

Q. Is there a way to get Composer to mount "/home/airflow/gcs/data" into the container? KubernetesPodOperator has a `volume_mounts` argument, but I don't see any documentation on whether it can mount GCS or on what the name and sub_path parameters mean.

Q. If not, is the only alternative to add `gcsfuse` to the container image, run it in the container, and accept the 10 GB limit on container storage?
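For anyone attempting the gcsfuse route, a rough sketch of what the container's entrypoint could run (assumptions: the image already has gcsfuse installed, the pod is granted the SYS_ADMIN capability that FUSE requires, and the bucket name is a placeholder):

```shell
# Mount the bucket inside the container before the task runs.
mkdir -p /mnt/gcs
gcsfuse --implicit-dirs my-composer-bucket /mnt/gcs   # placeholder bucket name

# ... task reads/writes files under /mnt/gcs ...

# Unmount when the task finishes.
fusermount -u /mnt/gcs
```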

Stefanos Mousafeiris

Jul 9, 2019, 8:15:19 AM
to cloud-composer-discuss
Has anyone figured out the best way to go about mounting the cloud storage? I'm running tasks on separate KubernetesPodOperators and need a way to pass the intermediate data between them. The gcsfuse approach seems to be the only viable one, as the volume_mounts documentation is quite sparse.
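One way to pass intermediate data between two KubernetesPodOperator tasks without any mount is to have both pods agree on a deterministic GCS URI. A sketch (the commented operator wiring follows the Airflow 1.10 contrib API; the image names and bucket are hypothetical):

```python
# Sketch: both pods compute the same intermediate GCS URI independently,
# so the producer writes and the consumer reads the same object.
def intermediate_uri(bucket: str, dag_id: str, task_id: str, filename: str) -> str:
    """Deterministic GCS URI for a task's intermediate output."""
    return f"gs://{bucket}/tmp/{dag_id}/{task_id}/{filename}"


# Hypothetical DAG wiring (assumes Airflow's contrib KubernetesPodOperator):
# producer = KubernetesPodOperator(
#     task_id="produce",
#     image="gcr.io/my-project/producer",   # placeholder image
#     arguments=["--output",
#                intermediate_uri(BUCKET, "my_dag", "produce", "data.parquet")],
# )
# consumer = KubernetesPodOperator(
#     task_id="consume",
#     image="gcr.io/my-project/consumer",   # placeholder image
#     arguments=["--input",
#                intermediate_uri(BUCKET, "my_dag", "produce", "data.parquet")],
# )
# producer >> consumer
```

Each container then uses the google-cloud-storage client (or gsutil) to read/write the URI it was handed, so no volume mount is needed.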

Berk Coker

Mar 21, 2020, 9:08:05 PM
to cloud-composer-discuss
Has anyone figured this out? I would like to do the same thing.

Jerry Morrison

Mar 22, 2020, 12:32:14 AM
to cloud-composer-discuss
We gave up on Cloud Composer after this roadblock. Cloud Composer also seemed overcomplicated and underdocumented.

So we adapted FireWorks, which is straightforward, flexible, and in widespread use. The only server it needs is a MongoDB instance. But nobody was running FireWorks in Google Cloud so we wrote a Firetask that can run a payload in a Docker container, fetch its input files from GCS, store its output files to GCS, and log to StackDriver.

It's not yet documented for external use, and it could use some unit tests, but it works fine and it's not much code.

Marcel Gongora

Apr 25, 2020, 6:46:29 AM
to cloud-composer-discuss
Use google-cloud-storage [1]; it should cover most of your requirements. Create a staging bucket with the proper permissions for the service account you used to provision the Composer environment, which you can look up with:

gcloud composer environments describe <ENV_NAME> --location=<REGION> --format='value(config.nodeConfig.serviceAccount)'
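Granting that service account access to the staging bucket could then look like the following (the account, project, and bucket names are placeholders):

```shell
# Give the Composer environment's service account read/write on the
# staging bucket; all names below are placeholders.
gsutil iam ch \
  serviceAccount:my-sa@my-project.iam.gserviceaccount.com:roles/storage.objectAdmin \
  gs://my-staging-bucket
```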

[1]