Dataproc Unable to Save Jupyter Notebooks Encrypted with CMEK


Joshua Herrera

May 7, 2020, 10:04:02 AM
to Google Cloud Dataproc Discussions
I'm running into an issue with Dataproc that I can't explain. After following the documentation guide at https://cloud.google.com/dataproc/docs/concepts/configuring-clusters/customer-managed-encryption, I cannot save .ipynb files to a staging bucket that is encrypted with a CMEK.

I have a key within a key ring, which is in the same location as the bucket I wish to use as the Dataproc Staging Bucket.
The Compute Engine Default Service Account and Compute Engine Service Agent both have encrypt/decrypt permissions on the key.
The Cloud Storage Service Account has been authorized to use the key.
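
For reference, the grants described above could be scripted roughly as follows. This is only a sketch: the project number, location, key ring, and key names are placeholders, and the helper functions are hypothetical. It prints the `gcloud kms keys add-iam-policy-binding` commands rather than running them.

```python
import shlex

# Role that allows a principal to encrypt and decrypt with a Cloud KMS key.
ROLE = "roles/cloudkms.cryptoKeyEncrypterDecrypter"


def kms_key_name(project, location, key_ring, key):
    """Build the full Cloud KMS key resource name."""
    return (
        f"projects/{project}/locations/{location}"
        f"/keyRings/{key_ring}/cryptoKeys/{key}"
    )


def service_accounts(project_number):
    """Return the three service accounts mentioned in the post."""
    return {
        "compute_default": f"{project_number}-compute@developer.gserviceaccount.com",
        "compute_agent": f"service-{project_number}@compute-system.iam.gserviceaccount.com",
        "storage_agent": f"service-{project_number}@gs-project-accounts.iam.gserviceaccount.com",
    }


def grant_commands(project_number, location, key_ring, key):
    """Build one IAM binding command per service account (nothing is executed)."""
    commands = []
    for email in service_accounts(project_number).values():
        commands.append([
            "gcloud", "kms", "keys", "add-iam-policy-binding", key,
            "--location", location,
            "--keyring", key_ring,
            "--member", f"serviceAccount:{email}",
            "--role", ROLE,
        ])
    return commands


if __name__ == "__main__":
    # Placeholder values; substitute your own project number and key details.
    for cmd in grant_commands("123456789012", "us-central1", "my-keyring", "my-key"):
        print(shlex.join(cmd))
```

Printing the commands first makes it easy to compare them against the bindings that already exist on the key before changing anything.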

I set the Staging Bucket to use the CMEK I authorized as its default (bucket-wide) encryption key. Files I upload to this bucket are now encrypted with the CMEK. When I create a cluster with this bucket as the staging bucket, I can read the encrypted data and write new data to the bucket, which is then encrypted by the bucket-wide encryption. The problem is that I cannot save my work in progress with Jupyter.
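
A minimal sketch of how the bucket-wide key could be set and then checked with `gsutil`, assuming placeholder bucket, key, and object names (the helper function is hypothetical and only prints the commands):

```python
import shlex


def bucket_cmek_commands(bucket, kms_key, sample_object):
    """Build gsutil commands to set and verify a bucket's default CMEK."""
    url = f"gs://{bucket}"
    return {
        # Set the bucket's default encryption key.
        "set_default_key": ["gsutil", "kms", "encryption", "-k", kms_key, url],
        # Display the default key currently configured on the bucket.
        "show_default_key": ["gsutil", "kms", "encryption", url],
        # Long listing of one object; the output includes its KMS key, if any.
        "inspect_object": ["gsutil", "ls", "-L", f"{url}/{sample_object}"],
    }


if __name__ == "__main__":
    # Placeholder values; none of these names come from the original setup.
    key = (
        "projects/my-project/locations/us-central1"
        "/keyRings/my-keyring/cryptoKeys/my-key"
    )
    commands = bucket_cmek_commands("my-staging-bucket", key, "data/sample.csv")
    for label, cmd in commands.items():
        print(f"# {label}")
        print(shlex.join(cmd))
```

Running the object listing against a .csv file written from the cluster is one way to confirm that the writes which do succeed are picking up the CMEK.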

With the classic Jupyter notebook, an error appears at the top when attempting to save and create a checkpoint:

[Screenshot: Screen Shot 2020-05-06 at 17.49.23.png]



I realized at the end of the day that none of my work had been saved.

Within the JupyterLab environment, the error is identified as:

File Save Error for [Notebook].ipynb
Invalid response: 500

[Screenshot: Screen Shot 2020-05-06 at 17.51.07.png]



After deleting the notebooks/ and .ipynb_checkpoints/ folders within the staging bucket and removing bucket-wide encryption, I can create and save notebooks as normal.

Why is saving an .ipynb file throwing errors, but .csv files can be written to the same bucket with the same encryption without issue?

I've tried using a different CMEK within the same key ring, and the error is still present.
Is there an additional service account that needs encrypt/decrypt permissions beyond those outlined in the documentation guide? The guide suggests doing exactly what I did here, in step 4 of Dataproc > Documentation > CMEK:

"To use CMEK on the Cloud Storage bucket used by Cloud Dataproc to read/write cluster and job data, create a bucket with CMEK. Note: Use the key created in Step 1 when adding the key on the bucket. Then, pass the bucket name to the gcloud dataproc clusters create command when you create the cluster."
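
As an illustration of the step quoted above, the cluster creation command could be assembled roughly like this. The cluster, region, bucket, and key values are placeholders, the helper function is hypothetical, and the Jupyter component flags are an assumption about how the cluster was set up; the command is printed, not executed.

```python
import shlex


def cluster_create_command(cluster, region, bucket, kms_key=None,
                           components=("JUPYTER",)):
    """Build a `gcloud dataproc clusters create` command using a CMEK bucket."""
    cmd = [
        "gcloud", "dataproc", "clusters", "create", cluster,
        "--region", region,
        # Staging bucket name (without the gs:// prefix).
        "--bucket", bucket,
    ]
    if kms_key:
        # Optional: also encrypt the cluster's persistent disks with the key.
        cmd += ["--gce-pd-kms-key", kms_key]
    if components:
        # Assumption: Jupyter is installed as an optional component and
        # reached through the component gateway.
        cmd += [
            "--optional-components", ",".join(components),
            "--enable-component-gateway",
        ]
    return cmd


if __name__ == "__main__":
    # Placeholder values for illustration only.
    key = (
        "projects/my-project/locations/us-central1"
        "/keyRings/my-keyring/cryptoKeys/my-key"
    )
    print(shlex.join(
        cluster_create_command("my-cluster", "us-central1", "my-staging-bucket", key)
    ))
```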

Any help or insight into this issue would be greatly appreciated!





Joshua Herrera
Pandera Systems, Data Scientist







