Question about PPML feature of Analytics Zoo


Chanshin Kwak

Aug 6, 2021, 3:58:12 AM
to User Group for BigDL and Analytics Zoo
Hello

While trying the PPML demos of Analytics Zoo, I came up with a question.

In the Trusted Big Data ML case, the demo just distributes the data to the workers and launches the LibOS to run the application. It does not include a step to encrypt the data (in this demo, the MNIST data) before distribution and decrypt it "inside" the SGX enclave before use. This means, in my opinion, that the data is exposed to the untrusted environment.

Is this intentional? How can I use encrypted data with Analytics Zoo PPML?


Thank you!

Qiyuan Gong

Aug 7, 2021, 9:36:55 AM
to User Group for BigDL and Analytics Zoo
Hi

Thank you for raising this issue! 

Yes, it's intentional. Since this part is confusing, I think we need to add an explanation for it. :)

How do we protect data used in a PPML workload?

In real-world big data applications, file encryption is handled transparently by the file system, e.g., Amazon S3 (https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingEncryption.html) and HDFS (https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/TransparentEncryption.html). Files stored in an encryption zone are encrypted by the file system, and clients or workers can read them transparently. Our demo assumes this setup: we simply place the dataset, e.g., MNIST, in a pre-configured encryption zone, and the PPML workload then reads it transparently.

If the existing file system doesn't support transparent encryption, we need to encrypt the data before feeding it to the PPML workload. In that case we also have to add more settings for safety, such as key configuration (this is how we protect models) and access control. That goes a bit beyond the main idea of the demo.
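For the HDFS case, preparing such an encryption zone could look roughly like the sketch below. It only assembles the standard HDFS CLI commands (`hadoop key create`, `hdfs crypto -createZone`, `hdfs dfs -put`) and prints them by default. The key name `ppmlkey`, the zone path `/ppml/mnist`, and the local directory `./mnist` are placeholders, not values from the demo, and actually running the commands assumes a cluster with a configured KMS and HDFS superuser rights.

```python
import shlex
import subprocess


def encryption_zone_commands(key_name, zone_path, local_dataset):
    """Return the HDFS CLI commands that create an encryption zone and upload data."""
    return [
        # Create an encryption key in the Hadoop KMS.
        ["hadoop", "key", "create", key_name],
        # The zone directory must exist and be empty before it becomes a zone.
        ["hdfs", "dfs", "-mkdir", "-p", zone_path],
        ["hdfs", "crypto", "-createZone", "-keyName", key_name, "-path", zone_path],
        # Files written into the zone are encrypted transparently by HDFS.
        ["hdfs", "dfs", "-put", local_dataset, zone_path],
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


# Placeholder names, chosen only for illustration.
commands = encryption_zone_commands("ppmlkey", "/ppml/mnist", "./mnist")
run(commands)  # dry run: prints the commands without touching a cluster
```

Once the dataset sits in the zone, the workload reads it with an ordinary HDFS path; no decryption code is needed in the application itself.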

Have a nice day!
Qiyuan

glor...@gmail.com

Aug 8, 2021, 10:44:46 PM
to User Group for BigDL and Analytics Zoo
In addition, you can use a secure transfer protocol such as HTTPS to move data in and out of Analytics Zoo PPML.
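As a minimal sketch of that idea, the helper below downloads data with Python's standard library and refuses any URL that is not HTTPS. The function name `fetch_over_https` is hypothetical and is not part of Analytics Zoo; it only illustrates enforcing TLS with certificate verification on the client side.

```python
import ssl
import urllib.request
from urllib.parse import urlparse


def fetch_over_https(url, timeout=30):
    """Download bytes from `url`, refusing anything that is not HTTPS."""
    if urlparse(url).scheme != "https":
        raise ValueError(f"refusing non-HTTPS URL: {url}")
    # The default context verifies the server certificate and hostname.
    context = ssl.create_default_context()
    with urllib.request.urlopen(url, timeout=timeout, context=context) as resp:
        return resp.read()
```

Calling it with an `http://` URL raises `ValueError` before any network traffic is sent, so plaintext transfers fail early instead of silently exposing the data.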