Hi,
Thank you for raising this question! :)
Q: What is the difference between distributed training in Orca and GPU/CPU training in Keras or PyTorch?
A: The main difference is that Orca uses Apache Spark to handle the distributed data pipeline, while Keras and PyTorch rely on Python or self-designed modules for this part. This difference brings two advantages:
1. The ability to handle very large datasets (TBs/PBs or even more) stored across hundreds or thousands of servers.
2. Efficient training/inference on such large datasets.
These two advantages empower end users with big data to build and add AI/deep learning applications on top of their existing big data clusters.
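To make the "distributed data pipeline" idea concrete, here is a minimal conceptual sketch in plain Python (this is not Spark or Orca itself; the function names are illustrative): the dataset is split into partitions and each partition is processed by a separate worker, which is the same pattern a Spark-based pipeline uses to spread work across a cluster.

```python
# Conceptual sketch of a partitioned data pipeline (illustrative only,
# not the actual Spark/Orca API). Each partition is handled by its own
# worker process, mimicking how Spark distributes work across executors.
from multiprocessing import Pool


def normalize_partition(partition):
    # Per-partition preprocessing step (e.g. min-max feature scaling).
    lo, hi = min(partition), max(partition)
    return [(x - lo) / (hi - lo) for x in partition]


def run_pipeline(records, num_partitions=4):
    # Split the "dataset" into equal partitions, one per worker.
    size = len(records) // num_partitions
    partitions = [records[i * size:(i + 1) * size]
                  for i in range(num_partitions)]
    with Pool(num_partitions) as pool:
        # Process all partitions in parallel.
        results = pool.map(normalize_partition, partitions)
    # Collect per-partition results back together.
    return [x for part in results for x in part]


if __name__ == "__main__":
    print(run_pipeline(list(range(16))))
```

On a real cluster the partitions live on many machines and never move to a single node, which is what makes very large datasets tractable.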
Q: The advantages of BigDL
A: BigDL has lots of features/modules; let's focus on Orca first.
Orca enables end users/developers to build a prototype on their laptop (with TensorFlow or PyTorch) and migrate that prototype (with Orca) to their big data cluster without code changes or data movement.
Advantages of Orca:
1. Reduced code changes and debugging effort when moving from prototype to production.
2. Re-use of the existing big data cluster (Apache Spark), avoiding large-scale data movement.
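The "no code change" point can be sketched as follows. This is a hypothetical illustration of the pattern, not the actual Orca API (names like `run_training` and `cluster_mode` are made up here): the training logic is written once, and only the context configuration differs between the laptop prototype and the cluster run.

```python
# Hypothetical sketch of the "write once, run anywhere" pattern
# (illustrative names, not the real Orca API): the same training
# logic runs locally or, conceptually, on a cluster by changing
# only the cluster_mode setting.
def train_step(batch):
    # Placeholder for the real Keras/PyTorch per-batch logic;
    # here it just averages the batch.
    return sum(batch) / len(batch)


def run_training(batches, cluster_mode="local"):
    if cluster_mode == "local":
        # Laptop prototype: iterate over batches in-process.
        return [train_step(b) for b in batches]
    # On a cluster, the same train_step would instead be shipped to
    # the workers that already hold the data (sketched, not implemented).
    raise NotImplementedError("cluster backends are out of scope here")


if __name__ == "__main__":
    print(run_training([[1, 2], [3, 4]]))
```

In real Orca code the switch is likewise a configuration choice at context-initialization time, while the model and data-pipeline code stay the same.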
Have a nice day!
Qiyuan