I am building a Python image recognition model in Spark / BigDL, working with the ImageNet dataset. My problem is how to build an RDD for machine learning in BigDL. I have combed the internet, and all the Python/Spark/BigDL examples I can find somehow omit the step of creating the dataset from the images.
Is there a ready-made deep learning script for Python/Spark/BigDL on ImageNet or a similar dataset that I could reverse engineer? Or some examples of how to build an RDD from ImageNet? Even a chunk of code would help.
(There is an example that builds an RDD from the MNIST dataset. It is about 10 MB and is loaded straight into the RDD; I am not sure whether that approach carries over to ImageNet.)
What is the proper way to build an RDD from the ImageNet (or a similar) dataset?
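To make the question concrete, here is a minimal sketch of the kind of pipeline I imagine, not something I have run at scale. It assumes the images sit in a `<root>/<wnid>/<file>.JPEG` layout, that Pillow and NumPy are available on the executors, and that a 224x224 resize and 1-based labels are acceptable. The function names (`label_from_path`, `decode_record`, `build_image_rdd`), the `class_index` dictionary, and the `min_partitions` default are all hypothetical; only `sc.binaryFiles` is an actual Spark call.

```python
import os


def label_from_path(path, class_index):
    """Look up the class label for an image path.

    Assumes the layout <root>/<wnid>/<file>.JPEG, where the parent
    directory name (the wnid) identifies the class.
    """
    wnid = os.path.basename(os.path.dirname(path))
    return class_index[wnid]


def decode_record(path_and_bytes, class_index, size=(224, 224)):
    """Turn one (path, raw_bytes) pair into (CHW float32 array, label)."""
    import io
    import numpy as np
    from PIL import Image  # assumption: Pillow is installed on the executors

    path, raw = path_and_bytes
    img = Image.open(io.BytesIO(raw)).convert("RGB").resize(size)
    # PIL gives height x width x channel; move channels to the front.
    features = np.asarray(img, dtype="float32").transpose(2, 0, 1)
    return features, label_from_path(path, class_index)


def build_image_rdd(sc, root, class_index, min_partitions=1000):
    """Build an RDD of (features, label) records from a directory tree.

    sc.binaryFiles yields (path, bytes) pairs and reads the files on the
    executors, so the driver never holds the whole dataset in memory.
    """
    files = sc.binaryFiles(root + "/*/*.JPEG", minPartitions=min_partitions)
    return files.map(lambda pb: decode_record(pb, class_index))
```

If I read the BigDL Python API correctly, each `(features, label)` record would then be wrapped with `Sample.from_ndarray` before training. Is this roughly the right direction, or is there a better-suited loader for a dataset of this size?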
Maybe it is a basic question, but at this point my thinking is that experimenting with an RDD built from 200 GB of data, waiting, and then learning that it was all wrong would be a massive waste of resources.