Is it possible?
Hi Hemanta,
Yes, as long as the cluster has network access to your MongoDB instance.
I would also recommend checking out the MongoDB Connector for Spark.
For a Python example, you could load data from your MongoDB instance into a Spark DataFrame as below:
df = spark.read.format("com.mongodb.spark.sql.DefaultSource").load()
See also Spark Connector Python Guide for examples and tutorials.
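If it helps, here is a minimal end-to-end sketch in PySpark. It assumes the mongo-spark-connector package is available on your cluster (for example via --packages org.mongodb.spark:mongo-spark-connector_2.12:3.0.1), and the host, database, and collection names below are placeholders you would replace with your own:

from pyspark.sql import SparkSession

# Placeholder URI: point this at your own MongoDB host, database, and collection.
spark = (
    SparkSession.builder
    .appName("mongo-to-spark")
    .config("spark.mongodb.input.uri",
            "mongodb://mongo-host:27017/mydb.mycollection")
    .getOrCreate()
)

# Load the configured collection into a Spark DataFrame.
df = spark.read.format("com.mongodb.spark.sql.DefaultSource").load()
df.printSchema()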
But my problem is that I want to transfer all of my data to the Spark cluster.
Once you have loaded the collection data into a Spark RDD or DataFrame, you could store it in HDFS.
See also: pyspark.RDD and pyspark.sql.DataFrame
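For example, continuing from the DataFrame loaded above, you could write it out to HDFS in Parquet format (the output path below is just a placeholder for a location on your own cluster):

# Placeholder HDFS path: replace with a directory on your cluster.
# Parquet is one option; df.write.json(...) or df.write.csv(...) work the same way.
df.write.mode("overwrite").parquet("hdfs:///user/hemanta/mycollection")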
Regards,
Wan.