I am using:
1. MongoDB 3.4.5
2. Spark 2.1.1
3. Hadoop 2.6.4
4. pymongo_spark
In the terminal I am using the following command:
./bin/pyspark \
  --jars mongo-hadoop-spark.jar --jars mongo-java-driver.jar \
  --driver-class-path mongo-hadoop-spark.jar --driver-class-path mongo-java-driver.jar \
  --py-files /home/hduser/mongo-hadoop/spark/src/main/python/pymongo_spark.py \
  --conf "spark.mongodb.input.uri=mongodb://127.0.0.1/marketdata.minbars?readPreference=primaryPreferred" \
  --conf "spark.mongodb.output.uri=mongodb://127.0.0.1/marketdata.people" \
  --packages org.mongodb.spark:mongo-spark-connector_2.10:1.1.0
I have put both jar files in the Spark home directory, and for pymongo_spark I set the path to pymongo_spark.py. I am running Spark on a cluster of 3 nodes (1 master and 2 slaves), and I launch the above command from the name node's terminal.
The Spark Python shell starts successfully.
I want to load the minbars collection from the marketdata database into an RDD and save the RDD contents back to another collection, people, in the same marketdata database.
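(For context, the same round trip can also be expressed through the MongoDB Spark connector that the --packages flag above already pulls in. A minimal sketch, assuming a connector version that actually matches my Spark build and the spark.mongodb.* URIs set at launch:

>>> df = spark.read.format("com.mongodb.spark.sql.DefaultSource") \
...     .option("uri", "mongodb://127.0.0.1/marketdata.minbars").load()
>>> df.write.format("com.mongodb.spark.sql.DefaultSource") \
...     .option("uri", "mongodb://127.0.0.1/marketdata.people") \
...     .mode("append").save()

But here I am trying the pymongo_spark route.)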
In my pyspark shell I ran this:
>>> import pymongo_spark
>>> pymongo_spark.activate()
>>> rdd = sc.mongoRDD('mongodb://127.0.0.1:27017/marketdata.minbars')
>>> rdd.saveToMongoDB('mongodb://127.0.0.1:27017/marketdata.people')
When I call rdd.saveToMongoDB(), it gives me the following error:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/tmp/spark-5faf68c7-4628-4fb4-924a-1254338f73eb/userFiles-fe40d5dc-634c-4e28-98c4-c92424576732/pymongo_spark.py", line 26, in saveToMongoDB
    sample = self.first()
  File "/usr/local/spark/python/pyspark/rdd.py", line 1366, in first
    raise ValueError("RDD is empty")
ValueError: RDD is empty
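Reading the traceback, saveToMongoDB() first calls self.first() to sample a record, and first() raises ValueError because the RDD holds no records, so the read side has already returned nothing. A quick sanity check (my own sketch, not part of the original session) would make that visible:

>>> rdd.count()   # 0 here would mean mongoRDD() read no documents at all
>>> rdd.take(1)   # otherwise this shows one document as a Python dict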
I have also attached snapshots of my collections; my output collection "people" is empty.
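To rule out a connectivity or addressing problem, a direct check with plain pymongo (a hypothetical diagnostic of my own, assuming pymongo is installed on the name node) can confirm that minbars is reachable and non-empty from the machine running the shell:

>>> from pymongo import MongoClient
>>> client = MongoClient('mongodb://127.0.0.1:27017/')
>>> client.marketdata.minbars.count()   # non-zero means the documents are there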