mongodb pyspark


Aishwarya Lingannagari

unread,
Jul 26, 2018, 9:13:31 PM
to mongodb-user
Hello
Can anyone tell me how to establish a connection between PySpark and MongoDB,
and how to store a PySpark DataFrame in MongoDB?

Wan Bachtiar

unread,
Jul 30, 2018, 2:06:02 AM
to mongodb-user

Can anyone tell me how to establish a connection between PySpark and MongoDB

Hi,

To get started with the MongoDB Spark Connector and Python, please see the Spark Connector Python Guide.

If you have further specific questions, please provide:

  • MongoDB server version
  • MongoDB Spark Connector version
  • Apache Spark version
  • Python version
  • How you are connecting to MongoDB
  • The exact error that you’re getting

Regards,
Wan.

Aishwarya Lingannagari

unread,
Jul 30, 2018, 5:10:54 AM
to mongodb-user
Hi,
Python version: 3.6.5
Spark version: 2.3.1
MongoDB server version: 4.0.0

Can you tell me how to download the mongodb-spark-connector package? I have downloaded the jar file mongo-spark-connector_2.11-2.1.2 and moved it to the jars folder of Spark.
I want the procedure for how to connect MongoDB with Spark.

How do I store the PySpark DataFrame in MongoDB?

Wan Bachtiar

unread,
Jul 30, 2018, 9:21:12 PM
to mongodb-user

can you tell me how to download the mongodb-spark-connector package?

Hi,

It depends on what you’re trying to do. To get started with the Python Spark shell, you can specify the --packages option when launching pyspark to download the MongoDB Spark Connector package. Again, please review the Spark Connector Python Guide.

I want the procedure for how to connect MongoDB with Spark

You can specify the --conf option on pyspark to set the spark.mongodb.input.uri value. Here’s a list of the input configuration options.
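As an illustration, the input URI has the shape mongodb://host/database.collection, optionally followed by a readPreference query parameter. A minimal sketch of building such a URI (the helper name mongo_uri is hypothetical, not part of the connector):

```python
def mongo_uri(host, database, collection, read_preference=None):
    # Build a MongoDB connection URI of the form accepted by
    # spark.mongodb.input.uri / spark.mongodb.output.uri.
    uri = "mongodb://{}/{}.{}".format(host, database, collection)
    if read_preference:
        uri += "?readPreference={}".format(read_preference)
    return uri

print(mongo_uri("127.0.0.1", "test", "myCollection", "primaryPreferred"))
# mongodb://127.0.0.1/test.myCollection?readPreference=primaryPreferred
```

The resulting string is what you pass to pyspark via --conf "spark.mongodb.input.uri=...".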

how do I store the PySpark DataFrame in MongoDB?

Please see MongoDB Spark Connector: Write To MongoDB for more information.
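The write documented there follows Spark's DataFrameWriter pattern; a minimal sketch, assuming a DataFrame df obtained inside a pyspark shell started with the connector (the helper name write_to_mongo and the explicit uri option are illustrative):

```python
def write_to_mongo(df, uri, mode="append"):
    # Write a DataFrame to MongoDB through the Spark Connector data source;
    # the "uri" option overrides spark.mongodb.output.uri for this write.
    (df.write
       .format("com.mongodb.spark.sql.DefaultSource")
       .mode(mode)
       .option("uri", uri)
       .save())
```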

Regards,
Wan.

Aishwarya Lingannagari

unread,
Jul 31, 2018, 1:45:36 AM
to mongod...@googlegroups.com
Hi,
Where should I specify the --packages option?
I tried it in the pyspark shell and got a NameError.

Please check the attachment below, where I executed it.

Screenshot (83).png

Wan Bachtiar

unread,
Aug 1, 2018, 9:14:11 PM
to mongodb-user

Hi Aishwarya,

That’s because you’re specifying --packages within the PySpark shell. The option should be specified on the command line when you’re invoking PySpark.

In the Spark Connector Python Guide that was mentioned previously, you can see an example:

./bin/pyspark --conf "spark.mongodb.input.uri=mongodb://127.0.0.1/test.myCollection?readPreference=primaryPreferred" \
              --conf "spark.mongodb.output.uri=mongodb://127.0.0.1/test.myCollection" \
              --packages org.mongodb.spark:mongo-spark-connector_2.11:2.2.3
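Once the shell starts with those options, the pre-created spark session can read the configured collection; a minimal sketch (the helper name read_from_mongo is illustrative; the data source name matches the connector 2.x documentation):

```python
def read_from_mongo(spark, uri=None):
    # Read the collection named in spark.mongodb.input.uri into a DataFrame;
    # an explicit "uri" option, if given, overrides the shell-level setting.
    reader = spark.read.format("com.mongodb.spark.sql.DefaultSource")
    if uri:
        reader = reader.option("uri", uri)
    return reader.load()
```

Inside the shell, df = read_from_mongo(spark) followed by df.printSchema() is a quick way to confirm the connection works.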

Please review the suggested resources from the previous posts.

Regards,
Wan.

Aishwarya Lingannagari

unread,
Aug 3, 2018, 1:36:08 AM
to mongod...@googlegroups.com
Hi Wan,

Thank you, I got the output. ☺