MongoDB in Spark - write to MongoDB


Meenakshi

Jul 7, 2018, 2:29:28 AM
to mongodb-user

Hello,

I have created a DataFrame in the pyspark shell. When I try to write it to the output URI in MongoDB using the save() method, it raises an error.
I am working on Ubuntu. I have installed Spark, am working with Python, and would like to connect to MongoDB to read from and write to it.


Following is the command used:

>>>people.write.format("com.mongodb.spark.sql.DefaultSource").mode("append").save()

where people is already loaded.
Following is the error:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/meenakshi/spark/python/pyspark/sql/readwriter.py", line 593, in save
    self._jwrite.save()
  File "/home/meenakshi/spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
  File "/home/meenakshi/spark/python/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/home/meenakshi/spark/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py", line 319, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o51.save.
: java.lang.NoClassDefFoundError: com/mongodb/ConnectionString
    at com.mongodb.spark.config.MongoCompanionConfig$$anonfun$4.apply(MongoCompanionConfig.scala:278)
    at com.mongodb.spark.config.MongoCompanionConfig$$anonfun$4.apply(MongoCompanionConfig.scala:278)
    at scala.util.Try$.apply(Try.scala:192)
    at com.mongodb.spark.config.MongoCompanionConfig$class.connectionString(MongoCompanionConfig.scala:278)
    at com.mongodb.spark.config.WriteConfig$.connectionString(WriteConfig.scala:37)
    at com.mongodb.spark.config.WriteConfig$.apply(WriteConfig.scala:158)
    at com.mongodb.spark.config.WriteConfig$.apply(WriteConfig.scala:37)
    at com.mongodb.spark.config.MongoCompanionConfig$class.apply(MongoCompanionConfig.scala:124)
    at com.mongodb.spark.config.WriteConfig$.apply(WriteConfig.scala:37)
    at com.mongodb.spark.config.MongoCompanionConfig$class.apply(MongoCompanionConfig.scala:113)
    at com.mongodb.spark.config.WriteConfig$.apply(WriteConfig.scala:37)
    at com.mongodb.spark.sql.DefaultSource.createRelation(DefaultSource.scala:81)
    at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:469)
    at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:50)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:138)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:135)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:116)
    at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92)
    at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92)
    at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:609)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:233)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:280)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:214)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: com.mongodb.ConnectionString
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:338)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 38 more




Can anyone help?

Wan Bachtiar

Jul 11, 2018, 3:41:43 AM
to mongodb-user

Caused by: java.lang.ClassNotFoundException: com.mongodb.ConnectionString

Hi Meenakshi,

Based on the error you’ve posted, it looks like Spark failed to find the MongoDB Java driver class com.mongodb.ConnectionString on the classpath.
How did you load the MongoDB Spark Connector?

For example:

./bin/pyspark --conf "spark.mongodb.input.uri=mongodb://127.0.0.1/test.collectionInput" \
              --conf "spark.mongodb.output.uri=mongodb://127.0.0.1/test.collectionOutput" \
              --packages org.mongodb.spark:mongo-spark-connector_2.11:2.2.3

where people is already loaded.

Did you load people from MongoDB or from another source?

Could you also provide the following:

  • MongoDB Spark Connector version
  • Apache Spark version

See also MongoDB Spark Connector: Python writing to MongoDB
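In case it helps, the same settings can also be supplied from code rather than on the pyspark command line. A minimal sketch, assuming the connector coordinates above and a mongod on 127.0.0.1; note that spark.jars.packages only takes effect if it is set before the JVM starts, i.e. before any session exists:

```python
from pyspark.sql import SparkSession

# Sketch only: assumes a mongod running on 127.0.0.1 and the
# mongo-spark-connector artifact shown above. spark.jars.packages
# must be configured before the first SparkSession is created.
spark = (SparkSession.builder
         .appName("mongo-write-example")
         .config("spark.jars.packages",
                 "org.mongodb.spark:mongo-spark-connector_2.11:2.2.3")
         .config("spark.mongodb.output.uri",
                 "mongodb://127.0.0.1/test.collectionOutput")
         .getOrCreate())
```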

Regards,
Wan.

Meenakshi V

Jul 12, 2018, 8:05:09 PM
to mongod...@googlegroups.com
Hello,

Thank you for your response.

Spark version: 2.2.1, built for Hadoop 2.7.3
MongoDB Spark Connector version: 2.2.3

The above-mentioned versions are used.

people is created using the createDataFrame() function.

./bin/pyspark --conf "spark.mongodb.input.uri=mongodb://127.0.0.1/test.myCollection?readPreference=primaryPreferred" \
              --conf "spark.mongodb.output.uri=mongodb://127.0.0.1/test.myCollection" \
              --packages org.mongodb.spark:mongo-spark-connector_2.11:2.2.3
is used to start the pyspark shell.
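For reference, a minimal sketch of the whole attempt (the rows and column names below are illustrative; this assumes the shell was started with the --packages flag above and a mongod is listening on 127.0.0.1:27017):

```python
# Inside a pyspark shell started with the --packages flag above,
# `spark` already exists; this DataFrame stands in for `people`.
people = spark.createDataFrame(
    [("Alice", 30), ("Bob", 25)],   # illustrative rows
    ["name", "age"],
)

# Writes to the collection named in spark.mongodb.output.uri.
(people.write
    .format("com.mongodb.spark.sql.DefaultSource")
    .mode("append")
    .save())
```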
Since I am new to this platform, kindly guide me through this.

Thank you,

Meenakshi V


--
You received this message because you are subscribed to the Google Groups "mongodb-user" group.
For other MongoDB technical support options, see: https://docs.mongodb.com/manual/support/
To unsubscribe from this group and stop receiving emails from it, send an email to mongodb-user+unsubscribe@googlegroups.com.
To post to this group, send email to mongod...@googlegroups.com.
Visit this group at https://groups.google.com/group/mongodb-user.
To view this discussion on the web visit https://groups.google.com/d/msgid/mongodb-user/261c07da-13e3-40d0-9a2e-f6d70f3b42f3%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
meenakshi

Manish Pal

Jul 13, 2018, 7:39:02 AM
to mongod...@googlegroups.com
hi,

try changing the 'mongo spark connector' version to 2.0.0:

./bin/pyspark --conf "spark.mongodb.input.uri=mongodb://127.0.0.1/test.collectionInput" \
              --conf "spark.mongodb.output.uri=mongodb://127.0.0.1/test.collectionOutput" \
              --packages org.mongodb.spark:mongo-spark-connector_2.11:2.0.0
thanks
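Whichever connector version is tried, the artifact's Scala suffix has to match the Scala build of Spark (2.11 for the commands above). A small hypothetical helper, not part of Spark or the connector, just to illustrate how the --packages coordinate is assembled:

```python
def connector_coordinate(scala_version: str, connector_version: str) -> str:
    """Build the Maven coordinate for the mongo-spark-connector artifact.

    Maven artifacts are suffixed with the Scala binary version
    (major.minor only), e.g. "2.11.8" -> "_2.11".
    """
    scala_binary = ".".join(scala_version.split(".")[:2])
    return f"org.mongodb.spark:mongo-spark-connector_{scala_binary}:{connector_version}"

print(connector_coordinate("2.11.8", "2.2.3"))
# -> org.mongodb.spark:mongo-spark-connector_2.11:2.2.3
```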

Meenakshi

Jul 15, 2018, 1:12:44 AM
to mongodb-user
Hi,

I am getting the same problem with the save() method after changing to 2.0.0.

Hope to solve it soon.

Thank you,



--
meenakshi


paulo Leite

Jul 15, 2018, 8:43:08 AM
to mongod...@googlegroups.com
Hi,

With save() it’s a bit different, because a MongoDB document has a size limit of 16 MB.

Have you got nested documents? Have you tried moving those nested documents out into a new collection?

Depending on what you want to do, there’s also GridFS.


In my case I am querying the database with an aggregation, and it is failing somewhere in a stage that is probably holding the result data in a single document, even when using allowDiskUse and $out.
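For the aggregation case, a sketch with PyMongo (database, collection, and pipeline stages are illustrative; assumes a mongod on 127.0.0.1):

```python
from pymongo import MongoClient

client = MongoClient("mongodb://127.0.0.1")
db = client["test"]

# $out writes the pipeline result into a collection instead of
# returning it in the response, and allowDiskUse lets blocking
# stages (sort, group) spill to disk instead of holding
# everything in memory.
db.myCollection.aggregate(
    [
        {"$match": {"age": {"$gte": 18}}},   # illustrative stage
        {"$out": "aggregation_result"},
    ],
    allowDiskUse=True,
)
```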



--
Cumprimentos,

Paulo Leite