Hi Shubham,
You can start by following the documentation on the MongoDB Spark Connector.
If you already have a Spark standalone cluster and a MongoDB instance running, you can start by testing with spark-shell, for example:
spark-shell --conf "spark.mongodb.input.uri=mongodb://<HOST>:<PORT>/<DATABASE>.<COLLECTION>" --conf "spark.mongodb.output.uri=mongodb://<HOST>:<PORT>/<DATABASE>.<COLLECTION>" --packages org.mongodb.spark:mongo-spark-connector_<SCALA_VERSION>:<MONGODB_SPARK_CONNECTOR_VERSION>
(Replace the values in < and > with something relevant for your environment.)
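For illustration, a hypothetical invocation against a local MongoDB instance on the default port, reading and writing a collection named myCollection in the test database with the Scala 2.11 build of connector version 2.0.0, might look like this (adjust all of these values to match your setup):

spark-shell --conf "spark.mongodb.input.uri=mongodb://127.0.0.1:27017/test.myCollection" --conf "spark.mongodb.output.uri=mongodb://127.0.0.1:27017/test.myCollection" --packages org.mongodb.spark:mongo-spark-connector_2.11:2.0.0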
If you're using Scala, you could run the simple example below to read from a MongoDB collection:
import com.mongodb.spark._

val rdd = MongoSpark.load(sc)
println("Number of documents read from collection: " + rdd.count)
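Writing back to MongoDB is similar. As a minimal sketch (assuming the spark.mongodb.output.uri you set when launching spark-shell, and using hypothetical single-field documents), you could save an RDD of Documents like this:

import com.mongodb.spark._
import org.bson.Document

// Create ten simple documents and save them to the collection
// configured via spark.mongodb.output.uri
val docs = sc.parallelize((1 to 10).map(i => Document.parse(s"{test: $i}")))
MongoSpark.save(docs)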
If you have any specific questions, please provide the action that you're attempting and any error messages you're seeing.
You may also find the Spark quickstart docs a useful reference for basic operations in Spark.
Regards,
Wan.
When I run spark-shell as you have specified, Spark loads up but shows the following log messages:

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
17/07/06 11:53:59 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/07/06 11:54:09 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
17/07/06 11:54:09 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
17/07/06 11:54:10 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.1.1
      /_/

Using Python version 2.7.13 (v2.7.13:a06454b1afa1, Dec 17 2016 20:42:59)
SparkSession available as 'spark'.
Hi Shubham,
Those are warning messages; you should still be able to use the MongoDB Spark Connector from the (Python) Spark shell.
Are you having a specific issue with the MongoDB Spark Connector via the Spark shell? Are you seeing error messages? If so, please provide the action that you're attempting and the error message.
If you want to, you could also change the logging level by editing the conf/log4j.properties file. See also log4j v1.2 Level.
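For example, a minimal sketch of conf/log4j.properties (assuming you've copied conf/log4j.properties.template from a default Spark 2.x distribution, and changing only the root level from INFO to ERROR to hide those warnings):

# Log to the console at ERROR level only
log4j.rootCategory=ERROR, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

Alternatively, as the startup banner itself suggests, you can change the level for a single session with sc.setLogLevel("ERROR").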
Regards,
Wan.