The documentation for the Python connector seems to indicate that MongoDB documents read into Spark via the Python connector must have a defined schema.
Hi,
You don’t have to define a schema; the connector infers one by sampling documents from the collection. For example, in PySpark you can run:
df = sqlContext.read.format("com.mongodb.spark.sql.DefaultSource") \
    .option("spark.mongodb.input.uri", "mongodb://host:port/dbname.collection") \
    .load()
# Print first record
df.first()
# Get the underlying RDD from the DataFrame
myRDD = df.rdd
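If you want to see what the connector inferred, or you prefer to define the schema yourself, both are straightforward. A minimal sketch follows; the field names (name, age) are hypothetical, so replace them with fields from your own collection:

# Inspect the schema the connector inferred by sampling the collection
df.printSchema()

# Optionally, supply an explicit schema instead of relying on inference
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

schema = StructType([
    StructField("name", StringType(), True),   # hypothetical field
    StructField("age", IntegerType(), True)    # hypothetical field
])

df_explicit = sqlContext.read.format("com.mongodb.spark.sql.DefaultSource") \
    .option("spark.mongodb.input.uri", "mongodb://host:port/dbname.collection") \
    .schema(schema) \
    .load()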
You mentioned that you’ve tried the MongoDB Spark connector and have run into issues with the Python connector.
If you have an issue using the MongoDB Spark connector from Python, please provide the Spark and connector versions you are using, the code snippet you ran, and the full error message or stack trace.
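As a first check, make sure PySpark was started with the connector package on the classpath; a missing package is a common cause of "Failed to find data source" errors. For example (the connector and Scala versions below are illustrative, so adjust them to your environment):

./bin/pyspark --conf "spark.mongodb.input.uri=mongodb://host:port/dbname.collection" \
              --packages org.mongodb.spark:mongo-spark-connector_2.11:2.2.0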
Regards,
Wan.