--
You received this message because you are subscribed to the Google Groups "mongodb-user" group.
For other MongoDB technical support options, see: https://docs.mongodb.com/manual/support/
---
val rdd = MongoSpark.load(sc)
rdd.count()
It throws an exception like:
com.mongodb.spark.exceptions.MongoTypeConversionException: Cannot cast OBJECT_ID into a ConflictType (value: BsonObjectId{value=5570245373af6caaed4efe02})
Hi all,
I think we can leverage the lazy evaluation mechanism to handle the dynamic schema during the read phase. However, due to limited support in the MongoDB Spark Connector, we can't write an RDD with a dynamic schema back to MongoDB.
How can I get support from the MongoDB Spark team?
*Dynamic Schema Challenge*
----Workaround for Read phase, completed
1. read the MongoDB documents into a DataFrame
2. dump the data to JSON strings
3. transfer them to the TD Spark application
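The three read-phase steps above can be sketched in plain Python. This is only an illustration: step 1's DataFrame is stood in for by a list of dictionaries, and the serialization uses the standard `json` module rather than Spark's `toJSON`; the document contents are invented.

```python
import json

# Step 1 (stand-in): documents as read from MongoDB into a DataFrame.
# Each document may have a different shape -- a "dynamic schema".
documents = [
    {"_id": "5570245373af6caaed4efe02", "userId": 42, "tags": ["a", "b"]},
    {"_id": "5570245373af6caaed4efe03", "userId": "guest", "extra": {"k": 1}},
]

# Step 2: dump each document to its own JSON string. Serializing
# record-by-record sidesteps the need for one unifying schema.
json_lines = [json.dumps(doc, sort_keys=True) for doc in documents]

# Step 3: the JSON strings can now be shipped to the downstream Spark
# application as an opaque column of strings.
for line in json_lines:
    print(line)
```

Because each record is its own string, no schema conflict can occur until someone tries to reassemble the records into a single typed structure.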
----Blocking issue in Write phase, pending on Mongo Spark team
For the write phase, we parse the JSON strings into dictionaries with dynamic schemas and hold them in an RDD; however, we can't push that RDD to the connector without first converting it to a DataFrame. I think we need to consult the MongoDB Spark team; once the connector supports writing RDDs, we can migrate all of our code to Python.
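To make the blocking point concrete, here is a pure-Python sketch of the write side under the same invented sample data as above: the JSON strings parse back into dictionaries of differing shapes, and the single schema a DataFrame conversion would force (the union of all fields, with nulls and type coercion) is exactly where the dynamic schema is lost. None of this is connector API; it only illustrates the shape of the problem.

```python
import json

# JSON strings as produced in the read phase; shapes differ per record.
json_lines = [
    '{"userId": 42, "tags": ["a", "b"]}',
    '{"userId": "guest", "extra": {"k": 1}}',
]

# Parse back into dictionaries -- this is the RDD payload we would like
# to hand to the connector directly.
records = [json.loads(line) for line in json_lines]

# Converting to a DataFrame forces one schema over all records: the
# union of every field, with absent fields padded as None and
# conflicting types (int vs. str for userId) needing coercion.
unified_fields = sorted({key for rec in records for key in rec})
rows = [tuple(rec.get(field) for field in unified_fields) for rec in records]

print(unified_fields)  # the schema a DataFrame would need up front
```

An RDD write path would let each dictionary go to MongoDB as-is, with no `unified_fields` step at all.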
Issue History:
1. The RDD approach was deprecated in the mongo-hadoop project in March 2016.
saveAsNewAPIHadoopFile, which used to write RDD data into MongoDB, has been deprecated:
rdd.saveAsNewAPIHadoopFile(
    path='file:///this-is-unused',
    outputFormatClass='com.mongodb.hadoop.MongoOutputFormat',
    keyClass='org.apache.hadoop.io.Text',
    valueClass='org.apache.hadoop.io.MapWritable',
    conf={'mongo.output.uri': 'mongodb://localhost:27017/db.collection'}  # example output URI
)
Announced @: https://github.com/mongodb/mongo-hadoop/wiki/Spark-Usage
2. ObjectId issue: resolved. When converting to a DataFrame we found: "TypeError: not supported type: <class 'bson.objectid.ObjectId'>"
tracking by: https://jira.mongodb.org/browse/HADOOP-277
Schema-related issues:
3. StructType issue: "com.mongodb.spark.exceptions.MongoTypeConversionException: Cannot cast ARRAY into a StructType"
tracking by: https://groups.google.com/forum/#!topic/mongodb-user/lQjppYa21mQ
4. repartition issue:
"Cannot cast ARRAY into a StructType(StructField(0,StringType,true), StructField(1,StringType,true), StructField(2,StringType,true), StructField(3,StringType,true), StructField(4,StringType,true)) (value: BsonArray{values=[BsonString, BsonString{value='Logic ICs'}]})"
tracking by: https://groups.google.com/forum/#!topic/mongodb-user/lQjppYa21mQ
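Issues 3 and 4 share a root cause: the schema inferred from the sampled documents declares a field to be a struct, while some other document stores an array there. A rough pure-Python stand-in for that inference (sample data and the `infer_shape` helper are invented for illustration, not connector code):

```python
# Two documents disagree about the shape of "categories": one nests an
# object (inferred as a StructType), the other a list (an ArrayType).
doc_a = {"categories": {"0": "Semiconductors", "1": "Logic ICs"}}
doc_b = {"categories": ["Semiconductors", "Logic ICs"]}

def infer_shape(value):
    """Very rough stand-in for schema inference: struct vs. array vs. scalar."""
    if isinstance(value, dict):
        return "StructType"
    if isinstance(value, list):
        return "ArrayType"
    return "ScalarType"

shape_a = infer_shape(doc_a["categories"])
shape_b = infer_shape(doc_b["categories"])

# If inference only sampled doc_a, the reader expects a StructType and
# fails with MongoTypeConversionException on doc_b's BsonArray.
print(shape_a, shape_b)  # StructType ArrayType
```

Repartitioning changes which documents each task sees first, which is why issue 4 surfaces the same cast error nondeterministically.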
Hi Jason,
I believe this question has also been posted and responded to on SPARK-146.
Please keep the discussion there for continuity.
Regards,
Wan.
com.mongodb.spark.exceptions.MongoTypeConversionException: Cannot cast STRING into a IntegerType (value: BsonString{value='userId'})
Hi Murtaza,
This likely means the field was found to contain different data types that cannot be coerced into a unifying type. In other words, the field userId contains varying types of data, e.g. integers and strings.
Note that in the MongoDB Connector for Spark v2, the base type for conflicting types is string.
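The fallback described above can be illustrated with a rough stand-in for the type-conflict resolution (the `unify_types` function below is invented for illustration; it is not the connector's actual code):

```python
def unify_types(values):
    """Pick a single type for a column, falling back to str on conflict."""
    types = {type(v) for v in values}
    if len(types) == 1:
        return types.pop()
    # Conflicting types: mirroring the v2 connector's behavior, treat the
    # column as strings, so every value is coerced via str().
    return str

# A userId column holding both integers and strings, as in the error above.
user_ids = [101, 102, "userId", 103]

base_type = unify_types(user_ids)
coerced = [base_type(v) for v in user_ids]
print(base_type.__name__, coerced)  # str ['101', '102', 'userId', '103']
```

The error in the original post is the opposite situation: a user-supplied or previously inferred schema said IntegerType, and the string value could not be cast down to it.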
If you have further questions, please open a new discussion thread including the relevant details of your environment.
Regards,
Wan.