Spark MongoDb


ayan guha

Jun 19, 2018, 5:41:29 AM6/19/18
to mongodb-user
Hi Guys

I have a large MongoDB collection with a complex document structure. I am facing an issue where I get the error:

Can not cast Array to Struct. Value:BsonArray([])

The target column is indeed a struct. So the error makes sense.

I am able to successfully read from another collection with exactly the same structure but a subset of the data.

I am suspecting some documents are corrupted at mongodb.

Question:
1. Is there any way to filter out such documents in the MongoDB connector?
2. I tried to exclude the column via a custom select statement, but it did not work. Is that possible?
3. Is there any way to suppress errors up to a certain threshold? I do not want to stall the load of 1M records if 1 record is bad.

I am using 2.2.2 (latest from Maven).

best
Ayan

ayan guha

Jun 20, 2018, 1:22:43 AM6/20/18
to mongodb-user
Anybody? Anything? :) 

Wan Bachtiar

Jun 21, 2018, 12:11:40 AM6/21/18
to mongodb-user

I am suspecting some documents are corrupted at mongodb.

Hi Ayan,

It’s possible that there are a few documents in the collection that don’t have the value types you expect (inconsistent values).
You can try to utilise $type to query and find those documents.
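As a minimal sketch of such a $type query (the field name "payload" here is hypothetical — substitute whichever struct field triggers the cast error):

```python
# Hypothetical field name; substitute the struct field that fails to cast.
FIELD = "payload"

# MongoDB query filter matching documents where the field is an array
# instead of an embedded document -- the ones the connector cannot cast.
bad_docs_filter = {FIELD: {"$type": "array"}}

# With pymongo (assumed), you would run something like:
#   bad = list(collection.find(bad_docs_filter))
# or fix/remove those documents once identified.

def matches_bad_filter(doc):
    """Local illustration of what the $type: "array" match selects."""
    return isinstance(doc.get(FIELD), list)

sample = [
    {"_id": 1, FIELD: {"a": 1}},  # struct-like value: fine
    {"_id": 2, FIELD: []},        # BsonArray([]): triggers the cast error
]
bad_ids = [d["_id"] for d in sample if matches_bad_filter(d)]  # [2]
```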

I tried to exclude the column from a custom select statement but did not work. Is it possible?

You can exclude certain fields from being mapped by explicitly defining a schema. See also Explicitly declare a schema.
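For example (document layout hypothetical), a schema that simply omits the offending field can be written as plain JSON, loaded with pyspark.sql.types.StructType.fromJson, and passed to the reader — the connector then never attempts to cast that field:

```python
# Hypothetical layout: _id, name, plus a problematic "payload" struct.
# The schema below deliberately omits "payload".
schema_json = {
    "type": "struct",
    "fields": [
        {"name": "_id", "type": "string", "nullable": True, "metadata": {}},
        {"name": "name", "type": "string", "nullable": True, "metadata": {}},
        # "payload" intentionally left out
    ],
}

# With pyspark available (assumed), this would be used as:
#   from pyspark.sql.types import StructType
#   schema = StructType.fromJson(schema_json)
#   df = (spark.read.format("com.mongodb.spark.sql.DefaultSource")
#                   .schema(schema).load())

field_names = [f["name"] for f in schema_json["fields"]]
```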

Is there any way to suppress errors to a certain amount? I do not want to stall the load of 1M record if 1 record is bad.

If you’re referring to errors where a field has inconsistent value types, you can try defining that field as nullable in the schema.
Although this is more related to Apache Spark itself, I would suggest posting a question on Stack Overflow under the apache-spark tag to reach a wider audience.
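A small local sketch of the intent behind the nullable suggestion (this only illustrates the desired behaviour — whether the connector actually nulls out uncastable values under a nullable schema is what you would need to verify):

```python
def load_field(doc, field="payload"):
    """Sketch: mimic a nullable struct field -- keep the value if it is
    struct-like (a dict), otherwise fall back to None instead of failing
    the whole load. Field name "payload" is hypothetical."""
    value = doc.get(field)
    return value if isinstance(value, dict) else None

docs = [
    {"payload": {"a": 1}},  # well-formed document
    {"payload": []},        # the BsonArray([]) case
]
loaded = [load_field(d) for d in docs]  # [{"a": 1}, None]
```

The point is that one bad record degrades to a null rather than stalling the load of the other 1M records.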

Regards,
Wan.

ayan guha

Jun 21, 2018, 1:23:08 AM6/21/18
to mongod...@googlegroups.com
Thanks a lot. I will give it a shot by excluding the offending field from the schema itself. 

It is not a Spark issue (though I have posted to the Spark user group for wider reach); to be honest, the connector should support predicate pushdown or error suppression.

Best
Ayan


--
Best Regards,
Ayan Guha