val event_coll_schema = new StructType()
  .add("first_name", DataTypes.StringType, nullable = true)
  .add("middle_name", DataTypes.StringType, nullable = true)
  .add("last_name", DataTypes.StringType, nullable = true)
I tried to apply the following filter.
val df = sqlContext.read.mongo(event_coll_schema)
df.filter(col("middle_name").isNotNull)
The above filter does not filter out documents that contain a null middle_name. Then I tried the following filter:
val df = sqlContext.read.mongo(event_coll_schema)
df.filter(col("middle_name") !== "null")
The above filter works! So it looks like the null value is being recognised as the string "null". Does anyone have the same issue?
I am using mongo-spark-connector-1.1.1
Hi danyang,
Do you mean mongo-spark v1.1.0? The next release after that is version 2.0.0.
Regarding your description: in your collection, the field middle_name can hold either null or a string, and you defined the collection schema manually.
I ran a test with mongo-spark v1.1.0, MongoDB v3.2.x, and Apache Spark 1.6.2, using a collection that contains the three documents below:
{"first_name": "Feisty", "middle_name": "String", "last_name": "Fawn"}
{"first_name": "Lucid", "middle_name": "String", "last_name": "Lynx"}
{"first_name": "Maverick", "middle_name": null, "last_name": "Meerkat"}
Using the Scala code example below:
> import com.mongodb.spark.config.ReadConfig
> import com.mongodb.spark.sql._
> import org.apache.spark.sql.functions.col
> import org.apache.spark.sql.types.{DataTypes, StructType}
> val readConfigNames: ReadConfig = ReadConfig(Map("uri" -> "mongodb://host:27017/dbName.collName"))
> val schema = new StructType().add("first_name", DataTypes.StringType, nullable = true).add("middle_name", DataTypes.StringType, nullable = true).add("last_name", DataTypes.StringType, nullable = true)
> val names = sqlContext.read.mongo(schema, readConfigNames)
> names.filter(col("middle_name").isNotNull).foreach(print)
[Feisty,String,Fawn][Lucid,String,Lynx] // filtering out 'Maverick'
The filter successfully returns only the documents whose middle_name is not null. You may also find some of the examples on mongo-spark: spark-sql useful.
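As a side note, the fact that the !== "null" filter drops the null documents may not mean the value is stored as the string "null". In Spark SQL a comparison against a null value evaluates to null rather than true, and filter keeps only the rows where the condition is true, so rows holding a real null are dropped by that filter as well. Here is a minimal sketch of that behaviour, assuming a local Spark 1.6-style SQLContext with no MongoDB involved. The object name NullFilterSketch, the run helper, and the third row (a literal "null" string) are made up for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{DataTypes, StructType}

// Hypothetical helper object, not part of mongo-spark.
object NullFilterSketch {

  // Returns (rows kept by isNotNull, rows kept by !== "null").
  def run(): (Long, Long) = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("null-filter-sketch")
    val sc = new SparkContext(conf)
    try {
      val sqlContext = new SQLContext(sc)

      // Same schema as in the question.
      val schema = new StructType()
        .add("first_name", DataTypes.StringType, nullable = true)
        .add("middle_name", DataTypes.StringType, nullable = true)
        .add("last_name", DataTypes.StringType, nullable = true)

      val rows = Seq(
        Row("Feisty", "String", "Fawn"),
        Row("Maverick", null, "Meerkat"), // a real null value
        Row("Lucid", "null", "Lynx")      // assumption: a literal "null" string
      )
      val names = sqlContext.createDataFrame(sc.parallelize(rows), schema)

      // Drops only the row with a real null.
      val keptByIsNotNull = names.filter(col("middle_name").isNotNull).count()

      // Drops the "null" string row, and also the real null row,
      // because (null != "null") evaluates to null, not true.
      val keptByNotEqual = names.filter(col("middle_name") !== "null").count()

      (keptByIsNotNull, keptByNotEqual)
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = {
    val (keptByIsNotNull, keptByNotEqual) = run()
    println("isNotNull keeps " + keptByIsNotNull + " rows")
    println("!== keeps " + keptByNotEqual + " rows")
  }
}
```

On this made-up data, isNotNull should keep two rows (Feisty and Lucid) and !== "null" should keep one (Feisty). Maverick's real null is dropped by both filters, so the second filter working does not by itself show that the stored value is a string.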
If you have further questions, could you provide the following information:
mongo-spark

Regards,
Wan.