Sorting on Arrays

33 views
Skip to first unread message

Need Help

unread,
Sep 3, 2015, 6:28:19 PM9/3/15
to mongodb-user
We have a situration that we are sorting on the f.a.refgene.g field, which is of type List<String>:
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> db.VARIANTKB1.find({"_id.p":2706398}, {"_id":1, "f.a.refgene.g":1})
{ "_id" : { "c" : 1, "p" : 2706398, "t" : [ "NOCALL" ] }, "f" : [ {  }, { "a" : { "refgene" : [ { "g" : "TTC34" } ] } }, { "a" : { "refgene" : [ { "g" : "TTC34" } ] } }, {  }, { "a" : { "refgene" : [ { "g" : "TTC34" } ] } } ] }
> db.VARIANTKB1.find({"_id.p":324100}, {"_id":1, "f.a.refgene.g":1})
{ "_id" : { "c" : 1, "p" : 324100 }, "f" : [ { "a" : { "refgene" : [ { "g" : "LOC100132287" }, { "g" : "LOC100132062" }, { "g" : "LOC100133331" } ] } }, {  }, { "a" : { "refgene" : [ { "g" : "LOC100132062" }, { "g" : "LOC100133331" }, { "g" : "LOC100132287" } ] } }, { "a" : { "refgene" : [ { "g" : "LOC100133331" }, { "g" : "RP4-669L17.10" } ] } }, {  }, { "a" : { "refgene" : [ { "g" : "LOC100133331" }, { "g" : "RP4-669L17.10" } ] } } ] }

By sorting ascendingly, we expect {"_id":{"c":1,"p":32100}} to come first, because the values are ["LOC100132287", "LOC100132062", "LOC100133331",  "LOC100132062", "LOC100133331", "LOC100132287", "LOC100133331", " RP4-669L17.10", "LOC100133331", "RP4-669L17.10"], where as the values for {"_id":{"c":2706398}} are ["TTC34", "TTC34", "TTC34"].

However, mongodb returns just the opposite.
--------------------------------------------------------------------------------------------------------------
> db.VARIANTKB1.find({}, {"_id":1}).sort({"f.a.refgene.g":1}).limit(4)
{ "_id" : "meta" }
{ "_id" : { "c" : 23, "p" : 2699520, "ep" : 154931044, "t" : [ "CNV" ] } }
{ "_id" : { "c" : 1, "p" : 2706398, "t" : [ "NOCALL" ] } }
{ "_id" : { "c" : 1, "p" : 324100 } }

How does mongodb sort Arrays?  Is there any workaround to get desired results? 

Thanks in advance! 

Wan.Bachtiar

unread,
Sep 9, 2015, 10:28:25 PM9/9/15
to mongodb-user
Hi there,  


Based on the data examples you have provided, the fields "f" and "refgene" are arrays. 
With arrays as nested fields, you need to specify the index of the array elements to refer to them. 
Therefore, instead of sorting using "f.a.refgene.g", it should have been "f.<index of array>.a.refgene.<index of array>.g", for example: f.0.a.refgene.0.g. 

This approach will not work as expected if any of the arrays or elements are missing in a document. Missing values will be treated as null.

For reference on how MongoDB compares different field types, please see: Comparison/Sort Order. For predictability you should sort on simple types (i.e. strings) rather than arrays or objects.

I would suggest adding or using another field to sort. A field that is not a nested array will be easier to index and give a clearer outcome.


As an example of this suggestion, perhaps you could add a field called "prominentGene". 

{

 
"_id": {
   
"c": 1,
   
"p": 2706398,
   
"t": [
     
"NOCALL"
   
]
 
},

 
"prominentGene": "TTC34",

 
"f": [
   
{},
   
{
     
"a": {
       
"refgene": [
         
{
           
"g": "TTC34"
         
}
       
]
     
}
   
},
     
{
     
"a": {
       
"refgene": [
         
{
           
"g": "TTC34"
         
}
       
]
     
}
   
},

 
]
}



Regards, 

Wan

Reply all
Reply to author
Forward
0 new messages