categoryDistinct and ratedDistinct are coming back as I would expect, but the countryDistinct and awardDistinct are not, they are coming back in a nested sub array record form.
Hi,
This is because cast is an array, and when the group access it as cast.country and cast.award those fields are an array value of each of the respective fields. Thus, adding those array values to a set would return an array of arrays.
You can try to insert an $unwind stage after the $match and before the $group stage. For example:
db.movies.aggregate([
{"$match": {"released": {"$gte": 1984, "$lte": 1988}}},
{"$unwind": "$cast"},
{"$group": {
"_id": null,
"categoryDistinct": {"$addToSet":"$category"},
"ratedDistinct": {"$addToSet":"$rated"},
"countryDistinct": {"$addToSet":"$cast.country"},
"awardDistinct": {"$addToSet":"$cast.award"},
}
}
])
I know I didn’t specify a sort anywhere so if you can give a pointer that will not impact speed, that would be greatly appreciated
Similarly, you can insert a $sort stage after the $unwind and before $group to sort the documents before the grouping stage. Any extra operations will impact speed, although at various tolerable degrees. Based on the use case and requirements, you can then perform adjustments. i.e. scale up, change document schema, etc.
You may also find Aggregation Pipeline Optimization a useful reference.
Regards,
Wan.
MongoDB Enterprise > show collections
ucwmovies
MongoDB Enterprise > db.ucwmovies.createIndex( { "released" : 1 } )
{
"createdCollectionAutomatically" : false,
"numIndexesBefore" : 1,
"numIndexesAfter" : 2,
"ok" : 1
}
MongoDB Enterprise > db.ucwmovies.createIndex( { "released" : -1 } )
{
"createdCollectionAutomatically" : false,
"numIndexesBefore" : 2,
"numIndexesAfter" : 3,
"ok" : 1
}
Hey Bob!
When I ran separate db.collection.distinct() on each of these fields, it was actually faster to run multiple distinct queries rather than the aggregation pipeline. I know that db.collection.distinct() is a limited / special purpose aggregation pipeline function
Hi,
Data in MongoDB has a flexible schema. Flexible schema allows you to focus on your application design and let the database design conform for the benefit of the application. If your application has a requirement for specific queries, it would be beneficial to alter the document schema to boost the query performance.
See Data Modelling Introduction and Data Model Examples and Patterns
for more information and examples.
User is limited to the amount of indexes, I think it was somewhere around 64, some power of 2. This was a drawback to me because if the app has dynamic queries where it allows the user to search by various criteria (9-ways-from-Sunday)
As you have mentioned in the thread, excessive number of indexes may affect write operation performance. This is because for every write operation, those indexes would have to be updated as well (not rebuild entirely).
Depending on the use case of your application, you can always ensure the application add an indexed field to limit the scope of the query and simplify query operations. See also Indexing Strategies.
You are creating an ascending and descending index on the year. I thought I read that mongodb was smart enough to know read an index forward or backward
Yes, you don’t need both ascending or descending indexes on a Single Field Index because MongoDB can traverse the index in either direction. Although this may not be the case with Compound Indexes.
When you created the indexes, it looked like you used the mongo command line tool and it was probably a blocking/synchronous operation that waited until completion before you got the command line prompt back?
There are two types of index creation, foreground and background. The default for mongo shell is to create in the foreground (blocking). Users have the option to specify background to build an index in the background so the operation does not block other database activities. See also db.collection.createIndex()
So with roughly 100k documents, with indexes on the released year, running distincts on the fields separately seemed to be around 6 seconds.
I would suggest to perform cursor.explain() on the queries to analyse further what’s causing the slowness. See also Explain Results.
Also, you could perform some monitoring on the server to identify bottlenecks. See also MongoDB Monitoring Tools
Regards,
Wan.