Hi,
Our team is deciding whether MongoDB might be a good fit for our application. One of the use cases is querying for the predecessor commits of a given commit in a version control system, taking into account merges, branch creations, etc. We have a commits collection in which each document has a field that stores that commit's parent commits. Every commit except the very first has at least one parent. MongoDB's $graphLookup stage works nicely here. However, we've hit two limitations. Maybe with your expertise you can help us out:
First, if we start at, say, the 50,000th commit and try to walk back the commit tree, we exceed the 16 MB document size limit:
db.commits.aggregate([{ $match: { "_id": "5aeadb26b2ac4334dc138e5b@polaris_dev/50000" } }, {$graphLookup: {"from": "commits", "startWith": '$parent', "connectFromField": "parent", "connectToField": "_id", "as": "predecessors"}}]).pretty()
assert: command failed: {
"ok" : 0,
"errmsg" : "BSONObj size: 16852277 (0x1012535) is invalid. Size must be between 0 and 16793600(16MB) First element: _id: \"5aeadb26b2ac4334dc138e5b@polaris_dev/50000\"",
"code" : 10334,
"codeName" : "Location10334"
} : aggregate failed
_getErrorWithCode@src/mongo/shell/utils.js:25:13
doassert@src/mongo/shell/assert.js:16:14
assert.commandWorked@src/mongo/shell/assert.js:403:5
DB.prototype._runAggregate@src/mongo/shell/db.js:260:9
DBCollection.prototype.aggregate@src/mongo/shell/collection.js:1212:12
Maybe there's a way to squeeze more into the document if we store just the commit IDs in the 'predecessors' array rather than the whole commit record. I'm not sure how to do that yet; I'm still refamiliarizing myself with MongoDB. However, is there a way for $graphLookup to return its results so that the commit records (or parts of them) don't all have to fit in a single document?
The second issue is that if $graphLookup traverses enough records, it exceeds the 100 MB pipeline memory limit:
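To make that question concrete, here is a rough sketch of what I have in mind: put a $unwind directly after the $graphLookup so that each predecessor comes back as its own small document, then project just the ID. The function name buildPredecessorPipeline and the output field predecessorId are names I made up for illustration. I have not verified that this avoids the 16 MB error; it assumes the server can emit the unwound documents without first materializing the full 'predecessors' array.

```javascript
// Sketch only: builds the pipeline as a plain array so it can be inspected
// before being handed to db.commits.aggregate(...).
// buildPredecessorPipeline and predecessorId are hypothetical names.
function buildPredecessorPipeline(commitId) {
  return [
    { $match: { _id: commitId } },
    {
      $graphLookup: {
        from: "commits",
        startWith: "$parent",
        connectFromField: "parent",
        connectToField: "_id",
        as: "predecessors"
      }
    },
    // Placed immediately after $graphLookup: one output document per predecessor
    // instead of one document holding the whole array.
    { $unwind: "$predecessors" },
    // Keep only the predecessor's commit ID rather than the whole commit record.
    { $project: { _id: 0, predecessorId: "$predecessors._id" } }
  ];
}

const pipeline = buildPredecessorPipeline("5aeadb26b2ac4334dc138e5b@polaris_dev/50000");
// In the mongo shell this would be run as: db.commits.aggregate(pipeline)
```

If this works the way I hope, the result would be a cursor of many small documents, which is really what we want anyway.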
db.commits.aggregate([{ $match: { "_id": "5aeadb26b2ac4334dc138e8d@polaris_dev/2554999" } }, {$graphLookup: {"from": "commits", "startWith": '$parent', "connectFromField": "parent", "connectToField": "_id", "as": "predecessors"}}]).pretty()
assert: command failed: {
"ok" : 0,
"errmsg" : "$graphLookup reached maximum memory consumption",
"code" : 40099,
"codeName" : "Location40099"
} : aggregate failed
_getErrorWithCode@src/mongo/shell/utils.js:25:13
doassert@src/mongo/shell/assert.js:16:14
assert.commandWorked@src/mongo/shell/assert.js:403:5
DB.prototype._runAggregate@src/mongo/shell/db.js:260:9
DBCollection.prototype.aggregate@src/mongo/shell/collection.js:1212:12
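One fallback we have considered is walking the graph from the client in batches instead of inside a single $graphLookup. The sketch below is only an illustration of that idea: collectPredecessors and the fetchByIds callback are hypothetical names, and it assumes the 'parent' field holds either a single commit ID or an array of them. It trades the server-side memory limit for extra round trips and for holding the set of visited IDs in client memory.

```javascript
// Sketch only: breadth-first walk over parent links, one batch of IDs at a time.
// fetchByIds is a hypothetical callback that takes an array of commit IDs and
// returns the matching commit documents (only _id and parent are needed).
function collectPredecessors(startParents, fetchByIds, batchSize = 1000) {
  const seen = new Set();
  let frontier = [].concat(startParents || []);
  while (frontier.length > 0) {
    // Keep only IDs not visited yet, and mark them as visited.
    const toFetch = [];
    for (const id of frontier) {
      if (!seen.has(id)) {
        seen.add(id);
        toFetch.push(id);
      }
    }
    frontier = [];
    for (let i = 0; i < toFetch.length; i += batchSize) {
      const docs = fetchByIds(toFetch.slice(i, i + batchSize));
      for (const doc of docs) {
        // 'parent' may be a single ID or an array of IDs (assumption).
        for (const parentId of [].concat(doc.parent || [])) {
          frontier.push(parentId);
        }
      }
    }
  }
  return seen;
}

// In the mongo shell, the callback could look like this (untested):
//   const fetch = ids => db.commits.find({ _id: { $in: ids } }, { parent: 1 }).toArray();
//   const start = db.commits.findOne({ _id: "5aeadb26b2ac4334dc138e8d@polaris_dev/2554999" });
//   const predecessorIds = collectPredecessors(start.parent, fetch);
```

Each round trip fetches one "generation" of parents, so a long linear history would need many queries. That is the main reason we would prefer a server-side solution if one exists.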
It looks like there is an allowDiskUse option, but $graphLookup doesn't honor it. I also see this issue:
Because of these issues, I might start leaning towards an Oracle solution. Apparently Oracle SQL provides a 'CONNECT BY' clause that can traverse a graph much like $graphLookup. However, my team wanted me to open an official support case in case there are tricks to getting MongoDB to work. If there are, then we might not have to start over with our design.
Thanks,
Ryan