Hi Thomas
Using regular expressions in this form within the aggregation framework has been an issue for some time now and is still yet to be resolved. There are, however, alternate approaches.
1. With MongoDB 3.4 or later you can use $indexOfCP, or possibly $indexOfBytes depending on your actual use case. These operators return the index at which one string expression is found within another, so the result can be compared and projected from a field expression like any other value. You can do this in a separate projection stage or a $redact pipeline stage, or, from MongoDB 3.6 onwards, directly within $match using $expr:
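{ $match: {
  $expr: {
    $ne: [
      // "clues" and "term" stand in for the two fields being compared
      { $indexOfCP: [ "$clues", "$term" ] },
      -1
    ]
  }
}}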
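Before MongoDB 3.6 there is no $expr, so on 3.4 the same comparison has to live in a $redact stage or in a projection followed by an ordinary $match. A minimal sketch that builds both forms as plain pipeline objects, assuming hypothetical field names clues and term, and using $addFields (available from 3.4) as the projection stage:

```javascript
// Sketch for MongoDB 3.4, where $expr is not yet available.
// "clues" and "term" are hypothetical field names for the two properties.

// Option A: $redact keeps the whole document when the term is found
// and prunes it otherwise.
function redactStage(haystackField, needleField) {
  return {
    $redact: {
      $cond: {
        if: {
          $ne: [{ $indexOfCP: ["$" + haystackField, "$" + needleField] }, -1]
        },
        then: "$$KEEP",
        else: "$$PRUNE"
      }
    }
  };
}

// Option B: project the match index into a temporary field, then filter on it
// with an ordinary $match. $addFields keeps the other fields intact; a trailing
// { $project: { matchIndex: 0 } } could drop the helper field afterwards.
function projectThenMatchStages(haystackField, needleField) {
  return [
    {
      $addFields: {
        matchIndex: { $indexOfCP: ["$" + haystackField, "$" + needleField] }
      }
    },
    { $match: { matchIndex: { $ne: -1 } } }
  ];
}

const pipelineA = [redactStage("clues", "term")];
const pipelineB = projectThenMatchStages("clues", "term");
```

In the mongo shell either array can be passed straight to aggregate(), e.g. db.yourCollection.aggregate(pipelineA), where yourCollection is a placeholder. The $redact form does the work in one stage; the two-stage form is easier to inspect, since you can look at matchIndex before filtering on it.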
The basic premise is that the aggregation operator returns the index position at which the comparison string (or field expression) is found, or -1 where it is not present. If needed, other aggregation expressions such as $toLower can be used to normalize the case of strings before comparison.
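A sketch of the case-insensitive variant, assuming the same hypothetical clues and term fields: each side is wrapped in $toLower before $indexOfCP compares them. The containsIgnoreCase helper is not a MongoDB API; it only mirrors the predicate in plain JavaScript so expectations can be checked against sample documents without a server.

```javascript
// "clues" and "term" are hypothetical field names; adjust to your schema.
function caseInsensitiveMatchStage(haystackField, needleField) {
  return {
    $match: {
      $expr: {
        $ne: [
          {
            $indexOfCP: [
              { $toLower: "$" + haystackField },
              { $toLower: "$" + needleField }
            ]
          },
          -1
        ]
      }
    }
  };
}

// Client-side mirror of the same test, for checking sample data only.
// Assumes both fields hold strings. MongoDB's $toLower is only well defined
// for ASCII, so results can differ from JavaScript's toLowerCase() on other text.
function containsIgnoreCase(doc, haystackField, needleField) {
  const haystack = doc[haystackField].toLowerCase();
  const needle = doc[needleField].toLowerCase();
  return haystack.indexOf(needle) !== -1;
}

const sampleDocs = [
  { clues: "A Large Feline", term: "feline" },
  { clues: "A small dog", term: "cat" }
];
const matched = sampleDocs.filter(d => containsIgnoreCase(d, "clues", "term"));
console.log(matched.length); // 1
```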
Note that this is not a replacement for full regular expression functionality, but it is a reasonable approach for simply matching a string from one property of a document within another.
2. Again, depending on your actual use case, JavaScript expression evaluation can also be applied. This can be done within MapReduce where actual aggregation is required, or simply within a $where clause of a standard query:
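.find({
  $where: function() {
    // Runs on the server with "this" bound to each document, so client-side
    // variables such as a local "regex" are not visible; build the pattern from
    // a field instead ("term" is a placeholder for the field holding it)
    return new RegExp(this.term).test(this.clues); // test the expression against the other field
  }
})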
Note that in other languages, such as Perl, the argument to $where is typically a "string" containing the JavaScript code, which the server evaluates against each document. Some drivers also provide a wrapper type for such a JavaScript code block, but typically it's just a string.
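For the string form, a sketch assuming the same hypothetical term (the pattern) and clues (the text) fields, written as a plain JavaScript expression that the server evaluates with this bound to the current document. The evaluateWhereLocally helper is only a rough stand-in built on the Function constructor so the expression can be tried without a server; it is not how any driver or the server actually runs it.

```javascript
// $where as a plain string: a JavaScript expression evaluated per document.
// "term" (the pattern) and "clues" (the text) are hypothetical field names.
const whereExpression = "new RegExp(this.term).test(this.clues)";
const whereQuery = { $where: whereExpression };

// Rough local stand-in for server-side evaluation, for experimenting only:
// wraps the expression in a function and calls it with `this` set to the doc.
function evaluateWhereLocally(expression, doc) {
  return new Function("return (" + expression + ");").call(doc);
}

console.log(evaluateWhereLocally(whereExpression, { term: "feline", clues: "a large feline" })); // true
console.log(evaluateWhereLocally(whereExpression, { term: "cat", clues: "a small dog" })); // false
```

The whereQuery object can be passed to find() as-is; because the code travels as a string, the same text works unchanged from drivers in other languages.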
Note also that there may be constraints on your deployment that block the execution of JavaScript expressions on the server, so where possible it's generally better to use the aggregation approach.
The open issue for this is SERVER-11947, which has been open for some time, though there is quite a bit of input on it. The wider issue covers more "regular expression specific" features, but for the general comparison cases you are putting forward here, the two existing approaches outlined above should be sufficient.
Of course, if this proves to be compute intensive, consider storing flags within the document, set as you write it, that indicate where terms in the two document properties actually match, instead of computing this at query time.
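A minimal sketch of that idea, assuming hypothetical clues and term fields and a new boolean termInClues flag. The case-insensitive substring test is just one possible definition of "matching"; substitute whatever comparison you actually need.

```javascript
// Hypothetical field names: "clues", "term" and the stored flag "termInClues".
// Compute the match once when the document is written, not on every query.
function withMatchFlag(doc) {
  const termInClues = doc.clues.toLowerCase().includes(doc.term.toLowerCase());
  return Object.assign({}, doc, { termInClues: termInClues });
}

// Reads become a plain equality test, which can use an index on the flag
// if you create one. The flag must be recomputed whenever either field changes.
const flaggedQuery = { termInClues: true };

const stored = withMatchFlag({ clues: "A Large Feline", term: "feline" });
console.log(stored.termInClues); // true
```

The trade-off is extra work and a little extra storage on every write in exchange for queries that need no per-document expression evaluation at all.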